Today, Kafka uses ZooKeeper to store metadata about partitions and brokers, and to elect one broker to act as the cluster controller. The Kafka development team wants to remove this dependency on ZooKeeper, so that metadata can be managed in a more scalable and robust way, supporting more partitions while also simplifying Kafka's deployment and configuration.
Managing state through an event stream has real benefits. A single number, the offset, describes a consumer's position in the stream. Multiple consumers can quickly catch up to the latest state by processing the events newer than their current offset. The log establishes a clear ordering of events and ensures that consumers always move forward along a single timeline.
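As a toy illustration of this idea (not Kafka code; the in-memory log, the `Consumer` class, and the event strings are all invented for the example), each consumer only has to remember a single integer and replay everything past it:

```python
# A minimal sketch of offset-based catch-up against an in-memory "log".
log = ["create topic A", "expand A to 3 partitions", "delete topic B"]

class Consumer:
    def __init__(self):
        self.offset = 0   # position in the event stream
        self.state = []   # state rebuilt from the events seen so far

    def poll(self):
        # Process only events newer than the current offset, in log order.
        for event in log[self.offset:]:
            self.state.append(event)
        self.offset = len(log)

# Two consumers that start at different times converge on the same state,
# because the log fixes a single order of events.
early = Consumer()
late = Consumer()
early.poll()
log.append("create topic C")
early.poll()   # picks up only the one new event
late.poll()    # replays all four events at once
assert early.state == late.state
assert early.offset == late.offset == 4
```

Whether a consumer replays everything at once or in several small steps, the end state is identical, which is exactly the property the offset gives you.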
While Kafka's users enjoy these benefits, Kafka itself has ignored them. Metadata changes are treated as independent updates with no relationship to one another. When the controller pushes a notification (for example, a LeaderAndIsrRequest) to the other brokers in the cluster, some brokers may receive it and others may not. The controller retries a few times but eventually gives up, which can leave brokers in an inconsistent state.
Worse still, although ZooKeeper is the store of record, the state in ZooKeeper often diverges from the state held in the controller's memory. For example, when a partition leader modifies the ISR in ZooKeeper, the controller typically remains unaware of the update for a long time. The controller can set a one-shot watcher, but for performance reasons the number of watchers is limited. When a watcher fires, it does not tell the controller the current state; it only tells the controller that the state has changed. By the time the controller re-reads the znode and sets a new watcher, the state may already differ from the state that triggered the watcher. If no watcher is set, the controller may never learn about the change at all. In some cases, the only way to resolve the inconsistency is to restart the controller.
Metadata should not be stored in a separate system; it should be stored directly in the Kafka cluster itself. That avoids all the problems caused by divergence between the controller's state and ZooKeeper's state. Rather than receiving change notifications, brokers should consume metadata events from an event log. This ensures that metadata changes always arrive in the same order. Brokers can persist the metadata to a local file; on restart, they only need to read what has changed rather than the full state, which supports more partitions while reducing CPU consumption.
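The replay idea can be sketched in a few lines (an illustration only, with invented event records; the real metadata log carries typed records defined by Kafka's schemas). The broker keeps a cache plus the offset of the last event it applied, so a restart only replays the tail of the log:

```python
# Sketch: a broker rebuilds its metadata cache by replaying a log of
# (offset, key, value) metadata events. Keys and values are invented.
metadata_log = [
    (0, "topic/A/partitions", 1),
    (1, "topic/A/partitions", 3),   # a later event overrides an earlier one
    (2, "topic/B/partitions", 2),
]

def replay(cache, last_applied, log):
    """Apply only the events newer than what the cache already contains."""
    for offset, key, value in log:
        if offset > last_applied:
            cache[key] = value
            last_applied = offset
    return last_applied

cache = {}
last = replay(cache, -1, metadata_log)    # initial full replay
metadata_log.append((3, "topic/A/partitions", 4))
last = replay(cache, last, metadata_log)  # after a "restart": delta only
assert cache == {"topic/A/partitions": 4, "topic/B/partitions": 2}
assert last == 3
```

Because every broker applies the same events in the same order, every cache converges to the same value for every key, with no push notifications to lose.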
ZooKeeper is a standalone system with its own configuration file syntax, management tools, and deployment patterns. To deploy Kafka, system administrators must learn to manage and deploy two separate distributed systems. This can be a daunting task, especially for administrators who are not familiar with deploying Java services. Unifying deployment and configuration into a single system would greatly improve Kafka's operational experience and help broaden its adoption.
Because Kafka and ZooKeeper are configured separately, it is easy to make mistakes. For example, an administrator might set up SASL on Kafka and mistakenly believe that this protects all data transmitted over the network. In fact, to secure the data it is also necessary to configure security in ZooKeeper itself. Unifying the configuration of the two systems yields a single, unified security configuration model.
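To make the split concrete, here are illustrative fragments of the two separate configurations (hostnames, ports, and the chosen mechanism are example values; consult the Kafka and ZooKeeper security documentation for a real deployment):

```properties
# server.properties -- secures broker<->client and inter-broker traffic
listeners=SASL_SSL://broker1:9093
security.inter.broker.protocol=SASL_SSL
sasl.enabled.mechanisms=SCRAM-SHA-256

# zoo.cfg -- the part that is easy to forget: ZooKeeper must be secured
# separately, or the metadata it stores stays unprotected
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
```

The mistake described above is precisely configuring the first file and never touching the second.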
Finally, Kafka may one day support a single-node deployment mode. This would be very useful for people who want to quickly try out Kafka without starting multiple daemons, and removing the ZooKeeper dependency is what makes the idea feasible.
Today, a Kafka cluster typically contains several broker nodes plus a ZooKeeper quorum. The picture above shows four broker nodes and three ZooKeeper nodes. The controller (orange) loads its state from the ZooKeeper quorum. The lines from the controller to the other broker nodes indicate that the controller pushes updates to them, such as LeaderAndIsr and UpdateMetadata messages.
Note that the picture above omits some details. Brokers other than the controller also communicate with ZooKeeper, so strictly speaking a line should be drawn from every broker to ZooKeeper, but that many lines would make the diagram hard to read. Another problem is that external command-line tools can modify ZooKeeper state directly, bypassing the controller, which makes it hard to know whether the state in the controller's memory really reflects the state in ZooKeeper. In the new architecture, three controller nodes replace the original three ZooKeeper nodes. The controller nodes and the broker nodes run in separate JVMs. The controller nodes elect a leader to handle the metadata. Instead of the controller pushing updates to the brokers, the brokers pull metadata updates from the leader controller, so the arrows point from the brokers to the controllers rather than the other way around.
The controller nodes form a Raft quorum that manages the metadata log. This log contains the changes to the cluster metadata. Everything that was previously kept in ZooKeeper, such as topics, partitions, ISRs, and configurations, will be stored in this log.
The controller nodes elect a leader among themselves using the Raft algorithm, without depending on any external system. The elected leader is called the active controller. The active controller handles all RPCs. The standby controllers replicate data from the active controller and serve as hot standbys in case the active controller fails.
As with ZooKeeper, Raft requires a majority of nodes to be available in order to make progress. Therefore, a three-node controller cluster can tolerate one node failure, a five-node controller cluster can tolerate two, and so on.
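The majority rule above reduces to simple arithmetic: a quorum of n nodes stays available as long as a majority survives, so it tolerates at most floor((n - 1) / 2) failures.

```python
def tolerated_failures(quorum_size: int) -> int:
    """Number of node failures a majority-based quorum can survive."""
    return (quorum_size - 1) // 2

assert tolerated_failures(3) == 1   # three controllers: one may fail
assert tolerated_failures(5) == 2   # five controllers: two may fail
assert tolerated_failures(4) == 1   # even sizes buy no extra tolerance
```

Note the last case: a four-node quorum tolerates no more failures than a three-node one, which is why odd quorum sizes are the norm.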
The controllers periodically write a snapshot of the metadata to disk. While this is conceptually similar to compaction, the code path is different, because the new architecture can read the state directly from memory rather than re-reading the log from disk.
Instead of the controller pushing updates to the brokers, brokers will fetch updates from the active controller through the new MetadataFetch API.
MetadataFetch is similar to a fetch request. Just like a fetch request, the broker tracks the offset of the last update it fetched, and only requests newer updates from the active controller.
Brokers persist the metadata to disk, which allows them to start quickly even with hundreds of thousands or even millions of partitions. (Note that since this persistence mechanism is an optimization, it may not appear in the first version.)
Most of the time, a broker only needs to fetch incremental updates rather than the full state. However, if the broker falls too far behind the active controller, or has no cached metadata at all, the active controller will send a complete metadata image to the broker instead of incremental deltas.
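The delta-versus-snapshot decision can be sketched like this (the shapes here are invented for illustration; the real MetadataFetch protocol is defined by Kafka's RPC schemas, not by this code):

```python
# Sketch of the controller side of an incremental metadata fetch:
# return deltas when the broker is close enough, else a full image.

def handle_metadata_fetch(log, log_start_offset, broker_offset):
    """`log` holds the retained delta events, the first of which sits at
    `log_start_offset`. `broker_offset` is the next offset the broker
    needs. A broker too far behind (or with no cache) gets a snapshot."""
    if broker_offset < log_start_offset:
        # The deltas the broker needs are gone: send the full image.
        return {"type": "snapshot", "events": list(log)}
    # Otherwise send only the events the broker has not seen yet.
    start = broker_offset - log_start_offset
    return {"type": "delta", "events": log[start:]}

log = ["ev100", "ev101", "ev102"]   # deltas retained from offset 100 on
up_to_date = handle_metadata_fetch(log, 100, 101)
far_behind = handle_metadata_fetch(log, 100, 3)
assert up_to_date == {"type": "delta", "events": ["ev101", "ev102"]}
assert far_behind["type"] == "snapshot"
```

The broker-side bookkeeping is the same offset tracking that ordinary fetch requests already use, which is why MetadataFetch is described as being modeled on them.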
Brokers will periodically request metadata updates from the active controller. These requests double as heartbeats, letting the controller know that the broker is alive.
Today, brokers register themselves in ZooKeeper on startup. This registration does two things: it lets a broker know whether it has been elected controller, and it lets the other nodes know how to reach the node that has been elected controller.
After ZooKeeper is removed, brokers will register themselves with the controller quorum through the MetadataFetch API, rather than in ZooKeeper.
Today, if a broker loses its ZooKeeper session, the controller removes it from the cluster metadata. After ZooKeeper is removed, the active controller will remove a broker from the cluster metadata if the broker has gone long enough without issuing a MetadataFetch request.
Today, a broker that remains connected to ZooKeeper but is partitioned from the controller can continue serving user requests while receiving no metadata updates, which can lead to consistency problems. For example, a producer configured with acks=1 might keep sending data to a leader that is, in fact, no longer the leader, because the partitioned broker never received the LeaderAndIsrRequest announcing the change.
After ZooKeeper is removed, cluster membership is integrated with metadata updates: a broker that cannot receive metadata updates cannot remain a member of the cluster.
Broker states
This article is from the WeChat official account Big Data Is Fun (havefun_bigdata).
Originally published: 2021-01-25