- The leader: responsible for initiating and resolving votes and for updating the system state.
- Learners: include followers and observers. A follower accepts client requests, returns results to the client, and takes part in the voting process.
- An observer accepts client connections and forwards write requests to the leader, but does not take part in voting; it only synchronizes the leader's state. The purpose of observers is to scale the system and improve read throughput.
- The client: the originator of requests.
At the heart of ZooKeeper is atomic broadcast, a mechanism that keeps the servers in sync. The protocol that implements this mechanism is called the Zab protocol. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts, or after the leader crashes, Zab enters recovery mode. Recovery mode ends once a leader has been elected and a majority of the servers have synchronized their state with it. State synchronization guarantees that the leader and the servers hold the same system state.
To guarantee the order consistency of transactions, ZooKeeper uses an incrementing transaction id (zxid) to identify transactions. Every proposal is stamped with a zxid when it is issued. In the implementation the zxid is a 64-bit number: the high 32 bits are the epoch, which identifies whether the leader has changed. Each time a new leader is elected it gets a new epoch, marking the reign of that leader. The low 32 bits are an incrementing counter.
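The 64-bit zxid layout just described can be sketched in a few lines (an illustrative sketch, not ZooKeeper's own code):

```python
# Sketch of the zxid layout: high 32 bits = epoch, low 32 bits = counter.

def make_zxid(epoch, counter):
    return (epoch << 32) | (counter & 0xFFFFFFFF)

def epoch_of(zxid):
    return zxid >> 32

def counter_of(zxid):
    return zxid & 0xFFFFFFFF

z = make_zxid(epoch=2, counter=5)
print(hex(z))        # 0x200000005
print(epoch_of(z))   # 2
print(counter_of(z)) # 5
```

Because the epoch occupies the high bits, every zxid issued by a newer leader compares greater than every zxid from an older epoch, which is what makes the global ordering work across leader changes.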
• Every server passes through three states during its work:
LOOKING: the server does not yet know who the leader is and is searching for one
LEADING: the server is the elected leader
FOLLOWING: a leader has been elected and the current server is following it
More details can be found at: [https://www.cnblogs.com/lpshou/archive/2013/06/14/3136738.html](https://www.cnblogs.com/lpshou/archive/2013/06/14/3136738.html)
2、ZooKeeper's read and write mechanism
* ZooKeeper is a cluster composed of multiple servers
* One leader, multiple followers
* Every server keeps a copy of the data
* Global data consistency
* Distributed reads and writes
* Update requests are forwarded to, and executed by, the leader
* Update requests are executed in order: updates from the same client are executed in the order they were sent.
* Data updates are atomic: an update either succeeds completely or fails completely.
* A globally unique data view: no matter which server a client connects to, it sees the same data view.
* Timeliness: within a certain time bound, a client can read the latest data.
4、ZooKeeper's node data operation flow
Note:
1. The client sends a write request to a follower.
2. The follower forwards the request to the leader.
3. On receiving it, the leader initiates a vote and notifies the followers.
4. The followers send their vote results to the leader.
5. The leader tallies the results; if the write should go ahead, it performs the write, notifies the followers, and then commits;
6. The follower returns the result of the request to the client.
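The six steps above can be traced with a toy walkthrough (an illustrative sketch, not the real protocol; the function name and the assumption that every follower acks are mine):

```python
# Toy model of the write path: follower forwards to leader, leader
# collects acks, and commits only if a majority of the ensemble agrees.

def handle_write(request, follower_ids, leader_id):
    """Simulate a write arriving at follower_ids[0]; return (committed, trace)."""
    trace = [
        f"client -> follower {follower_ids[0]}: {request}",
        f"follower {follower_ids[0]} -> leader {leader_id}: {request}",
    ]
    ensemble = len(follower_ids) + 1      # followers plus the leader
    acks = 1                              # the leader implicitly acks its own proposal
    for fid in follower_ids:
        trace.append(f"leader {leader_id} -> follower {fid}: propose")
        acks += 1                         # in this toy run every follower acks
    committed = acks > ensemble // 2      # majority rule
    if committed:
        trace.append("leader: majority acked, commit")
    return committed, trace

ok, trace = handle_write("set /x 1", follower_ids=[2, 3], leader_id=1)
print(ok)  # True
```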
• A follower has four main functions:
• 1. Send requests to the leader (PING, REQUEST, ACK, and REVALIDATE messages);
• 2. Receive and process messages from the leader;
• 3. Receive client requests; if a request is a write, forward it to the leader for a vote;
• 4. Return results to the client.
• A follower's message loop handles the following messages from the leader:
• 1. PING: heartbeat message;
• 2. PROPOSAL: a proposal initiated by the leader, which the follower is required to vote on;
• 3. COMMIT: information about the latest committed proposal on the server side;
• 4. UPTODATE: indicates that synchronization is complete;
• 5. REVALIDATE: depending on the leader's REVALIDATE result, either close the session being revalidated or allow it to receive messages again;
• 6. SYNC: return the SYNC result to the client; this message is originally initiated by the client, to force it to see the latest update.
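The message loop above is essentially a dispatch on message type. A minimal sketch (the handler bodies are placeholders of my own; only the message names come from the list):

```python
# Toy follower message loop: dispatch each leader message to a handler.

def follower_loop(messages):
    """Process a stream of (msg_type, payload) pairs from the leader."""
    log = []
    handlers = {
        "PING":       lambda p: log.append("heartbeat acked"),
        "PROPOSAL":   lambda p: log.append(f"voted ACK on zxid {p}"),
        "COMMIT":     lambda p: log.append(f"committed zxid {p}"),
        "UPTODATE":   lambda p: log.append("sync complete"),
        "REVALIDATE": lambda p: log.append(f"session {p} revalidated"),
        "SYNC":       lambda p: log.append("sync reply sent to client"),
    }
    for msg_type, payload in messages:
        handlers[msg_type](payload)
    return log

print(follower_loop([("PING", None), ("PROPOSAL", 7), ("COMMIT", 7)]))
```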
5、ZooKeeper leader election
• Decided by majority
– with 3 machines, 1 may fail, since 2 > 3/2
– with 4 machines, 2 failures cannot be tolerated, since 2 is not > 4/2
• A proposes: "I want to elect myself. B, do you agree? C, do you agree?" B says: "I agree to elect A." C says: "I agree to elect A." (Note that A now has more than half of the votes; in a real-world election it would already have won.
But the computer world is strict, and to understand the algorithm we continue the simulation.)
• Next B proposes: "I want to elect myself. A, do you agree?" A says: "I already have more than half of the votes and have been elected, so your proposal is invalid." C says: "A already has more than half of the votes and has been elected, so B's proposal is invalid."
• Next C proposes: "I want to elect myself. A, do you agree?" A says: "I already have more than half of the votes and have been elected, so your proposal is invalid." B says: "A already has more than half of the votes and has been elected, so C's proposal is invalid."
• Once the leader has been elected, all the rest are followers and can only obey the leader's orders. A small detail here: in practice, whoever starts first takes charge.
• The state information of a znode contains a czxid. So what exactly is a zxid?
• Every state change in ZooKeeper corresponds to an incrementing transaction id, called the zxid. Because zxids are incrementing, if zxid1 is smaller than zxid2, then zxid1 must have happened before zxid2.
Creating any node, updating the data of any node, or deleting any node changes the ZooKeeper state, and therefore increases the value of the zxid.
7、How ZooKeeper works
» At the heart of ZooKeeper is atomic broadcast, which keeps the servers in sync. The protocol that implements it is called Zab, and it has two modes: recovery mode and broadcast mode.
When the service starts, or after the leader crashes, Zab enters recovery mode. Recovery mode ends once a leader has been elected and a majority of the servers have synchronized their state with it.
State synchronization guarantees that the leader and the servers hold the same system state.
» Once the leader has synchronized state with a majority of followers, it can start broadcasting messages, i.e. enter broadcast mode. When a server joins the ZooKeeper service at this point, it starts up in recovery mode,
discovers the leader, and synchronizes state with it. When synchronization finishes, it too takes part in message broadcasting. The ZooKeeper service stays in broadcast mode until the leader crashes or loses the support of a majority of followers.
» Broadcast mode must guarantee that proposals are processed in order, so zk uses an incrementing transaction id (zxid) to guarantee this. Every proposal is stamped with a zxid when it is issued.
In the implementation the zxid is a 64-bit number: the high 32 bits are the epoch, which identifies whether the leader has changed; each time a new leader is elected it gets a new epoch. The low 32 bits are an incrementing counter.
» When the leader crashes or loses a majority of its followers, zk enters recovery mode, which elects a new leader and brings every server back to a correct state.
» After startup, every server asks the other servers who they intend to vote for.
» When asked, a server replies, according to its own state, with the id of its recommended leader and the zxid of the last transaction it handled (on system startup every server recommends itself).
» After receiving replies from all servers, a server works out which server has the largest zxid, and sets that server as the one it will vote for in the next round.
» The server that receives the most votes during this process is the winner; if the winner has more than half of the votes, it is chosen as the leader. Otherwise, the process repeats until a leader is elected.
» The leader then starts waiting for servers to connect.
» A follower connects to the leader and sends it its largest zxid.
» The leader determines the synchronization point from the follower's zxid.
» When synchronization is complete, the leader notifies the follower that it is now in the uptodate state.
» Once a follower has received the uptodate message, it can accept client requests again.
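The voting rule above can be sketched as a toy round of the election (a simplification of my own: every server sees the same information and votes for the largest (zxid, id) pair; breaking ties by the larger server id is assumed):

```python
# Toy model of one election round: vote for the server with the
# largest last-handled zxid; it wins if it gets a majority.
from collections import Counter

def elect(last_zxids):
    """last_zxids: {server_id: last zxid handled}. Return the elected
    leader id, or None if no candidate reaches a majority (> n/2)."""
    n = len(last_zxids)
    votes = Counter(
        max(last_zxids, key=lambda sid: (last_zxids[sid], sid))
        for _voter in last_zxids          # each server casts one vote
    )
    leader, count = votes.most_common(1)[0]
    return leader if count > n // 2 else None

# Server 2 handled the most recent transaction, so it wins.
print(elect({1: 0x100000005, 2: 0x100000007, 3: 0x100000006}))  # 2
```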
8、Data consistency and the Paxos algorithm
• It is said that the Paxos algorithm is as famously hard to understand as it is popular, so let's first look at how data is kept consistent. The principle is this:
• In a distributed database system, if every node starts from the same initial state and executes the same sequence of operations, then they all end up in a consistent state.
• What problem does the Paxos algorithm solve? It ensures that every node executes the same sequence of operations. OK, that is not hard: have a master maintain a global write queue, and require every write operation to take a number from this queue. Then no matter how many nodes we write to, as long as the writes are numbered, consistency is guaranteed. Exactly, that's it. But what if the master goes down?
• The Paxos algorithm uses voting to number write operations globally, and at any moment only one write operation is approved. Concurrent write operations must compete for votes, and only a write that wins more than half of the votes is approved (so there is always only one approved write at a time); the writes that lose the competition have to start another round of voting. Through round after round of voting, all write operations end up in a strict numbered order. The numbers are strictly increasing, so when a node has accepted a write numbered 100 and then receives a write numbered 99 (because of network delay or other unforeseeable reasons), it immediately realizes that its data is inconsistent, automatically stops serving externally, and starts the synchronization process. The failure of any single node does not affect the data consistency of the whole cluster (with 2n+1 machines in total, consistency holds unless more than n of them fail).
Recommended reading: "From Paxos to Zookeeper: Principles and Practice of Distributed Consistency"
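The ordering check described above (a node noticing that write 99 arrived after write 100) can be sketched like this (an illustrative model of my own, not the Paxos protocol itself):

```python
# Toy node that applies globally numbered writes in order and flags
# itself for resynchronization when a stale number arrives.

class Node:
    def __init__(self):
        self.last_applied = 0
        self.needs_resync = False

    def apply(self, num):
        if num <= self.last_applied:
            # e.g. write 99 arriving after write 100 was already applied
            self.needs_resync = True
            return False
        self.last_applied = num
        return True

n = Node()
print(n.apply(100))    # True  - applied in order
print(n.apply(99))     # False - stale number, out of order
print(n.needs_resync)  # True  - node stops serving and resyncs
```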
• ZooKeeper needs to guarantee high availability and strong consistency;
• To support more clients, more servers must be added;
• As the number of servers grows, the latency of the voting phase grows, which hurts performance;
• To balance scalability against high throughput, the observer role was introduced:
• Observers do not vote;
• Observers accept client connections and forward write requests to the leader node;
• Adding more observer nodes improves scalability without affecting throughput.
10、Why is the number of machines in a ZooKeeper cluster usually odd?
• The leader election algorithm uses the Paxos protocol;
• The core idea of Paxos: data is considered successfully written once a majority of the servers have written it. With 3 servers, 2 successful writes suffice; with 4 or 5 servers, 3 successful writes suffice.
• The number of servers is usually odd (3, 5, 7). With 3 servers, at most 1 server may fail; with 4 servers, still at most 1 server may fail.
As we can see, 3 servers and 4 servers have the same fault tolerance, so to save server resources we usually deploy an odd number of servers.
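The arithmetic behind this is simple majority math: a cluster of n servers needs a quorum of floor(n/2) + 1, so it tolerates floor((n - 1)/2) failures. A quick sketch:

```python
# Majority-quorum arithmetic: why an even-sized cluster buys no
# extra fault tolerance over the next-smaller odd size.

def quorum(n):
    """Smallest majority of n servers."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many servers may fail while a quorum survives."""
    return n - quorum(n)

for n in (3, 4, 5, 6, 7):
    print(n, quorum(n), tolerated_failures(n))
# 3 and 4 servers both tolerate exactly 1 failure; 5 and 6 both
# tolerate 2 - the extra even machine adds cost but no resilience.
```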
11、ZooKeeper's data model
» A hierarchical directory structure whose naming follows general file system conventions
» Every node in ZooKeeper is called a znode and has a unique path identifier
» A znode can hold data as well as child nodes, but a node of type EPHEMERAL cannot have children
» The data in a znode can have multiple versions; if a path holds multiple versions of data, a query on that path must carry a version
» Client applications can set watches on nodes
» Nodes do not support partial reads or writes; every read and write is one-shot and complete
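The data model above can be mirrored by a tiny in-memory structure (a sketch for illustration only, not the ZooKeeper client API; class and method names are mine):

```python
# Minimal model of the znode tree: a hierarchy of nodes, each holding
# whole-value data plus a version that increments on every write.

class Znode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0      # bumped on every data update
        self.children = {}    # child name -> Znode

class Tree:
    def __init__(self):
        self.root = Znode()

    def _walk(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def create(self, path, data=b""):
        parent, _, name = path.rpartition("/")
        self._walk(parent).children[name] = Znode(data)

    def set_data(self, path, data):
        node = self._walk(path)   # whole-value write: no partial updates
        node.data = data
        node.version += 1
        return node.version

t = Tree()
t.create("/app", b"cfg")
t.create("/app/lock")
print(t.set_data("/app", b"cfg-v2"))  # 1
```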
12、ZooKeeper's nodes
» Znodes come in two kinds: ephemeral and persistent
» A znode's type is determined at creation time and cannot be changed afterwards
» When a client session ends, ZooKeeper deletes that session's ephemeral znodes; an ephemeral znode cannot have children
» A persistent znode does not depend on the client session; it is deleted only when a client explicitly asks for it to be deleted
» Znode directory nodes take four forms:
» PERSISTENT
» EPHEMERAL
» PERSISTENT_SEQUENTIAL (persistent, with a sequence number appended to the node name)
» EPHEMERAL_SEQUENTIAL (ephemeral, with a sequence number appended to the node name)
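For the two SEQUENTIAL forms, ZooKeeper appends a monotonically increasing counter, zero-padded to 10 digits, to the requested name. A sketch of just the naming (the counter bookkeeping is simplified here for illustration):

```python
# How a SEQUENTIAL znode name is formed: requested prefix plus a
# 10-digit zero-padded counter.

def sequential_name(prefix, counter):
    return f"{prefix}{counter:010d}"

print(sequential_name("/locks/lock-", 1))   # /locks/lock-0000000001
print(sequential_name("/locks/lock-", 42))  # /locks/lock-0000000042
```

This naming is what makes sequential znodes useful for building distributed locks and queues: lexicographic order of the names matches creation order.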