Zookeeper learning

Xiaojian Jianjian 2022-05-14 14:29:24 Views: 112




ZooKeeper stores its nodes (znodes) in a tree structure.

ls /
create -e /xjj 1    // create an ephemeral node
create -s -e /xjj 2 // create an ephemeral sequential node
set /xjj 3
get /xjj
stat /xjj

An ephemeral node is deleted automatically by ZooKeeper when the client disconnects.
A persistent node is never deleted by ZooKeeper on its own; it remains until a client deletes it.


ZooKeeper is based on long-lived TCP connections. Each connection is given a timeout; if the client does not send a ping within the timeout, ZooKeeper closes the connection automatically.
This is the heartbeat-detection mechanism.


Use get -w /xjj to set a watch on a node.
When another client modifies the node, a notification message is returned.
A watch fires only once.

The process has four steps. First, when the client calls getData, exists, or getChildren, it sends the request to the server. On receipt, the server stores the watch in a HashMap. When a node is modified, the server reads from the HashMap; on a hit, a notification is returned to the client.
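The one-shot behavior of this server-side watch table can be modeled with a short sketch. This is an illustrative model only, not ZooKeeper's actual server code; `WatchTable` and its methods are made-up names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified model of the server-side watch table: watches are
// registered by read calls and fire exactly once on the next change.
public class WatchTable {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // Called when a client reads with a watch (getData/exists/getChildren).
    public void register(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
    }

    // Called when a node is modified: fire and remove all watches (one-shot).
    public void nodeChanged(String path) {
        List<Consumer<String>> list = watches.remove(path);
        if (list != null) {
            for (Consumer<String> w : list) w.accept(path);
        }
    }

    // Number of watches still pending on a path.
    public int pending(String path) {
        List<Consumer<String>> list = watches.get(path);
        return list == null ? 0 : list.size();
    }
}
```

Because the entry is removed when it fires, a second modification produces no notification, matching the "listen only once" behavior above.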


ZooKeeper ACLs take the form [scheme:id:permissions], forming a permission list.

1. scheme: the permission mechanism used; one of world, auth, digest, ip, and super.
2. id: identifies the users allowed to access.
3. permissions: a permission string built from cdrwa, where each letter stands for a different permission: create (c), delete (d), read (r), write (w), and admin (a).
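The cdrwa string maps to a permission bitmask. The sketch below parses such a string; the bit values shown match ZooKeeper's `org.apache.zookeeper.ZooDefs.Perms` constants, but the class itself is illustrative:

```java
// Sketch: map a "cdrwa" permission string to ZooKeeper-style permission bits.
public class AclPerms {
    public static final int READ   = 1 << 0; // r
    public static final int WRITE  = 1 << 1; // w
    public static final int CREATE = 1 << 2; // c
    public static final int DELETE = 1 << 3; // d
    public static final int ADMIN  = 1 << 4; // a

    // Combine the letters of a permission string into one bitmask.
    public static int parse(String perms) {
        int mask = 0;
        for (char ch : perms.toCharArray()) {
            switch (ch) {
                case 'r': mask |= READ;   break;
                case 'w': mask |= WRITE;  break;
                case 'c': mask |= CREATE; break;
                case 'd': mask |= DELETE; break;
                case 'a': mask |= ADMIN;  break;
                default: throw new IllegalArgumentException("unknown perm: " + ch);
            }
        }
        return mask;
    }
}
```

For example, the full string "cdrwa" yields all five bits set, while "crwa" (delete removed, as in the example below) lacks only the DELETE bit.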

View the default ACL:

getAcl /xjj

After removing the delete permission, the child nodes of the node can no longer be deleted, but the /xjj node itself still can be.
world:anyone:crwa means the node is open to anyone, with access rights crwa (create, read, write, admin).
Add a password to a user: addauth digest user1:123456

The ZAB protocol

The full name of the ZAB protocol is Zookeeper Atomic Broadcast.

The ZAB protocol has four mechanisms:
1. Fast leader election in zk
2. The over-half (majority) mechanism
3. 2PC: two-phase commit (pre-commit/propose, ack, commit)
4. Data synchronization

Broadcast

In ZAB there are two kinds of nodes: the leader and the followers. There is only one leader, while there can be many followers. Broadcasting works as follows:

1. When a follower receives a write command, it first forwards the command to the leader. The leader wraps the command in a transaction Proposal and sends it to each follower's message queue. Before sending, it generates a unique, monotonically increasing id for the proposal, the zxid, and stores it in the proposal. Proposals are processed strictly in increasing zxid order, which guarantees ordering.
2. After receiving the transaction message, a follower first writes it to its log, then returns an ack to the leader.
3. When the acks the leader has received exceed half of the total number of servers (not counting the leader), it sends a commit to the followers and at the same time sends the Proposal to the observer nodes. If the acks stay below half, crash recovery starts and a new leader is elected.
4. After receiving the commit, followers apply the transactions in the log to their local database; observers apply the update directly.
5. After updating, the follower and observer nodes each return an ack to the leader.
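The over-half check in step 3 can be sketched as follows. This is a minimal illustrative model (`QuorumTracker` is a made-up name); here the leader's own implicit ack is counted toward the majority, which is how the quorum rule is commonly expressed:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the over-half (quorum) rule: the leader may commit a
// proposal once acks from a majority of all voting servers arrive.
public class QuorumTracker {
    private final int clusterSize;          // voting servers, leader included
    private final Set<Integer> acks = new HashSet<>();

    public QuorumTracker(int clusterSize, int leaderId) {
        this.clusterSize = clusterSize;
        acks.add(leaderId);                 // the leader acks its own proposal
    }

    // Record an ack; returns true once the proposal has majority support
    // and can be committed.
    public boolean ack(int serverId) {
        acks.add(serverId);
        return acks.size() > clusterSize / 2;
    }
}
```

In a 5-server cluster, for instance, the leader plus two follower acks (3 of 5) is enough to commit.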

The core of ZooKeeper's ZAB protocol is that as long as one server commits a Proposal, all servers will eventually commit that Proposal correctly. This is one manifestation of the eventual consistency described by CAP/BASE.

Crash recovery

Crash recovery has two parts: leader election and data recovery.

1. Leader election

A node can be in one of four states:
looking: a wait-and-see state; the cluster has an internal problem (no leader), so the node pauses its normal work and prepares to vote.
following: the node is an ordinary member of the cluster, doing its own work.
leading: the node is the leader of the cluster, doing its own work.
observing: the observation state; the node syncs the leader's state but does not vote.

When the leader goes down, all followers switch from the following state to the looking state and prepare to elect a new leader.

Every time a user submits a write request, the node's zxid increases by 1. Naturally, if a node goes down before synchronizing, the node with the largest zxid holds the newest data and is more likely to be elected leader.
If the zxids are all the same, the node with the largest myid should be chosen. A comparison between two nodes proceeds as follows: zk2 sends its vote to zk1.

zk1 compares the zxids and finds them equal, then compares the myids and finds zk2's larger. So it changes its current vote to zk2, updates the vote in its own ballot box, and also adds zk2's vote to the ballot box.

zk1 then sends the modified vote to the others; after receiving it, zk2 stores it in its own ballot box.

Once the number of votes exceeds half of the total, both zk1 and zk2 consider zk2 the leader.

In reality the ballot that is passed around contains four values: (leader, ServerId, Zxid, Epoch). The epoch is explained as follows:

In ZAB's transaction-numbering design, a zxid is a 64-bit number. The low 32 bits can be seen as a simple monotonically increasing counter: for each transactional client request, the leader increments the counter by 1 when it generates a new Proposal. The high 32 bits hold the leader's epoch number. The epoch can be understood as the current cluster's era, or cycle: each time the leader changes, the epoch increases by 1, so after an old leader recovers from a crash, the other followers no longer obey it, because followers only obey the leader with the highest epoch. Each time a new leader is elected, it takes the largest zxid from its local transaction log, extracts the corresponding epoch from it, adds 1, uses the result as the new epoch, resets the low 32 bits to zero, and starts generating zxids from 0.
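The 64-bit zxid layout just described (epoch in the high 32 bits, counter in the low 32 bits) comes down to a few bit operations; the helper class below is an illustrative sketch of that layout:

```java
// Sketch of the 64-bit zxid layout: high 32 bits hold the epoch,
// low 32 bits hold a per-epoch transaction counter.
public class Zxid {
    public static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    // A new leader bumps the epoch by 1 and resets the counter to zero.
    public static long newEpochZxid(long lastZxid) {
        return make(epochOf(lastZxid) + 1, 0);
    }
}
```

Because the epoch occupies the high bits, any zxid from a newer epoch compares greater than every zxid from an older epoch, which is what lets followers ignore a stale leader.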

1. If server B receives data from server A (server A is in the election, i.e. LOOKING, state):

1) First, compare the logical clock (Epoch) values:

a) If the sent Epoch is larger than the current logical clock: first, update the local logical clock (Epoch) and clear the election data previously collected from other servers. Then decide whether to update the currently elected leader ServerId. The judging rule compares the stored maximum zxid and the leader ServerId: first compare the zxids, and the larger zxid wins; if they are equal, compare the leader ServerIds, and the larger ServerId wins. Finally, broadcast the latest election result (the three values mentioned above: leader ServerId, Zxid, Epoch) to the other servers.

b) If the sent Epoch is smaller than the current logical clock: the other server is in an earlier Epoch, so just send this machine's three values (leader ServerId, Zxid, Epoch) back to it.

c) If the sent Epoch equals the current logical clock: elect the leader according to the judging rule above, then broadcast the latest election result (leader ServerId, Zxid, Epoch) to the other servers.

2) Next, if the states of all servers have been collected: set the local role according to the election result (FOLLOWING or LEADER) and exit the election process.

Finally, if the election states of all servers have not yet been collected: after the process above, check whether the latest elected leader is supported by more than half of the servers. If so, keep receiving data for about 200 ms; if no new data arrives, everyone has tacitly accepted the result, so set the role accordingly and exit the election process.
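The vote-comparison rule used throughout the steps above (Epoch first, then Zxid, then ServerId; larger wins) can be written as a small sketch; the `Vote` class and its fields are illustrative names:

```java
// Sketch of the ballot-comparison rule: compare epoch first,
// then zxid, then server id; in each case the larger value wins.
public class Vote {
    public final long epoch, zxid, serverId;

    public Vote(long epoch, long zxid, long serverId) {
        this.epoch = epoch;
        this.zxid = zxid;
        this.serverId = serverId;
    }

    // Returns true if candidate `a` beats candidate `b`.
    public static boolean beats(Vote a, Vote b) {
        if (a.epoch != b.epoch) return a.epoch > b.epoch;
        if (a.zxid != b.zxid) return a.zxid > b.zxid;
        return a.serverId > b.serverId;
    }
}
```

This matches the zk1/zk2 example earlier: with equal epochs and zxids, zk2's larger myid makes zk2 the winner.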


2. Data synchronization

The leader produced by the election is not yet a real leader; only after the initial synchronization does it become one.

Since this is recovery, some scenarios must be ruled out. ZAB's crash recovery imposes two requirements. First: any proposal the leader has already committed must eventually be committed by all follower servers. Second: any proposal the leader issued but did not commit must be discarded. These are the problems zk must solve to guarantee data consistency.

With that, recovery proceeds as follows.

Step 1: select the largest current zxid; its transaction is the newest.

Step 2: the new leader sends this transaction's proposal to the other follower nodes.

Step 3: each follower rolls back or synchronizes based on the leader's message. The end goal is that the data replicas on all nodes in the cluster are consistent.

That is the whole recovery process. In effect there is a log recording every operation; recovery restores the newest operation before the crash and then synchronizes.
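The roll-back-or-synchronize decision in step 3 can be sketched as a comparison against the leader's newest committed zxid. This is a simplified illustrative model (names are made up; the real server also has a full-snapshot path for followers that have fallen too far behind):

```java
// Simplified sketch of the follower sync decision: compare the
// follower's last logged zxid with the leader's newest committed zxid.
public class SyncDecision {
    public static String decide(long followerLastZxid, long leaderCommittedZxid) {
        if (followerLastZxid > leaderCommittedZxid) return "TRUNC"; // roll back extra entries
        if (followerLastZxid < leaderCommittedZxid) return "DIFF";  // fetch missing entries
        return "NONE";                                              // already in sync
    }
}
```

A follower that logged an uncommitted proposal from the old leader ends up ahead of the committed point and truncates; a follower that missed updates pulls the difference.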


The Observer-based ZooKeeper deployment architecture

In the early days, a running ZooKeeper cluster contained only Leader and Follower servers.

But as ZooKeeper became widely used in distributed environments, design flaws of this early model surfaced. The main problems:

As the size of the cluster grows, its write-processing performance actually decreases.

A ZooKeeper cluster cannot be deployed across regions.

The main problem is that as a ZooKeeper cluster gets larger and the number of Follower servers grows, the performance of transactional request operations such as creating data nodes gradually declines. This is because when the cluster processes a transactional request, it must vote on the request within the cluster, and the write is executed only after more than half of the Followers have voted for it.

Because of this, as the number of Followers grows, voting on each write becomes more and more involved, and network communication between Followers becomes more and more time-consuming, so transaction-processing performance keeps getting worse as Followers are added.

To solve this problem, ZooKeeper introduced a new server role into the cluster: the Observer. Observers can handle the cluster's non-transactional requests and do not take part in leader election or other voting operations. This keeps cluster performance scalable while avoiding the impact that too many voting servers would have on the cluster's ability to handle transactional session requests.

An Observer receives neither the voting requests submitted by the Leader nor Proposal messages; it receives only INFORM packets from the network.

An INFORM message contains only voting information for operations that have already been committed. Because an Observer receives only proposals that have already been committed, it never sees uncommitted session requests. Observers are therefore isolated from voting and handle only queries and other non-transactional operations, which ensures that adding more Observer servers has no impact on the cluster's write performance.
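The message routing just described (Proposal and Commit to Followers, INFORM only to Observers) can be sketched with a toy model; the class and message names here are illustrative, not ZooKeeper's actual wire types:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of ZAB message routing: followers take part in the vote,
// observers only hear about proposals after they are committed.
public class Broadcast {
    // Propose phase: only followers receive the Proposal.
    public static List<String> onPropose(List<String> followers, List<String> observers) {
        List<String> sent = new ArrayList<>();
        for (String f : followers) sent.add("PROPOSAL->" + f); // observers are skipped
        return sent;
    }

    // Commit phase: followers get COMMIT, observers get INFORM (committed data only).
    public static List<String> onCommit(List<String> followers, List<String> observers) {
        List<String> sent = new ArrayList<>();
        for (String f : followers) sent.add("COMMIT->" + f);
        for (String o : observers) sent.add("INFORM->" + o);
        return sent;
    }
}
```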


Distributed lock

Below is the purchase logic of a distributed shop, which uses Curator's InterProcessMutex. Let's study it first.

public String buy(String commodityId, Integer number) throws Exception {
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.newClient("", retryPolicy);
    // Start the client
    client.start();
    InterProcessMutex mutex = new InterProcessMutex(client, "/locks");
    String ret = "";
    try {
        // Try to acquire the lock, waiting at most 3 seconds
        if (mutex.acquire(3, TimeUnit.SECONDS)) {
            try {
                ret = purchaseCommodityInfo(commodityId, number);
            } finally {
                // Release the lock only if it was acquired
                mutex.release();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        client.close();
    }
    return ret;
}

Now study the principle of a zk distributed lock:
Each node trying to acquire the lock creates an ephemeral sequential node: zkClient.createEphemeralSequential(ROOT_NODE + "/", "lock");
When a sequential node is created, a sequence number is automatically appended to its name. Thanks to this, all nodes can be created under the same name, because the sequence number is appended at actual creation time.
Ephemeral nodes are used to avoid the deadlock that would occur if a client crashed after acquiring the lock and never released it.
Sequential nodes also improve performance: each node only watches whether its predecessor has been released, instead of all nodes watching the same node.

When a node calls lock(), it first tries to acquire the lock; on failure it blocks and watches the previous node. When the previous node is deleted, the current node is woken up.

public class ZkLock implements Lock {

    // Counter used to block the thread when acquiring the lock fails
    private static CountDownLatch cdl = new CountDownLatch(1);
    // IP:port of the ZooKeeper server
    private static final String IP_PORT = "";
    // Root path of the lock
    private static final String ROOT_NODE = "/Lock";
    // Path of the previous (watched) node
    private volatile String beforePath;
    // Path of the current node requesting the lock
    private volatile String currPath;

    private ZkClient zkClient = new ZkClient(IP_PORT);

    public ZkLock() {
        // Create the root node if it does not exist yet
        if (!zkClient.exists(ROOT_NODE)) {
            zkClient.createPersistent(ROOT_NODE);
        }
    }

    public void lock() {
        if (tryLock()) {
            System.out.println("Lock acquired!!");
        } else {
            // Acquiring the lock failed: block and watch the previous node
            waitForLock();
            // Try to acquire the lock again
            lock();
        }
    }

    public synchronized boolean tryLock() {
        // On first entry, create this client's ephemeral sequential node.
        // The sequence number is appended automatically, so every client
        // can create under the same node name.
        if (StringUtils.isBlank(currPath)) {
            currPath = zkClient.createEphemeralSequential(ROOT_NODE + "/", "lock");
        }
        // Sort the child nodes
        List<String> children = zkClient.getChildren(ROOT_NODE);
        Collections.sort(children);
        // If the current node is the smallest, the lock is acquired
        if (currPath.equals(ROOT_NODE + "/" + children.get(0))) {
            return true;
        } else {
            // Not the smallest node: find the node just before ours and watch it;
            // release works the same way, node by node
            int beforePathIndex = Collections.binarySearch(children,
                    currPath.substring(ROOT_NODE.length() + 1)) - 1;
            beforePath = ROOT_NODE + "/" + children.get(beforePathIndex);
            // Report that the lock was not acquired
            return false;
        }
    }

    public void unlock() {
        // Delete our node and shut down the client
        zkClient.delete(currPath);
        zkClient.close();
    }

    private void waitForLock() {
        // Fresh latch for each wait
        cdl = new CountDownLatch(1);
        IZkDataListener listener = new IZkDataListener() {
            // Node-update event: not needed here
            public void handleDataChange(String s, Object o) throws Exception {
            }
            // Node-deleted event: the previous holder released the lock
            public void handleDataDeleted(String s) throws Exception {
                // Unblock the waiting thread
                cdl.countDown();
            }
        };
        // Watch the previous node
        this.zkClient.subscribeDataChanges(beforePath, listener);
        // Check whether the previous node still exists
        if (zkClient.exists(beforePath)) {
            try {
                System.out.println("Lock acquisition failed, waiting");
                // Block until the previous node is deleted
                cdl.await();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        // Remove the listener
        zkClient.unsubscribeDataChanges(beforePath, listener);
    }

    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        return false;
    }

    public void lockInterruptibly() throws InterruptedException {
    }

    public Condition newCondition() {
        return null;
    }
}


zk's position in CAP theory, compared with Eureka

CAP: consistency, availability, partition tolerance.

Consistency: among the multiple nodes in a cluster, when one node changes, the other nodes must be synchronized so that they agree. It comes in strong, weak, and eventual flavors; zk offers eventual consistency.
Availability: reading data from the cluster never returns an error, even when the node queried has just gone down.
Partition tolerance: when the network between nodes breaks for unexpected reasons, each partition keeps working.

Partition tolerance is mandatory, so when a network failure occurs, one of consistency and availability must be chosen.
zookeeper chooses consistency (CP) --> leader election and data recovery keep the data consistent, but the cluster is unavailable during that process.
eureka chooses availability (AP) --> https://www.cnblogs.com/jichi/p/12797557.html

To review again

To learn

Consistency protocol algorithms: 2PC, 3PC, Paxos, Raft, ZAB, NWR, analyzed in detail

  1. Data publish/subscribe
  2. Load balancing
  3. Distributed coordination/notification
  4. Cluster management
  5. Master management
  6. Distributed queues


Copyright notice: this article was written by [Xiaojian Jianjian]; please include the original link when reposting. Thanks. https://javamana.com/2022/134/202205141340450275.html