Zookeeper (4) -- ZK cluster deployment and election

White Dew is not frost 2020-11-11 22:18:17
zookeeper zk cluster deployment election

I. Cluster deployment

1. Prepare three machines and install ZK on each. An odd number of machines is strongly recommended, because ZooKeeper decides whether the whole service is available by checking that a majority of nodes is alive. With 3 nodes, losing 2 means the whole cluster is down. With an even number such as 4, losing 2 also leaves no majority, so the cluster goes down as well. The fourth machine adds no fault tolerance, so it is a wasted resource.
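
The majority rule above is simple arithmetic. The following is a minimal sketch of it; the function names are made up for illustration and are not part of ZooKeeper.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of live voting nodes that still forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting nodes can fail while a majority stays alive."""
    return voters - quorum_size(voters)


for n in (3, 4, 5, 6):
    print(f"{n} nodes: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that 3 and 4 nodes both tolerate a single failure, and 5 and 6 nodes both tolerate two, which is why the even-sized cluster buys nothing extra.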

2. Modify the configuration file

Fixed syntax: server.<node ID>=<IP>:<data synchronization port>:<election port>

Node ID: the server ID, a manually chosen number between 1 and 255, which must also be written to the {dataDir}/myid file on the corresponding node.

IP address: the IP address of that node.

Data synchronization port: the port used to replicate data between the master node and the other nodes.

Election port: the port the nodes use to communicate when the master node goes down and a new one must be elected.
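
To make the field layout concrete, here is a small sketch that parses entries of this form. The parser and its names are hypothetical (ZooKeeper reads the file itself), the addresses come from a documentation-only IP range, and 2888 and 3888 are simply the ports commonly used in examples. It handles only the classic format described above.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    node_id: int
    host: str
    sync_port: int
    election_port: int
    role: str  # "participant" unless the entry ends with ":observer"


def parse_server_line(line: str) -> ServerEntry:
    """Parse 'server.<id>=<host>:<sync port>:<election port>[:observer]'."""
    key, _, value = line.strip().partition("=")
    prefix, _, node_id = key.partition(".")
    if prefix != "server" or not node_id.isdigit():
        raise ValueError(f"not a server entry: {line!r}")
    parts = value.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected number of fields: {line!r}")
    host, sync_port, election_port = parts[:3]
    role = parts[3] if len(parts) == 4 else "participant"
    return ServerEntry(int(node_id), host, int(sync_port),
                       int(election_port), role)


# Hypothetical entries; the real addresses are not given in the text.
EXAMPLE = """\
server.1=192.0.2.67:2888:3888
server.2=192.0.2.68:2888:3888
server.3=192.0.2.69:2888:3888
"""

for entry in map(parse_server_line, EXAMPLE.splitlines()):
    print(entry)
```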

Such as:

[Image: example server.N entries in the configuration file]

All three machines have the same content.

3. In the dataDir directory specified in the configuration file, create a myid file and write the corresponding node ID from the configuration into it

My dataDir directory is /tmp/zookeeper.

On each machine, write the node ID that corresponds to that machine's IP.
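
As a sketch of this step, the helper below creates the myid file with the node ID as its only content. The function name is hypothetical, and a temporary directory stands in for the real dataDir so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def write_myid(data_dir: Path, node_id: int) -> Path:
    """Create <data_dir>/myid containing only the node ID."""
    data_dir.mkdir(parents=True, exist_ok=True)
    myid = data_dir / "myid"
    myid.write_text(f"{node_id}\n")
    return myid


# A temporary directory stands in for the real dataDir
# (/tmp/zookeeper in this post).
with tempfile.TemporaryDirectory() as tmp:
    path = write_myid(Path(tmp) / "zookeeper", 1)
    print(path.name, "->", path.read_text().strip())
```

On a real node the same effect is usually achieved by hand, by writing the number into the myid file under dataDir before the first start.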


4. Start the service on each machine with the configuration file

./zkServer.sh start ../conf/zoo.cfg


5. Check the status of each node

./zkServer.sh status





We can see that two of them are followers and one is the leader.


6. Connect to the cluster

A client can connect to the cluster through any single node, or list every node, separated by commas.

Adding the -server parameter to zkCli.sh specifies the IP and port to connect to.

./zkCli.sh -server

Connect to node 67 and write some data, then connect to node 68: the data written is visible there too, which shows that the data has been synchronized.

If we stop one machine, the cluster is still available. If the stopped machine is the leader, the cluster elects a new leader, and the entire cluster is unavailable while the election runs. If two machines are shut down, the cluster becomes unavailable.


II. Cluster roles

Earlier, the ./zkServer.sh status command showed the leader and follower roles in the cluster. There is one more role: observer.

Master node, also known as the leader. It handles writes and is chosen by election. If it goes down, a new master node is elected.

Child node, also known as a follower. It serves reads. It is also a candidate for master node and has the right to vote.

Secondary child node, also known as an observer. It serves reads. Unlike a follower, it has no right to vote and cannot be elected master node. It is also left out when the availability of the cluster is calculated.

Observer configuration:

Add the observer suffix to that node's entry in the cluster configuration, giving server.<node ID>=<IP>:<data synchronization port>:<election port>:observer. The observer's own configuration file also needs the line peerType=observer.
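
The effect of the suffix on availability can be sketched as follows. The helper is hypothetical and the addresses are placeholders from a documentation-only range. It only illustrates that entries marked as observers are left out of the majority count.

```python
def voting_members(server_lines):
    """Return the IDs of entries that do not carry the ':observer' suffix."""
    voters = []
    for line in server_lines:
        key, _, value = line.partition("=")
        if not value.strip().endswith(":observer"):
            voters.append(int(key.split(".")[1]))
    return voters


# Hypothetical ensemble: three voting nodes plus one observer.
config = [
    "server.1=192.0.2.67:2888:3888",
    "server.2=192.0.2.68:2888:3888",
    "server.3=192.0.2.69:2888:3888",
    "server.4=192.0.2.70:2888:3888:observer",
]

voters = voting_members(config)
quorum = len(voters) // 2 + 1
print(f"voters={voters}, quorum={quorum}")
```

With four machines but only three voters, the majority is still 2, so adding the observer increases read capacity without changing how many voters must stay alive.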



III. Cluster election

Earlier, ./zkServer.sh status showed that machine 68 is the leader, and 67 and 69 are followers.

Why is 68 the leader? In the election at cluster startup, every node votes for itself in the first round. In each later round, a node switches its vote to the adjacent node with a larger myid. As soon as one node holds more than half of the votes, the election is over. Assuming the three machines have myid 1, 2 and 3 and start in that order, the second node already holds two of the three votes once the first node switches to it, so it becomes leader before the third node takes part.

With four nodes there is a third round of voting, in which the first node also votes for the third node, so with 4 nodes the leader is the third node.
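
The startup case can be reproduced with a small simulation. This is a simplified model, not ZooKeeper's actual election code: it assumes all nodes hold identical data, so only myid decides, and that nodes start one at a time. The function name is made up for illustration.

```python
def elect_on_startup(total_nodes, start_order):
    """Return the myid that wins when nodes with identical data start one by one.

    Simplified model: every running node ends up voting for the largest myid
    it can see, and the election ends as soon as one candidate holds more
    than half of the votes of the whole ensemble.
    """
    running = []
    for myid in start_order:
        running.append(myid)      # the new node first votes for itself
        candidate = max(running)  # then everyone switches to the largest myid
        votes = len(running)      # all running nodes now agree on it
        if votes > total_nodes // 2:
            return candidate
    return None  # too few nodes started to form a majority


print(elect_on_startup(3, [1, 2, 3]))
print(elect_on_startup(4, [1, 2, 3, 4]))
```

The first call returns 2 and the second returns 3, matching the three-node and four-node cases described above. In a real cluster the comparison also takes the data version into account before myid.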

An election is triggered in two situations: when nodes start up, and when more than half of the nodes cannot establish a connection with the leader.

When a node first starts, it looks for a leader in the cluster. If it finds one, it connects to the leader and its own state becomes follower or observer. If no leader is found, the node's state changes to LOOKING and it enters the election process.

While the cluster is running, follower or observer failures do not affect the service as long as fewer than half of the nodes are down. If the leader goes down, however, external service is suspended, all followers enter the LOOKING state, and the election process begins.


IV. Data synchronization

ZooKeeper's data synchronization keeps the data on every node consistent. It covers two cases: normal client writes, and synchronization after a node recovers from an outage. In the earlier steps we also saw that data written on one machine shows up on the other machines.

When data is written, the request may be sent to a follower, in which case it is forwarded to the leader:

1. The client sends a write request to a server in the ZK cluster. If that server is not the leader, it forwards the write request to the leader, and the leader distributes the transaction to the followers in the form of a proposal;

2. When a follower receives proposals from the leader, it processes them in the order in which they were received;

3. When the leader has received acks for a proposal from more than half of the nodes, it commits the transaction and sends out a commit for that proposal;

4. When a follower receives the commit, it records the transaction as committed and applies the update to its in-memory database. Once the write succeeds, the result is returned to the client.
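
The commit decision in step 3 can be sketched as below. This is an illustration only, with a hypothetical function name, and it assumes the leader counts its own acknowledgement toward the majority.

```python
def replicate_write(total_nodes, follower_acks):
    """Decide whether a proposal commits.

    follower_acks: one boolean per follower, True if that follower acked.
    Assumption: the leader counts its own ack, and the proposal commits
    once more than half of the ensemble has acknowledged it.
    """
    acks = 1 + sum(follower_acks)  # leader's own ack plus follower acks
    return acks > total_nodes // 2


# Three-node ensemble: one leader and two followers.
print(replicate_write(3, [True, True]))    # both followers ack
print(replicate_write(3, [True, False]))   # one follower is down
print(replicate_write(3, [False, False]))  # the leader is alone
```

The first two calls commit and the third does not, which matches the earlier observation that a three-node cluster keeps working with one machine stopped but not with two.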

If a follower goes down, the cluster still works normally, because fewer than half of the nodes are down. New client requests handled by the leader in the meantime cannot be synchronized to the failed node, so its data falls behind. To solve this, the first thing a node does when it starts is find the current leader and check whether its data matches the leader's. If not, it synchronizes first, and only serves clients after synchronization completes.

How does ZK determine the data version? It introduces the zxid and compares on it. The node that can win the leader election is also the one with the latest zxid (the node with the newest zxid has the most complete data).


A zxid is a 64-bit number. The low 32 bits are an incrementing counter: every data change adds 1 to the low 32 bits. The high 32 bits are the leader epoch number: whenever a new leader is elected, it takes the latest zxid from its local log, extracts the epoch from the high 32 bits, adds 1, and sets the low 32 bits to 0. This guarantees that zxids stay unique and increasing after every election.

In short, each election adds 1 to the high 32 bits of the zxid, and each data change adds 1 to the low 32 bits, so the node with the largest zxid always has the most complete data.
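
The bit layout described above can be written out directly. The helper names below are made up for illustration. Only the 32/32 split and the two increment rules come from the text.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack the epoch into the high 32 bits and the counter into the low 32."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def split_zxid(zxid):
    """Return (epoch, counter) for a 64-bit zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def next_write(zxid):
    """zxid of the next transaction within the same epoch."""
    return zxid + 1


def new_epoch(last_zxid):
    """zxid a newly elected leader starts from: epoch + 1, counter reset to 0."""
    epoch, _ = split_zxid(last_zxid)
    return make_zxid(epoch + 1, 0)


z = make_zxid(1, 0)
z = next_write(next_write(z))  # two writes in epoch 1
print(hex(z), split_zxid(z))
z = new_epoch(z)               # a new leader is elected
print(hex(z), split_zxid(z))
```

Because the epoch sits in the high bits, any zxid from a later epoch compares greater than every zxid from an earlier one, so a plain integer comparison is enough to find the most up-to-date node.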


V. Cluster operations commands

ZK provides a set of operations commands that can be sent to it with telnet or nc. Each command is 4 letters long, so they are also called four-letter commands.

In recent versions most of these commands are disabled by default. Enable them with 4lw.commands.whitelist in the configuration file.

Enable some: 4lw.commands.whitelist=stat, conf, isro, envi

Enable all: 4lw.commands.whitelist=*


You may need to install the Netcat tool:

yum install -y nc

Check the server and client connection status:

echo stat | nc localhost 2181


1. conf (new in 3.3.0): Prints details about the service configuration.

2. crst (new in 3.3.0): Resets connection/session statistics for all connections.

3. dump: Lists the outstanding sessions and ephemeral nodes. This only works on the leader.

4. envi: Prints details about the service environment.

6. ruok: Tests whether the server is running in a non-error state. If it is, the server responds with imok; otherwise it does not respond at all. A response of "imok" does not necessarily mean that the server has joined the quorum, only that the server process is active and bound to the specified client port. Use "stat" for details on the server's state with respect to the quorum and on client connections.

7. srst: Resets server statistics.

8. srvr (new in 3.3.0): Lists full details for the server.

9. stat: Lists brief details for the server and connected clients.

10. wchs (new in 3.3.0): Lists brief information on watches for the server.

11. wchc (new in 3.3.0): Lists detailed information on watches for the server, by session. This outputs a list of sessions (connections) with their associated watches (paths). Note that, depending on the number of watches, this operation can be expensive (that is, it affects server performance), so use it with care.

12. dirs (new in 3.5.1): Shows the total size of snapshot and log files in bytes.

13. wchp (new in 3.3.0): Lists detailed information on watches for the server, by path. This outputs a list of paths (znodes) with their associated sessions. Note that, depending on the number of watches, this operation can be expensive (that is, it affects server performance), so use it with care.

14. mntr (new in 3.4.0): Outputs a list of variables that can be used to monitor the health of the cluster.
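
The same commands can be sent from a script instead of nc. The sketch below is a plain TCP client with hypothetical helper names. It assumes the server closes the connection after replying and that mntr prints one tab-separated key and value per line. The sample text is made up for illustration and is not real server output.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter command to a ZooKeeper server and return its reply."""
    if len(command) != 4:
        raise ValueError("command must be exactly four letters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # assumed: the server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict of strings."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            result[key] = value
    return result


# Made-up sample in the assumed mntr layout, not captured from a server.
sample = "zk_server_state\tleader\nzk_followers\t2\n"
print(parse_mntr(sample))
```

Against a running node with the command whitelisted, a call such as four_letter_word("localhost", 2181, "ruok") would be the scripted equivalent of echo ruok | nc localhost 2181.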


This article was written by [White Dew is not frost]. Please include a link to the original when reposting. Thanks.
