Kafka's high watermark explained in detail

solve 2021-01-21 06:45:26

What is the high watermark?

Kafka's high watermark (HW) describes the offset up to which the data in a partition is visible to consumers. To follow the rest of this article you need two terms:

1. HW (high watermark): think of it as the partition-wide offset, that is, the smallest offset across all replicas. It is a property of the partition.
2. LEO (log end offset): marks the position of the latest message in a given replica. It is a property of the replica: every replica has its own LEO, and the values can differ. The smallest LEO among all replicas (more precisely, among the in-sync replicas) is the HW.
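
As a minimal sketch of the two definitions above, the HW can be computed as the smallest LEO. The function name `high_watermark` and the replica ids are made up for illustration; this is not Kafka's API.

```python
def high_watermark(leos):
    """Return the partition HW: the smallest LEO among the given replicas."""
    if not leos:
        raise ValueError("a partition needs at least one replica")
    return min(leos.values())


# Hypothetical replica ids mapped to their current LEO.
leos = {"replica-1": 10, "replica-2": 8, "replica-3": 7}
print(high_watermark(leos))  # 7
```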

(Figure: the high watermark)

Why have a high watermark?

To ensure data consistency.

How the high watermark works in Kafka

When the leader receives a message and writes it successfully, its LEO immediately increases by 1. The followers then synchronize from the leader, and for every message a follower copies, its own LEO increases by 1 as well. Since the HW is the smallest LEO among all replicas, it rises gradually as the followers catch up.
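
The paragraph above can be sketched as a toy simulation. The `Partition` class and its `append` and `sync` methods are hypothetical names used only for this illustration, and each replica is reduced to a single LEO counter; real Kafka followers fetch batches over the network.

```python
class Partition:
    """Toy model: one leader and its followers, each tracked only by its LEO."""

    def __init__(self, follower_count):
        self.leader_leo = 0
        self.follower_leos = [0] * follower_count

    @property
    def hw(self):
        # The HW is the smallest LEO across the leader and all followers.
        return min([self.leader_leo] + self.follower_leos)

    def append(self):
        # The leader writes one message, so its LEO grows by 1 right away.
        self.leader_leo += 1

    def sync(self, follower, max_messages=1):
        # A follower copies up to max_messages that it is still missing.
        gap = self.leader_leo - self.follower_leos[follower]
        self.follower_leos[follower] += min(gap, max_messages)


p = Partition(follower_count=2)
for _ in range(3):
    p.append()
print(p.leader_leo, p.hw)  # 3 0

p.sync(0, max_messages=3)
print(p.hw)  # 0, because follower 1 has not copied anything yet

p.sync(1, max_messages=2)
print(p.hw)  # 2

p.sync(1)
print(p.hw)  # 3
```

The HW only moves once the slowest follower has caught up, which is why it trails the leader's LEO.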

• Relationship with acks: strictly speaking, the two are not directly tied together. acks is there to keep data from being lost, while the HW ensures that what clients consume is consistent. With acks=-1, however, a write only returns success once the message has been fully written to all replicas (all in-sync replicas), which means every replica's LEO has advanced by 1 and the HW must have advanced by 1 as well. In that case data loss is avoided. With any other setting there is a risk of data loss.
• Why can data be lost when acks != -1? Take a partition with four replicas, 1, 2, 3 and 4:
    1: LEO = 10 (leader)
    2: LEO = 8
    3: LEO = 7
    4: LEO = 6
  Here HW = 6. When leader 1 goes down and replica 3 is elected the new leader, the other surviving replicas first discard all of their data above the HW and then fetch the data after the HW from the new leader. Replica 2, for example, first discards its own messages 7 and 8, so its LEO becomes 6, and then fetches everything after 6 from the leader. Even when the old leader 1 comes back with its data above the HW still on disk, it must also discard everything above the HW it held when it went down and synchronize from the new leader again. The messages that only the old leader held are therefore gone. The conclusion: any data that the HW has not yet covered is at risk of being lost. This is also the point of the HW: ensuring data consistency.
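
The failover example can be replayed with a small sketch. It follows the article's loose counting (messages numbered 1 to LEO) and assumes that the new leader keeps its own log while the other replicas truncate to the HW before refetching; `make_log` and `fail_over` are hypothetical helpers, not Kafka functions.

```python
def make_log(leo):
    # Messages are simply numbered 1..leo, following the article's counting.
    return list(range(1, leo + 1))


def fail_over(logs, hw, old_leader, new_leader):
    """Return the surviving logs after old_leader dies and new_leader takes over."""
    survivors = {r: list(log) for r, log in logs.items() if r != old_leader}
    leader_log = survivors[new_leader]
    for replica in survivors:
        if replica != new_leader:
            # Drop everything above the HW, then refetch from the new leader.
            survivors[replica] = survivors[replica][:hw] + leader_log[hw:]
    return survivors


logs = {1: make_log(10), 2: make_log(8), 3: make_log(7), 4: make_log(6)}
hw = min(len(log) for log in logs.values())  # 6

after = fail_over(logs, hw, old_leader=1, new_leader=3)
lost = sorted(set(logs[1]) - set(after[3]))

print(hw)        # 6
print(after[2])  # [1, 2, 3, 4, 5, 6, 7]
print(lost)      # [8, 9, 10]
```

Under these assumptions, messages 8 to 10 were accepted by the old leader but never reached the new one, so a producer that was already told "success" (for example with acks=1) has lost them.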


https://javamana.com/2021/01/20210121064236954a.html