[big data bibiji 20210123] don't ask, ask is Kafka highly reliable

Big data is fun 2021-02-23 16:45:48
big data bibiji don ask

High reliability analysis

Kafka The guarantee of high reliability comes from its robust replica (replication) Strategy . By adjusting its copy parameters , You can make Kafka The ease of operation between performance and reliability .Kafka from 0.8.x The version begins to provide Partition Replication of levels ,replication Quantity can be configured in files (default.replication.refactor) In or create Topic When it's time to specify .

Here first from Kafka Start with the file storage mechanism , Learn from the bottom Kafka Storage details of , And then there's a microcosmic understanding of storage . After through Kafka Replication principle and synchronization to explain the concept of the macro level . Finally from the ISR,HW,leader Election and data reliability and persistence assurance and other dimensions to enrich the knowledge of Kafka Cognition of relevant knowledge points .

Kafka File storage mechanism

Kafka Chinese news is based on Topic To classify , Producers through Topic towards Kafka Broker Send a message , Consumers go through Topic Reading data . However Topic On the physical level, it can also use Partition Grouping , One Topic It can be divided into several Partition, that Topic as well as Partition How is it stored ?Partition You can also subdivide it into Segment, One partition In physics, there are many Segment form , So these Segment And what is it ? Let's find out one by one .

For the sake of explaining the problem , Suppose there is only one Kafka colony , And there is only one cluster Kafka Broker, There's only one physical machine . In this Kafka Broker Middle configuration log.dirs=/tmp/Kafka-logs, To set up Kafka Message file storage directory , At the same time, create one called topic_zzh_test Of Topic,Partition The quantity of is 4(kafka-topics.sh --create --zookeeper localhost:2181 --partitions 4 --topic topic_zzh_test --replication-factor 1). So we're in /tmp/Kafka-logs You can see the build in the directory 4 A catalog :

drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-0
drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-1
drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-2
drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-3

stay Kafka File storage , The same Topic There are many different Partition, Every Partiton For a directory ,Partition The name rule of is :topic name + Sequence number , The first number is from 0 Starting meter , The largest serial number is Partition Quantity reduction 1,Partition It's a concept in real physics , and Topic It's a logical concept .

above-mentioned Partition You can also subdivide it into Segment, This Segment What is it again? ? If you use Partition Is the minimum storage unit , We can imagine when Kafka Producer Keep sending messages , It is bound to cause Partition Unlimited expansion of documents , This has a serious impact on the maintenance of message files and the cleaning of messages that have been consumed , So here we are Segment As a unit, it will Partition To subdivide . Every Partition( Catalog ) It's equivalent to a huge file being evenly distributed to multiple equal sized Segment( paragraph ) In the data file ( Every Segment The number of messages in the file is not necessarily equal ) This feature is also convenient Old Segment The deletion of , That is, it is convenient to clean up the messages that have been consumed , Improve disk utilization . Every Partition Just need to support sequential read-write ,Segment The file life cycle is configured by the server (log.segment.bytes,log.roll.{ms,hours} And so on ) decision .

Segment The document consists of two parts , Respectively index Document and log file , Expressed as Segment Index files and data files . The command rule for these two files is :Partition First of all segment from 0 Start , Each of the following Segment The file name is last Segment Of the last message in the file offset value , The value is 64 position ,20 Digit character length , There's no number 0 fill , as follows :


On the surface of the above Segment File as an example , display Segment:00000000000000170410 Of index Document and log The corresponding relationship of the document , Here's the picture :

Pictured above ,index Index files store a lot of metadata ,log Data files store a lot of messages , The metadata in the index file points to the corresponding data file message The physical offset address of .

1. How to locate data location according to index file metadata ?
Such as :index Index file metadata [3,348], stay log The data file represents the second 3 A message , In the global partition In the said 170410+3=170413 A message , The message is in the corresponding log The physical offset address in the file is 348.
2. So how to get from partition Pass through offset lookup message Well ?
Such as : Read offset=170418 The news of , lookup segment file , among ,
α. 00000000000000000000.index For the first file ,
β. 00000000000000170410.index(start offset=170410+1=170411),
γ. 00000000000000239430.index(start offset=239430+1=239431),
therefore , location offset=170418 stay 00000000000000170410.index In index file . Other follow-up documents can be analogized in turn , Name and arrange these files by offset , Then according to the binary search method can quickly locate to the specific file location . secondly , according to 00000000000000170410.index In the document [8,1325] Locate the 00000000000000170410.log In the document 1325 The location to read .
3. So how do you know when to finish reading this message , Otherwise, I'll read the next message ?
Because messages have a fixed physical structure , Include :offset(8 Bytes)、 Size of message body (4 Bytes)、crc32(4 Bytes)、magic(1 Byte)、attributes(1 Byte)、key length(4 Bytes)、key(K Bytes)、payload(N Bytes) Wait for the fields , Can determine the size of a message , That is, where the read ends .

Replication principle and synchronization mode

Kafka in Topic Each Partition There is a pre written log file , although Partition It can be further subdivided into several Segment file , But for the upper application, you can use Partition Think of it as the smallest storage unit ( One has many Segment Document splicing “ Giant ” file ), Every Partition It's all made up of a series of ordered 、 Immutable message composition , These messages are continuously appended to Partition in .

There are two new nouns in the picture above :HW and LEO. Let's start with LEO,LogEndOffset Abbreviation , Represent each Partition Of log The last item Message The location of .HW yes HighWatermark Abbreviation , Refer to Consumer Can see this Partition The location of , This involves the concept of multiple copies , Let me first mention , In the next section, we will see more details .

Get down to business , To improve the reliability of messages ,Kafka Every Topic Of Partition Yes N Copies (replicas), among N(>=1) yes Topic The replication factor (replica fator) The number of .Kafka Automatic failover through multi replica mechanism , When Kafka One of the clusters Broker The service is still available in case of failure . stay Kafka When replication occurs in, make sure that Partition The logs of can be written to other nodes in an orderly manner ,N individual replicas in , One of them replica by Leader, Everything else is Follower, Leader Handle Partition All write requests for , meanwhile ,Follower Will passively and regularly copy Leader The data on the .

As shown in the figure below ,Kafka In the cluster has 4 individual Broker, some Topic Yes 3 individual Partition, And the replication factor is the number of copies 3:

Kafka Provide data replication algorithm guarantee , If Leader To fail or hang up , A new... Will be elected Leader, And accept the writing of client message .Kafka Make sure to select a replica from the list of synchronous replicas as Leader, Or say Follower Catch up with Leader data .Leader Responsible for maintenance and tracking ISR(In-Sync Replicas Abbreviation , The replica synchronization queue ) All in Follower A state of lag . When Producer Send a message to Broker after ,Leader Write the message and copy it to all Follower. After the message is submitted, it is successfully copied to all synchronous copies . Message replication latency is the slowest Follower Limit , It's important to quickly detect slow copies , If Follower“ backward ” Too much or failure ,leader Will take it from ISR Delete in .


In the last section we talked about ISR (In-Sync Replicas), This refers to the replica synchronization queue . The number of copies is right Kafka The throughput has a certain impact , But it greatly enhances usability . By default ,Kafka Of replica The number of 1, each Partition There's a unique one Leader, To ensure the reliability of the message , In general, its value will be ( from Broker Parameters of default.replication.factor Appoint ) The size is set to greater than 1, such as 3. All the copies (replicas) Collectively referred to as Assigned Replicas, namely AR.ISR yes AR A subset of , from Leader maintain ISR list ,Follower from Leader There are some delays in synchronizing data ( Including delay time replica.lag.time.max.ms And the number of delays replica.lag.max.messages Two dimensions , The latest version 0.10.x China only supports replica.lag.time.max.ms This dimension ), Any one that exceeds the threshold will put Follower Remove ISR, Deposit in OSR(Outof-Sync Replicas) list , Newly added Follower It will also be stored in OSR in .AR=ISR+OSR.

Kafka Removed after version replica.lag.max.messages Parameters , Just keep replica.lag.time.max.ms As ISR Parameters for replica management in . Why do you do this ?replica.lag.max.messages Indicates that a copy is currently behind Leader The number of messages for exceeds the value of this parameter , that Leader It will Follower from ISR Delete in . Suppose the settings replica.lag.max.messages=4, So if Producer Send once to Broker The number of messages is less than 4 Stripe time , Because in leader Accept to Producer After the message sent , and follower Before the copy starts pulling these messages ,follower backward leader No more than 4 Bar message , So there is no follower Removed from the ISR, So at this point replica.lag.max.message It seems reasonable to set . however Producer Initiate instantaneous peak traffic ,Producer More than 4 Stripe time , That's more than replica.lag.max.messages, here Follower Will be considered to be associated with Feader The copy is out of sync , And got kicked out ISR. But actually these Follower They are all alive and have no performance problems . Then catch up with Leader, And was rejoined ISR. And then there's a lot of them picking out ISR And then back to ISR, This undoubtedly increases the unnecessary loss of performance . And this parameter is Broker Overall . The settings are too big , The impact is really “ backward ”Follower The removal of ; It's too small , Lead to Follower In and out of . There is no way to give a suitable replica.lag.max.messages Value , therefore , The new version of the Kafka Removed this parameter .

Be careful :ISR It includes :Leader and Follower.

The previous section also deals with a concept , namely HW.HW Commonly known as high water level ,HighWatermark Abbreviation , Take a Partition Corresponding ISR The smallest of all LEO As HW,Consumer At most, we can only consume HW Where it is . And each of them replica There are HW,Leader and Follower They are responsible for updating their own HW The state of . about Leader Newly written messages ,Consumer No immediate consumption ,Leader Will wait for the message to be owned ISR Medium replicas Update after synchronization HW, Only then can the message be Consumer consumption . This ensures that if Leader Where Broker invalid , The news is still available from the newly elected Leader In order to get . For internal Broker Read requests for , No, HW The limitation of .

The following figure illustrates in detail when Producer Production news to Broker after ,ISR as well as HW and LEO The flow process of :

thus it can be seen ,Kafka The replication mechanism of is not a complete synchronous replication , It's not just asynchronous replication . in fact , Synchronous replication requires all working Follower It's all copied , This news will be commit, This mode of replication greatly affects throughput . And asynchronous replication ,Follower Asynchronously from Leader Copy the data , Data only needs to be Leader write in log Is considered to have been commit, In this case, if follower They haven't been copied yet , Behind the Leader when , All of a sudden Leader Downtime , You lose data . and Kafka This use of ISR The way to balance data loss and throughput is very good .

Kafka Of ISR Management will eventually feed back to Zookeeper Node . The specific location is :/brokers/topics/[topic]/partitions/[partition]/state. At present, there are two places where this Zookeeper For maintenance :

  1. Controller To maintain the :Kafka One of the clusters Broker Will be elected as Controller, Mainly responsible for partition Management and replica state management , It also performs a redistribution like partition Management tasks like that . Under certain conditions ,Controller Under the LeaderSelector Will elect a new Leader,ISR And the new leader_epoch And controller_epoch write in Zookeeper In the related nodes of . At the same time initiate LeaderAndIsrRequest Inform all replicas.
  2. Leader To maintain the :Leader There is a separate thread that detects ISR in Follower Whether to break away from ISR, If you find that ISR change , Then the new ISR The information returned to Zookeeper In the related nodes of .

This article is from WeChat official account. - Big data is fun (havefun_bigdata)

The source and reprint of the original text are detailed in the text , If there is any infringement , Please contact the yunjia_community@tencent.com Delete .

Original publication time : 2021-01-23

Participation of this paper Tencent cloud media sharing plan , You are welcome to join us , share .

本文为[Big data is fun]所创,转载请带上原文链接,感谢

  1. J2EE
  2. Vue uses SDK to upload seven cows
  3. k8s-dns
  4. JavaScript mailbox verification - regular verification
  5. k8s-dashboard
  6. How many questions can you answer?
  7. Spring annotation -- transactional
  8. [k8s cluster] construction steps
  9. k8s-kubeadm
  10. k8s-etcd
  11. Using HashMap to improve search performance in Java
  12. There is no class problem when Maven publishes jar package
  13. JavaScriptBOM操作
  14. J2EE
  15. k8s-prometheus-memory
  16. k8s-prometheus disk
  17. k8s-prometheus
  18. JavaScript BOM operation
  19. k8s-prometheus-memory
  20. k8s-prometheus disk
  21. k8s-prometheus
  22. Linux Disk Command
  23. Linux FS
  24. 使用docker-compose &WordPress建站
  25. Linux Command
  26. This time, thoroughly grasp the depth of JavaScript copy
  27. Linux Disk Command
  28. Linux FS
  29. Using docker compose & WordPress to build a website
  30. Linux Command
  31. 摊牌了,我 HTTP 功底贼好!
  32. shiro 报 Submitted credentials for token
  33. It's a showdown. I'm good at it!
  34. Shiro submitted credentials for token
  35. Linux Stress test
  36. Linux Root Disk Extension
  37. Linux Stress test
  38. Linux Root Disk Extension
  39. Redis高级客户端Lettuce详解
  40. springboot学习-综合运用(一)
  41. 忘记云服务器上MySQL数据库的root密码时如何重置密码?
  42. Detailed explanation of lettuce, an advanced client of redis
  43. Springboot learning integrated application (1)
  44. Linux File Recover
  45. Linux-Security
  46. How to reset the password when you forget the root password of MySQL database on the cloud server?
  47. Linux File Recover
  48. Linux-Security
  49. LiteOS:盘点那些重要的数据结构
  50. Linux Memory
  51. Liteos: inventory those important data structures
  52. Linux Memory
  53. 手把手教你使用IDEA2020创建SpringBoot项目
  54. Hand in hand to teach you how to create a springboot project with idea2020
  55. spring boot 整合swagger2生成API文档
  56. Spring boot integrates swagger2 to generate API documents
  57. linux操作系统重启后 解决nginx的pid消失问题
  58. Solve the problem of nginx PID disappearing after Linux operating system restart
  59. JAVA版本号含义
  60. The meaning of java version number