[big data bibiji 20210123] don't ask, ask is the most reliable Kafka

Wang Zhiwu 2021-01-23 18:46:38
big data bibiji don ask

High reliability analysis

Kafka The guarantee of high reliability comes from its robust replica (replication) Strategy . By adjusting its copy parameters , You can make Kafka The ease of operation between performance and reliability .Kafka from 0.8.x The version begins to provide Partition Replication of levels ,replication Quantity can be configured in files (default.replication.refactor) In or create Topic When it's time to specify .

Here first from Kafka Start with the file storage mechanism , Learn from the bottom Kafka Storage details of , And then there's a microcosmic understanding of storage . After through Kafka Replication principle and synchronization to explain the concept of the macro level . Finally from the ISR,HW,leader Election and data reliability and persistence assurance and other dimensions to enrich the knowledge of Kafka Cognition of relevant knowledge points .

Kafka File storage mechanism

Kafka Chinese news is based on Topic To classify , Producers through Topic towards Kafka Broker Send a message , Consumers go through Topic Reading data . However Topic On the physical level, it can also use Partition Grouping , One Topic It can be divided into several Partition, that Topic as well as Partition How is it stored ?Partition You can also subdivide it into Segment, One partition In physics, there are many Segment form , So these Segment And what is it ? Let's find out one by one .

For the sake of explaining the problem , Suppose there is only one Kafka colony , And there is only one cluster Kafka Broker, There's only one physical machine . In this Kafka Broker Middle configuration log.dirs=/tmp/Kafka-logs, To set up Kafka Message file storage directory , At the same time, create one called topic_zzh_test Of Topic,Partition The quantity of is 4(kafka-topics.sh --create --zookeeper localhost:2181 --partitions 4 --topic topic_zzh_test --replication-factor 1). So we're in /tmp/Kafka-logs You can see the build in the directory 4 A catalog :

drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-0
drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-1
drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-2
drwxr-xr-x 2 root root 4096 Apr 10 16:10 topic_vms_test-3

stay Kafka File storage , The same Topic There are many different Partition, Every Partiton For a directory ,Partition The name rule of is :topic name + Sequence number , The first number is from 0 Starting meter , The largest serial number is Partition Quantity reduction 1,Partition It's a concept in real physics , and Topic It's a logical concept .

above-mentioned Partition You can also subdivide it into Segment, This Segment What is it again? ? If you use Partition Is the minimum storage unit , We can imagine when Kafka Producer Keep sending messages , It is bound to cause Partition Unlimited expansion of documents , This has a serious impact on the maintenance of message files and the cleaning of messages that have been consumed , So here we are Segment As a unit, it will Partition To subdivide . Every Partition( Catalog ) It's equivalent to a huge file being evenly distributed to multiple equal sized Segment( paragraph ) In the data file ( Every Segment The number of messages in the file is not necessarily equal ) This feature is also convenient Old Segment The deletion of , That is, it is convenient to clean up the messages that have been consumed , Improve disk utilization . Every Partition Just need to support sequential read-write ,Segment The file life cycle is configured by the server (log.segment.bytes,log.roll.{ms,hours} And so on ) decision .

Segment The document consists of two parts , Respectively index Document and log file , Expressed as Segment Index files and data files . The command rule for these two files is :Partition First of all segment from 0 Start , Each of the following Segment The file name is last Segment Of the last message in the file offset value , The value is 64 position ,20 Digit character length , There's no number 0 fill , as follows :


On the surface of the above Segment File as an example , display Segment:00000000000000170410 Of index Document and log The corresponding relationship of the document , Here's the picture :


Pictured above ,index Index files store a lot of metadata ,log Data files store a lot of messages , The metadata in the index file points to the corresponding data file message The physical offset address of .

1. How to locate data location according to index file metadata ?
Such as :index Index file metadata [3,348], stay log The data file represents the second 3 A message , In the global partition In the said 170410+3=170413 A message , The message is in the corresponding log The physical offset address in the file is 348.
2. So how to get from partition Pass through offset lookup message Well ?
Such as : Read offset=170418 The news of , lookup segment file , among ,
α. 00000000000000000000.index For the first file ,
β. 00000000000000170410.index(start offset=170410+1=170411),
γ. 00000000000000239430.index(start offset=239430+1=239431),
therefore , location offset=170418 stay 00000000000000170410.index In index file . Other follow-up documents can be analogized in turn , Name and arrange these files by offset , Then according to the binary search method can quickly locate to the specific file location . secondly , according to 00000000000000170410.index In the document [8,1325] Locate the 00000000000000170410.log In the document 1325 The location to read .
3. So how do you know when to finish reading this message , Otherwise, I'll read the next message ?
Because messages have a fixed physical structure , Include :offset(8 Bytes)、 Size of message body (4 Bytes)、crc32(4 Bytes)、magic(1 Byte)、attributes(1 Byte)、key length(4 Bytes)、key(K Bytes)、payload(N Bytes) Wait for the fields , Can determine the size of a message , That is, where the read ends .

Replication principle and synchronization mode

Kafka in Topic Each Partition There is a pre written log file , although Partition It can be further subdivided into several Segment file , But for the upper application, you can use Partition Think of it as the smallest storage unit ( One has many Segment Document splicing “ Giant ” file ), Every Partition It's all made up of a series of ordered 、 Immutable message composition , These messages are continuously appended to Partition in .


There are two new nouns in the picture above :HW and LEO. Let's start with LEO,LogEndOffset Abbreviation , Represent each Partition Of log The last item Message The location of .HW yes HighWatermark Abbreviation , Refer to Consumer Can see this Partition The location of , This involves the concept of multiple copies , Let me first mention , In the next section, we will see more details .

Get down to business , To improve the reliability of messages ,Kafka Every Topic Of Partition Yes N Copies (replicas), among N(>=1) yes Topic The replication factor (replica fator) The number of .Kafka Automatic failover through multi replica mechanism , When Kafka One of the clusters Broker The service is still available in case of failure . stay Kafka When replication occurs in, make sure that Partition The logs of can be written to other nodes in an orderly manner ,N individual replicas in , One of them replica by Leader, Everything else is Follower, Leader Handle Partition All write requests for , meanwhile ,Follower Will passively and regularly copy Leader The data on the .

As shown in the figure below ,Kafka In the cluster has 4 individual Broker, some Topic Yes 3 individual Partition, And the replication factor is the number of copies 3:


Kafka Provide data replication algorithm guarantee , If Leader To fail or hang up , A new... Will be elected Leader, And accept the writing of client message .Kafka Make sure to select a replica from the list of synchronous replicas as Leader, Or say Follower Catch up with Leader data .Leader Responsible for maintenance and tracking ISR(In-Sync Replicas Abbreviation , The replica synchronization queue ) All in Follower A state of lag . When Producer Send a message to Broker after ,Leader Write the message and copy it to all Follower. After the message is submitted, it is successfully copied to all synchronous copies . Message replication latency is the slowest Follower Limit , It's important to quickly detect slow copies , If Follower“ backward ” Too much or failure ,leader Will take it from ISR Delete in .


In the last section we talked about ISR (In-Sync Replicas), This refers to the replica synchronization queue . The number of copies is right Kafka The throughput has a certain impact , But it greatly enhances usability . By default ,Kafka Of replica The number of 1, each Partition There's a unique one Leader, To ensure the reliability of the message , In general, its value will be ( from Broker Parameters of default.replication.factor Appoint ) The size is set to greater than 1, such as 3. All the copies (replicas) Collectively referred to as Assigned Replicas, namely AR.ISR yes AR A subset of , from Leader maintain ISR list ,Follower from Leader There are some delays in synchronizing data ( Including delay time replica.lag.time.max.ms And the number of delays replica.lag.max.messages Two dimensions , The latest version 0.10.x China only supports replica.lag.time.max.ms This dimension ), Any one that exceeds the threshold will put Follower Remove ISR, Deposit in OSR(Outof-Sync Replicas) list , Newly added Follower It will also be stored in OSR in .AR=ISR+OSR.

Kafka Removed after version replica.lag.max.messages Parameters , Just keep replica.lag.time.max.ms As ISR Parameters for replica management in . Why do you do this ?replica.lag.max.messages Indicates that a copy is currently behind Leader The number of messages for exceeds the value of this parameter , that Leader It will Follower from ISR Delete in . Suppose the settings replica.lag.max.messages=4, So if Producer Send once to Broker The number of messages is less than 4 Stripe time , Because in leader Accept to Producer After the message sent , and follower Before the copy starts pulling these messages ,follower backward leader No more than 4 Bar message , So there is no follower Removed from the ISR, So at this point replica.lag.max.message It seems reasonable to set . however Producer Initiate instantaneous peak traffic ,Producer More than 4 Stripe time , That's more than replica.lag.max.messages, here Follower Will be considered to be associated with Feader The copy is out of sync , And got kicked out ISR. But actually these Follower They are all alive and have no performance problems . Then catch up with Leader, And was rejoined ISR. And then there's a lot of them picking out ISR And then back to ISR, This undoubtedly increases the unnecessary loss of performance . And this parameter is Broker Overall . The settings are too big , The impact is really “ backward ”Follower The removal of ; It's too small , Lead to Follower In and out of . There is no way to give a suitable replica.lag.max.messages Value , therefore , The new version of the Kafka Removed this parameter .

Be careful :ISR It includes :Leader and Follower.

The previous section also deals with a concept , namely HW.HW Commonly known as high water level ,HighWatermark Abbreviation , Take a Partition Corresponding ISR The smallest of all LEO As HW,Consumer At most, we can only consume HW Where it is . And each of them replica There are HW,Leader and Follower They are responsible for updating their own HW The state of . about Leader Newly written messages ,Consumer No immediate consumption ,Leader Will wait for the message to be owned ISR Medium replicas Update after synchronization HW, Only then can the message be Consumer consumption . This ensures that if Leader Where Broker invalid , The news is still available from the newly elected Leader In order to get . For internal Broker Read requests for , No, HW The limitation of .

The following figure illustrates in detail when Producer Production news to Broker after ,ISR as well as HW and LEO The flow process of :


thus it can be seen ,Kafka The replication mechanism of is not a complete synchronous replication , It's not just asynchronous replication . in fact , Synchronous replication requires all working Follower It's all copied , This news will be commit, This mode of replication greatly affects throughput . And asynchronous replication ,Follower Asynchronously from Leader Copy the data , Data only needs to be Leader write in log Is considered to have been commit, In this case, if follower They haven't been copied yet , Behind the Leader when , All of a sudden Leader Downtime , You lose data . and Kafka This use of ISR The way to balance data loss and throughput is very good .

Kafka Of ISR Management will eventually feed back to Zookeeper Node . The specific location is :/brokers/topics/[topic]/partitions/[partition]/state. At present, there are two places where this Zookeeper For maintenance :

  1. Controller To maintain the :Kafka One of the clusters Broker Will be elected as Controller, Mainly responsible for partition Management and replica state management , It also performs a redistribution like partition Management tasks like that . Under certain conditions ,Controller Under the LeaderSelector Will elect a new Leader,ISR And the new leader_epoch And controller_epoch write in Zookeeper In the related nodes of . At the same time initiate LeaderAndIsrRequest Inform all replicas.
  2. Leader To maintain the :Leader There is a separate thread that detects ISR in Follower Whether to break away from ISR, If you find that ISR change , Then the new ISR The information returned to Zookeeper In the related nodes of .

Welcome to your attention ,《 The way of big data becoming God 》 Series articles

Welcome to your attention ,《 The way of big data becoming God 》 Series articles

Welcome to your attention ,《 The way of big data becoming God 》 Series articles

本文为[Wang Zhiwu]所创,转载请带上原文链接,感谢

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云