Kafka performance: why so fast?

Fan Li 2021-04-08 11:55:42
kafka performance fast

Explain performance as Kafka The beginning of the journey , Let's learn more about Kafka “ fast ” The inside secret of . You can not only learn Kafka Various means of performance optimization , We can also refine various performance optimization methodologies , These methodologies can also be applied to our own projects , Help us write high performance projects .

Guan Gong and Qin Qiong

“65: Redis and Kafka It's totally different middleware , Is there any comparison ? ”

Yes , So this article is not about 《 The selection of distributed cache 》, Neither 《 Comparison of distributed middleware 》. We focus on the performance optimization of these two projects in different fields , Take a look at the common means of performance optimization for excellent projects , As well as the optimization methods for the characteristics of different scenes .

A lot of people learn a lot , Learned a lot about the framework , But when it comes to practical problems , But often feel the lack of knowledge . This is the failure to systematize the knowledge learned , There is no abstraction from the concrete implementation that can work methodology .

It's important to learn about open source projects inductive , Summarize the methodology of excellent implementation of different projects , then deductive Go to self practice .

The opening remarks

“ Margo : rational 、 objective 、 Prudence is a characteristic of programmers , And advantages , But a lot of times we also need to bring a little emotional , With a little impulse , This time can help us make decisions faster .「 Pessimists are right 、 Optimists succeed .」 I hope everyone is an optimistic problem solver . ”

Kafka Performance panorama

From a highly abstract point of view , Performance problems cannot escape from the following three aspects :

  • The Internet
  • disk
  • Complexity

about Kafka For this kind of network distributed queue , Network and disk are the top priority of optimization . Aiming at the above abstract problems , The solution is highly abstract and simple :

  • Concurrent
  • Compress
  • Batch
  • cache
  • Algorithm

I know the problems and ideas , Let's see , stay Kafka in , What are the roles , And these roles are the points that can be optimized :

  • Producer
  • Broker
  • Consumer

Yes , All the questions , Ideas , The optimization points have been listed , We can refine as much as possible , All three directions can be refined , such , All the implementation is clear at a glance , Even if you don't look Kafka The implementation of the , We can also think of one or two things that can be optimized .

That's the way of thinking . Raise questions > List the problem points > List optimization methods > List specific entry points > tradeoff And refine the implementation .

Now? , You can also think of your own optimization methods , You don't have to be perfect , It doesn't matter whether it's realized or not , Think about it a little bit .

“65 Brother : No way. , I'm stupid , Lazy, too , You'd better tell me directly , I'm good at whoring for nothing . ”

Sequential writing

“65 Brother : The somebody else Redis It's a pure memory based system , you kafka And read and write disks , Energy ratio ? ”

Why is disk writing slow ?

We can't just know the conclusion , And I don't know why . Answer that question , We have to go back to the operating system course we learned at school .65 Do you still have the textbook ? Come on , Turn to the chapter on disk , Let's review how disks work .

“65 Brother : There are still ghosts , There's no book in the middle of the class . If it wasn't for the exam, my eyes would be good , I guess I haven't graduated yet . ”

Look at the classic big picture :

Complete a disk IO, Need to go through Seek the way rotate and The data transfer Three steps .

Affect disk IO The performance factor also occurs in the above three steps , So the main time spent is :

  1. Seek time :Tseek It refers to the time required to move the read / write head to the correct track . The shorter the search time ,I/O The faster the operation , At present, the average seek time of the disk is generally 3-15ms.
  2. Rotation delay :Trotation It refers to the time required for disk rotation to move the sector in which the requested data is located to the bottom of the read-write disk . The rotation delay depends on the disk speed , It usually takes time to rotate a disk for one revolution 1/2 Express . such as :7200rpm The average disk rotation delay of is about 60*1000/7200/2 = 4.17ms, And the speed is 15000rpm The average rotation delay of the disk is 2ms.
  3. Data transfer time :Ttransfer It refers to the time required to complete the transmission of the requested data , It depends on the data rate , Its value is equal to the data size divided by the data transfer rate . at present IDE/ATA Can achieve 133MB/s,SATA II Accessible 300MB/s The interface data transfer rate , Data transmission time is usually far less than the time consumed by the first two parts . Simple calculation can be ignored .

therefore , If you leave it out when you write to disk Seek the way rotate It can greatly improve the performance of disk reading and writing .

Kafka use Sequential writing File to improve disk write performance . Sequential writing file , Basically reduced the number of disks Seek the way and rotate The number of times . The head doesn't have to dance on the track anymore , It's going all the way .

Kafka Each partition is an ordered , Immutable message sequence , New news is constantly added to Partition At the end of , stay Kafka in Partition It's just a logical concept ,Kafka take Partition Divided into multiple Segment, Every Segment Corresponding to a physical file ,Kafka Yes segment File append write , That's how to write files in sequence .

“65 Brother : Why? Kafka You can use the method of appending ? ”

This sum Kafka Is related to the nature of , Let's see Kafka and Redis, To put it bluntly ,Kafka It's just one. Queue, and Redis It's just one. HashMap.Queue and Map What's the difference ?

Queue yes FIFO Of , Data is in order ;HashMap The data is out of order , It's random reading and writing .Kafka The immutability of , Order makes Kafka You can write files by appending them .

In fact, many data systems that conform to the above characteristics , You can use add write to optimize disk performance . The typical ones are Redis Of AOF file , Of various databases WAL(Write ahead log) Mechanism and so on .

“ So clearly understand the characteristics of their own business , Can be targeted to make optimization . ”

Zero copy

“65 Brother : ha-ha , I was asked about this in the interview . It's a pity that the answer is not so good , alas . ”

What is zero copy ?

We from Kafka From the perspective of the scene ,Kafka Consumer Consumption is stored in Broker Disk data , Read from Broker Disk to network transmission to Consumer, What system interactions are involved in the process .Kafka Consumer from Broker Consumption data ,Broker Read Log, I used sendfile. If traditional IO Model , The pseudo code logic is as follows :


Pictured , If the traditional IO technological process , Read the network first IO, Write to disk again IO, You actually need to put the data Copy The four time .

  1. for the first time : Read disk files to the operating system kernel buffer ;
  2. The second time : The data in the kernel buffer is ,copy To the application buffer;
  3. The third step : Put the application buffer Data in ,copy To socket Network send buffer ;
  4. The fourth time : take socket buffer The data of ,copy To the network card , Network transmission by network card .

“65 Brother : ah , Is the operating system so stupid ?copy Come on copy Yes . ”

It's not the operating system , The design of the operating system is that each application has its own user memory , User memory is isolated from kernel memory , This is for the sake of program and system security , Otherwise, the memory of every application will be all over the place , Read and write at will. That's enough .

however , also Zero copy technology , english ——Zero-Copy. Zero copy Is to minimize the number of copies of the above data , So as to reduce the number of copies CPU expenses , Reduce the number of context switches in user mode and kernel mode , So as to optimize the performance of data transmission .

There are three common zero copy ideas :

  • direct I/O: Data directly across the kernel , In the user address space and I/O Between devices , The kernel only performs necessary auxiliary work such as virtual storage configuration ;
  • Avoid copying data between kernel and user space : When the application does not need to access the data , You can avoid copying data from kernel space to user space ;
  • When writing copy : Data doesn't need to be copied in advance , It's a partial copy when it needs to be modified .

Kafka Using the mmap and sendfile The way to achieve Zero copy . They correspond to each other Java Of MappedByteBuffer and FileChannel.transferTo.

Use Java NIO Realization Zero copy , as follows :


Under this model , The number of context switches is reduced to one . To be specific ,transferTo() Method indicates that the block device passes through DMA The engine reads the data into the read buffer . then , Copy the buffer to another kernel buffer for temporary storage to the socket . Last , Socket buffer passed through DMA Copied to the NIC buffer .

We reduced the number of copies from four to three , And only one of these copies involves CPU. We also reduced the number of context switches from four to two . This is a big improvement , But there's no zero copy query yet . When running Linux kernel 2.4 And later, as well as the network interface card supporting the collection operation , The latter can be realized as a further optimization . As shown below .

According to the previous example , call transferTo() The method will allow the device to pass through DMA The engine reads the data into the kernel read buffer . however , Use gather In operation , There is no copy between the read buffer and the socket buffer . In its place , to NIC A pointer to the read buffer and the offset and length , The offset and length are determined by DMA eliminate .CPU Never participate in copying buffers .

About Zero copy details , You can read this article in detail (Zero-copy) Analysis and application .


producer Production news to Broker when ,Broker Will use pwrite() system call 【 Corresponding to Java NIO Of FileChannel.write() API】 Write data by offset , At this point, the data will be written first page cache.consumer When consuming news ,Broker Use sendfile() system call 【 Corresponding FileChannel.transferTo() API】, Zero copy data from page cache Transferred to the broker Of Socket buffer, And then through the network .

leader And follower Synchronization between , With the above consumer The process of consuming data is the same .

page cache The data in the kernel will change with the data in the kernel flusher The scheduling of threads and the management of sync()/fsync() Call to write back to disk , Even if the process crashes , And don't worry about data loss . in addition , If consumer The news of consumption is not there page cache in , Will go to disk to read , By the way, it will read out some adjacent blocks and put them in page cache, To facilitate the next reading .

So if Kafka producer The rate of production is in line with consumer There is no big difference in the rate of consumption , Then you can almost rely on the right broker page cache Read and write to complete the whole production - Consumption process , Very little disk access .

A network model

“65 Brother : The Internet , As Java The programmer , Naturally Netty ”

Yes ,Netty yes JVM An excellent network framework in this field , Provides high performance network services . majority Java Programmers talk about network frameworks , The first thing that comes to mind Netty.Dubbo、Avro-RPC And so on, excellent frameworks use Netty As the underlying network communication framework .

Kafka I realized the network model to do RPC. Bottom based Java NIO, Adopt and Netty Same Reactor Threading model .

Reacotr The model is mainly divided into three roles

  • Reactor: hold IO Events are assigned to the corresponding handler Handle
  • Acceptor: Handle client connection Events
  • Handler: Dealing with non blocking tasks

In traditional blocking IO In the model , Each connection needs to be handled by a separate thread , When the concurrency is large , There are many threads created , Occupancy resources ; Use blocking IO Model , After the connection is established , If the current thread has no data to read , The thread will block the read operation , Waste of resources

For traditional blocking IO Two problems of the model ,Reactor The model is based on pooling , Avoid creating threads for each connection , After the connection is completed, the business processing is handed over to the thread pool for processing ; be based on IO Reuse model , Multiple connections share the same blocking object , Don't wait for all the connections . Traversal to new data can be processed , The operating system notifies the program , The thread jumps out of the blocking state , Do business logic processing

Kafka That is, based on Reactor The model implements multiplexing and processing thread pool . Its design is as follows :

It contains a Acceptor Threads , For handling new connections ,Acceptor Yes N individual Processor Threads select and read socket request ,N individual Handler Threads process requests and respond accordingly , That is to deal with business logic .

I/O Multiplexing can be done by combining multiple I/O The blocks are multiplexed to the same select On the block , Thus, the system can process multiple client requests at the same time in the case of single thread . Its biggest advantage is the small system overhead , And there's no need to create new processes or threads , It reduces the resource cost of the system .

summary : Kafka Broker Of KafkaServer Design is an excellent network architecture , I want to know Java Network programming , Or students who need to use this technology might as well read the source code . follow-up 『 Margo 』 Of Kafka The article series will also cover the interpretation of this source code .

Batch and compression

Kafka Producer towards Broker Sending a message is not the sending of a message . Have used Kafka You should know ,Producer There are two important parameters :batch.size and linger.ms. These two parameters are the same as Producer It's about bulk sending .

Kafka Producer The execution process of is shown in the figure below :

The messages are sent through the following processors in turn :

  • Serialize: Keys and values are serialized according to the serializer passed . Excellent serialization can improve the efficiency of network transmission .
  • Partition: Decide which partition of the topic to write messages to , By default, follow murmur2 Algorithm . Custom partition programs can also be passed to producers , To control which partition messages should be written to .
  • Compress: By default , stay Kafka Compression is not enabled in the producer .Compression Not only can it be transferred faster from producer to agent , It can also be replicated faster in the process of transmission . Compression helps improve throughput , Reduce latency and improve disk utilization .
  • Accumulate:Accumulate seeing the name of a thing one thinks of its function , It's a message accumulator . Its interior is for each Partition Maintain a Deque deque , The queue holds the batch data to be sent ,Accumulate Accumulate data to a certain amount , Or within a certain expiration time , The data is sent out in batches . Records are accumulated in the buffer of each partition of the subject . Grouping records according to the producer batch size attribute . There is a separate accumulator for each topic in the partition / buffer .
  • Group Send: Batches of partitions in the record accumulator are grouped by the agent to which they are sent . The records in the batch are based on batch.size and linger.ms Property to the agent . The record is sent by the producer based on two conditions . When the defined batch size or delay time is reached .

Kafka Support multiple compression algorithms :lz4、snappy、gzip.Kafka 2.1.0 Formal support ZStandard —— ZStandard yes Facebook Open source compression algorithm , Designed to provide ultra-high compression ratio (compression ratio), For details, see zstd.

Producer、Broker and Consumer Using the same compression algorithm , stay producer towards Broker Write data ,Consumer towards Broker You can read data without even decompressing it , In the end in Consumer Poll Unzip it when the message arrives , This saves a lot of network and disk overhead .

Partition concurrency

Kafka Of Topic It can be divided into several Partition, Every Paritition It's like a queue , Keep the data in order . The same Group The next difference Consumer Concurrent consumption Paritition, Partitioning is actually tuning Kafka The smallest unit of parallelism , therefore , so to speak , Every time you add one Paritition It adds a consumption concurrency .

Kafka Excellent partition allocation algorithm ——StickyAssignor, It can ensure that the partition allocation is as balanced as possible , And the result of each redistribution should be consistent with that of the last one . such , The partition of the whole cluster should be balanced as much as possible , each Broker and Consumer It's not going to tilt too much .

“65 Brother : The more partitions, the better ? ”

Of course not. .

More partitions require more file handles to be opened

stay kafka Of broker in , Each partition is compared to a directory in the file system . stay kafka In the data log file directory of , Each log segment is assigned two files , An index file and a data file . therefore , With partition Increase of , The number of file handles required has increased dramatically , If necessary, you need to adjust the number of file handles allowed to be opened by the operating system .

client / The more memory the server needs to use

client producer A parameter batch.size, The default is 16KB. It caches messages for each partition , Once it's full, it's packed and sent out in batches . It looks like it's a performance enhancing design . But obviously , Because this parameter is partition level , If you have more partitions , This part of the cache also takes up more memory .

Reduce high availability

The more partitions , Every Broker The more partitions are allocated on the , When one happens Broker Downtime , So the recovery time will be very long .

File structure

Kafka The message is Topic Categorize units , each Topic They are independent of each other , They don't influence each other . Every Topic It can also be divided into one or more partitions . Each partition has a log file to record message data .

Kafka Each partition log is physically divided into several... By size Segment.

  • segment file form : from 2 Most of the components , Respectively index file and data file, this 2 One file to one , Pairs appear , suffix ”.index” and “.log” Expressed as segment Index file 、 Data files .
  • segment File naming rules :partion First of all segment from 0 Start , Each of the following segment The file name is last segment Of the last message in the file offset value . The maximum value is 64 position long size ,19 Digit character length , There's no number 0 fill .

index Using sparse index , So each of them index File size is limited ,Kafka use mmap The way , Direct will index Files mapped to memory , That's right index You don't need to operate the disk for the operation of IO.mmap Of Java Implement correspondence MappedByteBuffer .

“65 Brother notes :mmap It's a way to map files in memory . Mapping a file or other object to the address space of a process , Realize the one-to-one mapping between the file disk address and the virtual address space of a process . After implementing such a mapping relationship , The process can read and write this section of memory by pointer , The system will automatically write back the dirty page to the corresponding file disk , That is, the operation on the file is completed without calling read,write Wait for the system to call the function . contrary , Kernel space changes to this area also directly reflect user space , So we can share files among different processes . ”

Kafka Make full use of dichotomy to find the corresponding offset The location of the message :

  1. Follow the dichotomy to find less than offset Of segment Of .log and .index
  2. Using goals offset Subtract... From the file name offset Get the news in this segment Offset in .
  3. Again, use dichotomy in index Find the corresponding index in the file .
  4. To log In file , In order to find , Until I find offset The corresponding message .


Kafka It's an excellent open source project . Its optimization on performance is incisive , It's a project worthy of our in-depth study . Whether it's thought or realization , We should all take a serious look , Think about it .

Kafka performance optimization :

  1. Zero copy networks and disks
  2. Excellent network model , be based on Java NIO
  3. Efficient file data structure design
  4. Parition Parallel and scalable
  5. Data batch transmission
  6. data compression
  7. Read and write disks sequentially
  8. Lockless lightweight offset

This article is from WeChat official account. - High performance server development (easyserverdev)

The source and reprint of the original text are detailed in the text , If there is any infringement , Please contact the yunjia_community@tencent.com Delete .

Original publication time : 2021-04-06

Participation of this paper Tencent cloud media sharing plan , You are welcome to join us , share .

本文为[Fan Li]所创,转载请带上原文链接,感谢

  1. A love diary about http
  2. navicat连接win10 mysql8.0 报错2059
  3. [rocketmq source code analysis] in depth message storage (3)
  4. Implementation of service configuration center with spring cloud + Nacos (Hoxton version)
  5. SCIP: constructing data abstraction -- Explanation of queue and tree in data structure
  6. SCIP: abstraction of construction process -- object oriented explanation
  7. Using docker to build elasticsearch + kibana cluster
  8. What are the spring IOC features? I can't understand the source code!
  9. Spring cloud upgrade road - 2020.0. X - 3. Accesslog configuration of undertow
  10. 导致Oracle性能抖动的参数提醒
  11. 风险提醒之Oracle RAC高可用失效
  12. 小机上运行Oracle需要注意的进程调度bug
  13. Oracle内存过度消耗风险提醒
  14. Oracle SQL monitor
  15. 使用Bifrost实现Mysql的数据同步
  16. 揭秘Oracle数据库truncate原理
  17. 看了此文,Oracle SQL优化文章不必再看!
  18. Mybatis (3) map and fuzzy query expansion
  19. Kafka性能篇:为何这么“快”?
  20. 两个高频设计类面试题:如何设计HashMap和线程池
  21. [TTS] AIX - & gt; Linux -- Based on RMAN (real environment)
  22. 为什么学编程大部分人选Java编程语言?
  23. Redis 高可用篇:你管这叫 Sentinel 哨兵集群原理
  24. redis 为什么把简单的字符串设计成 SDS?
  25. [TTS] transfer table space AIX - & gt; Linux based on RMAN
  26. Linux 网卡数据收发过程分析
  27. Redis 高可用篇:你管这叫 Sentinel 哨兵集群原
  28. Redis 6.X Cluster 集群搭建
  29. [TTS] transfer table space AIX ASM - & gt; Linux ASM
  30. [TTS] transfer table space Linux ASM - & gt; AIX ASM
  31. 高性能通讯框架——Netty
  32. Brief introduction and test of orchestrator, a high availability management tool for MySQL
  33. [TTS] transfer table space Linux - & gt; AIX based on RMAN
  34. A love diary about http
  35. [rocketmq source code analysis] in depth message storage (3)
  36. Implementation of service configuration center with spring cloud + Nacos (Hoxton version)
  37. SiCp: abstraction of construction process -- object oriented explanation
  38. springboot网上点餐系统
  39. 【SPM】oracle如何固定执行计划
  40. 用好HugePage,告别Linux性能故障
  41. 3 W word long text, java basic interview questions! It's amazing!!!
  42. Spring cloud upgrade road - 2020.0. X - 3. Accesslog configuration of undertow
  43. Win10 uninstall mysql5.7
  44. CentOS下dotnet Core使用HttpWebRequest进行HTTP通讯,系统存在大量CLOSE_WAIT连接问题的分析,已解决。
  45. MySQL batch insert, how not to insert duplicate data?
  46. K8s cronjob application example
  47. Unconventional method, easy to deal with Oracle database critical exception
  48. How to use sqlplus - prelim in Oracle hang
  49. How to search Oracle official documents in full text
  50. Install mysql8.0 on win10
  51. Oracle OCR的备份与恢复
  52. Oracle kill session相关问题
  53. 《Oracle DBA工作笔记》第二章 常用工具和问题分析
  54. Oracle回收站及flashback drop
  55. Hand in hand to teach you to write a spring IOC container
  56. Exception in Java (1) - basic concept
  57. 3w 字长文爆肝 Java 基础面试题!太顶了!!!
  58. Error 2059 when Navicat connects to win10 mysql8.0
  59. Parameter reminder causing Oracle Performance jitter
  60. 「技术分享」Java线程状态间的互相转换看这个就行了