[RocketMQ Source Code Analysis] A Deep Dive into Message Storage (3)

AntzUhl 2021-04-08 11:19:45
rocketmq source code analysis depth

Previous articles

CommitLog: [RocketMQ Source Code Analysis] A Deep Dive into Message Storage (1)

ConsumeQueue: [RocketMQ Source Code Analysis] A Deep Dive into Message Storage (2)

The previous two articles covered how messages are stored in the CommitLog and how the ConsumeQueue is built. In this third article there is a hurdle we have to cross: MappedFile, the memory-mapped file.

MappedFile is the key reason RocketMQ can afford to store messages directly on disk. At the beginning of the CommitLog storage walkthrough in the first article, I wrote down a few ideas:

  1. Both memory and local disk are used
  2. Filling and swapping
  3. Files mapped to memory
  4. Random read interface to access

Several of these key phrases are inseparable from the subject of this article, MappedFile.

Since RocketMQ has to exchange data with files on disk, and different IO methods differ widely in performance, how efficiently a system moves data between disk and memory is an important measure of the strength of any storage-related middleware.

Implement an in-process queue based message persistence storage engine

This was the topic of the Tianchi middleware competition a few years ago. The goal was to design a message queue that uses limited memory and a larger amount of disk space. The general approach was already mentioned in the first article; the catch is that the topic requires the queue to support aggregation operations.

This reminded me of aggregation in ElasticSearch. Implementing aggregation that complex would be far too hard.

Fortunately, the topic only requires summing over the messages in a specified time range, which comes down to maintaining message storage offsets together with their timestamps.

To learn more about memory-mapped files, we can read the source code. Compared with CommitLog and ConsumeQueue, this part sits at a much lower level and involves more knowledge of IO, Buffer, PageCache, and so on.

From page table to zero copy

When I was learning assembly language, I came across two kinds of addressing-related registers: segment registers and index registers.

In the 8086 era, the address bus was 20 bits wide but the registers were only 16 bits, so a single register could not address the whole range. To reach the full 1 MB, two 16-bit registers were used together as a segment base and an offset: the segment value is shifted left by 4 bits and added to the offset, which yields a 20-bit physical address.
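The segment-plus-offset calculation described above can be sketched in a few lines of Java. The class and method names are made up for illustration, and the wrap-around at 1 MB models a plain 8086 with a 20-bit bus.

```java
/** Real-mode 8086 address translation: physical = segment * 16 + offset. */
final class RealModeAddress {

    /** Combines two 16-bit register values into a 20-bit physical address. */
    static int physical(int segment, int offset) {
        int seg = segment & 0xFFFF; // registers are only 16 bits wide
        int off = offset & 0xFFFF;
        // Shifting the segment left by 4 bits multiplies it by 16.
        // Masking with 0xFFFFF keeps the result on the 20-bit address bus.
        return ((seg << 4) + off) & 0xFFFFF;
    }

    public static void main(String[] args) {
        // 0x1234:0x0010 -> 0x12340 + 0x0010
        System.out.printf("%05X%n", physical(0x1234, 0x0010));
    }
}
```

Many different segment:offset pairs map to the same physical address, which is one reason this scheme provides addressing range but no isolation.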

The idea carries over to protected mode in an operating system. Suppose we have a 32-bit operating system with 4 GB of memory.

Think about its memory layout. Kernel space and user space are familiar concepts. If nothing is done with the memory space and processes simply access it sequentially, the first big problem is memory isolation: how do two processes avoid polluting each other's memory?

This is related to a memory allocation problem in the Java virtual machine. After the garbage collector cleans up allocated memory, the remaining free space may not be contiguous, so the next object that needs a large block may not fit. The JVM can choose a collect-and-compact approach that leaves no fragmentation. That works because the references on the stack that point into the heap can be updated, so even moving a large object is not a concern. The operating system is different. If it tried to reduce fragmentation the way the JVM's compacting collection does, the first problem it would face is address changes: subsequent processes might not find their targets when addressing.

Keep address changes in mind, because we will come back to them: careless handling of the operating system's PageCache can cause the same problem.

There is another problem: a flat, sequential address space is not safe. Every process can access every other process's addresses, which is exactly what some memory-editing tools rely on.

To address these problems, operating systems run in protected mode and turn the memory space into page-table-based virtual memory, separated from the actual physical memory.

On a 32-bit system like this one, the page table is usually a two-level page table. A two-level page table paginates the page table itself. All entries within one page table are stored contiguously; page tables are ultimately just data, and they too are stored in memory in units of pages.

The first level is called the page directory. The physical address of each page table is stored in the page directory as a page directory entry (PDE). The 4 MB of page tables can be divided into 1K (4 MB / 4 KB) pages, and describing each of them takes 4 bytes, so the page directory occupies 4 KB, exactly the size of one standard page, and it points to the second-level tables. The high 10 bits of the linear address form the first-level index, and the entry found at that index selects one of the 1K second-level tables, that is, a page table.

The second level is called the page table. It is stored in a 4 KB page and contains 1K entries, each holding the physical base address of one page. The middle 10 bits of the linear address form the second-level index, which yields the page table entry containing the page's physical address. The high 20 bits of that physical address, combined with the low 12 bits of the linear address, form the final physical address.
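The 10/10/12 split of a 32-bit linear address described above can be illustrated with a little bit arithmetic. This is only a sketch: the class and method names are hypothetical, and the frame base passed to `physical` stands in for the value a real page table entry would supply.

```java
/** Splits a 32-bit linear address the way a two-level (10/10/12) page table does. */
final class TwoLevelPaging {

    /** High 10 bits: index into the page directory. */
    static int directoryIndex(int linear) {
        return linear >>> 22;
    }

    /** Middle 10 bits: index into the selected page table. */
    static int tableIndex(int linear) {
        return (linear >>> 12) & 0x3FF;
    }

    /** Low 12 bits: byte offset inside the 4 KB page. */
    static int pageOffset(int linear) {
        return linear & 0xFFF;
    }

    /** High 20 bits of the frame base plus the low 12 bits of the linear address. */
    static int physical(int frameBase, int linear) {
        return (frameBase & 0xFFFFF000) | pageOffset(linear);
    }

    public static void main(String[] args) {
        int linear = 0x00403025;
        System.out.println(directoryIndex(linear)); // directory entry 1
        System.out.println(tableIndex(linear));     // page table entry 3
        System.out.printf("%08X%n", physical(0x0ABCD000, linear));
    }
}
```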

Page tables make it possible to partition process address spaces cleanly and to reduce fragmentation. For a single process, the theoretical maximum usable space is 4 GB. Because of this, most memory operations in the operating system work in units of pages (4 KB).

Virtual memory mapping makes it easier for the operating system to manage and partition memory. The unit that actually translates virtual addresses to physical addresses is the MMU. mmap file mapping works the same way: the mapped range is translated through the MMU, and the pages behind it belong to a file.

To address the inefficiency of disk IO, the operating system sets aside part of the process address space for mapping disk files. This region also consists of virtual memory addresses and is accessed through pointers. The system automatically writes modified pages back to the corresponding location in the file on disk, so there is no need to call functions such as read and write. Changes the kernel makes to this region are also directly visible in user space, which allows files to be shared among different processes.

This mapped region needs its own page table to manage the mapping between memory and file addresses. If the current virtual address has no corresponding physical address, a page fault occurs. On a page fault, the system uses the address offset to work out the page number and checks whether the target is already cached in the PageCache. If it is, the mapping is pointed directly at the PageCache page; if not, the relevant part of the file has to be loaded into the PageCache first.

With mmap, a process can skip explicit read and write calls and access the file contents directly in memory. This is known as zero-copy technology.

Let me use a few diagrams to walk from traditional IO to zero copy.

This is the most common file transfer flow on a file server. First, in kernel mode, the file is read from the physical device into kernel space; this is a DMA (direct memory access) copy. The user process then reads the data from the kernel into its own address space, which is one CPU copy and completes the read. Next, the process has to send the data to the client, so the data is copied into the socket buffer in kernel space and then sent out through the protocol layer.

The whole process takes two CPU copies and two DMA copies, and it also requires switching between kernel mode and user mode. (Model 1: four copies)
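As a rough illustration of this first model, here is a minimal Java sketch of a read-then-write loop. It copies file to file rather than file to socket so that it stays self-contained, and the class and method names are hypothetical. The point is the `userBuffer` array: every chunk is pulled from the kernel into user space and then pushed back into the kernel.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Traditional IO: data passes through a user-space buffer on its way out. */
final class TraditionalCopy {

    /** Copies source to target through a user-space byte array; returns bytes copied. */
    static long copy(Path source, Path target) {
        byte[] userBuffer = new byte[8192]; // lives in user space (the JVM heap)
        long total = 0;
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(userBuffer)) != -1) { // kernel -> user copy
                out.write(userBuffer, 0, n);          // user -> kernel copy
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Copies a temporary 20000-byte file and returns the byte count, or -1 on mismatch. */
    static long demo() {
        try {
            Path source = Files.createTempFile("traditional-src", ".bin");
            Path target = Files.createTempFile("traditional-dst", ".bin");
            try {
                byte[] data = new byte[20000];
                for (int i = 0; i < data.length; i++) {
                    data[i] = (byte) (i % 251);
                }
                Files.write(source, data);
                long copied = copy(source, target);
                return Arrays.equals(data, Files.readAllBytes(target)) ? copied : -1;
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```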

The second model introduces mmap. With kernel space mapped into user space, the copy to the socket can be done by working on kernel space directly, with no need to copy the data between kernel space and user space. The write system call makes the kernel copy the data from the original kernel buffer to the kernel buffer associated with the socket.

This approach uses mmap in place of read. It reduces the number of copies, but it carries a risk: if a file is mapped into memory and write is then called while another process modifies the same file (for example, truncates it), the access can fail with a bus error (SIGBUS). (Model 2: three copies)

The third model is based on the sendfile system call introduced in Linux. It reduces not only copies but also context switches: sendfile completes the copy entirely inside the kernel, from the kernel buffer to the socket buffer, skipping user space altogether. (Model 3: three copies)

The fourth model: in kernel version 2.4, sendfile was optimized so that data can be sent directly from the kernel buffer to the protocol engine, which also eliminates the copy into the socket buffer. Nothing changes for user-level applications. (Model 4: two copies)

To sum up, zero-copy technology means that data is not copied redundantly during transfer and no redundant copies of it are kept in kernel and user space. It is built on sendfile and mmap.
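In Java, the closest counterpart to the sendfile-style models is `FileChannel.transferTo`, which lets the JVM hand the transfer to the operating system where the platform supports it. The sketch below is an assumption-laden illustration rather than RocketMQ code: the class and method names are hypothetical, and the target is another file instead of a socket so that the example is self-contained. Whether sendfile is actually used underneath depends on the platform and JDK.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Channel-to-channel transfer without a user-space buffer in application code. */
final class ZeroCopyTransfer {

    /** Moves the whole source file into target via transferTo; returns bytes moved. */
    static long transfer(Path source, Path target) {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // nothing more could be transferred
                }
                position += moved;
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Round-trips a short string through two temporary files. */
    static String demo() {
        try {
            Path source = Files.createTempFile("zero-copy-src", ".bin");
            Path target = Files.createTempFile("zero-copy-dst", ".bin");
            try {
                Files.write(source, "hello zero copy".getBytes(StandardCharsets.UTF_8));
                transfer(source, target);
                return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real server the target would be a SocketChannel, which is where skipping the user-space copy pays off.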

Back to RocketMQ

Message queues are heavy IO users. MMap, FileChannel, and RandomAccessFile are the file access methods MQs use most.

RocketMQ supports both MMap and FileChannel and uses MMap by default. When the PageCache is busy, it uses FileChannel instead, which also avoids lock contention on the PageCache.

In the MappedFile class you can see two fields, a FileChannel and a MappedByteBuffer. In Java code, a file is mapped into virtual memory through the map method of FileChannel.
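A minimal sketch of that mapping step, using only the standard `FileChannel.map` API, might look like the following. It is not the RocketMQ implementation: the class name, method names, and the 4096-byte file size are all chosen for illustration. The message is written through the mapping and then read back through ordinary file IO to show that both views refer to the same file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Writes to a file through a memory mapping instead of write calls. */
final class MmapDemo {

    /** Maps fileSize bytes of the file, stores the message, and reads it back from the file. */
    static String writeThroughMapping(Path file, int fileSize, String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // READ_WRITE mapping; the file is grown to fileSize if it is shorter.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
            mapped.put(payload); // a plain memory write, no write system call
            mapped.force();      // ask the OS to flush the dirty pages to disk
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            byte[] onDisk = Files.readAllBytes(file);
            return new String(onDisk, 0, payload.length, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the round trip against a temporary file. */
    static String demo() {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            file.toFile().deleteOnExit(); // a mapped file may not be deletable right away
            return writeThroughMapping(file, 4096, "hello mmap");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The payload must fit inside the mapped size, otherwise `put` throws a BufferOverflowException.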

The mmap initialization can also be seen in the init method of MappedFile.

During an actual write, the buffer being operated on may be the mmap buffer, or it may be direct memory obtained from the TransientStorePool, which keeps the pages from being swapped out to the swap area.

Whether TransientStorePool is enabled is determined by transientStorePoolEnable. When it is enabled, data is written to off-heap memory first, and the commit thread then commits it to the file through its FileChannel.

TransientStorePool is a simple pooling class. It holds the pool size, the size of each buffer, the queue of buffers, and the storage configuration object. For the initialization itself, look at the init method: it calls allocateDirect in a loop to request memory outside the JVM heap. Compared with allocate, which requests memory inside the JVM heap, off-heap memory is faster for IO because the extra copy between heap and off-heap memory is avoided.

Once the memory has been allocated, we obtain its address.
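The pooling idea can be sketched with nothing but the standard library. This is a simplified stand-in, not the RocketMQ class: `DirectBufferPool`, `borrow`, `giveBack`, and `availableCount` are hypothetical names, synchronization is done with plain `synchronized` methods, and the memory-locking step is left out because it needs a native library.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

/** A fixed-size pool of off-heap buffers, allocated once up front. */
final class DirectBufferPool {
    private final int poolSize;
    private final int bufferSize;
    private final Deque<ByteBuffer> available = new ArrayDeque<>();

    DirectBufferPool(int poolSize, int bufferSize) {
        this.poolSize = poolSize;
        this.bufferSize = bufferSize;
        init();
    }

    /** Allocates every buffer outside the JVM heap, in a loop. */
    private void init() {
        for (int i = 0; i < poolSize; i++) {
            available.offer(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    /** Hands out a buffer, or null when the pool is exhausted. */
    synchronized ByteBuffer borrow() {
        return available.pollFirst();
    }

    /** Resets the buffer and puts it back at the head of the queue. */
    synchronized void giveBack(ByteBuffer buffer) {
        buffer.clear(); // position = 0, limit = capacity
        available.offerFirst(buffer);
    }

    synchronized int availableCount() {
        return available.size();
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(2, 1024);
        ByteBuffer buffer = pool.borrow();
        System.out.println(buffer.isDirect() + " " + pool.availableCount());
        pool.giveBack(buffer);
    }
}
```

Allocating all buffers up front trades memory for predictability: no direct-memory allocation happens on the write path.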

Pointer pointer = new Pointer(address);
LibC.INSTANCE.mlock(pointer, new NativeLong(fileSize));

With the address in hand, a pointer to it is created and a native library method is called to lock the memory at that address, so that it cannot be swapped out.

To sum up, I hope you now have a reasonable understanding of page tables and file system IO.

