Previous articles
CommitLog: [RocketMQ Source Code Analysis] A Deep Dive into Message Storage (1)
ConsumeQueue: [RocketMQ Source Code Analysis] A Deep Dive into Message Storage (2)
The previous two articles covered how messages are stored in the CommitLog and how the ConsumeQueue is built. In this third part there is a hurdle we have to clear: MappedFile, the memory-mapped file.
MappedFile is the key reason RocketMQ can choose to store messages directly on disk. At the start of the CommitLog storage walkthrough in part one, I jotted down a few thoughts:
- Both memory and local disk are used
- Filling and swapping
- Files mapped to memory
- Access through a random-read interface
These key phrases all lead back to the subject of this article: MappedFile.
RocketMQ stores its files on disk, and different IO methods differ widely in performance. How efficiently data moves between disk and memory is an important measure of the strength of any storage-related middleware.
Implement a message persistence storage engine based on in-process queues
This was the topic of the Tianchi middleware competition a few years ago. The goal was to design a message queue that uses limited memory and a larger amount of disk space. The general approach was already covered in the first article; the catch is that the queue was required to support aggregation operations.

It reminds me of aggregations in ElasticSearch. Implementing an aggregation feature that complex would be far too hard.
Fortunately, the topic only required summing the messages within a specified period of time, which needs little more than maintaining message storage offsets together with their storage times.
To learn more about memory-mapped files, we can read the source code. Compared with CommitLog and ConsumeQueue, this sits at a much lower level and involves more knowledge of IO, Buffer, and PageCache.
From page table to zero copy
Back when I was learning assembly language, I came across two kinds of addressing-related registers:
segment registers and index registers.
In the 8086 era the address bus was 20 bits wide but the registers were only 16 bits wide, which limited what a single register could address. To cover the full 1MB address space, two 16-bit registers were used together as a segment base and an offset: the segment value is shifted left by 4 bits and the offset is added, which yields a 20-bit physical address.
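The segment-plus-offset calculation can be sketched in a few lines of Java. This is only an illustration of the arithmetic; the class and method names are made up for this example.

```java
public class RealModeAddress {

    // Physical address = segment * 16 + offset, limited to the 20-bit address bus.
    public static int physicalAddress(int segment, int offset) {
        int seg = segment & 0xFFFF;          // a segment register holds 16 bits
        int off = offset & 0xFFFF;           // an offset register holds 16 bits
        return ((seg << 4) + off) & 0xFFFFF; // keep 20 bits, so the result wraps at 1MB
    }

    public static void main(String[] args) {
        // 0x1234:0x0010 -> 0x12340 + 0x0010
        System.out.printf("0x%05X%n", physicalAddress(0x1234, 0x0010)); // prints 0x12350
    }
}
```

Note that many different segment:offset pairs map to the same physical address, since segments overlap every 16 bytes.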
The same idea carries over to protected mode. Suppose we have a 32-bit operating system with 4GB of memory.
Let's think about its memory layout. Kernel space and user space are well-known concepts. If the memory space were left unmanaged and simply handed out in order, the first big problem would be memory isolation: how do two processes avoid polluting each other's memory? This also brings to mind a memory allocation problem in the Java virtual machine. After the garbage collector cleans up allocated memory, the remaining free space may not be contiguous, so the next object that needs a large block may not fit. The JVM can choose a mark-compact style of collection that leaves no fragmentation, because the references on the stack that point into the heap can be updated, so even moving a large object is not a worry. The operating system is different. If it tried to reduce fragmentation the way the JVM does, the first problem it would face is a change of address: afterwards, a process may no longer find its target when addressing.
Keep this change-of-address problem in mind, because we will see later that improper handling of the operating system's PageCache can cause it too.
There is another problem: a flat, sequential address space like this is not safe. Every process can access every other process's addresses, which is exactly what some memory-editing tools rely on.
Because of these problems, operating systems moved to protected mode and turned the memory space into virtual memory based on page tables, separate from the actual physical memory.
On such a system the page table is usually a two-level page table. A two-level page table simply pages the page table itself: the entries within one page table are stored contiguously, and since page tables are essentially just data, they too are stored in memory page by page.
The first level is called the page directory. The physical address of each page table is stored in the page directory as a page directory entry (PDE). The 4MB of page tables is divided into 1K (4MB/4KB) pages, and describing each one takes 4 bytes, so the page directory occupies 4KB, exactly the size of one standard page. Each PDE points to a second-level table. The high 10 bits of the linear address form the first-level index, and the entry found at that index selects one of the 1K second-level tables, that is, a page table.
The second level is called the page table. Each one is stored in a 4KB page and contains 1K entries, each holding the physical base address of a page. The middle 10 bits of the linear address form the second-level index, which yields the page table entry containing the page's physical address. The high 20 bits of that physical address combined with the low 12 bits of the linear address form the final physical address.
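The 10/10/12 split of a 32-bit linear address described above can be sketched as follows. This is a minimal illustration of the bit arithmetic only; the class name, the method names, and the sample page frame address are assumptions made for this example, not anything taken from an operating system's source.

```java
public class LinearAddress {

    // High 10 bits: index into the page directory.
    public static int directoryIndex(int linear) {
        return (linear >>> 22) & 0x3FF;
    }

    // Middle 10 bits: index into the selected page table.
    public static int tableIndex(int linear) {
        return (linear >>> 12) & 0x3FF;
    }

    // Low 12 bits: byte offset inside the 4KB page.
    public static int pageOffset(int linear) {
        return linear & 0xFFF;
    }

    // Final physical address: high 20 bits of the page frame base + low 12 bits of the linear address.
    public static int physicalAddress(int pageFrameBase, int linear) {
        return (pageFrameBase & 0xFFFFF000) | pageOffset(linear);
    }

    public static void main(String[] args) {
        int linear = 0x00403025;
        System.out.println(directoryIndex(linear)); // prints 1
        System.out.println(tableIndex(linear));     // prints 3
        System.out.println(pageOffset(linear));     // prints 37 (0x025)
        // 0x0009A000 is a made-up page frame base standing in for a page table entry.
        System.out.printf("0x%08X%n", physicalAddress(0x0009A000, linear)); // prints 0x0009A025
    }
}
```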
With page tables, process address spaces can be cleanly divided and fragmentation is reduced. In theory, each process can use up to 4GB. Because of this, most memory operations in the operating system work in units of pages (4KB).
Virtual memory mapping makes it easier for the operating system to manage and divide memory. The unit that actually translates virtual addresses to physical addresses is the MMU. mmap, memory-mapped file IO, works the same way: the mapping to the file goes through the MMU.
To address the inefficiency of disk IO, the operating system sets aside an extra region in the process address space for address mapping with disk files. This region also consists of virtual memory addresses. You operate on it through pointers, and the system automatically writes modified pages back to the corresponding location in the disk file, so you don't have to call functions such as read and write. Changes made to this region from kernel space are also directly visible in user space, which is how files can be shared between processes.
This mapped region needs its own page table to manage the mapping from memory to file addresses. If no physical address can be found for a virtual address, a so-called page fault occurs. On a page fault, the system computes the page number from the address offset and checks whether the target is already cached in PageCache. If it is, the mapping points directly at the PageCache address; if not, the target file content has to be loaded into PageCache first.
With mmap's mapping, explicit read and write IO calls can be avoided and the file is accessed directly as memory. This is known as zero copy.
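In Java, this kind of mapping is available through FileChannel.map. The following is a minimal sketch, not RocketMQ's code: the class name, the temporary file, and the 4KB mapping size are all assumptions chosen for illustration. It writes to a file through the mapped buffer and reads the bytes back without calling read or write on the channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

    private static final int MAPPED_SIZE = 4096; // one page, an arbitrary size for this sketch

    // Writes text into a temp file through a memory mapping and reads it back from the mapping.
    public static String roundTrip(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > MAPPED_SIZE) {
            throw new IllegalArgumentException("payload larger than the mapped region");
        }
        try {
            Path file = Files.createTempFile("mmap-sketch", ".data");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE past the current end of the file extends the file.
                MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, MAPPED_SIZE);
                mapped.put(payload);   // a plain memory write, no write() system call
                mapped.force();        // ask the OS to flush the modified pages to disk
                byte[] readBack = new byte[payload.length];
                mapped.position(0);
                mapped.get(readBack);  // a plain memory read, no read() system call
                return new String(readBack, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello mmap"));
    }
}
```

Calling force is optional for correctness of the in-memory view; it only controls when the modified pages reach the disk.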
Let me go from traditional IO to zero copy with a few models.
The first model is the most common file transfer flow in a file server. First, in kernel mode, the file is read from the physical device into kernel space; this is a DMA (direct memory access) copy. Then the user process reads the data from the kernel into its own address space, which is one CPU copy. That completes the read. Next the process needs to send the data to the client, so the data has to be copied into the socket buffer in kernel space, and from there it is sent out through the protocol layer.
The whole process takes two CPU copies and two DMA copies, and it also requires switching between kernel mode and user mode. (First model: four copies.)
The second model introduces mmap, which maps kernel space into user space, so the copy to the socket can be done by operating on kernel space directly and the data no longer has to cross between kernel space and user space. The write system call makes the kernel copy the data from the original kernel buffer into the kernel buffer associated with the socket.
This approach uses mmap in place of read. It does reduce the number of copies, but it carries a risk: if a file is mapped into memory and write is then called while another process modifies the same file, for example by truncating it, a system error occurs. (Second model: three copies.)
The third model is based on the sendfile system call introduced in Linux. It reduces not only the data copies but also the mode switches. sendfile completes the copy entirely inside kernel space, from the kernel buffer to the socket buffer, skipping user space altogether. (Third model: three copies.)
The fourth model: in kernel version 2.4, sendfile was optimized so that data can be sent directly from kernel space to the protocol engine, which also eliminates the copy into the socket buffer. Nothing changes for user-level applications. (Fourth model: two copies.)
To sum up, when data is transferred without producing redundant copies and without redundant duplicates held in kernel and user space, that is called zero copy, and it is built on sendfile and mmap.
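On the Java side, the closest counterpart to sendfile is FileChannel.transferTo, which hands the transfer to the operating system and, on platforms that support it, can avoid copying the bytes through user space. The sketch below is an illustration under assumptions: the class name and temporary files are made up, and a second file stands in for the socket so the example is self-contained. In a file server the target would be a SocketChannel instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToSketch {

    // Transfers the whole source file to the target channel without a user-space buffer.
    public static long transfer(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    // Helper for the demo: writes text to a temp file, transfers it, returns the target's content.
    public static String demo(String text) {
        try {
            Path source = Files.createTempFile("transfer-src", ".data");
            Path target = Files.createTempFile("transfer-dst", ".data");
            source.toFile().deleteOnExit();
            target.toFile().deleteOnExit();
            Files.write(source, text.getBytes(StandardCharsets.UTF_8));
            transfer(source, target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("zero copy"));
    }
}
```

Whether the JVM actually uses sendfile underneath depends on the platform and on the type of the target channel, so treat this as the API shape rather than a guarantee about the number of copies.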
Back to RocketMQ
An MQ is a heavy IO user. MMap, FileChannel, and RandomAccessFile are the file operation methods an MQ uses most often.
RocketMQ supports both MMap and FileChannel. MMap is the default; when PageCache is busy, FileChannel is used instead, which also avoids contention on the PageCache lock.
In the MappedFile class you can see two fields, a FileChannel and a MappedByteBuffer. In Java code, a file can be mapped into virtual memory through the map method of FileChannel.
The mmap initialization can also be seen in the init method of MappedFile.
During the actual write, the buffer being operated on may be the mmap buffer, or it may be direct memory requested from TransientStorePool, which keeps pages from being swapped out to the swap area.
Whether TransientStorePool is enabled is determined by TransientStorePoolEnable. When it is enabled, data is written to off-heap memory first and then moved by the commit thread into the memory-mapped buffer.
TransientStorePool is a simple pooling class. It holds the pool size, the size of each storage unit, a queue of storage units, and the storage configuration. The initialization can be seen in the init method, which loops over allocateDirect to request memory outside the JVM heap. Compared with allocate, which requests memory inside the JVM heap, off-heap memory is faster for IO because the copy between off-heap and on-heap memory is eliminated.
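The pooling idea can be sketched roughly as below. This is a simplified, single-threaded illustration and not the actual TransientStorePool source: the class name, the method names, and the use of ArrayDeque are assumptions, and the memory-locking step is left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class DirectBufferPool {

    private final int poolSize;    // number of buffers in the pool
    private final int bufferSize;  // size of each buffer in bytes
    private final Deque<ByteBuffer> available = new ArrayDeque<>();

    public DirectBufferPool(int poolSize, int bufferSize) {
        this.poolSize = poolSize;
        this.bufferSize = bufferSize;
    }

    // Pre-allocates every buffer outside the JVM heap.
    public void init() {
        for (int i = 0; i < poolSize; i++) {
            available.offer(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    // Hands out a buffer, or null when the pool is exhausted.
    public ByteBuffer borrow() {
        return available.pollFirst();
    }

    // Resets position and limit, then puts the buffer back for reuse.
    public void giveBack(ByteBuffer buffer) {
        buffer.clear();
        available.offerFirst(buffer);
    }

    public int availableCount() {
        return available.size();
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(2, 1024);
        pool.init();
        ByteBuffer buffer = pool.borrow();
        System.out.println(buffer.isDirect());     // prints true
        System.out.println(pool.availableCount()); // prints 1
        pool.giveBack(buffer);
        System.out.println(pool.availableCount()); // prints 2
    }
}
```

Allocating everything up front trades memory for predictability: no direct-memory allocation happens on the write path. A pool shared between threads would need a concurrent queue instead of ArrayDeque.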
Once the memory has been allocated, its memory address is obtained.
Pointer pointer = new Pointer(address);
LibC.INSTANCE.mlock(pointer, new NativeLong(fileSize));
With the address in hand, a pointer to it is created and a native library method is called to lock the memory at that address, which prevents it from being swapped out.
To sum up, I believe you now have a reasonable understanding of page tables and file system IO.