Linux memory management (1): physical memory initialization

sky-heaven 2020-11-09 12:00:46
linux memory management physical memory


Project: Linux memory management topic

Keywords: user/kernel space split, Node/Zone/Page, memblock, PGD/PUD/PMD/PTE, lowmem/highmem, ZONE_DMA/ZONE_NORMAL/ZONE_HIGHMEM, watermark, MIGRATE_TYPES.


Physical memory is initialized as part of Linux kernel initialization, and memory management is the foundation of many other features; most kernel modules are coupled to it in one way or another.

Before walking through the initialization, it helps to understand the Linux memory management framework so you have an overall picture.

First, you need to know how the virtual address space is split between user and kernel space (3:1, 2:2, ...), and then follow initialization down the Node->Zone->Page hierarchy until memory becomes usable.

Regarding the relationship among Nodes, Zones and Pages, Figure 2.1 of《Understanding the Linux Virtual Memory Manager》introduces it; although the zone_mem_map layer has since been replaced, it still reflects the hierarchical tree relationship between the three.

A pg_data_t corresponds to one Node; its node_zones array holds the different Zones; under each Zone, per_cpu_pageset binds pages to CPUs.



1. User space and kernel space split

In 32-bit Linux, the virtual address space is 4GB, divided between user space and kernel space. There are three possible splits:

```
choice
	prompt "Memory split"
	depends on MMU
	default VMSPLIT_3G
	help
	  Select the desired split between kernel and user memory.

	  If you are not absolutely sure what you are doing, leave this
	  option alone!

config VMSPLIT_3G
	bool "3G/1G user/kernel split"
config VMSPLIT_2G
	bool "2G/2G user/kernel split"
config VMSPLIT_1G
	bool "1G/3G user/kernel split"
endchoice

config PAGE_OFFSET
	hex
	default PHYS_OFFSET if !MMU
	default 0x40000000 if VMSPLIT_1G
	default 0x80000000 if VMSPLIT_2G
	default 0xC0000000
```

The result of this configuration is that the generated autoconf.h defines #define CONFIG_PAGE_OFFSET 0xC0000000.

In arch/arm/include/asm/memory.h you can see that PAGE_OFFSET is the watershed between user space and kernel space; it is also the starting address of kernel space.

```c
/* PAGE_OFFSET - the virtual address of the start of the kernel image */

static inline phys_addr_t __virt_to_phys(unsigned long x)
{
	return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
}

static inline unsigned long __phys_to_virt(phys_addr_t x)
{
	return x - PHYS_OFFSET + PAGE_OFFSET;
}
```
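With PAGE_OFFSET = 0xC0000000 (the VMSPLIT_3G value) and PHYS_OFFSET = 0x60000000 (the Vexpress RAM base used throughout this article), the conversion is just a fixed offset. A minimal user-space sketch, assuming those two constants:

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000UL  /* VMSPLIT_3G kernel-space start */
#define PHYS_OFFSET 0x60000000UL  /* Vexpress RAM base (assumption) */

/* Mirrors __virt_to_phys(): the kernel linear map is a constant offset. */
static uint32_t virt_to_phys32(uint32_t v)
{
    return v - PAGE_OFFSET + PHYS_OFFSET;
}

/* Mirrors __phys_to_virt(). */
static uint32_t phys_to_virt32(uint32_t p)
{
    return p - PHYS_OFFSET + PAGE_OFFSET;
}
```

For example, physical 0x8f800000 (the lowmem limit seen later in this article) lands at virtual 0xef800000.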


2. Obtaining the physical memory size

All subsequent initialization and memory management operate on physical memory, so the first step is to obtain its physical address and size.

The physical memory properties are obtained from the DTS, then parsed and added to the memblock subsystem.

```dts
memory@60000000 {
	device_type = "memory";
	reg = <0x60000000 0x40000000>;
};
```


Given the DTS above, the call chain is start_kernel --> setup_arch --> setup_machine_fdt --> early_init_dt_scan_nodes --> of_scan_flat_dt (traverse the nodes) --> early_init_dt_scan_memory (initialize a single memory node).

The result is that base and size parsed from the DTS are 0x60000000 and 0x40000000.

```c
int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
				     int depth, void *data)
{
	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);	/* device_type = "memory" */
	...
	reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
	if (reg == NULL)
		reg = of_get_flat_dt_prop(node, "reg", &l);	/* reg = <0x60000000 0x40000000> */
	if (reg == NULL)
		return 0;

	endp = reg + (l / sizeof(__be32));

	pr_debug("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
		 uname, l, reg[0], reg[1], reg[2], reg[3]);

	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
		u64 base, size;

		base = dt_mem_next_cell(dt_root_addr_cells, &reg);	/* 0x60000000 */
		size = dt_mem_next_cell(dt_root_size_cells, &reg);	/* 0x40000000 */
		...
		early_init_dt_add_memory_arch(base, size);	/* sanity-checks base and size */
	}

	return 0;
}
```
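The reg property is stored in the flattened device tree as big-endian 32-bit cells; with #address-cells = #size-cells = 1, dt_mem_next_cell reads one cell per value. A toy user-space sketch of that decoding (dt_next_cell is a made-up name; the byte array hand-encodes reg = <0x60000000 0x40000000>):

```c
#include <stdint.h>

/* Read one big-endian 32-bit FDT cell and advance the cursor, like
 * dt_mem_next_cell() does when dt_root_addr_cells == 1. */
static uint64_t dt_next_cell(const uint8_t **p)
{
    const uint8_t *c = *p;

    *p += 4;
    return ((uint64_t)c[0] << 24) | ((uint64_t)c[1] << 16) |
           ((uint64_t)c[2] << 8) | (uint64_t)c[3];
}
```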


Then, with the parsed base/size, the call chain early_init_dt_add_memory_arch --> memblock_add --> memblock_add_range adds the parsed physical memory to the memblock subsystem.

```c
struct memblock {
	bool bottom_up;			/* is bottom up direction? */
	phys_addr_t current_limit;
	struct memblock_type memory;	/* available physical memory regions */
	struct memblock_type reserved;	/* reserved memory regions */
	struct memblock_type physmem;
	...
};
```


memblock_add is used to add a region to memblock.memory; many places in the kernel initialization phase (for example arm_memblock_init) use memblock_reserve to add regions to memblock.reserved.

memblock_remove removes a region from memblock.memory; memblock_free removes a region from memblock.reserved.

All addresses here are physical addresses, and all the information lives in the single global variable memblock.
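As a rough illustration of the data structure (a simplified toy model, not the kernel's implementation: it ignores node ids, flags and array resizing), a memblock-style region array keeps sorted physical ranges and merges adjacent ones on insert:

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t phys_addr_t;

struct toy_region { phys_addr_t base, size; };

static struct toy_region toy_regions[16];
static int toy_nregions;

/* Simplified memblock_add(): insert keeping the array sorted by base,
 * then merge overlapping or exactly adjacent neighbours into one region. */
static void toy_memblock_add(phys_addr_t base, phys_addr_t size)
{
    int i;

    for (i = toy_nregions; i > 0 && toy_regions[i - 1].base > base; i--)
        toy_regions[i] = toy_regions[i - 1];
    toy_regions[i].base = base;
    toy_regions[i].size = size;
    toy_nregions++;

    for (i = 0; i < toy_nregions - 1; ) {
        phys_addr_t end  = toy_regions[i].base + toy_regions[i].size;
        phys_addr_t end2 = toy_regions[i + 1].base + toy_regions[i + 1].size;

        if (end >= toy_regions[i + 1].base) {   /* overlap or adjacency */
            toy_regions[i].size = (end2 > end ? end2 : end) - toy_regions[i].base;
            memmove(&toy_regions[i + 1], &toy_regions[i + 2],
                    (toy_nregions - i - 2) * sizeof(toy_regions[0]));
            toy_nregions--;
        } else {
            i++;
        }
    }
}
```

Adding <0x60000000, 0x20000000> and then <0x80000000, 0x20000000> leaves a single merged region covering the whole 1GB bank, which is why memblock.memory usually ends up with one region per contiguous bank.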

```c
int __init_memblock memblock_add_range(struct memblock_type *type,
				       phys_addr_t base, phys_addr_t size,
				       int nid, unsigned long flags)
{
	bool insert = false;
	phys_addr_t obase = base;
	phys_addr_t end = base + memblock_cap_size(base, &size);
	int i, nr_new;

	if (!size)
		return 0;

	/* special case for empty array */
	if (type->regions[0].size == 0) {
		WARN_ON(type->cnt != 1 || type->total_size);
		type->regions[0].base = base;
		type->regions[0].size = size;
		type->regions[0].flags = flags;
		memblock_set_region_node(&type->regions[0], nid);
		type->total_size = size;
		return 0;
	}
repeat:
	/*
	 * The following is executed twice.  Once with %false @insert and
	 * then with %true.  The first counts the number of regions needed
	 * to accomodate the new area.  The second actually inserts them.
	 */
	...
}
```



Memory management is also needed during the kernel boot phase, but the buddy system is not initialized at that point. Early kernels used the bootmem mechanism as the memory allocator during kernel initialization.

Later, memblock took over as the initialization-phase allocator, handling both memory allocation and release.

CONFIG_NO_BOOTMEM decides whether bootmem is used; Vexpress enables it, so memblock serves as the initialization-phase memory allocator.

Because the bootmem and memblock APIs are compatible, users do not notice the difference. When memblock is used, mm/nobootmem.c is compiled, which calls into the allocator interfaces in memblock.c.


3. Physical memory mapping

Because CONFIG_ARM_LPAE is not enabled, the Linux page table uses a two-level mapping: the PUD/PMD levels in PGD->PUD->PMD->PTE are folded away, and the return value of pmd_off_k is effectively pgd_offset_k.

```c
static inline pmd_t *pmd_off_k(unsigned long virt)
{
	return pmd_offset(pud_offset(pgd_offset_k(virt), virt), virt);
}

#define pgd_index(addr)		((addr) >> PGDIR_SHIFT)
#define pgd_offset(mm, addr)	((mm)->pgd + pgd_index(addr))

/*
 * to find an entry in a kernel page-table-directory: shifts addr right by
 * PGDIR_SHIFT, then offsets into init_mm.pgd, i.e. swapper_pg_dir, which is
 * where the kernel page tables are stored.
 */
#define pgd_offset_k(addr)	pgd_offset(&init_mm, addr)
```
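On 32-bit ARM without LPAE, PGDIR_SHIFT is 21 (each Linux pgd entry covers 2MB, implemented as a pair of 1MB hardware entries). Assuming that value, the index arithmetic is just a shift:

```c
#define PGDIR_SHIFT 21  /* 2MB per Linux pgd entry on 32-bit ARM, non-LPAE */

/* Mirrors pgd_index(): which slot of swapper_pg_dir covers this address. */
static unsigned long toy_pgd_index(unsigned long addr)
{
    return addr >> PGDIR_SHIFT;
}
```

So PAGE_OFFSET (0xC0000000) lands at slot 0x600 of swapper_pg_dir.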


prepare_page_table is used to clear page-table entries; in fact it clears the first-level entries for three address ranges: 0~MODULES_VADDR, MODULES_VADDR~PAGE_OFFSET, and 0xef800000~VMALLOC_START.


```c
static inline void prepare_page_table(void)
{
	unsigned long addr;
	phys_addr_t end;

	/*
	 * Clear out all the mappings below the kernel image.
	 */
	for (addr = 0; addr < MODULES_VADDR; addr += PMD_SIZE)	/* clear first-level entries for 0~MODULES_VADDR */
		pmd_clear(pmd_off_k(addr));

#ifdef CONFIG_XIP_KERNEL
	/* The XIP kernel is mapped in the module area -- skip over it */
	addr = ((unsigned long)_etext + PMD_SIZE - 1) & PMD_MASK;
#endif
	for ( ; addr < PAGE_OFFSET; addr += PMD_SIZE)	/* clear first-level entries for MODULES_VADDR~PAGE_OFFSET */
		pmd_clear(pmd_off_k(addr));

	/*
	 * Find the end of the first block of lowmem.
	 */
	end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
	if (end >= arm_lowmem_limit)	/* end = 0x60000000+0x40000000, arm_lowmem_limit = 0x8f800000 */
		end = arm_lowmem_limit;

	/*
	 * Clear out all the kernel space mappings, except for the first
	 * memory bank, up to the vmalloc region.
	 */
	for (addr = __phys_to_virt(end);	/* end is 0x8f800000 here, i.e. virtual 0xef800000 */
	     addr < VMALLOC_START; addr += PMD_SIZE)	/* clear first-level entries for 0xef800000~VMALLOC_START */
		pmd_clear(pmd_off_k(addr));
}
```


The page tables are actually created in map_lowmem, which creates two mapping intervals: interval one, 0x60000000~0x60800000 (virtual 0xc0000000~0xc0800000), and interval two, 0x60800000~0x8f800000 (virtual 0xc0800000~0xef800000).

Interval one: read/write/execute permissions; it mainly holds the kernel code and data sections, including the contents of swapper_pg_dir.

Interval two: read/write, execute not allowed; this is the Normal memory portion.

As you can see, the virtual-to-physical mapping of these two intervals is linear; only the two special vector pages at the end are not linearly mapped.
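The interval boundaries come from section-aligning the kernel image. A sketch of the round_down/round_up and __phys_to_pfn arithmetic, assuming 1MB sections and 4KB pages:

```c
#include <stdint.h>

#define SECTION_SIZE 0x00100000UL  /* 1MB first-level sections */
#define PAGE_SHIFT   12            /* 4KB pages */

static uint32_t toy_round_down(uint32_t x, uint32_t a) { return x & ~(a - 1); }
static uint32_t toy_round_up(uint32_t x, uint32_t a)   { return (x + a - 1) & ~(a - 1); }

/* Mirrors __phys_to_pfn(): page frame number of a physical address. */
static uint32_t toy_phys_to_pfn(uint32_t p) { return p >> PAGE_SHIFT; }
```

With _stext at the start of RAM and __init_end just below 0x60800000, rounding yields kernel_x_start = 0x60000000 and kernel_x_end = 0x60800000, as annotated in map_lowmem.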

```c
static void __init map_lowmem(void)
{
	struct memblock_region *reg;
	phys_addr_t kernel_x_start = round_down(__pa(_stext), SECTION_SIZE);
	phys_addr_t kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE);	/* kernel_x_start=0x60000000, kernel_x_end=0x60800000 */

	/* Map all the lowmem memory banks. */
	for_each_memblock(memory, reg) {
		phys_addr_t start = reg->base;
		phys_addr_t end = start + reg->size;	/* start=0x60000000, end=0x60000000+0x40000000 */
		struct map_desc map;

		if (end > arm_lowmem_limit)
			end = arm_lowmem_limit;	/* arm_lowmem_limit=0x8f800000, so end=0x8f800000 */
		if (start >= end)
			break;

		if (end < kernel_x_start) {
			map.pfn = __phys_to_pfn(start);
			map.virtual = __phys_to_virt(start);
			map.length = end - start;
			map.type = MT_MEMORY_RWX;

			create_mapping(&map);
		} else if (start >= kernel_x_end) {
			map.pfn = __phys_to_pfn(start);
			map.virtual = __phys_to_virt(start);
			map.length = end - start;
			map.type = MT_MEMORY_RW;

			create_mapping(&map);
		} else {
			/* This better cover the entire kernel */
			if (start < kernel_x_start) {
				map.pfn = __phys_to_pfn(start);
				map.virtual = __phys_to_virt(start);
				map.length = kernel_x_start - start;
				map.type = MT_MEMORY_RW;

				create_mapping(&map);
			}

			map.pfn = __phys_to_pfn(kernel_x_start);
			map.virtual = __phys_to_virt(kernel_x_start);
			map.length = kernel_x_end - kernel_x_start;
			map.type = MT_MEMORY_RWX;

			create_mapping(&map);	/* maps virtual 0xc0000000-0xc0800000 to physical 0x60000000-0x60800000, MT_MEMORY_RWX */

			if (kernel_x_end < end) {
				map.pfn = __phys_to_pfn(kernel_x_end);
				map.virtual = __phys_to_virt(kernel_x_end);
				map.length = end - kernel_x_end;
				map.type = MT_MEMORY_RW;

				create_mapping(&map);	/* maps virtual 0xc0800000-0xef800000 to physical 0x60800000-0x8f800000, MT_MEMORY_RW */
			}
		}
	}
}
```


Another portion of memory is mapped in devicemaps_init, which maps the vector pages:

MT_HIGH_VECTORS: virtual addresses 0xffff0000~0xffff1000, corresponding to physical 0x8f7fe000~0x8f7ff000.

MT_LOW_VECTORS: virtual addresses 0xffff1000~0xffff2000, corresponding to physical 0x8f7ff000~0x8f800000.


```c
static void __init devicemaps_init(const struct machine_desc *mdesc)
{
	struct map_desc map;
	unsigned long addr;
	void *vectors;

	printk("%s\n", __func__);

	/*
	 * Allocate the vector page early.
	 */
	vectors = early_alloc(PAGE_SIZE * 2);
	...
	for (addr = VMALLOC_START; addr; addr += PMD_SIZE)
		pmd_clear(pmd_off_k(addr));

	/*
	 * Map the kernel if it is XIP.
	 * It is always first in the modulearea.
	 */
#ifdef CONFIG_XIP_KERNEL
	map.pfn = __phys_to_pfn(CONFIG_XIP_PHYS_ADDR & SECTION_MASK);
	map.virtual = MODULES_VADDR;
	map.length = ((unsigned long)_etext - map.virtual + ~SECTION_MASK) & SECTION_MASK;
	map.type = MT_ROM;
	create_mapping(&map);
#endif

	/*
	 * Map the cache flushing regions.
	 */
#ifdef FLUSH_BASE
	map.pfn = __phys_to_pfn(FLUSH_BASE_PHYS);
	map.virtual = FLUSH_BASE;
	map.length = SZ_1M;
	map.type = MT_CACHECLEAN;
	create_mapping(&map);
#endif
#ifdef FLUSH_BASE_MINICACHE
	map.pfn = __phys_to_pfn(FLUSH_BASE_PHYS + SZ_1M);
	map.virtual = FLUSH_BASE_MINICACHE;
	map.length = SZ_1M;
	map.type = MT_MINICLEAN;
	create_mapping(&map);
#endif

	/*
	 * Create a mapping for the machine vectors at the high-vectors
	 * location (0xffff0000).  If we aren't using high-vectors, also
	 * create a mapping at the low-vectors virtual address.
	 */
	map.pfn = __phys_to_pfn(virt_to_phys(vectors));
	map.virtual = 0xffff0000;
	map.length = PAGE_SIZE;
#ifdef CONFIG_KUSER_HELPERS
	map.type = MT_HIGH_VECTORS;
#else
	map.type = MT_LOW_VECTORS;
#endif
	create_mapping(&map);	/* maps virtual 0xffff0000-0xffff1000 to physical 0x8f7fe000-0x8f7ff000, MT_HIGH_VECTORS */

	if (!vectors_high()) {
		map.virtual = 0;
		map.length = PAGE_SIZE * 2;
		map.type = MT_LOW_VECTORS;
		create_mapping(&map);
	}

	/* Now create a kernel read-only mapping */
	map.pfn += 1;
	map.virtual = 0xffff0000 + PAGE_SIZE;
	map.length = PAGE_SIZE;
	map.type = MT_LOW_VECTORS;
	create_mapping(&map);	/* maps virtual 0xffff1000-0xffff2000 to physical 0x8f7ff000-0x8f800000, MT_LOW_VECTORS */

	/*
	 * Ask the machine support to map in the statically mapped devices.
	 */
	if (mdesc->map_io)
		mdesc->map_io();
	...
	/* Reserve fixed i/o space in VMALLOC region */
	pci_reserve_io();

	/*
	 * Finally flush the caches and tlb to ensure that we're in a
	 * consistent state wrt the writebuffer.  This also ensures that
	 * any write-allocated cache lines in the vector page are written
	 * back.  After this point, we can start to touch devices again.
	 */
	local_flush_tlb_all();
	flush_cache_all();
}
```


void __init sanity_check_meminfo(void)

????? An open question: how are these vector pages guaranteed not to be used for other purposes ?????

4. Zone initialization

Memory management divides a memory Node into several zones for management; the zone types are defined in enum zone_type.

Vexpress defines two of them, NORMAL and HIGHMEM. Zone initialization happens in bootmem_init: find_limits obtains the starting physical frame number min_low_pfn, the ending frame number max_pfn, and the ending frame number of the NORMAL region max_low_pfn.

```c
void __init bootmem_init(void)
{
	unsigned long min, max_low, max_high;

	max_low = max_high = 0;

	find_limits(&min, &max_low, &max_high);	/* min_low_pfn=0x60000, max_low_pfn=0x8f800, max_pfn=0xa0000, taken from the global variable memblock */
	...
	zone_sizes_init(min, max_low, max_high);	/* min_low_pfn..max_low_pfn is ZONE_NORMAL, max_low_pfn..max_pfn is ZONE_HIGHMEM */

	/*
	 * This doesn't seem to be used by the Linux memory manager any
	 * more, but is used by ll_rw_block.  If we can get rid of it, we
	 * also get rid of some of the stuff above as well.
	 */
	min_low_pfn = min;
	max_low_pfn = max_low;
	max_pfn = max_high;
}
```


zone_sizes_init calculates the size of each zone and the holes between zones, then calls free_area_init_node to create the memory node's zones.

```c
void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
		unsigned long node_start_pfn, unsigned long *zholes_size)
{
	pg_data_t *pgdat = NODE_DATA(nid);	/* obtain the Node data structure for nid */
	unsigned long start_pfn = 0;
	unsigned long end_pfn = 0;

	/* pg_data_t should be reset to zero when it's allocated */
	WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);

	pgdat->node_id = nid;
	pgdat->node_start_pfn = node_start_pfn;
	...
	calculate_node_totalpages(pgdat, start_pfn, end_pfn,
				  zones_size, zholes_size);	/* page count of the Node: 1GB/4KB = 262144 */

	printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
	       nid, (unsigned long)pgdat,
	       (unsigned long)pgdat->node_mem_map);

	free_area_init_core(pgdat, start_pfn, end_pfn,
			    zones_size, zholes_size);	/* initialize each Zone of the Node in turn */
}

static void __paginginit free_area_init_core(struct pglist_data *pgdat,
		unsigned long node_start_pfn, unsigned long node_end_pfn,
		unsigned long *zones_size, unsigned long *zholes_size)
{
	enum zone_type j;
	int nid = pgdat->node_id;
	unsigned long zone_start_pfn = pgdat->node_start_pfn;
	int ret;
	...
	pgdat->numabalancing_migrate_nr_pages = 0;
	pgdat->numabalancing_migrate_next_window = jiffies;
	...
	for (j = 0; j < MAX_NR_ZONES; j++) {
		struct zone *zone = pgdat->node_zones + j;
		unsigned long size, realsize, freesize, memmap_pages;

		size = zone_spanned_pages_in_node(nid, j, node_start_pfn,
						  node_end_pfn, zones_size);
		realsize = freesize = size - zone_absent_pages_in_node(nid, j,
						node_start_pfn, node_end_pfn,
						zholes_size);

		/*
		 * Adjust freesize so that it accounts for how much memory
		 * is used by this zone for memmap. This affects the watermark
		 * and per-cpu initialisations
		 */
		memmap_pages = calc_memmap_size(size, realsize);	/* space consumed by the struct pages themselves */
		if (!is_highmem_idx(j)) {	/* HIGHMEM is not charged for its memmap pages here */
			if (freesize >= memmap_pages) {
				freesize -= memmap_pages;
				if (memmap_pages)
					printk(KERN_DEBUG "  %s zone: %lu pages used for memmap\n",
					       zone_names[j], memmap_pages);
			} else
				printk(KERN_WARNING "  %s zone: %lu pages exceeds freesize %lu\n",
				       zone_names[j], memmap_pages, freesize);
		}

		/* Account for reserved pages */
		if (j == 0 && freesize > dma_reserve) {
			freesize -= dma_reserve;
			printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
			       zone_names[0], dma_reserve);
		}

		if (!is_highmem_idx(j))
			nr_kernel_pages += freesize;
		/* Charge for highmem memmap if there are enough kernel pages */
		else if (nr_kernel_pages > memmap_pages * 2)
			nr_kernel_pages -= memmap_pages;
		nr_all_pages += freesize;

		zone->spanned_pages = size;
		zone->present_pages = realsize;
		/*
		 * Set an approximate value for lowmem here, it will be adjusted
		 * when the bootmem allocator frees pages into the buddy system.
		 * And all highmem pages will be managed by the buddy system.
		 */
		zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
		zone->node = nid;
		zone->min_unmapped_pages = (freesize * sysctl_min_unmapped_ratio) / 100;
		zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
		zone->name = zone_names[j];
		zone->zone_pgdat = pgdat;
		...
		/* For bootup, initialized properly in watermark setup */
		mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);

		if (!size)
			continue;

		setup_usemap(pgdat, zone, zone_start_pfn, size);
		ret = init_currently_empty_zone(zone, zone_start_pfn, size);
		memmap_init(size, nid, j, zone_start_pfn);	/* initialize the struct pages of this zone */
		zone_start_pfn += size;
	}
}
```


The above functions produce the following boot log:

```
On node 0 totalpages: 262144                                 // 262144*4KB = 1GB
free_area_init_node: node 0, pgdat c0782480, node_mem_map eeffa000
  Normal zone: 1520 pages used for memmap                    // struct page is 32 bytes: 194560*32B/4KB = 1520 pages
  Normal zone: 0 pages reserved
  Normal zone: 194560 pages, LIFO batch:31                   // 194560*4KB = 760MB
  HighMem zone: 67584 pages, LIFO batch:15                   // 67584*4KB = 264MB
```

So ZONE_NORMAL corresponds to physical addresses 0x60000000 - 0x8f800000, and ZONE_HIGHMEM corresponds to 0x8f800000 - 0xa0000000.
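Those numbers can be reproduced from the pfn limits found earlier, assuming 4KB pages and a 32-byte struct page (as the log annotations note):

```c
/* pfn limits from find_limits() for this 1GB Vexpress layout (assumptions) */
#define MIN_LOW_PFN 0x60000UL
#define MAX_LOW_PFN 0x8f800UL
#define MAX_PFN     0xa0000UL

/* Pages spanned by a pfn range. */
static unsigned long zone_pages(unsigned long start_pfn, unsigned long end_pfn)
{
    return end_pfn - start_pfn;
}

/* memmap cost: one 32-byte struct page per page, expressed in 4KB pages,
 * like calc_memmap_size() for a zone with no holes. */
static unsigned long toy_memmap_pages(unsigned long pages)
{
    return pages * 32 / 4096;
}
```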




Each zone's watermarks are calculated during system initialization: WMARK_MIN, WMARK_LOW and WMARK_HIGH. These parameters are used when kswapd reclaims pages.

```c
enum zone_watermarks {
	WMARK_MIN,
	WMARK_LOW,
	WMARK_HIGH,
	NR_WMARK
};

#define min_wmark_pages(z) (z->watermark[WMARK_MIN])
#define low_wmark_pages(z) (z->watermark[WMARK_LOW])
#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])

struct zone {
	/* Read-mostly fields */

	/* zone watermarks, access with *_wmark_pages(zone) macros */
	unsigned long watermark[NR_WMARK];
	...
};
```

An important input to the watermark calculation, min_free_kbytes, is computed in init_per_zone_wmark_min:


```c
/*
 * Initialise min_free_kbytes.
 *
 * For small machines we want it small (128k min).  For large machines
 * we want it large (64MB max).  But it is not linear, because network
 * bandwidth does not increase linearly with machine size.  We use
 *
 *	min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
 *	min_free_kbytes = sqrt(lowmem_kbytes * 16)
 *
 * which yields
 *
 * 16MB:	512k
 * 32MB:	724k
 * 64MB:	1024k
 * 128MB:	1448k
 * 256MB:	2048k
 * 512MB:	2896k
 * 1024MB:	4096k
 * 2048MB:	5792k
 * 4096MB:	8192k
 * 8192MB:	11584k
 * 16384MB:	16384k
 */
int __meminit init_per_zone_wmark_min(void)
{
	unsigned long lowmem_kbytes;
	int new_min_free_kbytes;

	lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);	/* lowmem_kbytes = 761100 */
	new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);	/* sqrt(761100*16) = 3489 */

	if (new_min_free_kbytes > user_min_free_kbytes) {	/* user_min_free_kbytes = -1, so min_free_kbytes = 3489, within [128, 65536] kB */
		min_free_kbytes = new_min_free_kbytes;
		if (min_free_kbytes < 128)
			min_free_kbytes = 128;
		if (min_free_kbytes > 65536)
			min_free_kbytes = 65536;
	} else {
		pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
			new_min_free_kbytes, user_min_free_kbytes);
	}
	setup_per_zone_wmarks();	/* --> __setup_per_zone_wmarks(): computes WMARK_MIN/WMARK_LOW/WMARK_HIGH */
	...
	return 0;
}
module_init(init_per_zone_wmark_min)	/* computes min_free_kbytes = 3489 */
```


The watermark calculation itself is done by __setup_per_zone_wmarks:

```c
static void __setup_per_zone_wmarks(void)
{
	unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);	/* min_free_kbytes=3489, PAGE_SHIFT-10=2, so pages_min=3489>>2=872 */
	unsigned long lowmem_pages = 0;
	struct zone *zone;
	unsigned long flags;

	/* Calculate total number of !ZONE_HIGHMEM pages */
	for_each_zone(zone) {
		if (!is_highmem(zone))
			lowmem_pages += zone->managed_pages;	/* only lowmem counts, so lowmem_pages=190273 */
	}

	for_each_zone(zone) {
		u64 tmp;

		spin_lock_irqsave(&zone->lock, flags);
		tmp = (u64)pages_min * zone->managed_pages;
		do_div(tmp, lowmem_pages);	/* Normal: tmp=872*190273/190273=872; HighMem: tmp=872*67584/190273=309 */
		if (is_highmem(zone)) {
			/*
			 * __GFP_HIGH and PF_MEMALLOC allocations usually don't
			 * need highmem pages, so cap pages_min to a small
			 * value here.
			 *
			 * The WMARK_HIGH-WMARK_LOW and (WMARK_LOW-WMARK_MIN)
			 * deltas controls asynch page reclaim, and so should
			 * not be capped for highmem.
			 */
			unsigned long min_pages;

			min_pages = zone->managed_pages / 1024;
			min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
			zone->watermark[WMARK_MIN] = min_pages;	/* HighMem: min_pages=67584/1024=66 */
		} else {
			/*
			 * If it's a lowmem zone, reserve a number of pages
			 * proportionate to the zone's size.
			 */
			zone->watermark[WMARK_MIN] = tmp;	/* Normal: 872 */
		}

		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);	/* Normal: 872+218=1090; HighMem: 66+77=143 */
		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);	/* Normal: 872+436=1308; HighMem: 66+154=220 */

		__mod_zone_page_state(zone, NR_ALLOC_BATCH,
				      high_wmark_pages(zone) - low_wmark_pages(zone) -
				      atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));

		spin_unlock_irqrestore(&zone->lock, flags);
	}

	/* update totalreserve_pages */
	calculate_totalreserve_pages();
}
```
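The arithmetic above can be checked in a standalone sketch (the 761100/190273 values are this board's numbers from the annotations; toy_int_sqrt models int_sqrt with a simple loop):

```c
/* Integer square root (floor), matching int_sqrt()'s result. */
static unsigned long toy_int_sqrt(unsigned long x)
{
    unsigned long r = 0;

    while ((r + 1) * (r + 1) <= x)
        r++;
    return r;
}

/* min_free_kbytes = int_sqrt(lowmem_kbytes * 16), clamped to [128, 65536]. */
static unsigned long toy_min_free_kbytes(unsigned long lowmem_kbytes)
{
    unsigned long k = toy_int_sqrt(lowmem_kbytes * 16);

    if (k < 128)
        k = 128;
    if (k > 65536)
        k = 65536;
    return k;
}

struct toy_wmarks { unsigned long min, low, high; };

/* Lowmem-zone watermarks as in __setup_per_zone_wmarks(): the zone's share
 * of pages_min, plus tmp/4 and tmp/2 deltas for LOW and HIGH. */
static struct toy_wmarks toy_lowmem_wmarks(unsigned long pages_min,
                                           unsigned long managed,
                                           unsigned long lowmem_pages)
{
    unsigned long tmp = pages_min * managed / lowmem_pages;
    struct toy_wmarks w;

    w.min = tmp;
    w.low = w.min + (tmp >> 2);
    w.high = w.min + (tmp >> 1);
    return w;
}
```

toy_min_free_kbytes(761100) gives 3489, pages_min = 3489 >> 2 = 872, and the Normal zone (managed == lowmem_pages == 190273) gets min/low/high = 872/1090/1308, matching the zone printout.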


Printing each zone's information gives:

```
Normal  min=872 low=1090 high=1308 zone_start_pfn=393216 managed_pages=190273 spanned_pages=194560 present_pages=194560  // span: 0x2f800*4KB = 760MB, usable: 190273 pages
HighMem min=66  low=143  high=220  zone_start_pfn=587776 managed_pages=67584  spanned_pages=67584  present_pages=67584   // span: 0x10800*4KB = 264MB, usable: 67584 pages
Movable min=32  low=32   high=32   zone_start_pfn=0      managed_pages=0      spanned_pages=0      present_pages=0
```



5. Physical memory initialization

Physical memory pages need to be added to the buddy system, a dynamic storage management scheme: when a user requests memory, it hands out a block of suitable size; when the block is freed, it takes the block back, merging buddies where possible.

The buddy system manages free page blocks along two axes: the block size (2^order pages) and the pages' migration type.

```c
struct zone {
	...
#ifndef CONFIG_SPARSEMEM
	/*
	 * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
	 * In SPARSEMEM, this map is stored in struct mem_section
	 */
	unsigned long *pageblock_flags;	/* MIGRATE_TYPE of each pageblock in the zone */
#endif /* CONFIG_SPARSEMEM */
	...
	/* free areas of different sizes */
	struct free_area free_area[MAX_ORDER];	/* free page-block lists, one per order */
	...
};
```



```c
enum {
	MIGRATE_UNMOVABLE,	/* page contents cannot be moved: they must stay at a
				   fixed location in memory; most core kernel allocations
				   are of this type */
	MIGRATE_RECLAIMABLE,	/* page contents can be reclaimed but not moved directly:
				   the page can be rebuilt from some backing source, e.g.
				   file-mapped data; kswapd reclaims such pages
				   periodically according to its rules */
	MIGRATE_MOVABLE,	/* page contents can be moved: user-space pages mapped via
				   page tables belong here; updating the page-table entries
				   and copying the data to a new location is enough. Note
				   that a page may be shared by several processes and thus
				   several page-table entries */
	MIGRATE_PCPTYPES,	/* the number of migrate types on the per-CPU page-frame
				   cache lists */
	/*
	 * MIGRATE_CMA migration type is designed to mimic the way
	 * ZONE_MOVABLE works.  Only movable pages can be allocated
	 * from MIGRATE_CMA pageblocks and page allocator never
	 * implicitly change migration type of MIGRATE_CMA pageblock.
	 *
	 * The way to use it is to change migratetype of a range of
	 * pageblocks to MIGRATE_CMA which can be done by
	 * __free_pageblock_cma() function.  What is important though
	 * is that a range of pageblocks must be aligned to
	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
	 * a single pageblock.
	 */
	MIGRATE_CMA,		/* memory reserved for drivers; while the driver is idle,
				   the buddy system may lend it to user processes for movable
				   allocations (anonymous memory, page cache); when the
				   driver needs it, those pages are reclaimed or migrated
				   away to free the reservation */
	MIGRATE_ISOLATE,	/* can't allocate from here: this list is used while a range
				   of pages is isolated, e.g. to migrate physical pages
				   between NUMA nodes toward the CPU that uses them most */
	MIGRATE_TYPES
};
```




```c
/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
#define pageblock_order		(MAX_ORDER-1)
#define pageblock_nr_pages	(1UL << pageblock_order)
```
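With the default MAX_ORDER of 11 and 4KB pages (assumptions for this sketch), a pageblock is 1024 pages, i.e. 4MB, which is the granularity at which migrate types are tracked:

```c
#define TOY_MAX_ORDER 11   /* default MAX_ORDER (assumption) */
#define TOY_PAGE_SIZE 4096UL

#define toy_pageblock_order    (TOY_MAX_ORDER - 1)
#define toy_pageblock_nr_pages (1UL << toy_pageblock_order)

/* Size in bytes of one pageblock. */
static unsigned long toy_pageblock_bytes(void)
{
    return toy_pageblock_nr_pages * TOY_PAGE_SIZE;
}
```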





