Linux memory management (1): physical memory initialization

sky-heaven 2020-11-09 12:00:46
Tags: Linux memory management, physical memory


From: https://www.cnblogs.com/arnoldlu/p/8060121.html

Project: Linux memory management series

Keywords: user/kernel space split, Node/Zone/Page, memblock, PGD/PUD/PMD/PTE, lowmem/highmem, ZONE_DMA/ZONE_NORMAL/ZONE_HIGHMEM, watermark, MIGRATE_TYPES.

 

Physical memory is initialized as part of Linux kernel initialization, and memory management is the foundation of many other features; it is coupled to all kinds of modules in the kernel.

Before diving into the initialization, understanding the Linux memory management framework helps to form a general picture of memory management.

First, you need to know how the whole virtual address space is split between user space and kernel space (3:1, 2:2 or 1:3), and then follow the initialization from Node to Zone to Page, until the memory becomes usable.

The relationship between Nodes, Zones and Pages is introduced in Figure 2.1 of《ULVMM》(Understanding the Linux Virtual Memory Manager); although the zone_mem_map layer has since been replaced, the figure still reflects the hierarchical tree relationship among the three.

A pg_data_t corresponds to one Node; its node_zones array contains the different Zones. Each Zone in turn defines per_cpu_pageset, which binds pages to CPUs. A toy model of this containment is sketched below.
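
To make this containment concrete, here is a minimal, runnable model of the hierarchy. It is only an illustrative sketch: the field names echo pg_data_t/struct zone from include/linux/mmzone.h, the zone sizes are the Vexpress values derived later in this article, and everything else is simplified away.

#include <stdio.h>

/* Toy model of the Node->Zone containment described above; not the
 * kernel's real pg_data_t/struct zone definitions. */
#define MAX_NR_ZONES 3

struct zone {
    const char *name;
    unsigned long spanned_pages;
};

typedef struct pglist_data {
    struct zone node_zones[MAX_NR_ZONES];   /* a Node contains its Zones */
    int node_id;
} pg_data_t;

int main(void)
{
    pg_data_t node0 = {
        .node_zones = { {"Normal", 194560}, {"HighMem", 67584}, {"Movable", 0} },
        .node_id = 0,
    };

    for (int i = 0; i < MAX_NR_ZONES; i++)
        printf("node %d zone %-8s spanned_pages=%lu\n", node0.node_id,
               node0.node_zones[i].name, node0.node_zones[i].spanned_pages);
    return 0;
}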

start_kernel-->
    page_address_init
    setup_arch-->
        setup_machine_fdt-->early_init_dt_scan_nodes-->early_init_dt_scan_memory-->early_init_dt_add_memory_arch-->memblock_add
        init_mm
        early_paging_init
        setup_dma_zone
        sanity_check_meminfo
        arm_memblock_init
        paging_init
    mm_init_cpumask
    build_all_zonelists
    page_alloc_init
    vfs_caches_init_early
    mm_init
    kmem_cache_init_late
    debug_objects_mem_init
    kmemleak_init
    setup_per_cpu_pageset
    numa_policy_init
    anon_vma_init
    page_writeback_init

 

1. User space and kernel space partition

On 32-bit Linux the virtual address space is 4GB. The whole virtual address space is divided into user space + kernel space, and there are three possible splits:

choice
    prompt "Memory split"
    depends on MMU
    default VMSPLIT_3G
    help
      Select the desired split between kernel and user memory.
      If you are not absolutely sure what you are doing, leave this
      option alone!

    config VMSPLIT_3G
        bool "3G/1G user/kernel split"
    config VMSPLIT_2G
        bool "2G/2G user/kernel split"
    config VMSPLIT_1G
        bool "1G/3G user/kernel split"
endchoice

config PAGE_OFFSET
    hex
    default PHYS_OFFSET if !MMU
    default 0x40000000 if VMSPLIT_1G
    default 0x80000000 if VMSPLIT_2G
    default 0xC0000000

The result of this configuration is the definition #define CONFIG_PAGE_OFFSET 0xC0000000 in the generated autoconf.h.
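
As a quick illustration, the sketch below shows what each of the three VMSPLIT choices from the Kconfig above means for the 4GB virtual address space: user space is [0, PAGE_OFFSET) and kernel space is [PAGE_OFFSET, 4GB). It is a standalone demo, not kernel code.

#include <stdio.h>

/* Prints the user/kernel division implied by each PAGE_OFFSET value
 * from the Kconfig choice above. */
int main(void)
{
    const unsigned long long four_gb = 0x100000000ULL;
    unsigned long offsets[] = { 0x40000000UL, 0x80000000UL, 0xC0000000UL };
    const char *names[]     = { "VMSPLIT_1G", "VMSPLIT_2G", "VMSPLIT_3G" };

    for (int i = 0; i < 3; i++)
        printf("%s: PAGE_OFFSET=0x%08lx user=%luMB kernel=%lluMB\n",
               names[i], offsets[i], offsets[i] >> 20,
               (four_gb - offsets[i]) >> 20);
    return 0;
}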

In arch/arm/include/asm/memory.h, it can be seen that PAGE_OFFSET is the watershed between user space and kernel space, and also the starting address of kernel space.

/* PAGE_OFFSET - the virtual address of the start of the kernel image */
#define PAGE_OFFSET     UL(CONFIG_PAGE_OFFSET)

static inline phys_addr_t __virt_to_phys(unsigned long x)
{
    return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET;
}

static inline unsigned long __phys_to_virt(phys_addr_t x)
{
    return x - PHYS_OFFSET + PAGE_OFFSET;
}
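
Since PHYS_OFFSET is 0x60000000 on this Vexpress platform (see the DTS in the next section) and PAGE_OFFSET is 0xC0000000, the lowmem translation is a simple linear shift. Below is a small runnable sketch of the arithmetic; virt_to_phys_demo/phys_to_virt_demo are illustrative names, not the kernel helpers.

#include <stdio.h>

/* Mirrors __virt_to_phys/__phys_to_virt above with this article's
 * constants: PAGE_OFFSET=0xC0000000 (VMSPLIT_3G), PHYS_OFFSET=0x60000000. */
#define PAGE_OFFSET 0xC0000000UL
#define PHYS_OFFSET 0x60000000UL

static unsigned long virt_to_phys_demo(unsigned long v)
{
    return v - PAGE_OFFSET + PHYS_OFFSET;
}

static unsigned long phys_to_virt_demo(unsigned long p)
{
    return p - PHYS_OFFSET + PAGE_OFFSET;
}

int main(void)
{
    printf("virt 0x%08lx -> phys 0x%08lx\n", 0xC0008000UL,
           virt_to_phys_demo(0xC0008000UL));
    printf("phys 0x%08lx -> virt 0x%08lx\n", 0x8F800000UL,
           phys_to_virt_demo(0x8F800000UL));
    return 0;
}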

 

2. Get the physical memory size

All subsequent initialization and memory management are based on physical memory, so the first step is to obtain the physical memory's base address and size.

The physical memory properties are obtained from the DTS, then parsed and added to the memblock subsystem.

arch/arm/boot/dts/vexpress-v2p-ca9.dts:

memory@60000000 {
    device_type = "memory";
    reg = <0x60000000 0x40000000>;
};

 

According to the above DTS, the parsing path is start_kernel-->setup_arch-->setup_machine_fdt-->early_init_dt_scan_nodes-->of_scan_flat_dt (traverse the nodes)-->early_init_dt_scan_memory (initialize a single memory node).

The result is that the base and size read from the DTS are 0x60000000 and 0x40000000 respectively (see the decoding sketch after the listing).

int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
                                     int depth, void *data)
{
    const char *type = of_get_flat_dt_prop(node, "device_type", NULL);----device_type = "memory"
    ...
    reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
    if (reg == NULL)
        reg = of_get_flat_dt_prop(node, "reg", &l);------------------------reg = <0x60000000 0x40000000>
    if (reg == NULL)
        return 0;

    endp = reg + (l / sizeof(__be32));

    pr_debug("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
        uname, l, reg[0], reg[1], reg[2], reg[3]);

    while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
        u64 base, size;

        base = dt_mem_next_cell(dt_root_addr_cells, &reg);-----------------0x60000000
        size = dt_mem_next_cell(dt_root_size_cells, &reg);-----------------0x40000000
        ...
        early_init_dt_add_memory_arch(base, size);-------------------------validity check on base/size
    }
    return 0;
}
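
For clarity, here is a hedged userspace sketch of what dt_mem_next_cell does: FDT cells are 32-bit big-endian values, and on this board #address-cells = #size-cells = 1. dt_mem_next_cell_demo is an illustrative name, not the kernel function.

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>  /* ntohl/htonl: FDT cells are stored big-endian */

/* Assemble a u64 from 'cells' consecutive big-endian 32-bit FDT cells,
 * advancing the cursor, as dt_mem_next_cell does. */
static uint64_t dt_mem_next_cell_demo(int cells, const uint32_t **reg)
{
    uint64_t r = 0;
    while (cells--)
        r = (r << 32) | ntohl(*(*reg)++);
    return r;
}

int main(void)
{
    /* reg = <0x60000000 0x40000000>; as it sits in the FDT blob */
    const uint32_t raw[] = { htonl(0x60000000), htonl(0x40000000) };
    const uint32_t *reg = raw;

    uint64_t base = dt_mem_next_cell_demo(1, &reg);
    uint64_t size = dt_mem_next_cell_demo(1, &reg);
    printf("base=0x%llx size=0x%llx\n",
           (unsigned long long)base, (unsigned long long)size);
    return 0;
}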

 

Then, based on the parsed base/size, early_init_dt_add_memory_arch-->memblock_add-->memblock_add_range adds the parsed physical memory to the memblock subsystem.

struct memblock {
    bool bottom_up;  /* is bottom up direction? */
    phys_addr_t current_limit;
    struct memblock_type memory;-------------------added physical memory regions
    struct memblock_type reserved;-----------------added reserved memory regions
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
    struct memblock_type physmem;
#endif
};

 

memblock_add is used to add a region to memblock.memory; many places in the kernel initialization phase (for example arm_memblock_init) use memblock_reserve to add a region to memblock.reserved.

memblock_remove is used to remove a region from memblock.memory; memblock_free is used to remove a region from memblock.reserved.

All the addresses here are physical addresses, and all the information lives in the single global variable memblock. A usage sketch follows below.
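
Here is a sketch of typical usage in early boot code. This is kernel context, so it is not a standalone program; example_early_mem_setup and the reserved range are made up for illustration, while memblock_add/memblock_reserve/memblock_free are the real interfaces described above (their signatures in kernels of this era take phys_addr_t base/size).

#include <linux/init.h>
#include <linux/memblock.h>

/* Illustrative only: real code gets these addresses from the DTS or
 * the machine descriptor, never hard-coded. */
static void __init example_early_mem_setup(void)
{
    /* DDR reported by the DTS: reg = <0x60000000 0x40000000> */
    memblock_add(0x60000000, 0x40000000);     /* -> memblock.memory */

    /* Carve out 16MB, e.g. for a firmware or DMA region */
    memblock_reserve(0x7f000000, 0x01000000); /* -> memblock.reserved */

    /* Give the region back later if it turns out to be unused */
    memblock_free(0x7f000000, 0x01000000);    /* drop from .reserved */
}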

int __init_memblock memblock_add_range(struct memblock_type *type,
                phys_addr_t base, phys_addr_t size,
                int nid, unsigned long flags)
{
    bool insert = false;
    phys_addr_t obase = base;
    phys_addr_t end = base + memblock_cap_size(base, &size);
    int i, nr_new;

    if (!size)
        return 0;

    /* special case for empty array */
    if (type->regions[0].size == 0) {
        WARN_ON(type->cnt != 1 || type->total_size);
        type->regions[0].base = base;
        type->regions[0].size = size;
        type->regions[0].flags = flags;
        memblock_set_region_node(&type->regions[0], nid);
        type->total_size = size;
        return 0;
    }
repeat:
    /*
     * The following is executed twice. Once with %false @insert and
     * then with %true. The first counts the number of regions needed
     * to accomodate the new area. The second actually inserts them.
     */
    ...
}

 

memblock

During the kernel boot phase memory management is already needed, but the buddy system has not been initialized yet. Early kernels used the bootmem mechanism as the memory allocator for the initialization phase.

Later kernels use memblock as the initialization-phase memory allocator, handling both allocation and release.

CONFIG_NO_BOOTMEM decides whether bootmem is used; on Vexpress it is enabled, so memblock serves as the initialization-phase allocator.

Because the bootmem and memblock APIs are compatible, users do not notice the difference. When memblock is used, mm/nobootmem.c is compiled, which calls the allocator interfaces in memblock.c.

 

3. Physical memory mapping

Since CONFIG_ARM_LPAE is not enabled, the Linux page table uses a two-level mapping. The PUD/PMD levels in PGD->PUD->PMD->PTE are therefore folded away, and the value returned by pmd_off_k is effectively that of pgd_offset_k.

arch/arm/mm/mm.h:

static inline pmd_t *pmd_off_k(unsigned long virt)
{
    return pmd_offset(pud_offset(pgd_offset_k(virt), virt), virt);
}

arch/arm/include/asm/pgtable.h:

#define pgd_index(addr)         ((addr) >> PGDIR_SHIFT)
#define pgd_offset(mm, addr)    ((mm)->pgd + pgd_index(addr))

/* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(addr)      pgd_offset(&init_mm, addr)----in effect, addr is shifted right by PGDIR_SHIFT bits and used as an offset from init_mm.pgd, i.e. swapper_pg_dir, which is where the kernel page tables are stored.
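
A quick runnable check of the pgd_index arithmetic above: on 32-bit ARM without LPAE, PGDIR_SHIFT is 21, so each pgd entry covers 2MB (a Linux "pmd" folding two 1MB hardware sections). The values are this article's layout; the program is a sketch, not kernel code.

#include <stdio.h>

#define PGDIR_SHIFT 21

int main(void)
{
    unsigned long addr = 0xC0000000UL;  /* PAGE_OFFSET */
    unsigned long idx = addr >> PGDIR_SHIFT;

    /* -> 0x600: kernel mappings start 1536 entries into swapper_pg_dir */
    printf("pgd_index(0x%08lx) = 0x%lx (%lu)\n", addr, idx, idx);
    return 0;
}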

 

prepare_page_table is used to clear page table entries; in fact it clears the entries for three address ranges: 0~MODULES_VADDR, MODULES_VADDR~PAGE_OFFSET, and 0xef800000~VMALLOC_START.

 

static inline void prepare_page_table(void)
{
    unsigned long addr;
    phys_addr_t end;

    /*
     * Clear out all the mappings below the kernel image.
     */
    for (addr = 0; addr < MODULES_VADDR; addr += PMD_SIZE)----------------clear the first-level page tables for 0~MODULES_VADDR.
        pmd_clear(pmd_off_k(addr));

#ifdef CONFIG_XIP_KERNEL
    /* The XIP kernel is mapped in the module area -- skip over it */
    addr = ((unsigned long)_etext + PMD_SIZE - 1) & PMD_MASK;
#endif
    for ( ; addr < PAGE_OFFSET; addr += PMD_SIZE)-------------------------clear the first-level page tables for MODULES_VADDR~PAGE_OFFSET.
        pmd_clear(pmd_off_k(addr));

    /*
     * Find the end of the first block of lowmem.
     */
    end = memblock.memory.regions[0].base + memblock.memory.regions[0].size;
    if (end >= arm_lowmem_limit)------------------------------------------end=0x60000000+0x40000000, arm_lowmem_limit=0x8f800000
        end = arm_lowmem_limit;

    /*
     * Clear out all the kernel space mappings, except for the first
     * memory bank, up to the vmalloc region.
     */
    for (addr = __phys_to_virt(end);
         addr < VMALLOC_START; addr += PMD_SIZE)--------------------------end is 0x8f800000 here, i.e. virtual address 0xef800000; clear the first-level page tables for 0xef800000~VMALLOC_START.
        pmd_clear(pmd_off_k(addr));
}

 

The page tables are actually created in map_lowmem, which sets up two mapping intervals: interval one, physical 0x60000000~0x60800000 (virtual 0xc0000000~0xc0800000), and interval two, physical 0x60800000~0x8f800000 (virtual 0xc0800000~0xef800000).

Interval one has read/write/execute permissions; it mainly holds the kernel code and data segments, and also includes the contents of swapper_pg_dir.

Interval two is readable and writable but not executable; it is the normal memory part.

As can be seen, the virtual-to-physical mapping of both intervals is linear; only the two special vector pages at the end are not linearly mapped.

static void __init map_lowmem(void)
{
    struct memblock_region *reg;
    phys_addr_t kernel_x_start = round_down(__pa(_stext), SECTION_SIZE);
    phys_addr_t kernel_x_end = round_up(__pa(__init_end), SECTION_SIZE);----kernel_x_start=0x60000000, kernel_x_end=0x60800000

    /* Map all the lowmem memory banks. */
    for_each_memblock(memory, reg) {
        phys_addr_t start = reg->base;
        phys_addr_t end = start + reg->size;----------------------------start=0x60000000, end=0x60000000+0x40000000=0xa0000000
        struct map_desc map;

        if (end > arm_lowmem_limit)
            end = arm_lowmem_limit;-------------------------------------since arm_lowmem_limit=0x8f800000, end is clamped to 0x8f800000
        if (start >= end)
            break;

        if (end < kernel_x_start) {
            map.pfn = __phys_to_pfn(start);
            map.virtual = __phys_to_virt(start);
            map.length = end - start;
            map.type = MT_MEMORY_RWX;
            create_mapping(&map);
        } else if (start >= kernel_x_end) {
            map.pfn = __phys_to_pfn(start);
            map.virtual = __phys_to_virt(start);
            map.length = end - start;
            map.type = MT_MEMORY_RW;
            create_mapping(&map);
        } else {
            /* This better cover the entire kernel */
            if (start < kernel_x_start) {
                map.pfn = __phys_to_pfn(start);
                map.virtual = __phys_to_virt(start);
                map.length = kernel_x_start - start;
                map.type = MT_MEMORY_RW;
                create_mapping(&map);
            }

            map.pfn = __phys_to_pfn(kernel_x_start);
            map.virtual = __phys_to_virt(kernel_x_start);
            map.length = kernel_x_end - kernel_x_start;
            map.type = MT_MEMORY_RWX;
            create_mapping(&map);-----------------------------------map virtual 0xc0000000 - 0xc0800000 to physical 0x60000000 - 0x60800000, with attribute MT_MEMORY_RWX.

            if (kernel_x_end < end) {
                map.pfn = __phys_to_pfn(kernel_x_end);
                map.virtual = __phys_to_virt(kernel_x_end);
                map.length = end - kernel_x_end;
                map.type = MT_MEMORY_RW;
                create_mapping(&map);-------------------------------map virtual 0xc0800000 - 0xef800000 to physical 0x60800000 - 0x8f800000, with attribute MT_MEMORY_RW.
            }
        }
    }
}

 

Another portion of memory is mapped in devicemaps_init, namely the vectors mapping:

MT_HIGH_VECTORS: virtual addresses 0xffff0000~0xffff1000, corresponding physical addresses 0x8f7fe000~0x8f7ff000.

MT_LOW_VECTORS: virtual addresses 0xffff1000~0xffff2000, corresponding physical addresses 0x8f7ff000~0x8f800000.
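
The physical location of the vector pages follows from how they are allocated: devicemaps_init (below) obtains them with early_alloc(PAGE_SIZE * 2), which draws from memblock, and memblock allocates top-down by default, so the two pages land directly below arm_lowmem_limit. A sketch of the arithmetic, assuming that top-down behavior:

#include <stdio.h>

/* With arm_lowmem_limit=0x8f800000 and 4KB pages, a top-down allocation
 * of two pages yields 0x8f7fe000~0x8f800000, matching the mappings above. */
int main(void)
{
    unsigned long arm_lowmem_limit = 0x8f800000UL, page_size = 0x1000UL;

    printf("vector pages: 0x%08lx - 0x%08lx\n",
           arm_lowmem_limit - 2 * page_size, arm_lowmem_limit);
    return 0;
}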

 

static void __init devicemaps_init(const struct machine_desc *mdesc)
{
    struct map_desc map;
    unsigned long addr;
    void *vectors;

    printk("%s\n", __func__);
    /*
     * Allocate the vector page early.
     */
    vectors = early_alloc(PAGE_SIZE * 2);

    early_trap_init(vectors);

    for (addr = VMALLOC_START; addr; addr += PMD_SIZE)
        pmd_clear(pmd_off_k(addr));

    /*
     * Map the kernel if it is XIP.
     * It is always first in the modulearea.
     */
#ifdef CONFIG_XIP_KERNEL
    map.pfn = __phys_to_pfn(CONFIG_XIP_PHYS_ADDR & SECTION_MASK);
    map.virtual = MODULES_VADDR;
    map.length = ((unsigned long)_etext - map.virtual + ~SECTION_MASK) & SECTION_MASK;
    map.type = MT_ROM;
    create_mapping(&map);
#endif

    /*
     * Map the cache flushing regions.
     */
#ifdef FLUSH_BASE
    map.pfn = __phys_to_pfn(FLUSH_BASE_PHYS);
    map.virtual = FLUSH_BASE;
    map.length = SZ_1M;
    map.type = MT_CACHECLEAN;
    create_mapping(&map);
#endif
#ifdef FLUSH_BASE_MINICACHE
    map.pfn = __phys_to_pfn(FLUSH_BASE_PHYS + SZ_1M);
    map.virtual = FLUSH_BASE_MINICACHE;
    map.length = SZ_1M;
    map.type = MT_MINICLEAN;
    create_mapping(&map);
#endif

    /*
     * Create a mapping for the machine vectors at the high-vectors
     * location (0xffff0000). If we aren't using high-vectors, also
     * create a mapping at the low-vectors virtual address.
     */
    map.pfn = __phys_to_pfn(virt_to_phys(vectors));
    map.virtual = 0xffff0000;
    map.length = PAGE_SIZE;
#ifdef CONFIG_KUSER_HELPERS
    map.type = MT_HIGH_VECTORS;
#else
    map.type = MT_LOW_VECTORS;
#endif
    create_mapping(&map);------------------map virtual 0xffff0000 - 0xffff1000 to physical 0x8f7fe000 - 0x8f7ff000, with attribute MT_HIGH_VECTORS.

    if (!vectors_high()) {
        map.virtual = 0;
        map.length = PAGE_SIZE * 2;
        map.type = MT_LOW_VECTORS;
        create_mapping(&map);--------------if high vectors are not used, also map the two vector pages at virtual address 0.
    }

    /* Now create a kernel read-only mapping */
    map.pfn += 1;
    map.virtual = 0xffff0000 + PAGE_SIZE;
    map.length = PAGE_SIZE;
    map.type = MT_LOW_VECTORS;
    create_mapping(&map);------------------map virtual 0xffff1000 - 0xffff2000 to physical 0x8f7ff000 - 0x8f800000, with attribute MT_LOW_VECTORS.

    /*
     * Ask the machine support to map in the statically mapped devices.
     */
    if (mdesc->map_io)
        mdesc->map_io();
    else
        debug_ll_io_init();
    fill_pmd_gaps();

    /* Reserve fixed i/o space in VMALLOC region */
    pci_reserve_io();

    /*
     * Finally flush the caches and tlb to ensure that we're in a
     * consistent state wrt the writebuffer. This also ensures that
     * any write-allocated cache lines in the vector page are written
     * back. After this point, we can start to touch devices again.
     */
    local_flush_tlb_all();
    flush_cache_all();
}

 

void __init sanity_check_meminfo(void)

(Open question from the author: how are these pages guaranteed not to be used for other purposes?)

4. zone initialization

Memory management divides a memory Node into several zones for management; the zone types are defined in enum zone_type.

Vexpress defines two of them, NORMAL and HIGHMEM. The zones are initialized in bootmem_init, which uses find_limits to find the starting physical frame number min_low_pfn, the ending frame number max_pfn, and the ending frame number of the NORMAL region max_low_pfn (converted to addresses in the sketch below).
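
To see what those frame numbers mean as physical addresses, shift by PAGE_SHIFT (12 for 4KB pages). A small runnable check with this Vexpress setup's values:

#include <stdio.h>

/* Converts the pfn limits found by find_limits into the zones'
 * physical address ranges. */
#define PAGE_SHIFT 12

int main(void)
{
    unsigned long min_low_pfn = 0x60000, max_low_pfn = 0x8f800, max_pfn = 0xa0000;

    printf("ZONE_NORMAL : 0x%08lx - 0x%08lx\n",
           min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
    printf("ZONE_HIGHMEM: 0x%08lx - 0x%08lx\n",
           max_low_pfn << PAGE_SHIFT, max_pfn << PAGE_SHIFT);
    return 0;
}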

void __init bootmem_init(void)
{
    unsigned long min, max_low, max_high;

    memblock_allow_resize();
    max_low = max_high = 0;

    find_limits(&min, &max_low, &max_high);---------------------min_low_pfn=0x60000, max_low_pfn=0x8f800, max_pfn=0xa0000; the information comes from the global variable memblock
...
    zone_sizes_init(min, max_low, max_high);--------------------min_low_pfn to max_low_pfn is ZONE_NORMAL; max_low_pfn to max_pfn is ZONE_HIGHMEM.

    /*
     * This doesn't seem to be used by the Linux memory manager any
     * more, but is used by ll_rw_block. If we can get rid of it, we
     * also get rid of some of the stuff above as well.
     */
    min_low_pfn = min;
    max_low_pfn = max_low;
    max_pfn = max_high;
}

 

zone_sizes_init calculates the size of each zone and the holes between zones, then calls free_area_init_node to create the memory node's zones.

void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
        unsigned long node_start_pfn, unsigned long *zholes_size)
{
    pg_data_t *pgdat = NODE_DATA(nid);-----------------------------get the Node data structure corresponding to nid
    unsigned long start_pfn = 0;
    unsigned long end_pfn = 0;

    /* pg_data_t should be reset to zero when it's allocated */
    WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);

    pgdat->node_id = nid;
    pgdat->node_start_pfn = node_start_pfn;
...
    calculate_node_totalpages(pgdat, start_pfn, end_pfn,
                  zones_size, zholes_size);------------------------calculate the Node's page count: 1GB/4KB=262144

    alloc_node_mem_map(pgdat);
#ifdef CONFIG_FLAT_NODE_MEM_MAP
    printk(KERN_DEBUG "free_area_init_node: node %d, pgdat %08lx, node_mem_map %08lx\n",
        nid, (unsigned long)pgdat,
        (unsigned long)pgdat->node_mem_map);
#endif

    free_area_init_core(pgdat, start_pfn, end_pfn,
                zones_size, zholes_size);--------------------------initialize each Zone in the Node one by one
}

static void __paginginit free_area_init_core(struct pglist_data *pgdat,
        unsigned long node_start_pfn, unsigned long node_end_pfn,
        unsigned long *zones_size, unsigned long *zholes_size)
{
    enum zone_type j;
    int nid = pgdat->node_id;
    unsigned long zone_start_pfn = pgdat->node_start_pfn;
    int ret;

    pgdat_resize_init(pgdat);
#ifdef CONFIG_NUMA_BALANCING
    spin_lock_init(&pgdat->numabalancing_migrate_lock);
    pgdat->numabalancing_migrate_nr_pages = 0;
    pgdat->numabalancing_migrate_next_window = jiffies;
#endif
    init_waitqueue_head(&pgdat->kswapd_wait);
    init_waitqueue_head(&pgdat->pfmemalloc_wait);
    pgdat_page_ext_init(pgdat);

    for (j = 0; j < MAX_NR_ZONES; j++) {
        struct zone *zone = pgdat->node_zones + j;
        unsigned long size, realsize, freesize, memmap_pages;

        size = zone_spanned_pages_in_node(nid, j, node_start_pfn,
                          node_end_pfn, zones_size);
        realsize = freesize = size - zone_absent_pages_in_node(nid, j,
                                node_start_pfn,
                                node_end_pfn,
                                zholes_size);

        /*
         * Adjust freesize so that it accounts for how much memory
         * is used by this zone for memmap. This affects the watermark
         * and per-cpu initialisations
         */
        memmap_pages = calc_memmap_size(size, realsize);-----------calculate how many pages the struct page array (memmap) itself consumes.
        if (!is_highmem_idx(j)) {----------------------------------for HIGHMEM the memmap cost is not deducted here.
            if (freesize >= memmap_pages) {
                freesize -= memmap_pages;
                if (memmap_pages)
                    printk(KERN_DEBUG "  %s zone: %lu pages used for memmap\n",
                           zone_names[j], memmap_pages);
            } else
                printk(KERN_WARNING "  %s zone: %lu pages exceeds freesize %lu\n",
                       zone_names[j], memmap_pages, freesize);
        }

        /* Account for reserved pages */
        if (j == 0 && freesize > dma_reserve) {
            freesize -= dma_reserve;
            printk(KERN_DEBUG "  %s zone: %lu pages reserved\n",
                   zone_names[0], dma_reserve);
        }

        if (!is_highmem_idx(j))
            nr_kernel_pages += freesize;
        /* Charge for highmem memmap if there are enough kernel pages */
        else if (nr_kernel_pages > memmap_pages * 2)
            nr_kernel_pages -= memmap_pages;
        nr_all_pages += freesize;

        zone->spanned_pages = size;
        zone->present_pages = realsize;
        /*
         * Set an approximate value for lowmem here, it will be adjusted
         * when the bootmem allocator frees pages into the buddy system.
         * And all highmem pages will be managed by the buddy system.
         */
        zone->managed_pages = is_highmem_idx(j) ? realsize : freesize;
#ifdef CONFIG_NUMA
        zone->node = nid;
        zone->min_unmapped_pages = (freesize*sysctl_min_unmapped_ratio) / 100;
        zone->min_slab_pages = (freesize * sysctl_min_slab_ratio) / 100;
#endif
        zone->name = zone_names[j];
        spin_lock_init(&zone->lock);
        spin_lock_init(&zone->lru_lock);
        zone_seqlock_init(zone);
        zone->zone_pgdat = pgdat;
        zone_pcp_init(zone);

        /* For bootup, initialized properly in watermark setup */
        mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);

        lruvec_init(&zone->lruvec);
        if (!size)
            continue;

        set_pageblock_order();
        setup_usemap(pgdat, zone, zone_start_pfn, size);
        ret = init_currently_empty_zone(zone, zone_start_pfn,
                        size, MEMMAP_EARLY);
        BUG_ON(ret);
        memmap_init(size, nid, j, zone_start_pfn);-----------------initialize the struct page of every page frame in the zone
        zone_start_pfn += size;
    }
}

 

The output of the above functions is as follows:

On node 0 totalpages: 262144-----------------------------------262144*4KB=1GB
free_area_init_node: node 0, pgdat c0782480, node_mem_map eeffa000
  Normal zone: 1520 pages used for memmap----------------------struct page is 32 bytes: 194560*32B/4KB=1520 pages
  Normal zone: 0 pages reserved
  Normal zone: 194560 pages, LIFO batch:31---------------------194560*4KB=760MB
  HighMem zone: 67584 pages, LIFO batch:15---------------------67584*4KB=264MB
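
The "1520 pages used for memmap" line can be verified by hand. Here is a sketch of the calc_memmap_size arithmetic with the numbers above (struct page is 32 bytes in this configuration):

#include <stdio.h>

/* 194560 Normal-zone pages, each needing a 32-byte struct page,
 * consume 194560*32B/4KB = 1520 pages of memmap. */
int main(void)
{
    unsigned long pages = 194560, sizeof_struct_page = 32, page_size = 4096;
    unsigned long memmap_pages =
        (pages * sizeof_struct_page + page_size - 1) / page_size;

    printf("%lu pages used for memmap\n", memmap_pages);    /* 1520 */
    return 0;
}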

Therefore ZONE_NORMAL corresponds to physical addresses 0x60000000 - 0x8f800000, and ZONE_HIGHMEM corresponds to 0x8f800000 - 0xa0000000.

(To be expanded: build_all_zonelists_init, ZONE_PADDING())

watermark

The watermarks of each zone are computed during system initialization: WMARK_MIN, WMARK_LOW and WMARK_HIGH. These parameters are used when kswapd reclaims pages.

enum zone_watermarks {
    WMARK_MIN,
    WMARK_LOW,
    WMARK_HIGH,
    NR_WMARK
};

#define min_wmark_pages(z)  (z->watermark[WMARK_MIN])
#define low_wmark_pages(z)  (z->watermark[WMARK_LOW])
#define high_wmark_pages(z) (z->watermark[WMARK_HIGH])

struct zone {
    /* Read-mostly fields */

    /* zone watermarks, access with *_wmark_pages(zone) macros */
    unsigned long watermark[NR_WMARK];
    ...
};

min_free_kbytes, an important parameter for the watermark calculation, is computed in init_per_zone_wmark_min:

 

mm/page_alloc.c:
module_init(init_per_zone_wmark_min)-------------------------------computes min_free_kbytes=3489
    setup_per_zone_wmarks-->
        __setup_per_zone_wmarks------------------------------------computes WMARK_MIN/WMARK_LOW/WMARK_HIGH

/*
 * Initialise min_free_kbytes.
 *
 * For small machines we want it small (128k min). For large machines
 * we want it large (64MB max). But it is not linear, because network
 * bandwidth does not increase linearly with machine size. We use
 *
 *    min_free_kbytes = 4 * sqrt(lowmem_kbytes), for better accuracy:
 *    min_free_kbytes = sqrt(lowmem_kbytes * 16)
 *
 * which yields
 *
 * 16MB:    512k
 * 32MB:    724k
 * 64MB:    1024k
 * 128MB:   1448k
 * 256MB:   2048k
 * 512MB:   2896k
 * 1024MB:  4096k
 * 2048MB:  5792k
 * 4096MB:  8192k
 * 8192MB:  11584k
 * 16384MB: 16384k
 */
int __meminit init_per_zone_wmark_min(void)
{
    unsigned long lowmem_kbytes;
    int new_min_free_kbytes;

    lowmem_kbytes = nr_free_buffer_pages() * (PAGE_SIZE >> 10);----lowmem_kbytes=761100
    new_min_free_kbytes = int_sqrt(lowmem_kbytes * 16);------------sqrt(761100*16)=3489
    if (new_min_free_kbytes > user_min_free_kbytes) {--------------user_min_free_kbytes=-1, so min_free_kbytes=3489, which lies within [128KB, 64MB]
        min_free_kbytes = new_min_free_kbytes;
        if (min_free_kbytes < 128)
            min_free_kbytes = 128;
        if (min_free_kbytes > 65536)
            min_free_kbytes = 65536;
    } else {
        pr_warn("min_free_kbytes is not updated to %d because user defined value %d is preferred\n",
                new_min_free_kbytes, user_min_free_kbytes);
    }
    setup_per_zone_wmarks();
    refresh_zone_stat_thresholds();
    setup_per_zone_lowmem_reserve();
    setup_per_zone_inactive_ratio();
    return 0;
}

 

The watermark calculation itself is done by __setup_per_zone_wmarks:

static void __setup_per_zone_wmarks(void)
{
    unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);----min_free_kbytes=3489, so pages_min=3489/4=872
    unsigned long lowmem_pages = 0;
    struct zone *zone;
    unsigned long flags;

    /* Calculate total number of !ZONE_HIGHMEM pages */
    for_each_zone(zone) {
        if (!is_highmem(zone))
            lowmem_pages += zone->managed_pages;-----------------------only lowmem is counted, so lowmem_pages=190273
    }

    for_each_zone(zone) {
        u64 tmp;

        spin_lock_irqsave(&zone->lock, flags);
        tmp = (u64)pages_min * zone->managed_pages;
        do_div(tmp, lowmem_pages);-------------------------------------Normal: tmp=872*190273/190273=872; HighMem: tmp=872*67584/190273=309
        if (is_highmem(zone)) {
            /*
             * __GFP_HIGH and PF_MEMALLOC allocations usually don't
             * need highmem pages, so cap pages_min to a small
             * value here.
             *
             * The WMARK_HIGH-WMARK_LOW and (WMARK_LOW-WMARK_MIN)
             * deltas controls asynch page reclaim, and so should
             * not be capped for highmem.
             */
            unsigned long min_pages;

            min_pages = zone->managed_pages / 1024;
            min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
            zone->watermark[WMARK_MIN] = min_pages;--------------------HighMem: min_pages=67584/1024=66
        } else {
            /*
             * If it's a lowmem zone, reserve a number of pages
             * proportionate to the zone's size.
             */
            zone->watermark[WMARK_MIN] = tmp;--------------------------Normal: 872
        }

        zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);----Normal: 872+872/4=1090; HighMem: 66+309/4=143
        zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);----Normal: 872+872/2=1308; HighMem: 66+309/2=220

        __mod_zone_page_state(zone, NR_ALLOC_BATCH,
            high_wmark_pages(zone) - low_wmark_pages(zone) -
            atomic_long_read(&zone->vm_stat[NR_ALLOC_BATCH]));

        setup_zone_migrate_reserve(zone);
        spin_unlock_irqrestore(&zone->lock, flags);
    }

    /* update totalreserve_pages */
    calculate_totalreserve_pages();
}
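
The whole chain of arithmetic can be replayed in userspace. Here is a runnable sketch with this article's numbers (compile with -lm; plain sqrt stands in for the kernel's int_sqrt):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* lowmem_kbytes comes from nr_free_buffer_pages() above */
    unsigned long lowmem_kbytes = 761100;
    unsigned long min_free_kbytes = (unsigned long)sqrt(lowmem_kbytes * 16.0); /* 3489 */
    unsigned long pages_min = min_free_kbytes >> 2;    /* PAGE_SHIFT-10 = 2 -> 872 */
    unsigned long lowmem_pages = 190273;               /* Normal zone managed_pages */
    unsigned long highmem_pages = 67584;

    unsigned long normal_tmp = pages_min;                              /* 872*190273/190273 */
    unsigned long high_tmp = pages_min * highmem_pages / lowmem_pages; /* 309 */
    unsigned long high_min = highmem_pages / 1024;                     /* clamped WMARK_MIN = 66 */

    printf("min_free_kbytes=%lu pages_min=%lu\n", min_free_kbytes, pages_min);
    printf("Normal : min=%lu low=%lu high=%lu\n",
           normal_tmp, normal_tmp + normal_tmp / 4, normal_tmp + normal_tmp / 2);
    printf("HighMem: min=%lu low=%lu high=%lu\n",
           high_min, high_min + high_tmp / 4, high_min + high_tmp / 2);
    return 0;
}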

 

The printed information for each zone is as follows:

Normal  min=872 low=1090 high=1308 zone_start_pfn=393216 managed_pages=190273 spanned_pages=194560 present_pages=194560----size: 0x2f800*4KB=760MB, usable: 190273 pages
HighMem min=66 low=143 high=220 zone_start_pfn=587776 managed_pages=67584 spanned_pages=67584 present_pages=67584----------size: 0x10800*4KB=264MB, usable: 67584 pages
Movable min=32 low=32 high=32 zone_start_pfn=0 managed_pages=0 spanned_pages=0 present_pages=0

 

 

5. Physical memory initialization

Physical memory pages need to be added to the buddy system, a dynamic storage management scheme: when a user requests memory, a block of suitable size is allocated; when it is released, the block is reclaimed.

The buddy system manages free pages based on two properties: the block size, 2^order pages, and the page's migration type (a sketch of the order axis follows the structure below).

struct zone {
    ...
#ifndef CONFIG_SPARSEMEM
    /*
     * Flags for a pageblock_nr_pages block. See pageblock-flags.h.
     * In SPARSEMEM, this map is stored in struct mem_section
     */
    unsigned long *pageblock_flags;-----------------------------MIGRATE_TYPE of each pageblock in the zone
#endif /* CONFIG_SPARSEMEM */
    ...
    /* free areas of different sizes */
    struct free_area free_area[MAX_ORDER];----------------------free page-block lists, one per order
    ...
};
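
The order axis of free_area is easy to picture: entry N holds blocks of 2^N contiguous pages. A runnable sketch assuming the default MAX_ORDER of 11 and 4KB pages:

#include <stdio.h>

/* free_area[order] holds free blocks of 2^order pages; the largest
 * buddy block is therefore 2^(MAX_ORDER-1) = 1024 pages = 4MB. */
#define MAX_ORDER 11

int main(void)
{
    for (int order = 0; order < MAX_ORDER; order++)
        printf("free_area[%2d]: blocks of %4d pages = %6lu KB\n",
               order, 1 << order, (1UL << order) * 4);
    return 0;
}

In practice allocations name the order directly; for example, alloc_pages(GFP_KERNEL, 2) asks the buddy system for a block of 4 contiguous pages.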

 

MIGRATE_TYPES

enum {
    MIGRATE_UNMOVABLE,--------------------the page frame's contents cannot be moved; its location in memory is fixed. Most pages allocated by the core kernel fall into this category.
    MIGRATE_RECLAIMABLE,------------------the contents can be reclaimed but not moved directly, because the page can be rebuilt from some backing source; data mapped from files, for example, belongs here. kswapd reclaims these pages periodically according to certain rules.
    MIGRATE_MOVABLE,----------------------the contents can be moved; pages belonging to user-space applications are of this type. They are mapped through page tables, so it suffices to update the page table entries and copy the data to the new location. Note that one page may be shared by multiple processes and thus correspond to multiple page table entries.
    MIGRATE_PCPTYPES, /* the number of types on the pcp lists */----the number of migration-type lists in the per-CPU page-frame cache.
    MIGRATE_RESERVE = MIGRATE_PCPTYPES,
#ifdef CONFIG_CMA
    /*
     * MIGRATE_CMA migration type is designed to mimic the way
     * ZONE_MOVABLE works. Only movable pages can be allocated
     * from MIGRATE_CMA pageblocks and page allocator never
     * implicitly change migration type of MIGRATE_CMA pageblock.
     *
     * The way to use it is to change migratetype of a range of
     * pageblocks to MIGRATE_CMA which can be done by
     * __free_pageblock_cma() function. What is important though
     * is that a range of pageblocks must be aligned to
     * MAX_ORDER_NR_PAGES should biggest page be bigger then
     * a single pageblock.
     */
    MIGRATE_CMA,--------------------------memory reserved for drivers; while a driver is not using it, the buddy system may hand it to user processes for anonymous memory or page cache. When the driver needs it, the occupying pages are reclaimed or migrated, freeing the reserved memory for the driver's use.
#endif
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE, /* can't allocate from here */----------------page frames cannot be allocated from this list; it is used when moving physical pages between NUMA nodes, bringing pages closest to the CPU that uses them most.
#endif
    MIGRATE_TYPES
};

 

 

pageblock

#ifdef CONFIG_HUGETLB_PAGE
...
#else /* CONFIG_HUGETLB_PAGE */
/* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
#define pageblock_order     (MAX_ORDER-1)
#endif /* CONFIG_HUGETLB_PAGE */

#define pageblock_nr_pages  (1UL << pageblock_order)
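
With the default MAX_ORDER of 11 this gives pageblock_order = 10, so migrate types are tracked per block of 1024 pages (4MB with 4KB pages). A one-line check:

#include <stdio.h>

/* pageblock_order = MAX_ORDER-1 = 10 when huge pages are not used. */
#define MAX_ORDER           11
#define pageblock_order     (MAX_ORDER - 1)
#define pageblock_nr_pages  (1UL << pageblock_order)

int main(void)
{
    printf("pageblock: %lu pages = %lu KB\n",
           pageblock_nr_pages, pageblock_nr_pages * 4);
    return 0;
}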

 

(To be expanded: CONFIG_NO_BOOTMEM, usemap_size, memmap_init)

 

Contact: arnoldlu@qq.com

Copyright notice: this article was written by [sky-heaven]; please include a link to the original when reposting. Thanks.
