An analysis of Docker's core technology: cgroups

Binary community 2020-11-07 20:55:52


  • 1. A brief introduction to cgroups
     - 1.1 Function and positioning
     - 1.2 Introduction to related concepts
     - 1.3 Subsystems
     - 1.4 cgroups file system
  • 2. cgroups subsystems
     - 2.1 cpu subsystem
     - 2.2 cpuacct subsystem
     - 2.3 cpuset subsystem
     - 2.4 memory subsystem
     - 2.5 blkio subsystem (block I/O)
  • 3. Installing and using cgroups
     - 3.1 Installing cgroups
     - 3.2 Adding a process to a resource-limited group
  • 4. Summary

1. A brief introduction to cgroups

1.1 Function and positioning

Cgroups, short for Control Groups, is a mechanism provided by the Linux kernel for isolating physical resources. It can limit, isolate, and account for the resources used by a process or a group of processes. For example, a cgroup can restrict specific processes to a certain number of CPU cores and a fixed amount of memory; a process that exceeds its limits can be throttled, suspended, or killed. Cgroups were introduced in the 2.6 kernel series in an effort led by Google, and they are the technical foundation of resource virtualization in the Linux kernel. The resource isolation used by LXC (Linux Containers) and Docker containers is built on cgroups.

1.2 Introduction to related concepts

  • Task: in cgroups, a task is a process.
  • Control group: cgroups implement resource control in units of control groups. A control group specifies the quota limits for a resource. A process can be added to a control group and can also be migrated to another one.
  • Hierarchy: control groups are organized hierarchically in a tree structure. A child control group inherits the attributes of its parent (resource quotas, limits, and so on).
  • Subsystem: a subsystem is a resource controller. For example, the memory subsystem controls how much memory processes use. A subsystem must be attached to a hierarchy, after which all control groups in that hierarchy are controlled by it.

Relationships between these concepts:

  • A subsystem can be attached to multiple hierarchies only if those hierarchies have no other subsystems attached. For example, this is allowed when the cpu subsystem is the only subsystem attached to each of two hierarchies.
  • Multiple subsystems can be attached to a single hierarchy.
  • A task can be a member of more than one cgroup, but those cgroups must be in different hierarchies.
  • A child process automatically becomes a member of its parent's cgroup. It can then be moved to a different cgroup.
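The rule that a task belongs to exactly one cgroup per hierarchy can be observed in /proc/<pid>/cgroup, where each line has the form hierarchy-ID:subsystems:path. The following is a minimal sketch of a parser for that file. The sample text and the function name parse_proc_cgroup are assumptions made for illustration, not something taken from this article.

```python
from collections import namedtuple

# One line of /proc/<pid>/cgroup: "hierarchy-ID:subsystems:path"
Membership = namedtuple("Membership", ["hierarchy_id", "subsystems", "path"])


def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into Membership records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice, because the cgroup path may itself contain ':'.
        hierarchy_id, subsystems, path = line.split(":", 2)
        names = tuple(s for s in subsystems.split(",") if s)
        records.append(Membership(int(hierarchy_id), names, path))
    return records


# Hypothetical sample: one membership per hierarchy.
sample = "4:memory:/test_mem\n3:cpu,cpuacct:/test_cpu\n"
for m in parse_proc_cgroup(sample):
    print(m.hierarchy_id, ",".join(m.subsystems), m.path)
```

Each hierarchy ID appears once, which matches the rule that a task cannot be in two cgroups of the same hierarchy.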

In the cgroup diagram, two tasks form a task group that uses a cgroup with the CPU and memory subsystems attached, which control the isolation of its CPU and memory resources.

1.3 Subsystems

  • cpu: limits the CPU usage of processes.
  • cpuacct: reports the CPU usage of the processes in a cgroup.
  • cpuset: assigns dedicated CPU nodes or memory nodes to the processes in a cgroup.
  • memory: limits the amount of memory that processes use.
  • blkio: limits the block device I/O of processes.
  • devices: controls which devices processes can access.
  • net_cls: tags the network packets of the processes in a cgroup so that the tc (traffic control) module can then control them.
  • net_prio: sets the priority of the network traffic of processes.
  • hugetlb: limits HugeTLB usage.
  • freezer: suspends or resumes the processes in a cgroup.
  • ns: lets the processes in a cgroup use different namespaces.

1.4 cgroups file system

Linux exposes the functions and configuration of cgroups to users as files, thanks to the Linux virtual file system (VFS). The VFS hides the details of each concrete file system and gives user space a unified file system API. The layer that connects cgroups to the VFS is called the cgroups file system. For example, the following command mounts the cpu, cpuset, and memory subsystems on the /cgroups/cpu_mem directory:

mount -t cgroup -o cpu,cpuset,memory cpu_mem /cgroups/cpu_mem

For more on the virtual file system mechanism, see "Talking about the Linux virtual file system mechanism".

2. cgroups subsystems

This section briefly introduces the concepts and usage of several common subsystems: cpu, cpuacct, cpuset, memory, and blkio.

2.1 cpu subsystem

The cpu subsystem limits access to the CPU. Each parameter is a separate pseudo-file in the cgroups virtual file system. The parameters are as follows:

  • cpu.shares: the relative share of CPU time allocated to the cgroup. For example, if cgroup A is set to 1 and cgroup B is set to 2, the tasks in B get twice as much CPU time as the tasks in A.
  • cpu.cfs_period_us: the length, in microseconds, of the period over which the Completely Fair Scheduler (CFS) reallocates the time quota.
  • cpu.cfs_quota_us: the total time, in microseconds, that the tasks in the cgroup may run during one CFS period.
  • cpu.stat: statistics

    • nr_periods: the number of periods that have elapsed
    • nr_throttled: the number of times the tasks have been throttled
    • throttled_time: the total time, in nanoseconds, for which the tasks have been throttled
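The arithmetic behind cpu.shares and cpu.stat can be sketched as follows. This is an illustrative sketch only: the function names and the sample cpu.stat text are assumptions, and the share calculation describes the split when all cgroups compete for the CPU at the same time.

```python
def share_fractions(shares):
    """Map cgroup name -> fraction of CPU time when all cgroups are busy."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


def parse_cpu_stat(text):
    """Parse the 'key value' lines of cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of elapsed periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# cgroup B has twice the shares of cgroup A, so it gets twice the CPU time.
print(share_fractions({"A": 1, "B": 2}))

# Hypothetical cpu.stat contents.
sample = "nr_periods 100\nnr_throttled 25\nthrottled_time 123456789\n"
print(throttled_ratio(parse_cpu_stat(sample)))  # prints 0.25
```

A throttled ratio close to 1 suggests the quota is too small for the workload.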

2.2 cpuacct subsystem

The cpuacct subsystem generates reports on the CPU resources used by the tasks in a cgroup. It does not limit resources.

  • cpuacct.usage: the total CPU time (in nanoseconds) used by all tasks in the cgroup.
  • cpuacct.stat: the CPU time used by all tasks in the cgroup, split into user and system time.
  • cpuacct.usage_percpu: the CPU time used by all tasks in the cgroup on each CPU.

How can CPU utilization be calculated with cpuacct? The overall CPU utilization can be derived from cpuacct.usage as follows:

# 1. Get the current time (nanoseconds)
tstart=$(date +%s%N)
# 2. Read cpuacct.usage
cstart=$(cat /xxx/cpuacct.usage)
# 3. Wait 5 s between the two samples
sleep 5
# 4. Sample again
tstop=$(date +%s%N)
cstop=$(cat /xxx/cpuacct.usage)
# 5. Calculate utilization as a percentage (can exceed 100 on multi-core machines)
awk "BEGIN { print ($cstop - $cstart) / ($tstop - $tstart) * 100 }"
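The same calculation can be sketched in Python, which makes the units explicit: both the cpuacct.usage samples and the timestamps are in nanoseconds. The function name and the sample values are assumptions made for illustration.

```python
def cpu_utilization_percent(usage_start_ns, usage_stop_ns,
                            time_start_ns, time_stop_ns):
    """CPU time consumed between two samples, as a percentage of wall time."""
    elapsed = time_stop_ns - time_start_ns
    if elapsed <= 0:
        raise ValueError("the second sample must be taken after the first")
    return (usage_stop_ns - usage_start_ns) / elapsed * 100.0


# Hypothetical samples: 2.5 s of CPU time consumed over a 5 s interval.
print(cpu_utilization_percent(0, 2_500_000_000, 0, 5_000_000_000))  # prints 50.0
```

Because cpuacct.usage sums the time over all CPUs, a cgroup that keeps two cores busy would report about 200.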

2.3 cpuset subsystem

The cpuset subsystem assigns dedicated CPU nodes and memory nodes, for example to bind processes to specific CPUs or memory nodes. The parameters are as follows:

  • cpuset.cpus: the CPU nodes the cgroup may use.
  • cpuset.mems: the memory nodes the cgroup may use.
  • cpuset.memory_migrate: whether memory pages should be migrated when the memory nodes change.
  • cpuset.cpu_exclusive: whether the tasks in this cgroup have exclusive use of its CPUs.
  • cpuset.mem_exclusive: whether the tasks in this cgroup have exclusive use of its memory nodes.
  • cpuset.mem_hardwall: restricts the nodes used for kernel memory allocation (cpuset.mems applies to user-space allocation).
  • cpuset.memory_pressure: a measure of the paging pressure.
  • cpuset.memory_spread_page: spreads the page cache across the allowed nodes instead of placing it on the current memory node.
  • cpuset.memory_spread_slab: spreads slab objects (inode and dentry) across the allowed nodes.
  • cpuset.sched_load_balance: enables load balancing across the CPUs in the cpuset.
  • cpuset.sched_relax_domain_level: the search range used when migrating tasks.
  • cpuset.memory_pressure_enabled: whether memory_pressure is calculated.
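Values such as cpuset.cpus and cpuset.mems use a compact list format of comma-separated numbers and ranges, for example 0-2,7. A minimal sketch of expanding that format follows; the function name parse_cpu_list is an assumption made for illustration.

```python
def parse_cpu_list(text):
    """Expand a cpuset list such as '0-2,7' into a sorted list of integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are assigned yet
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0-2,7"))  # prints [0, 1, 2, 7]
```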

2.4 memory subsystem

The memory subsystem covers memory limits and related operations. The main parameters are as follows:

  • memory.usage_in_bytes # Current memory usage
  • memory.memsw.usage_in_bytes # Current memory plus swap usage
  • memory.limit_in_bytes # Set or view the memory usage limit
  • memory.memsw.limit_in_bytes # Set or view the memory plus swap usage limit
  • memory.failcnt # Number of times memory usage hit the limit
  • memory.memsw.failcnt # Number of times memory plus swap usage hit the limit
  • memory.max_usage_in_bytes # Maximum memory usage recorded
  • memory.memsw.max_usage_in_bytes # Maximum memory plus swap usage recorded
  • memory.soft_limit_in_bytes # Set or view the soft memory limit
  • memory.stat # Statistics
  • memory.use_hierarchy # Set or view whether hierarchical accounting is enabled
  • memory.force_empty # Trigger forced page reclaim
  • memory.pressure_level # Set up memory pressure notifications
  • memory.swappiness # Set or view the vmscan swappiness parameter
  • memory.move_charge_at_immigrate # Set or view whether charges move with a task when it migrates
  • memory.oom_control # Set or view out-of-memory controls (OOM killer)
  • memory.numa_stat # Memory usage per NUMA node
  • memory.kmem.limit_in_bytes # Set or view the hard limit for kernel memory
  • memory.kmem.usage_in_bytes # Current kernel memory allocation
  • memory.kmem.failcnt # Number of times kernel memory allocation hit the limit
  • memory.kmem.max_usage_in_bytes # Maximum kernel memory usage recorded
  • memory.kmem.tcp.limit_in_bytes # Set the hard limit for TCP buffer memory
  • memory.kmem.tcp.usage_in_bytes # Current TCP buffer memory usage
  • memory.kmem.tcp.failcnt # Number of times TCP buffer memory allocation hit the limit
  • memory.kmem.tcp.max_usage_in_bytes # Maximum TCP buffer memory usage recorded
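Since each parameter is a plain file holding a number, a small report can be assembled by reading a few of them. The sketch below is illustrative only: the helper names are assumptions, and the demo runs against a temporary directory that merely imitates a memory cgroup so that it works without root privileges.

```python
import os
import tempfile


def read_int(cgroup_dir, name):
    """Read a single integer from a cgroup pseudo-file."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return int(f.read().strip())


def memory_report(cgroup_dir):
    """Summarize usage against the limit for one memory cgroup directory."""
    usage = read_int(cgroup_dir, "memory.usage_in_bytes")
    limit = read_int(cgroup_dir, "memory.limit_in_bytes")
    return {
        "usage": usage,
        "limit": limit,
        "used_fraction": usage / limit,
        "failcnt": read_int(cgroup_dir, "memory.failcnt"),
    }


def make_fake_cgroup(values):
    """Create a temporary directory that imitates a memory cgroup (demo only)."""
    directory = tempfile.mkdtemp()
    for name, value in values.items():
        with open(os.path.join(directory, name), "w") as f:
            f.write(f"{value}\n")
    return directory


demo = make_fake_cgroup({
    "memory.usage_in_bytes": 52428800,    # 50 MB in use
    "memory.limit_in_bytes": 104857600,   # 100 MB limit
    "memory.failcnt": 3,
})
print(memory_report(demo))
```

On a real system the directory would be a memory control group such as one created under /sys/fs/cgroup/memory.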

2.5 blkio subsystem (block I/O)

The blkio subsystem controls access to device I/O. There are two ways to limit I/O: by weight and by upper limit. With weights, each application is given a weight value and receives I/O resources in proportion to it. With upper limits, the read and write rates of an application are capped at a maximum value.

Allocating I/O resources by weight:

  • blkio.weight: an integer from 100 to 1000 that serves as the relative weight, used as the default allocation ratio for all devices.
  • blkio.weight_device: the weight for a specific device, written as device_types:node_numbers weight. The field before the space identifies the device by its major and minor device numbers. The weight has the same meaning as in blkio.weight and overrides the default allocation ratio for that device.

Limiting read and write rates by upper limit:

  • blkio.throttle.read_bps_device: the upper limit on the amount of data read from a block device per second, in the format device_types:node_numbers bytes_per_second.
  • blkio.throttle.write_bps_device: the upper limit on the amount of data written to a block device per second, in the format device_types:node_numbers bytes_per_second.
  • blkio.throttle.read_iops_device: the maximum number of read operations per second, in the format device_types:node_numbers operations_per_second.
  • blkio.throttle.write_iops_device: the maximum number of write operations per second, in the format device_types:node_numbers operations_per_second.
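The device_types:node_numbers prefix in these formats is the major:minor device number pair of the block device. The following sketch shows how such a rule line might be assembled. The function names are assumptions made for illustration, and 8:0 is used only as an example pair (it conventionally identifies the first SCSI/SATA disk).

```python
import os


def device_numbers(device_path):
    """Return the (major, minor) numbers of a device node such as /dev/sda."""
    rdev = os.stat(device_path).st_rdev
    return os.major(rdev), os.minor(rdev)


def throttle_rule(major, minor, rate):
    """Format one rule line for a blkio.throttle.*_device file."""
    if major < 0 or minor < 0 or rate < 0:
        raise ValueError("device numbers and rate must be non-negative")
    return f"{major}:{minor} {rate}"


# Limit reads on device 8:0 to 1 MB per second.
print(throttle_rule(8, 0, 1024 * 1024))  # prints 8:0 1048576
```

The resulting line would be written to a file such as blkio.throttle.read_bps_device in the control group directory.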

Statistics per operation type (read, write, sync, or async), as seen by the throttling policy:

  • blkio.throttle.io_serviced: the number of I/O operations the cgroup has performed on each device, broken down by operation type, in the format device_types:node_numbers operation number_of_operations. This file reports statistics and does not set a limit.
  • blkio.throttle.io_service_bytes: the number of bytes the cgroup has transferred to or from each device, broken down by operation type, in the format device_types:node_numbers operation bytes. This file also reports statistics only.

3. Installing and using cgroups

The test environment is Ubuntu 18.10.

3.1 Installing cgroups

  1. Install cgroups
sudo apt install cgroup-bin

After installation, the directory /sys/fs/cgroup is available.

  2. Create a cpu resource control group and limit CPU utilization to at most 50%
$ cd /sys/fs/cgroup/cpu
$ sudo mkdir test_cpu
$ echo '10000' | sudo tee test_cpu/cpu.cfs_period_us
$ echo '5000' | sudo tee test_cpu/cpu.cfs_quota_us
  3. Create a memory resource control group and limit memory usage to at most 100 MB
$ cd /sys/fs/cgroup/memory
$ sudo mkdir test_mem
$ echo '104857600' | sudo tee test_mem/memory.limit_in_bytes
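The numbers written above follow from simple arithmetic: the quota is the chosen percentage of the period, and the memory limit is the size converted to bytes. A small sketch of that arithmetic follows; the function names are assumptions made for illustration.

```python
def cfs_quota_us(cpu_percent, period_us):
    """Quota, in microseconds, that caps usage at cpu_percent of one CPU."""
    return period_us * cpu_percent // 100


def megabytes_to_bytes(megabytes):
    """Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


print(cfs_quota_us(50, 10000))     # prints 5000
print(megabytes_to_bytes(100))     # prints 104857600
```

A percentage above 100 yields a quota larger than the period, which corresponds to more than one full CPU.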

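3.2 Adding a process to a resource-limited group

The test code is as follows:

#include <unistd.h>
#include <stdio.h>
#include <cstring>
#include <thread>

// Spin in a busy loop so that the cpu limit becomes visible.
void test_cpu() {
    printf("thread: test_cpu start\n");
    volatile unsigned int total = 0;  // volatile keeps the loop from being optimized away
    while (1) {
        total = total + 1;
    }
}

// Allocate and touch memory in 10 MB steps so that the memory limit becomes visible.
void test_mem() {
    printf("thread: test_mem start\n");
    int step = 20;
    int size = 10 * 1024 * 1024; // 10 MB
    for (int i = 0; i < step; ++i) {
        char* tmp = new char[size];  // intentionally never freed
        memset(tmp, i, size);
        sleep(1);
    }
    printf("thread: test_mem done\n");
}

int main(int argc, char** argv) {
    std::thread t1(test_cpu);
    std::thread t2(test_mem);
    t1.join();  // test_cpu never returns, so the program runs until it is killed
    t2.join();
    return 0;
}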

1. Compile the program (assuming the source is saved as test.cpp)

g++ -o test --std=c++11 test.cpp -lpthread

2. Observe the running state before applying the limit

3. Test the cpu limit

cgexec -g cpu:test_cpu ./test

With the limit applied, CPU usage drops by half. Besides launching the process under the limit with cgexec, you can also write the process ID to the cgroup.procs file of the control group to achieve the same restriction.
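Writing a process ID to cgroup.procs can be sketched as follows. This is an illustrative sketch: the function name add_to_cgroup is an assumption, and the demo writes into a temporary directory so that it runs without root privileges. On a real system the directory would be a control group such as /sys/fs/cgroup/cpu/test_cpu and the write would need sufficient permissions.

```python
import os
import tempfile


def add_to_cgroup(cgroup_dir, pid):
    """Move a process into a control group by writing its PID to cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))


# Demo against a temporary directory standing in for a control group.
demo_dir = tempfile.mkdtemp()
add_to_cgroup(demo_dir, os.getpid())
with open(os.path.join(demo_dir, "cgroup.procs")) as f:
    print("wrote PID", f.read())
```

One PID is written per call, so moving several processes means calling the function once for each.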

4. Summary

This article briefly introduced the concepts and use of cgroups, which limit and isolate resources. In production environments, cgroups are widely used by container technologies, including Docker and Rocket. This kind of resource limitation and isolation makes it possible to run different services side by side on the same machine, which greatly improves the utilization of machine resources. It is also one of the key technologies of cloud computing.

This article was written by [Binary community]. Please include a link to the original when reposting. Thank you.
