System load average: a detailed explanation of system load, and the user-mode and kernel-mode factors that influence CPU usage


The average load of the system

How to understand average load


Per unit of time, the average number of processes in the runnable and uninterruptible states — that is, the average number of active processes.
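The three numbers reported by `uptime` come directly from the kernel via `/proc/loadavg`. A minimal sketch of reading them, assuming a Linux system:

```shell
#!/bin/sh
# Read the 1-, 5- and 15-minute load averages straight from the kernel.
# /proc/loadavg also reports running/total tasks and the last PID,
# which are captured into "rest" and ignored here.
read one five fifteen rest < /proc/loadavg
cores=$(nproc)   # number of online CPU cores, for comparison
echo "load1=${one} load5=${five} load15=${fifteen} cores=${cores}"
```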

What counts as a reasonable load average


Cores   Load average   Meaning
4       2              50% of the CPUs are idle, see Figure 1
2       2              The CPUs are exactly fully occupied, see Figure 2
1       2              At least half of the processes cannot get CPU time, see Figure 3

It is like having four lanes with two cars on them: traffic flows freely and there is no congestion.

Now imagine two lanes with two cars: the road is exactly at capacity.

Now two lanes with four cars: the road is full, and the cars behind are blocked.
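The road analogy can be turned into a quick check. A minimal sketch (the verdict strings and the use of awk for float comparison are my own choices, since plain sh cannot compare decimals):

```shell
#!/bin/sh
# Compare the 1-minute load average against the number of cores and print
# a rough verdict matching the three road scenarios above.
load1=$(awk '{print $1}' /proc/loadavg)
cores=$(nproc)
awk -v l="$load1" -v c="$cores" 'BEGIN {
    if (l < c)       print "smooth: load " l " is below " c " cores"
    else if (l == c) print "full: load exactly matches the core count"
    else             print "congested: processes are queueing for CPU"
}'
```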

Which value to focus on

1. If the 1-minute, 5-minute and 15-minute values are all roughly the same, the system load is "stable".

2. If the recent 1-minute value is well below the 15-minute value, the load average is decreasing, but you should still find out why the load was high during the past 15 minutes.

3. If the 15-minute value is far below the recent 1-minute value, the load is increasing; it may be a temporary spike or it may keep rising.
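The three rules above can be scripted. A sketch, with an arbitrary 1.5x factor standing in for "significant difference" (the threshold is my own illustrative choice):

```shell
#!/bin/sh
# Judge the load trend by comparing the 1-minute and 15-minute averages.
# awk handles the float comparison that plain sh cannot.
read l1 l5 l15 rest < /proc/loadavg
awk -v a="$l1" -v b="$l15" 'BEGIN {
    if (a > b * 1.5)      print "rising: 1min (" a ") well above 15min (" b ")"
    else if (b > a * 1.5) print "falling: find out why the last 15min were busy"
    else                  print "stable"
}'
```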


Average load case analysis


CPU stress test


stress
[root@fpm_nginx ~]# yum -y install stress
# Simulate 4 fully loaded CPU cores
Terminal 1:
[root@fpm_nginx ~]# stress --cpu 4 --timeout 600
Terminal 2:
[root@fpm_nginx ~]# watch -d uptime
10:16:06 up 3 min,  2 users,  load average: 3.90, 1.68, 0.64
mpstat  # view real-time CPU usage
[root@fpm_nginx ~]# yum install -y sysstat
[root@fpm_nginx ~]# mpstat -P ALL 5
# Print one report for all CPUs every 5s, to see whether the high load comes from user space or the kernel
Linux 3.10.0-957.el7.x86_64 (fpm_nginx)  04/22/2020  _x86_64_ (4 CPU)
10:22:48 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:22:53 AM  all    0.00    0.00    0.05    0.00    0.00    0.00    0.00    0.00    0.00   99.95
10:22:53 AM    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:22:53 AM    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:22:53 AM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
10:22:53 AM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
pidstat
[root@fpm_nginx ~]# pidstat -u 5 1
# Output one set of data every 5s; the last set is the average. Use it to find which program is causing the high load
Linux 3.10.0-957.el7.x86_64 (fpm_nginx)  04/22/2020  _x86_64_ (4 CPU)
10:28:25 AM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
10:28:30 AM     0      5819    0.20    0.00    0.00    0.20     2  vmtoolsd
10:28:30 AM     0      6650  100.00    0.20    0.00  100.00     3  stress
10:28:30 AM     0      6651  100.00    0.20    0.00  100.00     2  stress
10:28:30 AM     0      6652   99.60    0.20    0.00   99.80     0  stress
10:28:30 AM     0      6653  100.00    0.00    0.00  100.00     1  stress
10:28:30 AM     0      6656    0.00    0.40    0.00    0.40     0  pidstat
Average:      UID       PID    %usr %system  %guest    %CPU   CPU  Command
Average:        0      5819    0.20    0.00    0.00    0.20     -  vmtoolsd
Average:        0      6650  100.00    0.20    0.00  100.00     -  stress
Average:        0      6651  100.00    0.20    0.00  100.00     -  stress
Average:        0      6652   99.60    0.20    0.00   99.80     -  stress
Average:        0      6653  100.00    0.00    0.00  100.00     -  stress
Average:        0      6656    0.00    0.40    0.00    0.40     -  pidstat
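If sysstat is not installed, plain `ps` (GNU procps, as on the CentOS 7 host above) gives a similar quick answer to "which process is eating the CPU":

```shell
#!/bin/sh
# List the five processes with the highest CPU usage, highest first.
# --sort is a GNU procps extension; %cpu is each process's CPU share.
ps -eo pid,comm,%cpu --sort=-%cpu | head -n 6   # header line + top 5
```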


I/O stress test


# Exit after 600s
[root@fpm_nginx ~]# stress --io 1 --timeout 600
[root@fpm_nginx ~]# mpstat -P ALL 5
Linux 3.10.0-957.el7.x86_64 (fpm_nginx)  04/22/2020  _x86_64_ (4 CPU)
10:42:05 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:42:10 AM  all    0.10    0.00   13.27    0.00    0.00    0.05    0.00    0.00    0.00   86.58
10:42:10 AM    0    0.00    0.00    8.56    0.00    0.00    0.00    0.00    0.00    0.00   91.44
10:42:10 AM    1    0.00    0.00   11.51    0.00    0.00    0.00    0.00    0.00    0.00   88.49
10:42:10 AM    2    0.21    0.00   12.63    0.00    0.00    0.00    0.00    0.00    0.00   87.16
10:42:10 AM    3    0.21    0.00   20.68    0.00    0.00    0.00    0.00    0.00    0.00   79.11
[root@fpm_nginx ~]# pidstat -u 3 4
Linux 3.10.0-957.el7.x86_64 (fpm_nginx)  04/22/2020  _x86_64_ (4 CPU)
10:43:02 AM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
10:43:05 AM     0        13    0.00    0.33    0.00    0.33     1  migration/1
10:43:05 AM     0      6119    0.00    0.33    0.00    0.33     2  tuned
10:43:05 AM     0      6405    0.00    0.33    0.00    0.33     2  sshd
10:43:05 AM     0      6630    0.00    8.28    0.00    8.28     1  kworker/u256:0
10:43:05 AM     0      6673    0.00   37.09    0.00   37.09     2  stress
10:43:05 AM     0      6674    0.00   17.88    0.00   17.88     0  kworker/u256:2
10:43:05 AM     0      6675    0.00    0.33    0.00    0.33     0  kworker/0:1
10:43:05 AM     0      6804    0.00    0.33    0.00    0.33     1  pidstat
Note: heavy disk I/O can drive the load average abnormally high, even above the number of CPU cores, and it raises the share of time spent in kernel mode (%sys).
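To confirm I/O pressure without extra tools, the cumulative iowait share can be read straight from `/proc/stat` (on the kernel's `cpu` line, the fifth counter — awk field `$6` — is iowait jiffies). A minimal sketch:

```shell
#!/bin/sh
# Compute what fraction of all CPU time so far was spent waiting on I/O.
# These are cumulative counters since boot; for a live rate, sample the
# file twice and diff the values (which is what mpstat does internally).
awk '/^cpu / { total = 0
               for (i = 2; i <= NF; i++) total += $i
               printf "iowait: %.1f%% of all CPU time since boot\n", $6 * 100 / total
             }' /proc/stat
```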


A large number of processes


[root@fpm_nginx ~]# stress -c 4 --timeout 600
# Here %usr (user-mode time) is pinned near 100% while %idle drops to 0
[root@fpm_nginx ~]# mpstat -P ALL 5
Linux 3.10.0-957.el7.x86_64 (fpm_nginx)  04/22/2020  _x86_64_ (4 CPU)
11:02:02 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:02:07 AM  all   99.72    0.00    0.11    0.00    0.00    0.17    0.00    0.00    0.00    0.00
11:02:07 AM    0   99.55    0.00    0.23    0.00    0.00    0.23    0.00    0.00    0.00    0.00
11:02:07 AM    1   99.78    0.00    0.00    0.00    0.00    0.22    0.00    0.00    0.00    0.00
11:02:07 AM    2   99.55    0.00    0.23    0.00    0.00    0.23    0.00    0.00    0.00    0.00
11:02:07 AM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
[root@fpm_nginx ~]# pidstat -u 3 4
Linux 3.10.0-957.el7.x86_64 (fpm_nginx)  04/22/2020  _x86_64_ (4 CPU)
11:02:25 AM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
11:02:28 AM     0      5819    0.00    0.32    0.00    1.30    0.32     2  vmtoolsd
11:02:28 AM     0      6119    0.00    0.32    0.00    0.00    0.32     2  tuned
11:02:28 AM     0      7950   97.40    0.00    0.00    0.97   97.40     3  stress
11:02:28 AM     0      7951   98.05    0.32    0.00    0.32   98.38     0  stress
11:02:28 AM     0      7952   98.05    0.32    0.00    0.65   98.38     2  stress
11:02:28 AM     0      7953   98.38    0.32    0.00    0.32   98.70     1  stress
11:02:28 AM     0      8013    0.32    0.32    0.00    0.32    0.65     3  pidstat
# A large number of CPU-bound processes drives up user-mode (%usr) CPU time

Summary:

  1. A high load average may be caused by CPU-intensive processes.
  2. A high load average does not necessarily mean high CPU utilization; the system may instead be busy with I/O.
  3. When the load is high, tools such as mpstat and pidstat can quickly pinpoint the cause so it can be addressed.
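The three-step triage in the summary can be sketched as one script. It assumes sysstat is installed, as in the examples above:

```shell
#!/bin/sh
# Step 1: how high is the load relative to the core count?
uptime
echo "cores: $(nproc)"
# Step 2: which mode is burning CPU: %usr (user code), %sys (kernel), or %iowait (disk)?
mpstat -P ALL 1 1
# Step 3: which process is responsible?
pidstat -u 1 1
```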