How does a computer run a process? This is a central question in how computers work. A program, even once it has been written, is a dead thing. Only a living process can get work done. We met processes in Linux Process Basics. Now let's follow the long journey from program to process.

 

A program

Here is a simple C program. Suppose it has already been compiled into an executable file, vamei.exe.

 

#include <stdio.h>
int inner(int inner1);                                  /*prototype, so main() can call inner() before its definition*/
int glob=0;                                             /*global variable*/
int main(void) {
  int main1=5;                                          /*local variable of main()*/
  int main2;                                            /*local variable of main()*/
  main2 = inner(main1);                                 /* call inner() function */
  printf("From Main: glob: %d \n", glob);
  printf("From Main: main2: %d \n", main2);
  return 0;                                             /*report success to the caller*/
}
int inner(int inner1) {                                 /*inner1 is an argument, also local to inner()*/
  int inner2=10;                                        /*local variable of inner()*/
  printf("From inner: glob: %d \n", glob);
  return(inner1+inner2);
}

(Which language or exact syntax we pick is not the point; most languages can express a program like the one above. Readers following the Python tutorial can write a similar program with Python functions and print. C++, Java, Objective-C and others would work too. I chose C because it is the language that was born for UNIX.)

 

The main() function calls inner(). inner() calls printf() once to produce output. Finally, main() calls printf() twice.

Note the scope of each variable. In short, variables are either global or local. A variable declared outside all functions is a global variable, such as glob, and can be used anywhere in the program. A variable defined inside a function is a local variable and can only be used within that function's scope. For example, while inner() is running we cannot use main()'s main1 variable, and in main() we cannot use inner()'s inner2 variable.

 

Don't worry too much about what this program actually does. The point is how it runs. The following figure shows how the program runs and the scope of each variable:

Operation process

Process space

To understand how the program above runs, we need to know how a process uses memory. When a program file is run as a process, the process is given a space in memory. That space is the process's own little room.

Each process space is divided into the following areas:

Memory space

The Text area stores instructions, which spell out each step of the computation. Global Data stores global variables. The stack stores local variables. The heap stores dynamic variables (the program calls the malloc library function to set aside space for a dynamic variable directly from memory). Text and Global Data are fixed when the process starts and keep the same size for the whole life of the process.
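
To make the four areas a little more concrete, here is a minimal sketch that prints one sample address from each of them. The helper name show_areas() is made up for this illustration, and the addresses it prints depend on the platform and change from run to run, so treat the output as something to look at rather than something to rely on.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int glob = 0;                                /* lives in the global data area */

/* Hypothetical helper: prints one sample address per memory area.
   Returns 0 on success, -1 if malloc fails. */
int show_areas(void) {
  int local = 0;                             /* lives in the current stack frame */
  int *dynamic = malloc(sizeof *dynamic);    /* lives on the heap */
  if (dynamic == NULL) return -1;
  *dynamic = 0;

  /* The cast of a function address is implementation-defined in C,
     but it works on common platforms. */
  printf("text        (show_areas): %p\n", (void *)(uintptr_t)show_areas);
  printf("global data (glob)      : %p\n", (void *)&glob);
  printf("heap        (*dynamic)  : %p\n", (void *)dynamic);
  printf("stack       (local)     : %p\n", (void *)&local);

  /* Three distinct objects must have three distinct addresses. */
  assert((void *)&glob != (void *)dynamic);
  assert((void *)dynamic != (void *)&local);

  free(dynamic);                             /* give the heap space back */
  return 0;
}
```

Calling show_areas() from main() on a typical Linux system usually shows the stack address far away from the other three, which matches the picture of the stack and heap sitting at opposite ends of the process space.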

 

The stack is organized in frames (stack frames). When a program calls a function, for example when main() calls inner(), the stack grows downward by one frame. The frame holds the function's arguments and local variables, along with the function's return address. At this point the computer transfers control from main() to inner(), and inner() becomes the active function. The frame at the lower end of the stack (the most recently added one, usually called the top of the stack), together with the global variables, forms the current environment (context). The active function can pull the variables it needs from this environment. Typical programming languages only let you use this newest frame and not the others (which also fits the "last in, first out" nature of a stack. Some languages do let you reach other parts of the stack, which amounts to using local variables declared in main() while inner() is running; Pascal is one example). When a function goes on to call another function, a new frame grows at the lower end of the stack and control passes to the new function. When the active function returns, its frame is popped off the stack (pop: read and removed from the stack), and control goes back to the return address recorded in the frame (for example, returning from inner() to carry on with the assignment to main2 in main()).

The following figure shows how the stack changes as the program runs. The arrow shows the direction in which the stack grows. Each square is a frame. At the start we have one frame serving main(). With the call to inner(), we add a frame for inner(). When inner() returns, we are back to only main()'s frame, until main() finally returns. Its return address is empty, so the process ends.

stack change

While the process runs, control keeps moving between functions as they are called and return. When a function is called, the caller's frame keeps the state we left behind, and a new frame is opened for the called function. When the called function returns, its frame is popped and the space it occupied is released. The process goes back to the state saved in the caller's frame and continues from the instruction the return address points to. This goes on, with the stack growing and shrinking, until main() returns, the stack is completely empty, and the process ends.
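
The idea that every call gets its own frame can be observed from inside a program. The sketch below nests a few calls, records the address of a local variable in each one, and counts how many different addresses it saw. The names descend() and count_distinct_frames() are invented for this illustration, and the exact addresses (and even the direction they move in) depend on the platform and compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_DEPTH 16

static uintptr_t frame_addr[MAX_DEPTH];    /* one recorded address per nested call */

/* Each call gets its own frame, so each call has its own copy of marker. */
static int descend(int level, int depth) {
  volatile int marker = level;             /* local variable in this call's frame */
  frame_addr[level] = (uintptr_t)(void *)&marker;
  int below = (level + 1 < depth) ? descend(level + 1, depth) : 0;
  /* marker is read after the nested call, so this frame must stay alive
     while the deeper frames exist. */
  return below + 1 + (marker - level);
}

/* Hypothetical helper: nests depth calls and returns how many distinct
   local-variable addresses were recorded. */
int count_distinct_frames(int depth) {
  assert(depth >= 1 && depth <= MAX_DEPTH);
  descend(0, depth);

  int distinct = 0;
  for (int i = 0; i < depth; i++) {
    int seen_before = 0;
    for (int j = 0; j < i; j++) {
      if (frame_addr[j] == frame_addr[i]) seen_before = 1;
    }
    if (!seen_before) distinct++;
    printf("call %d: local variable at %p\n", i, (void *)frame_addr[i]);
  }
  return distinct;
}
```

Since all the nested calls are alive at the same time, each one needs its own copy of marker, so count_distinct_frames(4) reports 4 different addresses. On many systems the printed addresses get smaller with each deeper call, which is the downward growth described above.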

 

When the program uses malloc, the heap grows upward, and the newly grown part is the space malloc has allocated from memory. Space obtained with malloc stays allocated until we release it with the free library function, or until the process ends. A classic mistake is the memory leak: we fail to free heap space we no longer use, so the heap keeps growing and free memory keeps shrinking.
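
As a small sketch of the malloc/free pairing, the function below borrows heap space for a temporary array and returns it before leaving. The name sum_of_squares() is made up for this example; the point is only where malloc and free sit.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: stores the squares 0*0 .. (n-1)*(n-1) on the heap,
   sums them, and releases the block before returning.
   Returns -1 if malloc fails. */
long sum_of_squares(int n) {
  assert(n > 0);
  long *squares = malloc((size_t)n * sizeof *squares);   /* heap space, not stack */
  if (squares == NULL) return -1;

  long total = 0;
  for (int i = 0; i < n; i++) {
    squares[i] = (long)i * i;
    total += squares[i];
  }

  free(squares);   /* without this line, every call would leak n * sizeof(long) bytes */
  return total;
}
```

If the free() call were removed, the program would still print correct results, which is exactly what makes a memory leak easy to miss: nothing goes wrong until the heap has grown too far.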

The stack and the heap both grow and shrink as the process runs. When they grow until they meet, that is, when the blue area (unused area) in the memory space figure is completely used up, there is no memory left to use. The process then hits a stack overflow error and is terminated. On modern computers the kernel usually gives a process a large enough blue area, and as long as memory is cleaned up in time, stack overflow is easy to avoid. Even so, when the memory load is too heavy, a stack overflow can still happen, and then we need to add physical memory.

Stack overflow is arguably the most famous computer error, which is why an IT website (stackoverflow.com) is named after it.

 

In high-level languages, the details of memory management are hidden from the programmer. When programming, we only need to remember the variable scopes from the previous section. But when we want to write complex programs or debug, this knowledge becomes necessary.

 

Process additional information

Besides the information above, each process also carries some additional information, including the PID, PPID and PGID (see Linux Process Basics and Linux Process Relationships), which is used to identify the process, record process relationships, and keep other statistics. This information is not stored in the process's own memory space. Instead, the kernel allocates a variable (a task_struct structure) in the kernel's own space for each process to hold it. The kernel can learn the outline of a process by looking at this additional information in its own space, without going into the space of the process itself (just as we can tell who owns a room from the number on the door, without opening the door). One part of each process's additional information is set aside for storing the signals it has received (the "mail" we talked about in Linux Signal Basics).
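
A process cannot read the kernel's task_struct directly, but it can ask the kernel for some of the values kept there. Here is a minimal sketch using the POSIX calls getpid(), getppid() and getpgrp(). The struct proc_ids and the helper read_proc_ids() are invented for this illustration and are not related to the real layout of task_struct.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record for illustration only; the kernel's real task_struct
   is far larger and is not visible to user programs. */
struct proc_ids {
  pid_t pid;     /* process ID */
  pid_t ppid;    /* parent process ID */
  pid_t pgid;    /* process group ID */
};

/* Asks the kernel for the IDs it keeps about the calling process. */
struct proc_ids read_proc_ids(void) {
  struct proc_ids ids;
  ids.pid  = getpid();
  ids.ppid = getppid();
  ids.pgid = getpgrp();
  assert(ids.pid > 0);                       /* a running process always has a PID */

  printf("PID: %ld  PPID: %ld  PGID: %ld\n",
         (long)ids.pid, (long)ids.ppid, (long)ids.pgid);
  return ids;
}
```

Each of these calls is answered by the kernel from its own records, which is why none of the three values appears anywhere in the program's text, global data, heap or stack until we copy it there.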

 

fork & exec

Now we can look more closely at how fork and exec (see Linux Process Basics) work. When a program calls fork, the memory space described above, including text, global data, heap and stack, is copied to form a new process, and the kernel creates new additional information for the new process (for example a new PID, with the PPID set to the PID of the original process). From then on, the two processes run separately. The new process starts in the same running state as the original (the same variable values, the same instructions...). We can only tell the two apart by their additional information.
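
The copying can be checked with a small experiment: let the child change a global variable and see whether the parent notices. This is only a sketch, and the names counter and counter_after_fork() are made up for it. In practice the code tells the two processes apart by the value fork() returns, which is 0 in the child and the child's PID in the parent.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;            /* global data: copied into the child by fork() */

/* Hypothetical helper: forks, lets the child change its copy of counter,
   and returns the parent's value afterwards. Returns -1 on any failure. */
int counter_after_fork(void) {
  fflush(stdout);                  /* keep buffered output from being copied too */
  pid_t pid = fork();
  if (pid < 0) return -1;          /* fork failed */

  if (pid == 0) {                  /* child: fork() returned 0 */
    counter = 100;                 /* changes only the child's copy */
    _exit(counter == 100 ? 0 : 1);
  }

  assert(pid > 0);                 /* parent: fork() returned the child's PID */
  int status;
  if (waitpid(pid, &status, 0) != pid) return -1;
  if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
  return counter;                  /* the parent's copy was never touched */
}
```

The parent gets back 1 even though the child set its counter to 100, because after fork() the two processes each have their own global data area.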

When a program calls exec, the process clears its own text, global data, heap and stack, rebuilds them from the new program file (at this point the heap and stack both have size 0), and starts running.
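
fork and exec are usually used together: fork makes the copy, and the copy then calls exec to turn into a different program. The sketch below does this with the shell, assuming /bin/sh exists on the system. The helper name exit_status_via_exec() is invented for this example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs sh -c "exit <code>" in a child process and
   returns the exit status the parent collects. Returns -1 on failure.
   Assumes /bin/sh is available. */
int exit_status_via_exec(int code) {
  assert(code >= 0 && code <= 9);
  char command[] = "exit 0";
  command[5] = (char)('0' + code); /* patch the single digit into the command */

  pid_t pid = fork();
  if (pid < 0) return -1;          /* fork failed */

  if (pid == 0) {
    /* From here on the child's text, global data, heap and stack are
       replaced by those of sh; nothing after execl() runs on success. */
    execl("/bin/sh", "sh", "-c", command, (char *)NULL);
    _exit(127);                    /* reached only if execl() failed */
  }

  int status;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Notice that the child keeps its PID across the exec: the additional information in the kernel stays, and only the contents of the process space are swapped for the new program.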

(For efficiency, modern operating systems have improved the actual mechanics of fork and exec, but logically nothing changes. See books on the Linux kernel for details.)

 

This article pulls a lot of things together, so it runs a little long. It is mainly conceptual, and many details vary with the language, the platform and even the compiler, but broadly speaking the concepts above apply to all computer processes (whether on Windows or UNIX). More advanced topics, including threads and interprocess communication (IPC), all build on what was introduced here.

 

Summary

functions, variable scope, global/local/dynamic variables

global data, text

stack, stack frame, return address, stack overflow

heap, malloc, free, memory leak

process additional information, task_struct

fork & exec