[python] Controlling child processes through the signal mechanism

Keywords: Python Linux

In Python, using multiple processes bypasses the limitation of the GIL and makes full use of multi-core CPUs to speed up tasks. However, if the processes are not handled properly, it is easy to create zombie processes or orphan processes, which wastes system resources.

This article analyzes the ways of writing Python code that can produce these two kinds of processes, and how to write the code correctly to avoid them.

All the code in this article was tested on CentOS; other versions and operating systems are not guaranteed.

1. Creating and handling zombie processes

1.1 Generating a zombie process

When the parent process hands a task to a child process and then no longer manages it (that is, never waits for it, whether the child finishes normally or fails), the child process becomes a zombie process once it has finished the task: it has exited, but its entry stays in the process table until its exit status is collected.

It is very simple to create a zombie process in Python. Suppose there are two scripts named main.py and worker.py, as follows:

# main.py
import subprocess
import os
import time

if __name__ == '__main__':
    print(f"I am the parent process, with pid: {os.getpid()}")
    subprocess.Popen(['python', 'worker.py'])
    time.sleep(100)  
    print("I finish my work.")

# worker.py
import time
import os

if __name__ == '__main__':
    print(f"I am the child process, with pid: {os.getpid()} and I'm going to sleep.")
    time.sleep(5)
    print("Now I wake up")

Now, execute python main.py on the console, and the following will be printed first:

I am the parent process, with pid: 29947
I am the child process, with pid: 29968 and I'm going to sleep.

If you immediately check the processes with ps, you will see that the main process (29947) has spawned a child process (29968) to perform the task. After 5 seconds, the console continues printing:

Now I wake up

This indicates that the child process has completed its task. Since the main process does not manage the child process after spawning it, checking with ps again shows that the child process has become a zombie process (ps reports it in state Z and marks it <defunct>).

If you wait another 95 seconds until the main process completes and exits, ps shows that the zombie process has disappeared. This is because, after the main process exits, the zombie process is taken over by the init process, which reaps it.

Therefore, the time.sleep(100) call in main.py is there to keep the main process alive long enough that the child process is not taken over by the init process, so that the zombie process can be observed with ps.
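The zombie state can also be checked from Python instead of with ps. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the helper names proc_state and make_zombie are hypothetical and not part of the original scripts. It starts a short-lived child, deliberately does not wait for it, and reads the state field of /proc/<pid>/stat, where Z denotes a zombie.

```python
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The format is "pid (comm) state ..."; comm may itself contain spaces or
    # parentheses, so split on the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie(timeout=5.0):
    """Start a child that exits at once and poll until it shows up as a zombie."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + timeout
    state = proc_state(child.pid)
    # Do not call child.poll() or child.wait() here: either would reap the child.
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(child.pid)
    return child, state


if __name__ == "__main__":
    child, state = make_zombie()
    print(f"child {child.pid} state before wait(): {state}")
    child.wait()  # collecting the exit status removes the zombie entry
```

Calling child.wait() at the end is what a parent that still manages its child would do; once the exit status has been collected, the entry under /proc disappears.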

1.2 Handling zombie processes

Although spawning a child process and then no longer managing it is an "irresponsible" practice, it is really convenient when the child process can complete all of the work by itself (for example, it stores its results in a specified database and does not need to report them back to the main process). In that case we only need to take care of reaping the zombies.

Python provides a way to do this through the signal module. Modify the code of main.py as follows:

# main.py
import subprocess
import os
import time
import signal
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

if __name__ == '__main__':
    print(f"I am the parent process, with pid: {os.getpid()}")
    subprocess.Popen(['python', 'worker.py'])
    time.sleep(100)
    print("I finish my work.")

If you repeat the steps in 1.1, you will find that the child process exits directly after 5 seconds without becoming a zombie process. Clearly, this line of code is what makes the difference:

signal.signal(signal.SIGCHLD, signal.SIG_IGN)

signal.signal() accepts two parameters: the first is the signal to handle, and the second is the action to take when it arrives. The SIGCHLD signal is sent to the parent process whenever one of its child processes terminates. In this example, signal.signal(signal.SIGCHLD, signal.SIG_IGN) explicitly sets the action for SIGCHLD to "ignore" (signal.SIG_IGN). On Linux, this tells the kernel that the parent does not want the exit status of its children, so a terminated child is released immediately instead of becoming a zombie.
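Ignoring SIGCHLD also discards each child's exit status. When the status is still of interest, one alternative (not used in the original scripts) is to install a SIGCHLD handler that reaps finished children with os.waitpid(). The following is a minimal sketch under that assumption; reap_children, run_demo and the reaped dictionary are hypothetical names chosen for illustration.

```python
import os
import signal
import subprocess
import sys
import time

reaped = {}  # pid -> exit code (or negative signal number), filled in by the handler


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            # -1 means "any child"; WNOHANG returns at once if none has exited.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children left
        if pid == 0:
            return  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        else:
            reaped[pid] = -os.WTERMSIG(status)


def run_demo():
    """Install the handler, run one child that exits with code 3, return what was recorded."""
    signal.signal(signal.SIGCHLD, reap_children)
    try:
        child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
        deadline = time.monotonic() + 5
        while child.pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.05)
        return reaped.get(child.pid)
    finally:
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)  # back to the default action


if __name__ == "__main__":
    print(f"exit code collected by the handler: {run_demo()}")
```

Because the handler waits for any child, the Popen objects can no longer report the real exit status themselves; the status has to be taken from the dictionary filled in by the handler.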

2. Creating and handling orphan processes

When the parent process terminates before the child process has completed its task, the child process keeps running and becomes an orphan process.

2.1 Generating an orphan process

By slightly modifying the above code, an orphan process can be generated:

# main.py
import subprocess
import time
import os

if __name__ == '__main__':
    print(f"I am the parent process, with pid: {os.getpid()}")
    subprocess.Popen(['python', 'worker.py'])
    time.sleep(5)  
    print("I finish my work.")

# worker.py
import time
import os

if __name__ == '__main__':
    print(f"I am the child process, with pid: {os.getpid()} and I'm going to sleep.")
    time.sleep(10)
    print("Now I wake up")

If you run python main.py in the console again, the following is printed first:

I am the parent process, with pid: 12994
I am the child process, with pid: 13015 and I'm going to sleep.

At this point, ps shows the child process running with the main process as its parent.

After 5 seconds, the parent process exits. Checking ps again, note that the parent process (1631 in the original ps listing) has exited, and the parent of the child process (1641 in that listing) has become the init process. This shows that the child has become an orphan process and is managed by init.

Once its 10-second sleep is over, the child process also completes its task and exits, which can be verified with ps.

Since the orphan process has been taken over by init, which reaps it when it exits, it does not remain a zombie process.
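The re-parenting can also be observed without ps, because the value of os.getppid() in the child changes once its parent is gone. The following is a minimal sketch; the inline CHILD_SRC and PARENT_SRC programs and the observe_adoption helper are illustrative assumptions, not part of the original scripts.

```python
import subprocess
import sys

# Child: receives the parent's PID as an argument and polls until its own
# parent PID differs, i.e. until it has been adopted by another process.
CHILD_SRC = """
import os, sys, time
parent = int(sys.argv[1])
deadline = time.monotonic() + 10
while os.getppid() == parent and time.monotonic() < deadline:
    time.sleep(0.05)
print("adopted" if os.getppid() != parent else "still owned", flush=True)
"""

# Parent: starts the child and exits immediately without waiting for it.
PARENT_SRC = """
import os, subprocess, sys
subprocess.Popen([sys.executable, "-c", sys.argv[1], str(os.getpid())])
"""


def observe_adoption():
    """Run the short-lived parent and return what the orphaned child printed."""
    result = subprocess.run(
        [sys.executable, "-c", PARENT_SRC, CHILD_SRC],
        stdout=subprocess.PIPE,
        text=True,
        timeout=30,
    )
    # The orphan inherits the stdout pipe, so run() only returns after the
    # orphan has exited and closed its end.
    return result.stdout.strip()


if __name__ == "__main__":
    print(observe_adoption())
```

The parent's PID is passed as an argument rather than read with os.getppid() at startup, so the check still works if the parent has already exited by the time the child begins to run.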

2.2 Handling orphan processes

Although orphan processes are different from zombie processes, in some cases we want all child processes to be terminated automatically when the main process exits. For example, after the tasks have been started we may discover a bug in a later part of the program; rather than waste time letting the earlier code keep running, we want to terminate all of these processes.

Following the idea in 1.2, we want the main process, on receiving a termination signal, to first terminate all child processes and then exit.

To achieve this, we need to maintain a list that stores all child processes:

# main.py
import subprocess
import time
import os
import signal

child_processes = []

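def handler(signum, frame):  # frame is the stack frame that was interrupted
    for i, p in enumerate(child_processes):
        print(f'Killing {i+1}/{len(child_processes)}...')
        p.kill()
    raise  # no active exception, so this raises RuntimeError and ends the main process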

if __name__ == '__main__':
    print(f"I am the parent process, with pid: {os.getpid()}")
    for i in range(5):
        p = subprocess.Popen(['python', 'worker.py'])
        child_processes.append(p)
    signal.signal(signal.SIGTERM, handler)
    time.sleep(10)
    print("I finish my work.")

signal.SIGTERM is the termination signal, that is, the signal sent to the target process by kill <PID>. By defining the handler function, we change what the main process does when it receives the termination signal: it goes through all child processes and kills them.

Note that this does not work for SIGKILL (i.e. the signal sent by kill -9 <PID>) or SIGSTOP. These two signals cannot be caught: the process is killed (or, for SIGSTOP, suspended) immediately, without being given the chance to run any handler.
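This restriction is easy to check from Python: trying to register a handler for SIGKILL or SIGSTOP fails with an OSError. The following small sketch illustrates it; can_install_handler is a hypothetical helper name, and the check is assumed to run in the main thread, since signal.signal() may only be called there.

```python
import signal


def can_install_handler(sig):
    """Return True if a Python-level handler can be registered for sig."""
    def noop(signum, frame):
        pass

    try:
        previous = signal.signal(sig, noop)
    except (OSError, ValueError):
        # OSError: the operating system refuses (SIGKILL, SIGSTOP).
        # ValueError: invalid signal number, or not called from the main thread.
        return False
    if previous is not None:
        signal.signal(sig, previous)  # restore whatever was installed before
    return True


if __name__ == "__main__":
    for sig in (signal.SIGTERM, signal.SIGKILL, signal.SIGSTOP):
        print(sig.name, can_install_handler(sig))
```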

If you execute python main.py, the console will print the following immediately:

I am the parent process, with pid: 12162
I am the child process, with pid: 12194 and I'm going to sleep.
I am the child process, with pid: 12207 and I'm going to sleep.
I am the child process, with pid: 12219 and I'm going to sleep.
I am the child process, with pid: 12234 and I'm going to sleep.
I am the child process, with pid: 12245 and I'm going to sleep.

Then run kill 12162 in another shell, which sends the signal.SIGTERM signal to the main process. When the main process receives this signal, it switches to the handler function to take the corresponding action. You will then see the following on the console:

Killing 1/5...
Killing 2/5...
Killing 3/5...
Killing 4/5...
Killing 5/5...
Traceback (most recent call last):
  File "main.py", line 21, in <module>
    time.sleep(10)  
  File "main.py", line 12, in handler
    raise
RuntimeError: No active exception to reraise

Checking with ps shows that the main process and all child processes have been terminated.

By now you have probably guessed that the raise in the handler function is actually there to terminate the main process. Because we have replaced the default behavior of the main process on SIGTERM, kill would no longer end it by itself; so that kill can still terminate the main process, we throw an exception after killing all the child processes. The bare raise has no active exception to re-raise, so Python raises the RuntimeError seen above, and this uncaught exception ends the main process.
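The RuntimeError traceback can be avoided by ending the main process with sys.exit() instead of a bare raise. The following is a minimal sketch of that variant, not the original author's code: make_handler and demo are hypothetical names, the inline sleeping children stand in for worker.py, and the exit code 128 + signum is only a convention borrowed from shells. For the sake of a self-contained demonstration, the process sends SIGTERM to itself.

```python
import os
import signal
import subprocess
import sys
import time


def make_handler(children):
    """Build a SIGTERM handler that stops the given Popen objects, then exits."""
    def handler(signum, frame):
        for p in children:
            p.terminate()       # send SIGTERM to each child
        for p in children:
            p.wait()            # reap the children so none is left as a zombie
        sys.exit(128 + signum)  # raises SystemExit, which ends the process cleanly
    return handler


def demo():
    """Start three sleeping children, deliver SIGTERM to this process, report the outcome."""
    children = [
        subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
        for _ in range(3)
    ]
    signal.signal(signal.SIGTERM, make_handler(children))
    try:
        os.kill(os.getpid(), signal.SIGTERM)
        time.sleep(5)  # the handler runs no later than during this sleep
    except SystemExit as exc:
        # Caught here only so the demo can inspect the result.
        return exc.code, [p.returncode for p in children]
    finally:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)


if __name__ == "__main__":
    print(demo())
```

In a real main.py the SystemExit would simply be left uncaught, so the process exits without printing a traceback.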

2.3 A more Pythonic approach

Although the approach in 2.2 does what we want, terminating the process by throwing an exception is inelegant. In addition, recording all child processes in the main process is far from ideal (not to mention what you would have to do when using a process pool). So, is there a better way?

The answer is yes.

Linux has the concept of a process group: by default, the main process and the child processes it creates belong to the same process group, which is identified by a unique process group ID. All processes in a process group can therefore be signaled in one go with os.killpg().

Therefore, the above code is modified as follows:

# main.py
import subprocess
import time
import os
import signal

def handler(signum, frame):
    # Send SIGKILL to the whole process group, which includes this process.
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)

if __name__ == '__main__':
    print(f"I am the parent process, with pid: {os.getpid()}")
    for i in range(5):
        p = subprocess.Popen(['python', 'worker.py'])
    signal.signal(signal.SIGTERM, handler)
    time.sleep(10)
    print("I finish my work.")

After running python main.py and killing the main process with kill, the console output should look like this:

I am the parent process, with pid: 20416
I am the child process, with pid: 20438 and I'm going to sleep.
I am the child process, with pid: 20445 and I'm going to sleep.
I am the child process, with pid: 20453 and I'm going to sleep.
I am the child process, with pid: 20472 and I'm going to sleep.
I am the child process, with pid: 20463 and I'm going to sleep.
Killed
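One caveat of signaling the main process's own group is that the group may contain more than the main process and its children, for example when main.py is started from a non-interactive shell script, which shares its process group with the commands it runs. A possible refinement (an assumption, not the original author's method) is to start the workers in a session and process group of their own with start_new_session=True and to signal only that group. In this minimal sketch, LEADER_SRC is an inline stand-in for worker.py, and start_group and stop_group are hypothetical helper names.

```python
import os
import signal
import subprocess
import sys

# Stand-in for worker.py: a group leader that starts one helper, then both sleep.
LEADER_SRC = """
import subprocess, sys, time
subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
time.sleep(60)
"""


def start_group():
    """Start the leader in a new session, so its PID is also its process group ID."""
    return subprocess.Popen(
        [sys.executable, "-c", LEADER_SRC],
        start_new_session=True,
    )


def stop_group(leader):
    """Send SIGTERM to every process in the leader's group and reap the leader."""
    os.killpg(os.getpgid(leader.pid), signal.SIGTERM)
    return leader.wait()


if __name__ == "__main__":
    leader = start_group()
    print(f"own group: {os.getpgid(0)}, worker group: {os.getpgid(leader.pid)}")
    print(f"leader return code: {stop_group(leader)}")
```

Because the signal goes to a separate group, the main process survives it and can finish its own cleanup, and SIGTERM can be used instead of SIGKILL so that the workers get a chance to handle it.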

3. Summary

This article briefly analyzes the signal handling mechanism of the Linux operating system and how it is exposed in Python, focusing on how to use signals to avoid zombie processes and orphan processes when doing multi-process programming in Python.

For more information, see the references below.

4. References

  1. signal -- set asynchronous event handler
  2. Linux signal foundation
  3. Python module signal
  4. When the main process is killed, how to ensure that the child process exits at the same time without becoming an orphan process (1)

Posted by xkellix on Fri, 19 Nov 2021 05:53:06 -0800