04 Understanding processes: why was the process in my container forcibly killed?

Keywords: Docker

Today, let's cover the last topic on the container init process: why are processes in a container forcibly killed? Understanding this will help you manage processes better, so that the processes in your containers can shut down gracefully.

Let me first explain why this matters for process management. In real production environments, many applications need to clean up when they exit, for example by closing connections to remote services or deleting local temporary data.

This cleanup avoids errors on the remote or local side as far as possible, for example by reducing packet loss. The cleanup work is usually done in a handler that the application registers for the SIGTERM signal.

However, if the process receives SIGKILL, the application gets no chance to perform any of this cleanup. In other words, whenever a process cannot shut down gracefully, the application's error rate goes up.

So next, let's reproduce what happens to the processes in a container when the container stops.

Reproducing the problem

On a container platform, whether you delete a pod in Kubernetes or stop a container with Docker, the request ultimately goes through the containerd service.

When containerd stops a container, it sends a SIGTERM signal to the container's init process.

What we observe is that after the init process exits, the other processes in the container exit immediately as well. The difference is that the init process receives SIGTERM, while the other processes receive SIGKILL.

In the first lecture of the "Understanding processes" module, we mentioned that SIGKILL cannot be caught, meaning the user cannot register a handler for it, while SIGTERM does allow the user to register a handler. That difference matters a great deal here.

Then, let's take a look at how to make the processes in the container receive SIGTERM signal instead of SIGKILL signal when the container exits.

Following the same approach as in the previous lecture, we can run a simple container to reproduce the problem: build the image by running make image with the example code for this lecture, then start the container with Docker.

$ docker run -d --name fwd_sig registry/fwd_sig:v1 /c-init-sig

Now stop the container with docker stop while monitoring with strace, and you can see which signals the init process and another process in the container receive.

In the example below, the container's init process has process ID 15909 (as seen from the host), and the other process in the container has process ID 15959.

In the command output, we can see that the init process (15909) received the SIGTERM signal, while the other process (15959) received the SIGKILL signal.

# ps -ef | grep c-init-sig
root     15857 14391  0 06:23 pts/0    00:00:00 docker run -it registry/fwd_sig:v1 /c-init-sig
root     15909 15879  0 06:23 pts/0    00:00:00 /c-init-sig
root     15959 15909  0 06:23 pts/0    00:00:00 /c-init-sig
root     16046 14607  0 06:23 pts/3    00:00:00 grep --color=auto c-init-sig
 
# strace -p 15909
strace: Process 15909 attached
restart_syscall(<... resuming interrupted read ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=0, si_uid=0} ---
write(1, "received SIGTERM\n", 17)      = 17
exit_group(0)                           = ?
+++ exited with 0 +++
 
# strace -p 15959
strace: Process 15959 attached
restart_syscall(<... resuming interrupted read ...>) = ?
+++ killed by SIGKILL +++

Background: the two system calls behind signals

To understand this example, we need to look at the two system calls behind signals: kill() and signal().

We can study these two system calls together with the signals discussed earlier. In the first lecture on the container init process, we introduced the basic concept: a signal is a notification delivered to a Linux process.

Once you know how to use these two system calls, you will understand Linux signals better, and when you run into signal-related problems in containers, you will have a clearer way to reason about them.

Let me also walk through a few code examples to help you further understand how a process can shut down gracefully.

Signal handling in a process really involves two questions:

One is how a process sends a signal
The other is how a process handles a signal after receiving it

In Linux, the system call for sending a signal is kill(). We used the kill command in many earlier examples; internally, it calls the kill() function.

The following is the definition of the kill() function in the Linux Programmer's Manual.

This function takes two parameters:

sig is the signal to send. For example, a sig value of 15 means SIGTERM.
pid is the process the signal is sent to. For example, a value of 1 means the process with PID 1.

NAME
       kill - send signal to a process
 
SYNOPSIS
       #include <sys/types.h>
       #include <signal.h>
 
       int kill(pid_t pid, int sig);

Now that we know the system call for sending signals, let's look at the other one, signal(), which registers a signal handler.

Below is the definition of signal() from the Linux Programmer's Manual. The parameter signum is the signal number; for example, 15 is SIGTERM. The parameter handler is a function pointer used to register the user's signal handler.

NAME
       signal - ANSI C signal handling
 
SYNOPSIS
       #include <signal.h>
       typedef void (*sighandler_t)(int);
       sighandler_t signal(int signum, sighandler_t handler);

In the first lecture on the container init process, we saw that a process can handle each signal in one of three ways: take the system's default action, catch it, or ignore it. Which of these applies is determined by how the program calls signal().

1) Default

The first option is the default. If the program makes no signal() call for a signal such as SIGTERM, then when the running process receives SIGTERM, the kernel carries out SIGTERM's default action.

For SIGTERM, the default action is to terminate the process.

Different signals have different default actions in the kernel, but they generally fall into one of three behaviors: terminate, stop, or ignore.

2) Capture

Catching means registering our own handler for a signal by calling signal() in the code. Then, when the running process receives that signal, the kernel does not carry out the default action; instead it runs the handler registered through signal().

For example, in the following code, we register a handler for the SIGTERM signal, and only do a print operation in the handler.

When the program is running, if it receives SIGTERM signal, it will not exit, but only display "received SIGTERM" on the screen.

void sig_handler(int signo)
{
  if (signo == SIGTERM) {
          printf("received SIGTERM\n");
  }
}
 
int main(int argc, char *argv[])
{
...
  signal(SIGTERM, sig_handler);
...
}

3) Ignore

If we want the process to "ignore" a signal, we use signal() to register the special handler SIG_IGN for that signal.

For example, the following code registers SIG_IGN for SIGTERM.

As a result, when the running program receives SIGTERM, it neither exits nor prints anything; there is no response at all, as if the signal had never arrived.

int main(int argc, char *argv[])
{
...
  signal(SIGTERM, SIG_IGN);
...
}

Well, by explaining the system call signal(), we help you review the three choices of signal processing: default behavior, capture and ignore.

I also want to remind you that SIGKILL and SIGSTOP are two privileged signals that can be neither caught nor ignored. This is also reflected in how signal() behaves.

We can run the following code: if we try to register a handler for SIGKILL with signal(), it returns SIG_ERR, because catching SIGKILL is not allowed.

# cat reg_sigkill.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
 
typedef void (*sighandler_t)(int);
 
void sig_handler(int signo)
{
            if (signo == SIGKILL) {
                        printf("received SIGKILL\n");
                        exit(0);
            }
}
 
int main(int argc, char *argv[])
{
            sighandler_t h_ret;
 
            h_ret = signal(SIGKILL, sig_handler);
            if (h_ret == SIG_ERR) {
                        perror("SIG_ERR");
            }
            return 0;
}
 
# ./reg_sigkill
SIG_ERR: Invalid argument

Finally, let's sum up with the following program.

In this program, we use signal() to ignore SIGTERM, then to catch it, then to restore its default behavior. Each time, we use the kill() system call to send SIGTERM to the process itself, which confirms how the process handles SIGTERM in each case.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
 
typedef void (*sighandler_t)(int);
 
void sig_handler(int signo)
{
        if (signo == SIGTERM) {
                printf("received SIGTERM\n\n");
                // Set SIGTERM handler to default
                signal(SIGTERM, SIG_DFL);
        }
}
 
int main(int argc, char *argv[])
{
        // Ignore SIGTERM, and send SIGTERM
        // to the process itself.
        // (getpid() targets only this process; kill(0, ...)
        // would signal the whole process group.)
        signal(SIGTERM, SIG_IGN);
        printf("Ignore SIGTERM\n\n");
        kill(getpid(), SIGTERM);
 
        // Catch SIGTERM, and send SIGTERM
        // to the process itself.
        signal(SIGTERM, sig_handler);
        printf("Catch SIGTERM\n");
        kill(getpid(), SIGTERM);
 
        // Default SIGTERM. In sig_handler, it sets
        // SIGTERM handler back to default one.
        printf("Default SIGTERM\n");
        // The default action terminates the process without
        // flushing stdio buffers, so flush them first.
        fflush(stdout);
        kill(getpid(), SIGTERM);
 
        return 0;
}

Let's summarize the two system calls just mentioned:

kill() is a system call, and it is actually very simple: given two parameters, a process ID and a signal, it sends that signal to the specified process.
signal() determines how a process handles a specific signal: the SIG_DFL argument restores the signal's default handler, a user-defined function can be passed as the handler, or the SIG_IGN argument makes the process ignore the signal.

For SIGKILL signal, if you call the signal() function to register a custom handler for it, the system will reject it.

Solving the problem

After studying the system calls related to kill() and signal(), we return to the original question. Why does the init process of the container receive the SIGTERM signal when stopping a container, while other processes in the container receive the SIGKILL signal?

When a Linux process exits because it received SIGTERM, the kernel's entry point for handling the exit is the do_exit() function, which releases the process's resources, such as memory, file handles, and semaphores.

After that, it calls the exit_notify() function to notify the parent and child processes related to this process.

For containers, the other processes in the PID namespace also have to be taken into account. Here the kernel calls zap_pid_ns_processes(): if the exiting process is the init process of the namespace, this function sends SIGKILL to every other process in the namespace.

The whole process is shown in the figure below.

The corresponding kernel code in zap_pid_ns_processes() looks like this:

 
    /*
         * The last thread in the cgroup-init thread group is terminating.
         * Find remaining pid_ts in the namespace, signal and wait for them
         * to exit.
         *
         * Note:  This signals each threads in the namespace - even those that
         *        belong to the same thread group, To avoid this, we would have
         *        to walk the entire tasklist looking a processes in this
         *        namespace, but that could be unnecessarily expensive if the
         *        pid namespace has just a few processes. Or we need to
         *        maintain a tasklist for each pid namespace.
         *
         */
 
        rcu_read_lock();
        read_lock(&tasklist_lock);
        nr = 2;
        idr_for_each_entry_continue(&pid_ns->idr, pid, nr) {
                task = pid_task(pid, PIDTYPE_PID);
                if (task && !__fatal_signal_pending(task))
                        group_send_sig_info(SIGKILL, SEND_SIG_PRIV, task, PIDTYPE_MAX);
        }

At this point, we can understand why the container init process receives the SIGTERM signal, while other processes in the container receive the SIGKILL signal.

As I mentioned earlier, SIGKILL is a privileged signal. Privileged signals are reserved by Linux so that the kernel and the superuser can kill any process; they cannot be ignored or caught.

So a process that receives this signal exits immediately, with no chance to run any handler that releases resources before exiting.

SIGTERM, on the other hand, can be caught, and users can register their own handlers for it. Therefore, when a container is stopped, we would rather the processes in it receive SIGTERM than SIGKILL.

When the container is stopped, what should we do to make the process in the container receive the SIGTERM signal?

As you may have guessed, the answer is to have the container's init process forward SIGTERM, and that is exactly right. For example, tini, which is commonly used as the init process in Docker containers, calls sigtimedwait() to check which signals it has received, then calls kill() to send each signal on to its child process.

Let me give you a specific example. From the following code, we can see that tini will forward all other signals to its child processes except SIGCHLD.

 
 int wait_and_forward_signal(sigset_t const* const parent_sigset_ptr, pid_t const child_pid) {
 
        siginfo_t sig;
 
        if (sigtimedwait(parent_sigset_ptr, &sig, &ts) == -1) {
                switch (errno) {
...
                }
        } else {
                /* There is a signal to handle here */
                switch (sig.si_signo) {
                        case SIGCHLD:
                                /* Special-cased, as we don't forward SIGCHLD. Instead, we'll
                                 * fallthrough to reaping processes.
                                 */
                                PRINT_DEBUG("Received SIGCHLD");
                                break;
                        default:
                                PRINT_DEBUG("Passing signal: '%s'", strsignal(sig.si_signo));
                                /* Forward anything else */
                                if (kill(kill_process_group ? -child_pid : child_pid, sig.si_signo)) {
                                        if (errno == ESRCH) {
                                                PRINT_WARNING("Child was dead when forwarding signal");
                                        } else {
                                                PRINT_FATAL("Unexpected error when forwarding signal: '%s'", strerror(errno));
 
                                                return 1;
                                        }
                                }
                                break;
                }
        }
        return 0;
}

So, to be clear: how do we solve the problem of applications in a container being forcibly killed when the container is stopped?

The solution is for the container's init process to forward the signals it receives to the other processes in the container, so that when the container stops, all of them receive SIGTERM instead of SIGKILL.

Posted by selsdon on Wed, 29 Sep 2021 12:35:21 -0700

Programmer Group