In practice, you may occasionally see a system whose CPU utilization and load average are both very high, yet no application with high CPU usage can be found.
A likely cause is that some process keeps crashing and restarting. Each instance lives only briefly, so it has often exited before a sampling tool such as top can show it.
uptime reports a very high system load, yet top, mpstat, pidstat, perf and similar tools make it hard to tell which processes are responsible for the high load and CPU utilization.
Note: judging by these tools, the workload is not CPU-intensive, there is no significant I/O wait, and there is no contention among processes or threads.
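Before reaching for a tracer, a quick way to test the restarting-process hypothesis is to measure how fast new processes are being created. The sketch below assumes Linux, where the `processes` line of `/proc/stat` counts forks since boot; the helper name `fork_rate` is hypothetical. It samples that counter twice and prints the difference per second. A rate that stays high while top shows nothing suspicious is consistent with a stream of short-lived processes.

```shell
# fork_rate [SECONDS]: approximate number of new processes created per second.
# Reads the "processes" line of /proc/stat (forks since boot, see proc(5))
# before and after a sleep. fork_rate is a hypothetical helper, not part of
# any tool mentioned in this article.
fork_rate() {
  local interval=${1:-1}   # sampling interval in whole seconds
  local before after
  before=$(awk '$1 == "processes" { print $2 }' /proc/stat)
  sleep "$interval"
  after=$(awk '$1 == "processes" { print $2 }' /proc/stat)
  # Integer average over the interval
  echo $(( (after - before) / interval ))
}
```

For example, `fork_rate 5` averages over five seconds. The measurement itself forks a couple of processes (awk, sleep), so a small nonzero baseline is expected.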
execsnoop is a tool designed specifically for tracing short-lived (transient) processes.
It uses ftrace to monitor exec() calls in real time and prints basic information about each short-lived process, including its PID, parent PID (PPID), command-line arguments, and execution result.
Source on GitHub: https://github.com/brendangregg/perf-tools/blob/master/execsnoop
Installation: copy the script from the GitHub link above into a file named execsnoop and make it executable (chmod +x execsnoop).
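The installation step can be scripted as below. This is a sketch under assumptions: it uses curl, defaults to the raw-file form of the GitHub link above, and the helper name `install_execsnoop` and its two optional arguments are hypothetical.

```shell
# install_execsnoop [DEST] [URL]: download execsnoop and make it executable.
# Hypothetical helper. URL defaults to the raw-file form of the GitHub link
# above; DEST defaults to ./execsnoop.
install_execsnoop() {
  local dest=${1:-./execsnoop}
  local url=${2:-https://raw.githubusercontent.com/brendangregg/perf-tools/master/execsnoop}
  curl -fsSL "$url" -o "$dest" || return 1   # write the script contents to DEST
  chmod +x "$dest"                           # add the x permission
}
```

Running `install_execsnoop && sudo ./execsnoop` downloads the script into the current directory and starts tracing; stop it with Ctrl-C.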
Usage (the script drives ftrace, so run it as root):
```
# ./execsnoop
59187 59186 /usr/local/bin/stress -t 1 -d 1
59188 28775 <...>-59188 [000] d... 40067.137167: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59191 59188 /usr/local/bin/stress -t 1 -d 1
59190 28778 <...>-59190 [003] d... 40067.138913: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59192 28776 <...>-59192 [003] d... 40067.139103: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59194 59192 /usr/local/bin/stress -t 1 -d 1
59196 59190 /usr/local/bin/stress -t 1 -d 1
59198 28770 <...>-59198 [001] d... 40067.145500: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59199 28779 <...>-59199 [001] d... 40067.146228: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59200 59198 /usr/local/bin/stress -t 1 -d 1
59202 59199 /usr/local/bin/stress -t 1 -d 1
59204 28778 <...>-59204 [002] d... 40067.155150: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59206 28775 <...>-59206 [001] d... 40067.157282: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59208 59206 /usr/local/bin/stress -t 1 -d 1
59209 28770 <...>-59209 [003] d... 40067.158381: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
59205 59204 /usr/local/bin/stress -t 1 -d 1
59207 28776 <...>-59207 [002] d... 40067.158882: execsnoop_sys_execve: (SyS_execve+0x0/0x30)
```
The output shows a large number of stress processes being started over and over. Each one exits quickly, which is why top and pidstat struggle to catch it, but together they drive up the system load and CPU utilization.
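When the output scrolls by quickly, it helps to save it and count exec events per program. The sketch below is a hypothetical awk-based helper, `summarize_execs`; it assumes the `PID PPID ARGS` layout seen in the run above and skips the raw `<...>` ftrace lines, which carry no program name.

```shell
# summarize_execs FILE: count exec events per program in saved execsnoop output.
# Hypothetical helper. Assumes event lines look like "PID PPID ARGS..."; lines
# that do not start with two numeric fields, and raw ftrace lines whose third
# field starts with "<...>", are skipped.
summarize_execs() {
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 !~ /^<\.\.\.>/ { count[$3]++ }
       END { for (prog in count) print count[prog], prog }' "$1" | sort -rn
}
```

For example, save a run with `sudo ./execsnoop > exec.log`, press Ctrl-C, then run `summarize_execs exec.log`. For the 17 lines shown above it would print `8 /usr/local/bin/stress`, ignoring the 9 raw lines.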