Detailed Operating Principle and Application of ptrace

Keywords: Linux Programming

Have you ever wondered how system calls can be intercepted? Have you ever tried to fool the kernel by changing the arguments of a system call? Have you ever wondered how a debugger stops a running process and takes control of it?

If you are thinking of using complex kernel programming to do this, think again. Linux provides an elegant mechanism for all of it: the ptrace system call. ptrace lets a parent process observe and control the execution of another process. It can examine and change the registers and core image (memory) of the traced process, which makes it the basis for breakpoint debugging and system call tracing.

With ptrace, you can intercept and modify system calls at the user level.
In this article, we will learn how to intercept a system call and change its arguments. In the second part, we will study more advanced techniques: setting breakpoints and inserting code into a running program. We will peek into the registers and data segment of a running process and modify their contents.

Basic knowledge


The operating system offers its services, such as access to the underlying hardware and the file system, through a standard mechanism called system calls. When a program needs to make a system call, it puts the arguments in registers and triggers software interrupt 0x80. This interrupt acts as a gate into kernel mode: the program hands the system call number and the arguments to the kernel, and the kernel executes the system call.
On i386 (all the code in this article is i386-specific), the system call number goes into %eax and the arguments go into %ebx, %ecx, %edx, %esi and %edi, in that order. For example, the call
       write(2, "Hello", 5)
roughly translates into the following assembly:
    movl   $4, %eax
    movl   $2, %ebx
    movl   $hello, %ecx
    movl   $5, %edx
    int    $0x80
Here, $hello is the address of the string literal "Hello".
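
The same idea can be tried from C without writing assembly. The sketch below is only an illustration, not part of the original article: it assumes a Linux system with glibc and uses the generic syscall() wrapper, and raw_write is a hypothetical helper name. The wrapper loads the system call number and arguments into the registers the current architecture expects, so it works on more than i386.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue write by system call number, bypassing the write() wrapper.
 * raw_write is a hypothetical helper name, not a library function. */
long raw_write(int fd, const char *buf, size_t len)
{
    /* SYS_write is 4 on i386, as in the assembly above; the value
       differs on other architectures. */
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(2, "Hello", 5) should behave like the write call shown above and return the number of bytes written.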

Where does ptrace come into the picture? Before executing a system call, the kernel checks whether the process is being traced. If it is, the kernel stops the process and gives control to the tracing process, which can then examine and modify the registers of the traced process.

Let's take a look at an example of how this tracker works:

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <sys/reg.h>   /* For constants ORIG_EAX etc.
                              (<linux/user.h> on older systems) */
    int main()
    {
        pid_t child;
        long orig_eax;
        child = fork();
        if(child == 0) {
            /* Child: ask to be traced, then run ls */
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl("/bin/ls", "ls", NULL);
        }
        else {
            /* Parent: wait for the child to stop at execve */
            wait(NULL);
            orig_eax = ptrace(PTRACE_PEEKUSER,
                              child, 4 * ORIG_EAX,
                              NULL);
            printf("The child made a "
                   "system call %ld\n", orig_eax);
            ptrace(PTRACE_CONT, child, NULL, NULL);
        }
        return 0;
    }

Running this program prints the output of the ls command along with the following line:
The child made a system call 11
Here, 11 is the system call number of execve, the first system call made by the child.
The system call numbers are listed in /usr/include/asm/unistd.h.
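
Instead of looking the numbers up by hand, a program can use the SYS_* constants from <sys/syscall.h>. The following is a minimal sketch, not code from the original article: syscall_name and syscall_number are hypothetical helpers and the table is deliberately tiny. The values are per-architecture, so execve is 11 on i386 but 59 on x86-64.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* A small table of system calls for the architecture being compiled for. */
static const struct { long nr; const char *name; } syscall_table[] = {
    { SYS_read,   "read"   },
    { SYS_write,  "write"  },
    { SYS_execve, "execve" },
    { SYS_exit,   "exit"   },
};

/* Return the name for a system call number, or "unknown". */
const char *syscall_name(long nr)
{
    for (size_t i = 0; i < sizeof syscall_table / sizeof syscall_table[0]; i++)
        if (syscall_table[i].nr == nr)
            return syscall_table[i].name;
    return "unknown";
}

/* Return the number for a system call name, or -1 if it is not listed. */
long syscall_number(const char *name)
{
    for (size_t i = 0; i < sizeof syscall_table / sizeof syscall_table[0]; i++)
        if (strcmp(syscall_table[i].name, name) == 0)
            return syscall_table[i].nr;
    return -1;
}
```

A tracer could print syscall_name(orig_eax) instead of the bare number to make its output easier to read.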

In the example above, the parent forks a child and traces it. Before calling exec, the child calls ptrace with PTRACE_TRACEME as the first argument, which tells the kernel that the process wants to be traced. When the child then calls execve(), the kernel stops it and hands control to the parent. The parent, which has been waiting in wait() for a notification from the kernel, is now woken up and can look at what the child is doing, for example by reading its registers.

Parameters of the ptrace function

ptrace takes four arguments:

long ptrace(enum __ptrace_request request,
            pid_t pid,
            void *addr,
            void *data);

The first argument determines the behavior of ptrace and how the other arguments are used.

When a system call occurs, the kernel saves the original contents of eax, which hold the system call number. We can read this value by calling ptrace with PTRACE_PEEKUSER as the first argument.
After examining the system call, we can let the child continue by calling ptrace with PTRACE_CONT as the first argument.

The first argument takes one of the following values:
PTRACE_TRACEME
PTRACE_PEEKTEXT
PTRACE_PEEKDATA
PTRACE_PEEKUSER
PTRACE_POKETEXT
PTRACE_POKEDATA
PTRACE_POKEUSER
PTRACE_GETREGS
PTRACE_GETFPREGS
PTRACE_SETREGS
PTRACE_SETFPREGS
PTRACE_CONT
PTRACE_SYSCALL
PTRACE_SINGLESTEP
PTRACE_DETACH

The usage of these constants is described below.

Read parameters for system calls
Calling ptrace with PTRACE_PEEKUSER as the first argument lets us read the child's USER area, which holds its register values and other information.

Consider the following example:

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <sys/reg.h>       /* For EBX, ORIG_EAX etc.
                                  (<linux/user.h> on older systems) */
    #include <sys/syscall.h>   /* For SYS_write etc */

    int main()
    {
        pid_t child;
        long orig_eax, eax;
        long params[3];
        int status;
        int insyscall = 0;   /* toggles between syscall entry and exit */
        child = fork();
        if(child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl("/bin/ls", "ls", NULL);
        }
        else {
            while(1) {
                wait(&status);
                if(WIFEXITED(status))
                    break;
                orig_eax = ptrace(PTRACE_PEEKUSER,
                                  child, 4 * ORIG_EAX, NULL);
                if(orig_eax == SYS_write) {
                    if(insyscall == 0) {
                        /* Syscall entry: read the three arguments */
                        insyscall = 1;
                        params[0] = ptrace(PTRACE_PEEKUSER,
                                           child, 4 * EBX,
                                           NULL);
                        params[1] = ptrace(PTRACE_PEEKUSER,
                                           child, 4 * ECX,
                                           NULL);
                        params[2] = ptrace(PTRACE_PEEKUSER,
                                           child, 4 * EDX,
                                           NULL);
                        printf("Write called with "
                               "%ld, %ld, %ld\n",
                               params[0], params[1],
                               params[2]);
                    }
                    else { /* Syscall exit: read the return value */
                        eax = ptrace(PTRACE_PEEKUSER,
                                     child, 4 * EAX, NULL);
                        printf("Write returned "
                               "with %ld\n", eax);
                        insyscall = 0;
                    }
                }
                /* Run the child until its next syscall entry or exit */
                ptrace(PTRACE_SYSCALL,
                       child, NULL, NULL);
            }
        }
        return 0;
    }
The output of this program looks like this:

ppadala@linux:~/ptrace > ls
a.out        dummy.s      ptrace.txt   
libgpm.html  registers.c  syscallparams.c
dummy        ptrace.html  simple.c
ppadala@linux:~/ptrace > ./a.out
Write called with 1, 1075154944, 48
a.out        dummy.s      ptrace.txt
Write returned with 48
Write called with 1, 1075154944, 59
libgpm.html  registers.c  syscallparams.c
Write returned with 59
Write called with 1, 1075154944, 30
dummy        ptrace.html  simple.c
Write returned with 30
 
In the example above, we traced the write system call; running ls results in three write system calls. Calling ptrace with PTRACE_SYSCALL as the first argument makes the kernel stop the child at every system call entry and exit. It is equivalent to using PTRACE_CONT and then stopping at the next system call entry or exit.

In the previous example, we used PTRACE_PEEKUSER to look at the arguments of the write system call. When the system call returns, its return value is placed in %eax, which can be read in the same way.
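
One detail the examples gloss over is error handling. The PEEK requests return the word they read, so a return value of -1 can be either real data or an error; the ptrace(2) manual says to clear errno before the call and check it afterwards. Below is a minimal sketch of that pattern, assuming glibc on Linux. checked_peek is a hypothetical helper name, not something the article or the C library defines.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Read one word with a PEEK request (PTRACE_PEEKUSER, PTRACE_PEEKTEXT
 * or PTRACE_PEEKDATA). Returns 0 and stores the word in *out on
 * success; returns -1 and leaves *out untouched on failure. */
int checked_peek(int request, pid_t pid, void *addr, long *out)
{
    long word;

    errno = 0;                       /* -1 may be a legitimate word */
    word = ptrace(request, pid, addr, NULL);
    if (word == -1 && errno != 0)
        return -1;                   /* real error, e.g. ESRCH */
    *out = word;
    return 0;
}
```

In the tracer above, a call such as checked_peek(PTRACE_PEEKUSER, child, (void *)(4 * ORIG_EAX), &orig_eax) could replace the bare ptrace call.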

The status variable filled in by wait tells us whether the child has exited: it lets us distinguish a child that was stopped by ptrace from one that has finished running. A set of macros, such as WIFEXITED, decodes the status value; see the wait(2) man page for details.
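
The examples in this article only test WIFEXITED, so their loops would not notice a child that was killed by a signal. As a sketch of a fuller check (the enum and function names here are hypothetical, chosen for illustration), the standard macros from <sys/wait.h> can be combined like this:

```c
#include <sys/types.h>
#include <sys/wait.h>

enum child_state { CHILD_EXITED, CHILD_KILLED, CHILD_STOPPED, CHILD_OTHER };

/* Classify a status value filled in by wait() or waitpid(). */
enum child_state classify_status(int status)
{
    if (WIFEXITED(status))
        return CHILD_EXITED;    /* normal exit; WEXITSTATUS() is valid */
    if (WIFSIGNALED(status))
        return CHILD_KILLED;    /* ended by a signal; WTERMSIG() is valid */
    if (WIFSTOPPED(status))
        return CHILD_STOPPED;   /* stopped, e.g. by ptrace; WSTOPSIG() */
    return CHILD_OTHER;
}
```

A tracing loop would leave on either CHILD_EXITED or CHILD_KILLED and only issue further ptrace requests on CHILD_STOPPED.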



Read Register Value

If you want to read all the registers at a system call entry or exit, the approach in the previous example works but is cumbersome. Calling ptrace with PTRACE_GETREGS as the first argument fetches all the register values in a single call.

The code to fetch the register values looks like this:

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <sys/reg.h>       /* For EAX, ORIG_EAX */
    #include <sys/user.h>      /* For user_regs_struct
                                  (<linux/user.h> on older systems) */
    #include <sys/syscall.h>

    int main()
    {
        pid_t child;
        long orig_eax, eax;
        int status;
        int insyscall = 0;
        struct user_regs_struct regs;
        child = fork();
        if(child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl("/bin/ls", "ls", NULL);
        }
        else {
            while(1) {
                wait(&status);
                if(WIFEXITED(status))
                    break;
                orig_eax = ptrace(PTRACE_PEEKUSER,
                                  child, 4 * ORIG_EAX,
                                  NULL);
                if(orig_eax == SYS_write) {
                    if(insyscall == 0) {
                        /* Syscall entry: fetch all registers at once */
                        insyscall = 1;
                        ptrace(PTRACE_GETREGS, child,
                               NULL, &regs);
                        printf("Write called with "
                               "%ld, %ld, %ld\n",
                               regs.ebx, regs.ecx,
                               regs.edx);
                    }
                    else { /* Syscall exit */
                        eax = ptrace(PTRACE_PEEKUSER,
                                     child, 4 * EAX,
                                     NULL);
                        printf("Write returned "
                               "with %ld\n", eax);
                        insyscall = 0;
                    }
                }
                ptrace(PTRACE_SYSCALL, child,
                       NULL, NULL);
            }
        }
        return 0;
    }
This code is similar to the previous example except that it uses PTRACE_GETREGS to read the registers. The user_regs_struct structure is defined in <sys/user.h> (<linux/user.h> on older systems).
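
The register names above are specific to i386. As a hedged sketch of how the same idea might look on a current x86-64 system, the function below traces a child of its own and reads the system call number from orig_rax, the x86-64 counterpart of orig_eax. It assumes Linux with glibc; traced_syscall_number is a hypothetical helper, and it returns -1 wherever tracing is not available.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the number of the first system call a traced child makes
 * after stopping itself (it is set up to call getpid), or -1. */
long traced_syscall_number(void)
{
    long nr = -1;
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        kill(getpid(), SIGSTOP);      /* let the parent catch up */
        syscall(SYS_getpid);          /* the call we want to observe */
        _exit(0);
    }
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;                    /* child could not be traced */
    /* Run the child until its next system call entry. */
    if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == 0 &&
        waitpid(child, &status, 0) == child && WIFSTOPPED(status)) {
#if defined(__x86_64__) || defined(__i386__)
        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0) {
#if defined(__x86_64__)
            nr = (long)regs.orig_rax; /* x86-64 name for orig_eax */
#else
            nr = regs.orig_eax;
#endif
        }
#endif
    }
    kill(child, SIGKILL);             /* clean up the stopped child */
    waitpid(child, &status, 0);
    return nr;
}
```

The child stops itself with SIGSTOP rather than calling exec, which keeps the sketch independent of any particular program on disk.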

Single Step
ptrace can also single-step through a child's code. ptrace(PTRACE_SINGLESTEP, ...) tells the kernel to stop the child at each instruction and give control to the parent. The following example shows which instruction the child is executing at each step. To keep things easy to follow, I wrote the traced program in assembly rather than leaving you to puzzle over which system calls the C library functions actually make.

Here is the code of the traced program, dummy1.s. Compile it with gcc -o dummy1 dummy1.s
.data
hello:
    .string "hello world\n"
.text
.globl  main
main:
    movl    $4, %eax
    movl    $2, %ebx
    movl    $hello, %ecx
    movl    $12, %edx
    int     $0x80
    movl    $1, %eax
    xorl    %ebx, %ebx
    int     $0x80
    ret
 
The following program single-steps through dummy1:

    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <sys/user.h>      /* For user_regs_struct
                                  (<linux/user.h> on older systems) */
    #include <sys/syscall.h>
    int main()
    {
        pid_t child;
        child = fork();
        if(child == 0) {
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            execl("./dummy1", "dummy1", NULL);
        }
        else {
            int status;
            struct user_regs_struct regs;
            int start = 0;   /* set once the write syscall is seen */
            long ins;
            while(1) {
                wait(&status);
                if(WIFEXITED(status))
                    break;
                ptrace(PTRACE_GETREGS,
                       child, NULL, &regs);
                if(start == 1) {
                    /* Read the instruction word at the current EIP */
                    ins = ptrace(PTRACE_PEEKTEXT,
                                 child, regs.eip,
                                 NULL);
                    printf("EIP: %lx Instruction "
                           "executed: %lx\n",
                           regs.eip, ins);
                }
                if(regs.orig_eax == SYS_write) {
                    start = 1;
                    ptrace(PTRACE_SINGLESTEP, child,
                           NULL, NULL);
                }
                else
                    ptrace(PTRACE_SYSCALL, child,
                           NULL, NULL);
            }
        }
        return 0;
    }

For each instruction executed after the write system call, the program prints the EIP and the word of machine code read from that address.

You may need to consult Intel's manuals to make sense of these instruction bytes.

More complex single stepping, such as setting breakpoints, requires careful design and more complex code.
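
To get a feel for how fine-grained single stepping is, one can simply count the stops. The sketch below is an illustration under assumptions, not code from the original article: it assumes Linux with glibc, count_single_steps is a hypothetical helper, and the child stops itself with SIGSTOP instead of running dummy1 so that the example does not depend on an external program.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Single-step a traced child from a self-imposed stop until it exits.
 * Returns the number of steps taken, or -1 if tracing is unavailable. */
long count_single_steps(void)
{
    long steps = 0;
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        kill(getpid(), SIGSTOP);   /* hand control to the parent */
        _exit(0);                  /* the code being stepped through */
    }
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    while (WIFSTOPPED(status)) {
        if (ptrace(PTRACE_SINGLESTEP, child, NULL, NULL) == -1)
            break;
        if (waitpid(child, &status, 0) != child)
            break;
        steps++;
    }
    if (!WIFEXITED(status)) {      /* stepping failed part-way: clean up */
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    return steps;
}
```

The exact count depends on the C library and the architecture, which is why the function only reports it and does not interpret it.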


 
In the first part, we saw how ptrace can be used to trace the system calls of a child process and change their arguments. In this second part, we explore how to set breakpoints in a child process and insert code into a running program. This is essentially how debuggers set breakpoints and run debugging handlers. As before, all the code here is i386-specific.




Attach to process


In the first part, we used ptrace(PTRACE_TRACEME, ...) to trace a child process, which is fine if you only want to see how a process makes system calls or to trace a program from its start. To debug a process that is already running, you need ptrace(PTRACE_ATTACH, ...).

When ptrace(PTRACE_ATTACH, ...) is called with the pid of the process to be traced, the effect is roughly the same as if that process had called ptrace(PTRACE_TRACEME, ...). The traced process is sent a SIGSTOP signal, so we can examine and modify it. When we are done, ptrace(PTRACE_DETACH, ...) lets the process continue on its own.

Here is a small program that we will attach to:

    #include <stdio.h>
    #include <unistd.h>

    int main()
    {
        int i;
        for(i = 0;i < 10; ++i) {
            printf("My counter: %d\n", i);
            sleep(2);
        }
        return 0;
    }


Save the above code as dummy2.c. Compile and run it as follows:

gcc -o dummy2 dummy2.c
./dummy2 &
 
Now we can attach to dummy2 using the code below.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <sys/user.h>   /* For user_regs_struct etc.
                               (<linux/user.h> on older systems) */
    int main(int argc, char *argv[])
    {
        pid_t traced_process;
        struct user_regs_struct regs;
        long ins;
        if(argc != 2) {
            printf("Usage: %s <pid to be traced>\n",
                   argv[0]);
            exit(1);
        }
        traced_process = atoi(argv[1]);
        ptrace(PTRACE_ATTACH, traced_process,
               NULL, NULL);
        wait(NULL);   /* wait for the traced process to stop */
        ptrace(PTRACE_GETREGS, traced_process,
               NULL, &regs);
        ins = ptrace(PTRACE_PEEKTEXT, traced_process,
                     regs.eip, NULL);
        printf("EIP: %lx Instruction executed: %lx\n",
               regs.eip, ins);
        ptrace(PTRACE_DETACH, traced_process,
               NULL, NULL);
        return 0;
    }

The above program simply attaches to a process, waits for it to stop, reads its EIP (instruction pointer) and the instruction word at that address, and then detaches.
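
Attaching can fail, for example when the pid does not exist or the caller lacks permission, and the program above would carry on regardless. The following is a minimal sketch of the same attach, wait, detach sequence with the return values checked. attach_and_detach is a hypothetical helper name, and the code assumes Linux with glibc.

```c
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to pid, wait for the resulting stop, then detach again.
 * Returns 0 on success and -1 on failure (errno is set by the call
 * that failed). */
int attach_and_detach(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;                 /* e.g. EPERM or ESRCH */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    /* The traced process is stopped here and could be inspected. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    return 0;
}
```

After a successful call the target should keep running as if nothing had happened, which is the behavior the dummy2 counter makes easy to observe.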

Set Breakpoint

How do debuggers set breakpoints? Generally, they replace the instruction about to be executed with a trap instruction, so that the traced program stops when it reaches that point and the debugger can examine it. When the traced program is allowed to continue, the debugger puts the original instruction back. Here is an example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <sys/user.h>   /* For user_regs_struct
                               (<linux/user.h> on older systems) */

    const int long_size = sizeof(long);

    /* Copy len bytes from the child's memory at addr into str */
    void getdata(pid_t child, long addr,
                 char *str, int len)
    {
        char *laddr;
        int i, j;
        union u {
                long val;
                char chars[sizeof(long)];
        }data;

        i = 0;
        j = len / long_size;
        laddr = str;

        while(i < j) {
            data.val = ptrace(PTRACE_PEEKDATA, child,
                              addr + i * 4, NULL);
            memcpy(laddr, data.chars, long_size);
            ++i;
            laddr += long_size;
        }
        j = len % long_size;
        if(j != 0) {
            data.val = ptrace(PTRACE_PEEKDATA, child,
                              addr + i * 4, NULL);
            memcpy(laddr, data.chars, j);
        }
        str[len] = '\0';
    }

    /* Copy len bytes from str into the child's memory at addr */
    void putdata(pid_t child, long addr,
                 char *str, int len)
    {
        char *laddr;
        int i, j;
        union u {
                long val;
                char chars[sizeof(long)];
        }data;

        i = 0;
        j = len / long_size;
        laddr = str;
        while(i < j) {
            memcpy(data.chars, laddr, long_size);
            ptrace(PTRACE_POKEDATA, child,
                   addr + i * 4, data.val);
            ++i;
            laddr += long_size;
        }
        j = len % long_size;
        if(j != 0) {
            /* Read the existing word first so the bytes past
               len are written back unchanged */
            data.val = ptrace(PTRACE_PEEKDATA, child,
                              addr + i * 4, NULL);
            memcpy(data.chars, laddr, j);
            ptrace(PTRACE_POKEDATA, child,
                   addr + i * 4, data.val);
        }
    }

    int main(int argc, char *argv[])
    {
        pid_t traced_process;
        struct user_regs_struct regs;
        /* int 0x80, int3 */
        char code[] = {0xcd,0x80,0xcc,0};
        char backup[4];
        if(argc != 2) {
            printf("Usage: %s <pid to be traced>\n",
                   argv[0]);
            exit(1);
        }
        traced_process = atoi(argv[1]);
        ptrace(PTRACE_ATTACH, traced_process,
               NULL, NULL);
        wait(NULL);
        ptrace(PTRACE_GETREGS, traced_process,
               NULL, &regs);
        /* Copy instructions into a backup variable */
        getdata(traced_process, regs.eip, backup, 3);
        /* Put the breakpoint */
        putdata(traced_process, regs.eip, code, 3);
        /* Let the process continue and execute
           the int 3 instruction */
        ptrace(PTRACE_CONT, traced_process, NULL, NULL);
        wait(NULL);
        printf("The process stopped, putting back "
               "the original instructions\n");
        printf("Press <enter> to continue\n");
        getchar();
        putdata(traced_process, regs.eip, backup, 3);
        /* Setting the eip back to the original
           instruction to let the process continue */
        ptrace(PTRACE_SETREGS, traced_process,
               NULL, &regs);
        ptrace(PTRACE_DETACH, traced_process,
               NULL, NULL);
        return 0;
    }

The program above overwrites three bytes at the current instruction pointer with code that ends in a trap instruction. Once the traced process hits the trap and stops, we put the original instructions back and reset eip to its original value. The steps are:

1. The process is stopped.
2. The trap instruction bytes are written.
3. The trap is hit and control is given to the tracing program.
4. The original instructions are put back, eip is restored, and the process continues.
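
Because ptrace reads and writes whole words, planting a breakpoint is a read-modify-write operation on one word of the traced program's code. The sketch below isolates that step as plain functions that need no tracing at all. It is an illustration only: patch_int3 and restore_first_byte are hypothetical names, and it plants just the single int3 byte (0xcc) rather than the three bytes used in the program above.

```c
#include <string.h>

#define INT3_OPCODE 0xcc   /* one-byte x86 trap instruction */

/* Return a copy of word whose first byte in memory is replaced by int3.
 * The other bytes are kept, so neighbouring code is not clobbered. */
long patch_int3(long word)
{
    unsigned char bytes[sizeof(long)];

    memcpy(bytes, &word, sizeof word);
    bytes[0] = INT3_OPCODE;
    memcpy(&word, bytes, sizeof word);
    return word;
}

/* Undo patch_int3 by copying the first byte back from the saved word. */
long restore_first_byte(long patched, long original)
{
    memcpy(&patched, &original, 1);
    return patched;
}
```

A debugger would read the word with PTRACE_PEEKTEXT, write patch_int3(word) with PTRACE_POKETEXT, and later write the saved word back once the trap has been hit.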

 

ptrace behind the scenes

So what happens inside the kernel when ptrace is used? Here is a brief explanation:

When a process calls ptrace(PTRACE_TRACEME, ...), the kernel sets a flag on the process indicating that it is being traced. The relevant kernel source is shown below:

Source: arch/i386/kernel/ptrace.c
if (request == PTRACE_TRACEME) {
    /* are we already being traced? */
    if (current->ptrace & PT_PTRACED)
        goto out;
    /* set the ptrace bit in the process flags. */
    current->ptrace |= PT_PTRACED;
    ret = 0;
    goto out;
}
On a system call entry, the kernel checks this flag and calls the trace system call if the process is being traced. The assembly details can be found in arch/i386/kernel/entry.S.

Now we are in the sys_trace() function, defined in arch/i386/kernel/ptrace.c. It stops the child and sends a signal to the parent to notify it that the child has stopped. This wakes up the waiting parent, which then does its ptrace work. When the parent is done and calls ptrace(PTRACE_CONT, ...) or ptrace(PTRACE_SYSCALL, ...), the kernel wakes up the child by calling the scheduler function wake_up_process(). Other architectures may implement this by sending SIGCHLD to the child.

Summary:
ptrace may seem like strange magic, because it can examine and modify a running program. It is used mainly by debuggers and system call tracers, but it also lets programmers do more interesting things at the user level. There have been many attempts to extend the operating system at the user level, such as UFO, a user-level extension to file systems, and ptrace is also used to implement security mechanisms.


 
Author:
Pradeep Padala,
p_padala@yahoo.com
http://www.cise.ufl.edu/~ppadala

Posted by juancarlosc on Sun, 02 Jun 2019 09:58:40 -0700