trace series 4 - kretprobe learning notes

0. Preface

These notes are mainly based on hands-on practice on aarch64 while following the video course "Implementation Principle and Application of Linux Kernel Tracers" from Code Reading Field. The goal is to understand how tracing works by observing how the hook function is created and how the probed instruction is replaced. As before, blk_update_request is used as the example to illustrate how kretprobe works. The kretprobe studied here is the one implemented on top of trace events, using the ftrace framework.

1. General principle of kprobe

2. kretprobe domain model

Same as in "trace series 3 - kretprobe learning notes".
kretprobe_instance: records the original return address and the kretprobe it belongs to. Each kretprobe_instance starts out linked on its kretprobe's free_instances list. Once a kretprobe_instance is initialized, it is removed from the free_instances list and re-linked onto the global kretprobe_inst_table hash list.
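
The life cycle of a kretprobe_instance can be sketched in user-space C. This is a simplified model and not kernel code: the names model_kretprobe, model_instance, model_register, model_take and model_recycle are made up for illustration, the kernel's hlist and locking are replaced by plain singly linked lists, and the global table is a single list instead of a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_kretprobe;

struct model_instance {
    struct model_instance *next;   /* links into the free list or the active list */
    struct model_kretprobe *rp;    /* owning kretprobe */
    void *ret_addr;                /* original return address */
};

struct model_kretprobe {
    int maxactive;
    int nmissed;
    struct model_instance *free_instances;
};

/* Stand-in for the global kretprobe_inst_table (one bucket only). */
static struct model_instance *model_active;

/* Pre-allocate maxactive instances on the free list, as registration does. */
int model_register(struct model_kretprobe *rp, int maxactive)
{
    rp->maxactive = maxactive;
    rp->nmissed = 0;
    rp->free_instances = NULL;
    for (int i = 0; i < maxactive; i++) {
        struct model_instance *inst = calloc(1, sizeof(*inst));
        if (inst == NULL)
            return -1;             /* sketch only: earlier instances are not freed */
        inst->next = rp->free_instances;
        rp->free_instances = inst;
    }
    return 0;
}

/* Function entry: move one instance from the free list to the active list. */
struct model_instance *model_take(struct model_kretprobe *rp, void *ret_addr)
{
    struct model_instance *inst = rp->free_instances;

    if (inst == NULL) {
        rp->nmissed++;             /* no free instance left: this hit is missed */
        return NULL;
    }
    rp->free_instances = inst->next;
    inst->rp = rp;
    inst->ret_addr = ret_addr;
    inst->next = model_active;
    model_active = inst;
    return inst;
}

/* Function return: unlink from the active list and put back on the free list. */
void model_recycle(struct model_instance *inst)
{
    struct model_instance **pp = &model_active;

    while (*pp != NULL && *pp != inst)
        pp = &(*pp)->next;
    assert(*pp == inst);           /* the instance must currently be active */
    *pp = inst->next;
    inst->next = inst->rp->free_instances;
    inst->rp->free_instances = inst;
}
```

With maxactive set to 2, a third concurrent model_take finds the free list empty, returns NULL and only increments nmissed, which is the situation the nmissed field of struct kretprobe is there to count.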

3. kretprobe creation

The kretprobe is created when the following command is executed:

# echo 'r:blk_update blk_update_request $retval' > /sys/kernel/debug/tracing/kprobe_events

This mainly goes through create_or_delete_trace_kprobe. The most important step is setting pre_handler to pre_handler_kretprobe; the print format is set up at the same time, and the trace_kprobe is registered. The main differences from kprobe creation are:

|-- rp->kp.pre_handler initialization

create_or_delete_trace_kprobe -> 
    trace_kprobe_create ->
        register_trace_kprobe -> 

Further down this path register_kretprobe is called, which initializes pre_handler to pre_handler_kretprobe:

int register_kretprobe(struct kretprobe *rp)
{
        int ret = 0;
        struct kretprobe_instance *inst;
        int i;
        void *addr;

        if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
                return -EINVAL;

        if (kretprobe_blacklist_size) {
                addr = kprobe_addr(&rp->kp);
                if (IS_ERR(addr))
                        return PTR_ERR(addr);

                for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
                        if (kretprobe_blacklist[i].addr == addr)
                                return -EINVAL;
                }
        }

        //Initialize the pre_handler callback
        rp->kp.pre_handler = pre_handler_kretprobe;
        rp->kp.post_handler = NULL;
        rp->kp.fault_handler = NULL;

        /* Pre-allocate memory for max kretprobe instances */
        if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPTION
                //Evaluates to 10 in this example
                rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
#else
                rp->maxactive = num_possible_cpus();
#endif
        }
        //In this example rp->maxactive is 10, so the loop creates 10 kretprobe_instance
        //objects and links them onto the kretprobe's free_instances list.
        //This shows that one kretprobe can have multiple kretprobe_instance objects.
        for (i = 0; i < rp->maxactive; i++) {
                inst = kmalloc(sizeof(struct kretprobe_instance) +
                               rp->data_size, GFP_KERNEL);
                if (inst == NULL) {
                        free_rp_inst(rp);
                        return -ENOMEM;
                }
                hlist_add_head(&inst->hlist, &rp->free_instances);
        }

        rp->nmissed = 0;
        /* Establish function entry probe point */
        ret = register_kprobe(&rp->kp);
        if (ret != 0)
                free_rp_inst(rp);
        return ret;
}
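
The remark that maxactive ends up as 10 here can be checked with a small stand-alone function. This is only a sketch: default_maxactive is a made-up helper, and it assumes that the two assignments to rp->maxactive in the listing are build-time alternatives selected by the kernel's preemption configuration, modelled here by the preemptible parameter.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the default chosen when the caller leaves
 * maxactive <= 0. "preemptible" models the build-time preemption setting.
 */
int default_maxactive(int requested, int preemptible, unsigned int possible_cpus)
{
    assert(possible_cpus > 0);

    if (requested > 0)
        return requested;          /* an explicit value is kept as is */

    if (preemptible) {
        unsigned int twice = 2 * possible_cpus;
        return (int)(twice > 10 ? twice : 10);   /* max(10, 2 * cpus) */
    }
    return (int)possible_cpus;
}
```

Under this model a preemptible kernel with up to 5 possible CPUs gets the floor value of 10, which would match the value observed in this example.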

|-- kretprobe.handler initialization

create_or_delete_trace_kprobe -> 
    trace_kprobe_create ->
        alloc_trace_kprobe: allocates the trace_kprobe; the key step is initializing the kretprobe's handler to kretprobe_dispatcher

4. kretprobe brk instruction replacement

First, the disassembly of blk_update_request before the instruction is replaced:

Dump of assembler code for function blk_update_request:
   0xffff8000104ec1f0 <+0>:     sub     sp, sp, #0x60
   0xffff8000104ec1f4 <+4>:     stp     x29, x30, [sp,#16]
   0xffff8000104ec1f8 <+8>:     add     x29, sp, #0x10
   0xffff8000104ec1fc <+12>:    stp     x19, x20, [sp,#32]
   0xffff8000104ec200 <+16>:    stp     x21, x22, [sp,#48]
   0xffff8000104ec204 <+20>:    stp     x23, x24, [sp,#64]
   0xffff8000104ec208 <+24>:    str     x25, [sp,#80]
   0xffff8000104ec20c <+28>:    mov     x22, x0
   0xffff8000104ec210 <+32>:    uxtb    w24, w1
   0xffff8000104ec214 <+36>:    mov     w21, w2
   0xffff8000104ec218 <+40>:    mov     x0, x30
   0xffff8000104ec21c <+44>:    nop

After executing the following command:

# echo 1 > /sys/kernel/debug/tracing/events/kprobes/blk_update/enable

let's look at the disassembly of blk_update_request again:

(gdb) disassemble blk_update_request
Dump of assembler code for function blk_update_request:
   0xffff8000104ec1f0 <+0>:     brk     #0x4
   0xffff8000104ec1f4 <+4>:     stp     x29, x30, [sp,#16]
   0xffff8000104ec1f8 <+8>:     add     x29, sp, #0x10
   0xffff8000104ec1fc <+12>:    stp     x19, x20, [sp,#32]
   0xffff8000104ec200 <+16>:    stp     x21, x22, [sp,#48]
   0xffff8000104ec204 <+20>:    stp     x23, x24, [sp,#64]
   0xffff8000104ec208 <+24>:    str     x25, [sp,#80]
   0xffff8000104ec20c <+28>:    mov     x22, x0
   0xffff8000104ec210 <+32>:    uxtb    w24, w1
   0xffff8000104ec214 <+36>:    mov     w21, w2
   0xffff8000104ec218 <+40>:    mov     x0, x30
   0xffff8000104ec21c <+44>:    nop
   0xffff8000104ec220 <+48>:    mov     w0, w24
   0xffff8000104ec224 <+52>:    bl      0xffff8000104e92ec <blk_status_to_errno>
   0xffff8000104ec228 <+56>:    nop

We can see that after the above operation, the instruction at the entry of blk_update_request

sub     sp, sp, #0x60

has been replaced with:

0xffff8000104ec1f0 <+0>:     brk     #0x4

Not surprisingly, this is the same as for kprobe. Enabling mainly calls the following function, and the enable_kprobe it wraps is the same one used to enable a kprobe:

static inline int enable_kretprobe(struct kretprobe *rp)
{
        return enable_kprobe(&rp->kp);
}
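
To make the replaced instruction more concrete, here is a small sketch of how an A64 BRK instruction is encoded: the 16-bit immediate sits in bits 20:5 of the fixed pattern 0xd4200000. The helper names a64_brk_encode, a64_is_brk and a64_brk_imm are made up for this example and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define A64_BRK_BASE 0xd4200000u   /* encoding of "brk #0" */
#define A64_BRK_MASK 0xffe0001fu   /* fixed bits of a BRK instruction */

/* Build the 32-bit encoding of "brk #imm". */
uint32_t a64_brk_encode(uint16_t imm)
{
    return A64_BRK_BASE | ((uint32_t)imm << 5);
}

/* Return non-zero if insn is a BRK instruction. */
int a64_is_brk(uint32_t insn)
{
    return (insn & A64_BRK_MASK) == A64_BRK_BASE;
}

/* Extract the immediate from a BRK instruction. */
uint16_t a64_brk_imm(uint32_t insn)
{
    assert(a64_is_brk(insn));
    return (uint16_t)((insn >> 5) & 0xffffu);
}
```

With this encoding, brk #0x4 is the word 0xd4200080 and brk #0x6 is 0xd42000c0, while the original sub sp, sp, #0x60 (0xd10183ff) is not recognized as a BRK.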

5. Implementation of kretprobe hook function

A kretprobe follows the same execution path as a kprobe. When the probe is hit, the following path is taken; the only difference is which pre_handler gets executed:

#0  kprobe_handler (regs=0xffff80001253bcf0) at arch/arm64/kernel/probes/kprobes.c:352
#1  kprobe_breakpoint_handler (regs=0xffff80001253bcf0, esr=<optimized out>) at arch/arm64/kernel/probes/kprobes.c:404
#2  0xffff8000100148c4 in call_break_hook (regs=regs@entry=0xffff80001253bcf0, esr=esr@entry=4060086276) at arch/arm64/kernel/debug-monitors.c:322
#3  0xffff800010014a00 in brk_handler (unused=<optimized out>, esr=4060086276, regs=0xffff80001253bcf0) at arch/arm64/kernel/debug-monitors.c:329
#4  0xffff800010036180 in do_debug_exception (addr_if_watchpoint=addr_if_watchpoint@entry=5651652, esr=esr@entry=4060086276, regs=regs@entry=0xffff80001253bcf0) at arch/arm64/mm/fault.c:848
#5  0xffff800010cad220 in el1_dbg (regs=0xffff80001253bcf0, esr=4060086276) at arch/arm64/kernel/entry-common.c:190
#6  0xffff800010cad468 in el1_sync_handler (regs=<optimized out>) at arch/arm64/kernel/entry-common.c:227
#7  0xffff8000100119bc in el1_sync () at arch/arm64/kernel/entry.S:627
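
The esr value in the backtrace already tells which breakpoint fired. The sketch below decodes it in user space; the helper names are made up, while the field layout (exception class in bits 31:26, BRK immediate in the low 16 bits, class 0x3c for a BRK executed in AArch64 state) follows the Arm architecture.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3fu
#define ESR_EC_BRK64      0x3cu     /* BRK instruction executed in AArch64 state */
#define ESR_BRK_IMM_MASK  0xffffu   /* the "comment" field: the BRK immediate */

/* Exception class of a syndrome value. */
unsigned int esr_ec(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* Non-zero if the exception was caused by an AArch64 BRK. */
int esr_is_brk64(uint32_t esr)
{
    return esr_ec(esr) == ESR_EC_BRK64;
}

/* Immediate of the BRK that caused the exception. */
uint16_t esr_brk_imm(uint32_t esr)
{
    assert(esr_is_brk64(esr));
    return (uint16_t)(esr & ESR_BRK_IMM_MASK);
}
```

For esr=4060086276 (0xf2000004) this gives exception class 0x3c and immediate 0x4, so the backtrace was taken on the brk #0x4 placed at the entry of blk_update_request, not on the single-step brk #0x6.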

|-- pre_handler_kretprobe

For kretprobe, the pre_handler_kretprobe callback is executed:

int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
    |--struct kretprobe_instance *ri = NULL, *last = NULL;
    |--struct kretprobe *rp = container_of(p, struct kretprobe, kp);
    |--hash = hash_ptr(current, KPROBE_HASH_BITS);
    |--if (!hlist_empty(&rp->free_instances))
           //Take a free kretprobe_instance from the kretprobe's free_instances list
           ri = hlist_entry(rp->free_instances.first,struct kretprobe_instance, hlist);
           //Remove this instance from the kretprobe's free_instances list
           hlist_del(&ri->hlist);
           //Initialize the free kretprobe_instance that was found
           ri->rp = rp;
           ri->task = current;
           arch_prepare_kretprobe(ri, regs);
           //Link the initialized kretprobe_instance onto the global kretprobe_inst_table hash list
           hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);

pre_handler_kretprobe in turn calls arch_prepare_kretprobe:

arch_prepare_kretprobe(struct kretprobe_instance *ri,struct pt_regs *regs)
    |  //Save the original return address in the kretprobe_instance, i.e. the address in blk_mq_end_request to return to.
    |  //It is used to restore the original execution path when returning from kretprobe_trampoline
    |--ri->ret_addr = (kprobe_opcode_t *)regs->regs[30];
    |  //Record the stack pointer at function entry
    |--ri->fp = (void *)kernel_stack_pointer(regs);
    |  /* replace return addr (x30) with trampoline */
    |  //Update the return address so that kretprobe_trampoline is executed when blk_update_request returns
    |--regs->regs[30] = (long)&kretprobe_trampoline;

|-- setup_singlestep

setup_singlestep(p, regs, kcb, 0)
    |--unsigned long slot;
    |--kcb->kprobe_status = KPROBE_HIT_SS;
    |--if (p->ainsn.api.insn)
           //slot holds a copy of the entry instruction of blk_update_request: sub sp, sp, #0x60
           slot = (unsigned long)p->ainsn.api.insn;
           set_ss_context(kcb, slot);
               |--kcb->ss_ctx.ss_pending = true;
               |  //kcb->ss_ctx.match_addr is the address right after that instruction in the slot, where brk #0x6 is stored
               |--kcb->ss_ctx.match_addr = addr + sizeof(kprobe_opcode_t);
           kprobes_save_local_irqflag(kcb, regs);
           instruction_pointer_set(regs, slot);
               |  //Set regs->pc to val; val is slot here, whose instruction is sub sp, sp, #0x60
               |--regs->pc = val

instruction_pointer_set sets the pc that the breakpoint exception returns to, so that it points at the original entry instruction of blk_update_request. When the breakpoint exception returns, that original entry instruction is executed (note: from its copy at p->ainsn.api.insn, not from its original memory address). Since the slot also contains a breakpoint instruction, brk #0x6, right after it, brk #0x6 is executed next.

|-- brk #0x6

0xffff800012533000      sub    sp, sp, #0x60
0xffff800012533004      brk    #0x6

After the instruction in the slot has executed, brk #0x6 traps into the breakpoint exception again. On exit from that exception, execution continues along the original path of blk_update_request. Up to this point there is no difference from kprobe, until the function returns: because pre_handler_kretprobe -> arch_prepare_kretprobe replaced the return address, blk_update_request does not return to its original return address but to the one that was set, namely kretprobe_trampoline.
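
The pc bookkeeping around the slot can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the kernel implementation: struct ss_model and the two functions are made up, and the model only tracks addresses, it does not execute any instruction.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_SIZE 4u   /* every A64 instruction is 4 bytes long */

/* Hypothetical model of the single-step state kept per probe hit. */
struct ss_model {
    uint64_t match_addr;   /* address where the brk #0x6 in the slot traps */
    uint64_t restore;      /* pc to resume at once the single step is done */
    int ss_pending;
};

/* brk #0x4 side: redirect execution into the slot. Returns the new pc. */
uint64_t model_setup_singlestep(struct ss_model *m, uint64_t probe_addr,
                                uint64_t slot)
{
    assert(probe_addr % INSN_SIZE == 0 && slot % INSN_SIZE == 0);
    m->match_addr = slot + INSN_SIZE;        /* brk #0x6 follows the copied insn */
    m->restore = probe_addr + INSN_SIZE;     /* instruction after the probe point */
    m->ss_pending = 1;
    return slot;
}

/*
 * brk #0x6 side: if the trap belongs to the pending single step, return the
 * pc to resume at; otherwise return 0 to say "not ours".
 */
uint64_t model_ss_trap(struct ss_model *m, uint64_t trap_pc)
{
    if (!m->ss_pending || trap_pc != m->match_addr)
        return 0;
    m->ss_pending = 0;
    return m->restore;
}
```

Fed with the addresses from this example (probe point 0xffff8000104ec1f0, slot 0xffff800012533000), the model expects the second trap at 0xffff800012533004 and resumes at 0xffff8000104ec1f4, the stp instruction that follows the probe point.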

|-- kretprobe_trampoline

        //kretprobe_trampoline effectively takes over the stack of blk_update_request (like a dove occupying the magpie's nest)
        //Reserve stack space to hold a struct pt_regs
        sub sp, sp, #S_FRAME_SIZE
        //Save the registers into pt_regs
        save_all_base_regs
        //Pass the top of the stack, i.e. the struct pt_regs pointer, in x0
        mov x0, sp
        bl trampoline_probe_handler
        /*
         * Replace trampoline address in lr with actual orig_ret_addr return
         * address.
         */
        mov lr, x0
        restore_all_base_regs
        add sp, sp, #S_FRAME_SIZE
        ret

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
        return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
                                        (void *)kernel_stack_pointer(regs));
}

static nokprobe_inline
unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
                                void *trampoline_address,
                                void *frame_pointer)
{
        unsigned long ret;
        /*
         * Set a dummy kprobe for avoiding kretprobe recursion.
         * Since kretprobe never runs in kprobe handler, no kprobe must
         * be running at this point.
         */
        kprobe_busy_begin();
        ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
        kprobe_busy_end();

        return ret;
}

__kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer)
    |--struct kretprobe_instance *ri = NULL, *last = NULL;
    |  struct hlist_head *head;
    |--kprobe_opcode_t *correct_ret_addr = NULL;
    |--kretprobe_hash_lock(current, &head, &flags);
    |      |  //The initialized kretprobe_instance objects are linked on the kretprobe_inst_table hash list
    |      |--*head = &kretprobe_inst_table[hash];
    |  //Walk the initialized kretprobe_instance objects linked on the kretprobe_inst_table hash list
    |--hlist_for_each_entry(ri, head, hlist)
    |       //Find the kretprobe_instance whose recorded frame pointer matches; skip the others
    |       if (ri->fp != frame_pointer)
    |           skipped = true;
    |           continue;
    |       //Get the original return address, so that execution returns to blk_mq_end_request once the trampoline has finished
    |       correct_ret_addr = ri->ret_addr;
    |       if (correct_ret_addr != trampoline_address)
    |           break;
    |--last = ri
    |--hlist_for_each_entry_safe(ri, tmp, head, hlist)
    |      if (ri->task != current)
    |          continue;
    |      if (ri->fp != frame_pointer)
    |          continue;
    |      if (ri->rp && ri->rp->handler)
    |          struct kprobe *prev = kprobe_running()
    |          ri->ret_addr = correct_ret_addr;
    |          ri->rp->handler(ri, regs);
    |      recycle_rp_inst(ri);
    \--return (unsigned long)correct_ret_addr;
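
The lookup of the real return address can be tried out with a user-space model. It is a deliberately simplified sketch: all names (model_ri, model_prepare, model_trampoline_handler, MODEL_TRAMPOLINE) are made up, tasks and addresses are plain integers, and the kernel's two passes over the list are collapsed into one, so the handler call and the recycling are only represented by a counter and an unlink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_TRAMPOLINE ((uintptr_t)0x1000)   /* hypothetical trampoline address */

struct model_ri {
    struct model_ri *next;   /* newest entry first, like hlist_add_head */
    int task;                /* stand-in for struct task_struct * */
    uintptr_t fp;            /* stack pointer recorded at function entry */
    uintptr_t ret_addr;      /* return address saved at function entry */
};

/* Entry side: remember lr and return the value to place in lr instead. */
uintptr_t model_prepare(struct model_ri **head, struct model_ri *ri,
                        int task, uintptr_t sp, uintptr_t lr)
{
    ri->task = task;
    ri->fp = sp;
    ri->ret_addr = lr;
    ri->next = *head;
    *head = ri;
    return MODEL_TRAMPOLINE;
}

/* Return side: consume the matching instances and find the real return address. */
uintptr_t model_trampoline_handler(struct model_ri **head, int task,
                                   uintptr_t fp, int *handled)
{
    struct model_ri **pp = head;
    uintptr_t correct = 0;

    while (*pp != NULL) {
        struct model_ri *ri = *pp;

        if (ri->task != task || ri->fp != fp) {
            pp = &ri->next;            /* not ours: leave it linked */
            continue;
        }
        *pp = ri->next;                /* unlink: stands in for recycle_rp_inst */
        (*handled)++;                  /* stands in for ri->rp->handler(ri, regs) */
        if (ri->ret_addr != MODEL_TRAMPOLINE) {
            correct = ri->ret_addr;    /* the real return address */
            break;
        }
    }
    assert(correct != 0);              /* a real return address must exist */
    return correct;
}
```

The check against the trampoline address matters when a saved return address is itself the trampoline, for example when one probed function tail-calls another: both instances then share the same frame pointer, both are handled by a single pass through the trampoline, and only the older one carries the real return address.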

As initialized earlier, ri->rp->handler is kretprobe_dispatcher:

static int
kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        struct trace_kprobe *tk = container_of(ri->rp, struct trace_kprobe, rp);

        if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
                kretprobe_trace_func(tk, ri, regs);
        if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
                kretprobe_perf_func(tk, ri, regs);
        return 0;       /* We don't tweek kernel, so just return 0 */
}

static void
kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
                     struct pt_regs *regs)
{
        struct event_file_link *link;

        trace_probe_for_each_link_rcu(link, &tk->tp)
                __kretprobe_trace_func(tk, ri, regs, link->file);
}

static nokprobe_inline void
__kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
                       struct pt_regs *regs,
                       struct trace_event_file *trace_file)
{
        struct kretprobe_trace_entry_head *entry;
        struct trace_event_buffer fbuffer;
        struct trace_event_call *call = trace_probe_event_call(&tk->tp);
        int dsize;

        WARN_ON(call != trace_file->event_call);

        if (trace_trigger_soft_disabled(trace_file))
                return;

        fbuffer.pc = preempt_count();
        fbuffer.trace_file = trace_file;

        dsize = __get_data_size(&tk->tp, regs);
        fbuffer.event =
                trace_event_buffer_lock_reserve(&fbuffer.buffer, trace_file,
                                        sizeof(*entry) + tk->tp.size + dsize,
                                        fbuffer.flags, fbuffer.pc);
        if (!fbuffer.event)
                return;

        fbuffer.regs = regs;
        entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
        entry->func = (unsigned long)tk->;
        //Record the original return address as ret_ip
        entry->ret_ip = (unsigned long)ri->ret_addr;
        //Store the argument values
        store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);
        //Write to the ring buffer
        trace_event_buffer_commit(&fbuffer);
}

After returning from kretprobe_trampoline, execution continues along the original path in blk_mq_end_request.

6. Summary

Let's briefly summarize the kretprobe workflow:

  1. Register the kretprobe
    This is mainly done by writing a command to the /sys/kernel/debug/tracing/kprobe_events node. The process will:
    (1) Complete the registration of the kretprobe. The most important part is initializing the pre_handler callback to pre_handler_kretprobe. It is called from the brk #0x4 breakpoint handler, and its main job is to save the original return address of blk_update_request and to set the temporary return target to kretprobe_trampoline;
    (2) Save the original instruction at the probe point of the probed function, followed by a brk #0x6 breakpoint instruction. Both are stored in the slot. Later, when the brk #0x4 that replaced the original instruction returns from its exception, the code in the slot is executed first;
    (3) Record the address of the instruction that follows the probe point. Execution resumes there when returning from brk #0x6, which restores the original instruction path;

  2. Breakpoint instrumentation
    This is mainly done by echo 1 > /sys/kernel/debug/tracing/events/kprobes/blk_update/enable. It replaces the instruction at the probe point of the probed function with brk #0x4.
    Note: brk #0x4 and brk #0x6 are dispatched to different breakpoint handling callbacks.

  3. Execute the kretprobe callback
    When execution reaches the probe point of the probed function, the brk breakpoint instruction is executed and causes a breakpoint exception. Based on the immediate 0x4, the matching breakpoint handling callback runs and eventually calls the pre_handler (pre_handler_kretprobe), whose main job is to save the original return address into blk_mq_end_request and install the temporary one.
    Next, the instructions in the slot initialized in step 1 are executed. The first instruction in the slot is the one the probed function originally executed; then brk #0x6 is executed and traps into a breakpoint exception once more. This time, based on the immediate 0x6, the single-step breakpoint handler runs. It sets the PC to the instruction address recorded in step 1 (3), so that when brk #0x6 returns, execution continues with the instructions after the probe point and the normal instruction path is restored.
    When blk_update_request returns, it jumps to the temporary return address kretprobe_trampoline, which performs the kretprobe's work and then changes the return address back to the original one in blk_mq_end_request. When kretprobe_trampoline returns, execution continues in blk_mq_end_request at the original return address.


struct kretprobe {
        struct kprobe kp;
        kretprobe_handler_t handler;
        kretprobe_handler_t entry_handler;
        int maxactive;
        int nmissed;
        size_t data_size;
        struct hlist_head free_instances;
        raw_spinlock_t lock;
};

struct kretprobe_instance {
        union {
                struct hlist_node hlist;
                struct rcu_head rcu;
        };
        struct kretprobe *rp;
        //Save original return address
        kprobe_opcode_t *ret_addr;
        struct task_struct *task;
        void *fp;
        char data[];
};

Posted by inrealtime on Mon, 01 Nov 2021 01:58:36 -0700