0. Preface
This article is based mainly on hands-on practice on aarch64 while following the video course "implementation principle and application of Linux kernel trackers" from code reading field. The aim is to understand how tracing works by observing how the hook function is created and how the probed instruction is replaced. The blk_update_request function is used as an example to illustrate the working principle of kretprobe. The kretprobe here is implemented on top of trace events and uses the ftrace framework.
1. General principle of kprobe
2. kretprobe domain model
kretprobe_instance: records the original return address and the kretprobe it belongs to. Each instance is initially linked into the free_instances list of its kretprobe. When a kretprobe_instance is initialized, it is removed from the free_instances list and linked into the global kretprobe_inst_table hash table.
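The life cycle described above (free list, then global table, then back to the free list) can be sketched in user space. This is only a simplified model, not kernel code: the names probe, instance, probe_hit, probe_return, MAXACTIVE and NBUCKETS are hypothetical, a plain singly linked list stands in for hlist, and locking is left out. The nmissed counter models what happens when the free list is empty.

```c
#include <assert.h>
#include <stddef.h>

#define MAXACTIVE 4   /* hypothetical pool size, stands in for rp->maxactive */
#define NBUCKETS  8   /* hypothetical bucket count of the global table */

struct instance {
    struct instance *next;  /* plain singly linked list instead of hlist */
    void *ret_addr;         /* original return address of the probed call */
};

struct probe {
    struct instance pool[MAXACTIVE];
    struct instance *free_list;  /* models kretprobe.free_instances */
    int nmissed;                 /* hits dropped because the pool was empty */
};

/* Models the global kretprobe_inst_table; buckets are keyed by task here. */
static struct instance *active_table[NBUCKETS];

/* Put every pre-allocated instance on the free list. */
static void probe_init(struct probe *p)
{
    p->free_list = NULL;
    p->nmissed = 0;
    for (size_t i = 0; i < MAXACTIVE; i++) {
        p->pool[i].ret_addr = NULL;
        p->pool[i].next = p->free_list;
        p->free_list = &p->pool[i];
    }
}

/* Function entry: move one instance from the free list to the active table. */
static struct instance *probe_hit(struct probe *p, unsigned int task, void *ret_addr)
{
    struct instance *ri = p->free_list;

    if (ri == NULL) {
        p->nmissed++;            /* no free instance: this hit is not traced */
        return NULL;
    }
    p->free_list = ri->next;
    ri->ret_addr = ret_addr;
    ri->next = active_table[task % NBUCKETS];
    active_table[task % NBUCKETS] = ri;
    return ri;
}

/* Function return: unlink the instance and give it back to the free list. */
static void probe_return(struct probe *p, unsigned int task, struct instance *ri)
{
    struct instance **pp = &active_table[task % NBUCKETS];

    while (*pp != NULL && *pp != ri)
        pp = &(*pp)->next;
    assert(*pp == ri);           /* the instance must be active for this task */
    *pp = ri->next;
    ri->next = p->free_list;
    p->free_list = ri;
}

/* Number of instances currently on the free list. */
static int free_count(const struct probe *p)
{
    int n = 0;

    for (const struct instance *ri = p->free_list; ri != NULL; ri = ri->next)
        n++;
    return n;
}
```

Calling probe_hit more than MAXACTIVE times without a matching probe_return returns NULL and increments nmissed, which is why the pool size matters for deeply nested or highly concurrent calls.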
3. kretprobe creation
The kretprobe is created when the following command is executed:
#echo 'r:blk_update blk_update_request $retval' > /sys/kernel/debug/tracing/kprobe_events
This process mainly calls create_or_delete_trace_kprobe. The most important step is setting pre_handler to pre_handler_kretprobe; the print format is set at the same time, and the registration of the trace_kprobe is completed. The main differences from kprobe creation are:
|-- rp->kp.pre_handler initialization
create_or_delete_trace_kprobe -> trace_kprobe_create -> register_trace_kprobe -> __register_trace_kprobe
__register_trace_kprobe calls register_kretprobe, which initializes pre_handler to pre_handler_kretprobe:
int register_kretprobe(struct kretprobe *rp)
{
    int ret = 0;
    struct kretprobe_instance *inst;
    int i;
    void *addr;

    if (!kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset))
        return -EINVAL;

    if (kretprobe_blacklist_size) {
        addr = kprobe_addr(&rp->kp);
        if (IS_ERR(addr))
            return PTR_ERR(addr);

        for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
            if (kretprobe_blacklist[i].addr == addr)
                return -EINVAL;
        }
    }

    /* Initialize the pre_handler callback */
    rp->kp.pre_handler = pre_handler_kretprobe;
    rp->kp.post_handler = NULL;
    rp->kp.fault_handler = NULL;

    /* Pre-allocate memory for max kretprobe instances */
    if (rp->maxactive <= 0) {
#ifdef CONFIG_PREEMPTION
        /* This evaluates to 10 in this example */
        rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus());
#else
        rp->maxactive = num_possible_cpus();
#endif
    }
    raw_spin_lock_init(&rp->lock);
    INIT_HLIST_HEAD(&rp->free_instances);
    /*
     * In this example rp->maxactive is 10, so the loop creates 10
     * kretprobe_instance objects and links them into the
     * kretprobe.free_instances list. This shows that one kretprobe
     * can have multiple kretprobe_instance objects.
     */
    for (i = 0; i < rp->maxactive; i++) {
        inst = kmalloc(sizeof(struct kretprobe_instance) +
                       rp->data_size, GFP_KERNEL);
        if (inst == NULL) {
            free_rp_inst(rp);
            return -ENOMEM;
        }
        INIT_HLIST_NODE(&inst->hlist);
        hlist_add_head(&inst->hlist, &rp->free_instances);
    }

    rp->nmissed = 0;
    /* Establish function entry probe point */
    ret = register_kprobe(&rp->kp);
    if (ret != 0)
        free_rp_inst(rp);
    return ret;
}
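The default for maxactive chosen in register_kretprobe can be restated as a small stand-alone function. This is a sketch only: default_maxactive and its parameters are hypothetical names, and the preemptible flag stands in for the CONFIG_PREEMPTION compile-time switch.

```c
#include <assert.h>

/*
 * Model of the maxactive default in register_kretprobe():
 *  - a positive value requested by the caller is kept as-is;
 *  - with preemption, use max(10, 2 * number of possible CPUs);
 *  - without preemption, use the number of possible CPUs.
 */
static unsigned int default_maxactive(int requested, unsigned int ncpus,
                                      int preemptible)
{
    assert(ncpus > 0);

    if (requested > 0)
        return (unsigned int)requested;
    if (preemptible) {
        unsigned int twice = 2 * ncpus;
        return twice > 10 ? twice : 10;
    }
    return ncpus;
}
```

Under this rule a preemptible kernel with five or fewer possible CPUs ends up with 10 instances, which matches the value of 10 noted in the listing above.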
|-- kretprobe.handler initialization
create_or_delete_trace_kprobe -> trace_kprobe_create -> alloc_trace_kprobe
- alloc_trace_kprobe: allocates space for the trace_kprobe and, most importantly, initializes the handler of the kretprobe to kretprobe_dispatcher
4. kretprobe brk instruction replacement
Let's first look at the disassembly of blk_update_request before the instruction is replaced:
Dump of assembler code for function blk_update_request:
   0xffff8000104ec1f0 <+0>:     sub     sp, sp, #0x60
   0xffff8000104ec1f4 <+4>:     stp     x29, x30, [sp,#16]
   0xffff8000104ec1f8 <+8>:     add     x29, sp, #0x10
   0xffff8000104ec1fc <+12>:    stp     x19, x20, [sp,#32]
   0xffff8000104ec200 <+16>:    stp     x21, x22, [sp,#48]
   0xffff8000104ec204 <+20>:    stp     x23, x24, [sp,#64]
   0xffff8000104ec208 <+24>:    str     x25, [sp,#80]
   0xffff8000104ec20c <+28>:    mov     x22, x0
   0xffff8000104ec210 <+32>:    uxtb    w24, w1
   0xffff8000104ec214 <+36>:    mov     w21, w2
   0xffff8000104ec218 <+40>:    mov     x0, x30
   0xffff8000104ec21c <+44>:    nop
   ......
After executing the following command
# echo 1 >/sys/kernel/debug/tracing/events/kprobes/blk_update/enable
let's look at the disassembly of blk_update_request again:
(gdb) disassemble blk_update_request
Dump of assembler code for function blk_update_request:
   0xffff8000104ec1f0 <+0>:     brk     #0x4
   0xffff8000104ec1f4 <+4>:     stp     x29, x30, [sp,#16]
   0xffff8000104ec1f8 <+8>:     add     x29, sp, #0x10
   0xffff8000104ec1fc <+12>:    stp     x19, x20, [sp,#32]
   0xffff8000104ec200 <+16>:    stp     x21, x22, [sp,#48]
   0xffff8000104ec204 <+20>:    stp     x23, x24, [sp,#64]
   0xffff8000104ec208 <+24>:    str     x25, [sp,#80]
   0xffff8000104ec20c <+28>:    mov     x22, x0
   0xffff8000104ec210 <+32>:    uxtb    w24, w1
   0xffff8000104ec214 <+36>:    mov     w21, w2
   0xffff8000104ec218 <+40>:    mov     x0, x30
   0xffff8000104ec21c <+44>:    nop
   0xffff8000104ec220 <+48>:    mov     w0, w24
   0xffff8000104ec224 <+52>:    bl      0xffff8000104e92ec <blk_status_to_errno>
   0xffff8000104ec228 <+56>:    nop
   ......
We can see that, after the operation above, the instruction at the entry of blk_update_request:
sub sp, sp, #0x60
has been replaced with:
0xffff8000104ec1f0 <+0>: brk #0x4
Notably, this is exactly the same as for kprobe. The replacement is done mainly by the following function, and its call to enable_kprobe is the same one used to enable a kprobe:
static inline int enable_kretprobe(struct kretprobe *rp)
{
    return enable_kprobe(&rp->kp);
}
5. Implementation of kretprobe hook function
The execution path of a kretprobe is the same as that of a kprobe. When the probe is hit, the following call path is taken; the only difference is the pre_handler that is executed:
#0 kprobe_handler (regs=0xffff80001253bcf0) at arch/arm64/kernel/probes/kprobes.c:352
#1 kprobe_breakpoint_handler (regs=0xffff80001253bcf0, esr=<optimized out>) at arch/arm64/kernel/probes/kprobes.c:404
#2 0xffff8000100148c4 in call_break_hook (regs=regs@entry=0xffff80001253bcf0, esr=esr@entry=4060086276) at arch/arm64/kernel/debug-monitors.c:322
#3 0xffff800010014a00 in brk_handler (unused=<optimized out>, esr=4060086276, regs=0xffff80001253bcf0) at arch/arm64/kernel/debug-monitors.c:329
#4 0xffff800010036180 in do_debug_exception (addr_if_watchpoint=addr_if_watchpoint@entry=5651652, esr=esr@entry=4060086276, regs=regs@entry=0xffff80001253bcf0) at arch/arm64/mm/fault.c:848
#5 0xffff800010cad220 in el1_dbg (regs=0xffff80001253bcf0, esr=4060086276) at arch/arm64/kernel/entry-common.c:190
#6 0xffff800010cad468 in el1_sync_handler (regs=<optimized out>) at arch/arm64/kernel/entry-common.c:227
#7 0xffff8000100119bc in el1_sync () at arch/arm64/kernel/entry.S:627
|-- pre_handler_kretprobe
For kretprobe, the pre_handler_kretprobe callback is executed:
int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
    |--struct kretprobe_instance *ri = NULL, *last = NULL;
    |--struct kretprobe *rp = container_of(p, struct kretprobe, kp);
    |--hash = hash_ptr(current, KPROBE_HASH_BITS);
    |--if (!hlist_empty(&rp->free_instances))
           // Take a free kretprobe_instance from the kretprobe->free_instances list
           ri = hlist_entry(rp->free_instances.first, struct kretprobe_instance, hlist);
           // Remove this instance from the kretprobe->free_instances list
           hlist_del(&ri->hlist);
           // Initialize the free kretprobe_instance that was found
           ri->rp = rp;
           ri->task = current;
           arch_prepare_kretprobe(ri, regs);
           INIT_HLIST_NODE(&ri->hlist);
           // Link the initialized kretprobe_instance into the global kretprobe_inst_table hash table
           hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
As shown above, pre_handler_kretprobe in turn calls arch_prepare_kretprobe:
arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
    |  // Save the original return address in the kretprobe_instance, i.e. the address in
    |  // blk_mq_end_request to return to. It is used to restore the original execution
    |  // path when kretprobe_trampoline returns
    |--ri->ret_addr = (kprobe_opcode_t *)regs->regs[30];
    |  // Record the stack pointer, used later to match this instance
    |--ri->fp = (void *)kernel_stack_pointer(regs);
    |  /* replace return addr (x30) with trampoline */
    |  // Update the return address so that kretprobe_trampoline is executed
    |  // when blk_update_request returns
    |--regs->regs[30] = (long)&kretprobe_trampoline;
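The effect of arch_prepare_kretprobe on the saved register state can be illustrated with a tiny user-space model. Everything here is an assumption for illustration: fake_regs, fake_instance, FAKE_TRAMPOLINE and the two helper functions are hypothetical names, and plain integers stand in for kernel addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the address of kretprobe_trampoline. */
#define FAKE_TRAMPOLINE UINT64_C(0x1000)

/* Minimal model of the saved register state (x0..x30, sp, pc). */
struct fake_regs {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
};

/* Minimal model of the fields of kretprobe_instance used here. */
struct fake_instance {
    uint64_t ret_addr;  /* original return address taken from x30 */
    uint64_t fp;        /* stack pointer at function entry */
};

/* Entry side: remember the real return address, then redirect x30. */
static void prepare_return_probe(struct fake_instance *ri, struct fake_regs *regs)
{
    ri->ret_addr = regs->regs[30];
    ri->fp = regs->sp;
    regs->regs[30] = FAKE_TRAMPOLINE;
}

/*
 * Return side: the trampoline sees the same stack pointer that was live at
 * function entry, so it can recognise its instance and hand back the
 * original return address.
 */
static uint64_t trampoline_return_address(const struct fake_instance *ri,
                                          const struct fake_regs *regs)
{
    assert(ri->fp == regs->sp);
    return ri->ret_addr;
}
```

The probed function itself is never modified at its return sites; only the link register value seen at entry changes, which is why a single entry breakpoint is enough to catch every return path.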
|-- setup_singlestep
setup_singlestep(p, regs, kcb, 0)
    |--unsigned long slot;
    |--kcb->kprobe_status = KPROBE_HIT_SS;
    |--if (p->ainsn.api.insn)
           // slot holds a copy of the entry instruction of blk_update_request: sub sp, sp, #0x60
           slot = (unsigned long)p->ainsn.api.insn;
           set_ss_context(kcb, slot);
               |--kcb->ss_ctx.ss_pending = true;
               |  // match_addr is the address right after the copied instruction in the slot,
               |  // where the instruction brk #0x6 is stored
               |--kcb->ss_ctx.match_addr = addr + sizeof(kprobe_opcode_t);
           kprobes_save_local_irqflag(kcb, regs);
           instruction_pointer_set(regs, slot);
               |  // Set regs->pc to val; here val is slot, whose instruction is sub sp, sp, #0x60
               |--regs->pc = val
instruction_pointer_set sets the pc to which the breakpoint exception returns, which points to the original entry instruction of blk_update_request. When the breakpoint exception returns, that original entry instruction is executed (note: it is located at a different memory address, p->ainsn.api.insn, not at its original address). Since the slot also contains a breakpoint instruction, brk #0x6, right after it, that breakpoint instruction is executed next.
|-- brk #0x6
0xffff800012533000 sub sp, sp, #0x60
0xffff800012533004 brk #0x6
After the instruction in the slot has been executed, execution traps into a breakpoint exception again. After returning from that exception, execution continues along the original path of blk_update_request, which is no different from kprobe until the function returns. Because pre_handler_kretprobe -> arch_prepare_kretprobe replaced the return address, blk_update_request does not return to its original return address but to the address that was set, namely kretprobe_trampoline.
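The two-breakpoint sequence can be modelled in a few lines of user-space C. This is a sketch under stated assumptions: sim_probe and its functions are hypothetical names, arming the probe and preparing the slot are merged into one step for brevity (in the walkthrough above they happen at different times), and instruction execution is reduced to advancing a pc value. The BRK encoding used is the AArch64 one, with the 16-bit immediate placed at bit 5.

```c
#include <assert.h>
#include <stdint.h>

#define INSN_SIZE     4u
#define BRK_INSN(imm) (UINT32_C(0xd4200000) | ((uint32_t)(imm) << 5))
#define BRK_PROBE     0x4u   /* immediate of the probe breakpoint */
#define BRK_SS        0x6u   /* immediate of the single-step breakpoint */

struct sim_probe {
    uint64_t addr;        /* address of the probed instruction */
    uint64_t slot_addr;   /* address of the out-of-line slot */
    uint32_t slot[2];     /* copied original instruction, then brk #0x6 */
    uint64_t restore;     /* where to resume after the slot has run */
    uint64_t match_addr;  /* expected pc of the brk #0x6 trap */
};

/* Copy the original instruction into the slot and patch in brk #0x4. */
static void arm_probe(struct sim_probe *p, uint32_t *text_word,
                      uint64_t addr, uint64_t slot_addr)
{
    p->addr = addr;
    p->slot_addr = slot_addr;
    p->slot[0] = *text_word;
    p->slot[1] = BRK_INSN(BRK_SS);
    p->restore = addr + INSN_SIZE;
    p->match_addr = 0;
    *text_word = BRK_INSN(BRK_PROBE);
}

/* brk #0x4 handler: returns the pc at which execution resumes (the slot). */
static uint64_t on_probe_brk(struct sim_probe *p)
{
    p->match_addr = p->slot_addr + INSN_SIZE;
    return p->slot_addr;
}

/* brk #0x6 handler: pc is the trapping address; resume after the probe point. */
static uint64_t on_single_step_brk(const struct sim_probe *p, uint64_t pc)
{
    assert(pc == p->match_addr);
    return p->restore;
}
```

With the addresses from this article, the probe at 0xffff8000104ec1f0 redirects execution to the slot at 0xffff800012533000, and the second breakpoint at 0xffff800012533004 sends it back to 0xffff8000104ec1f4.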
|-- kretprobe_trampoline
SYM_CODE_START(kretprobe_trampoline)
    // kretprobe_trampoline effectively takes over the stack of blk_update_request
    // (like a dove occupying a magpie's nest).
    // Reserve stack space for saving the pt_regs registers
    sub sp, sp, #S_FRAME_SIZE
    // Save the registers into pt_regs
    save_all_base_regs
    // Pass the top of the stack, i.e. the struct pt_regs pointer, in x0
    mov x0, sp
    bl trampoline_probe_handler
    /*
     * Replace trampoline address in lr with actual orig_ret_addr return
     * address.
     */
    mov lr, x0
    restore_all_base_regs
    add sp, sp, #S_FRAME_SIZE
    ret
SYM_CODE_END(kretprobe_trampoline)
void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
    return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
                                                (void *)kernel_stack_pointer(regs));
}
static nokprobe_inline
unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
                                           void *trampoline_address,
                                           void *frame_pointer)
{
    unsigned long ret;
    /*
     * Set a dummy kprobe for avoiding kretprobe recursion.
     * Since kretprobe never runs in kprobe handler, no kprobe must
     * be running at this point.
     */
    kprobe_busy_begin();
    ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
    kprobe_busy_end();

    return ret;
}
__kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer)
    |--struct kretprobe_instance *ri = NULL, *last = NULL;
    |  struct hlist_head *head;
    |--kprobe_opcode_t *correct_ret_addr = NULL;
    |--kretprobe_hash_lock(current, &head, &flags);
    |      |  // The initialized kretprobe_instance objects are linked into the kretprobe_inst_table hash table
    |      |--*head = &kretprobe_inst_table[hash];
    |  // Walk the initialized kretprobe_instance objects linked into this kretprobe_inst_table bucket
    |--hlist_for_each_entry(ri, head, hlist)
    |      // Find the kretprobe_instance whose frame pointer matches that of blk_mq_end_request
    |      if (ri->fp != frame_pointer)
    |          skipped = true;
    |          continue;
    |      // Get the original return address, used to return to blk_mq_end_request
    |      // once the trampoline has finished
    |      correct_ret_addr = ri->ret_addr;
    |      if (correct_ret_addr != trampoline_address)
    |          break;
    |--last = ri
    |--hlist_for_each_entry_safe(ri, tmp, head, hlist)
    |      if (ri->task != current)
    |          continue;
    |      if (ri->fp != frame_pointer)
    |          continue;
    |      if (ri->rp && ri->rp->handler)
    |          struct kprobe *prev = kprobe_running()
    |          ri->ret_addr = correct_ret_addr;
    |          ri->rp->handler(ri, regs);
    |      recycle_rp_inst(ri);
    \--return (unsigned long)correct_ret_addr;
According to the earlier initialization, ri->rp->handler is kretprobe_dispatcher:
static int kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
{
    struct trace_kprobe *tk = container_of(ri->rp, struct trace_kprobe, rp);

    raw_cpu_inc(*tk->nhit);

    if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
        kretprobe_trace_func(tk, ri, regs);
#ifdef CONFIG_PERF_EVENTS
    if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
        kretprobe_perf_func(tk, ri, regs);
#endif
    return 0; /* We don't tweek kernel, so just return 0 */
}
static void kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
                                 struct pt_regs *regs)
{
    struct event_file_link *link;

    trace_probe_for_each_link_rcu(link, &tk->tp)
        __kretprobe_trace_func(tk, ri, regs, link->file);
}
__kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
                       struct pt_regs *regs, struct trace_event_file *trace_file)
{
    struct kretprobe_trace_entry_head *entry;
    struct trace_event_buffer fbuffer;
    struct trace_event_call *call = trace_probe_event_call(&tk->tp);
    int dsize;

    WARN_ON(call != trace_file->event_call);

    if (trace_trigger_soft_disabled(trace_file))
        return;

    local_save_flags(fbuffer.flags);
    fbuffer.pc = preempt_count();
    fbuffer.trace_file = trace_file;

    dsize = __get_data_size(&tk->tp, regs);
    fbuffer.event = trace_event_buffer_lock_reserve(&fbuffer.buffer, trace_file,
                                                    call->event.type,
                                                    sizeof(*entry) + tk->tp.size + dsize,
                                                    fbuffer.flags, fbuffer.pc);
    if (!fbuffer.event)
        return;

    fbuffer.regs = regs;
    entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
    entry->func = (unsigned long)tk->rp.kp.addr;
    /* Record the original return address */
    entry->ret_ip = (unsigned long)ri->ret_addr;
    /* Store the argument values */
    store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);
    /* Write to the ring buffer */
    trace_event_buffer_commit(&fbuffer);
}
After kretprobe_trampoline returns, execution continues in blk_mq_end_request along the original execution path.
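The first loop of __kretprobe_trampoline_handler, which searches the active instances for the real return address, can be sketched as follows. This is a simplified model rather than the kernel code: sim_instance and find_return_address are hypothetical names, tasks are plain integers, and the list is singly linked. The task check comes from the second loop of the handler and is folded in here so that the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of an active kretprobe_instance in one hash bucket. */
struct sim_instance {
    const struct sim_instance *next;
    int task;            /* stands in for the owning task_struct */
    uintptr_t fp;        /* stack pointer recorded at function entry */
    uintptr_t ret_addr;  /* return address recorded at function entry */
};

/*
 * Walk the bucket and return the recorded return address of the instance
 * that belongs to this task and this frame. An entry whose recorded
 * address is the trampoline itself is not a real return address, so the
 * walk continues past it. Returns 0 when nothing matches.
 */
static uintptr_t find_return_address(const struct sim_instance *head, int task,
                                     uintptr_t fp, uintptr_t trampoline)
{
    uintptr_t found = 0;

    for (const struct sim_instance *ri = head; ri != NULL; ri = ri->next) {
        if (ri->task != task)
            continue;
        if (ri->fp != fp)
            continue;
        found = ri->ret_addr;
        if (found != trampoline)
            break;
    }
    return found;
}
```

Matching on the frame as well as the task is what lets nested or recursive probed calls in the same task each get their own return address back.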
6. Summary
Let's briefly summarize the kretprobe workflow:
- Register the kretprobe
This is done mainly by writing the command to the /sys/kernel/debug/tracing/kprobe_events node. The process will:
(1) Complete the registration of the kretprobe. The most important part is initializing the pre_handler callback to pre_handler_kretprobe. It is called from the brk #0x4 breakpoint handler, and its main job is to save the original return address of blk_update_request and set the temporary return address to kretprobe_trampoline;
(2) Save the original instruction at the probe point of the probed function, followed by a brk #0x6 breakpoint instruction. Both are stored in the slot. Later, when the brk #0x4 that replaced the original instruction returns, the instructions in the slot are executed first;
(3) At the same time, record the address of the instruction following the probe point. Execution resumes at this instruction when brk #0x6 returns, which restores the original instruction execution path;
- Breakpoint instrumentation
This is done mainly by echo 1 > /sys/kernel/debug/tracing/events/kprobes/blk_update/enable, which replaces the instruction at the probe point of the probed function with brk #0x4.
Note: brk #0x4 and brk #0x6 are handled by different breakpoint callbacks.
- Execute the kretprobe callback
When the probe point of the probed function is reached, the brk breakpoint instruction is executed and raises a breakpoint exception. Based on the immediate 0x4, the corresponding breakpoint callback runs and finally invokes the pre_handler callback (pre_handler_kretprobe), whose main job is to save the original return address into blk_mq_end_request and replace it with kretprobe_trampoline. Next, the instructions in the slot initialized during registration are executed. The first instruction in the slot is the original instruction of the probed function; it is followed by brk #0x6, which traps into a breakpoint exception again. This time, based on the immediate 0x6, the single-step breakpoint handler runs. It sets the PC to the instruction address recorded in step (3) of registration, so that when brk #0x6 returns, execution continues with the instructions after the probe point and the normal execution path is restored. When blk_update_request returns, it jumps to the temporary return address kretprobe_trampoline, which performs the kretprobe work and then restores the return address to the original one in blk_mq_end_request. When kretprobe_trampoline returns, execution continues in blk_mq_end_request at the original return address.
Appendix
struct kretprobe {
    struct kprobe kp;
    kretprobe_handler_t handler;
    kretprobe_handler_t entry_handler;
    int maxactive;
    int nmissed;
    size_t data_size;
    struct hlist_head free_instances;
    raw_spinlock_t lock;
};
struct kretprobe_instance {
    union {
        struct hlist_node hlist;
        struct rcu_head rcu;
    };
    struct kretprobe *rp;
    /* Saves the original return address */
    kprobe_opcode_t *ret_addr;
    struct task_struct *task;
    void *fp;
    char data[];
};