Foreword: Recently I have been exploring debugging and diagnostics for Node.js, because the capabilities Node.js itself provides are sometimes not enough to solve a problem. For example, heap memory stays flat while RSS keeps rising. We therefore need to dig deeper and learn more troubleshooting methods. These methods often involve lower-level mechanisms, so it is natural to study the technologies and capabilities the kernel provides. After years of development, kernel tracing techniques have become numerous and quite complex. This article briefly covers how the kernel's static tracing technology is implemented. Tracing simply means collecting information about code execution to help troubleshoot problems.
1 Tracepoint
Tracepoints are a static instrumentation technique. Although the implementation is complex, the concept is simple and similar to logging: we write many log statements in business code to record what a process does while it runs. Tracepoints are a hook-based instrumentation mechanism provided by the kernel. Unlike logging, where we can add code wherever we want, the places where tracepoints are inserted are almost entirely decided by the kernel. "Almost" because we can also write kernel modules that register their own instrumentation points with the kernel. Let's look at the use and implementation of tracepoints through an example (taken from the kernel document tracepoints.rst). Before the analysis, take a look at two very important macros. The first is DECLARE_TRACE.
#define DECLARE_TRACE(name, proto, args) \
    __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
            cpu_online(raw_smp_processor_id()), \
            PARAMS(void *__data, proto), \
            PARAMS(__data, args))

We only need to focus on the overall structure rather than the parameters, so let's expand the macro one level further.
#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
    extern struct tracepoint __tracepoint_##name; \
    /* Execute hook function */ \
    static inline void trace_##name(proto) \
    { \
        if (static_key_false(&__tracepoint_##name.key)) \
            __DO_TRACE(&__tracepoint_##name, \
                TP_PROTO(data_proto), \
                TP_ARGS(data_args), \
                TP_CONDITION(cond), 0); \
    } \
    /* Register hook function */ \
    static inline int \
    register_trace_##name(void (*probe)(data_proto), void *data) \
    { \
        return tracepoint_probe_register(&__tracepoint_##name, \
                        (void *)probe, data); \
    } \
    /* Unregister hook function */ \
    static inline int \
    unregister_trace_##name(void (*probe)(data_proto), void *data) \
    { \
        return tracepoint_probe_unregister(&__tracepoint_##name, \
                        (void *)probe, data); \
    } \
    static inline bool \
    trace_##name##_enabled(void) \
    { \
        return static_key_false(&__tracepoint_##name.key); \
    }
__DECLARE_TRACE mainly defines several functions. We only need to focus on the ones that register a hook and execute the hooks (named register_trace_{yourname} and trace_{yourname}). Next, let's look at the second macro, DEFINE_TRACE.
#define DEFINE_TRACE_FN(name, reg, unreg) \
    struct tracepoint __tracepoint_##name

#define DEFINE_TRACE(name) \
    DEFINE_TRACE_FN(name, NULL, NULL);

I have omitted some code here. DEFINE_TRACE mainly defines a tracepoint structure. Now that we understand the two macros, let's see how to use a tracepoint.
1.1 Usage
include/trace/events/subsys.h
#include <linux/tracepoint.h>

DECLARE_TRACE(subsys_eventname,
    TP_PROTO(int firstarg, struct task_struct *p),
    TP_ARGS(firstarg, p));

First, use the DECLARE_TRACE macro in the header file to define a series of functions.
subsys/file.c
#include <trace/events/subsys.h>

DEFINE_TRACE(subsys_eventname);

void somefct(void)
{
    ...
    trace_subsys_eventname(arg, task);
    ...
}

/* Implement your own hook function and register it with the kernel.
 * The hook receives the data pointer followed by the TP_PROTO arguments. */
void callback(void *data, int firstarg, struct task_struct *p) {}

/* Called from initialization code; the second argument is the data pointer */
register_trace_subsys_eventname(callback, NULL);

Then use DEFINE_TRACE in the implementation file to define a tracepoint structure. Call register_trace_subsys_eventname to register the custom hook function with the kernel, and call trace_subsys_eventname wherever information needs to be collected so that the registered hooks are run.
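The declare, register, and trace pattern above can be imitated in userspace to make the token pasting easier to follow. This is only a minimal sketch under simplifying assumptions: DEMO_DECLARE_TRACE, struct demo_tracepoint, and sum_probe are hypothetical names, each tracepoint holds a single hook, and there is no RCU or static key as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace analogy only: a single hook slot, no RCU and no static keys. */
struct demo_tracepoint {
    void (*probe)(void *data, int value); /* registered hook, NULL if none */
    void *data;                           /* passed back to the hook */
};

/* Hypothetical simplified macro: token pasting generates the per-name API. */
#define DEMO_DECLARE_TRACE(name)                                        \
    static struct demo_tracepoint demo_tracepoint_##name;               \
    static inline void trace_##name(int value)                          \
    {                                                                   \
        if (demo_tracepoint_##name.probe)                               \
            demo_tracepoint_##name.probe(demo_tracepoint_##name.data,   \
                                         value);                        \
    }                                                                   \
    static inline int register_trace_##name(                            \
        void (*probe)(void *data, int value), void *data)               \
    {                                                                   \
        assert(probe != NULL);                                          \
        demo_tracepoint_##name.data = data;                             \
        demo_tracepoint_##name.probe = probe;                           \
        return 0;                                                       \
    }

/* Expands to trace_subsys_eventname() and register_trace_subsys_eventname(). */
DEMO_DECLARE_TRACE(subsys_eventname)

/* Example hook: adds each traced value to the int that data points to. */
static void sum_probe(void *data, int value)
{
    *(int *)data += value;
}
```

Calling trace_subsys_eventname before any hook is registered does nothing, which mirrors how a disabled tracepoint is skipped; after register_trace_subsys_eventname(sum_probe, &total), each call forwards its argument to the hook.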
1.2 Implementation
Now that we know how tracepoints are used, let's look at the implementation, starting with the function that registers a hook.
int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data)
{
    return tracepoint_probe_register_prio(tp, probe, data, TRACEPOINT_DEFAULT_PRIO);
}

int tracepoint_probe_register_prio(struct tracepoint *tp, void *probe,
                                   void *data, int prio)
{
    struct tracepoint_func tp_func;
    int ret;

    mutex_lock(&tracepoints_mutex);
    tp_func.func = probe;
    tp_func.data = data;
    tp_func.prio = prio;
    ret = tracepoint_add_func(tp, &tp_func, prio);
    mutex_unlock(&tracepoints_mutex);
    return ret;
}

tracepoint_probe_register_prio fills in a tracepoint_func structure that represents the hook information and then calls tracepoint_add_func, where tp is the tracepoint structure defined earlier.
static int tracepoint_add_func(struct tracepoint *tp,
                               struct tracepoint_func *func, int prio)
{
    struct tracepoint_func *old, *tp_funcs;
    int ret;

    // Get the hook list
    tp_funcs = rcu_dereference_protected(tp->funcs,
                                         lockdep_is_held(&tracepoints_mutex));
    // Insert a new hook into the list
    old = func_add(&tp_funcs, func, prio);
    rcu_assign_pointer(tp->funcs, tp_funcs);
    return 0;
}

static struct tracepoint_func *
func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func,
         int prio)
{
    struct tracepoint_func *old, *new;
    int nr_probes = 0;
    int pos = -1;

    old = *funcs;
    /* + 2 : one for new probe, one for NULL func */
    new = allocate_probes(nr_probes + 2);
    pos = 0;
    new[pos] = *tp_func;
    new[nr_probes + 1].func = NULL;
    *funcs = new;
    return old;
}

The registration logic boils down to inserting a new node into the hook array held by the tracepoint structure. The array is terminated by an entry whose func is NULL. Next, let's look at the logic that runs the hooks.
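Before moving on, the registration step can be sketched in userspace. The excerpt above trims the part that copies the existing hooks, so the following is a minimal sketch under stated assumptions rather than the kernel's code: it appends at the end and ignores priority ordering, it uses calloc and free instead of the kernel allocator and RCU, and count_probes, count_probe, and sum_probe are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified hook descriptor; the prio field is left out of this sketch. */
struct tracepoint_func {
    void (*func)(void *data, int value);
    void *data;
};

/* Count entries in a NULL-terminated hook array (a NULL array holds none). */
static int count_probes(const struct tracepoint_func *funcs)
{
    int n = 0;

    if (funcs)
        while (funcs[n].func)
            n++;
    return n;
}

/* Build a new array with the old hooks plus the new one appended.
 * Returns 0 on success, -1 if the allocation fails. */
static int func_add(struct tracepoint_func **funcs,
                    const struct tracepoint_func *tp_func)
{
    struct tracepoint_func *old, *new_funcs;
    int nr_probes;

    assert(funcs != NULL && tp_func != NULL && tp_func->func != NULL);
    old = *funcs;
    nr_probes = count_probes(old);
    /* + 2: one for the new probe, one for the NULL terminator */
    new_funcs = calloc((size_t)nr_probes + 2, sizeof(*new_funcs));
    if (!new_funcs)
        return -1;
    if (nr_probes > 0)
        memcpy(new_funcs, old, (size_t)nr_probes * sizeof(*old));
    new_funcs[nr_probes] = *tp_func;
    /* calloc zeroed the last slot, so its func is already NULL */
    *funcs = new_funcs;
    free(old); /* no concurrent readers here, so free immediately */
    return 0;
}

/* Two example hooks used only to tell the entries apart. */
static void count_probe(void *data, int value)
{
    (void)value;
    (*(int *)data)++;
}

static void sum_probe(void *data, int value)
{
    *(int *)data += value;
}
```

Building a fresh array and swapping the pointer, instead of modifying the array in place, is what lets the kernel publish the new list with rcu_assign_pointer while readers may still be walking the old one.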
#define __DO_TRACE(tp, proto, args, cond, rcuidle) \
    do { \
        struct tracepoint_func *it_func_ptr; \
        void *it_func; \
        void *__data; \
        int __maybe_unused __idx = 0; \
        /* Get the hook array */ \
        it_func_ptr = rcu_dereference_raw((tp)->funcs); \
        /* If it is not empty, execute the callback of each node */ \
        if (it_func_ptr) { \
            do { \
                it_func = (it_func_ptr)->func; \
                __data = (it_func_ptr)->data; \
                ((void(*)(proto))(it_func))(args); \
            } while ((++it_func_ptr)->func); \
        } \
    } while (0)

Logically, this is similar to the hook mechanisms we write in the application layer. When a hook (our callback) is executed, it can write information to the ring buffer through a kernel interface, and the application layer can then read this information through debugfs.
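The dispatch loop and the idea of a hook writing into a ring buffer can be sketched together in userspace. This is an illustration under stated assumptions, not the kernel's ring buffer: struct ring_buffer, ring_write, record_probe, and do_trace are hypothetical names, the buffer holds four integers, and old entries are simply overwritten.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4

/* Tiny fixed-size ring buffer: the newest values overwrite the oldest. */
struct ring_buffer {
    int slots[RING_SIZE];
    unsigned int head; /* total number of writes so far */
};

static void ring_write(struct ring_buffer *rb, int value)
{
    assert(rb != NULL);
    rb->slots[rb->head % RING_SIZE] = value;
    rb->head++;
}

struct tracepoint_func {
    void (*func)(void *data, int value);
    void *data;
};

/* Hook: record the traced value in the ring buffer passed as data. */
static void record_probe(void *data, int value)
{
    ring_write((struct ring_buffer *)data, value);
}

/* Same shape as the __DO_TRACE loop: call each hook until the entry
 * whose func is NULL. A non-NULL array must hold at least one hook. */
static void do_trace(const struct tracepoint_func *funcs, int value)
{
    const struct tracepoint_func *it = funcs;

    if (!it)
        return;
    do {
        it->func(it->data, value);
    } while ((++it)->func);
}
```

Passing the buffer through the data pointer is the same role that the data argument of register_trace_{yourname} plays: the hook gets its own context back without relying on globals.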
2 Trace event
With the tracepoint mechanism, we can write modules and load them into the kernel to implement our own instrumentation points. The kernel also ships with many built-in instrumentation points, which are implemented through trace events. Let's look at an example.
#define TRACE_EVENT(name, proto, args, struct, assign, print) \
    DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))

TRACE_EVENT(consume_skb,
    TP_PROTO(struct sk_buff *skb),
    TP_ARGS(skb),
    TP_STRUCT__entry(
        __field(void *, skbaddr)
    ),
    TP_fast_assign(
        __entry->skbaddr = skb;
    ),
    TP_printk("skbaddr=%p", __entry->skbaddr)
);

The TRACE_EVENT macro defined above is essentially a wrapper around DECLARE_TRACE, so it defines the same series of functions (registering hooks and running hooks). The registered hooks are then run inside the consume_skb function.
void consume_skb(struct sk_buff *skb)
{
    trace_consume_skb(skb);
    __kfree_skb(skb);
}
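The TRACE_EVENT arguments also describe a record: TP_STRUCT__entry names its fields, TP_fast_assign fills them from the arguments, and TP_printk formats them. The userspace sketch below imitates only that triple and is not how the kernel expands the macro. DEMO_TRACE_EVENT, the packet_len event, and the generated demo_fill_ and demo_print_ functions are hypothetical, and an int field is used instead of a pointer so the formatted text is predictable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical simplified macro: generates a record type, a function that
 * fills the record from the traced argument, and a function that formats it. */
#define DEMO_TRACE_EVENT(name, arg_type, arg, fields, assign, fmt, print_arg) \
    struct demo_entry_##name { fields };                                      \
    static inline void demo_fill_##name(struct demo_entry_##name *entry,      \
                                        arg_type arg)                         \
    {                                                                         \
        assert(entry != NULL);                                                \
        assign                                                                \
    }                                                                         \
    static inline int demo_print_##name(char *buf, size_t size,               \
                                const struct demo_entry_##name *entry)        \
    {                                                                         \
        assert(buf != NULL && entry != NULL);                                 \
        return snprintf(buf, size, fmt, print_arg);                           \
    }

/* One event with a single int field, filled from the traced argument. */
DEMO_TRACE_EVENT(packet_len,
                 int, pkt_len,
                 int len;,
                 entry->len = pkt_len;,
                 "len=%d", entry->len)
```

Separating the fill step from the print step means the hot path only has to copy a few fields into a record, while formatting can happen later when the record is read.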
3 Summary
The kernel provides a very rich but also very complex set of mechanisms that let users obtain lower-level data for troubleshooting and performance optimization. As we have seen, tracepoints are a static mechanism: we usually have to rely on the instrumentation points supported by the current kernel version to obtain the corresponding information. The kernel also provides dynamic tracing, which makes it possible to attach probes at runtime and collect information on demand. Linux offers many tracing technologies, and although they are complex, higher-level tools make them much more convenient to use. These capabilities are powerful tools for in-depth troubleshooting.