Notes on Linux Kernel Startup: From start_kernel to the init Process
Using GDB
Before starting a GDB session, get familiar with the most commonly used GDB commands; they make the tracing below much easier.
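- b[reak] <line number or function>: set a breakpoint
- s[tep]: step into (follows function calls)
- n[ext]: step over (does not enter function calls)
- c[ontinue]: continue execution
- r[un]: run the program until it ends, crashes, or hits a breakpoint
- q[uit]: exit GDB
- info: show state; for example, info breakpoints lists the breakpoints and watchpoints that are set
- watch: set a watchpoint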
Other useful commands include:
command | purpose |
---|---|
ptype | Print the data type of a variable or expression |
info share | Print the names of the currently loaded shared libraries |
info functions | Print the names and types of all defined functions |
list | Display 10 lines of source code around the current line |
help | Display the list of command classes |
Start-up process of computer
- When an x86 CPU starts, it begins executing at CS:EIP = FFFF:0000H (which translates to physical address FFFF0H, since the 16-bit 8086 has 20 address lines); this is where the BIOS program is located. http://wenku.baidu.com/view/4e5c49eb172ded630b1cb699.html
- After the BIOS routines detect the hardware and complete the corresponding initialization, the BIOS looks for a bootable medium, loads the boot program into a designated memory area, and hands control to it. On a hard disk this means loading the first sector (the MBR) and the boot program of the active partition into memory (i.e. the BootLoader); once loading is complete, control passes to the BootLoader.
- The BootLoader prepares for and then starts the operating system. When starting an operating system you usually specify the partition and path of the kernel, the initrd and the root device, for example: root (hd0,0), kernel (hd0,0)/bzImage root=/dev/ram init=/bin/ash, initrd (hd0,0)/myinitrd4M.img.
- The kernel startup consists of the assembly code that runs before start_kernel and the C code from start_kernel onward: the assembly code performs the earliest initialization, then the C code initializes the operating system, and finally the first user-mode process, init, is executed.
- Booting generally happens in two stages: first using the in-memory file system provided by initrd, then switching to the file system on the hard disk. The initrd file has two main functions: 1. providing driver modules that are necessary for booting but are not included in the kernel file (vmlinuz); 2. mounting the root file system on the hard disk and executing its /sbin/init program, which continues the boot process.
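The reset address in the first step above comes from real-mode segment arithmetic: the physical address is the segment value shifted left by four bits, plus the offset. A minimal sketch in C, where the function names are illustrative and not taken from any BIOS or kernel source:

```c
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset. */
static uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* With only 20 address lines, as on the 8086, the result wraps at 1 MiB. */
static uint32_t real_mode_phys_20bit(uint16_t segment, uint16_t offset)
{
    return real_mode_phys(segment, offset) & 0xFFFFFu;
}
```

For example, real_mode_phys(0xFFFF, 0x0000) evaluates to 0xFFFF0, which is 16 bytes below the 1 MiB boundary.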
Once the computer is powered on, the program counter points to BIOS code, which performs the hardware self-check. If the check finds no problems, the BIOS reads one sector from the hard disk (512 bytes, which on a Linux system can be thought of as grub) and hands control to that code. This code is small and can do little beyond letting the user make a few selections. Once the selection is made, it loads the kernel into memory and hands control to the operating system, which then starts running.
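A 512-byte boot sector like the one described above is conventionally recognized as bootable by a two-byte signature, 0x55 0xAA, in its last two bytes. The following sketch checks that signature on an in-memory buffer; the function name and the buffer handling are assumptions made for illustration, not BIOS code:

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns 1 if the buffer holds a full sector ending in 0x55 0xAA, else 0. */
static int mbr_is_bootable(const uint8_t *sector, size_t len)
{
    if (sector == NULL || len < SECTOR_SIZE)
        return 0;
    return sector[SECTOR_SIZE - 2] == 0x55 && sector[SECTOR_SIZE - 1] == 0xAA;
}
```

A caller would read the first sector of the disk image into a 512-byte buffer and pass it to mbr_is_bootable.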
Tracking and Analyzing the Linux Kernel Startup Process
Open a shell in the virtual machine provided by the lab environment:
cd LinuxKernel/
qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img
After the kernel boots, it enters the menu program (the course project of Software Engineering C Coding Practice), which supports three commands: help, version and quit. You can also add more commands. For anyone who has taken Software Engineering C Coding Practice, this should be a piece of cake.
Debugging the Kernel with gdb
qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img -s -S # Notes on the -s and -S options:
-S freeze CPU at startup (use 'c' to start execution)
-s shorthand for -gdb tcp::1234. If you don't want to use port 1234, use -gdb tcp::xxxx in place of the -s option
Open another shell window
gdb
(gdb) file linux-3.18.6/vmlinux # Load the symbol table before running target remote in gdb
(gdb) target remote :1234 # Connect gdb to the gdbserver built into qemu; enter c to let Linux continue running in qemu
(gdb) break start_kernel # Breakpoints can be set before or after target remote
The basic initialization of the Linux operating system is done in the code under the init directory, which is also where our experiment begins.
First, as described above, set a breakpoint at the start_kernel function in main.c and run c[ontinue]; execution stops in start_kernel at main.c:501.
Using the list command to print the surrounding code, we can see that this function performs many initialization operations, such as trap_init (interrupts), ipc_init (inter-process communication), mm_init (memory management), sched_init (process scheduling), and so on.
Let's set another breakpoint at rest_init. As the listing shows, rest_init is the last function called by start_kernel.
Analyzing the Startup Process
Looking back at start_kernel: after the architecture-specific assembly code has run, execution jumps into the architecture-independent C code of the kernel, namely the start_kernel function in init/main.c, where the Linux kernel truly begins its initialization stage.
asmlinkage __visible void __init start_kernel(void)
{
//command_line receives the parameters passed by the bootloader
char *command_line;
char *after_dashes;
//Initialize the lock dependency validator (lockdep)
lockdep_init();
//init_task is the hand-built PCB of process 0; mark the end of its stack with a magic value to detect overflow
set_task_stack_end_magic(&init_task);
//Get the hardware ID of the current CPU
smp_setup_processor_id();
//Initialize the hash buckets of the debug-objects tracker
debug_objects_early_init();
//Set up the stack canary used to detect stack overflows
boot_init_stack_canary();
//Early initialization of control groups (cgroups)
cgroup_init_early();
//Disable interrupts on the current CPU
local_irq_disable();
//Flag that interrupts are disabled during early boot
early_boot_irqs_disabled = true;
//Mark the boot CPU as online, active, present and possible
boot_cpu_init();
//Initialize the high-memory page address mapping table
page_address_init();
//Print the kernel version banner
pr_notice("%s", linux_banner);
//Architecture-specific initialization
setup_arch(&command_line);
//Each task has a mm_struct structure to manage memory space
mm_init_cpumask(&init_mm);
//Backup and save cmdline
setup_command_line(command_line);
//Set nr_cpu_ids, the number of possible CPU IDs
setup_nr_cpu_ids();
//Allocate the per_cpu variable areas for each CPU in the system
setup_per_cpu_areas();
//Prepare for boot-cpu boot in SMP system
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
//Build the memory zone lists for every node
build_all_zonelists(NULL, NULL);
//Set up the page allocator's notifier
page_alloc_init();
pr_notice("Kernel command line: %s\n", boot_command_line);
//Parse the early start-up parameters in cmdline
parse_early_param();
//Parse the remaining kernel parameters passed in
after_dashes = parse_args("Booting kernel",
static_command_line, __start___param,
__stop___param - __start___param,
-1, -1, &unknown_bootoption);
if (!IS_ERR_OR_NULL(after_dashes))
parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
set_init_arg);
jump_label_init();
//Use bootmem to allocate a buffer for recording startup messages
setup_log_buf(0);
//Initialize the hash table of process IDs
pidhash_init();
//Early initialization of the virtual file system (VFS) caches
vfs_caches_init_early();
//Sort the kernel exception table by address to speed up lookups
sort_main_extable();
//Initialize kernel trap and exception handling
trap_init();
//Set up the kernel memory allocators, marking which memory is available
mm_init();
//Initialize the data structures of the process scheduler
sched_init();
//Disable kernel preemption
preempt_disable();
//Check whether interrupts were enabled too early; if so, warn and disable them again.
if (WARN(!irqs_disabled(),
"Interrupts were enabled *very* early, fixing it\n"))
local_irq_disable();
//Allocate the cache used by the IDR mechanism
idr_init_cache();
//Initialize the read-copy-update (RCU) synchronization mechanism
rcu_init();
context_tracking_init();
//Initialize the kernel radix tree
radix_tree_init();
//Pre-initialization of external interrupt descriptors, mainly initializing data structures
early_irq_init();
//Interrupt initialization functions specific to the corresponding architecture
init_IRQ();
//Initialize the Kernel Clock System
tick_init();
rcu_init_nohz();
//Initialize the timer-related data structures of the boot CPU
init_timers();
//Initialize high-resolution timers
hrtimers_init();
//Initialize software interrupts (softirqs)
softirq_init();
//Initialize system timekeeping
timekeeping_init();
//Initialize the system clock
time_init();
sched_clock_postinit();
//Initialize the CPU performance monitoring mechanism
perf_event_init();
//Allocate memory for kernel profiling
profile_init();
//Initialize call_single_queue for all CPUs
call_function_init();
WARN(!irqs_disabled(), "Interrupts were enabled early\n");
early_boot_irqs_disabled = false;
local_irq_enable();
//Late initialization of the kernel memory cache (slab allocator)
kmem_cache_init_late();
//Initialize the console
console_init();
if (panic_later)
panic("Too many boot %s vars at `%s'", panic_later,
panic_param);
//Print lock dependency (lockdep) information
lockdep_info();
//Test whether the locking API works properly
locking_selftest();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
pr_crit("initrd overwritten (0x%08lx < 0x%08lx) - disabling it.\n",
page_to_pfn(virt_to_page((void *)initrd_start)),
min_low_pfn);
initrd_start = 0;
}
#endif
page_cgroup_init();
debug_objects_mem_init();
kmemleak_init();
setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)
late_time_init();
sched_clock_init();
calibrate_delay();
pidmap_init();
anon_vma_init();
acpi_early_init();
#ifdef CONFIG_X86
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi_enter_virtual_mode();
#endif
#ifdef CONFIG_X86_ESPFIX64
/* Should be run before the first non-init thread is created */
init_espfix_bsp();
#endif
thread_info_cache_init();
cred_init();
fork_init(totalram_pages);
proc_caches_init();
//Initialize the buffer of the file system
buffer_init();
//Initialization of Kernel Key Management System
key_init();
//Initialization of Kernel Security Management Framework
security_init();
dbg_late_init();
vfs_caches_init(totalram_pages);
signals_init();
/* rootfs populating might need page-writeback */
page_writeback_init();
proc_root_init();
cgroup_init();
cpuset_init();
taskstats_init_early();
delayacct_init();
check_bugs();
sfi_init_late();
if (efi_enabled(EFI_RUNTIME_SERVICES)) {
efi_late_init();
efi_free_boot_services();
}
ftrace_init();
rest_init();//Do the rest of the initialization
}
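The parse_early_param() and parse_args() calls above walk the boot command line, a space-separated list of key=value options such as root=/dev/ram init=/bin/ash. The sketch below shows the idea in plain user-space C. Note that cmdline_get is a hypothetical helper, and unlike the kernel's parser it does not handle quoted values:

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up "key=value" in a space-separated command line.
 * Copies the value into out (truncating if needed) and returns 1 if found,
 * otherwise returns 0 and leaves out untouched.
 */
static int cmdline_get(const char *cmdline, const char *key,
                       char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    if (outlen == 0)
        return 0;
    while (*p) {
        const char *end;

        while (*p == ' ')               /* skip separators */
            p++;
        end = p;
        while (*end && *end != ' ')     /* find the end of this token */
            end++;
        if ((size_t)(end - p) > klen &&
            strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = (size_t)(end - p) - klen - 1;

            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = end;
    }
    return 0;
}
```

With the command line shown earlier, looking up init would yield /bin/ash, which is the kind of value that ends up in execute_command.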
Step 1: The Birth of Process 0
First, focus on the second function call in start_kernel():
set_task_stack_end_magic(&init_task);
init_task is defined in the file linux-3.18.6/init/init_task.c as follows:
struct task_struct init_task = INIT_TASK(init_task);
In effect, this task_struct is a hand-built PCB that creates the original process, process 0.
The INIT_TASK macro is defined in the linux/init_task.h header file; its job is to fill in the initial state of process 0.
This struct contains all the information about the process; in particular, the PID of this hand-built process is 0.
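To make the idea of a hand-built process concrete, here is a simplified user-space sketch. The struct mini_task, the MINI_INIT_TASK macro and the two helper functions are hypothetical stand-ins, far smaller than the kernel's task_struct and INIT_TASK. The magic value mirrors the kernel's STACK_END_MAGIC, and storing it in slot 0 of an embedded array is a simplification of where the kernel places it:

```c
#define MINI_STACK_WORDS     64
#define MINI_STACK_END_MAGIC 0x57AC6E9DUL

/* A tiny stand-in for a process control block. */
struct mini_task {
    int pid;
    const char *comm;
    struct mini_task *parent;
    unsigned long stack[MINI_STACK_WORDS];
};

/* Static initializer: no fork and no allocation; the task is its own parent. */
#define MINI_INIT_TASK(tsk) { .pid = 0, .comm = "swapper", .parent = &(tsk) }

static struct mini_task mini_init_task = MINI_INIT_TASK(mini_init_task);

/* The stack grows downward, so its end is the lowest slot. */
static void mini_set_stack_end_magic(struct mini_task *t)
{
    t->stack[0] = MINI_STACK_END_MAGIC;
}

/* Returns 1 if the magic value has been overwritten, else 0. */
static int mini_stack_overflowed(const struct mini_task *t)
{
    return t->stack[0] != MINI_STACK_END_MAGIC;
}
```

The point of the sketch is that process 0 exists as initialized data in the kernel image before any code that could create a process has run.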
Step 2: Creation of Process 1
Next, look at the rest_init function.
It contains a call to kernel_thread:
kernel_thread(kernel_init, NULL, CLONE_FS);
The source code of kernel_thread() is defined in the file linux-3.18.6/kernel/fork.c as follows:
pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
{
return do_fork(flags|CLONE_VM|CLONE_UNTRACED, (unsigned long)fn,
(unsigned long)arg, NULL, NULL);
}
It calls do_fork to create a new kernel thread that runs fn, which here is kernel_init.
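The flags argument is a bit mask, and kernel_thread() always ORs two extra bits into whatever the caller passes. The sketch below reproduces that arithmetic in user space. The MINI_ constants are local copies of the values that the Linux headers give to the corresponding CLONE_ flags, and kernel_thread_flags is a hypothetical helper, not a kernel function:

```c
/* Local copies of the Linux clone flag values, renamed to avoid clashes. */
#define MINI_CLONE_VM       0x00000100UL  /* share the address space */
#define MINI_CLONE_FS       0x00000200UL  /* share filesystem information */
#define MINI_CLONE_FILES    0x00000400UL  /* share the file descriptor table */
#define MINI_CLONE_UNTRACED 0x00800000UL  /* a tracer cannot force tracing on the child */

/* The flags that do_fork() receives for a given kernel_thread() call. */
static unsigned long kernel_thread_flags(unsigned long flags)
{
    return flags | MINI_CLONE_VM | MINI_CLONE_UNTRACED;
}
```

For the kernel_init call, which passes only CLONE_FS, the resulting mask is 0x00800300.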
The kernel_init() function is defined in the file linux-3.18.6/init/main.c as follows:
//The newly created kernel thread runs this function, which in turn calls run_init_process
static int __ref kernel_init(void *unused)
{
int ret;
kernel_init_freeable();
async_synchronize_full();
free_initmem();
mark_rodata_ro();
system_state = SYSTEM_RUNNING;
numa_default_policy();
flush_delayed_fput();
if (ramdisk_execute_command) {
//Start run_init_process
ret = run_init_process(ramdisk_execute_command);
if (!ret)
return 0;
pr_err("Failed to execute %s (error %d)\n",
ramdisk_execute_command, ret);
}
if (execute_command) {
ret = run_init_process(execute_command);
if (!ret)
return 0;
pr_err("Failed to execute %s (error %d). Attempting defaults...\n",
execute_command, ret);
}
/* try_to_run_init_process() executes the named file, much as user-mode
code would with an execve() call; its parameter is the path of the
executable file to run.
*/
/* This is the boundary between the end of kernel initialization and the beginning of user-mode initialization.
*/
if (!try_to_run_init_process("/sbin/init") ||
!try_to_run_init_process("/etc/init") ||
!try_to_run_init_process("/bin/init") ||
!try_to_run_init_process("/bin/sh"))
return 0;
panic("No working init found. Try passing init= option to kernel. "
"See Linux Documentation/init.txt for guidance.");
}
We can clearly see that the function finally executes the init program on disk through run_init_process; this is the point at which process 1 leaves kernel mode and becomes a user-mode process.
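The order in which kernel_init() tries candidates can be modelled as a small function. This is a sketch under stated assumptions: pick_init and the three runner callbacks are hypothetical names, and a runner returns 0 on success in the same way the listing treats run_init_process. In the real kernel a successful exec replaces the thread's program, so nothing is "returned" in this sense:

```c
#include <stddef.h>
#include <string.h>

/* Runner callback: returns 0 on success, nonzero on failure. */
typedef int (*run_fn)(const char *path);

/* Returns the first path that runs, or NULL where the kernel would panic. */
static const char *pick_init(const char *ramdisk_cmd, const char *execute_cmd,
                             run_fn run)
{
    static const char *const defaults[] = {
        "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
    };
    size_t i;

    if (ramdisk_cmd && run(ramdisk_cmd) == 0)
        return ramdisk_cmd;
    if (execute_cmd && run(execute_cmd) == 0)
        return execute_cmd;
    for (i = 0; i < sizeof(defaults) / sizeof(defaults[0]); i++)
        if (run(defaults[i]) == 0)
            return defaults[i];
    return NULL;
}

/* Hypothetical runners for trying the sketch out. */
static int run_only_bin_sh(const char *path)
{
    return strcmp(path, "/bin/sh") == 0 ? 0 : -1;
}

static int run_nothing(const char *path)
{
    (void)path;
    return -1;
}

static int run_anything(const char *path)
{
    (void)path;
    return 0;
}
```

Passing run_only_bin_sh models a minimal root file system that ships only a shell: every earlier candidate fails and /bin/sh is chosen.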
Step 3: Change of Process 0
static noinline void __init_refok rest_init(void)
{
int pid;
rcu_scheduler_starting();
//Important: create the kernel thread with PID=1; it is created here but cannot be scheduled yet.
kernel_thread(kernel_init, NULL, CLONE_FS);
numa_default_policy();
//Important: create a second kernel thread, PID=2, which is responsible for managing and scheduling other kernel threads.
pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
rcu_read_lock();
kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
rcu_read_unlock();
complete(&kthreadd_done);
init_idle_bootup_task(current);
schedule_preempt_disabled();
cpu_startup_entry(CPUHP_ONLINE);
}
This is the rest_init function, which creates two kernel threads, kernel_init and kthreadd, and finally calls cpu_startup_entry:
cpu_startup_entry(CPUHP_ONLINE);
Its definition is as follows:
void cpu_startup_entry(enum cpuhp_state state)
{
#ifdef CONFIG_X86
boot_init_stack_canary();
#endif
arch_cpu_idle_prepare();
cpu_idle_loop();
}
The final cpu_idle_loop() is an infinite loop. After creating process 1 and finishing its other work, process 0 turns into the idle process, looping forever in kernel mode.
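The shape of an idle loop is: as long as nothing needs to run, stay idle; once something does, hand over to the scheduler. The following is a bounded user-space model of a single pass of that pattern, not the kernel's cpu_idle_loop(). The struct mini_cpu type, the array of simulated ticks and both function names are assumptions made for illustration:

```c
/* Hypothetical model of one CPU: work_at[t] != 0 means a task becomes runnable at tick t. */
struct mini_cpu {
    const int *work_at;
    int nticks;
    int now;
};

/* Stand-in for a "does something need to run?" check at the current tick. */
static int mini_need_resched(const struct mini_cpu *cpu)
{
    return cpu->now < cpu->nticks && cpu->work_at[cpu->now];
}

/* Idles until work shows up or simulated time runs out; returns the ticks spent idle. */
static int mini_idle_until_work(struct mini_cpu *cpu)
{
    int idle_ticks = 0;

    while (cpu->now < cpu->nticks && !mini_need_resched(cpu)) {
        idle_ticks++;   /* a real idle loop would halt the CPU here */
        cpu->now++;
    }
    return idle_ticks;  /* a real idle loop would now call the scheduler, then idle again */
}
```

Unlike this model, the real loop never returns: after the scheduler has run the other tasks, control comes back to the idle process and it idles again.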
Summary
The whole flow, especially processes 0 and 1 and the idle process, should now be basically clear. To summarize briefly: start_kernel is where initialization of the system environment begins after the assembly code has run. Process 0 is created by hand at the very start. Process 0 then forks process 1, which becomes the first user-mode process. Process 1 loads the init program from disk and goes on to create all the processes the system needs, while process 0 turns into the idle process and idles in the system.
The Dao begets One (start_kernel ... cpu_idle), One begets Two (kernel_init and kthreadd), Two beget Three (the first three processes), and Three beget all things (process 1 is the ancestor of all user-mode processes, process 2 is the ancestor of all kernel threads). The core code of recent kernels has been cleaned up considerably, and it fits the spirit of traditional Chinese culture.
After further study: the idle process is not unique. On an SMP system, the idle process of the boot processor is the original process 0, while the idle processes of the other processors are forked by the boot processor and also have PID 0. Each processor's idle process runs when that processor has nothing else to do, and takes part in scheduling.
Original work by Jin Youzhi
When reposting, please credit the source: MOOC course "Linux Kernel Analysis" http://mooc.study.163.com/course/USTC-1000029000