Notes on Analyzing Linux Kernel Startup: from start_kernel to the init Process

Keywords: Linux shell C

The use of GDB

Before starting a GDB session, get familiar with the most commonly used GDB commands; they make tracing much easier.

  • b[reak] <line number or function>: set a breakpoint
  • s[tep]: step to the next line, entering function calls
  • n[ext]: step to the next line without entering function calls
  • c[ontinue]: continue execution
  • r[un]: start the program; it runs until it exits, crashes, or hits a breakpoint
  • q[uit]: exit GDB
  • info breakpoints / info watchpoints: list the breakpoints and watchpoints that have been set
  • watch <expression>: set a watchpoint

Other useful commands include:

Command          Purpose
ptype            Print the data type of a variable
info share       Print the names of the currently loaded shared libraries
info functions   Print all function prototypes
list             Display 10 lines of source code around the current line
help             Display a list of help topics

Start-up process of computer

  • When an x86 CPU starts, its first action is to fetch from CS:EIP = FFFF:0000H (physical address 000FFFF0H, because a 16-bit CPU has 20 address lines), which is where the BIOS program is located. http://wenku.baidu.com/view/4e5c49eb172ded630b1cb699.html
  • After the BIOS routines have checked the hardware and completed the corresponding initialization, the BIOS looks for a bootable medium, loads the boot program into a designated memory area, and hands control to it. For a hard disk, this means loading the first sector (the MBR) and the boot program of the active partition into memory (i.e. the BootLoader); control passes to the BootLoader once loading is complete.
  • The BootLoader prepares and then starts the operating system. When starting an operating system, you usually specify the partition and path of the kernel, the initrd and the root, such as root (hd0,0), kernel (hd0,0)/bzImage root=/dev/ram init=/bin/ash, initrd (hd0,0)/myinitrd4M.img.
  • Kernel startup covers what happens before and after start_kernel: everything before it is assembly code that performs low-level initialization, after it the C code initializes the operating system, and finally the first user-mode process, init, is executed.
  • Booting generally happens in two stages: first using the in-memory file system provided by the initrd, then switching to the file system on the hard disk. The initrd file serves two main purposes: 1. It provides driver modules that are needed for booting but are not included in the kernel file (vmlinuz). 2. It mounts the root file system on the hard disk and executes the /sbin/init program there to continue the boot process.

Once the computer is powered on, the program counter points to a piece of BIOS code, which performs the hardware self-test. If the check finds no problems, the BIOS reads one sector from the hard disk (512 bytes in size; on Linux this can be thought of as GRUB) and hands control to that code. This code is small and can do little, but it lets the user make some choices. Once a choice has been made, it loads the kernel into memory and then hands control to the operating system, which starts running.

Tracking and Analyzing the Startup Process of Linux Kernel

Open a shell in the virtual machine provided by the lab environment:

cd LinuxKernel/
qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img

After the kernel has started, it enters the menu program (the course project of Software Engineering C Coding Practice), which supports three commands: help, version and quit. You can also add more commands. For students who have taken Software Engineering C Coding Practice, this should be a piece of cake.

Tracing and Debugging the Kernel with gdb

qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img -s -S
# Notes on the -s and -S options:
#  -S  freeze CPU at startup (use 'c' to start execution)
#  -s  shorthand for -gdb tcp::1234; to use a port other than 1234, replace -s with -gdb tcp::xxxx

Open another shell window

gdb
# Load the symbol table before running target remote
(gdb) file linux-3.18.6/vmlinux
# Connect gdb to the gdbserver in QEMU; enter c afterwards to let Linux keep running in QEMU
(gdb) target remote :1234
# The breakpoint can be set before or after target remote
(gdb) break start_kernel

The basic initialization of the Linux operating system is done in the init module (init/main.c), which is also where our experiment begins.
First, as described above, set a breakpoint on the start_kernel function in main.c, then enter c[ontinue]; execution stops in start_kernel at main.c:501.

Using the list command to print the surrounding source, we can see that this module performs many initialization operations, such as trap_init (traps and interrupts), ipc_init (inter-process communication), mm_init (memory management), sched_init (process scheduling), and so on.

Next, set a breakpoint at rest_init. As the listing shows, rest_init is the last function called by start_kernel.

Analyzing the Startup Process

Looking back at start_kernel: once the architecture-specific assembly code has run, execution jumps into the architecture-independent C code of the kernel, namely the start_kernel function in init/main.c, where the Linux kernel begins its initialization proper.

asmlinkage __visible void __init start_kernel(void)
{
    //Command line to store parameters passed by bootloader
    char *command_line;
    char *after_dashes;

    //Initialize lockdep, the kernel's lock dependency validator
    lockdep_init();
    //init_task is the manually created PCB
    set_task_stack_end_magic(&init_task);
    //Get the hardware ID of the current CPU
    smp_setup_processor_id();
    //Initialize the hash buckets used by the debug-objects tracker
    debug_objects_early_init();
    //Set up the stack canary used to detect stack overflows
    boot_init_stack_canary();
    //Early initialization of cgroups (control groups)
    cgroup_init_early();
    //Disable all interrupts on the current CPU
    local_irq_disable();
    //Flag recording that interrupts are disabled during early boot
    early_boot_irqs_disabled = true;
    //Activate the current (boot) CPU
    boot_cpu_init();
    //Initialize the high-memory mapping table
    page_address_init();
    //Print the kernel version banner
    pr_notice("%s", linux_banner);
    //Kernel Architecture Related Initialization Functions
    setup_arch(&command_line);
    //Each task has a mm_struct structure to manage memory space
    mm_init_cpumask(&init_mm);
    //Backup and save cmdline
    setup_command_line(command_line);
    //Set nr_cpu_ids, the number of possible CPU IDs
    setup_nr_cpu_ids();
    //Allocate space for the per_cpu variables of every CPU in the system
    setup_per_cpu_areas();
    //Prepare the boot CPU on SMP systems
    smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
    //Build the zonelists of the memory-management nodes
    build_all_zonelists(NULL, NULL);
    //Set up the notifier for memory page allocation
    page_alloc_init();

    pr_notice("Kernel command line: %s\n", boot_command_line);
    //Parse the early boot parameters in cmdline
    parse_early_param();
    //Parse the kernel parameters that were passed in
    after_dashes = parse_args("Booting kernel",
                  static_command_line, __start___param,
                  __stop___param - __start___param,
                  -1, -1, &unknown_bootoption);
    if (!IS_ERR_OR_NULL(after_dashes))
        parse_args("Setting init args", after_dashes, NULL, 0, -1, -1,
               set_init_arg);

    jump_label_init();

    //Use bootmem to allocate a buffer for recording boot messages
    setup_log_buf(0);
    //Initialize the hash table of process IDs
    pidhash_init();
    //Early initialization of the virtual file system (VFS) caches
    vfs_caches_init_early();
    //Sort the kernel exception table (by faulting instruction address) to speed up lookups
    sort_main_extable();
    //Initialize kernel trap and exception handling
    trap_init();
    //Set up the kernel memory allocators and mark which memory is available
    mm_init();
    //Initialize the data structures of the process scheduler
    sched_init();
    //Disable preemption
    preempt_disable();
    //Check whether interrupts were enabled too early; if so, warn and disable them again
    if (WARN(!irqs_disabled(),
         "Interrupts were enabled *very* early, fixing it\n"))
        local_irq_disable();
    //Allocate the cache used by the IDR mechanism
    idr_init_cache();
    //Initialize RCU (read-copy-update) synchronization
    rcu_init();
    context_tracking_init();
    //Initialize the kernel radix tree
    radix_tree_init();
    //Early initialization of external interrupt descriptors, mainly their data structures
    early_irq_init();
    //Architecture-specific interrupt initialization
    init_IRQ();
    //Initialize the kernel clock (tick) system
    tick_init();
    rcu_init_nohz();
    //Initialize the timer-related data structures of the boot CPU
    init_timers();
    //Initialize high-resolution timers
    hrtimers_init();
    //Initialize software interrupts (softirqs)
    softirq_init();
    //Initialize system timekeeping
    timekeeping_init();
    //Initialize the system clock
    time_init();
    sched_clock_postinit();
    //Initialize the CPU performance monitoring mechanism
    perf_event_init();
    //Allocate memory for kernel profiling
    profile_init();
    //Initialize call_single_queue for all CPUs
    call_function_init();
    WARN(!irqs_disabled(), "Interrupts were enabled early\n");
    early_boot_irqs_disabled = false;
    local_irq_enable();
    //Late initialization of the kernel memory cache (slab allocator)
    kmem_cache_init_late();
    //Initialize the console
    console_init();
    if (panic_later)
        panic("Too many boot %s vars at `%s'", panic_later,
              panic_param);
    //Print lock dependency information
    lockdep_info();
    //Self-test that the locking API works correctly
    locking_selftest();

#ifdef CONFIG_BLK_DEV_INITRD
    if (initrd_start && !initrd_below_start_ok &&
        page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
        pr_crit("initrd overwritten (0x%08lx < 0x%08lx) - disabling it.\n",
            page_to_pfn(virt_to_page((void *)initrd_start)),
            min_low_pfn);
        initrd_start = 0;
    }
#endif
    page_cgroup_init();
    debug_objects_mem_init();
    kmemleak_init();
    setup_per_cpu_pageset();
    numa_policy_init();
    if (late_time_init)
        late_time_init();
    sched_clock_init();
    calibrate_delay();
    pidmap_init();
    anon_vma_init();
    acpi_early_init();
#ifdef CONFIG_X86
    if (efi_enabled(EFI_RUNTIME_SERVICES))
        efi_enter_virtual_mode();
#endif
#ifdef CONFIG_X86_ESPFIX64
    /* Should be run before the first non-init thread is created */
    init_espfix_bsp();
#endif
    thread_info_cache_init();
    cred_init();
    fork_init(totalram_pages);
    proc_caches_init(); 
    //Initialize the buffer cache of the file system
    buffer_init();
    //Initialize the kernel key management system
    key_init();
    //Initialize the kernel security framework
    security_init();
    dbg_late_init();
    vfs_caches_init(totalram_pages);
    signals_init();
    /* rootfs populating might need page-writeback */
    page_writeback_init();
    proc_root_init();
    cgroup_init();
    cpuset_init();
    taskstats_init_early();
    delayacct_init();
    check_bugs();
    sfi_init_late();
    if (efi_enabled(EFI_RUNTIME_SERVICES)) {
        efi_late_init();
        efi_free_boot_services();
    }
    ftrace_init();
    rest_init();//Remaining initialization
}

Step 1: The Birth of Process 0

First, look at the second call in start_kernel():

set_task_stack_end_magic(&init_task);

init_task is defined in the file linux-3.18.6/init/init_task.c as follows:

struct task_struct init_task = INIT_TASK(init_task);

In effect, init_task is a hand-built task_struct that plays the role of a PCB for the very first process, process 0.

The INIT_TASK macro is defined in the linux/init_task.h header file; its job is to produce the initial contents of process 0.


This struct holds all the information about the process; in particular, the PID of this handcrafted process is 0.

Step 2: Creation of Process 1

Let's look at the rest_init function.
It contains the following call to kernel_thread:

kernel_thread(kernel_init, NULL, CLONE_FS);


The source code of kernel_thread() is defined in the file linux-3.18.6/kernel/fork.c as follows:

pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
{
    return do_fork(flags|CLONE_VM|CLONE_UNTRACED, (unsigned long)fn,
        (unsigned long)arg, NULL, NULL);
}

Its job is to call do_fork to create a new kernel thread that runs kernel_init.



The kernel_init() function is defined in the file linux-3.18.6/init/main.c as follows:

//The newly created kernel thread runs this function, which in turn calls run_init_process
static int __ref kernel_init(void *unused)
{
    int ret;
    kernel_init_freeable();
    async_synchronize_full();
    free_initmem();
    mark_rodata_ro();
    system_state = SYSTEM_RUNNING;
    numa_default_policy();
    flush_delayed_fput();
    if (ramdisk_execute_command) {
        //Start the init program via run_init_process
        ret = run_init_process(ramdisk_execute_command);
        if (!ret)
            return 0;
        pr_err("Failed to execute %s (error %d)\n",
               ramdisk_execute_command, ret);
    }
    if (execute_command) {
        ret = run_init_process(execute_command);
        if (!ret)
            return 0;
        pr_err("Failed to execute %s (error %d).  Attempting defaults...\n",
            execute_command, ret);
    }
    /* try_to_run_init_process() wraps run_init_process(), which executes the
     * named file much as execve() does from user mode; its argument is the
     * path of the executable to run.
     */
    /* This is the dividing line between the end of kernel initialization and
     * the start of user-mode initialization.
     */
    if (!try_to_run_init_process("/sbin/init") ||
        !try_to_run_init_process("/etc/init") ||
        !try_to_run_init_process("/bin/init") ||
        !try_to_run_init_process("/bin/sh"))
        return 0;

    panic("No working init found.  Try passing init= option to kernel. "
          "See Linux Documentation/init.txt for guidance.");
}

As the code shows, the function ends by calling run_init_process to execute the init program from the file system. This is the point at which process 1 moves from kernel mode to user mode.

Step 3: Change of Process 0

static noinline void __init_refok rest_init(void)
{
    int pid;
    rcu_scheduler_starting();
    //Important: create the kernel thread with PID=1; it exists from here on but cannot be scheduled yet
    kernel_thread(kernel_init, NULL, CLONE_FS);
    numa_default_policy();
    //Important: create a second kernel thread, PID=2, which manages and schedules the other kernel threads
    pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
    rcu_read_lock();
    kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
    rcu_read_unlock();
    complete(&kthreadd_done);
    init_idle_bootup_task(current);
    schedule_preempt_disabled();
    cpu_startup_entry(CPUHP_ONLINE);
}

This is the rest_init function, which creates two kernel threads, kernel_init and kthreadd, and finally calls cpu_startup_entry:

cpu_startup_entry(CPUHP_ONLINE);

It is defined as follows:

void cpu_startup_entry(enum cpuhp_state state)
{
#ifdef CONFIG_X86
    boot_init_stack_canary();
#endif
    arch_cpu_idle_prepare();
    cpu_idle_loop();
}

The final call, cpu_idle_loop(), is an infinite loop. After creating process 1 and finishing the remaining work, process 0 turns into the idle process, looping forever in kernel mode.

Summary

With that, the whole picture, especially processes 0 and 1 and the idle process, is basically clear. To summarize briefly: start_kernel is where initialization of the system environment begins once the assembly code has run. Process 0 is created by hand at the very start. Process 0 then forks process 1, which becomes the first user-mode process. Process 1 loads the init program from disk and goes on to create all the processes the system needs, while process 0 turns into the idle process and idles in the system.

The Dao begets One (start_kernel....cpu_idle), One begets Two (kernel_init and kthreadd), Two begets Three (i.e. the first three processes), and Three begets all things (process 1 is the ancestor of all user-mode processes, process 2 is the ancestor of all kernel threads). The core code of recent kernels has been cleaned up considerably, which fits the spirit of traditional Chinese culture.

After further study, it turns out that there is more than one idle process. On an SMP multiprocessor system, the idle process of the main processor is the original process 0 transformed, while the idle processes of the other processors are forked by the main processor and also have PID 0. Each processor's idle process spins while that processor has nothing to do and takes part in scheduling.

Original work by Jin Youzhi.
When reprinting, please credit the source: MOOC course "Linux Kernel Analysis" http://mooc.study.163.com/course/USTC-1000029000

Posted by Beauford on Wed, 17 Apr 2019 11:06:33 -0700