[Exercise 4] Analyze how the bootloader loads an OS in ELF format
Read bootmain.c to learn how the bootloader loads ELF files. By analyzing the source code and by running and debugging the bootloader and OS in qemu, understand:
- 1. How does the bootloader read hard disk sectors?
- 2. How does the bootloader load an OS in ELF format?
Question 1: how the bootloader reads hard disk sectors
Principle analysis
The reading material already gives the general procedure for reading a sector:
- 1. Wait for the disk to be ready
- 2. Issue the command to read the sector
- 3. Wait for the disk to be ready
- 4. Read the sector data from the disk into the specified memory
In practice, you need to know how to interact with the hard disk. The reading material also gives the answer: all IO operations are performed by the CPU accessing the hard disk's IO address registers. The hard disk has 8 IO address registers: the first holds data, the third holds the number of sectors to read or write, the fourth to seventh hold the starting sector number (28 bits in total), and the eighth holds status and commands. With this information, the code is not hard to write.
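To make the register layout concrete, here is a minimal sketch that writes it down as named constants for the first IDE channel, together with the "ready" test on the status byte. The names IDE_DATA, IDE_SECCOUNT, IDE_LBA0 to IDE_LBA3, IDE_STATCMD and ide_status_ready are made up for this illustration and do not appear in the lab code.

```c
#include <assert.h>
#include <stdint.h>

/* Register ports of the first IDE channel (illustrative names only). */
enum {
    IDE_DATA     = 0x1F0, /* 1st register: data */
    IDE_SECCOUNT = 0x1F2, /* 3rd register: number of sectors to transfer */
    IDE_LBA0     = 0x1F3, /* 4th register: sector number bits 0-7 */
    IDE_LBA1     = 0x1F4, /* 5th register: sector number bits 8-15 */
    IDE_LBA2     = 0x1F5, /* 6th register: sector number bits 16-23 */
    IDE_LBA3     = 0x1F6, /* 7th register: sector number bits 24-27 plus mode/drive bits */
    IDE_STATCMD  = 0x1F7  /* 8th register: status when read, command when written */
};

/* Returns 1 when the top two status bits are 01 (not busy, ready). */
static int ide_status_ready(uint8_t status)
{
    return (status & 0xC0) == 0x40;
}
```

For example, ide_status_ready(0x58) is true because only the top two bits are tested, while ide_status_ready(0x80) is false because the busy bit is set.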
Code analysis
The bootloader reads sectors in the readsect function of boot/bootmain.c. The code is:
    static void
    waitdisk(void) {
        // Spin until the top two bits of the status register 0x1F7 are 01
        // (not busy, ready)
        while ((inb(0x1F7) & 0xC0) != 0x40)
            /* do nothing */;
    }

    /* readsect - read a single sector at @secno into @dst */
    static void
    readsect(void *dst, uint32_t secno) {
        // wait for disk to be ready
        waitdisk();

        outb(0x1F2, 1);                         // sector count: read one sector
        outb(0x1F3, secno & 0xFF);              // sector number (LBA) bits 0-7
        outb(0x1F4, (secno >> 8) & 0xFF);       // LBA bits 8-15
        outb(0x1F5, (secno >> 16) & 0xFF);      // LBA bits 16-23
        // LBA bits 24-27 in the low 4 bits; 0xE0 selects LBA mode and the master disk
        outb(0x1F6, ((secno >> 24) & 0xF) | 0xE0);
        outb(0x1F7, 0x20);                      // cmd 0x20 - read sectors

        // wait for disk to be ready
        waitdisk();

        // read a sector: SECTSIZE / 4 dwords from the data port
        insl(0x1F0, dst, SECTSIZE / 4);
    }
A motherboard generally has two IDE channels, and each channel can connect two IDE hard disks. The first IDE channel is accessed through IO addresses 0x1F0-0x1F7 and the second through 0x170-0x177, so sectors of the first hard disk are read by programming the registers at 0x1F0-0x1F7. See the register table in the reading material for the specific parameters. The master or slave disk on a channel is selected through the register at IO offset 6 (0x1F6 on the first channel). From the outb() calls we can see that the disk is addressed in LBA mode and the data is transferred in PIO (programmed IO) mode. Since 0x1F2 is set to 1, this function reads only one sector at a time.
readseg simply wraps readsect so that content of any length can be read from the disk.
    static void
    readseg(uintptr_t va, uint32_t count, uint32_t offset) {
        uintptr_t end_va = va + count;

        // step back so that the start of the first sector read lines up
        va -= offset % SECTSIZE;

        // Add 1 because sector 0 is occupied by the bootloader;
        // the ELF file starts from sector 1
        uint32_t secno = (offset / SECTSIZE) + 1;

        for (; va < end_va; va += SECTSIZE, secno ++) {
            readsect((void *)va, secno);
        }
    }
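The arithmetic in readseg is easy to misread, so the sketch below repeats it without touching a disk and reports which sectors would be read and where the first one lands. The struct read_plan and the function plan_readseg are hypothetical helpers written for this note; only the three computations are taken from readseg.

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512

/* Hypothetical summary of what one readseg call would do. */
struct read_plan {
    uintptr_t start_va;    /* address where the first whole sector is written */
    uint32_t  first_secno; /* first sector number passed to readsect */
    uint32_t  nsect;       /* how many times readsect would be called */
};

static struct read_plan plan_readseg(uintptr_t va, uint32_t count, uint32_t offset)
{
    struct read_plan p;
    uintptr_t end_va = va + count;

    va -= offset % SECTSIZE;               /* same adjustment as readseg */
    p.start_va = va;
    p.first_secno = offset / SECTSIZE + 1; /* sector 0 holds the bootloader */
    p.nsect = 0;
    for (; va < end_va; va += SECTSIZE)    /* same loop bounds as readseg */
        p.nsect++;
    return p;
}
```

With the arguments bootmain uses for the header, plan_readseg(0x10000, SECTSIZE * 8, 0) gives 8 sectors starting at sector 1. A request that starts in the middle of a sector, such as offset 0x34, still reads the whole sector, so some bytes before va are written as well.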
According to the code, the steps to read a hard disk sector are:
- Wait for the hard disk to become idle. The implementation of waitdisk is a single line: while ((inb(0x1F7) & 0xC0) != 0x40), which polls the top two bits of the 0x1F7 register until the highest bit is 0 and the second highest bit is 1, that is, the disk is not busy and is ready.
- When the hard disk is idle, issue the command to read the sector. The command word is 0x20, written to the 0x1F7 register; the number of sectors to read is 1, written to the 0x1F2 register; the 28-bit starting sector number is split into 4 parts and written to the 0x1F3~0x1F6 registers in turn.
- After issuing the command, wait for the hard disk to become idle again.
- Once the hard disk is idle again, read the data from the 0x1F0 register. Note the description of insl: "That function will read cnt dwords from the input port specified by port into the supplied output array addr." It counts in dwords of 4 bytes each, so the sector size is divided by 4.
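The way the 28-bit sector number is spread over the four registers can be checked on its own. The following sketch computes the byte written to each of 0x1F3~0x1F6 with the same expressions as readsect, and rebuilds the sector number from them. The struct lba_regs and the functions split_lba and join_lba are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written to ports 0x1F3..0x1F6 for one sector number. */
struct lba_regs {
    uint8_t r1f3; /* sector number bits 0-7 */
    uint8_t r1f4; /* sector number bits 8-15 */
    uint8_t r1f5; /* sector number bits 16-23 */
    uint8_t r1f6; /* 0xE0 | sector number bits 24-27 */
};

/* Same expressions as the outb() arguments in readsect. */
static struct lba_regs split_lba(uint32_t secno)
{
    struct lba_regs r;
    r.r1f3 = secno & 0xFF;
    r.r1f4 = (secno >> 8) & 0xFF;
    r.r1f5 = (secno >> 16) & 0xFF;
    r.r1f6 = ((secno >> 24) & 0xF) | 0xE0;
    return r;
}

/* Inverse: rebuild the 28-bit sector number from the four bytes. */
static uint32_t join_lba(struct lba_regs r)
{
    return (uint32_t)r.r1f3
         | ((uint32_t)r.r1f4 << 8)
         | ((uint32_t)r.r1f5 << 16)
         | ((uint32_t)(r.r1f6 & 0xF) << 24);
}
```

For sector 1, the first sector of the kernel file, the four bytes are 0x01, 0x00, 0x00 and 0xE0. Bits above bit 27 of secno are discarded, which is why the sector number is limited to 28 bits.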
Question 2: how the bootloader loads an OS in ELF format
Principle analysis
First, the loading process is analyzed in principle.
- The bootloader has to load the bin/kernel file, which is an ELF file. The file starts with the ELF header, whose phoff field records the offset of the program header table in the file; this field gives the starting address of the program header table. The program header table is an array of structs, and the number of elements is recorded in the phnum field of the ELF header.
- Each entry of the program header table describes one Segment, including the following information needed for loading:
- uint offset; // the offset of the Segment relative to the start of the file, which tells us where to find the Segment in the file.
- uint va; // the virtual address in memory where the first byte of the Segment is placed, which tells us where to load the Segment.
- uint memsz; // the number of bytes the Segment occupies in the memory image, which tells us how much to load.
- With the information in the ELF header and the program header table, we can load all Segments of the ELF file into memory one by one.
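As a sketch of this principle, the code below walks the program header table of an ELF image that is already held in a byte buffer. The struct layouts follow the elfhdr and proghdr definitions used in the lab; get_proghdr and total_memsz are hypothetical helpers, and the sketch assumes a 32-bit ELF image whose byte order matches the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct elfhdr {
    uint32_t e_magic;
    uint8_t  e_elf[12];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;     /* file offset of the program header table */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;     /* number of program header entries */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

struct proghdr {
    uint32_t p_type;
    uint32_t p_offset;    /* where the Segment starts in the file */
    uint32_t p_va;        /* where the Segment goes in memory */
    uint32_t p_pa;
    uint32_t p_filesz;
    uint32_t p_memsz;     /* how many bytes it occupies in memory */
    uint32_t p_flags;
    uint32_t p_align;
};

/* Copy entry i of the program header table into *out.
 * Returns 0 on success, -1 if i is out of range. */
static int get_proghdr(const uint8_t *image, uint16_t i, struct proghdr *out)
{
    struct elfhdr eh;
    memcpy(&eh, image, sizeof eh);
    if (i >= eh.e_phnum)
        return -1;
    memcpy(out, image + eh.e_phoff + (uint32_t)i * sizeof *out, sizeof *out);
    return 0;
}

/* Add up p_memsz over all entries of the table. */
static uint32_t total_memsz(const uint8_t *image)
{
    struct proghdr ph;
    uint32_t sum = 0;
    for (uint16_t i = 0; get_proghdr(image, i, &ph) == 0; i++)
        sum += ph.p_memsz;
    return sum;
}
```

The bootloader itself does not copy the headers; it casts the address 0x10000 to a struct pointer and indexes the table directly. The memcpy here only keeps the sketch independent of buffer alignment.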
Code analysis
ELF header definition:
    /* file header */
    struct elfhdr {
        uint32_t e_magic;     // must equal ELF_MAGIC
        uint8_t e_elf[12];
        uint16_t e_type;      // 1=relocatable, 2=executable, 3=shared object, 4=core image
        uint16_t e_machine;   // 3=x86, 4=68K, etc.
        uint32_t e_version;   // file version, always 1
        uint32_t e_entry;     // entry point if executable
        uint32_t e_phoff;     // file position of program header or 0
        uint32_t e_shoff;     // file position of section header or 0
        uint32_t e_flags;     // architecture-specific flags, usually 0
        uint16_t e_ehsize;    // size of this elf header
        uint16_t e_phentsize; // size of an entry in program header
        uint16_t e_phnum;     // number of entries in program header or 0
        uint16_t e_shentsize; // size of an entry in section header
        uint16_t e_shnum;     // number of entries in section header or 0
        uint16_t e_shstrndx;  // section number that contains section name strings
    };
Here we only need to pay attention to a few fields: e_magic, which is used to check that the file read is a valid ELF file; e_phoff, which is the offset of the program header table in the file; e_phnum, which is the number of entries in the program header table; and e_entry, which is the virtual address of the program entry point.
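Since the ELF header is placed at 0x10000, the byte offset of each of these fields decides which address the bootloader reads. As a small check, the sketch below computes the offsets with offsetof, using the same struct layout as above; the OFF_* constant names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct elfhdr {
    uint32_t e_magic;
    uint8_t  e_elf[12];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

/* Byte offsets of the four fields the bootloader uses. */
enum {
    OFF_E_MAGIC = offsetof(struct elfhdr, e_magic),
    OFF_E_ENTRY = offsetof(struct elfhdr, e_entry),
    OFF_E_PHOFF = offsetof(struct elfhdr, e_phoff),
    OFF_E_PHNUM = offsetof(struct elfhdr, e_phnum)
};
```

On a typical compiler these come out as 0x0, 0x18, 0x1c and 0x2c, so with the header at 0x10000 the fields e_phoff and e_phnum live at 0x1001c and 0x1002c, the two addresses that show up in the debugging session.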
Macro definitions:
    #define ELFHDR          ((struct elfhdr *)0x10000)
    #define SECTSIZE        512
The bootloader loads the OS in the bootmain function. The code is:
    void
    bootmain(void) {
        // Read the first page (8 sectors) of the file, which contains the ELF header
        readseg((uintptr_t)ELFHDR, SECTSIZE * 8, 0);

        // Use the magic number stored in the header to check that this is a valid ELF file
        if (ELFHDR->e_magic != ELF_MAGIC) {
            goto bad;
        }

        struct proghdr *ph, *eph;

        // The ELF header points to the program header table, which describes
        // where each part of the ELF file should be loaded into memory.
        // ph is the first entry of that table, eph is one past the last entry
        ph = (struct proghdr *)((uintptr_t)ELFHDR + ELFHDR->e_phoff);
        eph = ph + ELFHDR->e_phnum;

        // Load each segment of the ELF file into memory as the table describes
        for (; ph < eph; ph ++) {
            readseg(ph->p_va & 0xFFFFFF, ph->p_memsz, ph->p_offset);
        }
        // For example (the sizes depend on the build):
        // 0xd1ec bytes at file offset 0x1000 are loaded to memory 0x00100000
        // 0x1d20 bytes at file offset 0xf000 are loaded to memory 0x0010e000

        // Call the kernel entry point recorded in the ELF header
        ((void (*)(void))(ELFHDR->e_entry & 0xFFFFFF))();

    bad:
        outw(0x8A00, 0x8A00);
        outw(0x8A00, 0x8E00);
        while (1);
    }
- First, read the first page (8 sectors) of the bin/kernel file from the hard disk to memory address 0x10000 to obtain the ELF header of the kernel file.
- Verify the e_magic field of the ELF header to make sure the file is an ELF file.
- Read the e_phoff field of the ELF header to get the starting address of the program header table; read the e_phnum field of the ELF header to get the number of entries in the program header table.
- Traverse each entry of the program header table, get the offset of each Segment in the file, the location (virtual address) where it is to be loaded and its length, and load it through disk I/O.
- After loading, get the kernel's entry address from the e_entry field of the ELF header and jump to that address to start executing kernel code.
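One detail in bootmain that the steps above pass over is the mask 0xFFFFFF applied to p_va and e_entry. Reading the code, its effect is to keep only the low 24 bits of the address. The sketch below shows that arithmetic; the function name load_addr and the address 0xC0100000 are hypothetical and used only to show what the mask does to a higher address.

```c
#include <assert.h>
#include <stdint.h>

/* Same mask as in bootmain: keep the low 24 bits of the address. */
static uint32_t load_addr(uint32_t va)
{
    return va & 0xFFFFFF;
}
```

For the addresses in this kernel, such as 0x00100000 and 0x0010e000, the mask changes nothing. An address like 0xC0100000 would be folded down to 0x00100000.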
Debugging the code
- Enter make debug to start gdb, and set a breakpoint at the entry of the bootmain function, which is 0x7d0d. Enter c to run to it.
- Single-step a few times until the call to readseg. Because that function reads the hard disk in a loop, you can save time by setting a breakpoint on the statement after the call instead of stepping through the loop (or simply enter n to step over the call).
- After readseg has executed, you can query the value of the e_magic field of the ELF header with x/xw 0x10000. The result is shown below. It is indeed equal to 0x464c457f, so the check succeeds. Note that the hardware is little-endian (this is easy to see by comparing the assembly statements with the binary code in the asm file), so 0x464c45 corresponds to the string "ELF" (0x45 is 'E', 0x4c is 'L', 0x46 is 'F'), and the lowest byte 0x7f is the DEL character.
    (gdb) x/xw 0x10000
    0x10000:        0x464c457f
- Continue single-stepping. From the instruction at 0x7d2f, mov 0x1001c,%eax, it can be seen that the e_phoff field of the ELF header is loaded into the eax register. The offset between 0x1001c and 0x10000 is 0x1c, that is, 28 bytes, which is consistent with the definition of the ELF header. After executing the instruction at 0x7d2f, the value of eax becomes 0x34, which means that the offset of the program header table in the file is 0x34 and its position in memory is 0x10000 + 0x34 = 0x10034. The 8 words (32 bytes) starting at 0x10034 are as follows:
    (gdb) x/8xw 0x10034
    0x10034:        0x00000001      0x00001000      0x00100000      0x00100000
    0x10044:        0x0000dac4      0x0000dac4      0x00000005      0x00001000
- The meaning of these 8 words can be understood from the program header structure defined in the code.
    struct proghdr {
        uint32_t p_type;   // loadable code or data, dynamic linking info, etc.
        uint32_t p_offset; // file offset of segment
        uint32_t p_va;     // virtual address to map segment
        uint32_t p_pa;     // physical address, not used
        uint32_t p_filesz; // size of segment in file
        uint32_t p_memsz;  // size of segment in memory (bigger if contains bss)
        uint32_t p_flags;  // read/write/execute bits
        uint32_t p_align;  // required alignment, invariably hardware page size
    };
- You can also use readelf -l bin/kernel to query the basic information of each Segment in the kernel file for comparison. The query results are as follows. It can be seen that they are consistent with gdb debugging results.
    Program Headers:
      Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
      LOAD           0x001000 0x00100000 0x00100000 0x0dac4 0x0dac4 R E 0x1000
      LOAD           0x00f000 0x0010e000 0x0010e000 0x00aac 0x01dc0 RW  0x1000
      GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x10
- Continue single-stepping. From the instruction at 0x7d34, movzwl 0x1002c,%esi, it can be seen that the e_phnum field of the ELF header is loaded into the esi register. After executing the instruction at 0x7d34, the value of esi becomes 3, which means that there are 3 Segments in total.
- After that, the three Segments are loaded through disk I/O, which is not described in detail here.
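The two memory dumps in this session can also be decoded by hand. The sketch below assembles four bytes into a 32-bit value in little-endian order, which is how the bytes 0x7f 'E' 'L' 'F' become 0x464c457f, and reinterprets 8 consecutive words as one program header entry. The function names le32 and words_to_proghdr are hypothetical, and the second function assumes the words are already in host order, as gdb prints them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct proghdr {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_va;
    uint32_t p_pa;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
};

/* Assemble a 32-bit value from 4 bytes stored lowest byte first. */
static uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Interpret 8 consecutive 32-bit words as one program header entry. */
static struct proghdr words_to_proghdr(const uint32_t w[8])
{
    struct proghdr ph;
    memcpy(&ph, w, sizeof ph);
    return ph;
}
```

Feeding the 8 words printed at 0x10034 into words_to_proghdr gives p_offset = 0x1000, p_va = 0x100000 and p_memsz = 0xdac4, the same values as the first LOAD line of the readelf output.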
To summarize:
- Read 8 sectors from the hard disk to memory at 0x10000 and interpret them as an elfhdr;
- Verify the e_magic field;
- Read each program Segment into memory according to its offset in the file.