Linux Source Analysis: From Initialization in the main Function to Enabling Interrupts

Keywords: C Linux Operating System server

The previous article covered the three assembly programs (bootsect.s, setup.s and head.s) that run before the Linux system proper starts. head.s enters the part of the kernel written in C by pushing the address of main onto the stack and then executing ret, which pops that address and begins executing main. The previous article is: Linux Source Parsing - From Startup to main Function

Based on the Linux 0.11 source code, this article walks through the initialization steps in the main function, up to the point where main enables interrupts and calls move_to_user_mode(), which switches from kernel privilege level to user privilege level.

	//init/main.c
	mem_init(main_memory_start,memory_end);
	trap_init();
	blk_dev_init();
	chr_dev_init();
	tty_init();
	time_init();
	sched_init();
	buffer_init(buffer_memory_end);
	hd_init();
	floppy_init();
	sti();
	move_to_user_mode();

The main function is in init/main.c.
After entering main, and before calling mem_init(), the system first saves copies of the root device number and the hard disk parameter table.

	ROOT_DEV = ORIG_ROOT_DEV;
	drive_info = DRIVE_INFO;
	//#define DRIVE_INFO (*(struct drive_info *)0x90080) Copy the hard disk parameter table at 0x90080

Why is the DRIVE_INFO macro defined with the address 0x90080? Because setup.s stored the data there earlier: 0x90080~0x9008f holds the parameter table of the first hard disk and 0x90090~0x9009f holds that of the second. This was not mentioned in the previous article. If you are interested, see the assembly code in setup.s starting at line 65, which is not reproduced here.

Next, main plans the layout of physical memory.

	//Memory size = 1 MB + extended memory (KB) * 1024 bytes
	memory_end = (1<<20) + (EXT_MEM_K<<10);
	//#define EXT_MEM_K (*(unsigned short *)0x90002)
	memory_end &= 0xfffff000;  //Round memory down to a multiple of 4 KB (one page)
	if (memory_end > 16*1024*1024)
		memory_end = 16*1024*1024; //At most 16 MB of memory is used
	if (memory_end > 12*1024*1024)
		buffer_memory_end = 4*1024*1024; //End of the buffer area at 4 MB
	else if (memory_end > 6*1024*1024)
		buffer_memory_end = 2*1024*1024; //End of the buffer area at 2 MB
	else
		buffer_memory_end = 1*1024*1024; //End of the buffer area at 1 MB
	main_memory_start = buffer_memory_end; //Main memory starts where the buffer area ends
#ifdef RAMDISK
	main_memory_start += rd_init(main_memory_start, RAMDISK*1024);
#endif
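
The layout arithmetic above can be tried outside the kernel. The following is a minimal sketch, not the kernel's code: the struct mem_layout type and the plan_memory function are hypothetical names introduced here, and the extended memory size and RAM disk size are passed in as parameters instead of being read from 0x90002 and the makefile.

```c
#include <assert.h>

/* Result of the layout decision made at the top of main(). */
struct mem_layout {
	long memory_end;        /* top of usable physical memory */
	long buffer_memory_end; /* end of the buffer area */
	long main_memory_start; /* first byte handed to mem_init() */
};

/* ext_mem_k: extended memory above 1 MB, in KB (the value at 0x90002).
 * ramdisk_k: RAM disk size in KB, 0 when RAMDISK is not defined. */
struct mem_layout plan_memory(unsigned short ext_mem_k, long ramdisk_k)
{
	struct mem_layout m;

	assert(ramdisk_k >= 0);
	m.memory_end = (1L << 20) + ((long)ext_mem_k << 10);
	m.memory_end &= ~0xfffL;		/* round down to a 4 KB page */
	if (m.memory_end > 16L * 1024 * 1024)
		m.memory_end = 16L * 1024 * 1024;
	if (m.memory_end > 12L * 1024 * 1024)
		m.buffer_memory_end = 4L * 1024 * 1024;
	else if (m.memory_end > 6L * 1024 * 1024)
		m.buffer_memory_end = 2L * 1024 * 1024;
	else
		m.buffer_memory_end = 1L * 1024 * 1024;
	/* The RAM disk, if any, is carved out right after the buffer area. */
	m.main_memory_start = m.buffer_memory_end + ramdisk_k * 1024;
	return m;
}
```

For a machine with 15 MB of extended memory and a 2 MB RAM disk, plan_memory(15 * 1024, 2048) gives a 16 MB memory_end, a 4 MB buffer_memory_end and a main_memory_start of 6 MB.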

Likewise, the EXT_MEM_K macro is defined with the address 0x90002 because setup.s stored the size of extended memory (in KB) there. See line 43 of setup.s if you are interested; the code is not reproduced here. It is enough to know that this value is the extended memory size.
Note the rd_init function, which initializes the RAM disk (virtual disk). If the makefile is configured to use a RAM disk, RAMDISK is defined and rd_init is executed. Suppose the system is configured with a 2 MB RAM disk: the system then reserves 2 MB of memory for it at the start of the main memory area, that is, right after the end of the buffer area.

// init/main.c
main_memory_start += rd_init(main_memory_start, RAMDISK*1024);

// kernel/blk_drv/ramdisk.c
#define MAJOR_NR 1
#define DEVICE_REQUEST do_rd_request

long rd_init(long mem_start, int length)
{
	int	i;
	char	*cp;

	blk_dev[MAJOR_NR].request_fn = DEVICE_REQUEST;
	rd_start = (char *) mem_start;
	rd_length = length;
	cp = rd_start;
	for (i=0; i < length; i++)
		*cp++ = '\0';
	return(length);
}

blk_dev is the block device request control table, an array of struct blk_dev_struct. Each entry has two members: a pointer to the function that handles requests for the device and a pointer to the current request. MAJOR_NR is defined as 1 because entry 1 of the table (the second entry) corresponds to the memory device: a RAM disk uses memory to simulate a hard disk. During initialization, do_rd_request is installed as the request handler of that entry.

// kernel/blk_drv/ll_rw_blk.c
#define NR_BLK_DEV	7

struct blk_dev_struct {
	void (*request_fn)(void);
	struct request * current_request;
};

struct blk_dev_struct blk_dev[NR_BLK_DEV] = {
	{ NULL, NULL },		/* no_dev */
	{ NULL, NULL },		/* dev mem */
	{ NULL, NULL },		/* dev fd */
	{ NULL, NULL },		/* dev hd */
	{ NULL, NULL },		/* dev ttyx */
	{ NULL, NULL },		/* dev tty */
	{ NULL, NULL }		/* dev lp */
};

Once the request handler has been installed in the control table, all that remains is to initialize every byte of the RAM disk to '\0'.
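
The pattern used by rd_init, registering a handler in a table indexed by major number and then zeroing the disk area, can be sketched in user space. This is an illustration under assumptions, not the kernel's code: rd_init_sketch, dispatch, RAMDISK_MAJOR and the rd_requests_seen counter are hypothetical names, and the handler only counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_BLK_DEV 7
#define RAMDISK_MAJOR 1		/* same value as MAJOR_NR in ramdisk.c */

struct request;			/* opaque here; not needed for the sketch */

struct blk_dev_struct {
	void (*request_fn)(void);
	struct request *current_request;
};

static struct blk_dev_struct blk_dev[NR_BLK_DEV];	/* all entries start empty */
static char *rd_start;
static long rd_length;
static int rd_requests_seen;	/* hypothetical counter, for the demo only */

static void do_rd_request(void)
{
	rd_requests_seen++;
}

/* Install the handler, remember the area, and zero it. Returns the
 * number of bytes claimed, which the caller adds to main_memory_start. */
long rd_init_sketch(char *mem_start, long length)
{
	assert(mem_start != NULL && length >= 0);
	blk_dev[RAMDISK_MAJOR].request_fn = do_rd_request;
	rd_start = mem_start;
	rd_length = length;
	memset(rd_start, 0, (size_t)length);
	return length;
}

/* Call whichever handler is registered for `major`.
 * Returns 0 on success, -1 if no driver is installed. */
int dispatch(int major)
{
	if (major < 0 || major >= NR_BLK_DEV || !blk_dev[major].request_fn)
		return -1;
	blk_dev[major].request_fn();
	return 0;
}
```

After rd_init_sketch has run, dispatch(RAMDISK_MAJOR) reaches do_rd_request, while a major number with no driver installed, such as 3 before hd_init, reports -1.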
mem_init()

// mm/memory.c
#define USED 100
#define MAP_NR(addr) (((addr)-LOW_MEM)>>12)
void mem_init(long start_mem, long end_mem)
{
	int i;

	HIGH_MEMORY = end_mem;
	for (i=0 ; i<PAGING_PAGES ; i++)
		mem_map[i] = USED;
	i = MAP_NR(start_mem);
	end_mem -= start_mem;
	end_mem >>= 12;
	while (end_mem-->0)
		mem_map[i++]=0;
}

The system manages the 15 MB above the kernel's first 1 MB in pages, and the mem_map array records how many times each page is referenced. mem_init first sets every entry to USED (100), then clears the entries to 0 for the pages between the start and end of main memory.

#define PAGING_MEMORY (15*1024*1024)
#define PAGING_PAGES (PAGING_MEMORY>>12)
static unsigned char mem_map [ PAGING_PAGES ] = {0,};
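
To see which mem_map entries end up free, the same steps can be run in user space. This is a minimal sketch: mem_init_sketch and free_pages are hypothetical names, LOW_MEM is taken to be 0x100000 (1 MB), and both addresses are assumed to be page-aligned, as they are in main.

```c
#include <assert.h>

#define LOW_MEM        0x100000L		/* pages below 1 MB are not tracked */
#define PAGING_MEMORY  (15L * 1024 * 1024)
#define PAGING_PAGES   (PAGING_MEMORY >> 12)
#define MAP_NR(addr)   (((addr) - LOW_MEM) >> 12)
#define USED           100

static unsigned char mem_map[PAGING_PAGES];

/* Same steps as mem_init(): mark every page used, then free main memory. */
void mem_init_sketch(long start_mem, long end_mem)
{
	long i;

	assert(start_mem >= LOW_MEM && end_mem >= start_mem);
	assert(end_mem <= LOW_MEM + PAGING_MEMORY);
	for (i = 0; i < PAGING_PAGES; i++)
		mem_map[i] = USED;
	for (i = MAP_NR(start_mem); i < MAP_NR(end_mem); i++)
		mem_map[i] = 0;
}

/* Count the pages whose reference count is 0. */
long free_pages(void)
{
	long i, n = 0;

	for (i = 0; i < PAGING_PAGES; i++)
		if (mem_map[i] == 0)
			n++;
	return n;
}
```

With main memory running from 6 MB to 16 MB, the first free entry is mem_map[1280] and 2560 pages (10 MB) are free; everything between 1 MB and 6 MB stays marked as USED.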

trap_init()
Rebuilds the interrupt system by installing the interrupt service routines. All of them are installed in the same way, so only one example is shown here.

// kernel/traps.c
#define set_trap_gate(n,addr) \
	_set_gate(&idt[n],15,0,addr)
	
void trap_init(void)
{
	...
	set_trap_gate(0,&divide_error);
	...
}

This installs the address of the divide_error function, the interrupt service routine for the divide-by-zero error, in entry 0 of the idt.

#define _set_gate(gate_addr,type,dpl,addr) \
__asm__ ("movw %%dx,%%ax\n\t" \
	"movw %0,%%dx\n\t" \
	"movl %%eax,%1\n\t" \
	"movl %%edx,%2" \
	: \
	: "i" ((short) (0x8000+(dpl<<13)+(type<<8))), \
	"o" (*((char *) (gate_addr))), \
	"o" (*(4+(char *) (gate_addr))), \
	"d" ((char *) (addr)),"a" (0x00080000))

How should the inline assembly that the _set_gate macro expands to be read?
%0, %1, %2, %3 and %4 are operand placeholders. They number the operands in the order in which they are listed after the two colons that follow the instruction strings: the output list comes after the first colon (it is empty here) and the input list comes after the second. Each operand is preceded by a constraint: "i" is an immediate constant, "o" is an offsettable memory operand, "d" loads the value into edx, and "a" loads the value into eax. The operands are therefore: %0 is "i" ((short) (0x8000+(dpl<<13)+(type<<8))); %1 is "o" (*((char *) (gate_addr))); %2 is "o" (*(4+(char *) (gate_addr))); %3 is "d" ((char *) (addr)); %4 is "a" (0x00080000). The instructions combine these values into the 8-byte gate descriptor: the low 16 bits of the handler address are moved into ax (the upper half of eax already holds the selector 0x0008), the attribute word %0 is moved into dx (the upper half of edx keeps the high 16 bits of the address), and eax and edx are then stored to the low and high 4 bytes of the descriptor. The result follows the IDT layout built in the first article.
[Figure: IDT gate descriptor layout]

If this is still unclear, the article on the _set_gate macro explains it in more detail.
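
The effect of the four instructions in _set_gate can also be written as plain C. This is a sketch for reading purposes only: struct gate_desc and make_gate are hypothetical names, and the handler address in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

struct gate_desc {
	uint32_t a;	/* low dword:  selector << 16 | offset bits 15..0 */
	uint32_t b;	/* high dword: offset bits 31..16 | P, DPL, type  */
};

/* Build the two dwords that _set_gate stores at gate_addr.
 * type is 14 (interrupt gate) or 15 (trap gate); dpl is 0..3.
 * The selector 0x0008 (kernel code segment) comes from the
 * constant 0x00080000 preloaded into eax. */
struct gate_desc make_gate(uint32_t handler, unsigned type, unsigned dpl)
{
	struct gate_desc g;
	uint16_t attr;

	assert(type <= 15 && dpl <= 3);
	attr = (uint16_t)(0x8000 + (dpl << 13) + (type << 8));
	g.a = 0x00080000u | (handler & 0xffffu);	/* movw %dx,%ax */
	g.b = (handler & 0xffff0000u) | attr;		/* movw %0,%dx  */
	return g;
}
```

For a handler at the made-up address 0x00012345, a trap gate with DPL 0 (what set_trap_gate produces) comes out as a = 0x00082345 and b = 0x00018F00. The same gate with DPL 3 has b = 0x0001EF00.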
blk_dev_init()
Initializes the array of block device request structures.

struct request {
	int dev;		/* -1 if no request */
	int cmd;		/* READ or WRITE */
	int errors;
	unsigned long sector;
	unsigned long nr_sectors;
	char * buffer;
	struct task_struct * waiting;
	struct buffer_head * bh;
	struct request * next;
};
void blk_dev_init(void)
{
	int i;

	for (i=0 ; i<NR_REQUEST ; i++) {
		request[i].dev = -1;
		request[i].next = NULL;
	}
}

The work here is simple: every request entry is marked idle (dev = -1).
tty_init()
Initializes the terminal peripherals and hooks their interrupt service routines into the idt, using the same mechanism described above.
time_init()
Sets the startup time, using time data read from the CMOS chip on the motherboard. This is not explained further here.
sched_init()
Activates process 0. This is a very important step. The task_struct of process 0 is written out in advance in the source code, but before process 0 can run, the structures that manage it must be hooked into the gdt, and the gdt, the task slots and the related registers must be set up.

// include/linux/head.h
typedef struct desc_struct {
	unsigned long a,b;
} desc_table[256];
extern desc_table idt,gdt;
// kernel/sched.c
void sched_init(void)
{
	int i;
	struct desc_struct * p;
	...
}

First, head.h defines the descriptor structure desc_struct, 8 bytes per descriptor, and the type desc_table, an array of 256 such descriptors. idt and gdt are both declared with this type, so each is an array of 256 descriptors. sched_init begins by declaring a pointer p to a descriptor, which is used later to clear entries in the gdt. The TSS and LDT of process 0 are then attached to the gdt. The TSS is the task state segment, described by struct tss_struct, which records the state of a process.

// include/linux/sched.h
#define FIRST_TSS_ENTRY 4
// kernel/sched.c
set_tss_desc(gdt+FIRST_TSS_ENTRY,&(init_task.task.tss));
set_ldt_desc(gdt+FIRST_LDT_ENTRY,&(init_task.task.ldt));

// include/asm/system.h
#define set_tss_desc(n,addr) _set_tssldt_desc(((char *) (n)),addr,"0x89")
#define set_ldt_desc(n,addr) _set_tssldt_desc(((char *) (n)),addr,"0x82")

#define _set_tssldt_desc(n,addr,type) \
__asm__ ("movw $104,%1\n\t" \
	"movw %%ax,%2\n\t" \
	"rorl $16,%%eax\n\t" \
	"movb %%al,%3\n\t" \
	"movb $" type ",%4\n\t" \
	"movb $0x00,%5\n\t" \
	"movb %%ah,%6\n\t" \
	"rorl $16,%%eax" \
	::"a" (addr), "m" (*(n)), "m" (*(n+2)), "m" (*(n+4)), \
	 "m" (*(n+5)), "m" (*(n+6)), "m" (*(n+7)) \
	)
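
The byte layout that _set_tssldt_desc writes can be spelled out in C, one descriptor byte per statement. This is a reading aid under assumptions, not kernel code: make_tssldt_desc is a hypothetical name and the base address in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Fill an 8-byte gdt slot the way _set_tssldt_desc does.
 * type is 0x89 for a TSS descriptor and 0x82 for an LDT descriptor. */
void make_tssldt_desc(uint8_t desc[8], uint32_t base, uint8_t type)
{
	assert(type == 0x89 || type == 0x82);
	desc[0] = 104;					/* limit low byte   (movw $104,%1) */
	desc[1] = 0;					/* limit high byte                 */
	desc[2] = (uint8_t)(base & 0xff);		/* base bits 7..0   (movw %ax,%2)  */
	desc[3] = (uint8_t)((base >> 8) & 0xff);	/* base bits 15..8                 */
	desc[4] = (uint8_t)((base >> 16) & 0xff);	/* base bits 23..16 (movb %al,%3)  */
	desc[5] = type;					/* present bit and segment type    */
	desc[6] = 0x00;					/* flags and limit bits 19..16     */
	desc[7] = (uint8_t)(base >> 24);		/* base bits 31..24 (movb %ah,%6)  */
}
```

For a TSS at the made-up address 0xAB123456 the slot becomes 68 00 56 34 12 89 00 AB in hexadecimal: the 32-bit base address is scattered over bytes 2, 3, 4 and 7, which is why the macro rotates eax by 16 bits twice.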

init_task here is the task_union initialized by the INIT_TASK macro. As the code shows, task_union is a union of a task_struct and the kernel stack.

union task_union {
	struct task_struct task;
	char stack[PAGE_SIZE];
};
static union task_union init_task = {INIT_TASK,};
#define INIT_TASK \
/* state etc */	{ 0,15,15, \
/* signals */	0,{{},},0, \
/* ec,brk... */	0,0,0,0,0,0, \
/* pid etc.. */	0,-1,0,0,0, \
/* uid etc */	0,0,0,0,0,0, \
/* alarm */	0,0,0,0,0,0, \
/* math */	0, \
/* fs info */	-1,0022,NULL,NULL,NULL,0, \
/* filp */	{NULL,}, \
	{ \
		{0,0}, \
/* ldt */	{0x9f,0xc0fa00}, \
		{0x9f,0xc0f200}, \
	}, \
/*tss*/	{0,PAGE_SIZE+(long)&init_task,0x10,0,0,0,0,(long)&pg_dir,\
	 0,0,0,0,0,0,0,0, \
	 0,0,0x17,0x17,0x17,0x17,0x17,0x17, \
	 _LDT(0),0x80000000, \
		{} \
	}, \
}
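
Because task_struct and the kernel stack share one page, the initial kernel stack pointer in the tss initializer, PAGE_SIZE+(long)&init_task, points just past the end of that page. A small sketch makes this concrete. The types below are stand-ins introduced for illustration: task_struct_sketch has only three fields, and initial_kernel_sp is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

struct task_struct_sketch {	/* stand-in; the real struct has many more fields */
	long state;
	long counter;
	long priority;
};

union task_union_sketch {
	struct task_struct_sketch task;
	char stack[PAGE_SIZE];
};

/* Initial kernel stack pointer: one past the last byte of the union,
 * matching PAGE_SIZE+(long)&init_task in the tss initializer. */
char *initial_kernel_sp(union task_union_sketch *u)
{
	assert(u != NULL);
	return u->stack + PAGE_SIZE;
}
```

The task structure sits at the bottom of the page and the stack grows down from the top toward it, so the two share the page without any separate allocation for the stack.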

set_tss_desc and set_ldt_desc are macros. Put simply, set_tss_desc writes a TSS descriptor into gdt[FIRST_TSS_ENTRY], where addr is the address of the tss. The assembly code fills in the descriptor in a way similar to the idt code above.
Note that FIRST_TSS_ENTRY is defined as 4 and FIRST_LDT_ENTRY as 5: entry 0 of the gdt is unused, entry 1 is the code segment, entry 2 is the data segment, and entry 3 is reserved as a system segment. The TSS of process 0 therefore goes in entry 4 and its LDT in entry 5, the TSS of process 1 in entry 6, and so on.
Next, p is set to the entry that follows the LDT of process 0 in the gdt, and all later entries and the task array are cleared. The assignment p->a=p->b=0 works because p points to a descriptor with only two members, a and b.

	p = gdt+2+FIRST_TSS_ENTRY;
	for(i=1;i<NR_TASKS;i++) {
		task[i] = NULL;
		p->a=p->b=0;
		p++;
		p->a=p->b=0;
		p++;
	}

The selectors of the tss and ldt of process 0 are then loaded into the corresponding registers, the task register (TR) and the LDT register (LDTR).

	ltr(0);
	lldt(0);
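
The argument 0 passed to ltr and lldt is a task number, not a selector. The sketch below shows how a task number maps to a gdt index and a selector. The helper names are hypothetical, and the selector formulas mirror the _TSS(n) and _LDT(n) macros as I recall them from include/linux/sched.h, so treat their exact form, and NR_TASKS being 64, as assumptions.

```c
#include <assert.h>

#define NR_TASKS 64
#define FIRST_TSS_ENTRY 4
#define FIRST_LDT_ENTRY (FIRST_TSS_ENTRY + 1)

/* gdt index of the TSS / LDT descriptor of task n: two slots per task. */
unsigned tss_index(unsigned n)
{
	assert(n < NR_TASKS);
	return FIRST_TSS_ENTRY + 2 * n;
}

unsigned ldt_index(unsigned n)
{
	assert(n < NR_TASKS);
	return FIRST_LDT_ENTRY + 2 * n;
}

/* Selector = index * 8, since each descriptor is 8 bytes (TI = 0, RPL = 0).
 * Two slots per task means the selector advances by 16 per task. */
unsigned long tss_selector(unsigned n)
{
	assert(n < NR_TASKS);
	return ((unsigned long)n << 4) + (FIRST_TSS_ENTRY << 3);
}

unsigned long ldt_selector(unsigned n)
{
	assert(n < NR_TASKS);
	return ((unsigned long)n << 4) + (FIRST_LDT_ENTRY << 3);
}
```

Under these assumptions ltr(0) loads selector 0x20 and lldt(0) loads selector 0x28. The clearing loop above starts at gdt+2+FIRST_TSS_ENTRY, which is tss_index(1), the first slot that belongs to process 1.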

Then the clock interrupt is set up in three steps: programming the 8253 timer, installing the timer interrupt service routine, and clearing the mask bit for the clock interrupt in the 8259A. After this the clock interrupt can fire, which is the basis of the round-robin scheduling of processes later on. This is not explained further here.
Next, the system call entry point is hooked into the idt with set_system_gate. The steps are the same as for the interrupt service routines above, except that the privilege level differs: here it is user level 3, because the system call is a software interrupt provided for user processes, which use it to request services from the kernel.
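
The numbers behind the first and third timer steps are easy to reproduce. The sketch below is based on my recollection of kernel/sched.c rather than on code shown in this article, so the constants are assumptions: an 8253 input clock of 1193180 Hz, HZ equal to 100, and the timer on IRQ0 (bit 0 of the 8259A mask). pit_latch and unmask_timer are hypothetical helper names.

```c
#include <assert.h>

#define PIT_CLOCK 1193180L	/* assumed input clock of the 8253, in Hz */

/* Divisor programmed into channel 0 so that it fires hz times per second. */
long pit_latch(long hz)
{
	assert(hz > 0);
	return PIT_CLOCK / hz;
}

/* New 8259A mask with IRQ0 (the timer) unmasked: clear bit 0. */
unsigned char unmask_timer(unsigned char old_mask)
{
	return (unsigned char)(old_mask & ~0x01);
}
```

With HZ = 100 the divisor is 11931 (0x2E9B). It is written to the timer as two bytes, low byte 0x9B first and then high byte 0x2E, which gives one clock interrupt roughly every 10 ms.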
buffer_init(buffer_memory_end)

First, some background on the buffer area. The figures are taken from Zhao Liang's Full analysis of Linux kernel [2]. The buffer area starts at the label end, which marks the end of the kernel image, as seen in struct buffer_head * start_buffer = (struct buffer_head *) &end; end is a value set by the linker when the kernel is linked.

If the high end of the buffer area equals 1 MB, the usable high end is adjusted down to 640 KB, because the range from 640 KB to 1 MB is occupied by video memory and the BIOS. Otherwise the high end must be greater than 1 MB and is used as is.

// fs/buffer.c
void buffer_init(long buffer_end)
{
	if (buffer_end == 1<<20)
		b = (void *) (640*1024);
	else
		b = (void *) buffer_end;
	...
}

Then each buffer_head is initialized: its flag fields are set to 0 and the buffer_heads are linked into a doubly linked circular list. The whole buffer area is divided into 1024-byte blocks, the same size as a logical block on a block device. The buffer_head structures are laid out from the low end of the area, and each one points to a data block taken from the high end. When the data pointer reaches 1 MB, it jumps down to 640 KB to skip the range used by video memory and the BIOS. The figure illustrating this is from Zhao Liang's Full analysis of Linux kernel [2].

	while ( (b -= BLOCK_SIZE) >= ((void *) (h+1)) ) {
		h->b_dev = 0;
		h->b_dirt = 0;
		h->b_count = 0;
		h->b_lock = 0;
		h->b_uptodate = 0;
		h->b_wait = NULL;
		h->b_next = NULL;
		h->b_prev = NULL;
		h->b_data = (char *) b;
		h->b_prev_free = h-1;
		h->b_next_free = h+1;
		h++;
		NR_BUFFERS++;
		if (b == (void *) 0x100000)
			b = (void *) 0xA0000;
	}
	h--;
	free_list = start_buffer;
	free_list->b_prev_free = h;
	h->b_next_free = free_list;
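
How many buffers this loop creates depends only on where the heads start, where the data blocks end, and the size of one buffer_head. The sketch below replays the two pointers as plain integers so it can run anywhere. count_buffers is a hypothetical name, and the size of a buffer_head is passed in as a parameter rather than assumed.

```c
#include <assert.h>

#define BLOCK_SIZE 1024L

/* Count the buffers that the loop in buffer_init() would create.
 * start:     address of the first buffer_head (the linker symbol end)
 * end:       buffer_memory_end
 * head_size: sizeof(struct buffer_head) */
long count_buffers(long start, long end, long head_size)
{
	long h = start;		/* next free head slot, grows upward  */
	long b;			/* next data block, grows downward    */
	long n = 0;

	assert(head_size > 0 && end > start);
	b = (end == (1L << 20)) ? 640L * 1024 : end;
	while ((b -= BLOCK_SIZE) >= h + head_size) {
		h += head_size;
		n++;
		if (b == 0x100000L)	/* skip the 640 KB - 1 MB hole */
			b = 0xA0000L;
	}
	return n;
}
```

The loop stops as soon as the next data block would overlap the next head, so every buffer costs BLOCK_SIZE + head_size bytes. For example, a 64 KB area with 24-byte heads yields 62 buffers.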


Finally, every entry of the hash table array is initialized to NULL. hash_table has 307 entries.

	for (i=0;i<NR_HASH;i++)
		hash_table[i]=NULL;

hd_init()
Hard disk initialization.
floppy_init()
Floppy disk initialization.

Finally, interrupts are enabled, and the system switches from kernel mode to user mode.

	sti();
	move_to_user_mode();

From this point on, Linux enters its most difficult part: process 0 forks process 1 and switches to process 1 for execution. This will be explained in another article.

If you find any mistakes, criticism and corrections are welcome!

References:
[1] New Design Team. Art of Linux Kernel Design [M]. Beijing: Machinery Industry Press, 2014.
[2] Zhao Liang. Full analysis of Linux kernel [M]. Beijing: Machinery Industry Press, 2008.

Posted by php3ch0 on Fri, 05 Nov 2021 10:03:17 -0700