Golang program startup process analysis

This article is based on the Go 1.17 source code. If you find any problems, please point them out.

The process by which Golang code is run by the operating system

1. Compile

Go source code must first be compiled into an executable with go build; on Linux the result is an ELF executable. Compilation produces the final executable in three stages: the compiler, the assembler and the linker.

  • 1. Compiler: the go compiler turns the Go source code into Plan 9 style assembly (.s). The compiler entry point is the main function in compile/internal/gc/main.go;
  • 2. Assembler: the go assembler converts the .s assembly produced by the compiler into machine code and writes the object file (.o). The go assembler is implemented in the src/cmd/internal/obj package;
  • 3. Linker: the linker combines the *.o object files produced by the assembler into the final executable. The linker is implemented in the src/cmd/link/internal/ld package;

2. Run

Once the steps above have produced the executable, the operating system goes through the following stages to load and run it:

  • 1. Read the executable from disk into memory;
  • 2. Create the process and its main thread;
  • 3. Allocate stack space for the main thread;
  • 4. Copy the command-line arguments entered by the user (together with the environment variables) onto the main thread's stack;
  • 5. Put the main thread on the operating system's run queue, where it waits to be scheduled;
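Step 4 is what the runtime later reads at 0(SP) and 8(SP). On Linux/amd64 the kernel leaves argc at the top of the new stack, followed by the argv pointers, a NULL, the envp pointers, another NULL and then the auxiliary vector. The sketch below models that layout as a slice of 8-byte slots (auxv omitted) and reads argc and argv back the way _rt0_amd64 does; slot, buildInitialStack and readArgs are illustrative names, not kernel or runtime code.

```go
package main

import "fmt"

// slot is one hypothetical 8-byte word on the initial stack: either a
// number (argc, NULL terminator) or a pointer to a string.
type slot struct {
	isPtr bool
	num   uint64
	str   string
}

// buildInitialStack models the words the kernel leaves at SP for a new
// process (auxv omitted): argc, argv[0..argc-1], NULL, envp[...], NULL.
func buildInitialStack(argv, envp []string) []slot {
	s := []slot{{num: uint64(len(argv))}} // 0(SP): argc
	for _, a := range argv {
		s = append(s, slot{isPtr: true, str: a}) // 8(SP), 16(SP), ...: argv
	}
	s = append(s, slot{}) // NULL terminates argv
	for _, e := range envp {
		s = append(s, slot{isPtr: true, str: e})
	}
	return append(s, slot{}) // NULL terminates envp
}

// readArgs mimics _rt0_amd64: argc comes from slot 0 (MOVQ 0(SP), DI) and
// the argv array starts at slot 1 (LEAQ 8(SP), SI).
func readArgs(stack []slot) []string {
	argc := int(stack[0].num)
	args := make([]string, 0, argc)
	for _, sl := range stack[1 : 1+argc] {
		args = append(args, sl.str)
	}
	return args
}

func main() {
	st := buildInitialStack([]string{"./main", "-v"}, []string{"HOME=/root"})
	fmt.Println(len(st), readArgs(st)) // 6 [./main -v]
}
```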

Golang program startup process analysis

1. Analyzing the startup process with gdb

Here, a simple Go program is used to analyze the startup process by stepping through it:


package main

import "fmt"

func main() {
	fmt.Println("hello world")
}

Compile the program and debug it with gdb: first set a breakpoint at the program entry point, then step through the code to follow its execution during startup. The -gcflags "-N -l" flags disable optimization and inlining so that the code is easier to step through.

$ go build -gcflags "-N -l" -o main main.go

$ gdb ./main

(gdb) info files
Symbols from "/home/gosoon/main".
Local exec file:
	`/home/gosoon/main', file type elf64-x86-64.
	Entry point: 0x465860
	0x0000000000401000 - 0x0000000000497893 is .text
	0x0000000000498000 - 0x00000000004dbb65 is .rodata
	0x00000000004dbd00 - 0x00000000004dc42c is .typelink
	0x00000000004dc440 - 0x00000000004dc490 is .itablink
	0x00000000004dc490 - 0x00000000004dc490 is .gosymtab
	0x00000000004dc4a0 - 0x0000000000534b90 is .gopclntab
	0x0000000000535000 - 0x0000000000535020 is .go.buildinfo
	0x0000000000535020 - 0x00000000005432e4 is .noptrdata
	0x0000000000543300 - 0x000000000054aa70 is .data
	0x000000000054aa80 - 0x00000000005781f0 is .bss
	0x0000000000578200 - 0x000000000057d510 is .noptrbss
	0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid
(gdb) b *0x465860
Breakpoint 1 at 0x465860: file /home/gosoon/golang/go/src/runtime/rt0_linux_amd64.s, line 8.
(gdb) r
Starting program: /home/gaofeilei/./main

Breakpoint 1, _rt0_amd64_linux () at /home/gaofeilei/golang/go/src/runtime/rt0_linux_amd64.s:8
8		JMP	_rt0_amd64(SB)
(gdb) n
_rt0_amd64 () at /home/gaofeilei/golang/go/src/runtime/asm_amd64.s:15
15		MOVQ	0(SP), DI	// argc
(gdb) n
16		LEAQ	8(SP), SI	// argv
(gdb) n
17		JMP	runtime·rt0_go(SB)
(gdb) n
runtime.rt0_go () at /home/gaofeilei/golang/go/src/runtime/asm_amd64.s:91
91		MOVQ	DI, AX		// argc
231		CALL	runtime·mstart(SB)
(gdb) n
hello world
[Inferior 1 (process 39563) exited normally]

Stepping through the program shows that its entry point is line 8 of runtime/rt0_linux_amd64.s. Execution eventually reaches the CALL runtime·mstart(SB) instruction, "hello world" is printed, and the program exits.

The function calls made during startup are as follows:

rt0_linux_amd64.s --> _rt0_amd64 --> rt0_go --> runtime·settls --> runtime·check --> runtime·args --> runtime·osinit --> runtime·schedinit --> runtime·newproc --> runtime·mstart

2. Analysis of the Go startup process

The gdb session in the previous section showed that a Go program executes a series of assembly instructions during startup. This section analyzes what each of those instructions does. Understanding them is the key to understanding what a Go program does while it starts.


#include "textflag.h"

TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
    JMP _rt0_amd64(SB)

TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
    JMP _rt0_amd64_lib(SB)

The first instruction executed, at line 8, is JMP _rt0_amd64(SB). On the AMD64 platform, the _rt0_amd64 function is located in src/runtime/asm_amd64.s.
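The entry file and symbol names appear to follow a GOOS/GOARCH pattern: on linux/amd64 the entry point is _rt0_amd64_linux in rt0_linux_amd64.s. The sketch below derives the corresponding names for the current platform from runtime.GOOS and runtime.GOARCH. rt0Names is a hypothetical helper, and the pattern is an assumption inferred from the linux/amd64 case, so check $GOROOT/src/runtime before relying on it for other platforms.

```go
package main

import (
	"fmt"
	"runtime"
)

// rt0Names applies the naming pattern observed for linux/amd64
// (_rt0_amd64_linux in rt0_linux_amd64.s) to any GOOS/GOARCH pair.
// Assumption: other platforms follow the same pattern.
func rt0Names(goos, goarch string) (symbol, file string) {
	symbol = "_rt0_" + goarch + "_" + goos
	file = "rt0_" + goos + "_" + goarch + ".s"
	return symbol, file
}

func main() {
	sym, file := rt0Names(runtime.GOOS, runtime.GOARCH)
	fmt.Printf("%s/%s: %s in src/runtime/%s (%s)\n",
		runtime.GOOS, runtime.GOARCH, sym, file, runtime.Version())
}
```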

TEXT _rt0_amd64(SB),NOSPLIT,$-8
    // Process argc and argv: argc is the number of command-line arguments, argv points to the array of argument strings
    MOVQ    0(SP), DI   // argc
    // argv is a pointer: SI = address of the argv array (SP+8)
    LEAQ    8(SP), SI   // argv
    JMP runtime·rt0_go(SB)

The _rt0_amd64 function saves argc and argv into the DI and SI registers and jumps to the rt0_go function. The main tasks of rt0_go are:

  • 1. Copy the argc and argv parameters onto the main thread's stack;
  • 2. Initialize the global variable g0: allocate about 64 KB of stack space for g0 from the main thread's stack, and set g0's stackguard0, stackguard1 and stack fields;
  • 3. Execute the CPUID instruction to detect CPU information;
  • 4. Execute the nocpuinfo code block to determine whether cgo needs to be initialized;
  • 5. Execute the needtls code block to initialize TLS and m0;
  • 6. Execute the ok code block: first bind m0 and g0, then call runtime·args to process the command-line arguments and environment variables, call runtime·osinit to initialize the CPU count, call runtime·schedinit to initialize the scheduler, call runtime·newproc to create the first goroutine (which will run runtime.main), and call runtime·mstart to start the main thread. The main thread runs that first goroutine, which in turn runs the program's main function; mstart never returns, so execution stays there until the process exits;

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
    // Code for handling the command-line arguments
    MOVQ    DI, AX      // AX = argc
    MOVQ    SI, BX      // BX = argv
    // Reserve 4*8+7 = 39 bytes on the stack (the upstream comment reads "2args 2auto": two argument slots and two local slots); why the extra 7 bytes are added is not clear yet
    SUBQ    $(4*8+7), SP
    ANDQ    $~15, SP    // Round SP down to a 16-byte boundary
    MOVQ    AX, 16(SP)  // argc is stored at SP + 16 bytes
    MOVQ    BX, 24(SP)  // argv is stored at SP + 24 bytes

    // Start initializing g0. runtime·g0 is a global variable defined in src/runtime/proc.go; global variables live in the data area of the process's address space (the "View ELF binary file structure" section below shows how to inspect code and global variables in the ELF binary)
    // g0's stack is carved out of the process (main thread) stack and is about 64 KB
    MOVQ    $runtime·g0(SB), DI    // Put the address of g0 into the DI register 
    LEAQ    (-64*1024+104)(SP), BX // BX = SP - 64*1024 + 104
    // Start initializing the three fields of stackguard0, stackguard1 and stack of the g0 object    
    MOVQ    BX, g_stackguard0(DI) // g0.stackguard0 = SP - 64*1024 + 104
    MOVQ    BX, g_stackguard1(DI) // g0.stackguard1 = SP - 64*1024 + 104
    MOVQ    BX, (g_stack+stack_lo)(DI) // g0.stack.lo = SP - 64*1024 + 104
    MOVQ    SP, (g_stack+stack_hi)(DI) // g0.stack.hi = SP

After executing the above instructions, the process memory space layout is as follows:
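The addresses involved can be worked out by replaying the arithmetic of the instructions above. The sketch below does this in Go for a hypothetical starting SP of 0x7ffc00001008; g0Stack, alignSP and initG0 are illustrative names, not runtime code. g0's stack ends up as the roughly 64 KB region [stack.lo, stack.hi) directly below the aligned SP, with both stack guards at stack.lo.

```go
package main

import "fmt"

// g0Stack holds the four values rt0_go stores into g0 (a simplified,
// hypothetical stand-in for the runtime's g and stack types).
type g0Stack struct {
	lo, hi, stackguard0, stackguard1 uint64
}

// alignSP replays SUBQ $(4*8+7), SP followed by ANDQ $~15, SP.
func alignSP(sp uint64) uint64 {
	sp -= 4*8 + 7
	return sp &^ 15
}

// initG0 replays LEAQ (-64*1024+104)(SP), BX and the four stores into g0.
func initG0(sp uint64) g0Stack {
	lo := sp - 64*1024 + 104
	return g0Stack{lo: lo, hi: sp, stackguard0: lo, stackguard1: lo}
}

func main() {
	sp := alignSP(0x7ffc00001008) // hypothetical SP at process entry
	g := initG0(sp)
	fmt.Printf("SP=%#x lo=%#x hi=%#x size=%d\n", sp, g.lo, g.hi, g.hi-g.lo)
	// SP=0x7ffc00000fe0 lo=0x7ffbffff1048 hi=0x7ffc00000fe0 size=65432
}
```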

Next come the instructions that obtain CPU information and handle cgo initialization. This code can be ignored for now.

    // Execute the CPUID instruction to obtain CPU information (vendor and instruction-set features)
    MOVL    $0, AX
    CPUID
    MOVL    AX, SI
    CMPL    AX, $0
    JE  nocpuinfo

    // Figure out how to serialize RDTSC.
    // On Intel processors LFENCE is enough. AMD requires MFENCE.
    // Don't know about the rest, so let's do MFENCE.
    CMPL    BX, $0x756E6547  // "Genu"
    JNE notintel
    CMPL    DX, $0x49656E69  // "ineI"
    JNE notintel
    CMPL    CX, $0x6C65746E  // "ntel"
    JNE notintel
    MOVB    $1, runtime·isIntel(SB)
    MOVB    $1, runtime·lfenceBeforeRdtsc(SB)
notintel:

    // Load EAX=1 cpuid flags
    MOVL    $1, AX
    CPUID
    MOVL    AX, runtime·processorVersionInfo(SB)

nocpuinfo:
    // cgo initialization: _cgo_init is a global variable that is non-zero only when cgo is linked in
    MOVQ    _cgo_init(SB), AX
    // Check whether AX is 0
    TESTQ   AX, AX
    // If it is 0 (no cgo), jump to needtls
    JZ  needtls
    // arg 1: g0, already in DI
    MOVQ    $setg_gcc<>(SB), SI // arg 2: setg_gcc

    CALL    AX
    // With cgo enabled, _cgo_init may reset g0.stack.lo, so recompute g0's stack guards from it
    MOVQ    $runtime·g0(SB), CX
    MOVQ    (g_stack+stack_lo)(CX), AX
    ADDQ    $const__StackGuard, AX
    MOVQ    AX, g_stackguard0(CX)
    MOVQ    AX, g_stackguard1(CX)
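
The three constants compared against BX, DX and CX are the processor vendor string that CPUID leaf 0 returns in EBX, EDX and ECX. Each register holds four ASCII bytes in little-endian order. The sketch below decodes them. vendorString and isIntel are illustrative helpers, not runtime functions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// vendorString rebuilds the 12-byte vendor ID from the EBX, EDX, ECX values
// returned by CPUID leaf 0 (note the EBX, EDX, ECX order).
func vendorString(ebx, edx, ecx uint32) string {
	var b [12]byte
	binary.LittleEndian.PutUint32(b[0:4], ebx)
	binary.LittleEndian.PutUint32(b[4:8], edx)
	binary.LittleEndian.PutUint32(b[8:12], ecx)
	return string(b[:])
}

// isIntel performs the same three comparisons as the CMPL/JNE sequence.
func isIntel(ebx, edx, ecx uint32) bool {
	return ebx == 0x756E6547 && edx == 0x49656E69 && ecx == 0x6C65746E
}

func main() {
	fmt.Println(vendorString(0x756E6547, 0x49656E69, 0x6C65746E)) // GenuineIntel
}
```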

Next, the needtls code block initializes TLS and m0. TLS (thread-local storage) provides global variables that are private to each thread. While a Go program runs, every M is associated with an OS worker thread, so how does a worker thread know which M it belongs to? It uses thread-local storage. TLS gives each thread its own private copy of a global variable, so the same variable name accessed from different worker threads refers to different M objects. As analyzed later, just before each worker thread M is created and enters the scheduling loop, it uses this mechanism to set up a thread-private global variable pointing to its own M instance. Strictly speaking, the runtime stores the current g in the TLS slot and reaches the M through g.m.

In the code analysis that follows you will often see calls to getg. getg returns the currently running g from thread-local storage; at this point that is the g0 associated with m0.

The thread's TLS base is set to point into m0.tls, &g0 is stored in that slot, and m0 is bound to g0, so g0 can be obtained directly from TLS.
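The sketch below models the relationship between TLS, g and m with ordinary Go structs. The types g and m are hypothetical, stripped-down stand-ins; the real runtime types have many more fields, and the real getg is a compiler intrinsic that takes no argument. Here getg simply reads slot 0, which stands in for the TLS slot.

```go
package main

import "fmt"

// Hypothetical, stripped-down versions of the runtime's g and m types,
// keeping only the fields the ok block touches.
type g struct {
	name string
	m    *m
}

type m struct {
	g0  *g
	tls [6]*g // slot 0 plays the role of the TLS "g" slot
}

// bindM0G0 mirrors the ok block: m0.tls[0] = &g0; m0.g0 = &g0; g0.m = &m0.
func bindM0G0(m0 *m, g0 *g) {
	m0.tls[0] = g0
	m0.g0 = g0
	g0.m = m0
}

// getg models the runtime's getg: read the current g from the TLS slot.
// The current M is then reached through g.m.
func getg(thread *m) *g {
	return thread.tls[0]
}

func main() {
	var m0 m
	g0 := &g{name: "g0"}
	bindM0G0(&m0, g0)
	cur := getg(&m0)
	fmt.Println(cur.name, cur.m == &m0, m0.g0 == cur) // g0 true true
}
```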

needtls:
    // Next, initialize TLS (thread-local storage): make m0 the thread-private variable and bind m0 to the main thread
    LEAQ    runtime·m0+m_tls(SB), DI  // DI = &m0.tls: load the address of m0's tls member into DI
    // Call runtime·settls to set up thread-local storage; its argument is passed in DI
    // runtime·settls adds 8 to DI and sets the thread's FS base to &m0.tls[1] (ELF code accesses g at -8(FS), i.e. m0.tls[0])
    // runtime·settls is defined in runtime/sys_linux_amd64.s#599
    CALL    runtime·settls(SB)

    // Verify that thread-local storage works, i.e. that a store through TLS lands in m0.tls;
    // if it does not, abort the program
    // get_tls is a macro defined in runtime/go_tls.h
    get_tls(BX)                     // Prepare BX for TLS access; g(BX) then refers to the g slot at -8(FS), i.e. m0.tls[0]
    MOVQ    $0x123, g(BX)           // Store through TLS: m0.tls[0] = 0x123
    MOVQ    runtime·m0+m_tls(SB), AX    // AX = m0.tls[0]
    CMPQ    AX, $0x123
    JEQ 2(PC)                       // If equal, skip the next instruction and fall into the ok block
    CALL    runtime·abort(SB)       // Otherwise abort (runtime·abort executes an INT instruction)

Execution then continues with the ok code block. Its main logic is:

  • Bind m0 and g0;
  • Call runtime·osinit to initialize the CPU count; the scheduler needs to know how many CPU cores the system has during initialization;
  • Call runtime·schedinit to initialize m0 and the p objects; it also sets the maxmcount field of the global variable sched to 10000, limiting the program to at most 10000 OS threads;
  • Call runtime·newproc to create the goroutine that will run the main function;
  • Call runtime·mstart to start the main thread, which then executes the main function;

ok:
    // First store the address of g0 in TLS, i.e. m0.tls[0] = &g0, then bind m0 and g0,
    // i.e. m0.g0 = &g0, g0.m = &m0
    get_tls(BX)                 // Prepare BX for TLS access (g(BX) is m0.tls[0])
    LEAQ    runtime·g0(SB), CX  // CX = &g0
    MOVQ    CX, g(BX)           // m0.tls[0] = &g0
    LEAQ    runtime·m0(SB), AX  // AX = &m0
    MOVQ    CX, m_g0(AX)        // m0.g0 = &g0
    MOVQ    AX, g_m(CX)         // g0.m = &m0
    CLD                         // convention is D is always left cleared
    // runtime·check sanity-checks the sizes of built-in types and some basic operations (conversions, atomics); it is located in runtime/runtime1.go#137
    CALL    runtime·check(SB)
    // Copy argc and argv to SP+0 and SP+8
    // so that they are passed as the arguments of runtime·args
    MOVL    16(SP), AX          // argc
    MOVL    AX, 0(SP)
    MOVQ    24(SP), AX          // argv
    MOVQ    AX, 8(SP)
    // runtime·args reads the arguments and environment variables from the stack and processes them
    // runtime·args is located at runtime/runtime1.go#61
    CALL    runtime·args(SB)
    // runtime·osinit initializes the CPU count; it is located in runtime/os_linux.go#301
    CALL    runtime·osinit(SB)
    // runtime·schedinit initializes the scheduler; it is located at runtime/proc.go#654
    CALL    runtime·schedinit(SB)

    // Create the first goroutine, which will run runtime.main: load the address of runtime·mainPC
    // (a funcval wrapping runtime.main) and call newproc to create the g
    MOVQ    $runtime·mainPC(SB), AX
    PUSHQ   AX            // Push runtime.main as the second argument of newproc (fn)
    PUSHQ   $0            // Push the first argument of newproc (siz): the size of runtime.main's arguments, 0 because runtime.main takes none
    // newproc creates a new goroutine and puts it on the run queue; the goroutine will execute runtime.main. newproc is located in runtime/proc.go#4250
    CALL    runtime·newproc(SB)
    // Pop the two arguments off the stack
    POPQ    AX
    POPQ    AX

    // mstart makes the main thread enter the scheduling loop, which then runs the goroutine just created.
    // mstart does not return. It is located in runtime/proc.go#1328
    CALL    runtime·mstart(SB)

    CALL    runtime·abort(SB)   // mstart should never return

    // Prevent dead-code elimination of debugCallV2, which is
    // intended to be called by debuggers.
    MOVQ    $runtime·debugCallV2<ABIInternal>(SB), AX
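
Two of the values set up by osinit and schedinit can be observed from an ordinary program. runtime.NumCPU reports the CPU count the runtime obtained at startup. runtime/debug.SetMaxThreads changes the OS-thread limit (sched.maxmcount, initially 10000) and returns the previous value. Below is a minimal sketch; startupLimits is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// startupLimits returns two values that are set up during startup: the CPU
// count (obtained by osinit on Linux) and the OS-thread limit
// (sched.maxmcount, set to 10000 by schedinit unless changed since).
func startupLimits() (ncpu, maxThreads int) {
	ncpu = runtime.NumCPU()
	// SetMaxThreads has no getter: set a value, keep the previous limit it
	// returns, then restore that limit.
	maxThreads = debug.SetMaxThreads(10000)
	debug.SetMaxThreads(maxThreads)
	return ncpu, maxThreads
}

func main() {
	ncpu, maxThreads := startupLimits()
	fmt.Println("NumCPU:", ncpu, "max threads:", maxThreads)
}
```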

At this time, the process memory space layout is as follows:

View ELF binary file structure

The readelf command shows the structure of an ELF binary, including the contents of its code and data areas: global variables are stored in the data area and functions in the code area.

$ readelf -s main | grep runtime.g0
  1765: 000000000054b3a0   376 OBJECT  GLOBAL DEFAULT   11 runtime.g0
# _cgo_init is a global variable
$ readelf -s main | grep -i _cgo_init
  2159: 000000000054aa88     8 OBJECT  GLOBAL DEFAULT   11 _cgo_init


This article introduced the key code in the startup process of a Go program. Most of that code is written in Plan 9 assembly, which can feel very difficult if you have not worked on low-level systems before, and I do not fully understand some of the details myself. Some of the hard-coded numbers and the operating-system and hardware conventions are fairly hard to understand; if you are interested, feel free to discuss the finer implementation details with me privately. I will also write analysis articles on several major components of the Go runtime.

Posted by pietbez on Wed, 24 Nov 2021 00:35:26 -0800