In the past three sections, we used some simple code to see how our programs become machine instructions, how conditional branches such as if...else are executed, how for/while loops are executed, and how functions call one another.
Since our programs all end up as machine code, why can the same program, on the same computer, run under Linux but not under Windows? Conversely, programs built for Windows cannot be executed on Linux. Yet the CPU has not been replaced, so shouldn't it recognize the same instructions?
If you have the same question, let's work through it together in this section.
Compiling, linking, and loading: disassembling program execution
In Section 5, we said that C code is compiled into assembly code by the compiler, and that the assembler then turns the assembly code into machine code the CPU understands, so that the CPU can execute it. You should be familiar with this process by now, but that description simplifies it greatly. Next, let's look at how a C program actually becomes an executable program.
You may have noticed that in the past few sections there were some small puzzles in the files generated by gcc and in the assembly instructions shown by objdump. To examine them, we split the earlier add function example into two files, add_lib.c and link_example.c.
// add_lib.c
int add(int a, int b)
{
    return a+b;
}
// link_example.c
#include <stdio.h>
int main()
{
    int a = 10;
    int b = 5;
    int c = add(a, b);
    printf("c = %d\n", c);
}
We compile these two files with gcc, and then look at their assembly code with the objdump command.
$ gcc -g -c add_lib.c link_example.c
$ objdump -d -M intel -S add_lib.o
$ objdump -d -M intel -S link_example.o
add_lib.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <add>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
   7:   89 75 f8                mov    DWORD PTR [rbp-0x8],esi
   a:   8b 55 fc                mov    edx,DWORD PTR [rbp-0x4]
   d:   8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
  10:   01 d0                   add    eax,edx
  12:   5d                      pop    rbp
  13:   c3                      ret
link_example.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   48 83 ec 10             sub    rsp,0x10
   8:   c7 45 fc 0a 00 00 00    mov    DWORD PTR [rbp-0x4],0xa
   f:   c7 45 f8 05 00 00 00    mov    DWORD PTR [rbp-0x8],0x5
  16:   8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
  19:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  1c:   89 d6                   mov    esi,edx
  1e:   89 c7                   mov    edi,eax
  20:   b8 00 00 00 00          mov    eax,0x0
  25:   e8 00 00 00 00          call   2a <main+0x2a>
  2a:   89 45 f4                mov    DWORD PTR [rbp-0xc],eax
  2d:   8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  30:   89 c6                   mov    esi,eax
  32:   48 8d 3d 00 00 00 00    lea    rdi,[rip+0x0]        # 39 <main+0x39>
  39:   b8 00 00 00 00          mov    eax,0x0
  3e:   e8 00 00 00 00          call   43 <main+0x43>
  43:   b8 00 00 00 00          mov    eax,0x0
  48:   c9                      leave
  49:   c3                      ret
Now that the code has been "compiled" into instructions, we might as well try running ./link_example.o.
Unfortunately, the file does not have execute permission, so we get a Permission denied error. Even if we grant link_example.o execute permission with the chmod command, running ./link_example.o still fails, this time with the error cannot execute binary file: Exec format error.
If we take a closer look at the objdump output for the two files, we find that the addresses in both start from 0. If the addresses overlap like this, how does a call instruction know where to jump when it needs to call a function?
Both the run error and the overlapping addresses in the objdump output have the same cause: add_lib.o and link_example.o are not executable programs but object files. Only after the linker combines the object files with the libraries they call do we get an executable file.
Running gcc on the object files without -c links them, and the -o option names the resulting executable. When we run that executable, we get the result of our simple add call.
$ gcc -o link_example add_lib.o link_example.o
$ ./link_example
c = 15
In fact, the process "C code → assembly code → machine code" consists of two parts when it is carried out on our computer.
The first part consists of three stages: compile, assemble, and link. After these three stages are completed, we have an executable file.
In the second part, the loader loads the executable file into memory. The CPU then reads instructions and data from memory, and the program actually starts to execute.
ELF format and linking: understanding the linking process
The loader ultimately turns the program into instructions and data in memory, so the executable file we generate contains more than just instructions. Let's dump the contents of the executable file with the objdump command.
link_example:     file format elf64-x86-64

Disassembly of section .init:
...
Disassembly of section .plt:
...
Disassembly of section .plt.got:
...
Disassembly of section .text:
...
 6b0:   55                      push   rbp
 6b1:   48 89 e5                mov    rbp,rsp
 6b4:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
 6b7:   89 75 f8                mov    DWORD PTR [rbp-0x8],esi
 6ba:   8b 55 fc                mov    edx,DWORD PTR [rbp-0x4]
 6bd:   8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
 6c0:   01 d0                   add    eax,edx
 6c2:   5d                      pop    rbp
 6c3:   c3                      ret

00000000000006c4 <main>:
 6c4:   55                      push   rbp
 6c5:   48 89 e5                mov    rbp,rsp
 6c8:   48 83 ec 10             sub    rsp,0x10
 6cc:   c7 45 fc 0a 00 00 00    mov    DWORD PTR [rbp-0x4],0xa
 6d3:   c7 45 f8 05 00 00 00    mov    DWORD PTR [rbp-0x8],0x5
 6da:   8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
 6dd:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
 6e0:   89 d6                   mov    esi,edx
 6e2:   89 c7                   mov    edi,eax
 6e4:   b8 00 00 00 00          mov    eax,0x0
 6e9:   e8 c2 ff ff ff          call   6b0 <add>
 6ee:   89 45 f4                mov    DWORD PTR [rbp-0xc],eax
 6f1:   8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
 6f4:   89 c6                   mov    esi,eax
 6f6:   48 8d 3d 97 00 00 00    lea    rdi,[rip+0x97]        # 794 <_IO_stdin_used+0x4>
 6fd:   b8 00 00 00 00          mov    eax,0x0
 702:   e8 59 fe ff ff          call   560 <printf@plt>
 707:   b8 00 00 00 00          mov    eax,0x0
 70c:   c9                      leave
 70d:   c3                      ret
 70e:   66 90                   xchg   ax,ax
...
Disassembly of section .fini:
...
You will find that the dump of the executable looks similar to the earlier object code, but is much longer. That is because, under Linux, both executable files and object files use a file format called ELF (Executable and Linkable Format). An ELF file stores not only the compiled machine instructions but also a lot of other data.
For example, in all the objdump output we have looked at so far, you can see the corresponding function names, such as add and main. Even the names of the globally accessible variables you define are stored in the ELF file. These names and their corresponding addresses are kept in a part of the ELF file called the symbol table. The symbol table works like an address book, associating names with addresses.
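The address-book idea can be sketched in a few lines of C. This is only an illustration: the struct symbol type and the lookup_symbol function are hypothetical names made up here, and a real ELF symbol entry carries more fields (size, type, section index, and so on). The two addresses are the ones shown for add and main in the link_example dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified symbol record: a name and the address it maps to.
   Real ELF symbol entries hold more fields than this. */
struct symbol {
    const char *name;
    uint64_t    address;
};

/* Addresses of add and main as they appear in the linked link_example dump. */
static const struct symbol symbol_table[] = {
    { "add",  0x6b0 },
    { "main", 0x6c4 },
};

/* Look a name up in the table. Returns 1 and stores the address on success,
   or 0 if the name is not defined in this table. */
int lookup_symbol(const char *name, uint64_t *address_out)
{
    assert(name != NULL && address_out != NULL);
    size_t count = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(symbol_table[i].name, name) == 0) {
            *address_out = symbol_table[i].address;
            return 1;
        }
    }
    return 0;
}
```

Looking up "add" in this table yields 0x6b0, while a name the table does not define, such as "printf", is reported as missing.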
Let's focus first on the parts related to our add and main functions. You will find that the target of the call to add in main is no longer the address of the next instruction, but the entry address of the add function. The credit for this goes to the ELF format and the linker.
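To see why the unlinked call pointed at "the next instruction", it helps to decode the call bytes by hand. The e8 opcode is followed by a 32-bit little-endian signed displacement that is added to the address of the next instruction. The sketch below does that arithmetic; call_target is a hypothetical helper name chosen for this example, not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the destination of a 5-byte x86-64 near call: opcode e8 followed
   by a 32-bit little-endian signed displacement, which is relative to the
   address of the instruction that follows the call. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    assert(insn != NULL && insn[0] == 0xe8);

    /* Assemble the little-endian displacement. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);

    /* Sign-extend to 64 bits so that negative displacements jump backwards. */
    uint64_t disp = raw;
    if (raw & 0x80000000u)
        disp |= 0xffffffff00000000ull;

    uint64_t next_insn = insn_addr + 5;
    return next_insn + disp;   /* unsigned wrap-around gives the right address */
}
```

With the bytes from link_example.o (e8 00 00 00 00 at address 0x25) the displacement is zero, so the target is 0x2a, the next instruction. With the bytes from the linked file (e8 c2 ff ff ff at 0x6e9) the target is 0x6b0, the entry of add.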
The ELF file format stores its information in a series of sections. An ELF file has a basic file header that describes the basic attributes of the file, such as whether it is an executable file and which CPU and operating system it targets. In addition to these basic attributes, most programs also have the following sections:
1. First is the .text section, also known as the code section, which holds the program's code and instructions;
2. Next is the .data section, also known as the data section, which holds the initialized data defined in the program;
3. Then there is the .rel.text section (named .rela.text on x86-64), also known as the relocation table. The relocation table records the jump addresses that the current file cannot yet determine. For example, in link_example.o above, main calls add and printf, but before linking we do not know where those calls should jump, so this information is stored in the relocation table;
4. Finally, the .symtab section is the symbol table. The symbol table keeps the address book of the function names defined in the current file and their corresponding addresses.
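The "is it an executable file" attribute in the file header can be read directly from the first bytes of the file. The following is a minimal sketch, assuming the caller has already read at least the first 18 bytes of the file into a buffer; classify_elf and the enum elf_kind values are hypothetical names for this example. The offsets and the numeric type codes (1 for relocatable, 2 for executable, 3 for shared object) come from the ELF header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum elf_kind { NOT_ELF, ELF_RELOCATABLE, ELF_EXECUTABLE, ELF_SHARED, ELF_OTHER };

/* Classify a file from the start of its contents.
   Bytes 0..3 hold the ELF magic, byte 5 the byte order, and bytes 16..17
   the e_type field that says what kind of ELF file this is. */
enum elf_kind classify_elf(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };
    if (len < 18 || memcmp(buf, magic, sizeof magic) != 0)
        return NOT_ELF;

    /* Byte 5: 1 means little-endian, 2 means big-endian. */
    uint16_t e_type;
    if (buf[5] == 2)
        e_type = (uint16_t)((buf[16] << 8) | buf[17]);
    else
        e_type = (uint16_t)(buf[16] | (buf[17] << 8));

    switch (e_type) {
    case 1:  return ELF_RELOCATABLE;  /* ET_REL: an object file such as add_lib.o */
    case 2:  return ELF_EXECUTABLE;   /* ET_EXEC: a fixed-address executable */
    case 3:  return ELF_SHARED;       /* ET_DYN: shared objects, and also
                                         position-independent executables */
    default: return ELF_OTHER;
    }
}
```

An object file produced by gcc -c would be classified as ELF_RELOCATABLE here, which is the kind of file the kernel refuses to execute.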
The linker scans all the input object files and collects the information in their symbol tables into a single global symbol table. Then, guided by the relocation tables, it patches every instruction whose jump address was undetermined, using the addresses stored in the symbol table. Finally, it merges the corresponding sections of all the object files into the final executable code. This is why the function call addresses in the executable file are correct.
Once the linker has turned the program into an executable file, the loader's job is much easier. The loader no longer has to work out these jump addresses. It only needs to parse the ELF file and load the corresponding instructions and data into memory for the CPU to execute.
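The patching step can be sketched as follows. This is a deliberately simplified model, not how a real linker is structured: struct symbol, struct relocation, and apply_relocations are hypothetical names, and only one kind of fix-up is handled, the 32-bit PC-relative displacement of a call. For that kind the value written is S + A - P, where S is the symbol's address, P is the address of the field being patched, and A is an addend of -4 because the CPU measures the displacement from the end of the 4-byte field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified records for this sketch. */
struct symbol {
    const char *name;
    uint64_t    address;
};

struct relocation {
    uint64_t    field_addr;  /* address of the 4-byte displacement to patch */
    const char *symbol;      /* name of the symbol the call should reach */
    int64_t     addend;      /* -4 for a call displacement */
};

static const struct symbol *find_symbol(const struct symbol *syms, size_t nsyms,
                                        const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Patch every relocation in an image that starts at address image_base.
   Returns the number of fields patched, or -1 on an undefined symbol or
   an out-of-range field. */
int apply_relocations(uint8_t *image, uint64_t image_base, size_t image_len,
                      const struct symbol *syms, size_t nsyms,
                      const struct relocation *rels, size_t nrels)
{
    assert(image != NULL && syms != NULL && rels != NULL);
    int applied = 0;
    for (size_t i = 0; i < nrels; i++) {
        const struct symbol *sym = find_symbol(syms, nsyms, rels[i].symbol);
        if (sym == NULL || rels[i].field_addr < image_base)
            return -1;
        uint64_t offset = rels[i].field_addr - image_base;
        if (offset > image_len || image_len - offset < 4)
            return -1;

        /* S + A - P, truncated to 32 bits (unsigned arithmetic wraps safely). */
        uint32_t value = (uint32_t)(sym->address + (uint64_t)rels[i].addend
                                    - rels[i].field_addr);

        /* Store the displacement in little-endian byte order. */
        image[offset + 0] = (uint8_t)(value & 0xff);
        image[offset + 1] = (uint8_t)((value >> 8) & 0xff);
        image[offset + 2] = (uint8_t)((value >> 16) & 0xff);
        image[offset + 3] = (uint8_t)((value >> 24) & 0xff);
        applied++;
    }
    return applied;
}
```

Applied to the call at 0x6e9 (so the field sits at 0x6ea) with add at 0x6b0, the formula gives 0x6b0 - 4 - 0x6ea, which is -0x3e, stored as the bytes c2 ff ff ff. Those are exactly the bytes that appear after e8 in the linked dump.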
Summary and extension
At this point, I believe you have guessed why the same program can run under Linux but not under Windows. One very important reason is that the two operating systems use different executable file formats.
Today we focused on the ELF file format used under Linux, while the executable file format of Windows is called PE (Portable Executable). The loader under Linux can only parse the ELF format, not the PE format.
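The two formats can be told apart from their first bytes, which is roughly the first check a loader makes. The sketch below is an illustration only: detect_format and enum exe_format are hypothetical names, and a real loader validates far more than this. An ELF file starts with the byte 0x7f followed by "ELF". A PE file starts with an "MZ" DOS header whose 4-byte little-endian field at offset 0x3c gives the position of the "PE\0\0" signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum exe_format { FORMAT_UNKNOWN, FORMAT_ELF, FORMAT_PE };

/* Guess the executable format from the raw contents of a file. */
enum exe_format detect_format(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    /* ELF: magic bytes 0x7f 'E' 'L' 'F' at the very start. */
    if (len >= 4 && buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F')
        return FORMAT_ELF;

    /* PE: "MZ" DOS header, then a signature located via the field at 0x3c. */
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        uint32_t pe_offset = (uint32_t)buf[0x3c]
                           | ((uint32_t)buf[0x3d] << 8)
                           | ((uint32_t)buf[0x3e] << 16)
                           | ((uint32_t)buf[0x3f] << 24);
        if ((uint64_t)pe_offset + 4 <= len &&
            memcmp(buf + pe_offset, "PE\0\0", 4) == 0)
            return FORMAT_PE;
    }
    return FORMAT_UNKNOWN;
}
```

A loader that only knows the ELF layout has no way to interpret a file that this check identifies as PE, and the reverse is equally true.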
If we had a loader that could parse the PE format, could we run Windows programs under Linux? Does such a program really exist? Yes: the well-known open source project Wine lets us run Windows programs directly under Linux by providing a loader that is compatible with the PE format. Microsoft Windows, for its part, now provides WSL, the Windows Subsystem for Linux, which can parse and load ELF files.
When we write real-world programs, we do not put all the code in one file to compile and execute. We split it across different function libraries. Through static linking, the code in these separate files can be developed independently and then combined into a single executable program.
To support static linking, an ELF file contains not only the instructions the program needs to execute, but also the relocation table and symbol table required for linking.
Reflection
You can read the symbol table of today's demo program with readelf to see what information it contains. Then read out the relocation table of today's demo program with objdump to see what information is in it.