Mach-O file structure

Main contents:

  1. Understanding executables
  2. Understanding Mach-O files
  3. Mach-O file structure
  4. Mach Header
  5. Load Commands
  6. Data
  7. Understanding big-endian and little-endian modes
  8. Understanding universal binaries

1, Understanding executables

1. Executable files
  1. A process is the result of loading an executable file into memory;
  2. An executable file must be in a format the operating system understands, and each operating system uses its own format;
2. Executable files of different platforms
  • Linux: ELF files
  • Windows: PE32 / PE32+ files
  • macOS and iOS: Mach-O (Mach Object) files
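
The platform differences above show up in the first few bytes of a file. The following is a minimal sketch, not part of any system API: guess_exe_format is a hypothetical helper, and the magic values are written out by hand rather than taken from system headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: guess the executable format from the first bytes
   of a file. Returns a static string, "unknown" if nothing matches. */
const char *guess_exe_format(const unsigned char *buf, size_t len)
{
    /* ELF files start with 0x7f 'E' 'L' 'F'. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF";
    /* PE files start with the DOS stub signature 'M' 'Z'. */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "PE";
    if (len >= 4) {
        uint32_t magic;
        memcpy(&magic, buf, sizeof magic);   /* read in host byte order */
        /* 32-bit and 64-bit Mach-O magics, in either byte order. */
        if (magic == 0xfeedfaceu || magic == 0xcefaedfeu ||
            magic == 0xfeedfacfu || magic == 0xcffaedfeu)
            return "Mach-O";
        /* Universal (fat) binary magic; Java class files share this value,
           so a real tool would need further checks. */
        if (magic == 0xcafebabeu || magic == 0xbebafecau)
            return "Mach-O (universal)";
    }
    return "unknown";
}
```

Passing the first four bytes of a 64-bit Mach-O file (cf fa ed fe on disk for a little-endian target) should return "Mach-O".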

2, Understanding Mach-O files

As the executable file format of the iOS, iPadOS, and macOS platforms, Mach-O is involved in many areas, such as app startup and execution, bitcode analysis, and crash symbolication:

1. Mach-O file
  1. Mach-O is the executable file format of the iOS, iPadOS, and macOS platforms. The system runs files in this format according to its application binary interface (ABI);
  2. Mach-O replaced the a.out format used in BSD systems. It stores the machine code and data produced during compilation and linking, providing a single file format for both statically and dynamically linked code;
  3. Compared with a.out, Mach-O offers better extensibility and faster access to symbol table information;
2. Common file types in Mach-O format
  1. Executable: executable file (for example a.out); compiled object files (.o) are also in Mach-O format;
  2. Dylib: dynamic link library;
  3. Bundle: cannot be linked against. It can only be loaded at runtime using dlopen();
  4. Image: a general term covering Executable, Dylib, and Bundle;
  5. Framework: a folder containing a Dylib, resource files, and header files;

3, Mach-O file structure

1. Two ways to view Mach-O
  1. Use the MachOView application to browse the structure of a Mach-O file directly;
  2. Use the objdump command in the terminal;
2. View Mach-O file structure

Viewed in MachOView, a Mach-O file contains three main parts:

  1. Header: describes the CPU type, file type, the number and total size of the load commands, and other information;
  2. Load Commands: the list of load commands, whose number and total size are given in the header;
  3. Data: the data area, which holds the actual contents of the segments;

Other information includes:

  1. Dynamic Loader Info: dynamic library loading information
  2. Function Starts: table of function start addresses
  3. Symbol Table: symbol table
  4. Dynamic Symbol Table: dynamic library symbol table
  5. String Table: string table

4, Mach Header (executable header)

1. Function summary
  1. The header is the first thing the linker reads when loading the file, because it identifies basic information such as the architecture and system type;
  2. The header contains the key information about the whole Mach-O file, such as the CPU type, the file type, and the number and total size of the load commands, so that the system can quickly determine the environment the file is meant to run in;
  3. There are separate headers for 32-bit and 64-bit CPUs, corresponding to the mach_header and mach_header_64 structures respectively;
2. Source code analysis

The Header is defined in the loader.h file. The specific code is as follows:

struct mach_header_64 {
    uint32_t    magic;          // Magic number: identifies the file as Mach-O and whether it is 32-bit or 64-bit
    cpu_type_t  cputype;        // CPU architecture type, such as ARM
    cpu_subtype_t   cpusubtype; // Specific CPU subtype, such as arm64 or armv7
    uint32_t    filetype;       // Mach-O file type: executable, object file, dynamic library, and so on
    uint32_t    ncmds;          // Number of load commands (the load commands follow the header)
    uint32_t    sizeofcmds;     // Total size of all load commands, in bytes
    uint32_t    flags;          // Flags describing features of the binary, mainly related to loading and linking
    uint32_t    reserved;       // Reserved (this field does not exist in the 32-bit mach_header)
};
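
To make the layout concrete, here is a minimal sketch of reading this header out of a byte buffer. It assumes the file was produced for the host's byte order, and it declares a local mirror of the structure (sk_mach_header_64, a hypothetical name) so that it builds without <mach-o/loader.h>; sk_read_header64 is likewise an illustrative helper, not a system function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_MH_MAGIC_64 0xfeedfacfu  /* same value as MH_MAGIC_64 in loader.h */

/* Local mirror of mach_header_64: eight 32-bit fields, 32 bytes in total. */
struct sk_mach_header_64 {
    uint32_t magic;
    int32_t  cputype;      /* cpu_type_t is a signed 32-bit integer */
    int32_t  cpusubtype;   /* cpu_subtype_t likewise */
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copy the header out of buf. Returns 0 on success, -1 if the buffer is
   too short or the magic is not the host-order 64-bit Mach-O magic. */
int sk_read_header64(const unsigned char *buf, size_t len,
                     struct sk_mach_header_64 *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);   /* avoids unaligned pointer casts */
    if (out->magic != SK_MH_MAGIC_64)
        return -1;
    return 0;
}
```

After a successful call, out->ncmds and out->sizeofcmds say how many load commands follow the header and how many bytes they occupy.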

Since executables, object files, and dynamic libraries are all in Mach-O format (and a static library is an archive of Mach-O object files), filetype is needed to tell them apart. Common file types are as follows:

#define MH_OBJECT   0x1 /* object file */
#define MH_EXECUTE  0x2 /* executable */
#define MH_DYLIB    0x6 /* dynamic library */
#define MH_DYLINKER 0x7 /* dynamic linker */
#define MH_DSYM     0xa /* stores symbol information of a binary for debugging and analysis */
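
As a small illustration of how these constants might be used, the sketch below maps a filetype value to its name. sk_filetype_name is a hypothetical helper, and it covers only the five values listed above; loader.h defines more.

```c
#include <stdint.h>

/* Map a filetype value from the Mach header to the name of its constant.
   Only the values listed above are covered. */
const char *sk_filetype_name(uint32_t filetype)
{
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0xa: return "MH_DSYM";
    default:  return "other";
    }
}
```
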
3. MachOView demonstration

5, Analyze Load Commands

1. Function summary
  1. Load Commands is a list of load commands that describes how the Data area is laid out in the binary file and in virtual memory;
  2. The load commands record a lot of information, such as the location of the dynamic linker, the entry point of the program, the libraries the program depends on, the location of the code, and the location of the symbol table;
  3. Load commands are defined by the kernel. The set of commands differs between system versions, and the number and total size of the commands in a file are recorded in its header;
  4. Load command types are constants prefixed with LC_, such as LC_SEGMENT and LC_SYMTAB;
2. Code analysis

The Load Command is defined in the loader.h file. The specific code is as follows:

struct load_command {
    uint32_t cmd;       /* Type of load command */
    uint32_t cmdsize;   /* The size of the load command */
};
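
Because every command starts with cmd and cmdsize, the whole list can be walked without knowing each command's full structure. The sketch below assumes host byte order and a buffer that starts just after the Mach header; sk_load_command and sk_count_commands are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of load_command so the sketch builds without loader.h. */
struct sk_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Count the load commands of type `wanted`. ncmds and sizeofcmds come
   from the Mach header. Returns -1 if a command runs past the region
   or reports a size smaller than its own two fixed fields. */
int sk_count_commands(const unsigned char *cmds, uint32_t ncmds,
                      uint32_t sizeofcmds, uint32_t wanted)
{
    uint32_t offset = 0;   /* invariant: offset <= sizeofcmds */
    int count = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct sk_load_command lc;
        if (sizeofcmds - offset < sizeof lc)
            return -1;
        memcpy(&lc, cmds + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1;
        if (lc.cmd == wanted)
            count++;
        offset += lc.cmdsize;   /* the next command starts right after */
    }
    return count;
}
```

For example, passing 0x19 (the value of LC_SEGMENT_64) as wanted counts the 64-bit segment commands.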

Each load command has its own structure, but the first two fields of every structure are the same. Take LC_SEGMENT_64 as an example: this command describes a segment and its sections. The specific code is as follows:

struct segment_command_64 { /* for 64-bit architectures */
    uint32_t    cmd;          // Type of the load command
    uint32_t    cmdsize;      // Size of the load command (including the nsects section structures that follow)
    char        segname[16];  // 16-byte segment name
    uint64_t    vmaddr;       // Virtual memory start address of the segment
    uint64_t    vmsize;       // Virtual memory size of the segment
    uint64_t    fileoff;      // Offset of the segment in the file
    uint64_t    filesize;     // Size of the segment in the file
    vm_prot_t   maxprot;      // Maximum memory protection of the segment's pages (1 = r, 2 = w, 4 = x)
    vm_prot_t   initprot;     // Initial memory protection of the segment's pages
    uint32_t    nsects;       // Number of sections in the segment
    uint32_t    flags;        // Flags
};
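
The maxprot and initprot fields are bit masks. As a minimal sketch, the hypothetical helper sk_prot_string below renders such a value in the familiar "rwx" form; the SK_PROT_ constants are local copies of the VM_PROT_READ, VM_PROT_WRITE, and VM_PROT_EXECUTE values.

```c
#include <stdint.h>

/* Local copies of the VM_PROT_* bit values. */
enum { SK_PROT_READ = 0x1, SK_PROT_WRITE = 0x2, SK_PROT_EXECUTE = 0x4 };

/* Write a NUL-terminated "rwx"-style string for prot into out. */
void sk_prot_string(int32_t prot, char out[4])
{
    out[0] = (prot & SK_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & SK_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & SK_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

A value of 5 (read plus execute) gives "r-x", and a value of 3 (read plus write) gives "rw-".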

6, Data

1. Function summary
  1. Data stores the actual code and data, mainly including methods, symbol tables, dynamic symbol tables, and dynamic library loading information (rebase, symbol binding, etc.);
  2. The layout of Data follows the description in the load commands exactly;
  3. Data is composed of segments and sections. Data generally has multiple segments, and each segment can have zero or more sections;
  4. Each segment has a virtual address at which it is mapped into the address space of the process;

Almost all Mach-O files contain three segments:

  1. __TEXT: code segment, read-only and executable, storing the machine code of functions (__text), constant strings (__cstring), Objective-C class and method names, etc;
  2. __DATA: data segment, readable and writable, storing Objective-C strings (__cfstring) and runtime metadata (class/protocol/method), global variables, static variables, etc;
  3. __LINKEDIT: read-only, storing the information needed to start the app, such as bind and rebase addresses and the names and addresses of functions;
2. Source code analysis

In the Data area, sections account for most of the content, mainly in the __TEXT and __DATA segments.

Section is defined in the loader.h file. The specific code is as follows:

struct section_64 { /* for 64-bit architectures */
    char        sectname[16];   // Name of this section
    char        segname[16];    // Name of the segment this section belongs to
    uint64_t    addr;       // Start address in memory
    uint64_t    size;       // Size of the section in bytes
    uint32_t    offset;     // File offset of the section
    uint32_t    align;      // Alignment of the section (as a power of 2)
    uint32_t    reloff;     // File offset of the relocation entries
    uint32_t    nreloc;     // Number of relocation entries
    uint32_t    flags;      // Flags: type and attributes of the section
    uint32_t    reserved1;  // Reserved (for offset or index)
    uint32_t    reserved2;  // Reserved (for count or sizeof)
    uint32_t    reserved3;  // Reserved
};
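
One detail worth noting when reading sectname and segname: a name that uses all 16 bytes has no terminating NUL, so passing the field directly to string functions can read past it. A minimal sketch of a safe copy follows; sk_copy_name is a hypothetical helper.

```c
#include <string.h>

/* Copy a 16-byte Mach-O name field into a 17-byte buffer and terminate it.
   Shorter names are already NUL-padded inside the field, so the result is
   a normal C string either way. */
void sk_copy_name(const char field[16], char out[17])
{
    memcpy(out, field, 16);
    out[16] = '\0';
}
```

With this, a 16-character name such as __objc_classrefs and a short name such as __text can both be printed with an ordinary %s.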

7, Understanding big-endian and little-endian modes

When analyzing Mach-O files, you often see content related to memory addresses, which involves the concept of byte order (big-endian and little-endian modes):

  1. Little-endian mode: the low-order byte of the data is stored at the low address in memory;
  2. Big-endian mode: the low-order byte of the data is stored at the high address in memory;

The processors in iOS devices are based on the ARM architecture and use little-endian mode by default (low-order byte at the low address), while network protocols usually use big-endian mode (high-order byte at the low address):

Take unsigned int value = 0x12345678 as an example and look at how it is stored in each byte order, using unsigned char buf[4] to represent value:

Little-endian: the low-order byte is stored at the low address, as follows:
Low address ------------------> High address
0x78  |  0x56  |  0x34  |  0x12

Big-endian: the high-order byte is stored at the low address, as follows:
Low address ------------------> High address
0x12  |  0x34  |  0x56  |  0x78

Memory address | Little-endian content | Big-endian content
0x4000         | 0x78                  | 0x12
0x4001         | 0x56                  | 0x34
0x4002         | 0x34                  | 0x56
0x4003         | 0x12                  | 0x78
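
The same experiment can be run in code. The sketch below uses two hypothetical helpers: sk_host_is_little_endian inspects how the host stores 0x12345678, and sk_swap32 converts a 32-bit value between the two byte orders.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the low-order byte at the lowest address
   (little-endian), 0 otherwise. */
int sk_host_is_little_endian(void)
{
    uint32_t value = 0x12345678u;
    unsigned char buf[4];
    memcpy(buf, &value, sizeof value);
    return buf[0] == 0x78;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t sk_swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```

Swapping 0x12345678 yields 0x78563412, and swapping twice returns the original value.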

8, Understanding universal binaries

1. Basic concepts
  1. A universal binary packages Mach-O files for several architectures into one file; when the file is loaded, the system automatically detects and selects the appropriate architecture;
  2. A universal binary stores several architectures at once, so it is much larger than a single-architecture binary and takes more disk space. However, at run time the system selects only the most suitable architecture and does not load code for the others, so no extra memory is used and execution efficiency is not reduced;
  3. The universal binary format is also called the fat binary format;
2. Universal binary format analysis

The universal binary format is defined in <mach-o/fat.h>.

  1. Download the xnu source, then find the file under xnu -> EXTERNAL_HEADERS -> mach-o.
  2. Universal binaries have two important structures: fat_header and fat_arch;

Two structures are defined as follows:

/*
 - magic: lets the kernel recognize the file as a universal binary when reading it
 - nfat_arch: the number of fat_arch structures that follow, that is, how many Mach-O files the universal binary contains
 */
struct fat_header {
    uint32_t    magic;      /* FAT_MAGIC */
    uint32_t    nfat_arch;  /* number of structs that follow */
};

/*
 fat_arch describes one Mach-O file inside the universal binary
 - cputype and cpusubtype: the platform this Mach-O file is built for
 - offset, size, and align: where this Mach-O file is located within the universal binary
 */
struct fat_arch {
    cpu_type_t  cputype;    /* cpu specifier (int) */
    cpu_subtype_t   cpusubtype; /* machine specifier (int) */
    uint32_t    offset;     /* file offset to this object file */
    uint32_t    size;       /* size of this object file */
    uint32_t    align;      /* alignment as a power of 2 */
};
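
The fields of these two structures are stored big-endian in the file regardless of the host, so they have to be read byte by byte (or swapped) on little-endian machines. The following is a minimal sketch under that assumption; sk_read_be32 and sk_fat_arch_count are hypothetical helpers, and only the 32-bit fat_header layout shown above is handled.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_FAT_MAGIC 0xcafebabeu   /* same value as FAT_MAGIC in fat.h */

/* Read a 32-bit big-endian value from p, independent of host byte order. */
static uint32_t sk_read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return nfat_arch, the number of architecture slices in a universal
   binary, or -1 if buf does not start with a fat_header. */
int64_t sk_fat_arch_count(const unsigned char *buf, size_t len)
{
    if (len < 8 || sk_read_be32(buf) != SK_FAT_MAGIC)
        return -1;
    return (int64_t)sk_read_be32(buf + 4);
}
```

Each of the nfat_arch entries that follow is a 20-byte fat_arch whose offset and size fields locate one complete Mach-O file inside the universal binary.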

Reference link

  1. xnu
  2. Mach-O official source code

Posted by saiko on Wed, 24 Nov 2021 01:56:06 -0800