Program memory allocation

Keywords: stm32 ARM

1, Preliminary knowledge - program memory allocation

The memory occupied by a compiled C/C++ program is divided into the following parts:
1. Stack - automatically allocated and released by the compiler. It stores function parameter values, local variable values, and so on, and it operates like the stack data structure.
2. Heap - generally allocated and released by the programmer. If the programmer does not release it, the OS may reclaim it when the program ends. Note that it is unrelated to the heap data structure; the allocation scheme is closer to a linked list.
3. Global (static) area - global variables and static variables are stored together. Initialized global and static variables are in one area, and uninitialized global and static variables are in an adjacent area. The system releases this area when the program ends.
4. Text constant area - constant strings are placed here. The system releases it when the program ends.
5. Program code area - holds the binary code of function bodies.

Example program

This example was written by a senior programmer and is very detailed.

#include <stdlib.h>
#include <string.h>

int a = 0; //Global initialized area
char *p1; //Global uninitialized area
int main(void) {
    int b; //Stack
    char s[] = "abc"; //Stack
    char *p2; //Stack
    char *p3 = "123456"; //"123456\0" is in the constant area and p3 is on the stack.
    static int c = 0; //Global (static) initialized area
    p1 = (char *)malloc(10);
    p2 = (char *)malloc(20);
    //The allocated areas of 10 and 20 bytes are in the heap area.
    strcpy(p1, "123456"); //"123456\0" is placed in the constant area, and the compiler may merge it with the "123456" pointed to by p3.
    free(p2);
    free(p1);
    return 0;
}

2, Theoretical knowledge of heap and stack

2.1 application method

Stack: allocated automatically by the system. For example, when a function declares a local variable int b; the system automatically makes room for b on the stack.
Heap: the programmer must request the memory and specify its size. In C, use the malloc function,
for example p1 = (char *)malloc(10);
In C++, use the new operator,
for example p2 = new char[10];
Note, however, that p1 and p2 themselves are on the stack when they are declared as local variables; only the memory they point to is in the heap.

2.2 system response after application

Stack: as long as the remaining stack space is larger than the requested space, the system provides memory for the program; otherwise an exception is reported, indicating stack overflow.
Heap: first of all, you should know that the operating system keeps a linked list of free memory blocks. When the system receives a request from the program, it traverses the linked list, finds the first heap node whose space is larger than the requested space, removes that node from the free list, and allocates the node's space to the program. In addition, on most systems the size of the allocation is recorded at the start of this memory block, so that the delete statement in the code can release the block correctly. Finally, because the node found is not necessarily the same size as the request, the system automatically puts the excess back into the free list.
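The first-fit search described above can be sketched in C. This is only a toy model under stated assumptions, not how any real operating system or C runtime manages its heap: the fixed pool, the `Block` header, and the names `toy_alloc` and `toy_free` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy first-fit allocator over a static pool. All names and sizes here
   are hypothetical; real heaps are far more elaborate. */
typedef struct Block {
    size_t units;         /* payload size, counted in sizeof(Block) units */
    struct Block *next;   /* next free block; meaningful only while free */
} Block;

#define POOL_UNITS 64
static Block pool[POOL_UNITS];     /* an array of Block keeps headers aligned */
static Block *free_list = NULL;
static int pool_ready = 0;

void *toy_alloc(size_t nbytes) {
    if (!pool_ready) {             /* first call: the pool is one big free block */
        pool[0].units = POOL_UNITS - 1;
        pool[0].next = NULL;
        free_list = &pool[0];
        pool_ready = 1;
    }
    if (nbytes == 0) return NULL;
    size_t need = (nbytes + sizeof(Block) - 1) / sizeof(Block);

    Block **link = &free_list;
    for (Block *b = free_list; b != NULL; link = &b->next, b = b->next) {
        if (b->units < need) continue;      /* too small: keep walking */
        if (b->units >= need + 2) {
            /* Split: the excess part goes back on the free list */
            Block *rest = b + 1 + need;
            rest->units = b->units - need - 1;
            rest->next = b->next;
            b->units = need;
            *link = rest;
        } else {
            *link = b->next;                /* hand out the whole node */
        }
        return b + 1;                       /* payload starts after the header */
    }
    return NULL;                            /* no node is large enough */
}

void toy_free(void *p) {
    if (p == NULL) return;
    Block *b = (Block *)p - 1;              /* step back to the recorded size */
    assert(b >= pool && b < pool + POOL_UNITS);
    b->next = free_list;                    /* push the block onto the free list */
    free_list = b;
}
```

The sketch never merges neighbouring free blocks, so repeated allocation and release fragments the pool, which is the fragmentation cost of heap allocation in miniature.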

2.3 restrictions on application size

Stack: under Windows, the stack is a data structure that grows toward lower addresses and is a contiguous memory area. This means that the address of the top of the stack and the maximum capacity of the stack are predetermined by the system. Under Windows, the size of the stack is 2M (or 1M; in short, it is a constant fixed when the program is built). If the requested space exceeds the remaining stack space, an overflow is reported. Therefore, less space can be obtained from the stack.
Heap: the heap is a data structure that grows toward higher addresses and is a non-contiguous memory area. This is because the system uses a linked list to track free memory blocks, which are naturally non-contiguous, and the list is traversed from low addresses to high addresses. The size of the heap is limited by the virtual memory available in the computer system. The space obtained from the heap is therefore more flexible and larger.

2.4 comparison of application efficiency:

Stack: allocated automatically by the system, which is fast, but the programmer has no control over it.
Heap: memory allocated with new. It is generally slower and prone to memory fragmentation, but it is the most convenient to use.
In addition, under Windows, the best way is to allocate memory with VirtualAlloc. This memory is neither on the heap nor on the stack; the call directly reserves a block of memory in the process address space. It is the least convenient to use, but it is fast and flexible.

2.5 storage contents in heap and stack

Stack: when a function is called, the caller first pushes the function's parameters (in most C compilers, from right to left), and the call instruction then pushes the address of the next instruction in the calling function (the statement after the call). The function's local variables come after that. Note that static variables are not placed on the stack.
When the function call ends, the local variables are released first, then the return address is popped, from which the program continues to run in the caller, and the parameters are removed as well.
Heap: generally, a small header at the start of a heap block stores the size of the block. The rest of the block's contents are arranged by the programmer.
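The idea of recording the size at the head of a heap block can be illustrated with a thin wrapper around malloc. This is a minimal sketch, assuming a hypothetical `Header` layout and the made-up names `sized_alloc`, `sized_size`, and `sized_free`; real allocators keep their bookkeeping in their own private formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every block we hand out. */
typedef union {
    size_t size;          /* number of payload bytes requested */
    max_align_t align;    /* keeps the payload suitably aligned */
} Header;

/* Allocates n payload bytes and records n just before the payload. */
void *sized_alloc(size_t n) {
    Header *h = malloc(sizeof(Header) + n);
    if (h == NULL) return NULL;
    h->size = n;
    return h + 1;         /* the caller only ever sees the payload */
}

/* Reads the recorded size back from the header in front of p. */
size_t sized_size(const void *p) {
    assert(p != NULL);
    return ((const Header *)p - 1)->size;
}

/* Releases the whole block, header included. */
void sized_free(void *p) {
    if (p != NULL) free((Header *)p - 1);
}
```

Because the size travels with the block, `sized_free` needs only the pointer, which is the same reason free and delete take no size argument.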

2.6 comparison of access efficiency

char s1[] = "aaaaaaaaaaaaaaa";
char *s2 = "bbbbbbbbbbbbbbbbb";
The contents of s1 ("aaa...") are assigned at run time;
the contents of s2 ("bbb...") are determined at compile time.
However, in later accesses, the array on the stack is faster than the string reached through the pointer (which may, for example, be on the heap).
For example:

int main(void) {
    char a = 1;
    char c[] = "1234567890";
    char *p = "1234567890";
    a = c[1];
    a = p[1];
    return 0;
}

Corresponding assembly code

10: a = c[1];
00401067 8A 4D F1 mov cl,byte ptr [ebp-0Fh]
0040106A 88 4D FC mov byte ptr [ebp-4],cl
11: a = p[1];
0040106D 8B 55 EC mov edx,dword ptr [ebp-14h]
00401070 8A 42 01 mov al,byte ptr [edx+1]
00401073 88 45 FC mov byte ptr [ebp-4],al

The first form reads the string element directly into the register cl, while the second first loads the pointer value into edx and then reads the character through edx, which is clearly slower.
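The extra load exists because the array is the storage itself, while the pointer is a separate variable that holds an address. A small sketch makes the difference visible from C; the function name `array_is_private_copy` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the local array behaves as a private, writable copy of the
   literal while the pointer still refers to the unchanged literal. */
int array_is_private_copy(void) {
    char c[] = "1234567890";        /* 10 characters plus \0 copied into the frame */
    const char *p = "1234567890";   /* only an address is stored in the frame */

    assert(sizeof c == 11);                /* the array's size is the string's size */
    assert(sizeof p == sizeof(char *));    /* the pointer's size is an address's size */

    c[1] = 'X';                     /* modifies the local copy only */
    return c[1] == 'X' && p[1] == '2';
}
```

Writing through `p` instead would be undefined behavior, because the literal it points to lives in the constant area.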

2.7 summary:

The difference between heap and stack can be seen by the following analogy:
Using the stack is like eating in a restaurant. We just order (request), pay, and eat (use). When we are full, we leave without worrying about preparation such as washing and chopping the ingredients, or cleanup such as washing the dishes and pots. Its advantage is speed, but it offers little freedom.
Using the heap is like cooking your favorite dishes yourself. It is more trouble, but the result suits your own taste better and you have much more freedom.

3, Memory structure in windows process

Before reading this article, if you don't even know what the stack is, please read the basics at the back of the article.

Anyone who has done some programming knows that high-level languages access data in memory through variable names. So how are these variables stored in memory, and how does the program use them? This is discussed in depth below. Unless stated otherwise, the C code below is assumed to be a release build compiled with VC.

First, let's see how C variables are laid out in memory. C has global variables, local variables, static variables, and register variables, and each kind is allocated differently. Let's start with the following code:

#include <stdio.h>
int g1=0, g2=0, g3=0;
int main() {
    static int s1=0, s2=0, s3=0;
    int v1=0, v2=0, v3=0;
    //Print the memory address of each local variable
    printf("%p\n", (void *)&v1);
    printf("%p\n", (void *)&v2);
    printf("%p\n\n", (void *)&v3);
    //Print the memory address of each global variable
    printf("%p\n", (void *)&g1);
    printf("%p\n", (void *)&g2);
    printf("%p\n\n", (void *)&g3);
    //Print the memory address of each static variable
    printf("%p\n", (void *)&s1);
    printf("%p\n", (void *)&s2);
    printf("%p\n\n", (void *)&s3);
    return 0;
}

The compiled execution result is:




The output is the memory address of each variable: v1, v2, and v3 are local variables, g1, g2, and g3 are global variables, and s1, s2, and s3 are static variables. You can see that the variables of each kind are laid out contiguously in memory, but the addresses of the local variables are very far from those of the global variables, while the global and static variables are allocated next to each other. This is because local variables and global/static variables are allocated in different kinds of memory areas.

The memory space of a process can be logically divided into three parts: the code area, the static data area, and the dynamic data area. The dynamic data area is what people usually call the "stack" and the "heap", which are two different kinds of dynamic data area: the stack is a linear structure and the heap is a linked structure. Each thread of a process has its own private stack, so although every thread runs the same code, the local variables of one thread do not interfere with those of another. A stack can be described by its base address and its top address. Global and static variables are allocated in the static data area, and local variables are allocated in the dynamic data area, that is, on the stack. The program accesses local variables through the stack's base address plus an offset.

├---—┤Low end memory area
│ ...... │
│ Dynamic data area │
│ ...... │
│ Code area │
│ Static data area │
│ ...... │
├---—┤High end memory area
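The difference between the static data area and the stack can also be observed from behavior rather than from addresses. The following is a small sketch with hypothetical function names: a static variable keeps its value between calls because there is only one copy of it, while every active call gets its own copy of a local variable in its own stack frame.

```c
#include <assert.h>

/* Static storage: one copy for the whole program, so the count persists. */
int next_id(void) {
    static int id = 0;
    int local = 0;        /* stack storage: re-created on every call */
    id++;
    local++;
    assert(local == 1);   /* the local never remembers the previous call */
    return id;
}

/* Recursion: every active call owns a separate `local` in its own frame.
   Returns 1 if no frame shares its local's address with its caller.
   Pass a null pointer as caller_local for the outermost call. */
int frames_are_distinct(int depth, const int *caller_local) {
    int local = depth;
    if (caller_local == &local) return 0;
    return depth == 0 ? 1 : frames_are_distinct(depth - 1, &local);
}
```

The same per-frame privacy is what keeps the local variables of different threads apart, since each thread runs on its own stack.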

A stack is a first-in, last-out data structure. The top address of the stack is always less than or equal to its base address. To understand the role of the stack in a program more deeply, we can first look at how a function call works. Different languages have different function calling conventions, which cover how parameters are pushed and who balances the stack. The calling convention of the Windows API differs from that of ANSI C: in the former the called function adjusts the stack, and in the latter the caller does. The two are distinguished by the keywords "__stdcall" and "__cdecl". Look at the following code first:

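#include <stdio.h>
void __stdcall func(int param1,int param2,int param3)
{
    int var1=param1;
    int var2=param2;
    int var3=param3;
    //Print out the memory address of each variable
    printf("%p\n",(void *)&param1);
    printf("%p\n",(void *)&param2);
    printf("%p\n\n",(void *)&param3);
    printf("%p\n",(void *)&var1);
    printf("%p\n",(void *)&var2);
    printf("%p\n\n",(void *)&var3);
}

int main() {
    func(1,2,3);
    return 0;
}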

The compiled execution result is:


├---—┤<—Top of stack (ESP) while the function executes, low end of memory
│ ...... │
│ var 1 │
│ var 2 │
│ var 3 │
│ RET │
├---—┤<—Top of stack (ESP) after a "__cdecl" function returns
│ parameter 1 │
│ parameter 2 │
│ parameter 3 │
├---—┤<—Top of stack (ESP) after a "__stdcall" function returns
│ ...... │
├---—┤<—Bottom of stack (base address, EBP), high end of memory

The figure above shows the stack during the function call. First, the three parameters are pushed onto the stack from right to left: "param3", then "param2", and finally "param1". Next, the return address (RET) is pushed, and execution jumps to the function's address. (A side note: articles that explain buffer overflows under UNIX say that after RET is pushed, the current EBP is pushed as well and EBP is then replaced with the current ESP. One article about function calls under Windows says the same step happens there too, but in my own debugging I did not find it. This can also be seen from the fact that there is only a 4-byte gap between param3 and var1.) The third step subtracts a number from the top of the stack (ESP) to allocate memory for the local variables; in the example above, 12 bytes are subtracted (ESP=ESP-3*4, since each int occupies 4 bytes). The memory of the local variables is then initialized. Because with "__stdcall" the called function adjusts the stack, the stack must be restored before the function returns: first the memory of the local variables is reclaimed (ESP=ESP+3*4), then the return address is popped into the EIP register, and finally the memory of the previously pushed parameters is reclaimed (ESP=ESP+3*4) before the caller's code continues. See the following assembly code:

;--------------Assembly code of function func-------------------

:00401000 83EC0C sub esp, 0000000C ;Create memory space for local variables
:00401003 8B442410 mov eax, dword ptr [esp+10]
:00401007 8B4C2414 mov ecx, dword ptr [esp+14]
:0040100B 8B542418 mov edx, dword ptr [esp+18]
:0040100F 89442400 mov dword ptr [esp], eax
:00401013 8D442410 lea eax, dword ptr [esp+10]
:00401017 894C2404 mov dword ptr [esp+04], ecx

........................(several instructions omitted)

:00401075 83C43C add esp, 0000003C ;Restore the stack and reclaim the memory space of local variables
:00401078 C3 ret 000C ;Function returns and reclaims the memory space occupied by the parameters
;With "__cdecl" this would be a plain "ret", and the caller would restore the stack

;-------------------End of function-------------------------

;--------------Code in the main program that calls func--------------

:00401080 6A03 push 00000003 ;Push parameter param3
:00401082 6A02 push 00000002 ;Push parameter param2
:00401084 6A01 push 00000001 ;Push parameter param1
:00401086 E875FFFFFF call 00401000 ;Call function func
;With "__cdecl" the stack would be restored here with "add esp, 0000000C"
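The difference in who restores the stack can be modeled in portable C without any compiler-specific keywords. The following is a toy simulation, not real calling-convention code: the array `stack_mem`, the index `esp`, and all function names are hypothetical, the return address is a placeholder, and the callee sums its locals only so that the model has something to return.

```c
#include <assert.h>

/* Toy model of the x86 stack: an int array, with `esp` as an index that
   moves toward 0 on every push. It ignores registers and EBP frames. */
enum { STACK_WORDS = 16 };
static int stack_mem[STACK_WORDS];
static int esp = STACK_WORDS;               /* empty stack: esp at the high end */

static void push(int value) {
    assert(esp > 0);
    stack_mem[--esp] = value;
}

/* Callee: on entry [esp] is RET and [esp+1..3] are param1..param3.
   `ret_words` models "ret N": how many parameter words the callee removes. */
static int callee(int ret_words) {
    esp -= 3;                                /* sub esp, 0Ch: room for var1..var3 */
    stack_mem[esp + 0] = stack_mem[esp + 4]; /* var1 = param1 */
    stack_mem[esp + 1] = stack_mem[esp + 5]; /* var2 = param2 */
    stack_mem[esp + 2] = stack_mem[esp + 6]; /* var3 = param3 */
    int sum = stack_mem[esp] + stack_mem[esp + 1] + stack_mem[esp + 2];
    esp += 3;                                /* release the locals */
    esp += 1 + ret_words;                    /* pop RET, then N parameter words */
    return sum;
}

static void push_call(int p1, int p2, int p3) {
    push(p3); push(p2); push(p1);            /* parameters, right to left */
    push(0);                                 /* placeholder for the return address */
}

int call_stdcall(int p1, int p2, int p3) {
    push_call(p1, p2, p3);
    return callee(3);                        /* "ret 0Ch": the callee cleans up */
}

int call_cdecl(int p1, int p2, int p3) {
    push_call(p1, p2, p3);
    int result = callee(0);                  /* plain "ret" */
    esp += 3;                                /* the caller's "add esp, 0Ch" */
    return result;
}

/* Number of words currently on the model stack. */
int stack_depth(void) { return STACK_WORDS - esp; }
```

In both models the stack is balanced after the call; the only difference is which side performs the final adjustment of `esp`.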

Attentive readers will have nearly grasped the principle of buffer overflow by now. Let's take a look at the following code:

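#include <stdio.h>
#include <string.h>

void __stdcall func() {
    char lpBuff[8]="\0";
    strcat(lpBuff,"AAAAAAAAAAA"); //Deliberate overflow: 11 "A" plus \0 written into an 8-byte buffer
}

int main() {
    func();
    return 0;
}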

What happens when you run it after compiling? An "illegal operation" error: the instruction at "0x00414141" referenced memory at "0x00000000", and the memory could not be "read". "41" is the hexadecimal ASCII code of "A", so the problem is obviously with strcat. "lpBuff" is only 8 bytes, including the terminating \0, so strcat can write at most 7 "A"s, but the program actually writes 11 "A"s and 1 \0. Looking back at the figure above, the extra 4 bytes exactly cover the memory of RET, so the function returns to a wrong memory address and executes the wrong instructions. Suppose the string is carefully constructed in three parts: meaningless filler data that achieves the overflow, then a value that overwrites RET, then a piece of shellcode. As long as the RET value points to the first instruction of the shellcode, the shellcode is executed when the function returns. However, different software versions and different runtime environments may change the location of the shellcode in memory, so constructing this RET value is very difficult. Generally, a large number of NOP instructions are placed between RET and the shellcode to make the exploit more versatile.

├---—┤<—Low end of memory
│ ...... │
├---—┤<—Start of the data filled in by the exploit
│ │
│ buffer │<—Filled with useless data
│ │
│ RET │<—Points to the shellcode, or into the range of NOP instructions
│ NOP │
│ ...... │<—Filled NOP instructions, the range RET may point into
│ NOP │
│ │
│ shellcode │
│ │
├---—┤<—End of the data filled in by the exploit
│ ...... │
├---—┤<—High end of memory
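The overflow happens because strcat has no idea how large the destination is. One common remedy is to pass the buffer size along and let the copy stop at the boundary. The helper below is a minimal sketch with a hypothetical name, `append_bounded`, built on the standard snprintf; it assumes the destination already holds a terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Appends src to dst without ever writing past dst_size bytes.
   Assumes dst already contains a \0-terminated string shorter than dst_size.
   Returns 1 if everything fit, 0 if the result had to be truncated. */
int append_bounded(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t used = strlen(dst);
    assert(used < dst_size);
    size_t room = dst_size - used;          /* bytes left, including the \0 */
    int n = snprintf(dst + used, room, "%s", src);
    return n >= 0 && (size_t)n < room;
}
```

With an 8-byte buffer, appending 11 "A"s through this helper keeps 7 of them plus the terminator and reports truncation, so the return address next to the buffer is never touched.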

Under Windows, dynamic data can be stored not only on the stack but also on the heap. Anyone who knows C++ knows that the new keyword allocates memory dynamically. Take a look at the following C++ code:

#include <stdio.h>
#include <windows.h>

void func()
{
    char *buffer=new char[128];
    char bufflocal[128];
    static char buffstatic[128];
    printf("%p\n",(void *)buffer); //Print the address of the memory allocated in the heap
    printf("%p\n",(void *)bufflocal); //Print the memory address of the local variable
    printf("%p\n",(void *)buffstatic); //Print the memory address of the static variable
    delete[] buffer;
}

int main() {
    func();
    return 0;
}

The program execution result is:


It can be seen that memory allocated with the new keyword is neither on the stack nor in the static data area. The VC compiler implements the new keyword's dynamic allocation through the Windows "heap". Before discussing the heap, let's look at several API functions related to it:

- HeapAlloc Request memory space in heap
- HeapCreate Create a new heap object
- HeapDestroy Destroy a heap object
- HeapFree Free requested memory
- HeapWalk Enumerates all memory blocks of heap objects
- GetProcessHeap Gets the default heap object for the process
- GetProcessHeaps Get all heap objects of the process
- LocalAlloc
- GlobalAlloc

When a process is initialized, the system automatically creates a default heap for it. The default size of this heap is 1M. Heap objects are managed by the system and exist in memory as a linked structure. You can request memory dynamically from the heap with the following code:

HANDLE hHeap=GetProcessHeap();
char *buff=HeapAlloc(hHeap,0,8);

Here hHeap is the handle of the heap object, and buff is a pointer to the requested memory. What exactly is this hHeap? Does its value mean anything? Take a look at the following code:

#pragma   comment(linker,"/entry:main")  // Define the entry of the program
#include <windows.h>

_CRTIMP int (__cdecl *printf)(const char *, ...); //Declare the C runtime function printf as a function pointer
/*
 Here, let's review the previous knowledge:
 (*Note) printf is a function in the C standard library, and VC's standard library is implemented by the msvcrt.dll module.
 As the declaration shows, printf takes a variable number of parameters. The function cannot know in advance how many parameters the caller pushed; it can only find out by parsing the format string in the first parameter. Because the number of parameters is dynamic, the caller must balance the stack, so the __cdecl calling convention is used here. BTW, Windows API functions basically use __stdcall, with one exception: wsprintf uses the __cdecl calling convention, for the same reason as printf, because its number of parameters is variable.
*/
void main()
{
    HANDLE hHeap=GetProcessHeap();
    char *buff=HeapAlloc(hHeap,0,0x10);
    char *buff2=HeapAlloc(hHeap,0,0x10);
    HMODULE hMsvcrt=LoadLibrary("msvcrt.dll");
    printf=(void *)GetProcAddress(hMsvcrt,"printf");
    printf("0x%08x\n",hHeap);
    printf("0x%08x\n",buff);
    printf("0x%08x\n\n",buff2);
}

The execution results are:


Why is the value of hHeap so close to the value of buff? In fact, the hHeap handle is the address of the heap's header. In the user-mode area of every process there is a structure called the PEB (Process Environment Block), which stores important information about the process. The ProcessHeap field at offset 0x18 from the start of the PEB holds the address of the process's default heap, and offset 0x90 holds a pointer to the list of addresses of all the process's heaps. Many Windows APIs use the process's default heap to store dynamic data; for example, all ANSI functions in Windows 2000 allocate memory from the default heap to convert ANSI strings to Unicode strings. Access to a heap is serialized: only one thread can access the data in a heap at a time. When several threads need access at the same time, they must wait in line, which reduces the program's execution efficiency.

Finally, let's talk about data alignment in memory. Data alignment means that the memory address of a piece of data must be an integer multiple of the data's length: the starting address of DWORD data is divisible by 4, and the starting address of WORD data is divisible by 2. An x86 CPU can access aligned data directly. When it accesses unaligned data, it makes a series of internal adjustments that are transparent to the program but reduce speed, so the compiler tries to keep data aligned when it compiles a program. For the same piece of code, let's look at the results of programs compiled with three different compilers, VC, Dev-C++, and lcc:

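#include <stdio.h>

int main()
{
    int a;
    char b;
    int c;
    //Print the memory address of each variable
    printf("%p\n",(void *)&a);
    printf("%p\n",(void *)&b);
    printf("%p\n",(void *)&c);
    return 0;
}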

This is the execution result compiled with VC:


The order of variables in memory: b(1 byte) - a(4 bytes) - c(4 bytes).

This is the execution result compiled with Dev-C + +:


The order of variables in memory: c(4 bytes) - 3 bytes apart - b(1 byte) - a(4 bytes).

This is the execution result compiled with lcc:


The order of variables in memory: the same as above.

All three compilers achieve data alignment, but the latter two are obviously not as "smart" as VC: they make a char occupy 4 bytes, which wastes memory.
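The placement of local variables is up to each compiler, so it cannot be checked portably. Members of a struct, however, must be laid out in declaration order, which makes the same padding effect measurable with offsetof. The following is a minimal sketch; the struct `Mixed` and the helpers `padding_after_b` and `align_up` are hypothetical names, and the exact amount of padding depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Same three variables as above, but as struct members: a char between two ints. */
struct Mixed {
    int a;
    char b;
    int c;
};

/* Number of unused bytes the compiler left between b and c. */
size_t padding_after_b(void) {
    return offsetof(struct Mixed, c) - offsetof(struct Mixed, b) - sizeof(char);
}

/* Rounds addr up to the next multiple of align (align must be a power of two).
   This is the arithmetic behind "the address must be a multiple of the size". */
size_t align_up(size_t addr, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

On a typical platform with 4-byte ints, `padding_after_b()` reports 3 wasted bytes, the same "3 bytes apart" gap seen in the Dev-C++ and lcc results.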

Posted by WeddingLink on Sun, 07 Nov 2021 19:45:47 -0800