AArch64 tutorial Chapter 5

In this chapter, we will look at how memory is accessed in AArch64.

Memory

Random access memory, or simply memory, is a necessary part of any architecture. Memory can be thought of as an array of bytes, each one identified by a consecutive number called its address. In AArch64, an address is a 64-bit number (which does not mean that all 64 bits are meaningful for addressing).

Address algebra

An address is just a number, but not every arithmetic operation makes sense on addresses. We can subtract a lower address from a higher one; the result is not an address but an offset. An offset can be added to an address to form a new address. Very often we need to access consecutive elements of b bytes each, so their addresses are consecutive as well. This means that it is very common to compute an address of the form A + b*i in order to access the i-th element.
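The address algebra above can be sketched in a few lines of Python, modelling addresses as plain integers. The names element_address, BASE and ELEM_SIZE are illustrative assumptions, not anything AArch64 defines.

```python
# Addresses modelled as plain integers (an illustrative assumption).
BASE = 0x1000       # hypothetical address A of the first element
ELEM_SIZE = 8       # b: size of each element in bytes

def element_address(base, elem_size, i):
    """Address of the i-th element: A + b*i."""
    return base + elem_size * i

addr3 = element_address(BASE, ELEM_SIZE, 3)
offset = addr3 - BASE            # address - address gives an offset
assert offset == 24
assert BASE + offset == addr3    # address + offset gives an address again
print(hex(addr3))                # prints 0x1018
```

Adding two addresses together has no equivalent here, which is why the sketch only ever subtracts addresses or adds an offset to one.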

These common address operations affect the instructions of the architecture, as described below.

Load and store

Because of its RISC heritage, AArch64 instructions cannot operate on memory directly. Only two kinds of instructions can: loads and stores. These instructions have two basic operands, a register and an address. The address is computed according to an addressing mode, as we will see next. A load reads some bytes from the computed address and puts them into the register. A store takes some bytes from the register and writes them to the computed address.

AArch64 supports a whole family of load/store instructions, but for the purpose of this chapter we only consider ldr (load) and str (store). Their syntax is as follows:

ldr Xn, address-mode  // Xn ← 8 bytes of memory at address computed by address-mode
ldr Wn, address-mode  // Wn ← 4 bytes of memory at address computed by address-mode

str Xn, address-mode  // 8 bytes of memory at address computed by address-mode ← Xn
str Wn, address-mode  // 4 bytes of memory at address computed by address-mode ← Wn

AArch64 can be configured as little endian or big endian. This makes little difference to what follows, but it determines how the 8 or 4 bytes in memory map onto the register. We assume a little-endian setting (by far the most common). This means that in an 8-byte load/store the first byte in memory corresponds to the least significant byte of the register, and the following bytes go towards the most significant one. A big-endian machine works the opposite way: the first byte in memory corresponds to the most significant byte of the register.
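The effect of endianness on a load can be illustrated with a small Python model. The load helper and the sample bytes are assumptions made up for this sketch; only the byte ordering rule comes from the text above.

```python
# Eight consecutive bytes in memory, first byte at the lowest address.
memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])

def load(mem, addr, size, byteorder="little"):
    """Model of a load: read `size` bytes at `addr` into a register value."""
    return int.from_bytes(mem[addr:addr + size], byteorder)

x_little = load(memory, 0, 8)          # like ldr Xn on a little-endian system
w_little = load(memory, 0, 4)          # like ldr Wn on a little-endian system
x_big = load(memory, 0, 8, "big")      # the same 8 bytes on a big-endian system

print(hex(x_little))  # prints 0x8877665544332211
print(hex(w_little))  # prints 0x44332211
print(hex(x_big))     # prints 0x1122334455667788
```

In the little-endian results the first byte in memory (0x11) ends up in the lowest bits of the register, while in the big-endian result it ends up in the highest bits.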

Addressing mode

An addressing mode is the procedure by which a load/store instruction computes the address it will access. Instructions are encoded in 32 bits on AArch64 but, as we have said, addresses are 64 bits. This means that an addressing mode cannot carry a full address as an immediate. Some architectures can encode full addresses in their instructions, but programs on those architectures rarely do so because it takes up a lot of space.

Base address (base)

We start with the simplest mode, although it is not very usable in some contexts: the address is already in a register Xn. In this case, the address is computed using only the value in the register. This mode is called base register, and its syntax is [Xn]. Only a 64-bit register can be used as the base.

ldr W2, [X1]  // W2 ← *X1 (32-bit load)
ldr X2, [X1]  // X2 ← *X1 (64-bit load)

Base address plus offset

Starting from the address computed as above, we can add an offset to form another address. The syntax is [Xn, #offset]. The offset may range from -256 to 255. Larger offsets are subject to some limitations. For a 32-bit access, the offset must be a multiple of 4 between 0 and 16380, and for a 64-bit access it must be a multiple of 8 between 0 and 32760.

ldr W2, [X1, #4]        // W2 ← *(X1 + 4)   [32-bit load]
ldr W2, [X1, #-4]       // W2 ← *(X1 - 4)   [32-bit load]
ldr X2, [X1, #240]      // X2 ← *(X1 + 240) [64-bit load]
ldr X2, [X1, #400]      // X2 ← *(X1 + 400) [64-bit load]
// ldr X2, [X1, #404]   // Invalid offset, not multiple of 8!
// ldr X2, [X1, #-400]  // Invalid offset, negative offsets cannot go below -256!
// ldr X2, [X1, #32768] // Invalid offset, out of the range!
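The offset rules above can be written down as a small Python check. The function name offset_is_encodable is an assumption of this sketch; the ranges are the ones stated in the text (16380 is 4095 * 4 and 32760 is 4095 * 8).

```python
def offset_is_encodable(offset, access_size):
    """True if `offset` fits the immediate-offset forms described above.

    access_size is 4 for a W register and 8 for an X register.
    """
    if -256 <= offset <= 255:
        return True                      # small signed offset, any value
    max_scaled = 4095 * access_size      # 16380 for 4 bytes, 32760 for 8 bytes
    return 0 <= offset <= max_scaled and offset % access_size == 0

# The examples from the listing above, all with a 64-bit (8-byte) access.
print(offset_is_encodable(240, 8))    # True
print(offset_is_encodable(400, 8))    # True
print(offset_is_encodable(404, 8))    # False: not a multiple of 8
print(offset_is_encodable(-400, 8))   # False: below -256
print(offset_is_encodable(32768, 8))  # False: above 32760
```

Note that 404 would be fine for a 32-bit access, since it is a multiple of 4 and below 16380.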

Base address plus Register Offset

Immediate offsets are useful, but sometimes an offset cannot be encoded as an immediate or is not known until the program runs. In these cases, a register can be used as the offset.

ldr W1, [X2, X3]  // W1 ← *(X2 + X3) [32-bit load]
ldr X1, [X2, X3]  // X1 ← *(X2 + X3) [64-bit load]

The offset register can be scaled with an lsl shift: lsl #n multiplies the offset by 2^n. In a load or store, n must be 0 or match the access size: 2 for a 32-bit access and 3 for a 64-bit access.

ldr W1, [X2, X3, lsl #2] // W1 ← *(X2 + (X3 << 2)) [32-bit load]
                         // this is the same as
                         // W1 ← *(X2 + X3*4)      [32-bit load]
ldr X1, [X2, X3, lsl #3] // X1 ← *(X2 + (X3 << 3)) [64-bit load]
                         // this is the same as
                         // X1 ← *(X2 + X3*8)      [64-bit load]

A 32-bit register can also be used as the offset for the 64-bit base register, but then we must specify how it is extended from 32 to 64 bits. For this we use the extension operators from Chapter 3. Since the source is a 32-bit value, only sxtw and uxtw are allowed.

ldr W1, [X2, W3, sxtw] // W1 ← *(X2 + ExtendSigned32To64(W3))    [32-bit load]
ldr W1, [X2, W3, uxtw] // W1 ← *(X2 + ExtendUnsigned32To64(W3))  [32-bit load]

As we already know, the extension operator can be combined with a shift. The value is extended first and shifted afterwards.

ldr W1, [X2, W3, sxtw #2] // W1 ← *(X2 + (ExtendSigned32To64(W3) << 2)) [32-bit load]
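The register-offset address computation can be modelled in Python as follows. This is only a sketch: the helper names sxtw, uxtw and effective_address are chosen here to mirror the assembler operators, and the register values are made up. The extension is applied to the 32-bit offset before the shift.

```python
MASK64 = (1 << 64) - 1

def sxtw(w):
    """Sign-extend a 32-bit value to 64 bits."""
    w &= 0xFFFFFFFF
    if w & 0x80000000:       # sign bit set: the value is negative
        w -= 1 << 32
    return w & MASK64

def uxtw(w):
    """Zero-extend a 32-bit value to 64 bits."""
    return w & 0xFFFFFFFF

def effective_address(base, offset, extend=None, shift=0):
    """base + (extend(offset) << shift), wrapped to 64 bits."""
    if extend is not None:
        offset = extend(offset)
    return (base + (offset << shift)) & MASK64

base = 0x1000
w3 = 0xFFFFFFFF   # -1 when read as a signed 32-bit value

print(hex(effective_address(base, 3, shift=3)))          # prints 0x1018
print(hex(effective_address(base, w3, sxtw, shift=2)))   # prints 0xffc
print(hex(effective_address(base, w3, uxtw, shift=2)))   # prints 0x400000ffc
```

The last two lines show why the choice of extension matters: the same bits in W3 move the address 4 bytes backwards with sxtw and almost 16 GiB forwards with uxtw.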

Indexing modes

Sometimes we read consecutive memory in such a way that we only care about the element currently being read. In the worst case, we compute the address in a register with arithmetic every time and use the base register mode. Or, better, we keep the first address in a register, treat it as the base, and compute the offset separately. If we use the latter approach, most of the time the offset is updated with relatively simple computations, such as an addition or an addition combined with a shift (a multiplication by a power of 2). We can take advantage of the fact that the address computation has this shape most of the time. In these cases, we may want to use an indexing mode.

There are two indexing modes in AArch64: pre-indexing and post-indexing. In pre-indexing mode, the offset is added to the base register to compute the address, and that address is also written back to the base register. In post-indexing mode, the value of the base register is used as the address, and after the access the base register is updated by adding the offset to it.

The two modes look rather similar, as both update the base register with the offset. The difference lies in when the offset is applied: in pre-indexing mode it is added before the access, and in post-indexing mode it is added after the access. The offset must be between -256 and 255.

Pre index

The syntax of the pre-indexing mode is [Xn, #offset]!. Mind the ! symbol, otherwise you would be writing a plain base plus offset mode without indexing. The access itself behaves like base plus offset, but the ! reminds us of the side effect of updating the base register.

ldr X1, [X2, #4]! // X1 ← *(X2 + 4)
                  // X2 ← X2 + 4

Post index

The syntax is [Xn], #offset. There is no ! after #offset: the syntax is already different enough to serve as a visual clue, just as the ! does in the pre-indexing mode.

ldr X1, [X2], #4  // X1 ← *X2
                  // X2 ← X2 + 4

Load a literal address

Global objects, such as global variables or functions, have constant addresses. This means that it should be possible to load them as literals. But as we know, on AArch64 a full address cannot be encoded directly in an instruction. Therefore, we must use a two-step approach (which is very common in RISC architectures). First, we tell the assembler to put the address of the global variable somewhere near the current instruction. Then we load that address into a register using a special form of the load instruction (called load literal).

In most of our examples, it will look like this:

ldr Xn, addr_of_var // Xn ← &variable
... 
addr_of_var : .dword variable // This tells the assembler that
                              // we want here the address of variable
                              // (This is not to be executed!)

Once the address of the variable is loaded into a register, we can do a second load to read as many bytes as we want.

ldr Xm, [Xn]  // Xm ← *Xn    [64-bit load]
ldr Wm, [Xn]  // Wm ← *Xn    [32-bit load]

Using 32-bit addresses

Using 64-bit addresses is correct but somewhat wasteful. The reason is that most of our programs do not need more than 32 bits to encode all code and data addresses. The addresses of our global variables will then always have the upper 32 bits set to 0, so we may prefer to store only 32-bit addresses.

ldr Wn, addr_of_var // Wn ← &variable
... 
addr_of_var : .word variable // This tells the assembler that
                             // we want here the address of variable
                             // (This is not to be executed!)
                             // 32-bit address here

Recall that when a 32-bit value is written to a register, its upper 32 bits are cleared. Therefore, after ldr, we can use [Xn] in load or store without any problems.
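The zero-extension rule can be modelled in Python as below. The helpers write_w and fits_in_32_bits, as well as the sample address 0x00410000, are assumptions made up for illustration.

```python
def write_w(old_x_value, w_value):
    """Model of writing a 32-bit value to Wn.

    The previous contents of Xn are discarded: the low 32 bits take the
    new value and the upper 32 bits become zero.
    """
    return w_value & 0xFFFFFFFF

def fits_in_32_bits(address):
    """True if `address` can be stored in a 32-bit (.word) slot without loss."""
    return 0 <= address < (1 << 32)

x0 = 0xDEADBEEF00000000            # stale contents of X0 (hypothetical)
x0 = write_w(x0, 0x00410000)       # like ldr W0, addr_of_var
print(hex(x0))                     # prints 0x410000

print(fits_in_32_bits(0x00410000))      # True
print(fits_in_32_bits(0x7FFF12345678))  # False: needs more than 32 bits
```

The second check is the condition under which storing the address with .word is safe: the address itself must fit in 32 bits.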

Global variables

As an example of today's topic, we will load and store some global variables. The program will not do anything useful.

Global variables are defined in the .data section. To do so, we simply define their initial values. If we want to define a 32-bit variable, we use .word. If we want a 64-bit variable, we use .dword.

// globalvar.s
.data

.balign 8 // Align to 8 bytes
global_var64 : .dword 0x1234  // a 64-bit value of 0x1234
// alternatively: .word 0x1234, 0x0

.balign 4 // Align to 4 bytes
global_var32 : .word 0x5678   // a 32-bit value of 0x5678

On Linux, AArch64 does not require ordinary memory accesses to be aligned. But if they are aligned, they execute faster in hardware. So we use the .balign directive to align each variable to the size of its data (in bytes).

Now we can load the variables. As an example, we will add 1 to each variable.

.text

.globl main
main :
  ldr X0, address_of_global_var64 // X0 ← &global_var64
  ldr X1, [X0]                    // X1 ← *X0
  add X1, X1, #1                  // X1 ← X1 + 1
  str X1, [X0]                    // *X0 ← X1

  ldr X0, address_of_global_var32 // X0 ← &global_var32
  ldr W1, [X0]                    // W1 ← *X0
  add W1, W1, #1                  // W1 ← W1 + 1
  str W1, [X0]                    // *X0 ← W1

  mov W0, #0                      // W0 ← 0
  ret                             // exit program
address_of_global_var64 : .dword global_var64
address_of_global_var32 : .dword global_var32

Using 32-bit addresses

As mentioned above, storing 64-bit addresses of our variables is usually a bit wasteful. Here are the changes required to use 32-bit addresses.

.text

.globl main
main :
  ldr W0, address_of_global_var64 // W0 ← &global_var64
  ldr X1, [X0]                    // X1 ← *X0
  add X1, X1, #1                  // X1 ← X1 + 1
  str X1, [X0]                    // *X0 ← X1

  ldr W0, address_of_global_var32 // W0 ← &global_var32
  ldr W1, [X0]                    // W1 ← *X0
  add W1, W1, #1                  // W1 ← W1 + 1
  str W1, [X0]                    // *X0 ← W1

  mov W0, #0                      // W0 ← 0
  ret                             // exit program
address_of_global_var64 : .word global_var64 // note the usage of .word here
address_of_global_var32 : .word global_var32 // note the usage of .word here

Note that it is necessary to use the -static flag in the final link step. This creates a static executable that is loaded directly into memory. By default, the linker creates dynamic executables, which are loaded by the dynamic linker when the program runs. The dynamic linker may load the program at an address beyond 2^32, where these 32-bit addresses would be invalid. When using .dword, the static linker makes sure a relocation is emitted for the dynamic linker, so the latter can fix up the 64-bit address at run time.

There are better ways to get the address of a global variable, but these are enough for now. Maybe we will revisit this topic in later chapters.

That's all for today.

The translator adds:
GDB debugging
First run the program under QEMU user-mode emulation with the debugging port set to 12345, then start gdb, set the architecture and the endianness (big or little), and connect to the remote target to start debugging. The commands below assume an AArch64 binary; the output of each command is omitted.

$ qemu-aarch64 -g 12345 ./a.out &
$ gdb-multiarch ./a.out
(gdb) set architecture aarch64
(gdb) set endian little
(gdb) target remote localhost:12345
(gdb) x/i $pc

--------
Copyright notice: This is the original article of CSDN blogger "snow rubbing little monster", which follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this notice for reprint.
Original link: https://blog.csdn.net/qq_33892117/article/details/89500363

Posted by darkke on Tue, 14 Sep 2021 23:06:21 -0700