ICode9

精准搜索请尝试: 精确搜索
首页 > 编程语言> 文章详细

x86_64汇编基础:Basics

2022-08-19 21:01:01  阅读:193  来源: 互联网

标签:register Basics x86 instruction registers 64 compiler rax instructions


参考

正文

Registers

Registers are the fastest kind of memory available in the machine. x86-64 has 14 general-purpose registers and several special-purpose registers. This table gives all the basic registers, with special-purpose registers highlighted in yellow. You’ll notice different naming conventions, a side effect of the long history of the x86 architecture (the 8086 was first released in 1978).

image

Note that unlike primary memory (which is what we think of when we discuss memory in a C/C++ program), registers have no addresses! There is no address value that, if cast to a pointer and dereferenced, would return the contents of the %rax register. Registers live in a separate world from the memory whose contents are partially prescribed by the C abstract machine.

The %rbp register has a special purpose: it points to the bottom of the current function’s stack frame, and local variables are often accessed relative to its value. However, when optimization is on, the compiler may determine that all local variables can be stored in registers. This frees up %rbp for use as another general-purpose register.

The relationship between different register bit widths is a little weird.

  • Loading a value into a 32-bit register name sets the upper 32 bits of the register to zero. Thus, after movl $-1, %eax, the %rax register has value 0x00000000FFFFFFFF.
  • Loading a value into a 16- or 8-bit register name leaves all other bits unchanged.
    There are special instructions for loading signed and unsigned 8-, 16-, and 32-bit quantities into registers, recognizable by instruction suffixes. For instance, movzbl moves an 8-bit quantity (a byte) into a 32-bit register (a longword) with zero extension; movslq moves a 32-bit quantity (longword) into a 64-bit register (quadword) with sign extension. There’s no need for movzlq (why?).

Instruction format

The basic kinds of assembly instructions are:

  • Computation. These instructions perform computation on values, typically values stored in registers. Most have zero or one source operands and one source/destination operand, with the source operand coming first. For example, the instruction addq %rax, %rbx performs the computation %rbx := %rbx + %rax.
  • Data movement. These instructions move data between registers and memory. Almost all have one source operand and one destination operand; the source operand comes first.
  • Control flow. Normally the CPU executes instructions in sequence. Control flow instructions change the instruction pointer in other ways. There are unconditional branches (the instruction pointer is set to a new value), conditional branches (the instruction pointer is set to a new value if a condition is true), and function call and return instructions.
    (We use the “AT&T syntax” for x86-64 assembly. For the “Intel syntax,” which you can find in online documentation from Intel, see the Aside in CS:APP3e §3.3, p177, or Wikipedia, or other online resources. AT&T syntax is distinguished by several features, but especially by the use of percent signs for registers. Sadly, the Intel syntax puts destination registers before source registers.)

Some instructions appear to combine computation and data movement. For example, given the C code int* ip; ... ++(*ip); the compiler might generate incl (%rax) rather than movl (%rax), %ebx; incl %ebx; movl %ebx, (%rax). However, the processor actually divides these complex instructions into tiny, simpler, invisible instructions called microcode, because the simpler instructions can be made to execute faster. The complex incl instruction actually runs in three phases: data movement, then computation, then data movement. This matters when we introduce parallelism.

Directives

Assembly generated by a compiler contains instructions as well as labels and directives. Labels look like labelname: or labelnumber:; directives look like .directivename arguments. Labels are markers in the generated assembly, used to compute addresses. We usually see them used in control flow instructions, as in jmp L3 (“jump to L3”). Directives are instructions to the assembler; for instance, the .globl L instruction says “label L is globally visible in the executable”, .align sets the alignment of the following data, .long puts a number in the output, and .text and .data define the current segment.

We also frequently look at assembly that is disassembled from executable instructions by GDB, objdump -d, or objdump -S. This output looks different from compiler-generated assembly: in disassembled instructions, there are no intermediate labels or directives. This is because the labels and directives disappear during the process of generating executable instructions.

For instance, here is some compiler-generated assembly:

    .globl  _Z1fiii
    .type   _Z1fiii, @function
_Z1fiii:
.LFB0:
    cmpl    %edx, %esi
    je  .L3
    movl    %esi, %eax
    ret
.L3:
    movl    %edi, %eax
    ret
.LFE0:
    .size   _Z1fiii, .-_Z1fiii
And a disassembly of the same function, from an object file:

0000000000000000 <_Z1fiii>:
   0:   39 d6                   cmp    %edx,%esi
   2:   74 03                   je     7 <_Z1fiii+0x7>
   4:   89 f0                   mov    %esi,%eax
   6:   c3                      retq   
   7:   89 f8                   mov    %edi,%eax
   9:   c3                      retq

Everything but the instructions is removed, and the helpful .L3 label has been replaced with an actual address. The function appears to be located at address 0. This is just a placeholder; the final address is assigned by the linking process, when a final executable is created.

Finally, here is some disassembly from an executable:

0000000000400517 <_Z1fiii>:
  400517:   39 d6                   cmp    %edx,%esi
  400519:   74 03                   je     40051e <_Z1fiii+0x7>
  40051b:   89 f0                   mov    %esi,%eax
  40051d:   c3                      retq   
  40051e:   89 f8                   mov    %edi,%eax
  400520:   c3                      retq

The instructions are the same, but the addresses are different. (Other compiler flags would generate different addresses.)

Address modes

Most instruction operands use the following syntax for values. (See also CS:APP3e Figure 3.3 in §3.4.1, p181.)

image

The full form of a memory operand is offset(base,index,scale), which refers to the address offset + base + index*scale. In 0x18(%rax,%rbx,4), %rax is the base, 0x18 the offset, %rbx the index, and 4 the scale. The offset (if used) must be a constant and the base (if used) must be a register; the scale must be either 1, 2, 4, or 8. The default offset, base, and index are 0, and the default scale is 1.

symbol_names are found only in compiler-generated assembly; disassembly uses raw addresses (0x601030) or %rip-relative offsets (0x200bf2(%rip)).

Jumps and function call instructions use different syntax

标签:register,Basics,x86,instruction,registers,64,compiler,rax,instructions
来源: https://www.cnblogs.com/pengdonglin137/p/16603286.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有