Wiki: Assembly

Introduction to Assembly

Assembly languages describe the machine code of a computer in simple, human-readable syntax. In these languages, each machine instruction (i.e. a sequence of bytes that is executed by the processor) is represented by a single command. The assembly language on code.golf (hereinafter referred to as "Assembly") represents the machine language of the x86-64 architecture.

Syntaxes

There are two main branches of syntax used for Assembly: Intel syntax and AT&T syntax. The assembler used by code.golf, DefAssembler, supports both of these syntaxes, defaulting to AT&T syntax unless otherwise stated using directives. In this article, when a difference between the two syntaxes emerges, a table that illustrates this difference will appear.

Instructions

Each instruction in Assembly consists of a mnemonic (the instruction's name, e.g. add, sub, mov) and a comma-separated list of operands. Note that in AT&T syntax, the order of the operands in each instruction is reversed.

The number, types, sizes, and purposes of the operands given to an instruction depend on its mnemonic. For example, the instruction push accepts a single operand, the instruction mov accepts 2, and the instruction imul accepts anywhere from 1 to 3 operands. To understand what each instruction does and which operands it accepts, its mnemonic should be looked up in Volume 2 of the Intel 64 Software Developer's Manuals (I recommend using a more straightforward index, like Félix Cloutier's x86 instruction reference).

Operands

An operand is an argument passed to an instruction, for example which values to read and where to write the result. There are 3 common types of operands in Assembly (in reality there are about 10 types, but the ones not listed here are rarely used):

  1. Registers - small units of data used to store temporary values on the processor. They can either be read from or written to, or in some cases both. In x86-64, there are 16 different registers, each consisting of 64 bits. It is also possible to read from and write to each register's lower 32, 16, or 8 bits (in some cases the top 8 bits of the 16 lowest bits can also be accessed).

    Click here for a full list of registers
    Name 64-bit 32-bit 16-bit 8-bit High 8-bit Notes
    Accumulator rax eax ax al ah This register is implicitly multiplied and divided by the mul and div instructions. It is also used by lods, stos, scas, xlat and cmpxchg, and for syscall it both selects the syscall ID and holds the return value of the syscall.
    Base rbx ebx bx bl bh This register is used implicitly in the xlat instruction.
    Counter rcx ecx cx cl ch This register allows for short-form loops with the loop/loope/loopne instruction, as well as short-form zero checks with jrcxz/jecxz. It is also used by string instructions with a rep/repe/repne prefix as a limit on how many times to execute the instruction. The register is overwritten by syscall.
    Data rdx edx dx dl dh This register is often paired up with the accumulator, e.g. rdx:rax, to act as an extension of that register (for example in the mul and div instructions). It also serves as the third argument in syscall.
    Source index rsi esi si sil This register is used by the movs, lods, and cmps string instructions. It also serves as the second argument in syscall.
    Destination index rdi edi di dil This register is used by the movs, stos, cmps, and scas string instructions. It also serves as the first argument in syscall.
    Stack pointer rsp esp sp spl This register holds the address of the stack, a portion of the computer's memory dedicated to storing temporary values (written to with push and read from by pop). It is the only register that has a non-zero value at the start of the program, and in general, its value should not be manually changed. It is affected by the push, pop, enter and leave instructions.
    Base pointer rbp ebp bp bpl This register typically serves as an address holder for the base of a stack frame (a portion of the stack holding the local memory of a function), but it may be used for any other purpose. It is used by the enter and leave instructions.
    r8 r8d r8w r8b This register serves as the fifth argument to syscall.
    r9 r9d r9w r9b This register serves as the sixth argument to syscall.
    r10 r10d r10w r10b This register serves as the fourth argument to syscall.
    r11 r11d r11w r11b This register is overwritten by syscall.
    r12 r12d r12w r12b
    r13 r13d r13w r13b
    r14 r14d r14w r14b
    r15 r15d r15w r15b
    AT&T syntax Intel syntax
    Registers are prefixed with a percent sign (%) Registers are written without a prefix
    E.g. %rax, %ebx, %cx, %dl E.g. rax, ebx, cx, dl
  2. Immediates - constant numbers that are stored as part of the instruction. They provide an immediate value that can be quickly read by the processor without accessing memory. They are written using arithmetic expressions, which can include number literals, string literals and/or symbols (macro-like static values that are stored as names for easy access).

    AT&T syntax Intel syntax
    Immediates are prefixed with a dollar sign ($). Expressions not prefixed with a dollar sign are treated as memory offsets (see below). Immediates are written without a prefix. If the immediate contains a symbol name, however, the expression must be prefixed with the word OFFSET; otherwise, it is treated as a memory offset (see below).
    E.g. $23, $(15 + 6) * 2, $symbol E.g. 23, (15 + 6) * 2, OFFSET symbol
  3. Memory expressions - calculated addresses which instruct the processor to read from and write to certain addresses in the computer's temporary memory (RAM). These consist of 3 parts, all of which are summed by the processor to calculate the final address:

    1. Displacement - a numerical value giving an offset to the address
    2. Base register - a 64-bit register whose value is added to the address
    3. Index register - another 64-bit register, whose value can also be scaled up by 1, 2, 4, or 8

    Each of these parts is optional, however at least one is required to form a valid memory expression.

    The syntax for a memory expression is as follows:

    AT&T syntax Intel syntax
    <displacement>(<base>, <index>, <scale>) <displacement>[<base> + <index> * <scale>]
    E.g. 23(%rsi, %rax, 4) E.g. 23[rsi + rax * 4]

    Note that in some cases, the size of a memory expression may be ambiguous; if this is the case, the assembler will issue an error stating this. To disambiguate the size:

    AT&T syntax Intel syntax
    The instruction's name should have a letter at the end specifying the size The memory expression should be prefixed with the size name
    E.g. incl (%rax) E.g. inc LONG [rax]

Flags

In addition to the general-purpose registers, x86-64 processors also store a "flags" register. This register holds an array of bits, each of which represents a flag - a boolean value indicating some useful information about the results of the previous instruction. These flags are:

Conditional branching in Assembly is done through conditional jump instructions, which jump to the given location depending on the flags (e.g. jz (jump if zero) will jump only if the zero flag is set to 1). Some conditional jumps also check multiple flags to form simpler conditions; these can be paired with the cmp instruction to create simple arithmetic conditions. For example, the following program will halt if and only if edx is less than or equal to ebx:

AT&T syntax Intel syntax
label:
cmp %ebx, %edx
jnle label
label:
cmp edx, ebx
jnle label

Note that not all instructions cause the flags to change; for example, while arithmetic instructions (e.g. add, sub, xor etc.) always affect the flags, mov, push, pop and xchg do not affect any flags.

The flags register can be manually read from and written to with pushf, popf, lahf, and sahf, however this is generally not recommended.

Prefixes

In some cases, a prefix may also be added before the mnemonic. Prefixes give the processor more info on how to execute the instruction. For example, in repne scasb, the instruction scasb (which increments rdi and compares the byte at the address pointed by it to al) will execute in a loop so long as al and the byte at rdi are not equal (repeat while not equal). In other words, it will find the first byte after the address rdi that is equal to al.

Directives

Along with executable code, it's also possible to embed raw data in an Assembly program. These are sequences of bytes (e.g. strings, integers, arrays) that will be loaded into the program's memory along with the code; they may be written to and read from by the program's instructions. This data can be added through the use of directives. These are pseudo-instructions that tell the assembler (the program that turns the assembly code into an executable file) to do certain things besides encoding instructions. For example, the following directives generate, in order, a byte, a sequence of words, a quad-word, and a string:

AT&T syntax Intel syntax
.byte 5 db 5
.word 0, 1, 1, 2 dw 0, 1, 1, 2
.quad 0x0123456789ABCDEF dq 0123456789ABCDEFh
.ascii "Hello, world!" db "Hello, world!"

Directives can also be used to divide the bytes of a program into sections, which are distinct regions of memory that have different permissions (e.g. the .text section can be read, executed, and modified, the .data section can be read and modified but not executed, and the .rodata section can only be read):

AT&T syntax Intel syntax
.text section .text
.data section .data
.section .rodata section .rodata

Finally, directives can be used to set the syntax of a program: .intel_syntax sets the syntax to Intel, and .att_syntax sets the syntax to AT&T. These can be used anywhere within the program and in any order.

I/O

Input and output are not defined by Assembly; they way they are handled depends on the operating system the program runs under. In code.golf, the assembly programs are run under Linux, which uses system calls (which are executed via the syscall instruction) for program output. For a list of available system calls see the Linux System Call Table for x86-64 (note that sys_write is the only one you really need to know on code.golf). On Linux, the arguments passed to a program appear at the top of the stack when the program begins; to get the next argument, simply pop it off the stack (the popped value is an address pointing to the argument string).

Sample code

AT&T syntax Intel syntax
.att_syntax
SYS_WRITE = 1
SYS_EXIT = 60
STDOUT_FILENO = 1

# Printing
.data
buffer: .string "Hello, World!\n"
bufferLen = . - buffer

.text
mov $SYS_WRITE, %eax
mov $STDOUT_FILENO, %edi
mov $buffer, %esi
mov $bufferLen, %edx
syscall

# Looping
.data
digit: .byte '0', '\n'

.text
mov $0, %bl
numberLoop:
    mov $SYS_WRITE, %eax
    mov $STDOUT_FILENO, %edi
    mov $digit, %esi
    mov $2, %edx
    syscall

    incb (%rsi)
    inc %bl
    cmp $10, %bl
    jl numberLoop

# Accessing arguments
pop %rbx
pop %rax

argLoop:
    dec %rbx
    jz endArgLoop

    pop %rsi
    mov %rsi, %rdi

    mov $-1, %ecx
    mov $0, %al
    repne scasb

    not %ecx
    movb $'\n', -1(%rsi, %rcx)

    mov %ecx, %edx
    mov $SYS_WRITE, %eax
    mov $STDOUT_FILENO, %edi
    syscall

    jmp argLoop
endArgLoop:

mov $SYS_EXIT, %eax
mov $0, %edi
syscall
.intel_syntax
SYS_WRITE = 1
SYS_EXIT = 60
STDOUT_FILENO = 1

; Printing
section .data
buffer db "Hello, World!\n"
bufferLen = $ - buffer

section .text
mov eax, OFFSET SYS_WRITE
mov edi, OFFSET STDOUT_FILENO
mov esi, OFFSET buffer
mov edx, OFFSET bufferLen
syscall

; Looping
section .data
digit db '0', '\n'

section .text
mov bl, 0
numberLoop:
    mov eax, OFFSET SYS_WRITE
    mov edi, OFFSET STDOUT_FILENO
    mov esi, OFFSET digit
    mov edx, 2
    syscall

    inc BYTE [rsi]
    inc bl
    cmp bl, 10
    jl numberLoop

; Accessing arguments
pop rbx
pop rax

argLoop:
    dec rbx
    jz endArgLoop

    pop rsi
    mov rdi, rsi

    mov ecx, -1
    mov al, 0
    repne scasb

    not ecx
    mov BYTE [rsi + rcx - 1], '\n'

    mov edx, ecx
    mov eax, OFFSET SYS_WRITE
    mov edi, OFFSET STDOUT_FILENO
    syscall

    jmp argLoop
endArgLoop:

mov eax, OFFSET SYS_EXIT
mov edi, 0
syscall

Golfing tips

Transferring values between registers

The shortest way to move a value from one 64-bit register to another is using push and pop:

AT&T syntax Intel syntax
push %rax    # 50
pop %rsi     # 5E
push rax    ; 50
pop rsi     ; 5E

For any registers except r8-r15 (for which an additional REX byte is required), this will result in a 2-byte encoding.

If the registers are 32-bit and one of them is eax, this may be shortened to a single byte with an xchg instruction:

AT&T syntax Intel syntax
xchg %eax, %esi    # 96
xchg eax, esi    ; 96

Avoiding the REX prefix

An additional byte called the REX prefix is added in certain encodings. It can be easily identified, as it always has the form 4X (where X is between 0 and F). It provides auxiliary information about the registers used in an instruction. You should usually seek to avoid it.

The following cases cause a REX prefix to appear:

Avoiding the word-size prefix

Additionally, instructions that use 16-bit registers entail the encoding of a 66 prefix; for this reason they should generally be avoided.

Use the al register for arithmetic operations with immediates

For arithmetic instructions (e.g. add, sub, and, or, xor), the 8-bit size form of the instruction with an immediate and a register (e.g. add cl, 4, xor bl, 20), is 1 byte shorter if the register being used is al.

AT&T syntax Intel syntax
add $1, %cl    # 80 C1 01
add $1, %bl    # 80 C3 01
add $1, %al    # 04 01
add cl, 1    ; 80 C1 01
add bl, 1    ; 80 C3 01
add al, 1    ; 04 01

Placing a 1 value into a register

In holes that have no arguments, the value at the top of the stack (argc) has the value 1. This makes it convenient to pop into registers where this value is significant (typically rdi, as it's used for write syscalls as the file descriptor).

Using argv[0] as memory

The second value from the top of the stack holds a pointer to argv[0], which is a string containing the program's name (in DefAssembler this is always /tmp/asm). This memory area can be used for relatively small memory operations, such as printing numbers.

AT&T syntax Intel syntax
pop %rdi                # 5F
pop %rsi                # 5E

movl $'Heyo', (%rsi)    # C7 06 48 65 79 6F
mov $4, dl              # B2 04
mov $1, al              # B0 01
syscall                 # 0F 05
pop rdi                   ; 5F
pop rsi                   ; 5E

mov LONG [rsi], 'Heyo'    ; C7 06 48 65 79 6F
mov dl, 4                 ; B2 04
mov al, 1                 ; B0 01
syscall                   ; 0F 05

Zeroing registers

The shortest way to set a register's value to 0 is xoring or subing it with itself, as in:

AT&T syntax Intel syntax
xor %ecx, %ecx    # 31 C9
sub %eax, %eax    # 29 C0
xor ecx, ecx    ; 31 C9
sub eax, eax    ; 29 C0

Note that specifying a 64-bit size in this case has no effect; writing to a 32-bit size register will always set the top 32 bits of that register's 64-bit counterpart to 0.

Zeroing edx

The edx register is a special case; it can be zeroed using a single byte instruction, namely cdq (cltd in AT&T syntax), if the value in eax is non-negative (if it is negative, edx will be set to -1). This is especially useful before performing a div instruction, where the value in edx may affect the result of the operation.

Placing values divisible by 256 in 16-bit size registers

Occasionally, it is useful to place values divisible by 256 (or just arbitrary large values) in certain registers. This can be made shorter by writing to the register's high-byte part.

Instead of:

AT&T syntax Intel syntax
mov $512, %cx    # 66 B9 00 02
mov cx, 512    ; 66 B9 00 02

Use:

AT&T syntax Intel syntax
mov $2, %ch    # B5 02
mov ch, 2    ; B5 02