Introduction to Assembly Language

This tutorial is based on Intel Syntax

By Motti KumarPublished 3 years ago • 8 min read

I've just started learning Assembly Language since it is useful to perform tasks like Binary Exploitation and Reverse Engineering. This happens to be the the first big wall I need to tackle. I'll be sharing the knowledge I learn through the upcoming blogs, so if you're interested then stay tuned.

Also one thing you don't need to have everything here memorized before moving on, and parts of it will make more sense when you actually see it in action.

Compiling

So first off, what is assembly code? Assembly code is the code that actually runs on your computer by the processor. For instance take some C code:

#include <stdio.h>
void main(void)
{
puts("Hello World!");
}

This piece of code isn't ran. Thing is that code is compiled into assembly code, which looks like this:

The purpose of languages like C, is that we can program without having to really deal with assembly code. We write code that is handed to a compiler, and the compiler takes that code and generates assembly code that will accomplish whatever the C code tells it to. Then the assembly code is what is actually ran on the processor. Since this is the code that is actually ran, it helps to understand it. Also since most of the time we are handed compiled binaries we only have the assembly code to work from. However we have tools such as Ghidra that will take compiled assembly code and give us a view of what it thinks the C code that the code was compiled from looks like, so we don't need to read endless lines of assembly code.

Also with assembly code, there is a lot of different architectures. Different types of processors can run different types of assembly code architectures. The two we are dealing with the most here will be 64 bit, and 32 bit ELF (Executable and Linkable Format). I will often call these two things x64 and x86.

Registers

Registers are essentially places that the processor can store memory. You can think of them as buckets which the processor can store information in. Here is a list of the x64 registers, and what their common use cases are.

In x64 linux arguments to a function are passed via registers. The first few args are passed by these registers:

rdi: First Argument

rsi: Second Argument

rdx: Third Argument

rcx: Fourth Argument

r8: Fifth Argument

r9: Sixth Argument

With the x86 elf architecture, arguments are passed on the stack. Also one thing as you may know, in C function can return a value. In x64, this value is passed in the rax register. In x86 this value is passed in the eax register.

Also one thing, there are different sizes for registers. These typical sizes we will be dealing with are 8 bytes, 4 bytes, 2 bytes, and 1. The reason for these different sizes is due to the advancement of technology, we can store more data in a register.

In x64 we will see the 8 byte registers. However in x86 the largest sized registers we can use are the 4 byte registers like ebp, esp, eip etc. Now we can also use smaller registers, than the maximum sized registers for the architecture.

In x64 there is the rax, eax, ax, and al register. The rax register points to the full 8. The eax register is just the lower four bytes of the rax register. The ax register is the last 2 bytes of the rax register. Lastly the al register is the last byte of the rax register.

Words

You might hear the term word throughout this. A word is just two bytes of data. A dword is four bytes of data. A qword is eight bytes of data.

Stacks

Now one of the most common memory regions you will be dealing with is the stack. It is where local variables in the code are stored.

For instance, in this code the variable x is stored in the stack:

#include <stdio.h>
void main(void)
{
int x = 5;
puts("hi");
}

Specifically we can see it is stored on the stack at rbp-0x4.

Now values on the stack are moved on by either pushing them onto the stack, or popping them off. That is the only way to add or remove values from the stack (it is a LIFO data structure). However we can reference values on the stack.

The exact bounds of the stack is recorded by two registers, rbp and rsp. The base pointer rbp points to the bottom of the stack. The stack pointer rsp points to the top of the stack.

Flags

There is one register that contains flags. A flag is a particular bit of this register. If it is set or not, will typically mean something. Here is the list of flags.

00: Carry Flag

01: always 1

02: Parity Flag

03: always 0

04: Adjust Flag

05: always 0

06: Zero Flag

07: Sign Flag

08: Trap Flag

09: Interruption Flag

10: Direction Flag

11: Overflow Flag

12: I/O Privilege Field lower bit

13: I/O Privilege Field higher bit

14: Nested Task Flag

15: Resume Flag

There are other flags then the one listed, however we really don't deal with them too much (and out of these, there are only a few we actively deal with).

If you want to hear more about this, checkout: https://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture

Instructions

Now we will be covering some of the more common instructions you will see. This isn't everything you will see, but here are the more common things you will see.

mov

The move instruction just moves data from one register to another. For instance:

mov rax, rdx

This will just move the data from the rdx register to the rax register.

dereference

If you ever see brackets like [], they are meant to dereference, which deals with pointers. A pointer is a value that points to a particular memory address (it is a memory address). Dereferencing a pointer means to treat a pointer like the value it points to. For instance:

mov rax, [rdx]

Will move the value pointed to by rdx into the rax register. On the flipside:

mov [rax], rdx

Will move the value of the rdx register into whatever memory is pointed to by the rax register. The actual value of the rax register does not change.

lea

The lea instruction calculates the address of the second operand, and moves that address in the first. For instance:

lea rdi, [rbx+0x10]

This will move the address rbx+0x10 into the rdi register.

add

This just adds the two values together, and stores the sum in the first argument. For instance:

add rax, rdx

That will set rax equal to rax + rdx

sub

This value will subtract the second operand from the first one, and store the difference in the first argument. For instance:

sub rsp, 0x10

This will set the rsp register equal to rsp - 0x10

xor

This will perform the binary operation xor on the two arguments it is given, and stores the result in the first operation:

xor rdx, rax

That will set the rdx register equal to rdx ^ rax.

The and and or operations essentially do the same thing, except with the and or or binary operators.

push

The push instruction will grow the stack by either 8 bytes (for x64, 4 for x86), then push the contents of a register onto the new stack space. For instance:

push rax

This will grow the stack by 8 bytes, and the contents of the rax register will be on top of the stack.

pop

The pop instruction will pop the top 8 bytes (for x64, 4 for x86) off of the stack and into the argument. Then it will shrink the stack. For instance:

pop rax

The top 8 bytes of the stack will end up in the rax register.

jmp

The jmp instruction will jump to an instruction address. It is used to redirect code execution. For instance:

jmp 0x602010

That instruction will cause the code execution to jump to 0x602010, and execute whatever instruction is there.

call & ret

This is similar to the jmp instruction. The difference is it will push the values of rbp and rip onto the stack, then jump to whatever address it is given. This is used for calling functions. After the function is finished, a ret instruction is called which uses the pushed values of rbp and rip (saved base and instruction pointers) it can continue execution right where it left off

cmp

The cmp instruction is similar to that of the sub instruction. Except it doesn't store the result in the first argument. It checks if the result is less than zero, greater than zero, or equal to zero. Depending on the value it will set the flags accordingly.

jnz / jz

This jump if not zero and jump if zero (jnz/jz) instructions are pretty similar to the jump instruction. The difference is they will only execute the jump depending on the status of the zero flag. For jz it will only jump if the zero flag is set. The opposite is true for jnz.

how to

About the Creator

Motti Kumar

Hey guys i'm Motti Kumar and it’s a pleasure to be a guest blogger and hopefully inspire, give back, and keep you updated on overall cyber news or anything hot that impacts us as security enthusiast's here at Vocal Media.

Reader insights

Be the first to share your insights about this piece.

How does it work?

Add your insights

Comments

There are no comments for this story

Be the first to respond and start the conversation.

Keep reading

More stories from Motti Kumar and writers in 01 and other communities.