understanding assembly code (1)

1) Background

Sometimes, when we meet a problem that is very hard to explain or understand, we can turn to more powerful tools, e.g. math or assembly code. Recently I met such a problem at work, which can be simplified as follows.

I thought a function was not so efficient, so I decided to improve it.

From 

vector<int> f()

{

  vector<int> v;

  v.push_back(2);

  return v;

}

To

int g(vector<int> &v)

{

  v.push_back(2);

  return 1;

}

I thought f would use more memory and cause memory fragmentation, and I believed g was better than f. However, after I tested both functions with get_memory_usage and my customized allocator, I found that they are the same in terms of memory usage. That was surprising, so I decided to dig into how this code is compiled to assembly code.
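The comparison above can be reproduced with a small sketch. The customized allocator and get_memory_usage from the original test are not shown in this post, so CountingAllocator, g_alloc_calls, allocations_for_f and allocations_for_g below are hypothetical stand-ins that only count heap allocations. With a C++11 or later compiler (or with an older one that applies return value optimization), returning the vector by value should not trigger an extra allocation.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical global counter: number of allocations made through CountingAllocator.
static std::size_t g_alloc_calls = 0;

// Hypothetical minimal allocator that counts calls and forwards to std::allocator.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_alloc_calls;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using IntVec = std::vector<int, CountingAllocator<int>>;

// Same shape as f in the text: build the vector locally and return it by value.
IntVec f() {
    IntVec v;
    v.push_back(2);
    return v;
}

// Same shape as g in the text: fill a vector owned by the caller.
int g(IntVec& v) {
    v.push_back(2);
    return 1;
}

// Count the allocations caused by one call to f.
std::size_t allocations_for_f() {
    g_alloc_calls = 0;
    IntVec v = f();
    return g_alloc_calls;
}

// Count the allocations caused by one call to g.
std::size_t allocations_for_g() {
    g_alloc_calls = 0;
    IntVec v;
    g(v);
    return g_alloc_calls;
}
```

Comparing allocations_for_f() with allocations_for_g() is one way to check the observation above without reading any assembly.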

2) Knowledge on understanding assembly code

 1. Machine language is what the computer (CPU) deals with. Every command the computer sees is given as a sequence of numbers. It is binary, normally presented in hex to make it more readable. For example, 83 ec 08.

  Assembly language is the same as machine language, except that the commands and parameters are replaced with letter sequences that are more readable and easier to memorize. For example, 83 ec 08 -> sub $0x8,%esp

 High-level languages, e.g. C/C++, make programming easier. Code written in a high-level language is compiled to machine language.
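To see how the bytes 83 ec 08 map to sub $0x8,%esp, it helps to split the middle byte into its bit fields. In the x86 encoding, 83 is an opcode shared by a group of arithmetic instructions with an 8-bit immediate, ec is the ModRM byte, and 08 is the immediate. The helper below is a minimal sketch; the names ModRM and decode_modrm are made up for this illustration.

```cpp
#include <cstdint>

// Fields of an x86 ModRM byte.
struct ModRM {
    unsigned mod;  // addressing mode; 3 means the operand is a register
    unsigned reg;  // register number, or an opcode extension for group opcodes such as 0x83
    unsigned rm;   // register or memory operand
};

// Split a ModRM byte into its 2-bit, 3-bit and 3-bit fields.
ModRM decode_modrm(std::uint8_t byte) {
    ModRM m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm  = byte & 0x7u;
    return m;
}
```

For 0xec this gives mod = 3 (register operand), reg = 5 (which selects sub within the 0x83 group) and rm = 4 (esp).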

 2. Some general rules for most assembly languages are listed below:

  1. Source can be in memory, register or constant
  2. Destination can be in memory or non-segment register
  3. Only one of source and destination can be in memory
  4. Source and destination must be same size

3. Compiler, assembler, linker and loader

a. Preprocessing handles include files, conditional compilation directives and macros. E.g. gcc -E test.c

b. Compilation takes the output of the preprocessor and generates assembly source code. E.g. gcc -S test.c

c. Assembly is the third stage of compilation. It takes the assembly source code and generates an object file. E.g. gcc -c test.c produces test.o, which is an ELF file.

d. Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them into a single executable file, e.g. a.out, which is also an ELF file. In doing so, it resolves references to external symbols. There are two types of linker: static and dynamic.

e. Loading brings the executable file into memory so the program can run.
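The stages above can be tried on a tiny source file. The file name stages.cpp and the names SQUARE and square_of are hypothetical, chosen only for this sketch; g++ accepts the same -E, -S and -c options as the gcc commands listed above.

```cpp
// stages.cpp (hypothetical file name used only for this illustration)

// Stage a, preprocessing: `g++ -E stages.cpp` prints the source with this macro expanded.
#define SQUARE(x) ((x) * (x))

// Stage b, compilation: `g++ -S stages.cpp` writes the assembly text to stages.s.
// Stage c, assembly:    `g++ -c stages.cpp` writes the object file stages.o.
// Stage d, linking:     the linker combines stages.o with an object file that defines main().
// extern "C" keeps the symbol name unmangled, so it is easy to spot in the symbol table.
extern "C" int square_of(int n) {
    return SQUARE(n);
}
```

Looking at the output of each command in turn shows the macro disappearing first, then the C++ code turning into assembly text, and finally into an ELF object file.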

4. ELF sections and segments


 

Figure 3: Simplified object file format: linking view and execution view.

Use readelf or objdump to get more information about an ELF file.

[torstan]$ more simple.c

void main()

{

    printf("hello world\n");

}

[torstan]$ gcc simple.c -o simple

simple.c: In function `main':

simple.c:2: warning: return type of 'main' is not `int'

[torstan]$ readelf -d simple

Dynamic section at offset 0x6a0 contains 20 entries:

  Tag        Type                         Name/Value

 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

 0x000000000000000c (INIT)               0x4003a8

 0x000000000000000d (FINI)               0x400598

 0x0000000000000004 (HASH)               0x400240

 0x0000000000000005 (STRTAB)             0x4002e0

 0x0000000000000006 (SYMTAB)             0x400268

 0x000000000000000a (STRSZ)              83 (bytes)

 0x000000000000000b (SYMENT)             24 (bytes)

 0x0000000000000015 (DEBUG)              0x0

 0x0000000000000003 (PLTGOT)             0x500838

 0x0000000000000002 (PLTRELSZ)           48 (bytes)

 0x0000000000000014 (PLTREL)             RELA

 0x0000000000000017 (JMPREL)             0x400378

 0x0000000000000007 (RELA)               0x400360

 0x0000000000000008 (RELASZ)             24 (bytes)

 0x0000000000000009 (RELAENT)            24 (bytes)

 0x000000006ffffffe (VERNEED)            0x400340

 0x000000006fffffff (VERNEEDNUM)         1

 0x000000006ffffff0 (VERSYM)             0x400334

 0x0000000000000000 (NULL)               0x0

[torstan]$ readelf -l simple

 

Elf file type is EXEC (Executable file)

Entry point 0x4003f0

There are 8 program headers, starting at offset 64

 

Program Headers:

  Type           Offset             VirtAddr           PhysAddr

                 FileSiz            MemSiz              Flags  Align

  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040

                 0x00000000000001c0 0x00000000000001c0  R E    8

  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200

                 0x000000000000001c 0x000000000000001c  R      1

      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000

                 0x0000000000000674 0x0000000000000674  R E    100000

  LOAD           0x0000000000000678 0x0000000000500678 0x0000000000500678

                 0x0000000000000200 0x0000000000000208  RW     100000

  DYNAMIC        0x00000000000006a0 0x00000000005006a0 0x00000000005006a0

                 0x0000000000000190 0x0000000000000190  RW     8

  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c

                 0x0000000000000020 0x0000000000000020  R      4

  GNU_EH_FRAME   0x00000000000005bc 0x00000000004005bc 0x00000000004005bc

                 0x0000000000000024 0x0000000000000024  R      4

  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000

                 0x0000000000000000 0x0000000000000000  RW     8

 

 Section to Segment mapping:

  Segment Sections...

   00

   01     .interp

   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame

   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss

   04     .dynamic

   05     .note.ABI-tag

   06     .eh_frame_hdr

   07
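readelf starts by reading the identification bytes at the beginning of the file. A minimal sketch of that first step is shown below; ElfIdent and parse_elf_ident are hypothetical names, and only the first six bytes of the ELF header are interpreted (the magic number, the class byte and the data-encoding byte).

```cpp
#include <cstddef>
#include <vector>

// Result of inspecting the first bytes of a file.
struct ElfIdent {
    bool valid;          // true if the buffer starts with 0x7f 'E' 'L' 'F'
    int bits;            // 32 or 64, 0 if the class byte is unknown
    bool little_endian;  // true if the data-encoding byte says little-endian
};

// Interpret the ELF identification bytes at the start of a buffer.
ElfIdent parse_elf_ident(const std::vector<unsigned char>& bytes) {
    ElfIdent id = {false, 0, false};
    if (bytes.size() < 6) return id;
    if (bytes[0] != 0x7f || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F') return id;
    id.valid = true;
    if (bytes[4] == 1) id.bits = 32;       // EI_CLASS: ELFCLASS32
    else if (bytes[4] == 2) id.bits = 64;  // EI_CLASS: ELFCLASS64
    id.little_endian = (bytes[5] == 1);    // EI_DATA: ELFDATA2LSB
    return id;
}
```

The buffer could be filled from the first bytes of an executable such as simple above, for example with std::ifstream in binary mode.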

 

5. Process memory layout

 


Figure 5: Illustration of C’s process memory layout on an x86.

 

The process loads its segments (corresponding to “text” and “data” in the diagram) at the process’s base address. The main stack is located just below and grows downwards. Any additional threads that are created have their own stacks, located below the main stack. The stacks are separated from each other by guard pages, so that an overflow of one stack is detected before it runs into another.
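One way to get a feel for this layout is to collect one address from each region of a running process. The sketch below is an assumption-laden illustration: LayoutSample, sample_layout and g_initialized are hypothetical names, converting a function pointer to an integer is only conditionally supported by the C++ standard (it works with gcc and clang), and the actual values change from run to run when address space randomization is enabled.

```cpp
#include <cstdint>
#include <cstdlib>

int g_initialized = 42;  // a global variable, stored in the data segment

// One sample address from each region of the process.
struct LayoutSample {
    std::uintptr_t text;   // address of a function (code)
    std::uintptr_t data;   // address of a global variable
    std::uintptr_t heap;   // address returned by malloc
    std::uintptr_t stack;  // address of a local variable
};

// Collect the sample addresses; the heap block is released before returning.
LayoutSample sample_layout() {
    int local = 0;
    void* block = std::malloc(16);
    LayoutSample s;
    s.text  = reinterpret_cast<std::uintptr_t>(&sample_layout);
    s.data  = reinterpret_cast<std::uintptr_t>(&g_initialized);
    s.heap  = reinterpret_cast<std::uintptr_t>(block);
    s.stack = reinterpret_cast<std::uintptr_t>(&local);
    std::free(block);
    return s;
}
```

Printing the four fields in hex and comparing them with the figure is a quick sanity check of the layout on a given system.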

 

6. C functions and stack frames

In high-level languages, one of the most important techniques for structuring programs is the function. Programmers use functions to break a program into routines with specific tasks, which can be developed, tested and reused independently. From the memory point of view, the high-level abstraction of a function is implemented with the help of the stack. A stack frame is a portion of stack memory allocated for one execution of a function. When a function is called, a stack frame is allocated to store the caller's frame pointer, some saved registers and the function's local variables. When the function returns to its caller, the stack frame is dismantled (cleaned up).

 


7. The C function calling convention

A convention is a way of doing things that is standardized in practice, but is not a formal documented standard. For example, the C/C++ function calling convention tells the compiler things such as:

  1. The order in which function arguments are pushed onto the stack
  2. Whether the caller or the callee is responsible for removing the arguments from the stack at the end of the call (the stack cleanup process)
  3. The name decoration convention that the compiler uses to identify individual functions

There are 3 kinds of conventions: __cdecl, __stdcall and __fastcall (for Microsoft Visual C++). The default is __cdecl.

void __cdecl TestFunc(float a, char b, char c);   // Borland and Microsoft

void TestFunc (float a, char b, char c) __attribute__((cdecl)); //gnu gcc

For __cdecl, parameters are pushed onto the stack in reverse order (right to left), and the caller cleans up the stack.

 

8. Stack layout during a function call

 


When a function call takes place, data elements are pushed onto the stack in the following order. This describes 32-bit x86 with __cdecl; on 64-bit Linux the first arguments are passed in registers instead, as the example in part 3 shows.

  1. Push the function parameters onto the stack, from right to left
  2. Push the return address onto the stack, i.e. the value of EIP at the time of the call, which is the address of the instruction after the call
  3. Push the caller's EBP onto the stack, and make EBP point to this address in the stack
  4. If the function includes try/catch or any other exception handling structure such as SEH (structured exception handling, the Microsoft implementation), the compiler places exception handling information on the stack
  5. Save the callee-saved registers such as ESI, EDI and EBX if they will be used in the callee
  6. Allocate the local variables declared in the callee

9. EBP and ESP

 EBP and ESP are the two important registers for the stack frame. ESP and EBP are their names on 32-bit systems; on 64-bit systems they are RSP and RBP. EBP is called the frame pointer: after a new function is called, it points to the bottom of the callee's stack frame, while ESP points to the top of that frame. The callee's stack frame extends from EBP to ESP. Data in the frame can be referenced relative to either EBP or ESP. Since EBP points to a fixed location within the frame, local variables and parameters are usually referenced with an offset from EBP.

EBP can be modified by the following instructions: mov ebp, esp; leave

ESP can be modified by the following instructions: pop; push; sub esp, 0ch; call; leave; ret

10. Control instructions

i. call myfunc

a) push the return address (the value in the EIP register, i.e. the address of the next instruction) onto the stack

b) jump to the starting address of myfunc

ii. leave (shown in Intel syntax, destination first)

a) mov rsp, rbp ; clean up the stack frame of the callee

b) pop rbp ; restore rbp to the caller's rbp

iii. ret

a) pop the top element of the stack (the return address) into eip

b) jump to the address in eip
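The effect of call, leave and ret on the registers can be traced with a toy model. This is only a sketch, not how a real CPU is implemented: ToyCpu and its members are hypothetical names, the stack is an array of 8-byte slots, and rsp and rbp hold slot indexes rather than byte addresses. The enter_frame step models the usual prologue (push rbp; mov rbp, rsp; sub rsp, N).

```cpp
#include <cstdint>
#include <vector>

// Toy model of the three registers involved in a function call.
struct ToyCpu {
    std::uint64_t rip;                 // instruction pointer
    std::uint64_t rsp;                 // index of the top-of-stack slot
    std::uint64_t rbp;                 // index of the frame base slot
    std::vector<std::uint64_t> stack;  // slot i models 8 bytes of stack memory

    ToyCpu() : rip(0), rsp(64), rbp(0), stack(64, 0) {}

    // The stack grows downward: push decrements rsp, pop increments it.
    void push(std::uint64_t v) { stack[--rsp] = v; }
    std::uint64_t pop() { return stack[rsp++]; }

    // call: push the return address, then jump to the target.
    void call(std::uint64_t target, std::uint64_t return_address) {
        push(return_address);
        rip = target;
    }
    // Prologue: save the caller's rbp, set up the new frame, reserve local slots.
    void enter_frame(std::uint64_t local_slots) {
        push(rbp);
        rbp = rsp;
        rsp -= local_slots;
    }
    // leave: discard the locals, then restore the caller's rbp.
    void leave() {
        rsp = rbp;
        rbp = pop();
    }
    // ret: pop the return address into rip.
    void ret() { rip = pop(); }
};
```

Running call, enter_frame, leave and ret in that order should bring rsp and rbp back to the values they had before the call, with rip set to the return address.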

11. The caller is responsible for allocating stack memory for the parameters used by the callee, and the caller is also responsible for cleaning up these elements after the call completes.

The callee is responsible for allocating and freeing the other elements on the stack: the old EBP, the exception handling frame if any, saved registers if any, and local variables. So when the function finishes, the callee pops these elements, including the old EBP, with the leave instruction, and jumps to the next instruction in the caller with the ret instruction.

 

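12. gdb commands for debugging assembly code

a) disassemble function_name; or disassemble function_address

b) nexti ; execute the next instruction, stepping over calls (ni for short)

c) stepi ; execute the next instruction, stepping into calls (si for short)

d) info registers; or info registers rbp rsp

e) display /3i $pc ; show the next 3 instructions at every stop

f) x /fmt address ; examine memory at address

g) print ; evaluate and print an expression, e.g. print $rsp

h) bt ; print the backtrace of stack frames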

 

3) Some examples

1. A simple case to show the stack frame

[torstan]$ more ass2.c

#include<stdio.h>

int swap_add(int *xp, int* yp)

{

    int x = *xp;

    int y = *yp;

    *xp = y;

    *yp = x;

    return x+y;

}

 

int caller()

{

    int arg1 = 534;

    int arg2 = 1057;

    int sum = swap_add(&arg1, &arg2);

    int diff = arg1 - arg2;

    return sum*diff;

}

 

int main()

{

    int res = 0;

    res = caller();

    printf("result is %d\n", res);

    return 0;

}

[torstan]$ gcc ass2.c -o ass

[torstan]$ gdb ass

….

Copyright … Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

……

 (gdb) b swap_add

Breakpoint 1 at 0x4004ac

(gdb) r

Starting program: /home/torstan/ass

(no debugging symbols found)

(no debugging symbols found)

 

Breakpoint 1, 0x00000000004004ac in swap_add ()

(gdb) disassemble swa

swab         swap_add     swapcontext  swapoff      swapon

(gdb) disassemble swap_add

Dump of assembler code for function swap_add:

0x00000000004004a8 <swap_add+0>:        push   %rbp

0x00000000004004a9 <swap_add+1>:        mov    %rsp,%rbp

0x00000000004004ac <swap_add+4>:        mov    %rdi,0xfffffffffffffff8(%rbp) ; save rdi to [rbp-8]

0x00000000004004b0 <swap_add+8>:        mov    %rsi,0xfffffffffffffff0(%rbp) ; save rsi to [rbp-16]; rdi and rsi hold pointers, which need 8 bytes each on a 64-bit system

0x00000000004004b4 <swap_add+12>:       mov    0xfffffffffffffff8(%rbp),%rax

0x00000000004004b8 <swap_add+16>:       mov    (%rax),%eax

0x00000000004004ba <swap_add+18>:       mov    %eax,0xffffffffffffffec(%rbp) ;save *[rbp-8] to [rbp-20]

0x00000000004004bd <swap_add+21>:       mov    0xfffffffffffffff0(%rbp),%rax

0x00000000004004c1 <swap_add+25>:       mov    (%rax),%eax

0x00000000004004c3 <swap_add+27>:       mov    %eax,0xffffffffffffffe8(%rbp) ; save *[rbp-16] to [rbp-24]

0x00000000004004c6 <swap_add+30>:       mov    0xfffffffffffffff8(%rbp),%rdx ;

0x00000000004004ca <swap_add+34>:       mov    0xffffffffffffffe8(%rbp),%eax

0x00000000004004cd <swap_add+37>:       mov    %eax,(%rdx) ; save [rbp-24] to*[rbp-8]

0x00000000004004cf <swap_add+39>:       mov    0xfffffffffffffff0(%rbp),%rdx

0x00000000004004d3 <swap_add+43>:       mov    0xffffffffffffffec(%rbp),%eax

0x00000000004004d6 <swap_add+46>:       mov    %eax,(%rdx) ; save [rbp-20] to *[rbp-16]

0x00000000004004d8 <swap_add+48>:       mov    0xffffffffffffffe8(%rbp),%eax

0x00000000004004db <swap_add+51>:       add    0xffffffffffffffec(%rbp),%eax ; add [rbp-20] and [rbp-24] to eax, and return value is stored in eax

0x00000000004004de <swap_add+54>:       leaveq

0x00000000004004df <swap_add+55>:       retq

End of assembler dump.
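The displacements in the dump above, such as 0xfffffffffffffff8, are negative offsets printed as unsigned 64-bit numbers. A small helper makes the conversion explicit; signed_displacement is a hypothetical name used only for this sketch.

```cpp
#include <cstdint>

// Convert a displacement printed by gdb as an unsigned 64-bit value
// into the signed offset it represents (two's complement).
std::int64_t signed_displacement(std::uint64_t raw) {
    if (raw & (1ULL << 63)) {
        // Negative value: invert the bits, negate, and subtract one.
        return -static_cast<std::int64_t>(~raw) - 1;
    }
    return static_cast<std::int64_t>(raw);
}
```

With this, 0xfffffffffffffff8 becomes -8 and 0xffffffffffffffec becomes -20, which matches the [rbp-8] and [rbp-20] notation used in the comments.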

(gdb) disassemble caller

Dump of assembler code for function caller:

0x00000000004004e0 <caller+0>:  push   %rbp ; save the rbp of the calling function (main) on the stack

0x00000000004004e1 <caller+1>:  mov    %rsp,%rbp ; now rbp points to the slot that holds main's saved rbp

0x00000000004004e4 <caller+4>:  sub    $0x10,%rsp ; allocate 16 bytes for local variables

0x00000000004004e8 <caller+8>:  movl   $0x216,0xfffffffffffffffc(%rbp) ; store 534 to [rbp-4]

0x00000000004004ef <caller+15>: movl   $0x421,0xfffffffffffffff8(%rbp) ; store 1057 to [rbp-8]

0x00000000004004f6 <caller+22>: lea    0xfffffffffffffff8(%rbp),%rsi ; store the address of [rbp-8] to rsi

0x00000000004004fa <caller+26>: lea    0xfffffffffffffffc(%rbp),%rdi ; store the address of [rbp-4] to rdi

0x00000000004004fe <caller+30>: callq  0x4004a8 <swap_add> ; rsi, rdi are prepared for this function call, swap_add

0x0000000000400503 <caller+35>: mov    %eax,0xfffffffffffffff4(%rbp) ; save return value to [rbp-12]

0x0000000000400506 <caller+38>: mov    0xfffffffffffffff8(%rbp),%edx ; move [rbp-8] to edx

0x0000000000400509 <caller+41>: mov    0xfffffffffffffffc(%rbp),%eax ; move [rbp-4] to eax

0x000000000040050c <caller+44>: sub    %edx,%eax

0x000000000040050e <caller+46>: mov    %eax,0xfffffffffffffff0(%rbp)  ; save the diff to [rbp-16]

0x0000000000400511 <caller+49>: mov    0xfffffffffffffff4(%rbp),%eax  ;move [rbp-12] to eax

0x0000000000400514 <caller+52>: imul   0xfffffffffffffff0(%rbp),%eax ; multiply by [rbp-16]; the result stays in eax as the return value

0x0000000000400518 <caller+56>: leaveq ; copy rbp to rsp, then pop rbp to restore main's rbp

0x0000000000400519 <caller+57>: retq ; pop the return address into RIP and jump to it

End of assembler dump.
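The slot assignments in the dump of caller can be mirrored in source form. The sketch below is a reading aid rather than what the compiler actually generates: CallerFrame and caller_step_by_step are hypothetical names, and the struct members are ordered from the lowest address ([rbp-16]) to the highest ([rbp-4]) to match the comments in the dump.

```cpp
// The four local slots of caller, ordered from low to high address.
struct CallerFrame {
    int diff;  // [rbp-16]
    int sum;   // [rbp-12]
    int arg2;  // [rbp-8]
    int arg1;  // [rbp-4]
};

// Same function as in ass2.c.
int swap_add(int* xp, int* yp) {
    int x = *xp;
    int y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

// Follows the instructions of caller one group at a time.
int caller_step_by_step() {
    CallerFrame f;
    f.arg1 = 534;                        // movl $0x216,-4(%rbp)
    f.arg2 = 1057;                       // movl $0x421,-8(%rbp)
    f.sum = swap_add(&f.arg1, &f.arg2);  // lea, lea, callq, then mov %eax,-12(%rbp)
    f.diff = f.arg1 - f.arg2;            // mov, mov, sub, then mov %eax,-16(%rbp)
    return f.sum * f.diff;               // mov -12(%rbp),%eax; imul -16(%rbp),%eax
}
```

After the swap, arg1 is 1057 and arg2 is 534, so sum is 1591, diff is 523, and the function returns 1591 * 523 = 832093.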

(gdb) disassemble main

Dump of assembler code for function main:

0x000000000040051a <main+0>:    push   %rbp

0x000000000040051b <main+1>:    mov    %rsp,%rbp

0x000000000040051e <main+4>:    sub    $0x10,%rsp  ; allocate 16 bytes for local variables

0x0000000000400522 <main+8>:    movl   $0x0,0xfffffffffffffffc(%rbp) ; move 0 to [rbp-4]

0x0000000000400529 <main+15>:   mov    $0x0,%eax  ;move 0 to eax, prepare for calling function

0x000000000040052e <main+20>:   callq  0x4004e0 <caller> ; call function caller, (push 0x0000000000400533 onto the stack, and jump to the first instruction of caller)

0x0000000000400533 <main+25>:   mov    %eax,0xfffffffffffffffc(%rbp) ; save function return value to [rbp-4]

0x0000000000400536 <main+28>:   mov    0xfffffffffffffffc(%rbp),%esi

0x0000000000400539 <main+31>:   mov    $0x40063c,%edi

0x000000000040053e <main+36>:   mov    $0x0,%eax

0x0000000000400543 <main+41>:   callq  0x4003e0 ; esi, edi and eax are prepared for this call (presumably printf: edi = address of the format string, esi = res, eax = 0)

0x0000000000400548 <main+46>:   mov    $0x0,%eax ; return value is 0

0x000000000040054d <main+51>:   leaveq ; equal to mov %rbp, %rsp; pop %rbp

0x000000000040054e <main+52>:   retq ; pop out the return address to rip,and jump to the address to keep execution

0x000000000040054f <main+53>:   nop

 

 

 

4) References

a) Computer Systems: A Programmer’s Perspective, by Bryant and O’Hallaron

b) http://www.tenouk.com/cncplusplusbufferoverflow.html

c) http://turkeyland.net/projects/overflow/defense.php
