Other resources on function-call conventions:
http://www.codeproject.com/KB/cpp/calling_conventions_demystified.aspx
http://www.intel.com/products/processor/manuals/index.htm Architecture Software Developer Manual Volume 1
http://en.wikipedia.org/wiki/X86_calling_conventions
http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2a.html
Intel x86 Function-call Conventions - Assembly View
http://unixwiz.net/techtips/win32-callconv-asm.html
from: http://blog.163.com/puzzle_1985/blog/static/1296221802009971003632/
One of the "big picture" issues in looking at compiled C code is the function-calling conventions. These are the rules by which a calling function and a called function agree on how parameters and return values are passed between them, and on how the stack is used by the function itself. The layout of the stack constitutes the "stack frame", and knowing how this works can go a long way to decoding how something works.
In C and modern CPU design conventions, the stack frame is a chunk of memory, allocated from the stack at run-time each time a function is called, to store its automatic variables. Hence nested or recursive calls to the same function each obtain their own separate frame.
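A small sketch of the "separate frame per call" idea, in portable C. The function names (record_frames, frames_are_distinct) are ours, made up for illustration: each recursion level records the address of its own automatic variable, and we then check that no two levels shared one.

```c
#include <stdint.h>

/* Record the address of one automatic variable at each recursion depth.
 * `volatile` keeps the compiler from optimizing the local away. */
static void record_frames(int depth, int max_depth, uintptr_t *addrs)
{
    volatile int local = depth;              /* lives in this call's frame */
    addrs[depth] = (uintptr_t)&local;
    if (depth + 1 < max_depth)
        record_frames(depth + 1, max_depth, addrs);
    /* Read `local` after the recursive call so it stays live across it;
     * the value added is always zero. */
    addrs[depth] += (uintptr_t)(local - depth);
}

/* Returns 1 if every recursion level saw a distinct address for `local`,
 * 0 otherwise (or if `levels` is outside 1..16). */
int frames_are_distinct(int levels)
{
    uintptr_t addrs[16];
    if (levels < 1 || levels > 16)
        return 0;
    record_frames(0, levels, addrs);
    for (int i = 0; i < levels; i++)
        for (int j = i + 1; j < levels; j++)
            if (addrs[i] == addrs[j])
                return 0;
    return 1;
}
```

Calling frames_are_distinct(5) should report 1: five simultaneously live calls, five separate copies of `local`.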
Physically, a function's stack frame is the area between the addresses contained in esp, the stack pointer, and ebp, the frame pointer (base pointer in Intel terminology). Thus, if a function pushes more values onto the stack, it is effectively growing its frame.
This is a very low-level view: the picture as seen from the C/C++ programmer is illustrated elsewhere:
Unixwiz.net Tech Tip: Intel x86 Function-call Conventions - C Programmer's View
For the sake of discussion, we're using the terms that the Microsoft Visual C compiler uses to describe these conventions, even though other platforms may use other terms.
__cdecl (pronounced see-DECK-'ll rhymes with "heckle")
This convention is the most common because it supports semantics required by the C language. The C language supports variadic functions (variable argument lists, à la printf), and this means that the caller must clean up the stack after the function call: the called function has no way to know how many arguments were actually pushed. It's not terribly optimal, but the C language semantics demand it.
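To make the variadic point concrete, here is a minimal sketch using the standard <stdarg.h> macros. The function sum_ints is a hypothetical example of ours: it only knows how many arguments to read because the caller says so in `count`, and nothing stops the caller from passing more.

```c
#include <stdarg.h>

/* Hypothetical variadic function: sums the first `count` int arguments.
 * The callee learns the argument count only from `count` itself; it cannot
 * see how many values the caller really passed. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(1, 40, 2) passes three values but the function reads only two of them (count and 40). Only the call site knows how much it pushed, which is why the cleanup lives there.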
__stdcall
Closely related to the older __pascal convention (both have the called function clean up its arguments, though __pascal pushes parameters left to right), this requires that each function take a fixed number of parameters, and this means that the called function can do argument cleanup in one place rather than have this be scattered throughout the program in every place that calls it. The Win32 API primarily uses __stdcall.
It's important to note that these are merely conventions, and any collection of cooperating code can agree on nearly anything. There are other conventions (passing parameters in registers, for instance) that behave differently, and of course the optimizer can make mincemeat of any clear picture as well.
Our focus here is to provide an overview, and not an authoritative definition for these conventions.
Register use in the stack frame
In both __cdecl and __stdcall conventions, the same set of three registers is involved in the function-call frame:
%ESP - Stack Pointer
This 32-bit register is implicitly manipulated by several CPU instructions (PUSH, POP, CALL, and RET among others). It always points to the last element used on the stack (not the first free element): this means that the PUSH and POP operations would be specified in pseudo-C as:
*--ESP = value; // push: decrement first, then store
value = *ESP++; // pop: load first, then increment
The "Top of the stack" is an occupied location, not a free one, and is at the lowest memory address.
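The pseudo-C above can be turned into a small runnable model. This is only a sketch: the Machine type and its helpers are invented for illustration, and `esp` is an array index rather than a real address, but the decrement-then-store and load-then-increment ordering is the same.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine: `mem` stands in for stack memory, `esp` is an index into it.
 * The stack is empty when esp == STACK_WORDS and grows toward index 0. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;
} Machine;

void machine_init(Machine *m) { m->esp = STACK_WORDS; }

void push(Machine *m, uint32_t value)
{
    assert(m->esp > 0);              /* no room left */
    m->mem[--m->esp] = value;        /* decrement first, then store */
}

uint32_t pop(Machine *m)
{
    assert(m->esp < STACK_WORDS);    /* nothing to pop */
    return m->mem[m->esp++];         /* load first, then increment */
}

/* The top of the stack is the occupied slot esp points at. */
uint32_t top(const Machine *m)
{
    assert(m->esp < STACK_WORDS);
    return m->mem[m->esp];
}
```

After two pushes, `esp` has moved down by two slots and top() returns the most recently pushed value, which sits at the lowest occupied index.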
%EBP - Base Pointer
This 32-bit register is used to reference all the function parameters and local variables in the current stack frame. Unlike the %esp register, the base pointer is manipulated only explicitly. This is sometimes called the "Frame Pointer".
%EIP - Instruction Pointer
This holds the address of the next CPU instruction to be executed, and it's saved onto the stack as part of the CALL instruction. As well, any of the "jump" instructions modify the %EIP directly.
Assembler notation
Virtually everybody in the Intel assembler world uses the Intel notation, but the GNU C compiler uses what they call the "AT&T syntax" for backwards compatibility. This seems to us to be a really dumb idea, but it's a fact of life.
There are minor notational differences between the two notations, but by far the most annoying is that the AT&T syntax reverses the source and destination operands. To move the immediate value 4 into the EAX register:
mov $4, %eax // AT&T notation
mov eax, 4 // Intel notation
More recent GNU compilers can generate the Intel syntax (gcc -masm=intel), and the GNU assembler accepts it via the .intel_syntax directive. In any case, we'll use the Intel notation exclusively.
There are other minor differences that are not of much concern to the reverse engineer.
Calling a __cdecl function
The best way to understand the stack organization is to see each step in calling a function with the __cdecl conventions. These steps are taken automatically by the compiler, and though not all of them are used in every case (sometimes no parameters, sometimes no local variables, sometimes no saved registers), this shows the overall mechanism employed.
Push parameters onto the stack, from right to left
Parameters are pushed onto the stack, one at a time, from right to left. Whether the parameters are evaluated from right to left is a different matter, and in any case this is unspecified by the language and code should never rely on this. The calling code must keep track of how many bytes of parameters have been pushed onto the stack so it can clean them up later.
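A sketch of this step on a toy machine. Everything here (the Cpu type, push, load, push_args) is a made-up model for illustration, with every parameter assumed to be 32 bits wide. Pushing the last argument first leaves the first argument at the lowest address, right where the callee will look for it.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Toy 32-bit machine (illustrative only): esp holds a byte address. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];   /* stack memory as 32-bit slots */
    uint32_t esp;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t load(const Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr + 4 <= STACK_BYTES);
    return c->mem[addr / 4];
}

/* Push `count` arguments, last one first. Returns the number of bytes
 * pushed, which the caller has to remember for the later cleanup. */
uint32_t push_args(Cpu *c, const uint32_t *args, int count)
{
    for (int i = count - 1; i >= 0; i--)
        push(c, args[i]);
    return 4u * (uint32_t)count;
}
```

With arguments {10, 20, 30}, push_args reports 12 bytes, and the slot at `esp` holds 10 (the first argument), with 20 and 30 above it.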
Call the function
Here, the processor pushes the contents of %EIP (the instruction pointer) onto the stack; at this moment it points to the first byte after the CALL instruction, so the value pushed is the return address. The processor then jumps to the called function. After this finishes, the caller has lost control, and the callee is in charge. This step does not change the %ebp register.
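The effect of CALL can be modeled in a few lines. This is a sketch on an invented toy Cpu type; we assume the common 5-byte near relative CALL (opcode E8 plus a 32-bit displacement), and the addresses used are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u
#define CALL_LEN    5u     /* assumed: near relative CALL, E8 + rel32 */

/* Toy 32-bit machine (illustrative only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, eip;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; c->ebp = 0; c->eip = 0; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t load(const Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr + 4 <= STACK_BYTES);
    return c->mem[addr / 4];
}

/* Model of CALL, with eip pointing at the CALL instruction itself. */
void call(Cpu *c, uint32_t target)
{
    push(c, c->eip + CALL_LEN);   /* return address: first byte after CALL */
    c->eip = target;              /* control passes to the callee */
    /* ebp is deliberately untouched */
}
```

If the CALL sits at 0x1000 and targets 0x2000, the top of the stack ends up holding 0x1005, `eip` is 0x2000, and `ebp` still holds whatever the caller left in it.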
Save and update the %ebp
Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.
push ebp
mov ebp, esp // ebp = esp
Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp), and so on. Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.
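Here is a sketch of the two-instruction prolog on a toy machine, showing where the arguments land relative to the new %ebp. The Cpu type and the helpers prologue and param are our own illustrative names, and all values (0xAAAA, 0x1005) are arbitrary placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Toy 32-bit machine (illustrative only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; c->ebp = 0; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t load(const Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr + 4 <= STACK_BYTES);
    return c->mem[addr / 4];
}

/* push ebp / mov ebp, esp */
void prologue(Cpu *c)
{
    push(c, c->ebp);    /* save the caller's frame pointer */
    c->ebp = c->esp;    /* the new frame starts here */
}

/* Zero-based parameter lookup: parameter 0 lives at 8(%ebp). */
uint32_t param(const Cpu *c, int index)
{
    return load(c, c->ebp + 8u + 4u * (uint32_t)index);
}
```

After the caller pushes 20, then 10, then a return address, and the callee runs prologue(), slot 0(%ebp) holds the caller's old %ebp, 4(%ebp) holds the return address, and param(0) and param(1) yield 10 and 20.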
Save CPU registers used for temporaries
If this function will use any CPU registers that the caller expects to be preserved (conventionally %ebx, %esi, and %edi; %eax, %ecx, and %edx are scratch registers that the callee may overwrite freely), it has to save the old values first lest it walk on data used by the calling functions. Each register to be used is pushed onto the stack one at a time, and the compiler must remember what it did so it can unwind it later.
Allocate local variables
The function may choose to use local stack-based variables, and they are allocated here simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks.
Now, the local variables are located on the stack between the %ebp and %esp registers, and though it would be possible to refer to them as offsets from either one, by convention the %ebp register is used. This means that -4(%ebp) refers to the first local variable, provided no registers were pushed in the previous step; any saved registers sit directly below %ebp and shift the locals down accordingly.
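The arithmetic behind this step is simple enough to sketch directly. The helper names below are ours, and the sketch assumes 4-byte locals with no saved registers sitting between %ebp and the locals.

```c
#include <stdint.h>

/* Round a byte count up to the next multiple of four. */
uint32_t round_up4(uint32_t bytes)
{
    return (bytes + 3u) & ~3u;
}

/* sub esp, N: returns the new stack pointer after reserving `bytes`
 * of local storage, in four-byte chunks. */
uint32_t alloc_locals(uint32_t esp, uint32_t bytes)
{
    return esp - round_up4(bytes);
}

/* Address of the zero-based local `index`: local 0 is at -4(%ebp). */
uint32_t local_addr(uint32_t ebp, unsigned index)
{
    return ebp - 4u * (index + 1u);
}
```

For example, with %ebp and %esp both at 240, asking for 10 bytes of locals reserves 12 and leaves %esp at 228; the three locals then sit at 236, 232, and 228.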
Perform the function's purpose
At this point, the stack frame is set up correctly, with the layout listed below. All the parameters and locals are offsets from the %ebp register:
16(%ebp)    third function parameter
12(%ebp)    second function parameter
8(%ebp)     first function parameter
4(%ebp)     old %EIP (the function's "return address")
0(%ebp)     old %EBP (previous function's base pointer)
-4(%ebp)    first local variable
-8(%ebp)    second local variable
-12(%ebp)   third local variable
The function is free to use any of the registers that had been saved onto the stack upon entry, but it must not change the stack pointer or all Hell will break loose upon function return.
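One way to double-check the offsets in this layout is to write the frame down as a C struct, lowest address first, and let offsetof do the arithmetic. The Frame type and EBP_OFFSET macro are illustrative inventions of ours, assuming three 4-byte parameters, three 4-byte locals, and no saved registers in between.

```c
#include <stddef.h>
#include <stdint.h>

/* The frame as it sits in memory, lowest address first. */
typedef struct {
    uint32_t local3;     /* -12(%ebp) */
    uint32_t local2;     /*  -8(%ebp) */
    uint32_t local1;     /*  -4(%ebp) */
    uint32_t old_ebp;    /*   0(%ebp): %ebp points here */
    uint32_t ret_addr;   /*   4(%ebp) */
    uint32_t param1;     /*   8(%ebp) */
    uint32_t param2;     /*  12(%ebp) */
    uint32_t param3;     /*  16(%ebp) */
} Frame;

/* Signed displacement of a field relative to where %ebp points. */
#define EBP_OFFSET(field) \
    ((int)offsetof(Frame, field) - (int)offsetof(Frame, old_ebp))
```

EBP_OFFSET(param1) comes out as 8 and EBP_OFFSET(local1) as -4, matching the listing.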
Release local storage
When the function allocates local, temporary space, it does so by decrementing the stack pointer by the amount of space needed, and this process must be reversed to reclaim that space. It's usually done by adding to the stack pointer the same amount which was subtracted previously, though a series of POP instructions could achieve the same thing.
Restore saved registers
Each register saved onto the stack upon entry must be restored from the stack, in the reverse of the order in which it was pushed. If the "save" and "restore" phases don't match exactly, catastrophic stack corruption will occur.
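A sketch of matched save and restore phases on a toy machine. The Cpu type and the choice of %ebx, %esi, and %edi as the saved set are assumptions made for illustration. The registers are pushed in one order and popped in exactly the opposite order.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine (illustrative only): word-indexed stack, three registers. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;                      /* index of the last used slot */
    uint32_t ebx, esi, edi;
} Cpu;

void cpu_reset(Cpu *c)
{
    c->esp = STACK_WORDS;
    c->ebx = c->esi = c->edi = 0;
}

void push(Cpu *c, uint32_t v) { assert(c->esp > 0); c->mem[--c->esp] = v; }
uint32_t pop(Cpu *c) { assert(c->esp < STACK_WORDS); return c->mem[c->esp++]; }

/* Entry: push ebx / push esi / push edi */
void save_regs(Cpu *c)
{
    push(c, c->ebx);
    push(c, c->esi);
    push(c, c->edi);
}

/* Exit: pop edi / pop esi / pop ebx, the mirror image of save_regs */
void restore_regs(Cpu *c)
{
    c->edi = pop(c);
    c->esi = pop(c);
    c->ebx = pop(c);
}
```

Popping in the same order as the pushes would leave the stack pointer correct but hand %ebx the old %edi and vice versa, which is the quieter kind of corruption.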
Restore the old base pointer
The first thing this function did upon entry was save the caller's %ebp base pointer, and by restoring it now (popping the top item from the stack), we effectively discard the entire local stack frame and put the caller's frame back in play.
Return from the function
This is the last step of the called function, and the RET instruction pops the old %EIP from the stack and jumps to that location. This gives control back to the calling function. Only the stack pointer and instruction pointer are modified by a subroutine return.
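The exit sequence (release locals, restore the base pointer, return) can be sketched the same way. The Cpu type and epilogue helper are illustrative names of ours, and this version assumes no saved registers were pushed.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Toy 32-bit machine (illustrative only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, eip;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; c->ebp = 0; c->eip = 0; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop(Cpu *c)
{
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* add esp, N / pop ebp / ret */
void epilogue(Cpu *c, uint32_t local_bytes)
{
    c->esp += local_bytes;   /* release local storage */
    c->ebp = pop(c);         /* restore the caller's base pointer */
    c->eip = pop(c);         /* ret: pop the return address and jump */
}
```

After epilogue() runs, %ebp and %eip are back to the caller's values, and %esp points at the first pushed argument: in __cdecl the parameters are still on the stack at this point.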
Clean up pushed parameters
In the __cdecl convention, the caller must clean up the parameters pushed onto the stack, and this is done either by popping the stack into don't-care registers (for a few parameters) or by adding the parameter-block size to the stack pointer directly.
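Putting the whole __cdecl sequence together, here is a sketch of one complete call on a toy machine. All names (Cpu, callee_sum, cdecl_call_sum) are invented for illustration, and passing `count` to the simulated callee is a shortcut of the model, not something a real callee receives for free.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u
#define CALL_LEN    5u     /* assumed: near relative CALL, E8 + rel32 */

/* Toy 32-bit machine (illustrative only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, eip, eax;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; c->ebp = c->eip = c->eax = 0; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop(Cpu *c)
{
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Callee: adds up its parameters into eax, then does a plain RET. */
static void callee_sum(Cpu *c, int count)
{
    push(c, c->ebp);                      /* push ebp */
    c->ebp = c->esp;                      /* mov ebp, esp */
    c->eax = 0;
    for (int i = 0; i < count; i++)       /* read 8(%ebp), 12(%ebp), ... */
        c->eax += c->mem[(c->ebp + 8u + 4u * (uint32_t)i) / 4];
    c->ebp = pop(c);                      /* pop ebp */
    c->eip = pop(c);                      /* ret: arguments stay put */
}

/* Caller: push right to left, call, then clean up its own arguments. */
uint32_t cdecl_call_sum(Cpu *c, const uint32_t *args, int count)
{
    for (int i = count - 1; i >= 0; i--)
        push(c, args[i]);
    push(c, c->eip + CALL_LEN);           /* the CALL pushes the return address */
    callee_sum(c, count);
    c->esp += 4u * (uint32_t)count;       /* add esp, 4*count */
    return c->eax;
}
```

The thing to watch is %esp: it ends exactly where it started, but only because the caller's final `add esp` undid its own pushes.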
__cdecl -vs- __stdcall
The __stdcall convention is mainly used by the Windows API, and it's a bit more compact than __cdecl. The main difference is that any given function has a hard-coded set of parameters, and this cannot vary from call to call like it can in C (no "variadic functions").
Because the size of the parameter block is fixed, the burden of cleaning these parameters off the stack can be shifted to the called function, instead of being done by the calling function as in __cdecl.
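On x86 the callee-side cleanup is typically done with the RET form that takes an immediate byte count (RET 8, for two 4-byte parameters). A minimal sketch of that instruction on a toy machine, with invented names:

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Toy 32-bit machine (illustrative only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, eip;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; c->eip = 0; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop(Cpu *c)
{
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* RET n: return, then drop n bytes of parameters. */
void ret_n(Cpu *c, uint32_t param_bytes)
{
    c->eip = pop(c);          /* pop the return address and jump to it */
    c->esp += param_bytes;    /* discard the caller's pushed parameters */
}
```

With two parameters and a return address on the stack, ret_n(&c, 8) leaves %esp where it was before the caller pushed anything, so the call site needs no cleanup instruction at all.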
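There are several effects of having the called function do the cleanup:
1. The code is smaller, because the cleanup is done once in the function itself rather than at every place that calls it. It may be slightly faster as well.
2. Calling the function with the wrong number of parameters is catastrophic: the stack will be misaligned, and general havoc will ensue.
3. As an offshoot of #2, the Microsoft Visual C compiler takes special care with __stdcall functions. Because the number of parameters is known at compile time, the compiler encodes the parameter byte count in the symbol name itself, and this means that calling the function the wrong way leads to a link error.
For instance, the function int foo(int a, int b) would generate (at the assembler level) the symbol "_foo@8", where "8" is the number of bytes expected. This means that not only will a call with 1 or 3 parameters not resolve (due to the size mismatch), but neither will a call expecting the __cdecl parameters (which looks for _foo). It's a clever mechanism that avoids a lot of problems.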
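The name decoration is easy to mimic. In this sketch the helpers stdcall_symbol, cdecl_symbol, and resolves are our own hypothetical names, and the "linker" is nothing more than an exact string comparison, which is enough to show why mismatched calls fail to resolve.

```c
#include <stdio.h>
#include <string.h>

/* Build the __stdcall symbol described above: "_" name "@" byte count. */
int stdcall_symbol(char *out, size_t out_size,
                   const char *name, unsigned param_bytes)
{
    return snprintf(out, out_size, "_%s@%u", name, param_bytes);
}

/* Build the __cdecl symbol: just a leading underscore. */
int cdecl_symbol(char *out, size_t out_size, const char *name)
{
    return snprintf(out, out_size, "_%s", name);
}

/* Toy "linker": a reference resolves only if the names match exactly. */
int resolves(const char *wanted, const char *defined)
{
    return strcmp(wanted, defined) == 0;
}
```

For int foo(int a, int b), stdcall_symbol produces "_foo@8". A caller that thinks foo takes three ints asks for "_foo@12", and a __cdecl caller asks for "_foo"; neither matches, so the mistake surfaces at link time instead of as a corrupted stack.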
Variations and Notes
The x86 architecture provides a number of built-in mechanisms for assisting with frame management, but they don't seem to be commonly used by C compilers. Of particular interest is the ENTER instruction, which handles most of the function-prolog code.
Using ENTER:
ENTER 10,0
The equivalent explicit prolog:
PUSH ebp
MOV ebp, esp
SUB esp, 10
We're pretty sure these are functionally equivalent, but our 80386 processor reference suggests that the ENTER version is more compact (6 bytes -vs- 9) but slower (15 clocks -vs- 6). The newer processors are probably harder to pin down, but somebody has probably figured out that ENTER is slower. Sigh.
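For what it's worth, the equivalence can be checked on a toy model. This sketch covers only ENTER with a nesting level of 0; the Cpu type and the insn_* helpers are invented names, and the model says nothing about instruction size or timing.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Toy 32-bit machine (illustrative only). */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Cpu;

void cpu_reset(Cpu *c) { c->esp = STACK_BYTES; c->ebp = 0; }

void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* ENTER size, 0 as a single step. */
void enter0(Cpu *c, uint16_t size)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= size;
}

/* The same work as three separate instructions. */
void insn_push_ebp(Cpu *c)            { push(c, c->ebp); }
void insn_mov_ebp_esp(Cpu *c)         { c->ebp = c->esp; }
void insn_sub_esp(Cpu *c, uint32_t n) { c->esp -= n; }

/* Returns 1 if both machines have the same register state. */
int same_regs(const Cpu *a, const Cpu *b)
{
    return a->esp == b->esp && a->ebp == b->ebp;
}
```

Running enter0(&a, 10) on one machine and the three-instruction sequence on another, from the same starting state, leaves both with identical %esp and %ebp and the old %ebp saved in the same slot.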