After some research, I found three articles on this topic that clearly explain how function calls work.
http://unixwiz.net/techtips/win32-callconv-asm.html
http://www.installsetupconfig.com/win32programming/processtoolhelpapis12_1.html
http://joshcarter.com/books/pragprowrimo_2009/functions_and_parameters
One of the "big picture" issues in looking at compiled C code is the function-calling conventions. These are the rules by which a calling function and a called function agree on how parameters and return values are passed between them, and how the stack is used by the function itself. The layout of the stack constitutes the "stack frame", and knowing how this works can go a long way to decoding how something works.
In C and modern CPU design conventions, the stack frame is a chunk of memory, allocated from the stack at run-time each time a function is called, to store its automatic variables. Hence nested or recursive calls to the same function each obtain their own separate frame.
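The claim that every nested call gets its own frame can be checked from plain C. The following is a minimal sketch, not part of the original articles: the name frames_are_distinct and the constant MAX_DEPTH are made up for illustration. Each activation records the address of one of its automatic variables, and the deepest call compares them while all frames are still live.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 8   /* illustrative recursion depth (an assumption) */

/* Hypothetical helper: each activation stores the address of its own
   automatic variable, then recurses. The deepest call, made while every
   outer frame is still live, returns 1 if all recorded addresses differ
   and 0 otherwise. */
int frames_are_distinct(int depth, uintptr_t addrs[MAX_DEPTH])
{
    volatile int marker = depth;          /* lives in this call's frame */

    assert(depth >= 0 && depth < MAX_DEPTH);
    addrs[depth] = (uintptr_t)&marker;

    if (depth + 1 < MAX_DEPTH)
        return frames_are_distinct(depth + 1, addrs);

    /* Deepest call: compare every pair of recorded addresses. */
    for (int i = 0; i < MAX_DEPTH; i++)
        for (int j = i + 1; j < MAX_DEPTH; j++)
            if (addrs[i] == addrs[j])
                return 0;
    return 1;
}
```

Calling frames_are_distinct(0, addrs) with an array of MAX_DEPTH entries should return 1, because objects that are alive at the same time cannot share an address. The spacing between the recorded addresses depends on the compiler and platform, so the sketch only tests that they differ.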
Physically, a function's stack frame is the area between the addresses contained in esp, the stack pointer, and ebp, the frame pointer (base pointer in Intel terminology). Thus, if a function pushes more values onto the stack, it is effectively growing its frame.
This is a very low-level view; the picture as seen by the C/C++ programmer is illustrated elsewhere:
• Unixwiz.net Tech Tip: Intel x86 Function-call Conventions - C Programmer's View
For the sake of discussion, we're using the terms that the Microsoft Visual C compiler uses to describe these conventions, even though other platforms may use other terms.
It's important to note that these are merely conventions, and any collection of cooperating code can agree on nearly anything. There are other conventions (passing parameters in registers, for instance) that behave differently, and of course the optimizer can make mincemeat of any clear picture as well.
Our focus here is to provide an overview, and not an authoritative definition for these conventions.
Register use in the stack frame
In both __cdecl and __stdcall conventions, the same set of three registers is involved in the function-call frame:
*--ESP = value;   // push
value = *ESP++;   // pop
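The push and pop pseudo-code can be tried out on a simulated stack. This is only a sketch under stated assumptions: SimStack, sim_init, sim_push and sim_pop are hypothetical names, the "stack" is an ordinary array, and esp is an index into it rather than a real machine address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: 'mem' stands in for stack memory and 'esp' is an
   index into it, not a real address. The stack grows downward. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;                      /* index of the current top of stack */
} SimStack;

/* An empty stack has esp just past the highest slot. */
void sim_init(SimStack *s)
{
    s->esp = STACK_WORDS;
}

void sim_push(SimStack *s, uint32_t value)
{
    assert(s->esp > 0);           /* overflow guard */
    s->mem[--s->esp] = value;     /* *--ESP = value */
}

uint32_t sim_pop(SimStack *s)
{
    assert(s->esp < STACK_WORDS); /* underflow guard */
    return s->mem[s->esp++];      /* value = *ESP++ */
}
```

Pushing 10 and then 20 and popping twice returns 20 first and 10 second, and leaves esp back where it started. The pre-decrement on push and post-increment on pop are what make the stack grow toward lower addresses.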
Assembler notation
Virtually everybody in the Intel assembler world uses the Intel notation, but the GNU C compiler uses what they call the "AT&T syntax" for backwards compatibility. This seems to us to be a really dumb idea, but it's a fact of life.
There are minor notational differences between the two notations, but by far the most annoying is that the AT&T syntax reverses the source and destination operands. To move the immediate value 4 into the EAX register:
mov $4, %eax    // AT&T notation
mov eax, 4      // Intel notation
More recent GNU compilers can generate the Intel syntax (the -masm=intel option), and the GNU assembler accepts it through its .intel_syntax directive. In any case, we'll use the Intel notation exclusively.
There are other minor differences that are not of much concern to the reverse engineer.
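To make the operand reversal concrete, here is a toy converter written for this discussion. It is a sketch, not a real assembler feature: att_to_intel is a hypothetical helper that handles only two-operand instructions whose operands are plain immediates or registers, and it ignores memory operands and size suffixes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy converter (hypothetical): rewrites a two-operand AT&T instruction
   such as "mov $4, %eax" into Intel operand order. Only immediates and
   registers are handled. Returns 0 on success, -1 on failure. */
int att_to_intel(const char *att, char *out, size_t cap)
{
    char op[16], src[32], dst[32];

    assert(att != NULL && out != NULL);

    /* mnemonic, then the source up to the comma, then the destination */
    if (sscanf(att, "%15s %31[^,], %31s", op, src, dst) != 3)
        return -1;

    /* drop the '$' (immediate) and '%' (register) sigils */
    const char *s = (src[0] == '$' || src[0] == '%') ? src + 1 : src;
    const char *d = (dst[0] == '$' || dst[0] == '%') ? dst + 1 : dst;

    /* Intel order puts the destination first */
    int n = snprintf(out, cap, "%s %s, %s", op, d, s);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Given "mov $4, %eax" the function writes "mov eax, 4" into the output buffer. Input it cannot parse as a mnemonic plus two comma-separated operands is rejected with -1.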
Calling a __cdecl function
The best way to understand the stack organization is to see each step in calling a function with the __cdecl conventions. These steps are taken automatically by the compiler. Not all of them are used in every case (sometimes there are no parameters, no local variables, or no saved registers), but this shows the overall mechanism employed.
push ebp
mov ebp, esp    // ebp « esp
16(%ebp) | - third function parameter |
12(%ebp) | - second function parameter |
8(%ebp) | - first function parameter |
4(%ebp) | - old %EIP (the function's "return address") |
0(%ebp) | - old %EBP (previous function's base pointer) |
-4(%ebp) | - first local variable |
-8(%ebp) | - second local variable |
-12(%ebp) | - third local variable |
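The offsets in this layout can be reproduced with a small simulation. The sketch below is a model, not real machine code: Machine, push32, at_ebp, enter_cdecl_call and leave_cdecl_call are hypothetical names, memory is an array, and the return address is a made-up constant. It assumes the usual __cdecl behavior of pushing arguments right to left and of the caller removing them after the call.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BYTES 256

/* Hypothetical machine model: byte-addressed memory holding 32-bit words,
   plus the two registers that delimit the frame. */
typedef struct {
    uint32_t mem[MEM_BYTES / 4];
    uint32_t esp, ebp;            /* byte offsets into mem */
} Machine;

static void push32(Machine *m, uint32_t v)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

/* Read the word at ebp + off, matching the off(%ebp) column above. */
uint32_t at_ebp(const Machine *m, int32_t off)
{
    uint32_t addr = m->ebp + (uint32_t)off;   /* wraps for negative off */
    assert(addr % 4 == 0 && addr < MEM_BYTES);
    return m->mem[addr / 4];
}

/* Simulate a __cdecl call with three arguments up to the point where the
   callee's frame is set up. */
void enter_cdecl_call(Machine *m, uint32_t a, uint32_t b, uint32_t c)
{
    push32(m, c);                 /* caller: arguments pushed right to left */
    push32(m, b);
    push32(m, a);
    push32(m, 0xC0DE0000u);       /* call: pushes the (made-up) return address */
    push32(m, m->ebp);            /* callee: push ebp */
    m->ebp = m->esp;              /* callee: mov ebp, esp */
    m->esp -= 12;                 /* callee: sub esp, 12 (three locals) */
}

/* Tear the frame down, return, and let the caller drop its arguments. */
void leave_cdecl_call(Machine *m)
{
    m->esp = m->ebp;              /* mov esp, ebp */
    m->ebp = m->mem[m->esp / 4];  /* pop ebp */
    m->esp += 4;
    m->esp += 4;                  /* ret: pops the return address */
    m->esp += 12;                 /* caller: add esp, 12 */
}
```

After enter_cdecl_call(&m, 1, 2, 3), at_ebp(&m, 8) yields the first argument, at_ebp(&m, 4) the return address and at_ebp(&m, 0) the saved ebp. After leave_cdecl_call both registers are back at their starting values. The last step of leave_cdecl_call is the caller-side cleanup that distinguishes __cdecl.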
__cdecl -vs- __stdcall
The __stdcall convention is mainly used by the Windows API, and it's a bit more compact than __cdecl. The main difference is that any given function has a hard-coded set of parameters, and this cannot vary from call to call like it can in C (no "variadic functions").
Because the size of the parameter block is fixed, the burden of cleaning these parameters off the stack can be shifted to the called function, instead of being done by the calling function as in __cdecl. There are several effects of this:
- the code is a tiny bit smaller, because the parameter-cleanup code is found once — in the called function itself — rather than in every place the function is called. These may be only a few bytes per call, but for commonly-used functions it can add up. This presumably means that the code may be a tiny bit faster as well.
- calling the function with the wrong number of parameters is catastrophic - the stack will be badly misaligned, and general havoc will surely ensue.
- As an offshoot of #2, Microsoft Visual C takes special care of functions that are __stdcall. Since the number of parameters is known at compile time, the compiler encodes the parameter byte count in the symbol name itself, and this means that calling the function wrong leads to a link error.
For instance, the function int foo(int a, int b) would generate — at the assembler level — the symbol "_foo@8", where "8" is the number of bytes expected. This means that not only will a call with 1 or 3 parameters not resolve (due to the size mismatch), but neither will a call expecting the __cdecl parameters (which looks for _foo). It's a clever mechanism that avoids a lot of problems.
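The naming scheme just described can be sketched in a few lines of C. These helpers are hypothetical and only format strings. They do not query any compiler or linker, and symbol_resolves simply models the linker's requirement that names match exactly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers that build the symbol names described above. */

/* __cdecl: leading underscore only, e.g. foo -> _foo */
int decorate_cdecl(char *out, size_t cap, const char *name)
{
    assert(out != NULL && name != NULL);
    return snprintf(out, cap, "_%s", name);
}

/* __stdcall: leading underscore, then '@' and the parameter byte count,
   e.g. foo taking two 4-byte ints -> _foo@8 */
int decorate_stdcall(char *out, size_t cap, const char *name,
                     unsigned param_bytes)
{
    assert(out != NULL && name != NULL);
    return snprintf(out, cap, "_%s@%u", name, param_bytes);
}

/* Model of link-time resolution: a reference is satisfied only by a
   definition with exactly the same symbol name. */
int symbol_resolves(const char *wanted, const char *defined)
{
    return strcmp(wanted, defined) == 0;
}
```

With the definition decorated as _foo@8, a reference built for three int parameters (_foo@12) or for __cdecl (_foo) fails to resolve, while a reference with two int parameters matches. That mismatch is what turns a wrong call into a link error.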
Variations and Notes
The x86 architecture provides a number of built-in mechanisms for assisting with frame management, but they don't seem to be commonly used by C compilers. Of particular interest is the ENTER instruction, which handles most of the function-prolog code.
ENTER 10,0          PUSH ebp
                    MOV ebp, esp
                    SUB esp, 10
We're pretty sure these are functionally equivalent, but our 80386 processor reference suggests that the ENTER version is more compact (6 bytes -vs- 9) but slower (15 clocks -vs- 6). The newer processors are probably harder to pin down, but somebody has probably figured out that ENTER is slower. Sigh.