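Historically, the ARM procedure call standard was called APCS (ARM Procedure Call Standard), and the Thumb procedure call standard was called TPCS. Both names are now obsolete, and the two have been unified as AAPCS (Procedure Call Standard for the ARM Architecture).
AAPCS is one of the documents that make up the ARM ABI (Application Binary Interface). It defines the contract between a calling routine and a called routine, including how registers are used and how arguments and results are passed.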
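With this convention in place, compilers have a reference when generating code, and so do we when writing inline assembly or standalone assembly. Without it, everyone would go their own way and programs would misbehave.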
Arm now maintains these documents on GitHub. The latest versions are available in the repository "Application Binary Interface for the Arm® Architecture".
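By default, the AArch32 core registers are simply named r0, r1, ..., r15. Assemblers that support it also accept the upper-case forms, and AAPCS additionally gives the registers aliases: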
r0-r15 and R0-R15
a1-a4 (argument, result, or scratch registers, synonyms for r0 to r3)
v1-v8 (variable registers, r4 to r11)
sb and SB (static base, r9)
ip and IP (intra-procedure-call scratch register, r12)
sp and SP (stack pointer, r13)
lr and LR (link register, r14)
pc and PC (program counter, r15).
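When the arguments cannot all be passed in registers, the remaining ones are passed to the callee on the stack. ARM uses a full descending stack, and the argument list is pushed from right to left, from high addresses toward low addresses.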
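As a rough illustration of the rule above, the sketch below declares a function with six integer parameters. Under AAPCS on AArch32, the first four would normally travel in r0-r3 (a1-a4) and the last two on the stack. The names `sum6` and `demo_sum6` are made up for this example, and the comments describe the expected placement, which the C code itself cannot observe.

```c
#include <assert.h>

/* Hypothetical example: six int arguments.
 * On AArch32 under AAPCS, a..d are expected in r0..r3.
 * e and f do not fit in the argument registers, so they are expected
 * on the stack, with e at the lower address and f just above it. */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* Self-check helper (illustrative name); call it from your own main(). */
void demo_sum6(void)
{
    assert(sum6(1, 2, 3, 4, 5, 6) == 21);
}
```

Compiling this for a 32-bit ARM target and reading the generated assembly (for instance with the compiler's `-S` option) is one way to check where the fifth and sixth arguments actually end up.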
Next, let's look at the procedure call standard for AArch64, starting with the registers that the AArch64 architecture provides.
Register | Alias | Description |
---|---|---|
SP | | Stack pointer; points to the top of the call stack (the lowest address in use). It is encoded as register 31 in instructions that take SP and is not addressable as x31. |
x30 | LR | Link register; holds the function return address. |
x29 | FP | Frame pointer; points to the current function's frame record (the saved x29/x30 pair). |
x19…x28 | | Callee-saved registers. |
x18 | | The Platform Register, if needed; otherwise a temporary register. |
x17 | IP1 | The second intra-procedure-call temporary register (can be used by call veneers and PLT code); at other times may be used as a temporary register. |
x16 | IP0 | The first intra-procedure-call scratch register (can be used by call veneers and PLT code); at other times may be used as a temporary register. |
x9…x15 | | Caller-saved temporary registers. |
x8 | | Indirect result location register. |
x0…x7 | | Parameter/result registers. |
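For function calls, general-purpose registers are divided into four groups: parameter/result registers (X0-X7), caller-saved temporary registers (X9-X15), callee-saved registers (X19-X28), and registers with a special purpose (X8, X16-X18, X29, X30). The special-purpose registers are: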
X8 is the indirect result register. This is used to pass the address location of an indirect result, for example, where a function returns a large structure.
X16 and X17 are IP0 and IP1, intra-procedure-call temporary registers. These can be used by call veneers and similar code, or as temporary registers for intermediate values between subroutine calls. They are corruptible by a function. Veneers are small pieces of code which are automatically inserted by the linker, for example when the branch target is out of range of the branch instruction.
X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don’t assign a special meaning to it.
X29 is the frame pointer register (FP).
X30 is the link register (LR).
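To make the role of X8 more concrete, here is a small sketch of a function that returns a structure too large for the result registers. The type `struct big` and the functions `make_big` and `demo_make_big` are hypothetical names for this example. The comments state what AAPCS64 is expected to do with such a return value; the C code only shows the source-level shape.

```c
#include <assert.h>

/* Hypothetical structure: 64 bytes when long is 8 bytes,
 * which is too large to come back in x0/x1. */
struct big {
    long v[8];
};

/* Returns the structure by value. On AArch64 the caller is expected to
 * reserve storage for the result and pass its address in x8; the callee
 * then writes the result through that address. */
struct big make_big(long seed)
{
    struct big b;
    for (int i = 0; i < 8; i++)
        b.v[i] = seed + i;
    return b;
}

/* Self-check helper (illustrative name). */
void demo_make_big(void)
{
    struct big b = make_big(3);
    assert(b.v[0] == 3);
    assert(b.v[7] == 10);
}
```

In the disassembly of a caller, this would typically show up as an address being placed in x8 shortly before the `bl make_big` instruction.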
The terms caller-saved and callee-saved appeared above. What do they mean? They define the obligations of the caller and the callee with respect to register use.
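Caller-saved registers (also known as volatile or call-clobbered registers). I personally prefer the term call-clobbered, because it makes clear that registers of this kind may be overwritten by the time a call returns. The caller must not assume that their values survive the call: if it needs a value afterwards, it is the caller's job to save the register before the call and restore it after the call.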
A register used to hold an intermediate value during a calculation (usually, such values are not named in the
program source and have a limited lifetime). If a function needs to preserve the value held in such a register over
a call to another function, then the calling function must save and restore the value.
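Callee-saved registers (also known as non-volatile or call-preserved registers). I personally prefer the term call-preserved, because it makes clear that registers of this kind are intact when a call returns, so the caller can safely assume their values are unchanged. If the callee wants to use one of these registers, it is therefore the callee's job to save and restore it in order to honor the convention.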
A register whose value must be preserved over a function call. If the function being called (the callee) needs to
use the register, then it is responsible for saving and restoring the old value. Or to not touch them.
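The two definitions above matter most when a value has to stay alive across a call. The sketch below is a minimal example of that situation on AArch64; `twice`, `twice_plus` and `demo_twice_plus` are invented names, and the register comments describe what a compiler is allowed or likely to do, not something guaranteed for every compiler or optimization level.

```c
#include <assert.h>

/* Hypothetical helper. Like any callee, it is free to overwrite the
 * call-clobbered registers (x0-x17 on AArch64). */
int twice(int x)
{
    return 2 * x;
}

/* y is live across the call to twice(). It arrives in w1, which the call
 * may clobber, so the compiler has two options:
 *   - move y into a call-preserved register such as x19, and save and
 *     restore x19 in this function's own prologue and epilogue, or
 *   - store y on the stack before the call and reload it afterwards. */
int twice_plus(int x, int y)
{
    int t = twice(x);
    return t + y;
}

/* Self-check helper (illustrative name). */
void demo_twice_plus(void)
{
    assert(twice_plus(5, 7) == 17);
}
```

With optimization enabled the compiler may inline `twice` and the call disappears, so the effect is easiest to see at `-O0` or with inlining turned off.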
int funcB(int, int);
int funcC(int, int);

/* Non-leaf: calls funcB. */
int funcA(int a, int b) {
    int ret = funcB(a, b);
    return ret;
}

/* Non-leaf: calls funcC. */
int funcB(int a, int b) {
    return funcC(a, b);
}

/* Leaf: calls no other function. */
int funcC(int a, int b) {
    int c = a + b;
    return c;
}

int main(int argc, char *argv[])
{
    return 0;
}
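The call chain is funcA -> funcB -> funcC. Depending on whether a function calls other functions, functions fall into two kinds, leaf and non-leaf:
Leaf function: calls no other function, for example funcC above.
Non-leaf function: calls other functions, for example funcA and funcB above.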
Below is the result of compiling with -O0.
0000000000400524 <funcA>:
400524: a9bd7bfd stp x29, x30, [sp, #-48]!
400528: 910003fd mov x29, sp
40052c: b9001fa0 str w0, [x29, #28]
400530: b9001ba1 str w1, [x29, #24]
400534: b9401ba1 ldr w1, [x29, #24]
400538: b9401fa0 ldr w0, [x29, #28]
40053c: 94000005 bl 400550 <funcB> // call the funcB, return address in the lr
400540: b9002fa0 str w0, [x29, #44]
400544: b9402fa0 ldr w0, [x29, #44]
400548: a8c37bfd ldp x29, x30, [sp], #48
40054c: d65f03c0 ret
0000000000400550 <funcB>:
400550: a9be7bfd stp x29, x30, [sp, #-32]! // save fp and lr; sp moves down 32 bytes
400554: 910003fd mov x29, sp // fp = sp
400558: b9001fa0 str w0, [x29, #28] // spill the incoming arguments into the frame
40055c: b9001ba1 str w1, [x29, #24] // addressed via fp; sp would also work here, since fp == sp
400560: b9401ba1 ldr w1, [x29, #24]
400564: b9401fa0 ldr w0, [x29, #28]
400568: 94000003 bl 400574 <funcC> // call the funcC
40056c: a8c27bfd ldp x29, x30, [sp], #32 // restore fp and lr; sp moves up 32 bytes, releasing the stack space
400570: d65f03c0 ret
0000000000400574 <funcC>:
400574: d10083ff sub sp, sp, #0x20 // sp moves down 32 bytes; lr and fp are not saved here
400578: b9000fe0 str w0, [sp, #12] // because funcC is a leaf: it has no bl or other instruction that changes lr
40057c: b9000be1 str w1, [sp, #8] // and everything below is addressed via sp, so fp is not saved either
400580: b9400fe1 ldr w1, [sp, #12]
400584: b9400be0 ldr w0, [sp, #8]
400588: 0b000020 add w0, w1, w0
40058c: b9001fe0 str w0, [sp, #28]
400590: b9401fe0 ldr w0, [sp, #28]
400594: 910083ff add sp, sp, #0x20 // restore sp, releasing the stack space
400598: d65f03c0 ret
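The `stp x29, x30, [sp, #-N]!` and `mov x29, sp` pair in funcA and funcB leaves x29 pointing at a two-word record: the caller's x29 followed by the saved x30. Chained together, these records form a linked list that a backtracer can walk. The sketch below models that list in plain C with hand-built records rather than reading real registers. `struct frame_record`, `count_frames` and `demo_count_frames` are illustrative names, and the NULL terminator for the outermost record is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Layout of the pair stored by `stp x29, x30, [sp, #-N]!`:
 * the caller's frame pointer at offset 0, the return address at offset 8. */
struct frame_record {
    struct frame_record *prev_fp; /* saved x29 of the caller */
    void *lr;                     /* saved x30 (return address) */
};

/* Walk the chain from the innermost record outward, the way a simple
 * backtracer would, and return the number of records visited. */
size_t count_frames(const struct frame_record *fp)
{
    size_t n = 0;
    while (fp != NULL) {
        n++;
        fp = fp->prev_fp;
    }
    return n;
}

/* Self-check helper (illustrative name). */
void demo_count_frames(void)
{
    /* Hand-built stand-ins for a hypothetical chain of three non-leaf
     * functions; the return addresses are left empty in this model. */
    struct frame_record outer  = { NULL,    NULL };
    struct frame_record middle = { &outer,  NULL };
    struct frame_record inner  = { &middle, NULL };

    assert(count_frames(&inner) == 3);
}
```

A leaf function such as funcC stores no record of its own in the listing above, so it would not add a node to this chain.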
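ARM processors use a full descending stack: it grows from high addresses toward low addresses, and sp always points to the lowest address in use in the current stack frame. AArch64 allocates stack space in multiples of 16 bytes. This is the stack alignment that AArch64 requires, and it may also help make full use of the cache width and speed up the CPU's data accesses.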
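The frame sizes in the listing (48, 32 and 32 bytes) are all multiples of 16. A minimal sketch of the rounding involved is shown below; `align_up16` and `demo_align_up16` are names I made up, and a real compiler may reserve more than this minimum for its own reasons.

```c
#include <assert.h>
#include <stddef.h>

/* Round a byte count up to the next multiple of 16, the stack
 * alignment that AArch64 requires. */
size_t align_up16(size_t bytes)
{
    return (bytes + 15u) & ~(size_t)15u;
}

/* Self-check helper (illustrative name). */
void demo_align_up16(void)
{
    assert(align_up16(0) == 0);
    assert(align_up16(1) == 16);
    assert(align_up16(16) == 16);
    assert(align_up16(20) == 32);
    assert(align_up16(48) == 48);
}
```

Adding 15 and then clearing the low four bits works because 16 is a power of two; a size that is already aligned is returned unchanged.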
https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/using-the-stack-in-aarch32-and-aarch64