The ARM Procedure Call Standard (AAPCS) and Stack Usage

AAPCS

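In the past, the ARM procedure call standard was called APCS (ARM Procedure Call Standard), and the Thumb procedure call standard was called TPCS. Both names are now obsolete, and the two standards are jointly known as AAPCS (Procedure Call Standard for the ARM Architecture).

AAPCS is one of the documents that make up the ARM ABI (Application Binary Interface). It defines: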

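  • Restrictions on register usage.
  • Conventions for using the stack.
  • How arguments are passed and results are returned across function calls.
  • The format of a stack-based structure that can be "backtraced", used to provide a list of functions (including their arguments) from the point of failure back to the program's entry function.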

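With this convention in place, compilers have a reference to follow when generating code, and so do we when writing inline assembly or pure assembly. Without it, every piece of code would go its own way and programs would misbehave.

Arm now maintains these documents on GitHub. The latest versions can be found in the repository "Application Binary Interface for the Arm® Architecture".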

arm32 [1]

By default these registers are simply named r0, r1, ..., r15; assemblers also accept the uppercase forms. AAPCS additionally gives them the following aliases.

r0-r15 and R0-R15
a1-a4 (argument, result, or scratch registers, synonyms for r0 to r3)
v1-v8 (variable registers, r4 to r11)
sb and SB (static base, r9)
ip and IP (intra-procedure-call scratch register, r12)
sp and SP (stack pointer, r13)
lr and LR (link register, r14)
pc and PC (program counter, r15).

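When the arguments do not all fit in the argument registers (r0 to r3), the remaining ones are passed to the callee on the stack. ARM uses a full-descending stack, and the argument list is pushed from right to left, from high addresses toward low addresses.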

AArch64 [2]

Next, let's look at the procedure call standard for AArch64, starting with the registers that the AArch64 architecture provides.

Register | Alias | Role
---------|-------|-----
SP       |       | Stack pointer (register number 31 in instruction encodings); points to the top of the call stack, i.e. the lowest address in use
x30      | LR    | Link register; holds the function return address
x29      | FP    | Frame pointer; points to the current frame record (the saved x29/x30 pair)
x19…x28  |       | Callee-saved registers
x18      |       | The Platform Register, if needed; otherwise a temporary register
x17      | IP1   | The second intra-procedure-call temporary register (can be used by call veneers and PLT code); at other times may be used as a temporary register
x16      | IP0   | The first intra-procedure-call scratch register (can be used by call veneers and PLT code); at other times may be used as a temporary register
x9…x15   |       | Caller-saved temporary registers
x8       |       | Indirect result location register
x0…x7    |       | Parameter/result registers

For function calls, general-purpose registers are divided into four groups:

  • X0-X7: Argument registers
  • X9-X15: Caller-saved temporary registers
  • X19-X28: Callee-saved registers
  • X8, X16-X18, X29, X30: Registers with a special purpose

X8 is the indirect result register. This is used to pass the address location of an indirect result, for example, where a function returns a large structure.
X16 and X17 are IP0 and IP1, intra-procedure-call temporary registers. These can be used by call veneers and similar code, or as temporary registers for intermediate values between subroutine calls. They are corruptible by a function. Veneers are small pieces of code which are automatically inserted by the linker, for example when the branch target is out of range of the branch instruction.
X18 is the platform register and is reserved for the use of platform ABIs. This is an additional temporary register on platforms that don’t assign a special meaning to it.
X29 is the frame pointer register (FP).
X30 is the link register (LR).


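The terms caller-saved and callee-saved appeared above. What do they mean? They define the obligations of the caller and the callee with respect to register usage.
Caller-saved registers (also known as volatile or call-clobbered registers). I personally prefer the term call-clobbered, because it makes clear that registers of this kind may have been overwritten by the time a function call returns. The caller must not assume that their values are unchanged: if it needs a value to survive the call, it is the caller's own responsibility to save the register before the call and restore it afterwards.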

A register used to hold an intermediate value during a calculation (usually, such values are not named in the
program source and have a limited lifetime). If a function needs to preserve the value held in such a register over
a call to another function, then the calling function must save and restore the value.

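Callee-saved registers (also known as non-volatile or call-preserved registers). I personally prefer the term call-preserved, because it makes clear that registers of this kind are not clobbered across a function call, so the caller can safely assume that their values are unchanged. Consequently, if the callee wants to use these registers, it is the callee's responsibility to save and restore their values in order to honor the convention.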

A register whose value must be preserved over a function call. If the function being called (the callee) needs to
use the register, then it is responsible for saving and restoring the old value. Alternatively, the callee can simply leave the register untouched.

int funcB(int, int);
int funcC(int, int);

int funcA(int a, int b) {
    int ret = funcB(a, b);
    return ret;
}

int funcB(int a, int b) {
    return funcC(a, b);
}

int funcC(int a, int b) {
    int c = a + b;
    return c;
}

int main(int argc, char *argv[])
{
        return 0;
}

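The call chain is funcA -> funcB -> funcC. Depending on whether a function calls other functions, functions can be classified as leaf or non-leaf functions:
  • Leaf function: a function that does not call any other function, such as funcC above.
  • Non-leaf function: a function that calls other functions, such as funcA and funcB above.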

Below is the disassembly of the code compiled at -O0.

0000000000400524 <funcA>:
  400524:	a9bd7bfd 	stp	x29, x30, [sp, #-48]!
  400528:	910003fd 	mov	x29, sp
  40052c:	b9001fa0 	str	w0, [x29, #28]
  400530:	b9001ba1 	str	w1, [x29, #24]
  400534:	b9401ba1 	ldr	w1, [x29, #24]
  400538:	b9401fa0 	ldr	w0, [x29, #28]
  40053c:	94000005 	bl	400550 <funcB>		  // call the funcB, return address in the lr
  400540:	b9002fa0 	str	w0, [x29, #44]
  400544:	b9402fa0 	ldr	w0, [x29, #44]
  400548:	a8c37bfd 	ldp	x29, x30, [sp], #48
  40054c:	d65f03c0 	ret

0000000000400550 <funcB>:
  400550:	a9be7bfd 	stp	x29, x30, [sp, #-32]! // save fp and lr; sp moves down by 32 bytes
  400554:	910003fd 	mov	x29, sp				  // fp = sp
  400558:	b9001fa0 	str	w0, [x29, #28]		  // store the incoming arguments on the stack
  40055c:	b9001ba1 	str	w1, [x29, #24]		  // they are addressed via fp rather than sp (fp == sp here)
  400560:	b9401ba1 	ldr	w1, [x29, #24]
  400564:	b9401fa0 	ldr	w0, [x29, #28]
  400568:	94000003 	bl	400574 <funcC>		  // call funcC
  40056c:	a8c27bfd 	ldp	x29, x30, [sp], #32	  // restore fp and lr; sp moves up by 32 bytes, releasing the stack space
  400570:	d65f03c0 	ret

0000000000400574 <funcC>:
  400574:	d10083ff 	sub	sp, sp, #0x20		  // sp moves down by 32 bytes; lr and fp are not saved here
  400578:	b9000fe0 	str	w0, [sp, #12]		  // because funcC is a leaf function: it has no bl or other instruction that modifies lr,
  40057c:	b9000be1 	str	w1, [sp, #8]		  // and every access below is sp-relative, so fp is not saved either
  400580:	b9400fe1 	ldr	w1, [sp, #12]
  400584:	b9400be0 	ldr	w0, [sp, #8]
  400588:	0b000020 	add	w0, w1, w0
  40058c:	b9001fe0 	str	w0, [sp, #28]
  400590:	b9401fe0 	ldr	w0, [sp, #28]
  400594:	910083ff 	add	sp, sp, #0x20		  // restore sp, releasing the stack space
  400598:	d65f03c0 	ret

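The stack on ARM processors is a full-descending stack: it grows from high addresses toward low addresses, and sp always points to the lowest address in use in the current stack frame. On AArch64, stack space is allocated in multiples of 16 bytes. This follows from the AArch64 stack-alignment requirement, and it may also help make full use of the cache width and speed up the CPU's access to data.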


  [1] https://static.docs.arm.com/ihi0042/f/IHI0042F_aapcs.pdf

  [2] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/using-the-stack-in-aarch32-and-aarch64
