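SPARC V8 Instructions
Table A-1 (Mapping of Synthetic Instructions to SPARC Instructions) on p. 83 of the SPARC V8 manual lists the synthetic instructions. These exist only as mnemonics: the assembler expands each one into a real instruction, and the synthetic form is simply easier to remember.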
The load integer instructions copy a byte, a halfword, or a word from memory into r[rd]. A fetched byte or halfword is right-justified in destination register r[rd]; it is either sign-extended or zero-filled on the left, depending on whether or not the opcode specifies a signed or unsigned operation, respectively.
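The sign-extend versus zero-fill rule can be checked with a small model. This is only a sketch: the function name load_extend and its parameters are made up for illustration, with signed=True standing for opcodes such as ldsb/ldsh and signed=False for ldub/lduh.

```python
def load_extend(raw: int, bits: int, signed: bool) -> int:
    """Return the 32-bit register image for a fetched byte (bits=8) or halfword (bits=16)."""
    mask = (1 << bits) - 1
    value = raw & mask                       # right-justify the fetched datum
    if signed and value & (1 << (bits - 1)):
        value |= 0xFFFFFFFF & ~mask          # copy the sign bit into all upper bits
    return value                             # unsigned loads keep the upper bits zero


if __name__ == "__main__":
    print(hex(load_extend(0x80, 8, signed=True)))
    print(hex(load_extend(0x80, 8, signed=False)))
```

The same byte 0x80 therefore lands in r[rd] as two different 32-bit values depending only on the opcode.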
Jump and Link Instruction
jmpl address, regrd
Besides jumping, it also copies the PC into regrd. The JMPL instruction copies the PC, which contains the address of the JMPL instruction, into register r[rd].
Call and Link Instruction
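On SPARC, Table A-1 maps call to a jmpl instruction. (This confused me at first: the V8 manual clearly lists CALL as a format 1 instruction on p. 43, and section B.24 describes CALL on its own, yet Table A-1 gives jmpl as the real instruction behind call. As far as I can tell these are two different cases: call label assembles to the real format 1 CALL with a PC-relative disp30, while the Table A-1 entry covers call with a register-based address, which format 1 cannot encode and which therefore becomes jmpl.)
call address = jmpl address, %o7: jump to address and copy the PC into %o7 (this register holds the PC at the time of the jump, much like LR on ARM).
The syntax is call label, and the operation is as follows:
Copy the current PC into %o7 (this is the "link" in call and link: it saves the address where execution was interrupted so that the called function can use the link to return and resume execution), then jump to label.
The V8 manual explains this on p. 123, where it says: The CALL instruction also writes the value of PC, which contains the address of the CALL, into r[15] (out register 7).
(ARM has the BL instruction for this, Branch with Link (saves (PC+4) in LR and jumps to function): it first stores the address of the instruction after the branch in LR so that the called function can find the return address, and then performs the jump.)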

Format (1):
    bits 31-30: 01
    bits 29-0:  disp30
Suggested Assembly Language Syntax:
    call label
Description: The CALL instruction causes an unconditional, delayed, PC-relative control transfer to address “PC + (4 × disp30)”. Since the word displacement (disp30) field is 30 bits wide, the target address can be arbitrarily distant. The PC-relative displacement is formed by appending two low-order zeros to the instruction’s 30-bit word displacement field. The CALL instruction also writes the value of PC, which contains the address of the CALL, into r[15] (out register 7).
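The format 1 layout and the "PC + (4 × disp30)" rule can be tried out with a short sketch. The helper names encode_call and call_target are hypothetical, and the model assumes 32-bit wraparound arithmetic, which is why a backward call needs no separate sign extension.

```python
MASK32 = 0xFFFFFFFF


def encode_call(pc: int, target: int) -> int:
    """Build the 32-bit CALL word: op=01 in bits 31-30, disp30 in bits 29-0."""
    assert (target - pc) % 4 == 0, "instructions are word aligned"
    disp30 = ((target - pc) >> 2) & 0x3FFFFFFF
    return (0b01 << 30) | disp30


def call_target(pc: int, word: int) -> int:
    """Recover the target address: PC + 4 * disp30, modulo 2**32."""
    disp30 = word & 0x3FFFFFFF
    return (pc + (disp30 << 2)) & MASK32


if __name__ == "__main__":
    word = encode_call(0x1000, 0x2000)
    print(hex(word), hex(call_target(0x1000, word)))
```

Because disp30 is shifted left by two, it covers the whole 32-bit address space, which is what the manual means by "arbitrarily distant".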
    call            ; jmpl , %o7
    mov 0, %o0
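Because call is a delayed control transfer, the trailing mov 0, %o0 in the delay slot executes as well. (%o0 usually carries the return value: the callee leaves it in %i0, which is the caller's %o0. %o0 is also the first outgoing argument register, so this delay-slot instruction zeroes that register before the callee starts running.)
ret: return from subroutine, used to return from a non-leaf function
retl: return from leaf subroutine, used to return from a leaf function
Table A-1 (Mapping of Synthetic Instructions to SPARC Instructions, p. 83 of the V8 manual) shows that the l in retl stands for leaf, not link, and that both ret and retl are implemented with jmpl.
When the function below returns it uses ret, which actually executes jmpl %i7+8, %g0. The effect is a jump to the address in %i7 plus 8, that is, past the call and its delay slot; writing the PC to %g0 discards it.
Because jmpl is a delayed instruction, the restore that follows executes as well. Here it amounts to restore %g0, %g0, %g0, so all it does is increment CWP by 1.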
function:
    save  %sp, -C, %sp

               ; perform function, leave return value,
               ; if any, in register %i0 upon exit

    ret        ; jmpl %i7+8, %g0
    restore    ; restore %g0,%g0,%g0
Figure 5 - Epilogue/prologue in procedures
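SAVE also performs an ADD, so save %sp, -1024, %sp additionally allocates a 1024-byte stack frame for the called function.
RESTORE also performs an ADD, so restore %sp, 1024, %sp could bring back the original stack frame (this needs checking against real assembly output). In practice a plain restore is enough, because switching back to the caller's window makes the caller's own %sp visible again.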

Furthermore, if and only if an overflow or underflow trap is not generated, SAVE and RESTORE behave like normal ADD instructions, except that the source operands r[rs1] and/or r[rs2] are read from the old window (that is, the window addressed by the original CWP) and the sum is written into r[rd] of the new window (that is, the window addressed by new_CWP).
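The "read from the old window, write into the new window" behaviour is easier to see in a toy model. The sketch below is an assumption-laden simplification: the class RegisterFile, the choice of 8 windows, and the storage layout are all invented for illustration, and overflow/underflow traps are ignored entirely.

```python
NWINDOWS = 8          # assumption: an implementation with 8 windows
MASK32 = 0xFFFFFFFF
SP, O7, FP, I7 = 14, 15, 30, 31   # %o6, %o7, %i6, %i7 as r[] numbers


class RegisterFile:
    def __init__(self):
        self.cwp = 0
        self.globals = [0] * 8
        # Each window owns 8 outs + 8 locals; its ins are the outs of window cwp+1.
        self.windowed = [0] * (NWINDOWS * 16)

    def _slot(self, cwp, r):
        if 8 <= r <= 15:                                   # outs
            return cwp * 16 + (r - 8)
        if 16 <= r <= 23:                                  # locals
            return cwp * 16 + 8 + (r - 16)
        return ((cwp + 1) % NWINDOWS) * 16 + (r - 24)      # ins overlap caller's outs

    def read(self, r):
        if r < 8:
            return 0 if r == 0 else self.globals[r]        # r[0] always reads 0
        return self.windowed[self._slot(self.cwp, r)]

    def write(self, r, value):
        if r == 0:
            return                                         # writes to r[0] are discarded
        if r < 8:
            self.globals[r] = value & MASK32
        else:
            self.windowed[self._slot(self.cwp, r)] = value & MASK32

    def save(self, rs1, imm, rd):
        total = self.read(rs1) + imm                       # source read in the OLD window
        self.cwp = (self.cwp - 1) % NWINDOWS
        self.write(rd, total)                              # result written in the NEW window

    def restore(self, rs1=0, imm=0, rd=0):
        total = self.read(rs1) + imm
        self.cwp = (self.cwp + 1) % NWINDOWS
        self.write(rd, total)


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(SP, 0x8000)
    rf.save(SP, -1024, SP)
    print(hex(rf.read(FP)), hex(rf.read(SP)))
```

In this model the caller's %sp shows up as the callee's %fp after save, and the caller's %o7 shows up as %i7, which is why ret uses %i7+8 while retl uses %o7+8.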
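A leaf function executes neither save nor restore. The figure below is from Peter Magnusson's article.
function:
               ; no save instruction needed upon entry

               ; perform function, leave return value,
               ; if any, in register %o0 upon exit

    retl       ; jmpl %o7+8, %g0
    nop        ; the delay slot can be used for something else
Figure 6 - Epilogue/prologue in leaf procedures
Delay slots exist because SPARC uses a PC/nPC mechanism: on each instruction, nPC -> PC and nPC+4 -> nPC.
After a taken branch executes, PC = old nPC = the address of the instruction following the branch, and nPC = the branch target address.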
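The PC/nPC update rule is small enough to write down directly. This is only a sketch; the function step and its branch_target parameter are hypothetical names, and the addresses are arbitrary.

```python
def step(pc, npc, branch_target=None):
    """Advance past the instruction at pc.

    branch_target is given only when that instruction is a taken branch.
    Returns the new (PC, nPC) pair.
    """
    new_pc = npc                                             # nPC -> PC, always
    new_npc = npc + 4 if branch_target is None else branch_target
    return new_pc, new_npc


if __name__ == "__main__":
    pc, npc = 0x100, 0x104            # a branch sits at 0x100
    pc, npc = step(pc, npc, branch_target=0x200)
    print(hex(pc), hex(npc))          # now at the delay slot, target queued in nPC
    pc, npc = step(pc, npc)
    print(hex(pc), hex(npc))          # only now does PC reach the target
```

The first step leaves PC on the instruction right after the branch, which is exactly the delay slot.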

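2. Branch delay slot. SPARC uses two program counters, PC and nPC, to keep track of instruction execution. PC holds the address of the instruction being executed; the second counter, nPC, holds the next value of PC. Normally SPARC updates both at the end of every instruction: PC takes the value of nPC, and nPC becomes its old value plus 4. When a branch instruction executes, SPARC still assigns nPC to PC and then updates nPC. If the branch is taken, nPC is assigned the target address named by the instruction; otherwise nPC is incremented by 4. In other words, a taken branch does not move PC to the target address right away; the processor first executes the instruction that the old nPC pointed to (this is what creates the delay slot).

3. For cases where no useful instruction can conveniently be placed in the delay slot, SPARC provides the synthetic instruction nop. Executing nop changes no register or memory value; its only effect is that the processor executes one more instruction, which adds to the program's running time.

4. Appending ",a" to a branch annuls the delay slot: if a conditional branch is not taken, the instruction in the delay slot is not executed. Example: bg,a top.

The delay slot is a neat trick that shows up all the time in SPARC assembly, so it is worth spending some time on it.

test.c:

    int temp;
    int x = 0;
    int y = 0x9;
    int z = 0x42;
    temp = y;
    while (temp > 0) {
        x = x + z;
        temp = temp - 1;
    }

A straightforward translation, with nop placed in each branch delay slot:

    .data
    x:      .word 0
    y:      .word 0x9
    z:      .word 0x42
    .text
    start:  set y, %r1
            ld [%r1], %r2
            set z, %r1
            ld [%r1], %r3
            mov %r0, %r4
            add %r2, 1, %r2
            ba test
            nop                 ! here, branch delay slot
    top:    add %r4, %r3, %r4
    test:   subcc %r2, 1, %r2
            bg top
            nop                 ! here, branch delay slot
            set x, %r1
            st %r4, [%r1]       ! store x
    end:    ta 0

In the code above, when bg top is taken the processor does not jump immediately. It first executes the nop in the branch delay slot and only then transfers to top:.

Now the same loop with a useful instruction in the branch delay slot:

    .data
    ......
    .text
    start:  set y, %r1
            ld [%r1], %r2
            set z, %r1
            ld [%r1], %r3
            mov %r0, %r4
            add %r2, 1, %r2
    top:    subcc %r2, 1, %r2
            bg,a top
            add %r4, %r3, %r4
            set x, %r1
            st %r4, [%r1]
    end:    ta 0

When bg,a top is taken, the processor again does not jump immediately. It first executes "add %r4, %r3, %r4" in the branch delay slot and then transfers to top:. When the branch condition is not met, the instruction in the delay slot is annulled.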
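To convince yourself that both loops compute the same x, the two listings can be run through a toy interpreter. Everything here is a simplified assumption rather than a real SPARC simulator: instructions are tuples, PC and nPC are list indices instead of byte addresses, the set/ld/st lines are replaced by preloaded registers, and bg is modelled as "the last subcc result was positive" instead of the full condition codes.

```python
def value(regs, operand):
    """Return an immediate as-is, or look up a register by name."""
    return operand if isinstance(operand, int) else regs[operand]


def run(program, start_regs):
    regs = dict(start_regs)
    pc, npc = 0, 1                  # indices into program
    positive = False                # stand-in for the condition codes tested by bg
    annul_next = False
    while True:
        op, *args = program[pc]
        next_npc = npc + 1
        if annul_next:
            annul_next = False      # annulled delay slot: instruction has no effect
        elif op == "halt":
            return regs
        elif op in ("add", "subcc"):
            a, b, d = args
            if op == "add":
                regs[d] = regs[a] + value(regs, b)
            else:
                regs[d] = regs[a] - value(regs, b)
                positive = regs[d] > 0
        elif op == "ba":
            next_npc = args[0]      # target goes into nPC, delay slot runs first
        elif op == "bg":
            target, annul = args
            if positive:
                next_npc = target
            elif annul:
                annul_next = True   # untaken bg,a squashes its delay slot
        pc, npc = npc, next_npc     # "nop" falls through to here unchanged


START = {"r2": 9, "r3": 0x42, "r4": 0}   # y, z, x after the loads

WITH_NOP = [
    ("add", "r2", 1, "r2"),
    ("ba", 4),
    ("nop",),
    ("add", "r4", "r3", "r4"),      # top
    ("subcc", "r2", 1, "r2"),       # test
    ("bg", 3, False),
    ("nop",),
    ("halt",),
]

WITH_ANNUL = [
    ("add", "r2", 1, "r2"),
    ("subcc", "r2", 1, "r2"),       # top
    ("bg", 1, True),                # bg,a top
    ("add", "r4", "r3", "r4"),      # delay slot
    ("halt",),
]

if __name__ == "__main__":
    print(run(WITH_NOP, START)["r4"], run(WITH_ANNUL, START)["r4"])
```

Changing the True in WITH_ANNUL to False shows why the ",a" matters: the add in the delay slot then runs once more on loop exit and x comes out one z too large.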

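The sethi instruction

Format:

    sethi const22, %reg

This instruction places const22 in the high 22 bits of reg and sets the low 10 bits of reg to 0.

    sethi 0x333333, %L1

0x333333 is 1100110011001100110011 in binary, so after the instruction %L1 holds 1100110011001100110011 0000000000.

Now for a quick Q&A:

Q: Why have an instruction like this?
A: Its purpose is to load a 32-bit constant (an address, for example) into a register. No single instruction can do that, because every instruction is itself 32 bits long and must also hold the opcode, flag bits, and so on. Using sethi together with add or or to fill in the low bits solves the problem.

For example, to set %L1 to 0x89abcdef:

1. Split 0x89abcdef into its high 22 bits and low 10 bits.
   0x89abcdef = 10001001101010111100110111101111
   High 22 bits: 1000100110101011110011 = 0x226af3
   Low 10 bits: 0111101111 = 0x1ef
2. Put the two parts into %L1.
       sethi 0x226af3, %L1
       or %L1, 0x1ef, %L1

At this point you may be cursing the instruction set designers: do I really have to split every constant by hand?

Remember the %hi(X) and %lo(X) operators introduced earlier? The first yields the high 22 bits of the constant X, and the second yields its low 10 bits. So we can write:

    sethi %hi(0x89abcdef), %L1
    or %L1, %lo(0x89abcdef), %L1

That is still a bit tedious, so there is an equivalent synthetic instruction, set (not a real SPARC instruction; it simply wraps the sethi ... or ... pair):

    set const32, %reg

The following explanation is from Peter Magnusson's "Understanding stacks and registers in the Sparc architecture(s)".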
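The 22/10 split can be double-checked mechanically. In this sketch the Python functions hi, lo, sethi and set32 are illustrative stand-ins for the %hi/%lo operators and the sethi and set instructions; they are not part of any assembler.

```python
MASK32 = 0xFFFFFFFF


def hi(x: int) -> int:
    """High 22 bits of a 32-bit constant, like %hi(x)."""
    return (x >> 10) & 0x3FFFFF


def lo(x: int) -> int:
    """Low 10 bits of a 32-bit constant, like %lo(x)."""
    return x & 0x3FF


def sethi(const22: int) -> int:
    """Register value after sethi: const22 in bits 31-10, low 10 bits cleared."""
    return (const22 << 10) & MASK32


def set32(x: int) -> int:
    """Model of the sethi + or pair that the synthetic set expands to."""
    reg = sethi(hi(x))
    return reg | lo(x)


if __name__ == "__main__":
    print(hex(hi(0x89ABCDEF)), hex(lo(0x89ABCDEF)), hex(set32(0x89ABCDEF)))
```

Running it reproduces the hand-computed halves for 0x89abcdef and confirms that recombining them gives back the original constant.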

At any given time, only one window is visible, as determined by the current window pointer (CWP) which is part of the processor status register (PSR). This is a five bit value that can be decremented or incremented by the SAVE and RESTORE instructions, respectively. These instructions are generally executed on procedure call and return (respectively). The idea is that the in registers contain incoming parameters, the local register constitute scratch registers, the out registers contain outgoing parameters, and the global registers contain values that vary little between executions. The register windows overlap partially, thus the out registers become renamed by SAVE to become the in registers of the called procedure. Thus, the memory traffic is reduced when going up and down the procedure call. Since this is a frequent operation, performance is improved. (That was the idea, anyway. The drawback is that upon interactions with the system the registers need to be flushed to the stack, necessitating a long sequence of writes to memory of data that is often mostly garbage. Register windows was a bad idea that was caused by simulation studies that considered only programs in isolation, as opposed to multitasking workloads, and by considering compilers with poor optimization. It also caused considerable problems in implementing high-end Sparc processors such as the SuperSparc, although more recent implementations have dealt effectively with the obstacles. Register windows is now part of the compatibility legacy and not easily removed from the architecture.)
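When the windows run out, an overflow trap occurs (registers are written out to memory); when all the windows are empty, an underflow trap occurs (registers are loaded back from memory).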

The "WIM" register is also indicated in the top left of figure 1. The window invalid mask is a bit map of valid windows. It is generally used as a pointer, i.e. exactly one bit is set in the WIM register indicating which window is invalid (in the figure it's window 7). Register windows are generally used to support procedure calls, so they can be viewed as a cache of the stack contents. The WIM "pointer" indicates how many procedure calls in a row can be taken without writing out data to memory. In the figure, the capacity of the register windows is fully utilized. An additional call will thus exceed capacity, triggering a window overflow trap. At the other end, a window underflow trap occurs when the register window "cache" is empty and more data needs to be fetched from memory.
fp = frame pointer, sp = stack pointer
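The idea of WIM as a "pointer" that limits how many calls fit in the register file can be sketched in a few lines. The function names and the default of 8 windows are assumptions for illustration; the check itself follows the V8 rule that SAVE traps when the WIM bit of the new CWP is set, and RESTORE likewise.

```python
def save_overflows(cwp: int, wim: int, nwindows: int = 8) -> bool:
    """True if a SAVE from window cwp would raise a window overflow trap."""
    new_cwp = (cwp - 1) % nwindows
    return bool(wim & (1 << new_cwp))


def restore_underflows(cwp: int, wim: int, nwindows: int = 8) -> bool:
    """True if a RESTORE from window cwp would raise a window underflow trap."""
    new_cwp = (cwp + 1) % nwindows
    return bool(wim & (1 << new_cwp))


def calls_before_overflow(cwp: int, wim: int, nwindows: int = 8) -> int:
    """Count consecutive SAVEs that succeed before one would trap."""
    count = 0
    for _ in range(nwindows):        # bounded, so wim == 0 cannot loop forever
        if save_overflows(cwp, wim, nwindows):
            break
        cwp = (cwp - 1) % nwindows
        count += 1
    return count


if __name__ == "__main__":
    wim = 1 << 7                     # window 7 marked invalid
    print(calls_before_overflow(6, wim), calls_before_overflow(0, wim))
```

With window 7 invalid and CWP at 0, the very next SAVE would trap, which corresponds to a fully utilized register file.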

The sparc register windows are, naturally, intimately related to the stack. In particular, the stack pointer (%sp or %o6) must always point to a free block of 64 bytes. This area is used by the operating system (Solaris, SunOS, and Linux at least) to save the current local and in registers upon a system interrupt, exception, or trap instruction. (Note that this can occur at any time.)
* If r[0] is addressed as a source operand (rs1 = 0 or rs2 = 0, or rd = 0 for a Store) the constant value 0 is read. When r[0] is used as a destination operand (rd = 0, excepting Stores), the data written is discarded (no r register is modified).
* The CALL instruction writes its own address into register r[15] (out register 7).
* When a trap occurs, the program counters PC and nPC are copied into registers r[17] and r[18] (local registers 1 and 2) of the trap’s new register window.
Peter Magnusson, "Understanding stacks and registers in the Sparc architecture(s)"

