A leaf subroutine is one that does not call any other subroutine. A simpler calling convention can be used for leaf subroutines.
A leaf subroutine uses only the first six out registers, and the global registers %g0 and %g1.
A leaf routine does not execute a save or a restore instruction. Examples of such subroutines include .mul and .div.
Because a leaf routine never executes a save, it runs in its caller's register window, so the return address is %o7 + 8 (and not %i7 + 8).
There is a synthetic instruction for this return:
    retl  ==>  jmpl %o7 + 8, %g0
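The return-address arithmetic can be checked with a small model. The sketch below is written in Python rather than SPARC assembly so that it can be run anywhere. The function names and the address 0x2000 are made up for illustration. It assumes that call stores its own address in %o7 and that the instruction following the call sits in the delay slot.

```python
WORD = 4  # every SPARC instruction is 4 bytes long

def call(pc):
    """Model `call`: the address of the call instruction itself goes into %o7."""
    o7 = pc
    return o7

def retl(o7):
    """Model `retl` (jmpl %o7 + 8, %g0): skip the call and its delay slot."""
    return o7 + 2 * WORD

call_address = 0x2000        # hypothetical address of the call instruction
o7 = call(call_address)
resume_at = retl(o7)
print(hex(resume_at))        # prints 0x2008
```

The "+ 8" is therefore two instruction words: one for the call and one for the delay-slot instruction that has already been executed.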
If the subroutine foo in the previous example is written as a leaf subroutine, then it should have the following code:
define(a8_s, arg8_s)
define(a7_s, arg7_s)
define(a6_r, o5)
define(a5_r, o4)
define(a4_r, o3)
define(a3_r, o2)
define(a2_r, o1)
define(a1_r, o0)

        .global _foo
_foo:   add     %a2_r, %a1_r, %o0       !o0 = first + second
        add     %a3_r, %o0, %o0         !o0 += third argument
        add     %a4_r, %o0, %o0         !o0 += fourth argument
        add     %a5_r, %o0, %o0         !o0 += fifth argument
        add     %a6_r, %o0, %o0         !o0 += sixth argument
        ld      [%sp + a7_s], %o1       !the seventh argument
        add     %o1, %o0, %o0           !o0 += seventh argument
        ld      [%sp + a8_s], %o1       !the eighth argument
        retl
        add     %o1, %o0, %o0           !o0 += eighth argument
Now consider the following C function, which exchanges two integers through pointers:

void swap(int *x, int *y)
{
        int temp;

        temp = *x;
        *x = *y;
        *y = temp;
}

The corresponding assembly code is given below:
define(x_s, -4)
define(y_s, -8)

        .global _main
_main:  save    %sp, (-64 -8) & -8, %sp
        mov     5, %o0
        st      %o0, [%fp + x_s]        !x = 5
        mov     7, %o0
        st      %o0, [%fp + y_s]        !y = 7
        add     %fp, x_s, %o0           !pointer to x in %o0
        call    _swap
        add     %fp, y_s, %o1           !pointer to y in %o1
        ret
        restore

        .global _swap                   ! a leaf routine
_swap:  ld      [%o0], %o2              !%o2 = x
        ld      [%o1], %o3              !%o3 = y
        st      %o2, [%o1]
        retl
        st      %o3, [%o0]
If x and y are initially stored in registers, then a few modifications to the code are required, as shown below:
define(x_r, l0)         ! x in %l0
define(y_r, l1)         ! y in %l1
define(x_s, -4)         ! where x might be stored on the stack
define(y_s, -8)         ! where y might be stored on the stack

        .global _main
_main:  save    %sp, -72, %sp
        mov     5, %x_r                 !x = 5
        mov     7, %y_r                 !y = 7, now call swap
        st      %x_r, [%fp + x_s]       ! place args on stack
        st      %y_r, [%fp + y_s]
        add     %fp, x_s, %o0           !pass pointers to args on stack
        call    _swap
        add     %fp, y_s, %o1
        ld      [%fp + x_s], %x_r       !move values back into reg.
        ld      [%fp + y_s], %y_r
        ret
        restore

        .global _swap                   ! a leaf routine
_swap:  ld      [%o0], %o2              !%o2 = x
        ld      [%o1], %o3              !%o3 = y
        st      %o2, [%o1]
        retl
        st      %o3, [%o0]
The principal design decision supporting efficiency is that instructions in the SPARC architecture have a constant size. Each instruction occupies a single word, 32 bits. This allows the instruction fetch unit to be very simple and fast: for each instruction executed, the i-fetch unit simply gets one word from memory.
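To see why a constant instruction size keeps fetching simple, here is a minimal sketch in Python (chosen only so the example can be run; it is not how hardware is built). The memory image and the fetch function are hypothetical, and branches and delay slots are ignored. The only facts it relies on are that each instruction is one 4-byte word and that SPARC stores words in big-endian order.

```python
import struct

WORD_SIZE = 4  # bytes per SPARC instruction

def fetch(memory, pc):
    """Return (instruction word, address of the next sequential instruction)."""
    (word,) = struct.unpack_from(">I", memory, pc)  # ">I": big-endian 32-bit word
    return word, pc + WORD_SIZE

# Hypothetical memory image holding two instruction words.
memory = bytes.fromhex("90040011" "01000000")

pc = 0
while pc < len(memory):
    word, pc = fetch(memory, pc)
    print(f"{word:08x}")
```

The fetch step never has to inspect the instruction to learn how long it is; the next address is always the current one plus four.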
The 32 bits that we have available to encode an instruction are divided into fields -- contiguous subsets of the bits. We need to define fields to specify the instruction's opcode, and fields to specify the instruction's arguments (operands).
The format of the instruction should be uniform and as simple as possible. A typical scheme is given below:
8 bits      - To specify the instruction (this allows 256 instructions)
5 bits each - For the addresses of the three registers (32 different registers), 15 bits in total
1 bit       - To specify whether the second argument is a source register or a constant
8 bits      - The remaining 8 bits are either combined with the 5 bits of the second source register to hold a 13-bit signed immediate constant, or used as an additional 8 bits to specify floating point instructions
The 8 bits of the instruction opcode are divided into 2 parts:
op  - 2 bits
op3 - 6 bits
The division of the instruction into fields is done in three different ways, called formats. The format is specified in the first two bits of the instruction:
OP    INSTRUCTION CLASS           FORMAT
00    Branch instruction          Format 2
01    Call instruction            Format 1
10    Format three instruction    Format 3
11    Format three instruction    Format 3
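The table can be read as a lookup on the top two bits of the instruction word. The following Python sketch illustrates this; the function name and the sample words are chosen for illustration only, and Python is used simply because it is easy to run.

```python
# Format selected by the two-bit op field, as in the table above.
FORMATS = {
    0b00: "Format 2",  # branch instructions
    0b01: "Format 1",  # call instruction
    0b10: "Format 3",
    0b11: "Format 3",
}

def instruction_format(word):
    """Return the format of a 32-bit instruction word from its top two bits."""
    op = (word >> 30) & 0b11
    return FORMATS[op]

# Sample words; only their top two bits matter for this lookup.
for word in (0x10800000, 0x40000000, 0x90040011, 0xD4020000):
    print(f"{word:08x} -> {instruction_format(word)}")
```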
A format 3 instruction is written in one of two ways:

    op rs1, rs2, rd

or

    op rs1, imm, rd

where "imm" is an immediate value, that is, it is coded right into the instruction.
For the first case (3 registers), the fields in the instruction look like this:
Width:    2     5      6      5     1      8       5
        +----+------+------+------+---+---------+------+
        |    |      |      |      | 0 |         |      |
        +----+------+------+------+---+---------+------+
Name:     op    rd    op3    rs1         empty    rs2

op:  10 or 11, indicating that this is a format 3 instruction
rd:  numeric ID of destination register
op3: encoding of opcode
rs1: numeric ID of first source register
rs2: numeric ID of second source register

The numeric IDs of registers are as follows:
%g0 - %g7 --  0 to 7
%o0 - %o7 --  8 to 15
%l0 - %l7 -- 16 to 23
%i0 - %i7 -- 24 to 31

Five bits are needed to encode a register, since there are only 32 of them. The numeric encoding of opcodes can be found in Chapter 8, pp 215-216.
We can see that eight bits have been left unused, which is necessary to "pad" the instruction out to the required 32 bit length.
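As a worked example of the three-register layout, the Python sketch below assembles the word for add %l0, %l1, %o0. The helper names are invented for this illustration, and it handles only the plain %g/%o/%l/%i names (not aliases such as %sp or %fp). It assumes op = 10 and op3 = 000000 for add, which is how the SPARC Version 8 opcode tables list it; check the tables cited above before relying on any other opcode.

```python
# First numeric ID of each register bank, from the list above.
REGISTER_BASE = {"g": 0, "o": 8, "l": 16, "i": 24}

def reg_number(name):
    """Map a register name such as '%l3' to its 5-bit numeric ID."""
    if (len(name) != 3 or name[0] != "%"
            or name[1] not in REGISTER_BASE or name[2] not in "01234567"):
        raise ValueError(f"not a register name: {name!r}")
    return REGISTER_BASE[name[1]] + int(name[2])

def encode_format3_reg(op, op3, rs1, rs2, rd):
    """Pack the fields op | rd | op3 | rs1 | 0 | unused | rs2 into 32 bits."""
    return ((op << 30) | (reg_number(rd) << 25) | (op3 << 19)
            | (reg_number(rs1) << 14)
            | (0 << 13)            # 0 here means the second operand is a register
            | reg_number(rs2))     # the eight unused bits stay zero

word = encode_format3_reg(0b10, 0b000000, "%l0", "%l1", "%o0")
print(f"{word:08x}")               # prints 90040011
```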
For the second case, (2 registers and an immediate value), the fields in the instruction look like this:
Width:    2     5      6      5     1         13
        +----+------+------+------+---+-----------------+
        |    |      |      |      | 1 |                 |
        +----+------+------+------+---+-----------------+
Name:     op    rd    op3    rs1        signed constant

op:  10 or 11, indicating that this is a format 3 instruction
rd:  numeric ID of destination register
op3: encoding of opcode
rs1: numeric ID of first source register

The way the processor tells these two cases apart is by the 1 or 0 right after the rs1 field.
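The role of that single bit can be illustrated by decoding a word back into its fields. This Python sketch is only a model of the decision, with made-up names (the bit is called "i" here, and the immediate "simm13"); it reads rs2 when the bit is 0 and a sign-extended 13-bit constant when it is 1.

```python
def decode_format3(word):
    """Split a format 3 instruction word into its fields."""
    fields = {
        "op":  (word >> 30) & 0x3,
        "rd":  (word >> 25) & 0x1F,
        "op3": (word >> 19) & 0x3F,
        "rs1": (word >> 14) & 0x1F,
        "i":   (word >> 13) & 0x1,   # the bit right after rs1
    }
    if fields["i"] == 0:
        fields["rs2"] = word & 0x1F          # register form
    else:
        simm13 = word & 0x1FFF               # immediate form
        if simm13 & 0x1000:                  # sign bit set: value is negative
            simm13 -= 0x2000
        fields["simm13"] = simm13
    return fields

print(decode_format3(0x90040011))   # register form:  rs2 is 17 (%l1)
print(decode_format3(0x90023FFF))   # immediate form: simm13 is -1
```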
Note that the signed constant is only allotted 13 bits. This is a consequence of our decision to limit all instructions to exactly 32 bits in size. The immediate value is encoded in two's complement, so the range of values we can encode is -4096 to +4095. Thus, if we try to use an immediate value in a format 3 instruction that is out of this range, the assembler will complain -- it just can't fit such a value in there.
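The range limit can be made concrete with a short Python sketch of what an assembler has to do with an immediate operand. The function names are hypothetical, registers are passed as numeric IDs, and the example again assumes op = 10 and op3 = 000000 for add.

```python
SIMM13_MIN, SIMM13_MAX = -4096, 4095   # range of a 13-bit two's complement value

def encode_simm13(value):
    """Return the 13-bit two's complement field for value, or raise ValueError."""
    if not SIMM13_MIN <= value <= SIMM13_MAX:
        raise ValueError(f"immediate {value} does not fit in 13 bits")
    return value & 0x1FFF

def encode_format3_imm(op, op3, rs1, imm, rd):
    """Pack op | rd | op3 | rs1 | 1 | signed constant; rs1 and rd are numeric IDs."""
    return ((op << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
            | (1 << 13) | encode_simm13(imm))

# add %o0, 1, %o0   (%o0 has numeric ID 8)
print(f"{encode_format3_imm(0b10, 0b000000, 8, 1, 8):08x}")   # prints 90022001

try:
    encode_format3_imm(0b10, 0b000000, 8, 5000, 8)
except ValueError as err:
    print("rejected:", err)
```

The second call fails in the same way the assembler would complain: 5000 lies outside -4096 to +4095, so it cannot be placed in the instruction.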
Seven bits select a format 3 instruction (the low bit of op together with the six bits of op3), so we can have a total of 128 possible instructions. All the opcodes that are not used are named UNIMP.