CLASS 20

Leaf Subroutines

A leaf subroutine is one that does not call any other subroutine. Leaf subroutines can use a simpler calling method.

A leaf subroutine may use only the first six out registers (%o0-%o5) and the global registers %g0 and %g1. It must not disturb %o6 (%sp) or %o7, which holds the return address.

A leaf routine does not execute a save or a restore instruction, so it runs in its caller's register window. Examples of such subroutines are .mul and .div.

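Because there is no save, the return address is still in %o7, so a leaf routine returns to %o7 + 8 (not %i7 + 8). The call instruction stores its own address in %o7, and the +8 skips both the call and the instruction in its delay slot.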

There is a synthetic instruction for this. The link address that jmpl would save is written to %g0, so it is discarded:

retl
==> jmpl %o7 + 8, %g0

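If the subroutine foo from the previous example is written as a leaf subroutine, it has the following code. Since no save is executed, the first six arguments are read directly from the out registers, and %sp still holds the caller's stack pointer, so the seventh and eighth arguments are loaded relative to %sp. The final add sits in the delay slot of retl, so it executes before control returns: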

        define(a8_s, arg8_s)
        define(a7_s, arg7_s)
        define(a6_r, o5)
        define(a5_r, o4)
        define(a4_r, o3)
        define(a3_r, o2)
        define(a2_r, o1)
        define(a1_r, o0)

        .global _foo

_foo:   add     %a2_r, %a1_r, %o0       !o0 = first + second
        add     %a3_r, %o0, %o0         !o0 += third argument
        add     %a4_r, %o0, %o0         !o0 += fourth argument
        add     %a5_r, %o0, %o0         !o0 += fifth argument
        add     %a6_r, %o0, %o0         !o0 += sixth argument
        ld      [%sp + a7_s], %o1       !the seventh argument
        add     %o1, %o0, %o0           !o0 += seventh argument
        ld      [%sp + a8_s], %o1       !the eighth argument
        retl
        add     %o1, %o0, %o0           !o0 += eighth argument
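As a point of comparison, here is a C function that computes what the comments in the listing describe: the sum of eight int arguments. It is a sketch only; the C source of foo from the previous example is not reproduced here, so the int types and the parameter names a1-a8 are assumptions. Since the function calls nothing, a C compiler targeting SPARC could also emit it as a leaf routine.

```c
#include <assert.h>

/* Hypothetical C counterpart of the leaf routine _foo above:
   it returns the sum of its eight int arguments. */
int foo(int a1, int a2, int a3, int a4,
        int a5, int a6, int a7, int a8)
{
    /* a1..a6 would arrive in %o0..%o5; a7 and a8 on the caller's stack */
    return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}
```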

Pointers as Arguments to Subroutines

Consider the following example:

void swap(int *x, int *y)
{
   int temp;
   temp = *x;
   *x = *y;
   *y = temp;
}

The corresponding assembly code, including a main program that calls swap, is given below:

        define(x_s, -4)
        define(y_s, -8)

        .global _main
_main:  save    %sp, (-64 - 8) & -8, %sp
        mov     5, %o0
        st      %o0, [%fp + x_s]        !x = 5
        mov     7, %o0
        st      %o0, [%fp + y_s]        !y = 7
        add     %fp, x_s, %o0           !pointer to x in %o0
        call    _swap
        add     %fp, y_s, %o1           !pointer to y in %o1 (delay slot)
        ret
        restore

        .global _swap                   !a leaf routine
_swap:  ld      [%o0], %o2              !%o2 = *x
        ld      [%o1], %o3              !%o3 = *y
        st      %o2, [%o1]              !*y = old *x
        retl
        st      %o3, [%o0]              !*x = old *y (delay slot)
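The address arithmetic in _main can be mirrored in C. In this sketch a two-word array stands in for the 8 bytes of locals, the char pointer fp plays the role of %fp, and x and y live at the byte offsets -4 and -8 below it. The function name main_model and the array are hypothetical; the point is only that add %fp, x_s, %o0 computes a pointer, just as &x does in C.

```c
#include <assert.h>
#include <stdint.h>

void swap(int32_t *x, int32_t *y)
{
    int32_t temp = *x;
    *x = *y;
    *y = temp;
}

/* Hypothetical model of _main's frame: fp stands in for %fp and the
   two locals sit at byte offsets -4 (x) and -8 (y) below it. */
void main_model(int32_t *out_x, int32_t *out_y)
{
    int32_t locals[2];                   /* [fp - 8], [fp - 4] */
    char *fp = (char *)(locals + 2);     /* one past the last local */
    int32_t *x = (int32_t *)(fp - 4);    /* add %fp, x_s, %o0 */
    int32_t *y = (int32_t *)(fp - 8);    /* add %fp, y_s, %o1 */

    *x = 5;                              /* st %o0, [%fp + x_s] */
    *y = 7;                              /* st %o0, [%fp + y_s] */
    swap(x, y);                          /* call _swap */
    *out_x = *x;
    *out_y = *y;
}
```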

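If x and y are initially held in registers, a few modifications are required. A register has no memory address, so main must first copy x and y to the stack, pass pointers to those stack copies, and reload the (now swapped) values into the registers after swap returns: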

        define(x_r, l0)         !x in %l0
        define(y_r, l1)         !y in %l1
        define(x_s, -4)         !where x is stored on the stack
        define(y_s, -8)         !where y is stored on the stack

        .global _main
_main:  save    %sp, -72, %sp
        mov     5, %x_r                 !x = 5
        mov     7, %y_r                 !y = 7; now call swap
        st      %x_r, [%fp + x_s]       !copy the registers to the stack
        st      %y_r, [%fp + y_s]
        add     %fp, x_s, %o0           !pass pointers to the stack copies
        call    _swap
        add     %fp, y_s, %o1           !(delay slot)
        ld      [%fp + x_s], %x_r       !reload the swapped values
        ld      [%fp + y_s], %y_r
        ret
        restore

        .global _swap                   !a leaf routine
_swap:  ld      [%o0], %o2              !%o2 = *x
        ld      [%o1], %o3              !%o3 = *y
        st      %o2, [%o1]              !*y = old *x
        retl
        st      %o3, [%o0]              !*x = old *y (delay slot)
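C expresses the same restriction through the register storage class: taking the address of a register variable is not allowed, so &x_r would not compile. The sketch below follows the modified _main step by step (copy to memory, pass pointers, reload). The name main_with_registers is hypothetical, and x_r and y_r merely stand in for %l0 and %l1; a compiler is free to ignore the register hint for allocation purposes.

```c
#include <assert.h>

void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Hypothetical sketch of the second _main: x_r and y_r play the role
   of %l0 and %l1. Writing &x_r here would be a compile error. */
void main_with_registers(int *out_x, int *out_y)
{
    register int x_r = 5;     /* mov 5, %x_r */
    register int y_r = 7;     /* mov 7, %y_r */
    int x_s = x_r;            /* st %x_r, [%fp + x_s] */
    int y_s = y_r;            /* st %y_r, [%fp + y_s] */

    swap(&x_s, &y_s);         /* pointers to the stack copies */

    x_r = x_s;                /* ld [%fp + x_s], %x_r */
    y_r = y_s;                /* ld [%fp + y_s], %y_r */
    *out_x = x_r;
    *out_y = y_r;
}
```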

Machine Instructions

Instruction Codes

Instructions also need a representation. Each instruction must encode its opcode and its arguments, and once again the main issue in designing the encoding is efficiency.

The principal design decision supporting efficiency is that instructions in the SPARC architecture have a constant size: each instruction occupies a single 32-bit word. This keeps the instruction fetch unit simple and fast, since for each instruction executed it simply gets one word from memory.

The 32 bits that we have available to encode an instruction are divided into fields -- contiguous subsets of the bits. We need to define fields to specify the instruction's opcode, and fields to specify the instruction's arguments (operands).

The format of the instruction should be uniform and as simple as possible. A typical scheme is given below:

8 bits - to specify the instruction (up to 256 instructions)
5 bits - for each of the three registers, 15 bits in all (32 different regs.)
1 bit  - to specify whether the second argument is a source register or a constant
8 bits - the remaining bits, either combined with the 5 bits of the second
         source register to form a 13-bit signed immediate constant, or used
         to provide 8 more bits for specifying floating-point instructions
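The arithmetic behind this scheme can be checked mechanically. The constants below restate the widths from the list (they describe this simplified scheme, not the exact SPARC layout), a compile-time assertion confirms that they fill one 32-bit word, and field_capacity is a hypothetical helper returning 2^bits.

```c
#include <assert.h>

/* Field widths of the simplified scheme above. */
enum {
    OPCODE_BITS = 8,
    REG_BITS    = 5,   /* per register; there are three registers */
    FLAG_BITS   = 1,   /* register-or-constant flag */
    EXTRA_BITS  = 8
};

/* Every bit is accounted for: 8 + 3*5 + 1 + 8 == 32. */
_Static_assert(OPCODE_BITS + 3 * REG_BITS + FLAG_BITS + EXTRA_BITS == 32,
               "fields must fill exactly one 32-bit word");

/* The signed constant reuses the 5 bits of the second source register. */
enum { IMM_BITS = EXTRA_BITS + REG_BITS };

/* Number of distinct values a field of the given width can hold. */
unsigned long field_capacity(int bits)
{
    return 1UL << bits;
}
```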

Instruction Decode

The 8 bits of the instruction opcode are divided into 2 parts:

op  -  2 bits

op3 -  6 bits

The division of the instruction into fields is done in three different ways, called formats. The format is specified by the first two bits of the instruction (the op field, bits 31-30):

OP    INSTRUCTION CLASS                              FORMAT

00    Branch instructions (and sethi)                Format 2
01    call instruction                               Format 1
10    Arithmetic, logical, and other instructions    Format 3
11    Load and store (memory) instructions           Format 3
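Decoding starts from the op field. This minimal sketch (the function names are hypothetical) extracts op from bits 31-30 and maps it to a format using the table above; it also pulls out the 6-bit op3 field, which occupies bits 24-19 of a Format 3 word.

```c
#include <assert.h>
#include <stdint.h>

/* Return the instruction format (1, 2 or 3) selected by op, bits 31-30. */
int instruction_format(uint32_t insn)
{
    switch (insn >> 30) {
    case 0:  return 2;   /* 00: branches (and sethi) */
    case 1:  return 1;   /* 01: call */
    default: return 3;   /* 10 and 11: Format 3 */
    }
}

/* Extract op3 (bits 24-19); meaningful only for Format 3 words. */
unsigned op3_field(uint32_t insn)
{
    return (insn >> 19) & 0x3Fu;
}
```

For instance, 0x01000000 (nop, which is sethi 0, %g0) decodes as Format 2, and 0x81C3E008 (the retl expansion jmpl %o7 + 8, %g0) decodes as Format 3 with op3 = 0x38. These two encodings come from the SPARC Version 8 manual rather than from these notes.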

Format 3 Instructions

There are two possibilities for Format 3 instructions:

        op      rs1, rs2, rd

        op      rs1, imm, rd

where "imm" is an immediate value, that is, a constant coded directly into the instruction.

For the first case (3 registers), the fields in the instruction look like this:

Width:     2    5     6      5    1     8       5
        +----+-----+------+-----+---+--------+-----+
        | op | rd  | op3  | rs1 | 0 | unused | rs2 |
        +----+-----+------+-----+---+--------+-----+
Bits:   31-30 29-25  24-19 18-14  13   12-5    4-0

        op:  10 or 11, indicating that this is a format 3 instruction
        rd:  numeric ID of destination register
        op3: encoding of opcode
        rs1: numeric ID of first source register
        i:   0, indicating that the second operand is a register
        rs2: numeric ID of second source register

The numeric IDs of registers are as follows:

        %g0 - %g7  -- 0 to 7
        %o0 - %o7  -- 8 to 15
        %l0 - %l7  -- 16 to 23
        %i0 - %i7  -- 24 to 31
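The numbering follows a simple rule: a base of 0, 8, 16, or 24 for the g, o, l, or i group, plus the register's digit. reg_id below is a hypothetical helper that applies this rule; it deliberately ignores aliases such as %sp (%o6) and %fp (%i6).

```c
#include <assert.h>

/* Map a name such as "%o7" to its 5-bit register ID, or -1 if invalid. */
int reg_id(const char *name)
{
    int base;

    /* Expect exactly '%', a group letter, and one digit 0-7. */
    if (name[0] != '%' || name[1] == '\0' ||
        name[2] < '0' || name[2] > '7' || name[3] != '\0')
        return -1;

    switch (name[1]) {
    case 'g': base = 0;  break;
    case 'o': base = 8;  break;
    case 'l': base = 16; break;
    case 'i': base = 24; break;
    default:  return -1;
    }
    return base + (name[2] - '0');
}
```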

Five bits are enough to encode a register, since 2^5 = 32 and only 32 registers (%g0-%i7) are visible at a time. The numeric encoding of opcodes can be found in Chapter 8, pp. 215-216.

In this form, eight bits (bits 12-5) are left unused; they simply pad the instruction out to the required 32-bit length.
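Packing the register form amounts to shifting each field into place. This sketch assumes the bit positions shown in the diagram; the op3 value 0 for add is taken from the SPARC Version 8 opcode table rather than from these notes, and encode_f3_reg is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Pack op rs1, rs2, rd in the register form:
   op(31-30) rd(29-25) op3(24-19) rs1(18-14) i=0(13) unused(12-5) rs2(4-0). */
uint32_t encode_f3_reg(unsigned op, unsigned rd, unsigned op3,
                       unsigned rs1, unsigned rs2)
{
    assert(op == 2 || op == 3);                /* binary 10 or 11 */
    assert(rd < 32 && rs1 < 32 && rs2 < 32 && op3 < 64);
    /* i (bit 13) and the unused bits 12-5 are left as zero */
    return ((uint32_t)op << 30) | ((uint32_t)rd << 25) |
           ((uint32_t)op3 << 19) | ((uint32_t)rs1 << 14) | rs2;
}
```

With op = 10 (2), rd = %o2 (10), op3 = 0, rs1 = %o0 (8), and rs2 = %o1 (9), encode_f3_reg(2, 10, 0, 8, 9) yields 0x94020009, the word for add %o0, %o1, %o2.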

For the second case (two registers and an immediate value), the fields in the instruction look like this:

Width:     2    5     6      5    1         13
        +----+-----+------+-----+---+-----------------+
        | op | rd  | op3  | rs1 | 1 | signed constant |
        +----+-----+------+-----+---+-----------------+
Bits:   31-30 29-25  24-19 18-14  13       12-0

        op:  10 or 11, indicating that this is a format 3 instruction
        rd:  numeric ID of destination register
        op3: encoding of opcode
        rs1: numeric ID of first source register
        signed constant: the 13-bit immediate value

The processor tells these two cases apart by the i bit (bit 13), the 1 or 0 immediately after the rs1 field.

Note that the signed constant is allotted only 13 bits. This is a consequence of our decision to limit every instruction to exactly 32 bits. The immediate value is encoded in two's complement, so the range of values we can encode is -4096 to +4095 (-2^12 to 2^12 - 1). If we try to use an immediate value outside this range in a format 3 instruction, the assembler will complain: it simply cannot fit such a value in the field.
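The immediate form differs from the register form only in setting i = 1 and storing the low 13 bits of the constant. The sketch below (hypothetical names) first checks the -4096 to +4095 range and returns false where an assembler would complain. It assumes jmpl has op3 = 0x38 (binary 111000), as listed in the SPARC Version 8 manual; with that value, the retl expansion jmpl %o7 + 8, %g0 packs to 0x81C3E008.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v fits in a 13-bit two's-complement field. */
bool fits_simm13(int32_t v)
{
    return v >= -4096 && v <= 4095;
}

/* Pack op rs1, imm, rd: i = 1 (bit 13), constant in bits 12-0.
   Returns false, as the assembler would complain, if imm is out of range. */
bool encode_f3_imm(unsigned op, unsigned rd, unsigned op3,
                   unsigned rs1, int32_t imm, uint32_t *out)
{
    if (!fits_simm13(imm))
        return false;
    *out = ((uint32_t)op << 30) | ((uint32_t)rd << 25) |
           ((uint32_t)op3 << 19) | ((uint32_t)rs1 << 14) |
           (1u << 13) |
           ((uint32_t)imm & 0x1FFFu);   /* keep only the low 13 bits */
    return true;
}
```

Negative constants work because converting -1 to uint32_t gives all ones, and the mask keeps 0x1FFF, the 13-bit two's-complement form of -1.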

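For format 3 instructions, op is either 10 or 11, so it contributes one distinguishing bit. Together with the 6-bit op3 field, that gives 7 bits, so we can have a total of 2^7 = 128 possible instructions. All the opcodes that are not used are named UNIMP.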