If-then is pretty simple: complement the condition and jump OVER the body of the "if" when the original condition is not true.
To translate:

/* six.c */
d = a;
if ((a + b) > c) {
    a += b;
    c++;
}
a = c + d;

We would get (before filling delay slots):

/* six.s */
        mov     %a_r, %d_r
        add     %a_r, %b_r, %o0
        cmp     %o0, %c_r
        ble     next
        nop                         ! delay slot
        add     %a_r, %b_r, %a_r
        add     %c_r, 1, %c_r
next:   add     %c_r, %d_r, %a_r

Fill the delay slot with the first prior instruction that doesn't affect the condition code:

/* six1.s */
        add     %a_r, %b_r, %o0
        cmp     %o0, %c_r
        ble     next
        mov     %a_r, %d_r          ! delay slot
        add     %a_r, %b_r, %a_r
        add     %c_r, 1, %c_r
next:   add     %c_r, %d_r, %a_r

This is best. However, sometimes you just can't find a good instruction to fill the delay slot. In that case, using the annulled version, you can fill the delay slot "half the time":

/* six2.s */
        mov     %a_r, %d_r
        add     %a_r, %b_r, %o0
        cmp     %o0, %c_r
        ble,a   next
        add     %c_r, %d_r, %a_r    ! delay slot, executed only if the branch is taken
        add     %a_r, %b_r, %a_r
        add     %c_r, 1, %c_r
        add     %c_r, %d_r, %a_r
next:

Trace the number of instructions executed if the branch is taken, and if the branch isn't taken.
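One way to carry out that trace is to mirror six2.s statement by statement in C and count as you go. The sketch below is only an illustration: the names six_c, six2_trace and struct trace are made up for this example, and it assumes that an annulled delay slot is not counted as an executed instruction (it is tallied separately).

```c
#include <assert.h>

/* Result of a trace: final value of a, instructions executed,
 * and delay slots that were annulled. (Hypothetical helper type.) */
struct trace {
    int a;
    int executed;
    int annulled;
};

/* Reference behaviour, straight from six.c. */
int six_c(int a, int b, int c)
{
    int d = a;
    if ((a + b) > c) {
        a += b;
        c++;
    }
    a = c + d;
    return a;
}

/* Hand translation of six2.s; each counted line is one SPARC instruction. */
struct trace six2_trace(int a, int b, int c)
{
    const int a0 = a, c0 = c;       /* saved only for the self-check below */
    struct trace t = {0, 0, 0};
    int d, o0;

    d = a;              t.executed++;   /* mov   %a_r, %d_r        */
    o0 = a + b;         t.executed++;   /* add   %a_r, %b_r, %o0   */
                        t.executed++;   /* cmp   %o0, %c_r         */
                        t.executed++;   /* ble,a next              */
    if (o0 <= c) {
        /* Branch taken: the delay slot runs, then control reaches next. */
        a = c + d;      t.executed++;   /* add   %c_r, %d_r, %a_r  */
    } else {
        /* Branch not taken: the ",a" annuls the delay slot. */
        t.annulled++;
        a = a + b;      t.executed++;   /* add   %a_r, %b_r, %a_r  */
        c = c + 1;      t.executed++;   /* add   %c_r, 1, %c_r     */
        a = c + d;      t.executed++;   /* add   %c_r, %d_r, %a_r  */
    }

    assert(a == six_c(a0, b, c0));      /* translation agrees with six.c */
    t.a = a;
    return t;
}
```

Calling six2_trace(1, 2, 10) follows the taken path and counts five executed instructions; six2_trace(5, 6, 10) falls through and counts seven, plus one annulled slot.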
Now, as before, we complement the condition and jump over the first block. Don't interchange blocks -- it makes the assembly code too hard to compare to the C code.
/* seven.c */
if ((a + b) >= c) {
    a += b;
    c++;
} else {
    a -= b;
    c--;
}
c += 10;

This translates to (before filling delay slots):

/* seven.s */
        add     %a_r, %b_r, %o0
        cmp     %o0, %c_r
        bl      else
        nop                         ! delay slot
        add     %a_r, %b_r, %a_r
        add     %c_r, 1, %c_r
        ba      next
        nop                         ! delay slot
else:   sub     %a_r, %b_r, %a_r
        sub     %c_r, 1, %c_r
next:   add     %c_r, 10, %c_r

There are a number of ways to get rid of the nops here. The first nop can be removed using an annulled branch:

/* seven1.s */
        add     %a_r, %b_r, %o0
        cmp     %o0, %c_r
        bl,a    else
        sub     %a_r, %b_r, %a_r    ! delay slot, executed only if the branch is taken
        add     %a_r, %b_r, %a_r
        add     %c_r, 1, %c_r
        ba      next
        nop                         ! delay slot
        sub     %a_r, %b_r, %a_r    ! never reached: follows the ba and its delay slot
else:   sub     %c_r, 1, %c_r
next:   add     %c_r, 10, %c_r

The second nop can be removed by moving code (e.g., the c++ line):

/* seven2.s */
        add     %a_r, %b_r, %o0
        cmp     %o0, %c_r
        bl,a    else
        sub     %a_r, %b_r, %a_r    ! delay slot, executed only if the branch is taken
        add     %a_r, %b_r, %a_r
        ba      next
        add     %c_r, 1, %c_r       ! delay slot: the c++ line
        sub     %a_r, %b_r, %a_r    ! never reached: follows the ba and its delay slot
else:   sub     %c_r, 1, %c_r
next:   add     %c_r, 10, %c_r
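To check a translation such as seven2.s against seven.c, it can help to write the same control flow in C with gotos. The sketch below is an illustration only: the function names and the pointer-based signatures are assumptions made for this example, and each delay-slot instruction is written at the point where it takes effect.

```c
#include <assert.h>

/* seven.c as written, operating on the caller's a and c. */
void seven_c(int *a, int b, int *c)
{
    if ((*a + b) >= *c) {
        *a += b;
        (*c)++;
    } else {
        *a -= b;
        (*c)--;
    }
    *c += 10;
}

/* Control flow of seven2.s: complemented condition, blocks in source order. */
void seven2_goto(int *a, int b, int *c)
{
    int o0 = *a + b;            /* add  %a_r, %b_r, %o0                 */
    if (o0 < *c) {              /* cmp  %o0, %c_r ; bl,a else           */
        *a = *a - b;            /* delay slot, runs only when taken     */
        goto else_label;
    }
    *a = *a + b;                /* add  %a_r, %b_r, %a_r                */
    *c = *c + 1;                /* delay slot of "ba next" (the c++)    */
    goto next;                  /* ba   next                            */
else_label:
    *c = *c - 1;                /* sub  %c_r, 1, %c_r                   */
next:
    *c = *c + 10;               /* add  %c_r, 10, %c_r                  */
}

/* Run both versions on the same inputs and insist that they agree. */
void seven_check(int a, int b, int c)
{
    int a1 = a, c1 = c;
    int a2 = a, c2 = c;
    seven_c(&a1, b, &c1);
    seven2_goto(&a2, b, &c2);
    assert(a1 == a2 && c1 == c2);
}
```

Calling seven_check with inputs on both sides of the comparison, for example (5, 3, 4) and (1, 2, 10), exercises the "then" path and the "else" path respectively.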
The assembly code for the following is given below:

/* eight.c */
for (i = 15; i > 3; i--) {
    if (i == 12)
        j = 10;
    else
        j = 12;
}

/* eight.s */
        mov     15, %i_r
loop:   cmp     %i_r, 3
        ble     exit
        nop
        cmp     %i_r, 12
        bne     else
        nop
        mov     10, %j_r
        sub     %i_r, 1, %i_r
        ba      loop
        nop
else:   mov     12, %j_r
        sub     %i_r, 1, %i_r
        ba      loop
        nop
exit:

I leave it as an exercise for you to optimize this piece of code to get rid of all the nops (if possible!).
There are two program counters, %pc and %npc. On each cycle, %npc is copied to %pc, and %npc itself is either incremented by four or, when a control transfer is taken, loaded with the target address. One can understand how delay slots work by tracking the contents of %pc and %npc. Because there are TWO program counters, the effective depth of the SPARC pipeline is 2.
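The bookkeeping described above can be sketched in a few lines of C. This is a toy model, not a SPARC emulator: the struct, the function names, and the idea of describing a program as an array of optional branch targets are all assumptions made for illustration. Every branch is treated as taken, and annulled branches are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BRANCH (-1)

/* Toy machine state: the two program counters, as byte addresses. */
struct cpu {
    int pc;
    int npc;
};

/*
 * Execute one instruction. target[i] describes the instruction at address
 * 4*i: NO_BRANCH for an ordinary instruction, otherwise the address that a
 * taken branch transfers to. Returns the address of the instruction executed.
 */
int step(struct cpu *s, const int *target, size_t n)
{
    int executed = s->pc;
    size_t idx = (size_t)(executed / 4);
    assert(executed >= 0 && executed % 4 == 0 && idx < n);

    int t = target[idx];
    s->pc = s->npc;                              /* %npc is always copied to %pc */
    s->npc = (t == NO_BRANCH) ? s->npc + 4 : t;  /* +4, or the branch target     */
    return executed;
}

/* Run `steps` instructions from address 0, recording each executed address. */
void run(const int *target, size_t n, int *trace, size_t steps)
{
    struct cpu s = { 0, 4 };
    for (size_t i = 0; i < steps; i++)
        trace[i] = step(&s, target, n);
}
```

For a seven-instruction program whose instruction at address 4 branches to address 20, the recorded addresses are 0, 4, 8, 20, 24: the instruction at address 8, right after the branch, runs before the target does. That instruction is the delay slot.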
The net effect is that the instruction immediately after a branch or call is always executed (unless an annulled branch cancels it). Such instructions are called "delay slot" instructions.
Clearly, one can always put nops in there. How can this be more useful? Well, it turns out that there is almost always an instruction that can be put there. However, that instruction must not modify data that affects the branch!
To fill a delay slot, find an instruction that can be placed immediately before the branch, but doesn't affect the condition tested by the branch.
Consider translating the following code into assembly:

/* nine.c */
b = 0;
if (a <= 17) {
    a++;
}

Assume we keep b in %l0 and we keep a in %l1. The simplest translation would be:

/* nine.s */
        clr     %l0
        subcc   %l1, 17, %g0
        bg      end
        nop                         ! delay slot
        add     %l1, 1, %l1
end:    ! (rest of program here)

This code is correct because we have put a "nop" in the delay slot. A nop instruction simply does nothing. However, it is also a waste of the processor's time. The code is less efficient when it contains a nop -- it takes longer to get the job done. What can we do?

We observe that the instruction "clr %l0" doesn't affect the branch condition. So we move it into the delay slot:

/* nine1.s */
        subcc   %l1, 17, %g0
        bg      end
        clr     %l0                 ! delay slot
        add     %l1, 1, %l1
end:    ! (rest of program here)

This code is more efficient, because when the branch is not taken it only requires 4 clock cycles to execute instead of 5 as before (3 instead of 4 when the branch is taken). However, when reading the program, we have to read it OUT OF ORDER -- we think of the "clr" instruction as happening BEFORE the "bg" instruction.
How should we fill a delay slot in general? We cannot move an instruction into the delay slot that affects the condition codes we wish to test. Find an instruction prior to the delay slot that doesn't affect the branch condition.
One thing you should never do is put a control transfer instruction into a delay slot. Chaos would ensue, as can be seen from tracking the contents of %pc and %npc for such a scenario.
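The scenario can be traced with a small C sketch of the %pc/%npc update rule. This is a simplified model under stated assumptions: both branches are unconditional and not annulled, the addresses 100 and 200 are arbitrary, and the names trace_pcs and fill_couple are invented for this example.

```c
#include <assert.h>

enum { NONE = -1, N_INSNS = 64 };

/*
 * Follow %pc/%npc for `steps` instructions starting at address 0.
 * branch_to[i] is NONE for an ordinary instruction at address 4*i,
 * or the target address of an always-taken branch.
 */
void trace_pcs(const int branch_to[N_INSNS], int *executed, int steps)
{
    int pc = 0, npc = 4;
    for (int i = 0; i < steps; i++) {
        assert(pc >= 0 && pc / 4 < N_INSNS);
        int target = branch_to[pc / 4];
        executed[i] = pc;                           /* instruction at %pc runs   */
        pc = npc;                                   /* %npc is copied to %pc     */
        npc = (target == NONE) ? npc + 4 : target;  /* +4, or the branch target  */
    }
}

/* Build the problem case: a branch at address 0 whose delay slot
 * (address 4) holds a second branch. */
void fill_couple(int branch_to[N_INSNS])
{
    for (int i = 0; i < N_INSNS; i++)
        branch_to[i] = NONE;
    branch_to[0] = 100;     /* ba A   (A at address 100)                     */
    branch_to[1] = 200;     /* ba B   (B at address 200), in the delay slot  */
}
```

Tracing five instructions of this program gives the addresses 0, 4, 100, 200, 204. Exactly one instruction at A executes before control lands at B, which is almost never what the programmer intended.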
Delay Slots - things to remember: