Using the Delay Slot Can Sometimes Save a Tick When We Have a Control Hazard
What is the problem?
- Look at the diagram on page 317 of the book.
- It is a sequence of instructions beginning with a beq.
- The lw at the bottom is the target of the branch
(there's a blue arrow telling us that).
- Now look at the addresses of these instructions: 40, 44, 48, 52.
- Then there's a gap in the numbers, and the lw is at address 72
(maybe they should have put ... in the gap); there must be instructions at those intervening addresses
(56, 60, 64, and 68), but they don't enter into the discussion here,
so they didn't waste space on them.
- By the way, those numbers are way too small to be real instruction addresses
(I certainly hope that you know why by now).
- However, that's not the point of this example, and we're going to have to do arithmetic with those
numbers (in our heads, yet), so they did us a favor by making them small.
- Now let's concentrate on the beq instruction for a minute:
they've confused the issue by using an incorrect representation of that instruction.
- The assembly-language view is "beq $1 $3 72" (remember, when PC-relative mode is written out
as assembly language, it looks like direct mode, and direct mode uses
the actual address of the target, which is 72 in this case)
- The offset from the next instruction (at 44) to the target (at 72) is 28, and the times-4 trick
divides that by 4 to give us 7, so the box view looks like this:
- They have written the instruction as "beq $1 $3 28", which doesn't correspond to either view.
- They have used the offset (28) instead of the target address (72) in the fourth field.
- I have no idea why they wanted to do that; it's confusing.
- When the branch is taken, the lw must be the next instruction to be executed
after the branch.
- Look at pages 301 and 302; you can see there that the branch instruction completes in tick 4.
- By this time, three other instructions have been fetched into the pipeline behind the branch.
- The whole point of this branch is to skip around those instructions
and go straight to the lw at address 72.
- But they are already in the pipeline, so we need to do something about that.
- This problem is called a control hazard.
- We obviously need to keep those instructions from executing; how might we do that?
- We can flush them all out of the pipeline, i.e. turn them all into nop's.
- This wastes three clock ticks each time we do it. (gasp!)
- We can find a better way. Look at the diagrams on page 320. There is some new hardware.
- In tick 2 there is a comparator that tests the two outputs of the register file for equality.
- Tick 2 also has a new adder that calculates a branch target for every instruction,
whether it will be used or not. That's fine; we just ignore it if we don't need it.
- By the end of tick 2, the instruction has been decoded, so we know whether it is a branch or not.
- Now the decision to branch or not to branch is made at the end of tick 2, instead of in tick 4.
- This way, there is only one instruction in the pipeline behind the branch,
so at most one tick will be wasted.
- In the upper diagram, the beq is in tick 2 and the and
is being fetched in tick 1.
- If the decision is not to take the branch, everything proceeds normally, and there is no hazard.
- On the other hand, if the decision is to take the branch, we must take unusual action:
- The and that was just fetched must not execute.
- The next instruction fetched must be the lw at address 72.
- The second diagram shows this:
- The lw at address 72 is being fetched in tick 1.
- The and has advanced to tick 2 and has been turned into a
nop.
- The beq and the instructions ahead of it in the pipeline
have advanced normally.
- Now we waste only one clock tick instead of three. (yay!!!)
- One more problem solved (well, almost solved) by throwing money at it.
Sometimes we can save that wasted tick.
- Maybe there is an instruction before the branch that could just as well be after it.
- Then we could move it to the position right behind the branch;
this place is called the delay slot.
- In that position, it will be fetched next after the branch,
and it won't need to be turned into a nop.
- What are the qualifications for such an instruction?
- It must not change either register that the branch reads.
- It must not change any register that is read by any instruction
between its original position and the branch.
- It must not read any register that is changed by any instruction between
its original position and the branch.
- It must not change any register that is changed by any instruction between
its original position and the branch.
- Such an instruction cannot always be found, but sometimes one can.
- It is the job of the assembler to look for such an instruction and, if it can find one,
to move it to the delay slot.
- The programmer does not have to worry about this.
The programmer does not even need to know about this.
Here's a short example showing instructions that all fail the test:
- the or instruction changes $8, which is read by the branch.
- the add instruction changes $9, which is read by the or.
- the and instruction reads $8, which is changed by the or.
- the sub instruction changes $9, which is changed by the add.