pipelining

To make it easier to navigate this post, here are the contents:

--------------------------------------

 A.   Data Hazards
  1.  Needing to forward
  2.  Needing to stall
  3.  Not needing anything
 B.  Control hazards
  1.  Flushing the pipeline - jump
  2.  Flushing the pipeline - branch

--------------------------------------





====================
 A.  DATA HAZARDS
====================

In our 5-stage pipelined architecture, all instructions take 5
cycles.  What's important to remember is *when* certain pieces of
data are ready.  In particular,
  IF   calculate pc+4
  ID   read registers / jump target / jump taken
  EX   know ALU result / branch condition+target / branch taken
  MEM  access memory
  WB   write registers

When instructions do not depend on one another's results, there
are no hazards.  The pipeline runs smoothly and we can get a
maximum of 5x speedup (assuming all stages are full, all the
time).  For example,

addi    $1, $0, 4
addi    $2, $2, 4
add     $0, $0, $0

the pipeline diagram of the stages is

        1  2  3  4   5   6   7
addi    IF ID EX MEM WB
addi       IF ID EX  MEM WB
add           IF ID  EX  MEM WB
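The ideal schedule above follows a simple rule: with no hazards, instruction i (counting from 0) is in stage s (IF=0 ... WB=4) during cycle i + s + 1, so n instructions finish in n + 4 cycles instead of 5n.  A minimal Python sketch, assuming one instruction fetched per cycle and no stalls, that reproduces the diagram and shows where the "5x" figure comes from:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n):
    """For each of n independent instructions, map cycle number -> stage.

    Instruction i (counting from 0) is fetched in cycle i + 1 and
    is in stage s (IF=0 ... WB=4) during cycle i + s + 1.
    """
    return [{i + s + 1: stage for s, stage in enumerate(STAGES)}
            for i in range(n)]

def total_cycles(n):
    # The first instruction needs 5 cycles; each later one finishes
    # exactly one cycle after the one before it.
    return n + len(STAGES) - 1

def speedup(n):
    # Without pipelining, each instruction would use all 5 cycles alone.
    return n * len(STAGES) / total_cycles(n)

for row in schedule(3):
    print("  ".join(f"{cycle}:{stage}" for cycle, stage in row.items()))
print(total_cycles(3), round(speedup(3), 2), round(speedup(1000), 2))
```

For the three instructions above this gives 7 cycles, as in the diagram, and a speedup of only 15/7 (about 2.1x); 5x is the limit as the number of instructions grows (about 4.98x for 1000).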

-----------------------
 1.  Needing to forward
-----------------------

The problem arises when an instruction needs a result that an
earlier instruction, still in the pipeline, has not yet written
back.  This is called a data dependency, and it is a potential
hazard: are we going to get the data in time?  For example,

addi    $1, $1, 4
lw      $2, 0($1)

has a dependency between $1 in the addi and in the lw.  This will
need to be resolved by forwarding, as we will see.

Note that the addi has the result of the addition at the end of
EXecute (cycle 3), but does not write it back into the register
bank until WB (cycle 5).  The lw instruction wants the data in ID
(cycle 3), when it reads the registers, or at the latest in EX
(cycle 4), when it calculates the address.  Solution: we can
forward the result of the addi from the end of its EX stage
(cycle 3) into the EX stage of the lw (cycle 4).

        1  2  3  4   5   6
addi    IF ID EX\MEM WB
lw         IF ID`EX  MEM WB

where the \ and ` together form the forwarding arrow. :)
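The forwarding decision can be made by comparing register numbers held in the pipeline registers.  The sketch below is modelled on the usual textbook forwarding-unit conditions, not on anything these notes specify; the `Latch` class and its field names are hypothetical.  It decides whether the instruction now in EX should take a source operand from the instruction one ahead of it (sitting in EX/MEM, as with the addi/lw pair above) or two ahead (sitting in MEM/WB).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Latch:
    """Hypothetical view of a pipeline register: does the instruction
    in it write a register, and if so which one?"""
    reg_write: bool
    rd: Optional[int]

def forward_source(src: int, ex_mem: Latch, mem_wb: Latch) -> str:
    """Where should the EX stage get source register `src` from?"""
    if src == 0:
        return "regfile"  # $0 is hardwired to zero; never forward it
    # Check the newer producer first (one instruction ahead, now in MEM),
    # so it wins if both latches write the same register.
    if ex_mem.reg_write and ex_mem.rd == src:
        return "EX/MEM"
    # Otherwise a producer two instructions ahead (now in WB).
    if mem_wb.reg_write and mem_wb.rd == src:
        return "MEM/WB"
    return "regfile"

# Cycle 4: addi $1, $1, 4 is in MEM with its ALU result in EX/MEM,
# and lw $2, 0($1) is in EX needing $1.
print(forward_source(1, ex_mem=Latch(True, 1), mem_wb=Latch(False, None)))
```

For the example this picks EX/MEM, i.e. the addi's ALU result is routed straight into the lw's address calculation.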


---------------------
 2.  Needing to stall
---------------------

Forwarding alone is not always enough; sometimes you must also
stall.  For example,

lw      $1, 0($2)
addi    $4, $1, 4

is a problem: the value of $1 that the addi needs is not back
from memory until the lw finishes its MEM stage.

        1  2  3  4   5   6
lw      IF ID EX MEM\WB
addi       IF ID    `EX  MEM WB

Because the data isn't back from MEM until the end of cycle 4,
but the addi would need it in EX at the beginning of cycle 4
(which is impossible), we need to stall the addi for one cycle and
forward from the end of MEM (cycle 4) into EX (cycle 5).

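Note that the stall holds up everything behind the lw: in cycle 4
the addi stays in ID, the next instruction stays in IF, and nothing
enters EX, so only the lw itself makes progress (its MEM stage).
If there were more instructions, the flow would look like this,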

        1  2  3  4   5   6   7   8   9   10
lw      IF ID EX MEM\WB
addi       IF ID    `EX  MEM WB
nop           IF     ID  EX  MEM WB
nop                  IF  ID  EX  MEM WB
nop                      IF  ID  EX  MEM WB

with a big hole or "bubble" in cycle 4.
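This case can be detected while the dependent instruction is still in ID: if the instruction ahead of it (now in EX) is a load whose destination is one of its sources, forwarding cannot deliver the value in time and one bubble is needed.  A minimal sketch of that check, following the common textbook hazard-detection rule; the `Instr` tuple is a hypothetical decoded-instruction format made up for illustration.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    """Hypothetical decoded instruction."""
    name: str
    is_load: bool
    dest: Optional[int]     # register written, if any
    srcs: Tuple[int, ...]   # registers read

def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """True if the instruction in ID must wait one cycle for the load in EX."""
    return (in_ex.is_load
            and in_ex.dest not in (None, 0)
            and in_ex.dest in in_id.srcs)

def cycles(n: int, bubbles: int) -> int:
    """Total cycles for n instructions in the 5-stage pipeline with some stalls."""
    return n + 4 + bubbles

lw = Instr("lw", True, 1, (2,))        # lw   $1, 0($2)
addi = Instr("addi", False, 4, (1,))   # addi $4, $1, 4
stall = load_use_stall(lw, addi)
print(stall, cycles(5, int(stall)))
```

For the five-instruction flow above it reports a stall and 10 cycles, one more than the 9 an unstalled run would take.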

-------------------------
 3.  Not needing anything
-------------------------

Sometimes data dependencies work themselves out.  For example,

add     $1, $1, $0
lw      $2, 0($3)
sub     $5, $0, $4
xor     $3, $4, $0
addi    $3, $1, 4

there is a dependency between the add and the addi ($1), but if
you look at the timing for it,

        1  2  3  4   5   6   7   8   9
add     IF ID EX MEM WB
lw         IF ID EX  MEM WB
sub           IF ID  EX  MEM WB
xor              IF  ID  EX  MEM WB
addi                 IF  ID  EX  MEM WB

the data is already in the register bank in cycle 5, and it's
read out of the register bank in cycle 6!  So there is no need to
forward or stall.
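The reasoning above (written in cycle 5, read in cycle 6) can be checked mechanically for every read-after-write pair.  A minimal sketch, assuming one instruction fetched per cycle with no stalls, and using the same strict test as the text: a read is safe only if it happens in a later cycle than the write.  Whether a read in the same cycle as the write would also be safe depends on the register file design, which this sketch does not model.

```python
# The example above, decoded by hand: (mnemonic, destination, sources).
PROGRAM = [
    ("add",  "$1", ["$1", "$0"]),
    ("lw",   "$2", ["$3"]),
    ("sub",  "$5", ["$0", "$4"]),
    ("xor",  "$3", ["$4", "$0"]),
    ("addi", "$3", ["$1"]),
]

def id_cycle(i):
    # Instruction i (from 0) is fetched in cycle i + 1 and reads registers in ID.
    return i + 2

def wb_cycle(i):
    # It writes its result to the register bank in WB, four cycles after IF.
    return i + 5

def raw_dependencies(program):
    """Yield (producer, consumer, register) for each read-after-write pair,
    matching every source with the most recent earlier writer."""
    for j, (_, _, sources) in enumerate(program):
        for reg in sources:
            if reg == "$0":
                continue  # $0 always reads as zero, so it is never a dependency
            for i in range(j - 1, -1, -1):
                if program[i][1] == reg:
                    yield i, j, reg
                    break

for i, j, reg in raw_dependencies(PROGRAM):
    ok = id_cycle(j) > wb_cycle(i)
    print(f"{PROGRAM[i][0]} -> {PROGRAM[j][0]} ({reg}): written in cycle "
          f"{wb_cycle(i)}, read in cycle {id_cycle(j)}: "
          f"{'nothing needed' if ok else 'forward or stall'}")
```

The only dependency it finds is add -> addi on $1, and it reports that nothing is needed.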


======================
 B.  CONTROL HAZARDS
======================

A control hazard happens when you have a branch or jump, i.e.,
you are modifying the control flow of the program.  For example,
in a predict-not-taken (PNT) scheme where jump targets are
computed and jumps are taken in ID, you don't know it's a jump
until the end of ID... which means you've already fetched the next
instruction.  For a branch, you don't know whether it's taken
until EX, so you have already fetched two instructions (and
decoded one).  If the prediction is wrong, you need to flush those
instructions from the pipeline.
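The number of wrong instructions is simply how many fetches happen before the outcome is known.  Counting IF as stage 0, a control instruction resolved in stage k has had k more instructions fetched behind it from PC+4 onward.  A small sketch under that assumption (one fetch per cycle, predict-not-taken, no branch delay slot), using the resolution stages from the timing list at the top of these notes:

```python
# Stage in which each kind of control instruction is resolved (IF = 0).
RESOLVED_IN = {"jump": 1, "branch": 2}   # ID, EX

def wrong_path_fetches(kind, mispredicted):
    """Instructions fetched from PC+4 onward before the outcome is known;
    all of them must be flushed if the guess was wrong."""
    return RESOLVED_IN[kind] if mispredicted else 0

# Under predict-not-taken a jump is always a wrong guess: it always redirects.
print(wrong_path_fetches("jump", True))      # 1: the instruction after the jump
print(wrong_path_fetches("branch", True))    # 2: a taken branch
print(wrong_path_fetches("branch", False))   # 0: a not-taken branch
```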

---------------------------------
  1.  Flushing the pipeline: jump
---------------------------------

A jump is like an unconditional branch.  You know it is a jump,
and you know its target, in ID, so you can take the jump in ID.
If the jump is followed by another instruction (such as at the end
of a loop), that instruction has already been fetched by then, so
you will begin executing the wrong instruction.

j  address
add $0, $0, $0
.
.
address: sub $0, $0, $0

The jump goes through IF and then ID.  During its ID, the next
instruction, add (at PC+4, since the PC is updated in IF), is
being fetched.  However, at the end of ID we realize we don't want
the add; instead, we want to jump to address.  So we must flush
the pesky add from the pipeline and fetch the sub.

        1  2  3  4   5   6
j       IF ID EX MEM WB
add     >>>IF<<<< (flushed)
sub           IF ID  EX ...
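The same sequence can be traced at the level of the PC.  The sketch below assumes 4-byte instructions, the `j` at address 0x00 and the label `address` at 0x40; both addresses and the `PROGRAM` layout are made up for illustration.  Each cycle fetches from the PC and moves to PC+4, except that when the instruction fetched in the previous cycle is a jump (so it is now in ID), the instruction fetched this cycle is flushed and the PC is redirected to the jump target.

```python
# Hypothetical layout: pc -> (instruction text, jump target or None).
PROGRAM = {
    0x00: ("j address", 0x40),
    0x04: ("add $0, $0, $0", None),
    0x40: ("sub $0, $0, $0", None),
}

def fetch_trace(start_pc, n_cycles):
    """List (cycle, pc, text, flushed) for each fetch; jumps resolve in ID."""
    pc, in_id, trace = start_pc, None, []
    for cycle in range(1, n_cycles + 1):
        text, target = PROGRAM.get(pc, ("nop", None))
        jump_in_id = in_id is not None and in_id[1] is not None
        trace.append((cycle, pc, text, jump_in_id))
        if jump_in_id:
            # The jump in ID now knows its target: discard this fetch
            # (a bubble enters ID next cycle) and redirect the PC.
            pc, in_id = in_id[1], None
        else:
            pc, in_id = pc + 4, (text, target)
    return trace

for cycle, pc, text, flushed in fetch_trace(0x00, 3):
    print(cycle, hex(pc), text, "(flushed)" if flushed else "")
```

Running it for three cycles shows the add at 0x4 fetched in cycle 2 and flushed, then the sub fetched from 0x40 in cycle 3, matching the diagram.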


-----------------------------------
  2.  Flushing the pipeline: branch
-----------------------------------

For a branch, the situation is similar, except that it takes
longer to find out whether the branch is taken.  In a
predict-not-taken (PNT) scheme, we keep fetching sequentially from
PC+4 after the branch (rather than from the branch target).  If
we're right and the branch is not taken (BNT), we don't lose any
time.  However, if we're wrong and the branch is taken (BT), we
need to flush the 2 wrong instructions from the pipeline.

For a not-taken branch in a PNT scheme,

         1  2  3  4   5   6   7
BNT      IF ID EX MEM WB
nop(right)  IF ID EX  MEM WB
nop(right)     IF ID  EX  MEM WB

there is no problem.  For a taken branch in a PNT scheme,

         1  2  3  4   5   6
BT       IF ID EX MEM WB
nop(wrong)  IF ID<<<< (flushed)
nop(wrong)     IF<<<< (flushed)
add(right)        IF  ID  EX...

we have to flush two instructions out of the pipeline.  A
predict-taken (PT) scheme mirrors these two cases: a BT in a PT
scheme is analogous to the first (nothing to flush), and a BNT in
a PT scheme is analogous to the second (two instructions to
flush).
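Side by side, the four cases depend only on whether the static prediction matched the outcome.  A small sketch, assuming (as the text does) that a wrong guess always costs the two instructions fetched before the branch resolves in EX; it treats the PT rows as the mirror image described above and does not model how a PT pipeline would obtain the target address early enough to fetch from it.

```python
# Instructions fetched after the branch before EX resolves it.
BRANCH_RESOLVE_DEPTH = 2

def flushed(predict_taken: bool, taken: bool) -> int:
    """Number of instructions flushed for one branch under a static prediction."""
    return 0 if predict_taken == taken else BRANCH_RESOLVE_DEPTH

for scheme, predict_taken in (("PNT", False), ("PT", True)):
    for outcome, taken in (("BNT", False), ("BT", True)):
        print(f"{scheme:3} {outcome:3} flush {flushed(predict_taken, taken)}")
```

The output lists PNT/BNT and PT/BT with 0 flushed, and PNT/BT and PT/BNT with 2 flushed.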


  
  
  
  
https://classes.soe.ucsc.edu/cmpe110/Spring04/pipelining.txt
