Home | Projects | Notes > Computer Architecture & Organization > Pipeline Hazards
A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution. Such a pipeline stall is also referred to as pipeline bubble.
Three types of hazards:
Structural hazard (or resource hazard)
Data hazard (or pipeline data hazard)
Control hazard (or branch hazard)
A structural hazard occurs when two (or more) instructions that are already in the pipeline need the same resource simultaneously. The result is that the instructions must be executed in serial rather than parallel for a portion of the pipeline.
For example, memory can be a structural hazard if two instructions attempt to access it at the same time and it cannot grant simultaneous access.
Another example is when multiple instructions are ready to enter the execute instruction phase and there is a single ALU. One solution to such structural hazards is to increase available resources, such as having multiple ports into main memory and multiple ALU units.
Structural hazards have to be taken care of in the design time.
A data hazard occurs when the processing of one instruction depends on the data created by a previous instruction that is still in the pipeline. To maintain correct operation, the pipeline must stall for one or more clocks cycles which results in inefficient pipeline usage.
To prevent data hazards, special hardware is used to forward or bypass the stages to provide the results sooner.
Three types of data hazards:
Read after write (RAW), or true data dependency
An instruction modifies a register or memory location and a succeeding instruction reads the data in that memory or register location. A hazard occurs if the read takes place before the write operation is complete.
For example:
xxxxxxxxxx
21add r0, r1, r2
2sub r4, r0, r5 @ r0 will not be ready in time for this calculation.
Write after read (WAR), or anti data dependency
An instruction reads a register or memory location and a succeeding instruction writes to the location. A hazard occurs if the write operation completes before the read operation takes place.
For example:
xxxxxxxxxx
21add r1, r2, r3
2sub r2, r4, r5
WAR hazard cannot occur in most systems. If the instructions are reordered (e.g., to delay branches) then this presents a problem!
Write after write (WAW), or output dependency
Two instructions both write to the same location. A hazard occurs if the write operations take place in the reverse order of the intended sequence.
For example:
xxxxxxxxxx
21add r1, r2, r3
2sub r1, r4, r5
WAW hazard is very unlikely to happen in a single core processor. It is more of a multi-core processor issue.
Can happen with superscalar processors that perform out-of-order execution. In “Operating Systems” this is also called a race condition. When the value of a variable is dependent on the order of execution of the instructions and the last instruction to execute wins by having its value stored.
A control hazard occurs when a branch is taken and all the partially executed instructions in the pipeline have to be thrown away (or flushed, squashed). This happens because when a branch instruction (to be taken) is fetched, its target address is not known until it is in the execute stage in the pipeline. During that time (i.e., IF ~ EX), the subsequent instructions are pushed into the pipeline only to be wasted.
Data dependency arises when the outcome of the current operation is dependent on the result of a previous instruction that has not yet been executed to completion.
The most important form of data hazard is RAW, where a read operation takes place after a write.
Two solutions to mitigate the aspect of data hazard:
Bubbles (primitive)
Internal data forwarding (advanced)
Consider,
xxxxxxxxxx
61@ X = (A + B) AND (A + B - C)
2@ assume A, B, C, X and two temporary values, T1 and T2, are in registers
3
4add T1, A, B @ [T1] ← [A] + [B] ; T1 will not be available immediately
5sub T2, T1, C @ [T2] ← [T1] - [C] ; uses T1 before the instruction 1 has committed it to memory
6and X, T1, T2 @ [X] ← [T1] · [T2]; uses T2 before the instruction 1 has committed it to memory
How to fix this?
Bubbles can be introduced into the pipeline while the instructions
However, this is not what pipelines are for! This can be improved in the hardware by using internal data forwarding.
Internal data forwarding is a mechanism to reduces the stalls due to data dependency, it uses hardware technique to forward the result of interstage buffer register (IBR) to next instruction’s buffer register.
Consider,
xxxxxxxxxx
51add r3, r1, r2 @ [r3] ← [r1] + [r2]
2add r6, r4, r5 @ [r6] ← [r4] + [r5]
3add r9, r1, r2 @ [r9] ← [r1] + [r2]
4add r7, r3, r4 @ [r7] ← [r3] + [r4] ; RAW harzard because r7 is reuqired by the next instruction
5add r8, r1, r7 @ [r8] ← [r1] + [r7]
Instruction 4 generates a destination operand r7
that is required as a source operand by the next instruction. If the processor were to read the source operand requested by instruction 5 from the register file, it would see the old value of r7
.
How to fix this?
By means of internal data forwarding the processor transfers R7
from instruction 4’s execution unit directly to the execution unit of instruction 5. (When the issue is detected the internal forwarding hardware provides the right data to the instruction.)
It is known that branches take up 5 ~ 30% of the instructions in any program. (e.g., Unconditional branches, conditional branches, indirect branches, procedure calls, procedure returns). So, it is very important that they are correct and efficient.
The following are some of the solutions that have been proposed for mitigating aspects of control hazards:
Pipeline stall
Delayed branch
Branch prediction
Freeze the pipeline until the branch outcome and target are known, then proceed with fetch. Thus, every branch instruction incurs a penalty equal to the number of stall cycles. This solution is unsatisfactory if the instruction mix contains many branch instructions, and/or the pipeline is very deep.
Delayed branch is reordering of the instructions in such a way that the instruction immediately following a branch is always executed. This avoids stalling the pipeline while the branch condition is evaluated, thus keeping the pipeline full and minimizing the effect of conditional branches on processor performance.
xxxxxxxxxx
51@ original code
2add r3, r2, r1
3add r5, r4, r6 @ not dependent on the following branch, will be executed 100% anyways
4b N
5add r7, r8, r9
xxxxxxxxxx
51@ reordered code
2add r3, r2, r1
3b N
4add r5, r4, r6 @ gets executed in the "branch delay slot"
5add r7, r4, r6
Sections in the code to look for the instructions to fill the branch delay slot:
Before the branch instruction (Best choice, always improves performance)
Target address (Only meaningful when branch taken)
Fall through (Only meaningful when not taken)
Unfortunately, it's not always possible to reorder the code in such a way that the instruction immediately following a branch is always executed. It is said that 70% of the time, an instruction is found to fill the branch delay slot. If the compiler fails to find such an instruction, it must introduce a no operation (NOP
) instruction after the branch and accept the inevitability of a stall (or bubble).
xxxxxxxxxx
51add r3, r2, r1
2add r5, r4, r6
3b N
4NOP @ no operation ("branch delay slot")
5add r7, r8, r9
This really slows down the pipeline. The more stages the pipeline has the more NOP
s that have to be added and the bigger the the slow-down in the pipeline.
Branch prediction is an approach to computer architecture that attempts to mitigate the costs of branching. The processor looks ahead in the instruction code fetched from memory and takes an educated guess if the branch will or will not be taken.
Two types of branch prediction:
Static branch prediction
Dynamic branch prediction (Branch guess are determined at runtime and will change over time)
Prediction mechanisms works well with unconditional branches like:
xxxxxxxxxx
31 B Loop @ unconditional branch; this branch will be taken 100%
2
3Loop:
Conditional branches pose a problem:
xxxxxxxxxx
41bne Loop @ conditional branch; CCR flags setting affects the subsequent instructions
2...
3
4Loop:
Example 1
xxxxxxxxxx
101Instruction R0 R1 R2 R3 R4 R5 R6 R7 R8
2================== == == == == == == == == ==
3I1: ADD R0, R1, R2 W R R
4I2: ADD R3, R1, R2 R R W
5I3: ADD R4, R1, R3 R R W
6I4: ADD R5, R2, R3 R R W
7I5: ADD R6, R1, R0 R R W
8I6: ADD R7, R7, R2 R R
9 W
10I7: ADD R8, R6, R0 R R W
RAW
R
: I I , I R
: I I , I R
: I I WAR
R
: I I WAW
None
Example 2
xxxxxxxxxx
71Instruction R0 R1 R2 R3 R4 R5 R6 R7 R8
2================== == == == == == == == == ==
3I1: ADD R2, R0, R1 R R W
4I2: ADD R5, R3, R4 R R W
5I3: SUB R8, R0, R1 R R W
6I4: ADD R6, R2, R3 R R W
7I5: ADD R7, R0, R6 R R W
RAW
R
: I I R
: I I WAR
None
WAW
None
Example 3
xxxxxxxxxx
71Instruction R0 R1 R2 R3 R4 R5 R6 X
2================== == == == == == == == ==
3I1: ADD R5, R0, R1 R R W
4I2: MUL R6, R2, R5 R R W
5I3: SUB R5, R3, R6 R W R
6I4: DIV R6, R5, R4 R R W
7I5: STR R7, R0, R6 R W
RAW
R
: I I , and I I R
: I I , and I I WAR
R
: I I R
: I I WAW
R
: I I R
: I I