class5-pipeline-b-4.pdf

Overview
CS:APP Chapter 4
Computer Architecture
Make the pipelined processor work!
Pipelined
Implementation
Data Hazards
!
Instruction having register R as source follows shortly after
instruction having register R as destination
!
Common condition, don’t want to slow down pipeline
Control Hazards
Part II
!
Mispredict conditional branch
" Our design predicts all branches as being taken
Randal E. Bryant
" Naïve pipeline executes two extra instructions
!
Getting return address for ret instruction
" Naïve pipeline executes three extra instructions
Carnegie Mellon University
Making Sure It Really Works
http://csapp.cs.cmu.edu
!
What if multiple special cases happen simultaneously?
CS:APP
–2–
CS:APP
Write back
Pipeline Stages
W_icode, W_valM
W_valE, W_valM, W_dstE, W_dstM
W
PIPE- Hardware
W
icode
valE
Mem.
control
valM
Fetch
!
Memory
Read instruction
!
Compute incremented PC
!
Addr, Data
Select current PC
!
Data
Data
memory
memory
M_icode,
M_Bch,
M_valA
M
Bch
valE
CC
CC
Execute
!
Forward (Upward) Paths
!
E
Read program registers
valA, valB
Execute
!
!
!
valP
Values passed from one
stage to next
Cannot jump past
stages
icode, ifun,
rA, rB, valC
Instruction
Instruction
memory
memory
Fetch
through decode
valP
PC
PC
increment
increment
dstE dstM
Data
Data
memory
memory
data in
Addr
M_valA
M_Bch
M
icode
Bch
valE
valA
dstE dstM
e_Bch
ALU
A
Execute
E
icode
ALU
fun.
ALU
ALU
CC
CC
ifun
ALU
B
valC
valA
valB
dstE dstM srcA srcB
d_srcA d_srcB
Select
A
Decode
D
predPC
Fetch
f_PC
icode
d_rvalA
A
dstE dstM srcA srcB
W_valM
B
Register
Register M
file
file E
" e.g., valC passes
PC
–3–
!
B
Write back
D
Read or write data memory
Write Back
A
Register
Register M
file
file
E
Operate ALU
Memory
d_srcA,
d_srcB
Decode
write
Memory
ALU
ALU
aluA, aluB
Decode
Pipeline registers hold
intermediate values
from instruction
execution
valM
data out
read
ifun
rA
rB
Instruction
Instruction
memory
memory
valC
W_valE
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
Select
PC
F
F
Update register file
CS:APP
–4–
W_valM
predPC
CS:APP
Data Dependencies: 2 Nop’s
# demo-h2.ys
1
2
3
4
5
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
F
0x006: irmovl
$3,%eax
0x00c: nop
0x00d: nop
0x00e: addl %edx,%eax
0x010: halt
6
Data Dependencies: No Nop
7
8
9
10
# demo-h0.ys
1
2
3
4
5
6
7
8
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
W
M
E
W
M
W
0x006: irmovl
W
M
E
D
F
$3,%eax
0x00c: addl %edx,%eax
W
M
E
D
0x00e: halt
W
M
E
W
M
W
Cycle 4
M
Cycle 6
M_valE = 10
M_dstE = %edx
W
R[%eax] R[%eax]
!3
!3
E
e_valE ! 0 + 3 = 3
E_dstE = %eax
•
•
•
D
D
valA ! R[%edx] = 10
valB ! R[%eax] = 0
Error
CS:APP
–5–
Error
valA ! R[ %edx] = 0
valB ! R[%eax] = 0
CS:APP
–6–
Write back
Stalling for Data Dependencies
# demo-h2.ys
1
2
3
4
5
0x000: irmovl $10,%edx
F
D
F
E
D
F
M
E
D
F
W
M
E
D
0x006: irmovl
$3,%eax
0x00c: nop
0x00d: nop
6
0x010: halt
!
F
8
9
10
W
valE
valM
dstE
dstM
dstE
dstM
dstE
dstM srcA
dstE
dstM srcA
data out
11
read
write
Memory
Data
Data
memory
memory
data in
Addr
M_valA
M_Bch
W
M
E
D
F
!
W
M
E
D
F
W
M
E
D
W
M
E
W
M
Destination Registers
W
!
dstE and dstM fields
!
Instructions in execute,
memory, and write-back
stages
If instruction follows too closely after one that writes
register, slow it down
!
Hold instruction in decode
!
Dynamically inject nop into execute stage
srcA and srcB of current
instruction in decode
stage
Special Case
!
Don’t stall for register ID
8
M
register operand
CS:APP
–8–
icode
Bch
valE
valA
e_Bch
ALU
A
Execute
E
icode
ALU
fun.
ALU
ALU
CC
CC
ifun
ALU
B
valC
valA
valB
srcB
d_srcA d_srcB
Select
A
Decode
d_rvalA
A
srcB
W_valM
B
Register
RegisterM
file
file
W_valE
E
D
Fetch
icode
ifun
rA
rB
Instruction
Instruction
memory
memory
valC
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
Select
PC
" Indicates absence of
–7–
icode
Mem.
control
Source Registers
bubble
0x00e: addl %edx,%eax
7
Stall Condition
F
W_valM
predPC
CS:APP
Stalling X3
Detecting Stall Condition
# demo-h2.ys
0x000: irmovl $10,%edx
0x006: irmovl
$3,%eax
1
2
3
4
5
6
F
D
F
E
D
F
M
E
D
F
W
M
E
D
W
M
E
0x00c: nop
0x00d: nop
7
F
0x010: halt
9
10
0x006: irmovl
W
M
E
D
F
D
F
1
2
3
4
5
F
D
F
E
D
M
E
W
M
E
# demo-h0.ys
11
0x000: irmovl $10,%edx
bubble
0x00e: addl %edx,%eax
8
$3,%eax
bubble
W
M
E
D
bubble
6
W
M
E
bubble
W
M
E
W
M
0x00c: addl %edx,%eax
F
D
F
0x00e: halt
W
D
F
Cycle 6
D
F
7
8
9
10
11
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 6
W
W
Cycle 5
W_dstE = %eax
W_valE = 3
M
•
•
•
–9–
Cycle 4
M_dstE = %eax
E
•
•
•
D
E_dstE = %eax
D
srcA = %edx
srcB = %eax
W_dstE = %eax
D
CS:APP
srcA = %edx
srcB = %eax
– 10 –
srcA = %edx
srcB = %eax
•
•
•
D
srcA = %edx
srcB = %eax
CS:APP
Implementing Stalling
What Happens When Stalling?
W_dstM
W_dstE
# demo-h0.ys
Cycle 8
4
5
6
7
0x000: irmovl $10,%edx
0x006: irmovl
$3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Write Back
Memory
Execute
Decode
Fetch
W
0x000: irmovl
0x006:
bubble $10,%edx
$3,%eax
0x000: irmovl
0x006:
bubble $10,%edx
$3,%eax
M
0x00c: halt
0x00e:
addl %edx,%eax
Following instruction stays in fetch stage
!
Bubbles injected into execute stage
dstE dstM
icode
Bch
valE
valA
dstE dstM
E_dstE
E_bubble
E
!
valM
E_dstM
Pipe
control
logic
0x00e: halt
Stalling instruction held back in decode stage
valE
M_dstE
0x006: addl
0x00c:
irmovl
bubble
%edx,%eax
$3,%eax
!
icode
M_dstM
icode
ifun
valC
valA
valB
dstE dstM srcA
srcB
d_srcB
d_srcA
srcB
D_icode
" Like dynamically generated nop’s
" Move through later stages
D_stall
D
F_stall
F
srcA
icode
ifun
rA
rB
valC
valP
predPC
Pipeline Control
– 11 –
CS:APP
– 12 –
!
Combinational logic detects stall condition
!
Sets mode signals for how pipeline registers should update
CS:APP
Pipeline Register Modes
Input = y
Normal
Output = x
x
stall
=0
Output = x
x
stall
=1
"
Output = y
Output = x
x
stall
=0
"
"
Output = x
x
"
1
0x000: irmovl $10,%edx
$3,%eax
0x00c: nop
0x00d: nop
0x00e: addl %edx,%eax
Rising
clock
"
!
Forward as valB for
decode stage
!
Value generated in execute or memory stage
Output = nop
n
o
p
!
Pass value directly from generating instruction to decode
stage
!
Needs to be available at end of decode stage
F
2
D
F
3
E
D
F
4
M
E
D
F
CS:APP
– 14 –
Bypass Paths
5
6
7
8
9
10
W_valE, W_valM, W_dstE, W_dstM
W_valE
W_valM
W
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Forwarding logic selects
Memory
valA and valB
!
Normally from register
file
W
R[%eax] !3
W_dstE = %eax
W_valE = 3
m_valM
!
!
Cycle 6
Data
Data
memory
memory
M_icode,
M_Bch,
M_valA
Addr, Data
M_valE
M
Forwarding: get valA or
valB from later pipeline
Execute
stage
D
valA ! R[%edx] = 10
valB ! W_valE = 3
!
Execute: valE
!
Memory: valE, valM
!
Write back: valE, valM
e_valE
Bch
CC
CC
ALU
ALU
E_valA, E_valB,
E_srcA, E_srcB
Forwarding Sources
•
•
•
srcA = %edx
srcB = %eax
W_icode, W_valM
Decode Stage
W
M
E
D
F
0x010: halt
Destination value in
W pipeline register
Source operands read from register file in decode stage
bubble
=1
# demo-h2.ys
!
!
Trick
Data Forwarding Example
irmovl in writeback stage
Register isn’t written until completion of write-back stage
Observation
Rising
clock
CS:APP
0x006: irmovl
!
" Needs to be in register file at start of stage
– 13 –
!
Naï
Naïve Pipeline
y
bubble
=0
Input = y
Bubble
Rising
clock
bubble
=0
Input = y
Stall
"
Data Forwarding
E
valA, valB
Forward
d_srcA,
d_srcB
Decode
A
B
Register
Register M
file
file
E
Write back
– 15 –
CS:APP
D
– 16 –
valP
icode, ifun,
rA, rB, valC
Fetch
Instruction
Instruction
memory
memory
CS:APP
valP
PC
PC
increment
increment
predPC
Data Forwarding Example #2
1
# demo-h0.ys
0x000: irmovl $10,%edx
0x006: irmovl
$3,%eax
0x00c: addl %edx,%eax
F
2
D
F
3
4
E
D
F
M
E
D
F
0x00e: halt
5
W
M
E
D
Implementing
Forwarding
W_valE
Write back
W_valM
W
6
7
icode
valE
8
m_valM
Data
Data
memory
memory
write
Memory
W
M
dstE dstM
data out
read
Mem.
control
W
M
E
valM
W
M
icode
valE
valA
Value just generated
by ALU
!
Forward from execute
as valB
!
ALU
ALU
CC
CC
M
Execute
M_dstE = %edx
M_valE = 10
E
icode
ifun
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
d_srcA d_srcB
Sel+Fwd
A
E_dstE = %eax
e_valE ! 0 + 3 = 3
Decode
D
srcA = %edx
srcB = %eax
dstE dstM srcA srcB
dstE dstM srcA srcB
E
Register %eax
Create logic blocks to
select from multiple
sources for valA and valB
in decode stage
dstE dstM
e_valE
Forward from memory
as valA
!
!
M_valA
M_valE
Bch
Cycle 4
Generated by ALU
during previous cycle
!
Add additional feedback
paths from E, M, and W
pipeline registers into
decode stage
Addr
M_Bch
e_Bch
Register %edx
!
data in
valA ! M_valE = 10
valB ! e_valE = 3
D
CS:APP
– 17 –
icode
Fwd
B
A
W_valM
B
Register
Register M
file
file E
ifun
– 18 –
rA
rB
valC
Instruction
Instruction
memory
memory
Fetch
W_valE
valP
PC
PC
increment
increment
Predict
PC
CS:APP
f_PC
M_valA
Select
PC
F
Implementing Forwarding
predPC
Limitation of Forwarding
# demo-luh.ys
W_valE
back
W_valM
icode
valE
valM
dstE dstM
data out
read
Mem.
control
Data
Data
memory
memory
write
ry
m_valM
data in
Addr
M_Bch
icode
M_valA
M_valE
Bch
valE
valA
dstE dstM
e_Bch
e_valE
ALU
ALU
CC
CC
ALU
A
te
icode
ifun
valC
ALU
fun.
ALU
B
valA
valB
dstE dstM srcA srcB
d_srcA d_srcB
dstE dstM srcA srcB
Sel+Fwd
A
de
Fwd
B
A
ifun
rA
rB
Instruction
Instruction
memory
memory
–valC
19 –
Predict
PC
f_PC
M_valA
Select
PC
W_valM
1
2
Load-use dependency
!
!
3
4
irmovl $128,%edx
F
D
E M
irmovl $3,%ecx
F
D
E
rmmovl %ecx, 0(%edx)
F
D
irmovl $10,%ebx
F
mrmovl 0(%edx),%eax # Load %eax
addl %ebx,%eax # Use %eax
halt
Value needed by end of
decode stage in cycle 7
Value read from memory in
memory stage of cycle 8
5
6
W
M
E
D
F
W
M
E
D
F
7
8
9
10
11
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
Cycle 7
Cycle 8
M
M_dstE = %ebx
M_valE = 10
M
M_dstM = %eax
m_valM ! M[128] = 3
•
•
•
D
valA ! M_valE = 10
valB ! R[%eax] = 0
CS:APP
valP
PC
PC
increment
increment
0x000:
0x006:
0x00c:
0x012:
0x018:
0x01e:
0x020:
W_valE
E
icode
## What should be the A value?
int new_E_valA = [
# Use incremented PC
D_icode in { ICALL, IJXX } : D_valP;
# Forward valE from execute
d_srcA == E_dstE : e_valE;
# Forward valM from memory
d_srcA == M_dstM : m_valM;
# Forward valE from memory
d_srcA == M_dstE : M_valE;
# Forward valM from write back
d_srcA == W_dstM : W_valM;
# Forward valE from write back
d_srcA == W_dstE : W_valE;
# Use value read from register file
1 : d_rvalA;
];
W_valM
B
Register
Register M
file
file
W_valM
– 20 –
Error
CS:APP
W_valE
Write back
W_valM
W
icode
valE
valM
dstE
dstM
data out
Avoiding Load/Use Hazard
# demo-luh.ys
0x000:
0x006:
0x00c:
0x012:
0x018:
irmovl
irmovl
rmmovl
irmovl
mrmovl
1
2
3
$128,%edx
F
D
$3,%ecx
F
%ecx, 0(%edx)
$10,%ebx
0(%edx),%eax # Load
E
4
5
6
Stall using instruction for
one cycle
!
Can then pick up loaded
value by forwarding from
memory stage
Detecting Load/Use Hazard
W
E
D
F
%eax
M
E
D
W
M
E
W
M
F
D
E
F
D
F
9
10
11
12
M_Bch
W
M
E
D
F
W
M
E
D
Execute
W
M
E
E
W
M
icode
ifun
ALU
A
ALU
B
valC
valA
Decode
D
0x000:
0x006:
0x00c:
0x012:
0x018:
irmovl $128,%edx
F
D
E M
irmovl $3,%ecx
F
D
E
rmmovl %ecx, 0(%edx)
F
D
irmovl $10,%ebx
F
mrmovl 0(%edx),%eax # Load %eax
bubble
0x01e: addl %ebx,%eax # Use %eax
0x020: halt
!
Stall instructions in fetch
and decode stages
!
Inject bubble into execute
stage
5
6
7
W
M
E
D
F
W
M
E
D
W
M
E
F
icode
ifun
Fetch
Load/Use Hazard
8
9
F
CS:APP
10
dstM
srcA
srcB
11
12
rA
rB
valC
Instruction
Instruction
memory
memory
Trigger
dstM
srcA
srcB
Fwd
B
A
Condition
valA ! W_valE = 10
valB ! m_valM = 3
dstE
dstE
Sel +Fwd
A
M
4
valB
W
W
3
dstM
ALU
fun.
W_dstE = %ebx
W_valE = 10
2
dstE
d_srcA d_srcB
Control for Load/Use Hazard
– 23 –
valA
ALU
ALU
Cycle 8
– 21 –
Load/Use Hazard
M_valA
valE
e_valE
D
Condition
Bch
e_Bch
•
•
•
1
icode
M_valE
CC
CC
M_dstM = %eax
m_valM ! M[128] = 3
# demo-luh.ys
data in
Addr
8
M
D
F
m_valM
Data
Data
memory
memory
write
Memory
M
bubble
0x01e: addl %ebx,%eax # Use %eax
0x020: halt
!
7
read
Mem.
control
W_valM
B
Register
RegisterM
file
file E
W_valE
valP
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
E_icode
E_icode in { IMRMOVL, IPOPL } &&
E_dstM
E_dstM in { d_srcA
d_srcA,, d_srcB
d_srcB }
Select
PC
W_valM
predPC
CS:APP
– 22 –
Branch Misprediction Example
demo-j.ys
D
F
W
M
E
D
F
W
M
E
D
W
M
E
W
M
W
0x000:
xorl %eax,%eax
0x002:
jne t
0x007:
irmovl $1, %eax
0x00d:
nop
0x00e:
nop
0x00f:
nop
0x010:
halt
0x011: t: irmovl $3, %edx
0x017:
irmovl $4, %ecx
0x01d:
irmovl $5, %edx
!
F
D
E
M
W
stall
stall
bubble
normal
normal
CS:APP
– 24 –
# Not taken
# Fall through
# Target (Should not execute)
# Should not execute
# Should not execute
Should only execute first 8 instructions
CS:APP
W_valE
Write back
W_valM
W
Handling Misprediction
# demo-j.ys
1
2
3
4
5
0x000:
F
D
F
E
D
M
E
W
M
F
D
0x002:
xorl %eax,%eax
valE
jne target # Not taken
bubble
irmovl $3,%ebx # Target+1
0x007:
irmovl $1,%eax # Fall through
0x00d:
nop
dstE
6
7
8
9
10
m_valM
Data
Data
memory
memory
write
data in
Addr
M_Bch
W
M
icode
M_valA
M_valE
Bch
valE
valA
dstE
E
M
W
D
F
E
D
F
M
E
D
e_valE
ALU
ALU
CC
CC
W
M
E
Execute
W
M
E
icode
ifun
ALU
fun.
ALU
A
ALU
B
valC
valA
valB
dstE
W
Fetch 2 instructions at target
On following cycle, replace instructions in execute and
decode by bubbles
D
Fetch
No side effects have occurred yet
srcA
srcB
A
W_valM
B
RegisterM
W_valE
Control for Misprediction
1
2
3
4
5
0x000:
xorl %eax,%eax
F
0x002:
jne target # Not taken
D
F
E
D
F
M
E
D
W
M
W
E
M
W
D
F
E
D
F
M
E
D
irmovl $3,%ebx # Target+1
6
7
8
9
0x007:
irmovl $1,%eax # Fall through
0x00d:
nop
rB
valC
valP
Instruction
Instruction
memory
memory
PC
PC
increment
increment
Predict
PC
Select
PC
W
M
E
W
M
CS:APP
W_valM
predPC
demo-retb.ys
10
F
bubble
rA
Return Example
# demo-j.ys
bubble
ifun
M_valA
– 26 –
F
0x011: t: irmovl $2,%edx # Target
icode
f_PC
CS:APP
– 25 –
– 27 –
dstM
E
!
Mispredicted Branch normal
srcB
Register
Mispredicted Branch E_icode
E_icode = IJXX
file
file & !e_Bch
Detect branch not-taken in execute stage
F
srcA
Fwd
B
Trigger
Decode
!
Condition
Sel +Fwd
A
Condition
Cancel when mispredicted
0x017:
dstM
d_srcA d_srcB
dstE
!
dstM
e_Bch
Predict branch as taken
!
dstM
data out
read
Mem.
control
F
bubble
valM
Detecting Mispredicted Branch
Memory
0x011: t: irmovl $2,%edx # Target
0x017:
icode
W
D
E
M
W
bubble
bubble
normal
normal
CS:APP
0x000:
0x006:
0x00b:
0x011:
0x020:
0x020:
0x026:
0x027:
0x02d:
0x033:
0x039:
0x100:
0x100:
!
– 28 –
irmovl Stack,%esp
call p
irmovl $5,%esi
halt
.pos 0x20
p: irmovl $-1,%edi
ret
irmovl $1,%eax
irmovl $2,%ecx
irmovl $3,%edx
irmovl $4,%ebx
.pos 0x100
Stack:
# Initialize stack pointer
# Procedure call
# Return point
# procedure
#
#
#
#
Should
Should
Should
Should
not
not
not
not
be
be
be
be
executed
executed
executed
executed
# Stack: Stack pointer
Previously executed three additional instructions
CS:APP
Correct Return Example
Detecting Return
# demo-retb
0x026:
M_Bch
ret
F
D
F
bubble
bubble
E
D
F
bubble
0x00b:
M
E
D
F
irmovl $5,%esi # Return
W
M
E
D
F
M
W
M
E
D
icode
M_valE
Bch
valE
valA
dstE dstM
e_Bch
e_valE
W
M
E
ALU
ALU
CC
CC
W
M
ALU
A
ALU
B
valC
valA
Execute
W
E
icode
ifun
ALU
fun.
valB
dstE dstM srcA
srcB
d_srcA d_srcB
dstE dstM srcA
!
As ret passes through
pipeline, stall at fetch stage
W
Sel+Fwd
A
Decode
•
•
•
memory stage
!
Inject bubble into decode
stage
D
valC ! 5
rB ! %esi
CS:APP
– 29 –
Control for Return
F
D
F
bubble
bubble
bubble
0x00b:
E
D
F
M
E
D
F
irmovl $5,%esi # Return
Condition
Processing ret
W_valM
B
Register
Register M
file
file E
ifun
rA
rB
valC
W_valE
valP
Condition
Trigger
Processing ret
IRET in { D_icode
D_icode,, E_icode
E_icode,, M_icode
M_icode }
CS:APP
– 30 –
Special Control Cases
# demo-retb
ret
icode
A
F
Release stall when reach
write-back stage
0x026:
Fwd
B
valM = 0x0b
" While in decode, execute, and
!
srcB
W
M
E
D
F
Detection
W
M
E
D
W
M
E
W
M
W
F
D
E
M
W
stall
bubble
normal
normal
normal
Condition
Trigger
Processing ret
IRET in { D_icode
D_icode,, E_icode
E_icode,, M_icode
M_icode }
Load/Use Hazard
E_icode
E_icode in { IMRMOVL, IPOPL } &&
E_dstM
E_dstM in { d_srcA
d_srcA,, d_srcB
d_srcB }
Mispredicted Branch E_icode
E_icode = IJXX & !e_Bch
Action (on next cycle)
Condition
F
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
bubble
bubble
normal
normal
Mispredicted Branch normal
– 31 –
CS:APP
– 32 –
CS:APP
Implementing Pipeline Control
W
icode
valE
valM
dstE dstM
valE
valA
dstE dstM
Initial Version of Pipeline Control
bool F_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
# Stalling at fetch while ret passes through pipeline
IRET in { D_icode, E_icode, M_icode };
M_icode
M
icode
Bch
e_Bch
bool D_stall =
# Conditions for a load/use hazard
E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
CC
CC
E_dstM
E_icode
Pipe
control
logic
E_bubble
E
icode
ifun
valC
valA
valB
dstE dstM srcA
srcB
d_srcB
d_srcA
srcB
D_icode
srcA
D_bubble
D_stall
D
icode
ifun
rA
rB
valC
valP
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
Write
back
# Stalling at fetch while ret
passes
through pipeline
IRET in { D_icode, E_icode, WM_icode
};
icode
valE
valM
dstE
dstM
data out
Data
Mem.
Data
bool E_bubble =
control
memory
memory
# Mispredicted branch
Memory
Addr
(E_icode == IJXX && !e_Bch) ||
# Load/use hazard
Bch
valA
dstE dstM
E_icode in { IMRMOVL, IPOPL }M &&icodeE_dstM
invalE{ d_srcA,
d_srcB};
read
F_stall
write
F
predPC
data in
M_valA
M_Bch
!
Combinational logic generates pipeline control signals
!
Action occurs at start of following cycle
e_Bch
CS:APP
– 33 –
– 34 –
ALU
A
Execute
E
icode
ALU
fun.
ALU
ALU
CC
CC
ifun
CS:APP
ALU
B
valC
valA
valB
dstE
dstM srcA
dstE
dstM srcA
srcB
d_srcA d_srcB
Select
A
Decode
Control Combinations
Load/use
M
E
D
M
Load
Use
E
D
D
ret 2
M
M
E
ret
D
Combination A
E
D
JXX
ret
bubble
ret 3
M
ret
E
D
bubble
bubble
Combination B
!
!
Mispredict
M
E
D
Not-taken branch
ret instruction at branch target
– 35 –
Instruction that reads from memory to %esp
!
Followed by ret instruction
CS:APP
ifun
rA
rB
Fetch
W_valE
E
valC
valP
Instruction
Instruction
memory
memory
PC
PC
increment
increment
Predict
PC
f_PC
M_valA
Select
PC
F
W_valM
predPC
F
D
E
M
W
stall
bubble
normal
normal
normal
Mispredicted Branch normal
bubble
bubble
normal
normal
Combination
bubble
bubble
normal
normal
Combination B
!
M
E
ret
D
Combination A
Processing ret
Special cases that can arise on same clock cycle
icode
srcB
W_valM
B
Register
RegisterM
file
file
ret 1
JXX
Condition
Combination A
!
A
Control Combination A
ret 1
Mispredict
d_rvalA
– 36 –
stall
!
Should handle as mispredicted branch
!
Stalls F pipeline register
!
But PC selection logic will be using M_valM anyhow
CS:APP
Control Combination B
ret 1
Load/use
M
E
D
Handling Control Combination B
M
E
D
Load
Use
M
E
D
ret
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
Combination
stall
bubble + bubble
stall
normal
normal
F
D
E
M
W
Processing ret
stall
bubble
normal
normal
normal
Load/Use Hazard
stall
stall
bubble
normal
normal
Combination
stall
stall
bubble
normal
normal
Would attempt to bubble and stall pipeline register D
!
!
Signaled by processor as pipeline error
!
CS:APP
– 37 –
Corrected Pipeline Control Logic
bool D_bubble =
# Mispredicted branch
(E_icode == IJXX && !e_Bch) ||
# Stalling at fetch while ret passes
IRET in { D_icode, E_icode, M_icode
# but not condition for a load/use
&& !(E_icode in { IMRMOVL, IPOPL }
&& E_dstM in { d_srcA, d_srcB
Pipeline Summary
!
Most handled by forwarding
" No performance penalty
!
});
Load/use hazard requires one cycle stall
Control Hazards
F
D
E
M
W
stall
bubble
normal
normal
normal
stall
bubble
normal
normal
Combination
stall
stall
bubble
normal
normal
!
CS:APP
– 38 –
through pipeline
}
hazard
stall
!
Load/use hazard should get priority
ret instruction should be held in decode stage for additional
cycle
Data Hazards
Load/Use Hazard
– 39 –
Condition
!
Processing ret
ret
Combination B
F
Condition
M
E
D
Load
Use
Combination B
Condition
ret 1
Load/use
Cancel instructions when detect mispredicted branch
!
Stall fetch stage while ret passes through pipeline
" Two clock cycles wasted
" Three clock cycles wasted
Control Combinations
Load/use hazard should get priority
ret instruction should be held in decode stage for additional
cycle
CS:APP
!
!
Must analyze carefully
!
First version had subtle bug
" Only arises with unusual instruction combination
– 40 –
CS:APP