Overview CS:APP Chapter 4 Computer Architecture Make the pipelined processor work! Pipelined Implementation Data Hazards ! Instruction having register R as source follows shortly after instruction having register R as destination ! Common condition, don’t want to slow down pipeline Control Hazards Part II ! Mispredict conditional branch " Our design predicts all branches as being taken Randal E. Bryant " Naïve pipeline executes two extra instructions ! Getting return address for ret instruction " Naïve pipeline executes three extra instructions Carnegie Mellon University Making Sure It Really Works http://csapp.cs.cmu.edu ! What if multiple special cases happen simultaneously? CS:APP –2– CS:APP Write back Pipeline Stages W_icode, W_valM W_valE, W_valM, W_dstE, W_dstM W PIPE- Hardware W icode valE Mem. control valM Fetch ! Memory Read instruction ! Compute incremented PC ! Addr, Data Select current PC ! Data Data memory memory M_icode, M_Bch, M_valA M Bch valE CC CC Execute ! Forward (Upward) Paths ! E Read program registers valA, valB Execute ! ! ! valP Values passed from one stage to next Cannot jump past stages icode, ifun, rA, rB, valC Instruction Instruction memory memory Fetch through decode valP PC PC increment increment dstE dstM Data Data memory memory data in Addr M_valA M_Bch M icode Bch valE valA dstE dstM e_Bch ALU A Execute E icode ALU fun. ALU ALU CC CC ifun ALU B valC valA valB dstE dstM srcA srcB d_srcA d_srcB Select A Decode D predPC Fetch f_PC icode d_rvalA A dstE dstM srcA srcB W_valM B Register Register M file file E " e.g., valC passes PC –3– ! B Write back D Read or write data memory Write Back A Register Register M file file E Operate ALU Memory d_srcA, d_srcB Decode write Memory ALU ALU aluA, aluB Decode Pipeline registers hold intermediate values from instruction execution valM data out read ifun rA rB Instruction Instruction memory memory valC W_valE valP PC PC increment increment Predict PC f_PC M_valA Select PC F F Update register file CS:APP –4– W_valM predPC CS:APP Data Dependencies: 2 Nop’s # demo-h2.ys 1 2 3 4 5 0x000: irmovl $10,%edx F D F E D F M E D F W M E D F 0x006: irmovl $3,%eax 0x00c: nop 0x00d: nop 0x00e: addl %edx,%eax 0x010: halt 6 Data Dependencies: No Nop 7 8 9 10 # demo-h0.ys 1 2 3 4 5 6 7 8 0x000: irmovl $10,%edx F D F E D F M E D F W M E D W M E W M W 0x006: irmovl W M E D F $3,%eax 0x00c: addl %edx,%eax W M E D 0x00e: halt W M E W M W Cycle 4 M Cycle 6 M_valE = 10 M_dstE = %edx W R[%eax] R[%eax] !3 !3 E e_valE ! 0 + 3 = 3 E_dstE = %eax • • • D D valA ! R[%edx] = 10 valB ! R[%eax] = 0 Error CS:APP –5– Error valA ! R[ %edx] = 0 valB ! R[%eax] = 0 CS:APP –6– Write back Stalling for Data Dependencies # demo-h2.ys 1 2 3 4 5 0x000: irmovl $10,%edx F D F E D F M E D F W M E D 0x006: irmovl $3,%eax 0x00c: nop 0x00d: nop 6 0x010: halt ! F 8 9 10 W valE valM dstE dstM dstE dstM dstE dstM srcA dstE dstM srcA data out 11 read write Memory Data Data memory memory data in Addr M_valA M_Bch W M E D F ! W M E D F W M E D W M E W M Destination Registers W ! dstE and dstM fields ! Instructions in execute, memory, and write-back stages If instruction follows too closely after one that writes register, slow it down ! Hold instruction in decode ! Dynamically inject nop into execute stage srcA and srcB of current instruction in decode stage Special Case ! Don’t stall for register ID 8 M register operand CS:APP –8– icode Bch valE valA e_Bch ALU A Execute E icode ALU fun. ALU ALU CC CC ifun ALU B valC valA valB srcB d_srcA d_srcB Select A Decode d_rvalA A srcB W_valM B Register RegisterM file file W_valE E D Fetch icode ifun rA rB Instruction Instruction memory memory valC valP PC PC increment increment Predict PC f_PC M_valA Select PC " Indicates absence of –7– icode Mem. control Source Registers bubble 0x00e: addl %edx,%eax 7 Stall Condition F W_valM predPC CS:APP Stalling X3 Detecting Stall Condition # demo-h2.ys 0x000: irmovl $10,%edx 0x006: irmovl $3,%eax 1 2 3 4 5 6 F D F E D F M E D F W M E D W M E 0x00c: nop 0x00d: nop 7 F 0x010: halt 9 10 0x006: irmovl W M E D F D F 1 2 3 4 5 F D F E D M E W M E # demo-h0.ys 11 0x000: irmovl $10,%edx bubble 0x00e: addl %edx,%eax 8 $3,%eax bubble W M E D bubble 6 W M E bubble W M E W M 0x00c: addl %edx,%eax F D F 0x00e: halt W D F Cycle 6 D F 7 8 9 10 11 W M E D F W M E D W M E W M W Cycle 6 W W Cycle 5 W_dstE = %eax W_valE = 3 M • • • –9– Cycle 4 M_dstE = %eax E • • • D E_dstE = %eax D srcA = %edx srcB = %eax W_dstE = %eax D CS:APP srcA = %edx srcB = %eax – 10 – srcA = %edx srcB = %eax • • • D srcA = %edx srcB = %eax CS:APP Implementing Stalling What Happens When Stalling? W_dstM W_dstE # demo-h0.ys Cycle 8 4 5 6 7 0x000: irmovl $10,%edx 0x006: irmovl $3,%eax 0x00c: addl %edx,%eax 0x00e: halt Write Back Memory Execute Decode Fetch W 0x000: irmovl 0x006: bubble $10,%edx $3,%eax 0x000: irmovl 0x006: bubble $10,%edx $3,%eax M 0x00c: halt 0x00e: addl %edx,%eax Following instruction stays in fetch stage ! Bubbles injected into execute stage dstE dstM icode Bch valE valA dstE dstM E_dstE E_bubble E ! valM E_dstM Pipe control logic 0x00e: halt Stalling instruction held back in decode stage valE M_dstE 0x006: addl 0x00c: irmovl bubble %edx,%eax $3,%eax ! icode M_dstM icode ifun valC valA valB dstE dstM srcA srcB d_srcB d_srcA srcB D_icode " Like dynamically generated nop’s " Move through later stages D_stall D F_stall F srcA icode ifun rA rB valC valP predPC Pipeline Control – 11 – CS:APP – 12 – ! Combinational logic detects stall condition ! Sets mode signals for how pipeline registers should update CS:APP Pipeline Register Modes Input = y Normal Output = x x stall =0 Output = x x stall =1 " Output = y Output = x x stall =0 " " Output = x x " 1 0x000: irmovl $10,%edx $3,%eax 0x00c: nop 0x00d: nop 0x00e: addl %edx,%eax Rising clock " ! Forward as valB for decode stage ! Value generated in execute or memory stage Output = nop n o p ! Pass value directly from generating instruction to decode stage ! Needs to be available at end of decode stage F 2 D F 3 E D F 4 M E D F CS:APP – 14 – Bypass Paths 5 6 7 8 9 10 W_valE, W_valM, W_dstE, W_dstM W_valE W_valM W W M E D F W M E D W M E W M W Forwarding logic selects Memory valA and valB ! Normally from register file W R[%eax] !3 W_dstE = %eax W_valE = 3 m_valM ! ! Cycle 6 Data Data memory memory M_icode, M_Bch, M_valA Addr, Data M_valE M Forwarding: get valA or valB from later pipeline Execute stage D valA ! R[%edx] = 10 valB ! W_valE = 3 ! Execute: valE ! Memory: valE, valM ! Write back: valE, valM e_valE Bch CC CC ALU ALU E_valA, E_valB, E_srcA, E_srcB Forwarding Sources • • • srcA = %edx srcB = %eax W_icode, W_valM Decode Stage W M E D F 0x010: halt Destination value in W pipeline register Source operands read from register file in decode stage bubble =1 # demo-h2.ys ! ! Trick Data Forwarding Example irmovl in writeback stage Register isn’t written until completion of write-back stage Observation Rising clock CS:APP 0x006: irmovl ! " Needs to be in register file at start of stage – 13 – ! Naï Naïve Pipeline y bubble =0 Input = y Bubble Rising clock bubble =0 Input = y Stall " Data Forwarding E valA, valB Forward d_srcA, d_srcB Decode A B Register Register M file file E Write back – 15 – CS:APP D – 16 – valP icode, ifun, rA, rB, valC Fetch Instruction Instruction memory memory CS:APP valP PC PC increment increment predPC Data Forwarding Example #2 1 # demo-h0.ys 0x000: irmovl $10,%edx 0x006: irmovl $3,%eax 0x00c: addl %edx,%eax F 2 D F 3 4 E D F M E D F 0x00e: halt 5 W M E D Implementing Forwarding W_valE Write back W_valM W 6 7 icode valE 8 m_valM Data Data memory memory write Memory W M dstE dstM data out read Mem. control W M E valM W M icode valE valA Value just generated by ALU ! Forward from execute as valB ! ALU ALU CC CC M Execute M_dstE = %edx M_valE = 10 E icode ifun ALU fun. ALU A ALU B valC valA valB d_srcA d_srcB Sel+Fwd A E_dstE = %eax e_valE ! 0 + 3 = 3 Decode D srcA = %edx srcB = %eax dstE dstM srcA srcB dstE dstM srcA srcB E Register %eax Create logic blocks to select from multiple sources for valA and valB in decode stage dstE dstM e_valE Forward from memory as valA ! ! M_valA M_valE Bch Cycle 4 Generated by ALU during previous cycle ! Add additional feedback paths from E, M, and W pipeline registers into decode stage Addr M_Bch e_Bch Register %edx ! data in valA ! M_valE = 10 valB ! e_valE = 3 D CS:APP – 17 – icode Fwd B A W_valM B Register Register M file file E ifun – 18 – rA rB valC Instruction Instruction memory memory Fetch W_valE valP PC PC increment increment Predict PC CS:APP f_PC M_valA Select PC F Implementing Forwarding predPC Limitation of Forwarding # demo-luh.ys W_valE back W_valM icode valE valM dstE dstM data out read Mem. control Data Data memory memory write ry m_valM data in Addr M_Bch icode M_valA M_valE Bch valE valA dstE dstM e_Bch e_valE ALU ALU CC CC ALU A te icode ifun valC ALU fun. ALU B valA valB dstE dstM srcA srcB d_srcA d_srcB dstE dstM srcA srcB Sel+Fwd A de Fwd B A ifun rA rB Instruction Instruction memory memory –valC 19 – Predict PC f_PC M_valA Select PC W_valM 1 2 Load-use dependency ! ! 3 4 irmovl $128,%edx F D E M irmovl $3,%ecx F D E rmmovl %ecx, 0(%edx) F D irmovl $10,%ebx F mrmovl 0(%edx),%eax # Load %eax addl %ebx,%eax # Use %eax halt Value needed by end of decode stage in cycle 7 Value read from memory in memory stage of cycle 8 5 6 W M E D F W M E D F 7 8 9 10 11 W M E D F W M E D W M E W M W Cycle 7 Cycle 8 M M_dstE = %ebx M_valE = 10 M M_dstM = %eax m_valM ! M[128] = 3 • • • D valA ! M_valE = 10 valB ! R[%eax] = 0 CS:APP valP PC PC increment increment 0x000: 0x006: 0x00c: 0x012: 0x018: 0x01e: 0x020: W_valE E icode ## What should be the A value? int new_E_valA = [ # Use incremented PC D_icode in { ICALL, IJXX } : D_valP; # Forward valE from execute d_srcA == E_dstE : e_valE; # Forward valM from memory d_srcA == M_dstM : m_valM; # Forward valE from memory d_srcA == M_dstE : M_valE; # Forward valM from write back d_srcA == W_dstM : W_valM; # Forward valE from write back d_srcA == W_dstE : W_valE; # Use value read from register file 1 : d_rvalA; ]; W_valM B Register Register M file file W_valM – 20 – Error CS:APP W_valE Write back W_valM W icode valE valM dstE dstM data out Avoiding Load/Use Hazard # demo-luh.ys 0x000: 0x006: 0x00c: 0x012: 0x018: irmovl irmovl rmmovl irmovl mrmovl 1 2 3 $128,%edx F D $3,%ecx F %ecx, 0(%edx) $10,%ebx 0(%edx),%eax # Load E 4 5 6 Stall using instruction for one cycle ! Can then pick up loaded value by forwarding from memory stage Detecting Load/Use Hazard W E D F %eax M E D W M E W M F D E F D F 9 10 11 12 M_Bch W M E D F W M E D Execute W M E E W M icode ifun ALU A ALU B valC valA Decode D 0x000: 0x006: 0x00c: 0x012: 0x018: irmovl $128,%edx F D E M irmovl $3,%ecx F D E rmmovl %ecx, 0(%edx) F D irmovl $10,%ebx F mrmovl 0(%edx),%eax # Load %eax bubble 0x01e: addl %ebx,%eax # Use %eax 0x020: halt ! Stall instructions in fetch and decode stages ! Inject bubble into execute stage 5 6 7 W M E D F W M E D W M E F icode ifun Fetch Load/Use Hazard 8 9 F CS:APP 10 dstM srcA srcB 11 12 rA rB valC Instruction Instruction memory memory Trigger dstM srcA srcB Fwd B A Condition valA ! W_valE = 10 valB ! m_valM = 3 dstE dstE Sel +Fwd A M 4 valB W W 3 dstM ALU fun. W_dstE = %ebx W_valE = 10 2 dstE d_srcA d_srcB Control for Load/Use Hazard – 23 – valA ALU ALU Cycle 8 – 21 – Load/Use Hazard M_valA valE e_valE D Condition Bch e_Bch • • • 1 icode M_valE CC CC M_dstM = %eax m_valM ! M[128] = 3 # demo-luh.ys data in Addr 8 M D F m_valM Data Data memory memory write Memory M bubble 0x01e: addl %ebx,%eax # Use %eax 0x020: halt ! 7 read Mem. control W_valM B Register RegisterM file file E W_valE valP PC PC increment increment Predict PC f_PC M_valA E_icode E_icode in { IMRMOVL, IPOPL } && E_dstM E_dstM in { d_srcA d_srcA,, d_srcB d_srcB } Select PC W_valM predPC CS:APP – 22 – Branch Misprediction Example demo-j.ys D F W M E D F W M E D W M E W M W 0x000: xorl %eax,%eax 0x002: jne t 0x007: irmovl $1, %eax 0x00d: nop 0x00e: nop 0x00f: nop 0x010: halt 0x011: t: irmovl $3, %edx 0x017: irmovl $4, %ecx 0x01d: irmovl $5, %edx ! F D E M W stall stall bubble normal normal CS:APP – 24 – # Not taken # Fall through # Target (Should not execute) # Should not execute # Should not execute Should only execute first 8 instructions CS:APP W_valE Write back W_valM W Handling Misprediction # demo-j.ys 1 2 3 4 5 0x000: F D F E D M E W M F D 0x002: xorl %eax,%eax valE jne target # Not taken bubble irmovl $3,%ebx # Target+1 0x007: irmovl $1,%eax # Fall through 0x00d: nop dstE 6 7 8 9 10 m_valM Data Data memory memory write data in Addr M_Bch W M icode M_valA M_valE Bch valE valA dstE E M W D F E D F M E D e_valE ALU ALU CC CC W M E Execute W M E icode ifun ALU fun. ALU A ALU B valC valA valB dstE W Fetch 2 instructions at target On following cycle, replace instructions in execute and decode by bubbles D Fetch No side effects have occurred yet srcA srcB A W_valM B RegisterM W_valE Control for Misprediction 1 2 3 4 5 0x000: xorl %eax,%eax F 0x002: jne target # Not taken D F E D F M E D W M W E M W D F E D F M E D irmovl $3,%ebx # Target+1 6 7 8 9 0x007: irmovl $1,%eax # Fall through 0x00d: nop rB valC valP Instruction Instruction memory memory PC PC increment increment Predict PC Select PC W M E W M CS:APP W_valM predPC demo-retb.ys 10 F bubble rA Return Example # demo-j.ys bubble ifun M_valA – 26 – F 0x011: t: irmovl $2,%edx # Target icode f_PC CS:APP – 25 – – 27 – dstM E ! Mispredicted Branch normal srcB Register Mispredicted Branch E_icode E_icode = IJXX file file & !e_Bch Detect branch not-taken in execute stage F srcA Fwd B Trigger Decode ! Condition Sel +Fwd A Condition Cancel when mispredicted 0x017: dstM d_srcA d_srcB dstE ! dstM e_Bch Predict branch as taken ! dstM data out read Mem. control F bubble valM Detecting Mispredicted Branch Memory 0x011: t: irmovl $2,%edx # Target 0x017: icode W D E M W bubble bubble normal normal CS:APP 0x000: 0x006: 0x00b: 0x011: 0x020: 0x020: 0x026: 0x027: 0x02d: 0x033: 0x039: 0x100: 0x100: ! – 28 – irmovl Stack,%esp call p irmovl $5,%esi halt .pos 0x20 p: irmovl $-1,%edi ret irmovl $1,%eax irmovl $2,%ecx irmovl $3,%edx irmovl $4,%ebx .pos 0x100 Stack: # Initialize stack pointer # Procedure call # Return point # procedure # # # # Should Should Should Should not not not not be be be be executed executed executed executed # Stack: Stack pointer Previously executed three additional instructions CS:APP Correct Return Example Detecting Return # demo-retb 0x026: M_Bch ret F D F bubble bubble E D F bubble 0x00b: M E D F irmovl $5,%esi # Return W M E D F M W M E D icode M_valE Bch valE valA dstE dstM e_Bch e_valE W M E ALU ALU CC CC W M ALU A ALU B valC valA Execute W E icode ifun ALU fun. valB dstE dstM srcA srcB d_srcA d_srcB dstE dstM srcA ! As ret passes through pipeline, stall at fetch stage W Sel+Fwd A Decode • • • memory stage ! Inject bubble into decode stage D valC ! 5 rB ! %esi CS:APP – 29 – Control for Return F D F bubble bubble bubble 0x00b: E D F M E D F irmovl $5,%esi # Return Condition Processing ret W_valM B Register Register M file file E ifun rA rB valC W_valE valP Condition Trigger Processing ret IRET in { D_icode D_icode,, E_icode E_icode,, M_icode M_icode } CS:APP – 30 – Special Control Cases # demo-retb ret icode A F Release stall when reach write-back stage 0x026: Fwd B valM = 0x0b " While in decode, execute, and ! srcB W M E D F Detection W M E D W M E W M W F D E M W stall bubble normal normal normal Condition Trigger Processing ret IRET in { D_icode D_icode,, E_icode E_icode,, M_icode M_icode } Load/Use Hazard E_icode E_icode in { IMRMOVL, IPOPL } && E_dstM E_dstM in { d_srcA d_srcA,, d_srcB d_srcB } Mispredicted Branch E_icode E_icode = IJXX & !e_Bch Action (on next cycle) Condition F D E M W Processing ret stall bubble normal normal normal Load/Use Hazard stall stall bubble normal normal bubble bubble normal normal Mispredicted Branch normal – 31 – CS:APP – 32 – CS:APP Implementing Pipeline Control W icode valE valM dstE dstM valE valA dstE dstM Initial Version of Pipeline Control bool F_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } || # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, M_icode }; M_icode M icode Bch e_Bch bool D_stall = # Conditions for a load/use hazard E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }; CC CC E_dstM E_icode Pipe control logic E_bubble E icode ifun valC valA valB dstE dstM srcA srcB d_srcB d_srcA srcB D_icode srcA D_bubble D_stall D icode ifun rA rB valC valP bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || Write back # Stalling at fetch while ret passes through pipeline IRET in { D_icode, E_icode, WM_icode }; icode valE valM dstE dstM data out Data Mem. Data bool E_bubble = control memory memory # Mispredicted branch Memory Addr (E_icode == IJXX && !e_Bch) || # Load/use hazard Bch valA dstE dstM E_icode in { IMRMOVL, IPOPL }M &&icodeE_dstM invalE{ d_srcA, d_srcB}; read F_stall write F predPC data in M_valA M_Bch ! Combinational logic generates pipeline control signals ! Action occurs at start of following cycle e_Bch CS:APP – 33 – – 34 – ALU A Execute E icode ALU fun. ALU ALU CC CC ifun CS:APP ALU B valC valA valB dstE dstM srcA dstE dstM srcA srcB d_srcA d_srcB Select A Decode Control Combinations Load/use M E D M Load Use E D D ret 2 M M E ret D Combination A E D JXX ret bubble ret 3 M ret E D bubble bubble Combination B ! ! Mispredict M E D Not-taken branch ret instruction at branch target – 35 – Instruction that reads from memory to %esp ! Followed by ret instruction CS:APP ifun rA rB Fetch W_valE E valC valP Instruction Instruction memory memory PC PC increment increment Predict PC f_PC M_valA Select PC F W_valM predPC F D E M W stall bubble normal normal normal Mispredicted Branch normal bubble bubble normal normal Combination bubble bubble normal normal Combination B ! M E ret D Combination A Processing ret Special cases that can arise on same clock cycle icode srcB W_valM B Register RegisterM file file ret 1 JXX Condition Combination A ! A Control Combination A ret 1 Mispredict d_rvalA – 36 – stall ! Should handle as mispredicted branch ! Stalls F pipeline register ! But PC selection logic will be using M_valM anyhow CS:APP Control Combination B ret 1 Load/use M E D Handling Control Combination B M E D Load Use M E D ret D E M W Processing ret stall bubble normal normal normal Load/Use Hazard stall stall bubble normal normal Combination stall bubble + bubble stall normal normal F D E M W Processing ret stall bubble normal normal normal Load/Use Hazard stall stall bubble normal normal Combination stall stall bubble normal normal Would attempt to bubble and stall pipeline register D ! ! Signaled by processor as pipeline error ! CS:APP – 37 – Corrected Pipeline Control Logic bool D_bubble = # Mispredicted branch (E_icode == IJXX && !e_Bch) || # Stalling at fetch while ret passes IRET in { D_icode, E_icode, M_icode # but not condition for a load/use && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB Pipeline Summary ! Most handled by forwarding " No performance penalty ! }); Load/use hazard requires one cycle stall Control Hazards F D E M W stall bubble normal normal normal stall bubble normal normal Combination stall stall bubble normal normal ! CS:APP – 38 – through pipeline } hazard stall ! Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle Data Hazards Load/Use Hazard – 39 – Condition ! Processing ret ret Combination B F Condition M E D Load Use Combination B Condition ret 1 Load/use Cancel instructions when detect mispredicted branch ! Stall fetch stage while ret passes through pipeline " Two clock cycles wasted " Three clock cycles wasted Control Combinations Load/use hazard should get priority ret instruction should be held in decode stage for additional cycle CS:APP ! ! Must analyze carefully ! First version had subtle bug " Only arises with unusual instruction combination – 40 – CS:APP
© Copyright 2025 Paperzz