Name   Busy  Op     Vj     Vk     Qj     Qk
Add1   Yes   SUBD   M(A1)                Load2
Add2   No
Add3   No
Mult1  Yes   MULTD         R(F4)  Load2
Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12    ...    F30
4      FU  Mult1  Load2         M(A1)  Add1

• Load2 completing

Tomasulo Example (2022/8/17, slide 16)

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result         Busy  Address
LD    F6     34+  R2                                   Load1  No
LD    F2     45+  R3                                   Load2  No
MULTD F0     F2   F4                                   Load3  No
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12    ...    F30
0      FU

Figure labels: Instruction stream; 3 Load/Buffers; 3 FP adder reservation stations; 2 FP multiplier reservation stations; FU count down; Clock cycle counter

Tomasulo Example Cycle 1 (slide 17)

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result         Busy  Address
LD    F6     34+  R2   1                               Load1  Yes   34+R2
LD    F2     45+  R3                                   Load2  No
MULTD F0     F2   F4                                   Load3  No
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12    ...    F30
1      FU                       Load1

Tomasulo Example Cycle 2 (slide 18)

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result         Busy  Address
LD    F6     34+  R2   1                               Load1  Yes   34+R2
LD    F2     45+  R3   2                               Load2  Yes   45+R3
MULTD F0     F2   F4                                   Load3  No
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12    ...    F30
2      FU         Load2         Load1

Note: Can have multiple loads outstanding

Tomasulo Example Cycle 3 (slide 19)

Instruction status:
Instruction  j    k    Issue  Exec Comp  Write Result         Busy  Address
LD    F6     34+  R2   1      3                        Load1  Yes   34+R2
LD    F2     45+  R3   2                               Load2  Yes   45+R3
MULTD F0     F2   F4   3                               Load3  No
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Reservation Stations:
                          S1     S2     RS     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No

Register result status:
Clock      F0     F2     F4     F6     F8     F10    F12    ...    F30
3      FU  Mult1  Load2         Load1

• Note: register names are removed ("renamed") in Reservation Stations

... mark reservation station available

Three Stages of Tomasulo Algorithm (slide 15)
• Normal data bus: data + destination ("go to" bus)
• Common data bus: data + source ("come from" bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – Does the broadcast
• Example speed:
  – 3 clocks for floating-point +, -
  – 40 clocks for floating-point /

... sends operands (renames registers).
2. Execute: operate on operands (EX). When both operands are ready, then execute.
... called register renaming
... buffers distributed with Functional Units (FU)
  – FU buffers called "reservation stations"

Modern Computer Architecture (slide 1)
Lecturer: Prof. Zhang Gang
School of Computer Science, Tianjin University
Contact email:
Homework submission email:
2022

The Main Contents (slide 2)
• Chapter 1. Fundamentals of Quantitative Design and Analysis
• Chapter 2. Memory Hierarchy Design
• Chapter 3. Instruction-Level Parallelism and Its Exploitation
• Chapter 4. Data-Level Parallelism in Vector, SIMD, and GPU Architectures
• Chapter 5. Thread-Level Parallelism
• Chapter 6. Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism
• Appendix A. Pipelining: Basic and Intermediate Concepts

Class discussion

Advantages of Dynamic Scheduling (slide 4)
• Dynamic scheduling
  – Hardware rearranges the instruction execution to reduce stalls while maintaining data flow and exception behavior
• What does it mean to maintain data flow and exception behavior?

Advantages of Dynamic Scheduling (slide 5)
• Advantages
  – It handles cases where dependences are unknown at compile time
    • It allows the processor to tolerate unpredictable delays, such as cache misses, by executing other code while waiting for the miss to resolve
  – It allows code that was compiled for one pipeline to run efficiently on a different pipeline
  – It simplifies the compiler
• Why?

HW Schemes: Instruction Parallelism (slide 6)
• Key idea: Allow instructions behind a stall to proceed
    DIVD F0,F2,F4
    ADDD F10,F0,F8
    SUBD F12,F8,F14
• Enables out-of-order execution and allows out-of-order completion (e.g., SUBD)
  – In a dynamically scheduled pipeline, all instructions still pass through the issue stage in order (in-order issue)
• What do in-order issue, out-of-order execution, and out-of-order completion mean?
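
As a starting point for the last discussion question, the DIVD/ADDD/SUBD sequence can be traced with a small Python sketch. It is not Tomasulo's algorithm and not any real pipeline; it is a deliberately simplified model under assumptions that are not in the slides: one instruction issues per cycle in program order, functional units are unlimited, execution starts the cycle after the later of issue and operand availability, and only true (RAW) dependences are tracked. The latencies (3 clocks for FP add/subtract, 40 for FP divide) follow the example speeds quoted in the deck, and the names `Instr`, `LATENCY`, and `schedule` are hypothetical.

```python
from dataclasses import dataclass

# Assumed latencies in clock cycles: 3 for FP add/subtract, 40 for FP divide.
LATENCY = {"DIVD": 40, "ADDD": 3, "SUBD": 3}


@dataclass
class Instr:
    op: str
    dest: str
    src1: str
    src2: str


def schedule(program):
    """Return a list of (instr, issue, start, complete) tuples.

    Simplified model (an assumption, not the full algorithm): in-order
    issue at one instruction per cycle, unlimited functional units, and
    only RAW dependences delay the start of execution.
    """
    ready_at = {}  # register -> cycle in which its pending value is produced
    timeline = []
    for i, ins in enumerate(program):
        issue = i + 1  # in-order issue: program position fixes the issue cycle
        operands_ready = max(ready_at.get(ins.src1, 0), ready_at.get(ins.src2, 0))
        start = max(issue, operands_ready) + 1  # wait for issue and producers
        complete = start + LATENCY[ins.op] - 1
        ready_at[ins.dest] = complete
        timeline.append((ins, issue, start, complete))
    return timeline


program = [
    Instr("DIVD", "F0", "F2", "F4"),
    Instr("ADDD", "F10", "F0", "F8"),   # needs F0 from DIVD, so it waits
    Instr("SUBD", "F12", "F8", "F14"),  # independent, so it can proceed
]

for ins, issue, start, complete in schedule(program):
    print(f"{ins.op:5} issue={issue} start={start} complete={complete}")
```

Under these assumptions the issue cycles are 1, 2, 3 (in order), but SUBD starts in cycle 4 and completes in cycle 6, while ADDD cannot start until cycle 42 because it waits for F0 from DIVD. SUBD therefore both executes and completes ahead of an earlier instruction, which is the out-of-order behavior the slide refers to.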