[Figure: chart with benchmarks spice, fpppp, gcc, espresso, eqntott, li on the axis]

Copyright 2020 UCB & Morgan Kaufmann. ECE668. Adapted from Patterson, Katz and Culler © UCB

Accuracy of Branch Prediction
• Profile: branch profile from the last execution (static in that it is encoded in the instruction, but based on a profile)
[Figure: bar chart of prediction accuracy (0% to 100%) for gcc, espresso, li, fpppp, doduc, tomcatv. Series: profile-based, 2-bit counter, tournament. Data labels as extracted, one row per series: 94% 96% 98% 98% 97% 100%; 70% 82% 77% 82% 84% 99%; 88% 86% 88% 86% 95% 99%]

Accuracy v. Size (SPEC89)
[Figure: conditional branch misprediction rate (0% to 10%) versus total predictor size (0 to 128 Kbits). Series: local 2-bit counters, correlating (2,2) scheme, tournament]

Need Address at Same Time as Prediction
• Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch target address (if taken)
  » Note: must check for a branch match now, since we can't use the wrong branch address
[Figure: BTB with columns Branch PC and Predicted PC, plus prediction state bits. The PC of the instruction in FETCH is compared (=?) against the stored branch PCs. Yes: instruction is a branch, use the predicted PC as the next PC. No: proceed normally (PC+4)]

Branch Target "Cache"
• Branch Target cache: only predicted taken branches
• "Cache": Content Addressable Memory (CAM) or Associative Memory (see figure)
• Use a big Branch History Table & …
[Figure: flowchart of branch handling with a BTB across the IF, ID and EX stages of the 5-stage MIPS pipeline. Surviving labels: "restart fetch at other target", "continue execution with no stalls"]

• Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP
  » If false, then neither store the result nor cause interference
  » Expanded ISAs of Alpha, MIPS, PowerPC, SPARC have conditional move; …

• Use static information
  » Add hints to be used at runtime
• Also, predict statically
• Branch folding
  » Compile-time
© BlueRISC

Pitfall: Sometimes dumber is better
• Alpha 21264 uses a tournament predictor (29 Kbits)
• Earlier 21164 uses a simple 2-bit predictor with 2K entries (or a total of 4 Kbits)
• SPEC95 benchmarks: 21264 outperforms
  » 21264 avg. mispredictions per 1000 instructions
  » 21164 avg. mispredictions per 1000 instructions
• Reversed for transaction processing (TP)!
  » 21264 avg. 17 mispredictions per 1000 instructions
  » 21164 avg. 15 mispredictions per 1000 instructions
• TP code much larger & …
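
The "simple 2-bit predictor with 2K entries (or a total of 4 Kbits)" from the pitfall slide can be sketched in a few lines. This is a minimal illustration, not the actual 21164 design: the class name `TwoBitPredictor`, the helper `mispredictions`, the indexing by low-order word-address bits, and the initial counter value are all assumptions made for the example.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=2048):
        self.entries = entries
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [1] * entries

    def index(self, pc):
        # Assumption: 4-byte instructions, so drop the two low bits
        # and use the next bits of the PC to select a counter.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

    def size_bits(self):
        # Two bits of state per entry.
        return 2 * self.entries


def mispredictions(predictor, trace):
    """Count wrong predictions over a list of (pc, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    p = TwoBitPredictor(entries=2048)
    print(p.size_bits())  # 2 bits x 2K entries = 4096 bits (4 Kbits)

    # A loop branch at a hypothetical address: taken 9 times, then falls through.
    loop = ([(0x400, True)] * 9 + [(0x400, False)]) * 3
    print(mispredictions(p, loop))
```

With this indexing, two branches whose addresses are 2K instructions apart (for example 0x1000 and 0x3000) select the same counter, so in a small table unrelated branches can overwrite each other's history as the number of distinct branches grows.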