Adapted from Patterson, Katz and Culler © UCB. Copyright 2020 UCB & Morgan Kaufmann. ECE668.

[…] by adding global information, performance improved.
• Tournament predictors: use two predictors, one based on global information and one based on local information, and combine them with a selector.
• The hope is to select the right predictor for the right branch (or the right context of a branch).

[…]
• The ith bit is 1 if the ith prior branch was taken; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. A 10-bit history allows patterns of up to 10 branches to be discovered and predicted.
• Next level: the selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters, which provide the local prediction.
• Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits! (~180K transistors)
[Figure labels: 1K × 10 bits; 1K × 3 bits]

Accuracy of Branch Prediction

Benchmark   Profile-based   2-bit counter   Tournament
gcc         88%             70%             94%
espresso    86%             82%             96%
li          88%             77%             98%
fpppp       86%             82%             98%
doduc       95%             84%             97%
tomcatv     99%             99%             100%

• Profile: branch profile from the last execution (static in that it is encoded in the instruction, but based on a profile).

Need Address at Same Time as Prediction
• Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch address (if taken).
• Note: must check for a branch match now, since we can’t use the wrong branch address.
[Figure labels: PC of instruction (FETCH); Branch PC; Predicted PC; =?; Prediction state bits; Yes: instruction is branch […]]

Branch Target “Cache”
• Branch target cache: only predicted-taken branches.
• “Cache”: Content Addressable Memory (CAM) or associative memory (see figure).
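The BTB lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of any particular processor: the class name `BranchTargetBuffer`, the helper `next_fetch_pc`, the direct-mapped organization, the 4-byte instruction size, and the default of 1024 entries are all hypothetical choices made for the example (the slides mention a CAM or associative memory instead).

```python
# Hypothetical sketch of a direct-mapped branch target buffer (BTB).
# Assumptions (not from the slides): 4-byte instructions, direct-mapped
# indexing, and a fall-through address of PC + 4.

class BranchTargetBuffer:
    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        # Each entry is either None or a (branch_pc, predicted_pc) pair.
        self.entries = [None] * num_entries

    def _index(self, pc):
        # Drop the two low-order bits (4-byte instructions), then wrap.
        return (pc >> 2) % self.num_entries

    def lookup(self, fetch_pc):
        """Return the predicted target, or None if there is no matching entry."""
        entry = self.entries[self._index(fetch_pc)]
        # The "=?" check: the stored branch PC must match the fetch PC,
        # otherwise the target belongs to a different branch.
        if entry is not None and entry[0] == fetch_pc:
            return entry[1]
        return None

    def update(self, branch_pc, taken, target_pc):
        """Record a resolved branch; only taken branches are kept."""
        idx = self._index(branch_pc)
        if taken:
            self.entries[idx] = (branch_pc, target_pc)
        elif self.entries[idx] is not None and self.entries[idx][0] == branch_pc:
            self.entries[idx] = None


def next_fetch_pc(btb, pc):
    """Pick the next fetch address: the BTB target on a hit, else PC + 4."""
    target = btb.lookup(pc)
    return target if target is not None else pc + 4
```

Storing the full branch PC next to the target is what makes the match check possible; without it, two branches that map to the same entry would silently share a target.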
• Use a big Branch History Table & […] restart fetch at other target.

[…]
• Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP
• If false, then neither store the result nor cause interference.
• Expanded ISAs of Alpha, MIPS, PowerPC, and SPARC have conditional move; […]

[…]
• Add hints to be used at runtime
• Also, predict statically
• Branch folding […]

Pitfall: Sometimes dumber is better
• Alpha 21264 uses a tournament predictor (29 Kbits).
• Earlier 21164 uses a simple 2-bit predictor with 2K entries (or a total of 4 Kbits).
• SPEC95 benchmarks: 21264 outperforms
  - 21264 avg. […] mispredictions per 1000 instructions
  - 21164 avg. […] mispredictions per 1000 instructions
• Reversed for transaction processing (TP)!
  - 21264 avg. 17 mispredictions per 1000 instructions
  - 21164 avg. 15 mispredictions per 1000 instructions
• TP code much larger […]
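To make the 29 Kbit and 4 Kbit storage figures in the pitfall concrete, here is a minimal Python sketch of a tournament predictor built from the table sizes in the total-size formula (4K*2 + 4K*2 + 1K*10 + 1K*3). Several details are assumptions made for illustration, not taken from the slides: the class and function names, indexing the global and selector tables by a 12-bit global history, indexing the local history table by PC bits, the initial counter values, and the rule of training the selector only when the two predictors disagree.

```python
# Hypothetical sketch of a tournament predictor. Table sizes follow the
# slide's total-size formula; indexing and initial values are assumptions.

def bump(counter, up, max_value):
    """Saturating increment or decrement."""
    return min(counter + 1, max_value) if up else max(counter - 1, 0)


class TournamentPredictor:
    def __init__(self):
        self.global_hist = 0              # 12-bit global history (assumed)
        self.global_ctrs = [1] * 4096     # 4K x 2-bit global counters
        self.choice_ctrs = [1] * 4096     # 4K x 2-bit selector, >= 2 picks global
        self.local_hist = [0] * 1024      # 1K x 10-bit local histories
        self.local_ctrs = [3] * 1024      # 1K x 3-bit local counters

    def _lookup(self, pc):
        lidx = (pc >> 2) % 1024           # assumed: index by PC, 4-byte instrs
        lhist = self.local_hist[lidx]
        local_pred = self.local_ctrs[lhist] >= 4
        global_pred = self.global_ctrs[self.global_hist] >= 2
        return lidx, lhist, local_pred, global_pred

    def predict(self, pc):
        _, _, local_pred, global_pred = self._lookup(pc)
        use_global = self.choice_ctrs[self.global_hist] >= 2
        return global_pred if use_global else local_pred

    def update(self, pc, taken):
        lidx, lhist, local_pred, global_pred = self._lookup(pc)
        g = self.global_hist
        # Train the selector only when the two predictors disagree.
        if local_pred != global_pred:
            self.choice_ctrs[g] = bump(self.choice_ctrs[g], global_pred == taken, 3)
        self.global_ctrs[g] = bump(self.global_ctrs[g], taken, 3)
        self.local_ctrs[lhist] = bump(self.local_ctrs[lhist], taken, 7)
        # Shift the outcome into both histories, keeping 10 and 12 bits.
        self.local_hist[lidx] = ((lhist << 1) | int(taken)) & 0x3FF
        self.global_hist = ((g << 1) | int(taken)) & 0xFFF


def storage_bits_tournament():
    # 4K*2 + 4K*2 + 1K*10 + 1K*3
    return 4096 * 2 + 4096 * 2 + 1024 * 10 + 1024 * 3


def storage_bits_two_bit(entries):
    return entries * 2


def accuracy(predictor, pc, outcomes, warmup=0):
    """Fraction of correct predictions after the first `warmup` outcomes."""
    correct = 0
    for i, taken in enumerate(outcomes):
        if i >= warmup and predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / (len(outcomes) - warmup)
```

With these definitions, `storage_bits_tournament()` evaluates to 29 * 1024 bits and `storage_bits_two_bit(2048)` to 4 * 1024 bits, matching the two sizes quoted in the pitfall. A single branch that strictly alternates taken and not taken is learned by the local history once the 10-bit history fills. A sketch this small does not model the capacity effect behind the transaction-processing result, where much larger code competes for the same tables.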