Tournament Predictor in Alpha 21264
- Each entry in the global predictor is a standard 2-bit predictor
- 12-bit pattern: ith bit is 0 => ith prior branch not taken
- Local predictor consists of a 2-level predictor:
  » Top level: a local history table consisting of 1024 10-bit entries
- 21164 holds 2X branch predictions based on local behavior (2K vs. 1K local predictor in the 21264)
- What about power?
  » Large predictors give some increase in prediction rate, but at a large power cost

Steps with Branch Target Buffer
Branch_CPI_Penalty = [Buffer_hit_rate x P{Incorrect_prediction}] x Penalty_Cycles + [(1 - Buffer_hit_rate) x P{Branch_taken}] x Penalty_Cycles = . + . = .29
[Flowchart boxes: Send PC to memory and branch-target buffer. Entry found in branch-target buffer? Send out predicted PC. Is instruction a taken branch? Taken branch? Enter branch instruction address and next PC into branch-target buffer. Mispredicted branch, kill fetched instruction.]

Special Case: Return Addresses
- Register indirect branch: hard to predict address
- SPEC89: 85% of such branches are procedure returns
- Since procedures follow a stack discipline, save the return address in a small buffer that acts like a stack: 8 to 16 entries has a small miss rate

Power Consumption
BlueRISC's Compiler-driven Power-Aware Branch Prediction
Comparison with 512-entry BTAC bimodal (patent issued)
  » Combine runtime and compile-time information
  » Runtime
(Copyright 2020 CAM)

How to Reduce Power
- Reduce load capacitances switched
  » Smaller BTAC
  » Smaller local, global predictors
  » Less associativity

Copyright 2020 UCB & Morgan Kaufmann. ECE668. Adapted from Patterson, Katz and Culler © UCB.
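The Branch_CPI_Penalty formula from the branch-target buffer slide can be sketched in a few lines of Python. The function name and parameter names below simply mirror the symbols in the formula. The numeric inputs (91% buffer hit rate, 10% incorrect predictions, 60% taken branches, 2 penalty cycles) are assumptions: the operands on the slide did not survive, and these values were picked only because they reproduce the slide's result of .29.

```python
def branch_cpi_penalty(buffer_hit_rate, p_incorrect_prediction,
                       p_branch_taken, penalty_cycles):
    """Average penalty cycles per branch with a branch-target buffer."""
    # Branch found in the buffer, but the prediction turned out wrong.
    hit_term = buffer_hit_rate * p_incorrect_prediction * penalty_cycles
    # Branch not in the buffer, but it was actually taken.
    miss_term = (1.0 - buffer_hit_rate) * p_branch_taken * penalty_cycles
    return hit_term + miss_term


if __name__ == "__main__":
    # Assumed inputs, not taken from the slide (see the note above).
    penalty = branch_cpi_penalty(
        buffer_hit_rate=0.91,
        p_incorrect_prediction=0.10,
        p_branch_taken=0.60,
        penalty_cycles=2,
    )
    print(f"{penalty:.2f}")  # prints 0.29
```

With these assumed inputs the two terms are 0.182 and 0.108, so both buffer hits with a wrong prediction and buffer misses on taken branches contribute noticeably to the total.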
  » How do you know which branches?

Predicated Execution
- PA-RISC can annul any following instruction
- Drawbacks to conditional instructions
  » Still takes a clock even if "annulled"
  » Stall if condition evaluated late: complex conditions reduce effectiveness since the condition becomes known late in the pipeline
[Figure: x; A = B op C]

Branch Target Buffer
[Figure: a small Branch Target Cache. Columns: Branch PC, Predicted PC, prediction state bits (optional). The PC is compared (=?) against the Branch PC column. Yes: predicted taken branch found; use predicted PC as next PC (if predict Taken). No: not found; branch not predicted.]
[Flowchart boxes: Normal instruction execution. Delete entry from target buffer. Branch correctly predicted.]
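The branch-target buffer steps named in these slides (look up the PC at fetch, enter a taken branch that was not found, delete the entry after a misprediction) can be sketched as a small Python class. This is a minimal illustration, not the hardware design from the lecture: the class name `BranchTargetBuffer`, its methods, the default capacity, the oldest-first eviction, and the handling of a taken branch with a stale target are all assumptions added for the example.

```python
class BranchTargetBuffer:
    """Toy branch-target buffer: maps a branch PC to a predicted next PC."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # assumed size, for illustration only
        self.entries = {}         # branch PC -> predicted target PC

    def predict(self, pc):
        """Fetch stage: return the predicted PC on a hit, None on a miss."""
        return self.entries.get(pc)

    def resolve(self, pc, taken, target):
        """Update the buffer once the branch outcome is known.

        Returns True if the instruction fetched after `pc` was the right one.
        """
        predicted = self.entries.get(pc)
        if predicted is None:
            if not taken:
                return True  # not found, not taken: normal execution
            # Not found but taken: enter branch address and next PC.
            if len(self.entries) >= self.capacity:
                # Assumed policy: evict the oldest entry (dicts keep
                # insertion order).
                self.entries.pop(next(iter(self.entries)))
            self.entries[pc] = target
            return False
        if taken and predicted == target:
            return True  # branch correctly predicted
        if taken:
            self.entries[pc] = target  # assumed: refresh a stale target
        else:
            del self.entries[pc]       # mispredicted: delete the entry
        return False


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(btb.predict(0x40))             # None: not in the buffer yet
    print(btb.resolve(0x40, True, 0x80)) # False: taken, so it is entered
    print(hex(btb.predict(0x40)))        # 0x80: now predicted taken
```

Only taken branches are stored, which matches the slides' reading that a hit means "predicted taken branch found"; a real buffer would also carry the optional prediction state bits.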