HT together, the maximum number of threads that can be executed at a time is 4 per processor.

National High Performance Computing Center (Hefei) 57 2022/2/16

Combining multi-core and Hyper-Threading technology
Throughput … versatility … efficiency
Diagram: Dual Core with HyperThreading: Core0 and Core1 on a Front Side Bus, 4 threads/socket. Dual Core: Core0 and Core1 on a Front Side Bus, 2 threads/socket.

Comparison of AMD and Intel dual-core architectures
Figures: AMD Opteron dual-core architecture diagram; Intel Pentium Extreme Edition dual-core architecture diagram.

The cache coherence problem
- Since we have private caches: how do we keep the data consistent across caches?
- Each core should perceive the memory as a monolithic array, shared by all the cores.

The cache coherence problem: suppose variable x initially contains 15213
Diagram: multicore chip with Cores 1 to 4, each with one or more levels of cache, above main memory. Main memory: x=15213.

The cache coherence problem: Core 1 reads x
Diagram: Core 1 cache: x=15213. Cores 2 to 4: no copy. Main memory: x=15213.

The cache coherence problem: Core 2 reads x
Diagram: Core 1 cache: x=15213. Core 2 cache: x=15213. Cores 3 and 4: no copy. Main memory: x=15213.

The cache coherence problem: Core 1 writes to x, setting it to 21660 (assuming write-through caches)
Diagram: Core 1 cache: x=21660. Core 2 cache: x=15213. Cores 3 and 4: no copy. Main memory: x=21660.

The cache coherence problem: Core 2 attempts to read x… gets a stale copy
Diagram: Core 1 cache: x=21660. Core 2 cache: x=15213 (stale). Cores 3 and 4: no copy. Main memory: x=21660.
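The stale-read scenario walked through above can be reproduced with a small simulation. This is only a minimal sketch, assuming private write-through caches with no coherence mechanism at all. The names `Memory`, `Cache`, `core1`, and `core2` are hypothetical and introduced purely for illustration; they do not come from the slides.

```python
class Memory:
    """Main memory shared by all cores (a plain address -> value map)."""

    def __init__(self):
        self.data = {}


class Cache:
    """Private write-through cache with NO coherence mechanism (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # locally cached copies: address -> value

    def read(self, addr):
        if addr not in self.lines:
            # Miss: fetch the value from main memory and keep a private copy.
            self.lines[addr] = self.memory.data[addr]
        # Hit: return the private copy, which may be out of date.
        return self.lines[addr]

    def write(self, addr, value):
        # Write-through: update the private copy and main memory together.
        self.lines[addr] = value
        self.memory.data[addr] = value


memory = Memory()
memory.data["x"] = 15213            # variable x initially contains 15213
core1 = Cache(memory)
core2 = Cache(memory)

first_read_core1 = core1.read("x")  # Core 1 reads x
first_read_core2 = core2.read("x")  # Core 2 reads x
core1.write("x", 21660)             # Core 1 writes to x
stale_read_core2 = core2.read("x")  # Core 2 reads x again and hits its old copy

print("main memory:", memory.data["x"])
print("core 2 sees:", stale_read_core2)
```

Because Core 2 still holds its private copy, the second read never reaches main memory, so it returns the old value even though main memory was updated by the write-through.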
Solutions for cache coherence
- This is a general problem with multiprocessors, not limited just to multicore.
- There exist many solution algorithms, coherence protocols, etc.
- A simple solution: invalidation-based protocol with snooping.

Inter-core bus
Diagram: multicore chip with Cores 1 to 4, each with one or more levels of cache, connected by an inter-core bus, above main memory.

Invalidation protocol with snooping
- Invalidation: if a core writes to a data item, all other copies of this data item in other caches are invalidated.
- Snooping: all cores continuously "snoop" (monitor) the bus connecting the cores.

The cache coherence problem, revisited: Cores 1 and 2 have both read x
Diagram: Core 1 cache: x=15213. Core 2 cache: x=15213. Cores 3 and 4: no copy. Main memory: x=15213.

The cache coherence problem: Core 1 writes to x, setting it to 21660 (assuming write-through caches)
Diagram: Core 1 cache: x=21660; Core 1 sends an invalidation request over the inter-core bus. Core 2 cache: x=15213, INVALIDATED. Cores 3 and 4: no copy. Main memory: x=21660.

The cache coherence problem: after invalidation
Diagram: Core 1 cache: x=21660. Cores 2 to 4: no copy. Main memory: x=21660.

The cache coherence problem: Core 2 reads x. Cache misses, and loads the new copy.
Diagram: Core 1 cache: x=21660. Core 2 cache: x=21660. Cores 3 and 4: no copy. Main memory: x=21660. mult
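The invalidation-with-snooping walkthrough above can be sketched in a few lines. This is a simplified model under stated assumptions, not a real protocol implementation: caches are write-through, the bus delivers invalidation requests instantly and in order, and each cache line is either present or absent. The names `Bus`, `SnoopingCache`, `snoop_invalidate`, and `misses` are hypothetical and chosen only for this illustration.

```python
class Bus:
    """Inter-core bus: delivers invalidation requests to every attached cache."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr, sender):
        # Every cache except the writer observes (snoops) the request.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    """Private write-through cache that invalidates on snooped writes."""

    def __init__(self, memory, bus):
        self.memory = memory  # shared dict: address -> value
        self.bus = bus
        self.lines = {}       # valid private copies: address -> value
        self.misses = 0
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            # Miss: load the current value from main memory.
            self.misses += 1
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Tell the other caches to drop their copies, then write through.
        self.bus.broadcast_invalidate(addr, sender=self)
        self.lines[addr] = value
        self.memory[addr] = value

    def snoop_invalidate(self, addr):
        # Drop the local copy if one exists; do nothing otherwise.
        self.lines.pop(addr, None)


memory = {"x": 15213}
bus = Bus()
core1, core2, core3, core4 = (SnoopingCache(memory, bus) for _ in range(4))

core1.read("x")                          # Cores 1 and 2 have both read x
core2.read("x")
core1.write("x", 21660)                  # Core 1 writes; Core 2's copy is invalidated
copy_dropped = "x" not in core2.lines
value_seen_by_core2 = core2.read("x")    # Cache misses, and loads the new copy

print("core 2 copy invalidated:", copy_dropped)
print("core 2 sees:", value_seen_by_core2)
```

In this sketch the second read by Core 2 is forced to miss, so it fetches the updated value from main memory instead of reusing an old private copy.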