How to Find Directory Information
- … directly to home
- Hierarchical schemes: a hierarchy of caches that guarantees the inclusion property

- Suggests techniques to reduce storage overhead
- Generally there are few sharers at a write, and the number scales slowly with P

PCA L16 Wu Spring 04 © USTC

Two-level "hierarchy"
- Coherence across nodes is directory-based
  - the directory keeps track of nodes, not individual processors
- Examples:
  - Convex Exemplar: directory-directory
  - Sequent, Data General, HAL: directory-snoopy

Example Two-level Hierarchies
[Figure: (a) snooping-snooping, (b) snooping-directory, (c) directory-directory, (d) directory-snooping]

- Potential for cost and performance advantages:
  - amortization of node fixed costs over multiple processors (applies even if the processors are simply packaged together but not coherent)
  - can use commodity SMPs
  - fewer nodes for the directory to keep track of
  - much communication may be contained within a node (cheaper)
  - nodes prefetch data for each other (fewer "remote" misses)
  - combining of requests (like hierarchical, only two-level)
  - can even share caches (overlapping of working sets)
  - benefits depend on the sharing pattern (and mapping):
    - good for widely read-shared data, e.g. tree data in Barnes-Hut
    - good for nearest-neighbor, if properly mapped
    - not so good for all-to-all communication
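To make the storage argument concrete, here is a minimal sketch that assumes a full bit-vector directory (one presence bit per tracked entity plus one dirty bit per memory block) and a hypothetical machine of 256 processors with 64-byte blocks. It compares tracking every processor against tracking 4-processor nodes, which is what a two-level directory does; the function names and machine parameters are illustrative assumptions, not taken from the slides.

```python
def full_bit_vector_overhead(tracked: int, block_bytes: int) -> float:
    """Directory bits per memory block divided by the block's data bits.

    Assumes a full bit-vector entry: one presence bit per tracked
    entity (processor or node) plus one dirty bit.
    """
    directory_bits = tracked + 1
    data_bits = block_bytes * 8
    return directory_bits / data_bits


def two_level_overhead(processors: int, procs_per_node: int, block_bytes: int) -> float:
    """Overhead when the directory tracks nodes instead of individual processors."""
    if processors % procs_per_node != 0:
        raise ValueError("processors must be a multiple of procs_per_node")
    nodes = processors // procs_per_node
    return full_bit_vector_overhead(nodes, block_bytes)


if __name__ == "__main__":
    # Hypothetical machine: 256 processors, 64-byte blocks.
    flat = full_bit_vector_overhead(256, 64)        # (256 + 1) / 512
    # Same machine built from hypothetical 4-processor SMP nodes.
    clustered = two_level_overhead(256, 4, 64)      # (64 + 1) / 512
    print(f"flat directory:      {flat:.1%}")       # flat directory:      50.2%
    print(f"two-level directory: {clustered:.1%}")  # two-level directory: 12.7%
```

Because a full bit vector grows linearly with the number of tracked entities, grouping processors into nodes shrinks the presence-bit part of each entry by the node size; the single dirty bit keeps the reduction slightly below a full factor of four here.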
No hardware support for cache coherence (NCC-NUMA architecture)
- To avoid the coherence problem, shared data are marked uncacheable; only private data may be cached
- The advantage is that very little hardware support is needed
- When each access to remote memory can fetch only a single word, the spatial-locality advantage of shared memory is lost entirely

Bus-based coherence
- Provide a set of states, a state transition diagram, and actions
- Step (0), deciding when to invoke the protocol, is done the same way in all systems: the block's state is kept in the cache, and the protocol is invoked on a miss
- Could also be done on a scalable network: broadcast to all processors and let them respond

Solution: Directory Protocol
- On a miss, communicate with the directory to:
  - determine the locations of the cached copies
  - determine the operations to perform
  - determine the protocol steps that keep the copies consistent
[Figure: nodes with processor P1, cache, communication assist, memory, and directory, connected by a scalable interconnection network]

Scalable coherence:
- can have the same cache states and state transition diagram
- different mechanisms to manage the protocol

- Extend the snooping approach: a hierarchy of broadcast media
  - tree of buses or rings (KSR-1)
  - processors are in the bus- or ring-based multiprocessors at the leaves
  - parents and children are connected by two-way snoopy interfaces that snoop both buses and propagate relevant transactions
  - main memory may be centralized at the root or distributed among the leaves
- Problems:
  - high latency: multiple levels, and a snoop/lookup at every level
  - bandwidth bottleneck at the root

Scalable Approach: Directories
- Many alternatives for organizing directory information

Basic Operation of Directory
- k processors
- With each cache block in memory: k presence bits, 1 dirty bit
- With each cache block in cache: 1 valid bit and 1 dirty (owner) bit
[Figure: processors (P), each with a cache, connected through an interconnection network to memory and a directory that holds the presence bits and dirty bit]
- Read from main memory by processor i:
  - If dirty bit OFF then { read from main memory; turn p[i] ON; }
  - If dirty bit ON then { recall line from dirty proc (cache state to shared); update memory; turn dirty bit OFF; turn p[i] ON; supply recalled data to i; }
- Write to main memory by processor i:
  - If dirty bit OFF then { supply data to i; send invalidations to all caches that have the block; turn dirty bit ON; turn p[i] ON; ... }
  - ...
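The read and write cases of the basic directory operation can be sketched as a small Python model. This is a minimal, hypothetical sketch of one memory block's directory entry, not a real protocol implementation: the class `Directory`, its fields, and the message log are illustrative names. It assumes that invalidating a sharer also clears its presence bit, and it fills the elided "write while dirty" case by recalling and invalidating the current owner; both assumptions are marked in the code.

```python
class Directory:
    """Toy full-bit-vector directory entry for ONE memory block.

    Hypothetical model for illustration: k processors, k presence bits,
    one dirty bit, plus per-cache valid and dirty (owner) bits.
    """

    def __init__(self, k: int):
        self.presence = [False] * k      # p[0..k-1]
        self.dirty = False               # directory dirty bit
        self.cache_valid = [False] * k   # valid bit in each cache
        self.cache_dirty = [False] * k   # dirty (owner) bit in each cache
        self.log = []                    # (action, processor) messages, in order

    def _owner(self) -> int:
        # While the dirty bit is ON, exactly one presence bit is set.
        return self.presence.index(True)

    def read(self, i: int) -> None:
        """Read miss from processor i."""
        if self.dirty:
            j = self._owner()
            self.log.append(("recall", j))   # owner's copy goes to shared
            self.cache_dirty[j] = False
            self.log.append(("update_memory", j))
            self.dirty = False
        # Memory is now up to date: record i as a sharer and supply the data.
        self.presence[i] = True
        self.cache_valid[i] = True
        self.log.append(("supply", i))

    def write(self, i: int) -> None:
        """Write miss (or upgrade) from processor i."""
        if self.dirty:
            # ASSUMPTION: this case is elided on the slide; here the current
            # owner is recalled and invalidated, and ownership moves to i.
            j = self._owner()
            if j != i:
                self.log.append(("recall_invalidate", j))
                self.presence[j] = False
                self.cache_valid[j] = False
                self.cache_dirty[j] = False
        else:
            # Invalidate every other cache that holds the block
            # (ASSUMPTION: their presence bits are cleared as well).
            for j, present in enumerate(self.presence):
                if present and j != i:
                    self.log.append(("invalidate", j))
                    self.presence[j] = False
                    self.cache_valid[j] = False
            self.dirty = True
        self.presence[i] = True
        self.cache_valid[i] = True
        self.cache_dirty[i] = True
        self.log.append(("supply", i))


if __name__ == "__main__":
    d = Directory(4)
    d.read(0)
    d.read(1)     # P0 and P1 share the block; dirty bit OFF
    d.write(2)    # invalidates P0 and P1; P2 becomes the dirty owner
    d.read(3)     # recalls the line from P2; dirty bit OFF again
    print(d.presence, d.dirty)  # [False, False, True, True] False
    print(d.log)
```

Because the entry records exactly which caches hold the block, invalidations and recalls go only to those processors rather than being broadcast to all k, which is what distinguishes this scheme from bus-style snooping.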
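- Disadvantages of the NCC-NUMA approach: (1) compiler mechanisms that support transparent software cache coherence are very limited, so software cache coherence that relies on compiler support is not very practical.

Review of Lec14
- Relaxed memory consistency models

Overview
- Hybrid distributed shared-memory (DSM) systems: the basic idea is to combine the strengths of software-implemented and hardware-implemented DSM systems.

Shared Virtual Memory (SVM) Architecture
- The advantage is that it provides a shared-memory programming interface on a message-passing system; the main problem is that satisfactory performance is hard to obtain.
- Compared with hardware shared-memory systems, the larger communication and sharing granularity in SVM systems (usually a memory page) causes false sharing and extra communication; in cluster-based SVM systems, communication overhead is high.
- Systems that currently use the cache-only memory architecture (COMA) include Kendall Square Research's K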