
Cache Coherence in Scalable Machines

2025-07-15 17:55
  

• conceptual and protocol design versus implementation
• Tradeoffs within an approach
• performance enhancements often add complexity, complicate correctness
  – more concurrency, potential race conditions
  – not strict request-reply
• Many subtle corner cases
• BUT, increasing understanding/adoption makes job much easier
• automatic verification is important but hard
• Let's look at memory-based and cache-based more deeply

Flat, Memory-based Protocols
• Use SGI Origin2000 Case Study
  • Protocol similar to Stanford DASH, but with some different tradeoffs
  • Also Alewife, FLASH, HAL
• Outline:
  • System Overview
  • Coherence States, Representation and Protocol
  • Correctness and Performance Tradeoffs
  • Implementation Issues
  • Quantitative Performance Characteristics

Origin2000 System Overview
• Single 16"-by-11" PCB
• Directory state in same or separate DRAMs, accessed in parallel
• Up to 512 nodes (1024 processors)
• With 195MHz R10K processor, peak 390 MFLOPS or 780 MIPS per proc
• Peak SysAD bus bw is 780MB/s, so also Hub-Mem
• Hub to router chip and to Xbow is GB/s (both are off-board)
[Figure: block diagram. Labels: L2 cache, P, SysAD bus, Hub, Main Memory, Directory, Interconnection Network]

Origin Node Board
• Hub is 500K-gate in u CMOS
• Has outstanding transaction buffers for each processor (4 each)
• Has two block transfer engines (memory copy and fill)
• Interfaces to and connects processor, memory, network and I/O
• Provides support for synch primitives, and for page migration (later)
• Two processors within node not snoopy-coherent (motivation is cost)
[Figure: node board layout. Labels: R10K, SC, Tag, Extended Directory, Main Memory and 16-bit Directory, BC, Hub, Pwr/gnd, Network, I/O, Connections to Backplane]

Origin Network
• Each router has six pairs of links
  • Two to nodes, four to other routers
  • latency: 41ns pin to pin across a router
• Flexible cables up to 3 ft long
• Four "virtual channels": request, reply, other two for priority or I/O
[Figure: network topologies. Panels: (b) 4-node, (c) 8-node, (d) 16-node, (e) 64-node, (d) 32-node; label: meta-router]

Origin I/O
• Xbow is 8-port crossbar, connects two Hubs (nodes) to six cards
• Similar to router, but simpler so can hold 8 ports
• Except graphics, most other devices connect through bridge and bus
  • can reserve bandwidth for things like video or real-time
• Global I/O space: any proc can access any I/O device
  • through uncached memory ops to I/O space or coherent DMA
  • any I/O device can write to or read from any memory (comm thru routers)
[Figure: I/O subsystem. Labels: Hub 1, Hub 2, Xbow, Bridge, IOC3, SIO, SCSI, LINC, CTRL, Graphics, To Bridge; links labeled 16]

Origin Directory Structure
• Flat, memory-based: all directory information at the home
• Three directory formats:
  • (1) if exclusive in a cache, entry is pointer to that specific processor (not node)
  • (2) if shared, bit vector: each bit points to a node (Hub), not processor
    • invalidation sent to a Hub is broadcast to both processors in the node
    • two sizes, depending on scale
      – 16-bit format (32 procs), kept in main memory DRAM
      – 64-bit format (128 procs), extra bits kept in extension memory
  • (3) for larger machines, coarse vector: each bit corresponds to p/64 nodes
    • invalidation is sent to all Hubs in that group, which each bcast to their 2 procs
  • machine can choose between bit vector and coarse vector dynamically
    – is application confined to a 64-node or less part of machine?
• Ignore coarse vector in discussion for simplicity

Origin Cache and Directory States
• Cache states: MESI
• Seven directory states
  • unowned: no cache has a copy, memory copy is valid
  • shared: one or more caches has a shared copy, memory is valid
  • exclusive: one cache (pointed to) has block in modified or exclusive state
  • three pending or busy states, one for each of the above:
    – indicates directory has received a previous request for the block
    – couldn't satisfy it itself, sent it to another node and is waiting
    – cannot take another request for the block yet
  • poisoned state, used for efficient page migration (later)
• Let's see how it handles read and "write" requests
  • no point-to-point order assumed in network

Handling a Read Miss
• Hub looks at address
  • if remote, sends request to home
  • if local, looks up directory entry and memory itself
  • directory may indicate one of many states
• Shared or Unowned State:
  • if shared, directory sets presence bit
  • if unowned, goes to exclusive state and uses pointer format
  • replies with block to requestor
    – strict request-reply (no network transactions if home is local)
  • actually, also looks up memory speculatively to get data, in parallel with dir
    – directory lookup returns one cycle earlier
    – if directory is shared or unowned, it's a win: data already obtained by Hub
    – if not one of these, speculative memory access is wasted
• Busy state: not ready to handle
  • NACK, so as not to hold up buffer space for long

Read Miss to Block in Exclusive State
• Most interesting case
  • if owner is not home, need to get data to home and requestor from owner
  • Uses reply forwarding for lowest latency and traffic
    – not strict request-reply
[Figure: message flow among L, H, R. Labels: 1: req, 2: intervention, 3a: revise, 3b: response]
• Problems with "intervention forwarding" option
  • replies come to home (which then repl
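
The home's handling of a read request, as described in the read-miss slides above, can be sketched in Python. This is a minimal illustration under stated assumptions, not the Hub's actual logic. The names `DirState`, `DirEntry`, and `handle_read_miss` and the message labels are hypothetical. The three busy states are collapsed into one, and the poisoned state, the speculative memory lookup, and the owner's follow-up messages are left out. Moving the entry to busy when an intervention is sent is an assumption drawn from the busy-state description.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional, Set, Tuple


class DirState(Enum):
    UNOWNED = auto()    # no cache has a copy, memory is valid
    SHARED = auto()     # one or more nodes hold a shared copy
    EXCLUSIVE = auto()  # one processor (pointed to) holds the block
    BUSY = auto()       # assumption: stands in for all three busy states


@dataclass
class DirEntry:
    state: DirState = DirState.UNOWNED
    sharers: Set[int] = field(default_factory=set)  # presence bits, one per node (Hub)
    owner: Optional[int] = None                     # pointer format: a specific processor


def handle_read_miss(entry: DirEntry, req_proc: int, req_node: int) -> Tuple[str, int]:
    """Return (message, destination processor) sent by the home for a read request."""
    if entry.state is DirState.SHARED:
        # Shared: set the requester's presence bit and reply with the block.
        entry.sharers.add(req_node)
        return ("data_reply", req_proc)
    if entry.state is DirState.UNOWNED:
        # Unowned: go to exclusive, switch to pointer format, reply with the block.
        entry.state = DirState.EXCLUSIVE
        entry.owner = req_proc
        return ("data_reply", req_proc)
    if entry.state is DirState.EXCLUSIVE:
        # Exclusive: forward an intervention to the owner and wait (assumed busy).
        entry.state = DirState.BUSY
        return ("intervention", entry.owner)
    # Busy: NACK so the request does not hold buffer space; the requester retries.
    return ("nack", req_proc)


if __name__ == "__main__":
    entry = DirEntry()
    print(handle_read_miss(entry, req_proc=0, req_node=0))
    print(handle_read_miss(entry, req_proc=5, req_node=2))
    print(handle_read_miss(entry, req_proc=7, req_node=3))
```

Returning the message instead of sending it keeps the sketch free of any network model, so each directory transition can be checked in isolation.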