GFS: The Google File System
Brad Karp
UCL Computer Science
CS Z03 / 4030
30th October, 2021

Motivating Application: Google
• Crawl the whole web
• Store it all on “one big disk”
• Process users’ searches on “one big CPU”
• More storage, CPU required than one PC can offer
• Custom parallel supercomputer: expensive (so much so not really available today)

Cluster of PCs as Supercomputer
• Lots of cheap PCs, each with disk and CPU
  – High aggregate storage capacity
  – Spread search processing across many CPUs
• How to share data among PCs?
• Ivy: shared virtual memory
  – Fine-grained, relatively strong consistency at load/store level
  – Fault tolerance?
• NFS: share fs from one server, many clients
  – Goal: mimic original UNIX local fs semantics
  – Compromise: close-to-open consistency (performance)
  – Fault tolerance?
GFS: File system for sharing data on clusters, designed with Google’s application workload specifically in mind

Google Platform Characteristics
• 100s to 1000s of PCs in cluster
• Cheap, commodity parts in PCs
• Many modes of failure for each PC:
  – App bugs, OS bugs
  – Human error
  – Disk failure, memory failure, failure, power supply failure
  – Connector failure
• Monitoring, fault tolerance, auto-recovery essential

Google File System: Design Criteria
• Detect, tolerate, recover from failures automatically
• Large files, ≥ 100 MB in size
• Large, streaming reads (≥ 1 MB in size)
  – Read once
• Large, sequential writes that append
  – Write once
• Concurrent appends by multiple clients (e.g., produc