Commodity Hardware
• Typically a 2-level architecture
  – Nodes are commodity PCs
  – 30-40 nodes per rack
  – Uplink from the rack is 3-4 gigabit
  – Rack-internal bandwidth is 1 gigabit

Goals of HDFS
• Very large distributed file system
  – 10K nodes, 100 million files, 10 PB
• Assumes commodity hardware
  – Files are replicated to handle hardware failure
  – Detects failures and recovers from them
• Optimized for batch processing
  – Data locations are exposed so that computations can move to where the data resides
  – Provides very high aggregate bandwidth
• Runs in user space on heterogeneous operating systems

Distributed File System
• Single namespace for the entire cluster
• Data coherency
  – Write-once-read-many access model
  – Clients can only append to existing files
• Files are broken up into blocks
  – Typically 128 MB block size
  – Each block is replicated on multiple DataNodes
• Intelligent client
  – The client can find the location of blocks
  – The client accesses data directly from the DataNode

NameNode Metadata
• Metadata in memory
  – The entire metadata is kept in main memory
  – No demand paging of metadata
• Types of metadata
  – List of files
  – List of blocks for each file
  – List of DataNodes for each block
  – File attributes, e.g., creation time, replication factor
• A transaction log
  – Records file creations, file deletions, etc.

DataNode
• A block server
  – Stores data in the local file system (e.g., ext3)
  – Stores metadata of a block (e.g., CRC)
  – Serves data and metadata to clients
• Block report
  – Periodically sends a report of all existing blocks to the NameNode
• Facilitates pipelining of data
  – Forwards data to other specified DataNodes

Block Placement
• Current strategy
  – One replica on the local node
  – Second replica on a remote rack
  – Third replica on the same remote rack
  – Additional replicas are placed randomly
• Clients read from the nearest replica
• Would like to make this policy pluggable
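The default placement strategy above is essentially a small rack-aware selection algorithm, so a short sketch helps make it concrete. The Python fragment below is a minimal illustration only: the function name choose_replica_nodes, the racks dictionary, and the random tie-breaking are assumptions made for this example, not HDFS's actual (Java) NameNode implementation, which weighs more factors than shown here.

    import random

    def choose_replica_nodes(writer_node, racks, num_replicas=3):
        """Illustrative sketch of rack-aware placement (assumes num_replicas >= 3).

        racks: dict mapping rack id -> list of DataNode names.
        writer_node: the node where the writing client runs.
        """
        local_rack = next(r for r, nodes in racks.items() if writer_node in nodes)
        remote_racks = [r for r in racks if r != local_rack]

        targets = [writer_node]                      # replica 1: local node
        remote_rack = random.choice(remote_racks)    # replica 2: some remote rack
        second = random.choice(racks[remote_rack])
        targets.append(second)
        # replica 3: same remote rack as replica 2, but a different node
        third_candidates = [n for n in racks[remote_rack] if n != second]
        if third_candidates:
            targets.append(random.choice(third_candidates))
        # additional replicas: random nodes not already chosen
        remaining = [n for nodes in racks.values() for n in nodes if n not in targets]
        while len(targets) < num_replicas and remaining:
            pick = random.choice(remaining)
            targets.append(pick)
            remaining.remove(pick)
        return targets

    # Example: 2 racks with 3 nodes each; the client runs on dn1.
    racks = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
    print(choose_replica_nodes("dn1", racks))

Placing the second and third replicas on the same remote rack keeps write traffic across racks low while still surviving the loss of a whole rack.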
Data Correctness
• Use checksums to validate data
  – CRC32 is used
• File creation
  – The client computes a checksum for every 512 bytes
  – The DataNode stores the checksums
• File access
  – The client retrieves the data and checksums from the DataNode
  – If validation fails, the client tries other replicas
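As a minimal sketch of the per-chunk checksumming just described, the following Python fragment computes and verifies one CRC32 per 512-byte chunk. The 512-byte chunk size follows the slide; the function names and the in-memory list of checksums are assumptions made for illustration (a real DataNode stores the checksums on disk alongside the block data).

    import zlib

    CHUNK_SIZE = 512  # one checksum per 512 bytes of data, per the slide

    def compute_checksums(data: bytes):
        """Writer side: return one CRC32 per 512-byte chunk of `data`."""
        return [zlib.crc32(data[i:i + CHUNK_SIZE])
                for i in range(0, len(data), CHUNK_SIZE)]

    def verify(data: bytes, checksums):
        """Reader side: recompute CRC32s and compare against stored checksums.

        Returns True if every chunk matches; a real client would fall back
        to another replica on a mismatch.
        """
        return compute_checksums(data) == checksums

    block = b"some block contents" * 100
    sums = compute_checksums(block)
    assert verify(block, sums)            # intact data validates
    corrupted = b"X" + block[1:]
    assert not verify(corrupted, sums)    # a single flipped byte is detected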
NameNode Failure
• The NameNode is a single point of failure
• The transaction log is stored in multiple directories
  – A directory on the local file system
  – A directory on a remote file system (NFS/CIFS)

Data Pipelining
• The client retrieves a list of DataNodes on which to place replicas of a block
• The client writes the block to the first DataNode
• The first DataNode forwards the data to the next DataNode in the pipeline
• When all replicas are written, the client moves on to write the next block of the file

Rebalancer
• Goal: the percentage of disk in use should be similar across DataNodes
• Usually run when new DataNodes are added
• The cluster remains online while the Rebalancer is active
• The Rebalancer is throttled to avoid network congestion

Hadoop at Facebook
• Production cluster
  – 4800 cores, 600 machines, 16 GB per machine (April 2009)
  – 8000 cores, 1000 machines, 32 GB per machine (July 2009)
  – 4 SATA disks of 1 TB each per machine
  – 2-level network hierarchy, 40 machines per rack
  – Total cluster size is 2 PB, projected to be 12 PB in Q3 2009
• Test cluster
  – 800 cores, 16 GB each

Main Contents
• Overview of cloud computing
• Google cloud computing technologies: GFS, Bigtable, and MapReduce
• Introduction to the open-source platform Hadoop
• Cloud computing theory
  – Transaction processing theory
  – Datalog theory

The CAP Theorem
• Three properties of a shared-data system: Consistency, Availability, Partition tolerance
• Consistency: once a writer has written, all readers will see that write
• Availability: the system is available during software and hardware upgrades and node failures
• Partition tolerance: the system can continue to operate in the presence of network partitions
• Theorem: you can have at most two of these properties for any shared-data system

Consistency
• Two kinds of consistency:
  – Strong consistency: ACID (Atomicity, Consistency, Isolation, Durability)
  – Weak consistency: BASE (Basically Available, Soft state, Eventual consistency)
• [Slide figure: a tailor; keywords: 3NF, LOCK, ACID, RDBMS]

Main Contents
• Overview of cloud computing
• Google cloud computing technologies: GFS, Bigtable, and MapReduce
• Introduction to various open-source platforms
• Cloud computing theory
  – Transaction processing theory
  – Datalog theory

Datalog
• Main expressive advantage: recursive queries
• More convenient for analysis: papers look better
• Without recursion but with negation, it is equivalent in power to relational algebra
• Has affected real practice (e.g., recursion in SQL3, magic-sets transformations)

Datalog
• Example Datalog program (a bottom-up evaluation sketch of this program appears at the end of this section):
  parent(bill,mary).
  parent(mary,john).
  ancestor(X,Y) :- parent(X,Y).
  ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
  ?- ancestor(bill,X)

Joseph's Conjecture (1)
• CONJECTURE 1. Consistency And Logical Monotonicity (CALM).
• A program has an eventually consistent, coordination-free execution strategy if and only if it is expressible in (monotonic) Datalog.

Joseph's Conjecture (2)
• CONJECTURE 2. Causality Required Only for Non-monotonicity (CRON).
• Program semantics require causal message ordering if and only if the messages participate in non-monotonic derivations.

Joseph's Conjecture (3)
• CONJECTURE 3. The minimum number of Dedalus timesteps required to evaluate a program on a given input data set is equivalent to the program's Coordination Complexity.

Joseph's Conjecture (4)
• CONJECTURE 4. Any Dedalus program P can be rewritten into an equivalent temporally-minimized program P' such that each inductive or asynchronous rule of P' is necessary: converting that rule to a deductive rule would result in a program with no unique minimal model.
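To make the ancestor example above concrete, and to illustrate the monotone, bottom-up evaluation that the CALM conjecture is about, here is a minimal naive-evaluation sketch in Python. It is not a real Datalog engine; encoding facts as tuples and iterating the rules to a fixpoint are assumptions made purely for illustration.

    def naive_ancestor(parent_facts):
        """Naive bottom-up evaluation of:
             ancestor(X,Y) :- parent(X,Y).
             ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
        """
        ancestor = set(parent_facts)          # first rule: every parent pair is an ancestor pair
        changed = True
        while changed:                        # iterate until no new facts appear (fixpoint)
            changed = False
            for (x, z) in parent_facts:
                for (z2, y) in list(ancestor):
                    if z == z2 and (x, y) not in ancestor:
                        ancestor.add((x, y))  # second rule: one transitive step
                        changed = True
        return ancestor

    parents = {("bill", "mary"), ("mary", "john")}
    anc = naive_ancestor(parents)
    # Query ?- ancestor(bill, X)
    print(sorted(y for (x, y) in anc if x == "bill"))   # ['john', 'mary']

Because both rules are monotonic, facts are only ever added and never retracted, so the fixpoint is the same regardless of the order in which parent facts arrive; this order-insensitivity is the property that CALM connects to coordination-free, eventually consistent execution.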