[Figure: MapReduce data flow. Input key/value pairs enter the pipeline; a barrier aggregates the intermediate values by output key; each reduce receives one key with its intermediate values (key 1, key 2, key 3, ...) and produces the final values for that key.]

MapReduce control and data flow: one JobTracker, multiple TaskTrackers

MapReduce
• JobTracker (Master)
  – Accepts job submissions
  – Provides job monitoring and control
  – Splits each job into multiple tasks, hands them to the TaskTrackers for execution, and manages the execution of those tasks
• TaskTracker (Worker)
  – Manages the execution of the map and reduce work for a single task

Word count

Input files:
  file0: hello world
  file1: hello mapreduce
  file2: bye bye

Data flow (files → line offset, line content → word, count → word, count → files):
  file0: 0, “hello world”      → “hello”, 1  “world”, 1
  file1: 0, “hello mapreduce”  → “hello”, 1  “mapreduce”, 1
  file2: 0, “bye bye”          → “bye”, 2
  Final word counts: “hello”, 2  “world”, 1  “mapreduce”, 1  “bye”, 2

Outline
• Introduction to Hadoop
  – HDFS (Hadoop Distributed File System)
  – MapReduce
• Hive
• Enterprise applications of Hadoop

What is Hive
• Data warehouse workloads are diverse, fast-changing, and logically complex. Traditional parallel DBMSs can only be driven through SQL statements, and that language is not expressive enough to cope with the data warehouse needs of companies such as Google and Facebook