

Cloud Computing Platforms, Architecture and Theory: English-Language Cloud Computing Courseware (Edited Draft)

2025-06-04 01:58
 

[Article content preview]

    … output_key, Iterator intermediate_values):
        // output_key: a word
        // output_values: a list of counts
        int result = 0;
        for each v in intermediate_values:
            result += ParseInt(v);
        Emit(AsString(result));

MapReduce: Execution overview

MapReduce: Example

MapReduce in Parallel: Example

MapReduce: Fault Tolerance
• Handled via re-execution of tasks.
  - Task completion is committed through the master.
• What happens if a Mapper fails?
  - Re-execute completed + in-progress map tasks.
• What happens if a Reducer fails?
  - Re-execute in-progress reduce tasks.
• What happens if the Master fails?
  - Potential trouble!!

MapReduce: Walk through of One more Application

MapReduce: PageRank
• PageRank models the behavior of a “random surfer”.
• C(t) is the out-degree of t, and (1-d) is the random-jump term, where d is the damping factor.
• The “random surfer” keeps clicking on successive links at random, not taking content into consideration.
• Each page distributes its rank equally among all pages it links to.
• The damping factor models the surfer “getting bored” and typing an arbitrary URL.

\[ PR(x) = (1-d) + d \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)} \]

PageRank: Key Insights
• The effect of each iteration is local: iteration i+1 depends only on iteration i.
• At iteration i, the PageRank of individual nodes can be computed independently.

PageRank using MapReduce
• Use a sparse matrix representation (M).
• Map each row of M to a list of PageRank “credit” to assign to out-link neighbours.
• These prestige scores are reduced to a single PageRank value for a page by aggregating over them.

PageRank using MapReduce
• Map: distribute PageRank “credit” to link targets.
• Reduce: gather up PageRank “credit” from multiple sources to compute the new PageRank value.
• Iterate until convergence.
Source of Image: Lin 2008

Phase 1: Process HTML
• Map task takes (URL, pagecontent) pairs and maps them to (URL, (PRinit, listofurls)).
  - PRinit is the “seed” PageRank for URL.
  - listofurls contains all pages pointed to by URL.
• Reduce task is just the identity function.

Phase 2: PageRank Distribution
• Reduce task gets (URL, url_list) and many (URL, val) values.
  - Sum vals and fix up with d to get the new PR.
  - Emit (URL, (new_rank, url_list)).
• Check for convergence using a non-parallel component.

MapReduce: Some More Apps
• Distributed Grep.
• Count of URL Access Frequency.
• Clustering (K-means).
• Graph Algorithms.
• Indexing Systems.

MapReduce Programs In Google Source Tree

MapReduce: Extensions and similar apps
• PIG (Yahoo)
• Hadoop (Apache)
• DryadLinq (Microsoft)

Large Scale Systems Architecture using MapReduce
Layers, top to bottom: User App / MapReduce / Distributed File Systems (GFS)

BigTable: A Distributed Storage System for Structured Data

Introduction
• BigTable is a distributed storage system for managing structured data.
• Designed to scale to a very large size
  - Petabytes of data across thousands of servers
• Used for many Google projects
  - Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
• Flexible, high-performance solution for all of Google’s products

Motivation
• Lots of (semi-)structured data at Google
  - URLs: contents, crawl metadata, links, anchors, pagerank, …
  - Per-user data: user preference settings, recent queries/search results, …
  - Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
• Scale is large
  - Billions of URLs, many versions/page (~20K/version)
  - Hundreds of millions of users, thousands of q/sec
  - 100TB+ of satellite image data

Why not just use a commercial DB?
• Scale is too large for most commercial databases.
• Even if it weren’t, cost would be very high.
  - Building internally means the system can be applied across many projects for low incremental cost.
• Low-level storage optimizations help performance significantly.
  - Much harder to do when running on top of a database layer.

Goals
• Want asynchronous processes to be continuously updating different pieces of data
  - Want access to most current data at any time
• Need to support:
  - Very high read/write rates (millions of ops per second)
  - Efficient scans over all or interesting subsets of data
  - Efficient joins of large one-to-one and one-to-many datasets
• Often want to examine data changes over time
  - E.g., contents of a web page over multiple crawls

BigTable
• Distributed multi-level map
• Fault-tolerant, persistent
• Scalable
  - Thousands of servers
  - Terabytes of in-memory data
  - Petabyte of disk-based data
  - Millions of reads/writes per second, efficient scans
• Self-managing
  - Servers can be added/removed dynamically
  - Servers adjust to load imbalance

Building Blocks
• Building blocks:
  - Google File System (GFS): raw storage
  - Scheduler: schedules jobs onto machines
  - Lock service: distributed lock manager
  - MapReduce: simplified large-scale data processing
• BigTable uses of building blocks:
  - GFS: stores persistent data (SSTable file format for storage of data)
  - Scheduler: schedules jobs involved in BigTable serving
  - Lock service: master election, location bootstrapping
  - MapReduce: often used to read/write BigTable data

Basic Data Model
• A BigTable is a sparse, distributed, persistent, multidimensional sorted map:
  (row, column, timestamp) → cell contents
• Good match for most Google applications

WebTable Example
• Want to keep copy of a large c
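
The "Basic Data Model" slide above describes a sparse, sorted map from (row, column, timestamp) to cell contents. The following is a minimal single-process sketch of that idea only, not BigTable's actual implementation or API: the class name `TinyTable`, its methods, and the row and column names in the demo are all hypothetical, and distribution, persistence and fault tolerance are left out entirely.

```python
import bisect


class TinyTable:
    """Toy in-memory stand-in for a (row, column, timestamp) -> contents map.

    Hypothetical illustration only; not BigTable's real interface.
    """

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # same key -> cell contents

    def put(self, row, column, timestamp, contents):
        # Negate the timestamp so the newest version of a cell sorts first.
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = contents

    def get(self, row, column, timestamp=None):
        """Return the newest contents at or before `timestamp`.

        With timestamp=None the newest version overall is returned.
        """
        bound = float("-inf") if timestamp is None else -timestamp
        i = bisect.bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None  # sparse: absent cells simply have no entry

    def scan(self, row_start, row_end):
        """Yield (row, column, timestamp, contents) for rows in [row_start, row_end)."""
        i = bisect.bisect_left(self._keys, (row_start,))
        while i < len(self._keys) and self._keys[i][0] < row_end:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


# Demo with made-up rows: two crawls of one page, one crawl of another.
table = TinyTable()
table.put("com.example/index", "contents", 1, "<html>v1</html>")
table.put("com.example/index", "contents", 2, "<html>v2</html>")
table.put("com.example/about", "contents", 1, "<html>about</html>")
print(table.get("com.example/index", "contents"))      # newest version
print(table.get("com.example/index", "contents", 1))   # version as of timestamp 1
```

Keeping the keys sorted is what makes the range scan cheap here, and storing a timestamp per cell is one simple way to support the goal of examining how data, such as a page's contents across crawls, changes over time.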