

Sampling for Big Data

 

• […] per item:
– Fine if computation time < inter-arrival time
– Otherwise a computation backlog of O(N) builds up
• Better: "skip counting"
– Find the random index m(n) of the next selection after position n
– Distribution: Prob[m(n) ≤ m] = 1 − (1 − p_{n+1})(1 − p_{n+2}) … (1 − p_m)
• Expected number of selections from the stream is k + Σ_{k<m≤N} p_m = k + Σ_{k<m≤N} k/m = O(k(1 + ln(N/k)))
• Vitter '85 gave an algorithm with this average running time

Reservoir Sampling via Order Sampling

• Order sampling, a.k.a. bottom-k sampling or min-hashing
• Uniform sampling of a stream into a reservoir of size k
• On each arrival n: generate a one-time random value r_n ~ U[0,1]
– r_n is also known as the hash, rank, or tag
• Store the k items with the smallest random tags
• Each item has the same chance of holding the least tag, so the sample is uniform
• Fast to implement via a priority queue
• Can run on multiple input streams separately, then merge the results

Handling Weights

• So far: uniform sampling from a stream using a reservoir
• Extend to non-uniform sampling from weighted streams
– Easy case: k = 1
– Sampling probability p(n) = x_n / W_n, where W_n = Σ_{i=1..n} x_i
• k > 1 is harder
– Can have elements with large weight: should these be sampled with probability 1?
• A number of different weighted order-sampling schemes have been proposed to realize desired distributional objectives
– Rank r_n = f(u_n, x_n) for some function f and u_n ~ U[0,1]
– k-mins sketches [Cohen 1997], bottom-k sketches [Cohen, Kaplan 2007]
– [Rosén 1972], weighted random sampling [Efraimidis, Spirakis 2006]
– Order/PPS sampling [Ohlsson 1990, Rosén 1997]
– Priority sampling [Duffield, Lund, Thorup 2007], [Alon, Duffield, Lund, Thorup 2005]

Weighted Random Sampling

• Weighted random sampling [Efraimidis, Spirakis 06] generalizes min-wise sampling
– For each item, draw r_n uniformly at random in the range [0,1]
– Compute the "tag" of an item as r_n^(1/x_n)
– Keep the items with the k largest tags (equivalently, the k smallest values of −ln(r_n)/x_n)
– Can prove the correctness of the exponential sampling distribution
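The tag-based scheme above can be sketched directly — a minimal, illustrative weighted reservoir in the style of [Efraimidis, Spirakis 06] (the function name is mine), keeping the k items with the largest tags r^(1/x) via a min-heap:

```python
import heapq
import random

def weighted_reservoir(stream, k, rng=random):
    """Weighted reservoir sample: tag each item r**(1/x) with r ~ U(0,1)
    and keep the k largest tags, using a min-heap for O(log k) updates.
    `stream` yields (item, weight) pairs with weight > 0."""
    heap = []  # entries (tag, item); heap[0] holds the smallest kept tag
    for item, x in stream:
        tag = rng.random() ** (1.0 / x)
        if len(heap) < k:
            heapq.heappush(heap, (tag, item))
        elif tag > heap[0][0]:
            heapq.heapreplace(heap, (tag, item))  # evict the smallest tag
    return [item for _, item in heap]
```

Heavier items draw tags closer to 1, so they survive more often; with all weights equal this reduces to the uniform bottom-k (order) sample described above, served by the same priority queue.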
Can also make efficient via skip counting ideas Sampling for Big Data Priority Sampling ? Each item xi given priority zi = xi / ri with rn uniform random in (0,1] ? Maintain reservoir of k+1 items (xi , zi ) of highest priority ? Estimation – Let z* = (k+1)st highest priority – Topk priority items: weight estimate x’I = max{ xi , z* } – All other items: weight estimate zero ? Statistics and bounds – x’I unbiased。 zero covariance: Cov[x’i , x’j ] = 0 for i≠j – Relative variance for any subset sum ≤ 1/(k1) [Szegedy, 2020] Sampling for Big Data Priority Sampling in Databases ? One Time Sample Preparation – Compute priorities of all items, sort in decreasing priority order □ No discard ? Sample and Estimate – Estimate any subset sum X(S) = ?i?S xi by X’(S) = ?i?S x’I for some S’ ? S – Method: select items in decreasing priority order ? Two variants: bounded variance or plexity 1. S’ = first k items from S: relative variance bounded ≤ 1/(k1) □ x’I = max{ xi , z* } where z* = (k+1)st highest priority in S 2. S’ = items from S in first k: execution time O(k) □ x’I = max{ xi , z* } where z* = (k+1)st highest priority [Alon et. al., 2020] Sampling for Big Data Making Stream Samples Smarter ? Observation: we see the whole stream, even if we can’t store it – Can keep more information about sampled items if repeated – Simple information: if item sampled, count all repeats ? Counting Samples [Gibbons amp。 Mattias 98] – Sample new items with fixed probability p, count repeats as ci – Unbiased estimate of total count: 1/p + (ci – 1) ? Sample and Hold [Estan amp。 Varghese 02]: generalize to weighted keys – New key with weight b sampled with probability 1 (1p)b ? Lower variance pared with independent sampling – But sample size will grow as pn ? Adaptive sample and hold: reduce p when needed – “Sticky sampling”: geometric decreases in p [Manku, Motwani 02] – Much subsequent work tuning decrease in p to maintain sample size Sampling for Big Data Sketch Guided Sampling ? 
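The core idea this section develops — consult an approximate-frequency oracle, sample a key with probability that falls as its weight grows, and correct with Horvitz–Thompson weighting — can be previewed in toy form. This is a hedged sketch: a plain dict stands in for the sketch oracle, `alpha` plays the role of the tuning parameter in 1/(1 + α·w), and all names are illustrative:

```python
import random

def oracle_guided_sample(stream, alpha, rng=random):
    """Sample each occurrence of a key with probability 1/(1 + alpha*w),
    where w is the key's (approximate) count so far, and accumulate
    Horvitz-Thompson weights 1/p so per-key count estimates stay unbiased."""
    counts = {}   # stands in for a sketch oracle (e.g. a Count-Min sketch)
    sample = {}   # key -> HT count estimate accumulated at sampling time
    for key in stream:
        w = counts.get(key, 0)
        counts[key] = w + 1
        p = 1.0 / (1.0 + alpha * w)  # heavy keys are sampled less often
        if rng.random() < p:
            sample[key] = sample.get(key, 0.0) + 1.0 / p
    return sample
```

Because each occurrence contributes 1/p with probability p, the expected estimate for a key equals its true count, while far fewer occurrences of heavy keys are actually stored.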
• Go further: avoid sampling the heavy keys as much
– Uniform sampling will pick from the heavy keys again and again
• Idea: use an oracle to tell when a key is heavy [Kumar, Xu 06]
– Adjust the sampling probability accordingly
• Can use a "sketch" data structure to play the role of the oracle
– Like a hash table with collisions: tracks approximate frequencies
– e.g. (counting) Bloom filters, Count-Min sketch
• Track the probability with which each key was sampled; use Horvitz–Thompson estimators
– Set the probability of sampling a key with (estimated) weight w to 1/(1 + α·w) for a parameter α: the probability decreases as w increases
– Decreasing α improves accuracy but increases the sample size

Challenges for Smart Stream Sampling

• Current router constraints
– Flow tables maintained in fast, expensive SRAM
□ To support per-packet key lookup at line rate
• Implementation requirements
– Sample and hold: still needs a per-packet lookup
– Sampled NetFlow: (uniform) sampling reduces the lookup rate
□ Easier to implement despite inferior statistical properties
• Long development times to realize new sampling algorithms
• Similar concerns affect sampling in other applications
– Processing large amounts of data requires awareness of the hardware
– Uniform sampling means no coordination is needed in a distributed setting

Future for Smarter Stream Sampling

• Software-defined networking
– Current: proprietary software running on
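Returning to the priority-sampling estimator from earlier in the section, a minimal sketch (function names and the subset-sum wrapper are mine; assumes a stream of more than k (item, weight) pairs, with an exact fallback otherwise):

```python
import heapq
import random

def priority_sample(items, k, rng=random):
    """Priority sampling: priority z = x / r with r uniform in (0,1];
    keep the k+1 items of highest priority, return the top k plus the
    threshold z* = (k+1)st highest priority."""
    heap = []  # min-heap of (priority, item, weight), capped at k+1 entries
    for item, x in items:
        z = x / (1.0 - rng.random())  # 1 - random() lies in (0, 1]
        heapq.heappush(heap, (z, item, x))
        if len(heap) > k + 1:
            heapq.heappop(heap)
    if len(heap) <= k:  # fewer than k+1 items seen: the sample is exact
        return {item: x for _, item, x in heap}, 0.0
    z_star = heap[0][0]  # smallest of k+1 kept = (k+1)st highest priority
    sample = {item: x for _, item, x in heap[1:]}  # the top-k set
    return sample, z_star

def estimate_subset_sum(sample, z_star, keys=None):
    """Unbiased subset-sum estimate: x' = max(x, z*) for sampled subset
    members, zero for everything outside the sample."""
    return sum(max(x, z_star) for item, x in sample.items()
               if keys is None or item in keys)
```

Averaged over repeated runs, the estimate of any subset sum matches the true sum, and its relative variance obeys the 1/(k−1) bound quoted above.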