

Sampling for Big Data

 

• […] per item:
– Fine if computation time < inter-arrival time
– Otherwise a computation backlog of O(N) builds up
• Better: "skip counting"
– Find the random index m(n) of the next selection after position n
– Distribution: Prob[m(n) ≤ m] = 1 − (1 − p_{n+1})(1 − p_{n+2}) … (1 − p_m)
• Expected number of selections from the stream is k + Σ_{k<m≤N} p_m = k + Σ_{k<m≤N} k/m = O(k(1 + ln(N/k)))
• Vitter '85 gave an algorithm with this average running time

Reservoir Sampling via Order Sampling

• Order sampling, a.k.a. bottom-k sampling or min-hashing
• Uniform sampling of a stream into a reservoir of size k
• On each arrival n: generate a one-time random value r_n ~ U[0,1]
– r_n is also known as the hash, rank, or tag
• Store the k items with the smallest random tags
• Each item has the same chance of holding the least tag, so the sample is uniform
• Fast to implement via a priority queue
• Can run on multiple input streams separately, then merge the results

Handling Weights

• So far: uniform sampling from a stream using a reservoir
• Extend to non-uniform sampling from weighted streams
– Easy case: k = 1
– Sampling probability p(n) = x_n / W_n, where W_n = Σ_{i=1..n} x_i
• k > 1 is harder
– Can have elements with large weight: should these be sampled with probability 1?
• A number of different weighted order-sampling schemes have been proposed to realize desired distributional objectives
– Rank r_n = f(u_n, x_n) for some function f and u_n ~ U[0,1]
– k-mins sketches [Cohen 1997], bottom-k sketches [Cohen, Kaplan 2007]
– [Rosén 1972], weighted random sampling [Efraimidis, Spirakis 2006]
– Order/PPS sampling [Ohlsson 1990, Rosén 1997]
– Priority sampling [Duffield, Lund, Thorup 2007], [Alon, Duffield, Lund, Thorup 2005]

Weighted Random Sampling

• Weighted random sampling [Efraimidis, Spirakis 06] generalizes min-wise sampling
– For each item, draw r_n uniformly at random in the range [0,1]
– Compute the "tag" of an item as r_n^(1/x_n)
– Keep the items with the k largest tags (equivalently, the k smallest values of −ln(r_n)/x_n)
– Can prove the correctness of the exponential sampling distribution
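The tag-based scheme above can be sketched directly — a minimal, illustrative weighted reservoir in the style of [Efraimidis, Spirakis 06] (the function name is mine), keeping the k items with the largest tags r^(1/x) via a min-heap:

```python
import heapq
import random

def weighted_reservoir(stream, k, rng=random):
    """Weighted reservoir sample: tag each item r**(1/x) with r ~ U(0,1)
    and keep the k largest tags, using a min-heap for O(log k) updates.
    `stream` yields (item, weight) pairs with weight > 0."""
    heap = []  # entries (tag, item); heap[0] holds the smallest kept tag
    for item, x in stream:
        tag = rng.random() ** (1.0 / x)
        if len(heap) < k:
            heapq.heappush(heap, (tag, item))
        elif tag > heap[0][0]:
            heapq.heapreplace(heap, (tag, item))  # evict the smallest tag
    return [item for _, item in heap]
```

Heavier items draw tags closer to 1, so they survive more often; with all weights equal this reduces to the uniform bottom-k (order) sample described above, served by the same priority queue.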
Can also make efficient via skip counting ideas Sampling for Big Data Priority Sampling ? Each item xi given priority zi = xi / ri with rn uniform random in (0,1] ? Maintain reservoir of k+1 items (xi , zi ) of highest priority ? Estimation – Let z* = (k+1)st highest priority – Topk priority items: weight estimate x’I = max{ xi , z* } – All other items: weight estimate zero ? Statistics and bounds – x’I unbiased。 zero covariance: Cov[x’i , x’j ] = 0 for i≠j – Relative variance for any subset sum ≤ 1/(k1) [Szegedy, 2020] Sampling for Big Data Priority Sampling in Databases ? One Time Sample Preparation – Compute priorities of all items, sort in decreasing priority order □ No discard ? Sample and Estimate – Estimate any subset sum X(S) = ?i?S xi by X’(S) = ?i?S x’I for some S’ ? S – Method: select items in decreasing priority order ? Two variants: bounded variance or plexity 1. S’ = first k items from S: relative variance bounded ≤ 1/(k1) □ x’I = max{ xi , z* } where z* = (k+1)st highest priority in S 2. S’ = items from S in first k: execution time O(k) □ x’I = max{ xi , z* } where z* = (k+1)st highest priority [Alon et. al., 2020] Sampling for Big Data Making Stream Samples Smarter ? Observation: we see the whole stream, even if we can’t store it – Can keep more information about sampled items if repeated – Simple information: if item sampled, count all repeats ? Counting Samples [Gibbons amp。 Mattias 98] – Sample new items with fixed probability p, count repeats as ci – Unbiased estimate of total count: 1/p + (ci – 1) ? Sample and Hold [Estan amp。 Varghese 02]: generalize to weighted keys – New key with weight b sampled with probability 1 (1p)b ? Lower variance pared with independent sampling – But sample size will grow as pn ? Adaptive sample and hold: reduce p when needed – “Sticky sampling”: geometric decreases in p [Manku, Motwani 02] – Much subsequent work tuning decrease in p to maintain sample size Sampling for Big Data Sketch Guided Sampling ? 
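The core idea this section develops — consult an approximate-frequency oracle, sample a key with probability that falls as its weight grows, and correct with Horvitz–Thompson weighting — can be previewed in toy form. This is a hedged sketch: a plain dict stands in for the sketch oracle, `alpha` plays the role of the tuning parameter in 1/(1 + α·w), and all names are illustrative:

```python
import random

def oracle_guided_sample(stream, alpha, rng=random):
    """Sample each occurrence of a key with probability 1/(1 + alpha*w),
    where w is the key's (approximate) count so far, and accumulate
    Horvitz-Thompson weights 1/p so per-key count estimates stay unbiased."""
    counts = {}   # stands in for a sketch oracle (e.g. a Count-Min sketch)
    sample = {}   # key -> HT count estimate accumulated at sampling time
    for key in stream:
        w = counts.get(key, 0)
        counts[key] = w + 1
        p = 1.0 / (1.0 + alpha * w)  # heavy keys are sampled less often
        if rng.random() < p:
            sample[key] = sample.get(key, 0.0) + 1.0 / p
    return sample
```

Because each occurrence contributes 1/p with probability p, the expected estimate for a key equals its true count, while far fewer occurrences of heavy keys are actually stored.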
• Go further: avoid sampling the heavy keys as much
– Uniform sampling will pick from the heavy keys again and again
• Idea: use an oracle to tell when a key is heavy [Kumar, Xu 06]
– Adjust the sampling probability accordingly
• Can use a "sketch" data structure to play the role of the oracle
– Like a hash table with collisions: tracks approximate frequencies
– e.g. (counting) Bloom filters, Count-Min sketch
• Track the probability with which each key was sampled; use Horvitz–Thompson estimators
– Set the probability of sampling a key with (estimated) weight w to 1/(1 + α·w) for a parameter α: the probability decreases as w increases
– Decreasing α improves accuracy but increases the sample size

Challenges for Smart Stream Sampling

• Current router constraints
– Flow tables maintained in fast, expensive SRAM
□ To support per-packet key lookup at line rate
• Implementation requirements
– Sample and hold: still needs a per-packet lookup
– Sampled NetFlow: (uniform) sampling reduces the lookup rate
□ Easier to implement despite inferior statistical properties
• Long development times to realize new sampling algorithms
• Similar concerns affect sampling in other applications
– Processing large amounts of data requires awareness of the hardware
– Uniform sampling means no coordination is needed in a distributed setting

Future for Smarter Stream Sampling

• Software-defined networking
– Current: proprietary software running on
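Returning to the priority-sampling estimator from earlier in the section, a minimal sketch (function names and the subset-sum wrapper are mine; assumes a stream of more than k (item, weight) pairs, with an exact fallback otherwise):

```python
import heapq
import random

def priority_sample(items, k, rng=random):
    """Priority sampling: priority z = x / r with r uniform in (0,1];
    keep the k+1 items of highest priority, return the top k plus the
    threshold z* = (k+1)st highest priority."""
    heap = []  # min-heap of (priority, item, weight), capped at k+1 entries
    for item, x in items:
        z = x / (1.0 - rng.random())  # 1 - random() lies in (0, 1]
        heapq.heappush(heap, (z, item, x))
        if len(heap) > k + 1:
            heapq.heappop(heap)
    if len(heap) <= k:  # fewer than k+1 items seen: the sample is exact
        return {item: x for _, item, x in heap}, 0.0
    z_star = heap[0][0]  # smallest of k+1 kept = (k+1)st highest priority
    sample = {item: x for _, item, x in heap[1:]}  # the top-k set
    return sample, z_star

def estimate_subset_sum(sample, z_star, keys=None):
    """Unbiased subset-sum estimate: x' = max(x, z*) for sampled subset
    members, zero for everything outside the sample."""
    return sum(max(x, z_star) for item, x in sample.items()
               if keys is None or item in keys)
```

Averaged over repeated runs, the estimate of any subset sum matches the true sum, and its relative variance obeys the 1/(k−1) bound quoted above.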