

Sampling for Big Data

[Overview] Common themes: Why reduce? Most big data is big! Many applications (telecoms, ISPs, search engines) can't keep everything. Why sample? Sampling is (usually) easy to understand. Graph sampling and other topics are all worthy of study, in other tutorials.

  

Cost Optimization for Sampling

Several different approaches optimize for different objectives:
1. Fixed sample size IPPS (inclusion probability proportional to size) sample
   – Variance-optimal (VarOpt) sampling: minimal-variance unbiased estimation
2. Structure-aware sampling
   – Improve estimation accuracy for subnet queries using a topological cost
3. Fair sampling
   – Adaptively balance the sampling budget over subpopulations of flows
   – Uniform estimation accuracy regardless of subpopulation size
4. Stable sampling
   – Increase stability of the sample set by imposing a cost on changes

IPPS Stream Reservoir Sampling

• Each arriving item:
  – Provisionally include the item in the reservoir
  – If there are now m+1 items, discard one item randomly:
    □ Calculate the threshold z that samples m items on average: z solves Σi pz(xi) = m
    □ Discard item i with probability qi = 1 − pz(xi)
    □ Adjust the m surviving xi with the Horvitz-Thompson estimate x'i = xi/pi = max{xi, z}
• Efficient implementation:
  – Computational cost O(log m) per item, amortized cost O(log log m)
  [Cohen, Duffield, Lund, Kaplan, Thorup; SODA 2009; SIAM J. Comput. 2011]
• Example (m = 9; reservoir holds x1, …, x9 and new item x10 arrives):
  – Recalculate the threshold z: Σi=1..10 min{1, xi/z} = 9
  – Recalculate the discard probabilities: qi = 1 − min{1, xi/z}
  – Adjust the surviving weights: x'i = max{xi, z}

Structure (Un)Aware Sampling

• Sampling is oblivious to structure in keys (e.g. the IP address hierarchy)
  – Estimation disperses the weight of discarded items onto surviving samples
• Queries are structure aware: subset sums over related keys (IP subnets)
  – Accuracy on the left-hand subtree is decreased by discarding weight on the right-hand subtree
[Figure: binary trie over the key space 000–111]

Localizing Weight Redistribution

• Initial weight set {xi : i ∈ S} for some S ⊂ Ω
  – e.g. Ω = the set of possible IP addresses, S = the observed IP addresses
• Attribute a "range cost" C({xi : i ∈ R}) to each weight subset R ⊂ S
  – Possible factors for the range cost:
    □ Sampling variance
    □ Topology, e.g. height of the lowest common ancestor
  – Heuristic: R* = nearest-neighbor pair {xi, xj} of minimal xi + xj
• Sample k items from S:
  – Progressively remove one item from the subset with minimal range cost:
  – While |S| > k:
    □ Find R* ⊂ S of minimal range cost
    □ Remove a weight from R* with VarOpt
  [Cohen, Cormode, Duffield; PVLDB 2011]
• No change outside the subtree below the closest ancestor
• Order-of-magnitude reduction in average subnet error vs. VarOpt

Fair Sampling Across Subpopulations

• Analysis queries often focus on specific subpopulations
  – e.g. networking: different customers, user applications, network paths
• Wide variation in subpopulation size
  – 5 orders of magnitude variation in traffic across the interfaces of an access router
• With uniform sampling across subpopulations:
  – Poor estimation accuracy on subset sums within small subpopulations
  – Interesting items occur in proportion to subpopulation size, so their proportion within small subpopulations is difficult to track
• Minimize relative variance by sharing the budget m over subpopulations
  – Total of n objects in subpopulations of sizes n1, …, nd with Σi ni = n
  – Allocate a budget mi to each subpopulation ni with Σi mi = m
  – Minimize the average population relative variance R = const · Σi 1/mi
• Theorem: R is minimized when {mi} are the max-min fair shares of m under the demands {ni}
• Streaming problem: the subpopulation sizes {ni} are not known in advance
• Solution: progressive fair sharing as a reservoir sample
  – Provisionally include each arrival
  – Discard one item as a VarOpt sample from a maximal subpopulation
• Theorem [Duffield; Sigmetrics 2012]:
  – Max-min fair at all times; equal in distribution to VarOpt samples {mi from ni}

Stable Sampling

• Setting: sampling a population over successive time periods
• Sampling independently at each time period incurs a cost from sample churn
  – e.g. time-series analysis over a set of relatively stable keys
• Find sampling probabilities through cost minimization
  – Minimize Cost = Estimation Variance + z · E[Churn]
• Size-m sample with maximal expected churn D:
  – Weights {xi}, previous sampling probabilities {pi}
  – Find new sampling probabilities {qi} that minimize the cost of taking m samples:
  – Minimize Σi xi²/qi subject to 0 ≤ qi ≤ 1, Σi qi = m, and Σi |pi − qi| ≤ D
  [Cohen, Cormode, Duffield, Lund 13]

Summary of Part 1

• Sampling as a powerful, general summarization technique
• Unbiased estimation via Horvitz-Thompson estimators
• Sampling from streams of data
  – Uniform sampling: reservoir sampling
  – Weighted generalizations: sample and hold, counting samples
• Advances in stream sampling
  – The cost principle for sample design, and IPPS methods
  – Threshold, priority, and VarOpt sampling
  – Extending the cost principle:
    □ Structure-aware, fair, stable, and sketch-guided sampling

Graham Cormode, University of Warwick
Nick Duffield, Texas A&M University
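The IPPS reservoir step (threshold calculation, random discard, Horvitz-Thompson adjustment) can be sketched in Python. This is an illustrative sketch, not the cited implementation: `ipps_threshold` and `varopt_step` are assumed names, and a simple bisection search stands in for the O(log m) data structures of the Cohen et al. paper.

```python
import random

def inclusion_prob(x, z):
    # IPPS inclusion probability p_z(x) = min{1, x/z}
    return min(1.0, x / z)

def ipps_threshold(weights, m, iters=100):
    # z solves sum_i p_z(x_i) = m; the sum is decreasing in z, so bisect.
    # Assumes len(weights) > m and all weights positive.
    lo, hi = min(weights) / 2.0, float(sum(weights))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(inclusion_prob(x, mid) for x in weights) > m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def varopt_step(reservoir, x_new, m, rng=random):
    # Provisionally include the new item, giving m+1 weights
    items = reservoir + [x_new]
    z = ipps_threshold(items, m)
    # Discard probabilities q_i = 1 - p_z(x_i); they sum to (m+1) - m = 1,
    # so exactly one item is discarded
    q = [1.0 - inclusion_prob(x, z) for x in items]
    drop = rng.choices(range(len(items)), weights=q)[0]
    # Horvitz-Thompson adjustment of survivors: x'_i = max{x_i, z}
    return [max(x, z) for i, x in enumerate(items) if i != drop]
```

Note that items with xi ≥ z have discard probability 0, so large weights always survive with their exact weight; only small items are subsampled and rounded up to z.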
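One way to realize the nearest-neighbor heuristic for localizing weight redistribution is sketched below, assuming keys are equal-length bit strings in a binary trie. The pair with the deepest common ancestor is merged by a two-item VarOpt step, so discarded weight stays inside the pair's subtree. All function names are assumptions for illustration.

```python
import random

def lca_depth(a, b):
    # Depth of the lowest common ancestor of two equal-length bit strings
    # in a binary trie: the length of their shared prefix
    d = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        d += 1
    return d

def remove_one_structure_aware(sample, rng=random):
    # sample: dict mapping bit-string key -> weight
    keys = list(sample)
    # Nearest-neighbor pair: deepest LCA first, then minimal combined weight
    _, _, a, b = max((lca_depth(a, b), -(sample[a] + sample[b]), a, b)
                     for i, a in enumerate(keys) for b in keys[i + 1:])
    z = sample[a] + sample[b]  # two-item VarOpt threshold
    # Discard a with probability 1 - x_a/z = x_b/z; the survivor takes
    # weight max{x, z} = z, so no weight leaves the pair's subtree
    victim, survivor = (a, b) if rng.random() < sample[b] / z else (b, a)
    del sample[victim]
    sample[survivor] = z
    return sample
```

Because the survivor absorbs the full pair weight, subset sums over any subtree at or above the pair's common ancestor are unchanged, which is the "no change outside subtree" property on the slide.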
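The max-min fair shares {mi} of the budget m under demands {ni} can be computed by water-filling: repeatedly satisfy the smallest demands in full and split what remains evenly. A minimal sketch (function name assumed):

```python
def maxmin_fair(demands, budget):
    # Max-min fair shares of `budget` under `demands` (water-filling)
    shares = [0.0] * len(demands)
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining = float(budget)
    while active:
        level = remaining / len(active)
        i = active[0]
        if demands[i] <= level:
            # Smallest remaining demand fits under the water level:
            # satisfy it in full and redistribute the rest
            shares[i] = float(demands[i])
            remaining -= demands[i]
            active.pop(0)
        else:
            # Everyone left demands more than an even split: share equally
            for j in active:
                shares[j] = level
            break
    return shares
```

For example, demands (2, 8, 100) with budget 12 give shares (2, 5, 5): the small subpopulation is fully sampled while the large ones split the remainder, which is what equalizes relative variance across subpopulations.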
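The stable-sampling problem is a constrained convex program; solving it is beyond the slide's scope, but the objective and constraints it states can be evaluated directly for a candidate {qi}. A sketch with assumed names and an assumed numerical tolerance:

```python
def estimation_variance(weights, q):
    # The variance term of the cost: sum_i x_i^2 / q_i
    return sum(x * x / qi for x, qi in zip(weights, q))

def churn(p, q):
    # Expected churn between old probabilities p and new probabilities q
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def feasible(q, p, m, D, tol=1e-9):
    # Constraints from the slide: 0 <= q_i <= 1, sum_i q_i = m, churn <= D
    return (all(-tol <= qi <= 1.0 + tol for qi in q)
            and abs(sum(q) - m) <= tol
            and churn(p, q) <= D + tol)
```

Tightening the churn bound D restricts how far {qi} may move from {pi}, trading estimation variance for sample stability.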
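The Horvitz-Thompson estimators recalled in the summary can be illustrated with a short Monte Carlo check of unbiasedness; the item weights and inclusion probabilities below are arbitrary examples, not from the tutorial.

```python
import random

def ht_estimate(items, rng):
    # One Horvitz-Thompson estimate of the total weight: include item i
    # with probability p_i, and weight each survivor by 1/p_i
    return sum(x / p for x, p in items if rng.random() < p)

rng = random.Random(42)
items = [(3.0, 1.0), (1.0, 0.25), (2.0, 0.5)]  # (weight, inclusion prob)
# Averaging many estimates approaches the true total 3 + 1 + 2 = 6,
# since each item contributes p_i * (x_i / p_i) = x_i in expectation
avg = sum(ht_estimate(items, rng) for _ in range(20000)) / 20000
```

Any single estimate can be far from 6 (e.g. 12 when the rare item survives), but the estimator is unbiased, which is what makes subset-sum queries over samples trustworthy.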