Sampling for Big Data

… of summarization. Sampling has been proven to be a flexible method to accomplish this.

Data Scale: Summarization and Sampling

Traffic Measurement in the ISP Network

[Figure: ISP network schematic — access routers, backbone, business datacenters and management centers; flow records from the routers feed the traffic matrices.]

Massive Dataset: Flow Records

• IP flow: a set of packets with a common key, observed close together in time
• Flow key: IP src/dst address, TCP/UDP ports, ToS, … [64 to 104+ bits]
• Flow records:
  – Protocol-level summaries of flows, compiled and exported by routers
  – Flow key, packet and byte counts, first/last packet time, some router state
  – Realizations: Cisco NetFlow, IETF standards
• Scale: 100's of terabytes of flow records are generated daily in a large ISP
• Used to manage the network over a range of timescales:
  – Capacity planning (months), …, detecting network attacks (seconds)
• Analysis tasks:
  – Easy: timeseries of predetermined aggregates (e.g. address prefixes)
  – Hard: fast queries over exploratory selectors, history, communications subgraphs

[Figure: timeline of interleaved packets grouped into flows 1–4.]

Flows, Flow Records and Sampling

• Two types of sampling are used in practice for internet traffic:
  1. Sampling the packet stream in the router prior to forming flow records
     □ Limits the rate of lookups of the packet key in the flow cache
     □ Realized as Packet Sampled NetFlow (more later…)
  2. Downstream sampling of flow records in the collection infrastructure
     □ Limits transmission bandwidth and storage requirements
     □ Realized in ISP measurement collection infrastructure (more later…)
• The two cases illustrate a general property:
  – Different underlying distributions require different sample designs
  – Statistical optimality is sometimes limited by implementation constraints
    □ Availability of router storage and processing cycles

Abstraction: Keyed Data Streams

• Data model: objects are keyed weights
  – Objects (x, k): weight x, key k
    □ Example 1: objects = packets, x = bytes, k = key (source/destination)
    □ Example 2: objects = flows, x = packets or bytes, k = key
    □ Example 3: objects = account updates, x = credit/debit, k = account ID
• Stream of keyed weights: {(xᵢ, kᵢ): i = 1, 2, …, n}
• Generic query: subset sums
  – X(S) = Σᵢ∈S xᵢ for S ⊆ {1, 2, …, n}: the total weight of the index subset S
  – Typically S = S(K) = {i : kᵢ ∈ K}: the objects with keys in K
    □ Examples 1, 2: X(S(K)) = total bytes to a given IP destination address / UDP port
    □ Example 3: X(S(K)) = total balance change over a set of accounts
• Aim: compute a fixed-size summary of the stream that can be used to estimate arbitrary subset sums with known error bounds

Inclusion Sampling and Estimation

• Horvitz–Thompson estimation:
  – An object of size xᵢ is sampled with probability pᵢ
  – Unbiased estimate x′ᵢ = xᵢ/pᵢ if sampled, 0 if not sampled: E[x′ᵢ] = xᵢ
• Linearity:
  – The estimate of a subset sum is the sum of the matching estimates
  – The subset sum X(S) = Σᵢ∈S xᵢ is estimated by X′(S) = Σᵢ∈S x′ᵢ
• Accuracy:
  – Exponential bounds: Pr[|X′(S) − X(S)| ≥ δX(S)] ≤ exp(−g(δ)X(S))
  – Confidence intervals: X(S) ∈ [X⁻(ε), X⁺(ε)] with probability 1 − ε
• Future-proof:
  – No need to know the queries at the time of sampling
    □ "When/where did that suspicious UDP port first become so active?"
    □ "Which is the most active IP address within that anomalous subnet?"
  – Retrospective estimate: a subset sum over the relevant keyset

Independent Stream Sampling

• Bernoulli sampling:
  – IID sampling of objects with some fixed probability p
  – A sampled weight x has HT estimate x/p
• Poisson sampling:
  – Weight xᵢ is sampled with probability pᵢ; HT estimate xᵢ/pᵢ
• When to use Poisson vs. Bernoulli sampling?
  – Elephants and mice: Poisson allows the probability to depend on the weight…
• What is the best choice of probabilities for a given stream {xᵢ}? (A sketch of both schemes follows below.)
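The Horvitz–Thompson machinery above is compact enough to sketch in code. The following minimal Python illustration is not taken from the tutorial itself: the function names, the toy stream, and the weight-dependent probability min(1, x/1000) are hypothetical choices. Bernoulli sampling falls out as the constant-probability special case.

```python
import random

def poisson_sample(stream, prob):
    """Keep each (weight, key) object independently with probability
    prob(weight); store the Horvitz-Thompson adjusted weight x/p so
    that subset sums can be estimated later without bias."""
    sample = []
    for x, k in stream:
        p = prob(x)
        if p > 0 and random.random() < p:
            sample.append((x / p, k))   # HT estimate x' = x/p
    return sample

def estimate_subset_sum(sample, key_pred):
    """Estimate X(S(K)): sum the adjusted weights of sampled objects
    whose key satisfies the selector key_pred."""
    return sum(x_adj for x_adj, k in sample if key_pred(k))

# Toy stream: byte counts keyed by destination port (hypothetical data).
stream = [(1500, 80), (40, 53), (9000, 443), (60, 53), (700, 80)]

# Bernoulli sampling would pass a constant probability; a weight-dependent
# prob such as min(1, x/1000) favours "elephants" over "mice".
sample = poisson_sample(stream, lambda x: min(1.0, x / 1000.0))
print(estimate_subset_sum(sample, lambda port: port == 80))  # estimates 2200
```

Because each adjusted weight xᵢ/pᵢ is individually unbiased, any subset sum queried after the fact inherits unbiasedness by linearity — which is what makes the summary "future-proof".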
Bernoulli Sampling

• The easiest possible case of sampling: all weights are 1
  – N objects, and we want to sample k of them uniformly
  – Each possible subset of size k should be equally likely
• Uniformly sample an index from N (without replacement), k times
  – Some subtleties: truly random numbers from [1…N] on a computer?
  – Assume that random number generators are good enough
• Common trick in databases: assign a random number to each item and sort
  – Costly if N is very big, but so is random access
• Interesting problem: take a single linear scan of the data to draw the sample
  – Streaming model of computation: see each element once
  – Application: IP flow sampling, with too many flows (for us) to store
  – (For a while) a common tech interview question

Reservoir Sampling

"Reservoir sampling" described by [Knuth 69, 81]; enhancements [Vitter 85].

• A fixed-size k uniform sample from an arbitrary-size N stream, in one pass:
  – No need to know the stream size in advance
  – Include the first k items with probability 1
  – Include item n > k with probability p(n) = k/n:
    □ Pick j uniformly from {1, 2, …, n}
    □ If j ≤ k, swap item n into location j in the reservoir, discarding the replaced item
• A neat proof shows the uniformity of the sampling method. Let Sₙ be the sample set after n arrivals:
  – New item: selection probability Pr[n ∈ Sₙ] = pₙ := k/n
  – Previously sampled item, by induction: m ∈ Sₙ₋₁ with probability pₙ₋₁, hence m ∈ Sₙ with probability pₙ₋₁ · (1 − pₙ/k) = pₙ

[Figure: a reservoir of k = 7 items as item n arrives.]

Reservoir Sampling: Skip Counting

• Simple approach: check each item in turn — O(1) per element […] (implemented in the sketch below)
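The "simple approach" that the truncated slide begins to describe tests each arriving item in turn, which is exactly the one-pass algorithm above. A minimal Python sketch, assuming 0-based list indexing where the slides use 1-based positions (the function name is my own):

```python
import random

def reservoir_sample(stream, k):
    """One-pass uniform sample of k items from a stream of unknown
    length (the algorithm attributed to Knuth and Vitter above)."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            reservoir.append(item)    # first k items enter with probability 1
        else:
            j = random.randrange(n)   # uniform over {0, 1, ..., n-1}
            if j < k:                 # happens with probability k/n
                reservoir[j] = item   # evict a uniformly chosen resident
    return reservoir

# Example: a 7-item uniform sample from a million-element stream.
print(reservoir_sample(range(1_000_000), 7))
```

Skip counting, which the slide title refers to, speeds this up: instead of drawing a random number for every arriving item, one draws how many items to skip before the next replacement.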
點(diǎn)擊復(fù)制文檔內(nèi)容
教學(xué)課件相關(guān)推薦
文庫(kù)吧 www.dybbs8.com
備案圖鄂ICP備17016276號(hào)-1