e 3: determine the global counts for itemsets. How to find them? Find the frequent k-itemsets and replicate them in all nodes.

Parallel Association Rule Mining: Notation
• k-itemset: an itemset having k items
• Lk: the set of frequent k-itemsets (those with minimum support); each member has two fields, itemset and support count
• Ck: the set of candidate k-itemsets (potentially frequent itemsets); each member has two fields, itemset and support count
• Pi: the processor with id i
• Di: the dataset local to processor Pi
• DRi: the dataset local to processor Pi after repartitioning
• Ck^i: the candidate set maintained by processor Pi during the k-th pass (each candidate has k items)

Count Distribution (CD)
Objective: minimize communication.
Techniques:
• Straightforward parallelization of Apriori
• Carry out redundant, duplicate computations in parallel to avoid communication
• Only count values need to be communicated (no data tuples are exchanged)
• Processors can scan their local data asynchronously in parallel

Algorithm, pass 1:
(1) Each processor Pi generates its local candidate itemset C1^i, depending on the items present in its local data partition Di
(2) Develop and exchange the local C1^i counts
(3) Develop the global support counts for C1

Algorithm, pass k > 1 (see the sketch after the CD discussion below):
(1) Pi generates the complete Ck using the complete Lk-1 created at the end of pass k-1. Since every processor holds the identical Lk-1, all generate an identical Ck and place its count values in a common order in a count array
(2) Pi makes a pass over its data partition Di and develops local support counts for the candidates in Ck
(3) Pi exchanges its local Ck counts with all other processors to develop the global Ck counts; all processors must synchronize
(4) Pi computes Lk from Ck
(5) Pi independently decides whether to terminate or continue to the next pass

Disadvantages:
• CD does not exploit the aggregate memory of the system: with one processor, |M| of memory is available to hold Ck; with N processors, the total memory grows to N * |M|, yet each processor must still hold the entire Ck, so the memory usable for Ck remains |M|
• Processors must synchronize and develop the global counts at the end of every pass
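To make the CD counting and count-exchange pattern concrete, here is a minimal single-process sketch in Python. It simulates the processors as a list of data partitions and the count exchange as a sum over the local count arrays; the helper names, the toy transactions, and the use of frozensets are illustrative assumptions for this sketch, not part of the lecture material.

from itertools import combinations

def local_counts(partition, candidates):
    """Count each candidate itemset within one local data partition Di."""
    counts = [0] * len(candidates)
    for transaction in partition:
        items = set(transaction)
        for j, cand in enumerate(candidates):
            if cand <= items:          # candidate itemset is contained in the transaction
                counts[j] += 1
    return counts

def count_distribution_pass(partitions, Ck, min_support):
    """One CD pass: local counting plus a simulated global count exchange."""
    # (2) every processor scans only its own partition (asynchronously in principle)
    per_processor = [local_counts(Di, Ck) for Di in partitions]
    # (3) exchange only the count arrays: the global count is the sum of the
    #     local counts, kept in the same common order on every processor
    global_counts = [sum(column) for column in zip(*per_processor)]
    # (4) every processor derives the identical Lk from the identical global counts
    return [c for c, n in zip(Ck, global_counts) if n >= min_support]

# toy run: 2 "processors", candidate 2-itemsets over items a, b, c, min support 2
D = [[("a", "b"), ("a", "c")],                  # partition D1
     [("a", "b", "c"), ("b", "c")]]             # partition D2
C2 = [frozenset(pair) for pair in combinations("abc", 2)]
print(count_distribution_pass(D, C2, min_support=2))

The only data crossing processor boundaries here is the count array, which is the point of CD; a real distributed implementation would typically use an all-reduce style exchange for the count step.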
Data Distribution (DD)
Objective: utilize the aggregate main memory of the system effectively.
Technique:
• Partition the candidates into disjoint sets that are assigned to different processors; each processor works with the entire dataset but with only a portion of the candidate set
• Each processor counts mutually exclusive candidates; on an N-processor configuration, DD can count in a single pass a candidate set that would require N passes in CD

Basic idea (figure, 2-processor example): Ck is split between the processors into Ck^1 and Ck^2; processor 1 counts Ck^1 and derives Lk^1, processor 2 counts Ck^2 and derives Lk^2, and together they form Lk.
• Data Distribution has each processor process only a subset of Ck, to utilize the aggregate memory
• Data is exchanged to develop global counts for each Ck^i

Algorithm (a minimal sketch appears at the end of this section):
Pass 1: same as the CD algorithm.
Pass k > 1:
(1) Pi generates Ck from Lk-1 but retains only 1/N of the itemsets, forming Ck^i
(2) Pi develops support counts for the itemsets in Ck^i over ALL transactions (using local data pages and data pages received from the other processors)
(3) At the end of the data pass, Pi calculates Lk^i from its local Ck^i
(4) Processors exchange the Lk^i so that every processor has the complete Lk for generating Ck+1 in the next pass (this requires the processors to synchronize)
(5) Pi can independently decide whether to terminate or continue to the next pass

Disadvantages: heavy communication. Each processor must broadcast its local data and frequent itemsets to all other processors and synchronize in every pass.

Candidate Distribution (CaD)
Problem: CD and DD require the processors to synchronize at the end of each pass.
Basic idea: remove the dependences among processors.
• Data dependence: complete transactions are required to compute a support count (in CD)
• Frequent-itemset dependence: the global itemset Lk is needed during the pruning step of the Apriori candidate-generation algorithm (in DD)

Remove the data dependence:
• Each processor Pi works on Ck^i, a disjoint subset of Ck
• Pi derives global support counts for Ck^i from local data
• Data is replicated among the processors in order to achieve this

Reduce the frequent-itemset dependence:
• Do not wait for the complete pruning information to arrive from the other processors
• Prune the candidate set as much as possible
• Late-arriving pruning information is used in subsequent passes

Algorithm (a simplified sketch of the pass-l step appears at the end of this section):
Pass k < l: use either the CD or the DD algorithm.
Pass k = l:
(1) Partition Lk-1 among the N processors
(2) Pi generates Ck^i logically, using only its Lk-1^i partition (with standard pruning)
(3) Pi develops global counts for the candidates in Ck^i, and the database is repartitioned into DRi at the same time (this requires communicating local data)
(4) Pi receives the Lk^j from all other processors that are needed for pruning Ck+1^i
(5) Pi computes Lk^i from Ck^i and asynchronously sends it to the other N-1 processors
Pass k > l:
(1) Pi collects all frequent itemsets sent by the other processors
(2) Pi generates Ck^i using its local Lk-1^i, using the Lk-1^j received from the other processors for pruning
(3) Pi makes a pass over DRi and counts Ck^i
(4) Pi computes Lk^i from Ck^i and asynchronously sends it to the other N-1 processors

Summary:
• Count Distribution attempts to minimize communication by replicating the candidate sets in each processor's memory
• Data Distribution maximizes the use of aggregate memory by allowing each processor to work with the entire dataset
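For comparison with the CD sketch above, here is a minimal single-process sketch of the Data Distribution pass, under the same illustrative assumptions (simulated processors, toy data, helper names invented for this example). The round-robin candidate split, the full-data scan, and the final union of the Lk^i mirror steps (1) to (4) of the DD pass.

from itertools import combinations

def data_distribution_pass(partitions, Ck, min_support):
    """One DD pass: disjoint candidate subsets, each counted against ALL transactions."""
    n = len(partitions)
    # (1) each processor retains roughly 1/N of Ck as its Ck^i (round-robin split)
    Ck_parts = [Ck[i::n] for i in range(n)]
    # broadcasting every local partition to every processor is simulated by flattening
    all_transactions = [set(t) for part in partitions for t in part]
    Lk_parts = []
    for Ck_i in Ck_parts:
        # (2) count the local candidate subset over the entire dataset
        counts = {cand: 0 for cand in Ck_i}
        for items in all_transactions:
            for cand in Ck_i:
                if cand <= items:
                    counts[cand] += 1
        # (3) Lk^i is computed from the locally held Ck^i only
        Lk_parts.append([c for c, n_c in counts.items() if n_c >= min_support])
    # (4) exchange the Lk^i so that every processor holds the complete Lk
    return [c for part in Lk_parts for c in part]

D = [[("a", "b"), ("a", "c")], [("a", "b", "c"), ("b", "c")]]
C2 = [frozenset(pair) for pair in combinations("abc", 2)]
print(data_distribution_pass(D, C2, min_support=2))

Note that every processor touches every transaction, which is exactly the heavy-communication disadvantage listed above; what DD buys is that each processor only has to hold about |Ck|/N candidates in memory.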
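Finally, a much-simplified sketch of the pass-l step of Candidate Distribution. The assignment of Lk-1 itemsets by join prefix, the hash-based processor choice, and the repartitioning rule (keep only the transactions that can support one of the local candidates) are illustrative assumptions chosen so that candidate generation and counting become purely local; the lecture does not prescribe these details, and the prune step against the complete Lk-1 is omitted for brevity.

def partition_Lk1(Lk1, n_proc):
    """Assign each frequent (k-1)-itemset to a processor by its join prefix,
    so that both parents of any Apriori join land on the same processor."""
    parts = [[] for _ in range(n_proc)]
    for itemset in Lk1:
        prefix = tuple(sorted(itemset))[:-1]      # the first k-2 items
        parts[hash(prefix) % n_proc].append(itemset)
    return parts

def local_candidate_gen(Lk1_i):
    """Apriori join restricted to one processor's Lk-1 partition
    (the prune step against the complete Lk-1 is omitted here)."""
    Ck_i = set()
    tuples = [tuple(sorted(s)) for s in Lk1_i]
    for a in tuples:
        for b in tuples:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                Ck_i.add(frozenset(a) | frozenset(b))
    return Ck_i

def repartition(transactions, Ck_i):
    """Build DRi: keep only the transactions this processor needs
    in order to count its own candidates without further communication."""
    return [t for t in transactions if any(c <= set(t) for c in Ck_i)]

# toy run: 2 "processors" at pass l = 3, using the frequent 2-itemsets from above
L2 = [frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c")]]
data = [("a", "b"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
for i, L2_i in enumerate(partition_Lk1(L2, 2)):
    C3_i = local_candidate_gen(L2_i)
    DR_i = repartition(data, C3_i)
    print(f"P{i}: candidates={C3_i}, local data DRi={DR_i}")

After this repartitioning, each processor can generate, count, and prune its own candidates from DRi and its local frequent sets alone, exchanging the Lk^i asynchronously only as pruning hints for later passes, which is what removes the per-pass synchronization of CD and DD.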