

Data Warehousing and Data Mining, Chapter 8 (Complete Edition)

2025-02-16 23:33


2023/2/27, Saturday. Data Mining: Concepts and Techniques

Chapter 6: Association Rule Mining (Mining Association Rules in Large Databases)
- Association rule mining
- Algorithms for scalable mining of (single-dimensional Boolean) association rules in transactional databases
- Mining various kinds of association/correlation rules
- Constraint-based association mining
- Sequential pattern mining
- Applications/extensions of frequent pattern mining
- Summary

What Is Association Mining?
- Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
- Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database [AIS93]
- Motivation: finding regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?

Basic Concepts of Association Rule Mining
Market basket analysis is the example that motivated association rule mining.
Question: "Which groups or sets of items are customers likely to buy together in a single shopping trip?"
Market basket analysis: let the universe be the set of goods the store sells (the full item set). The goods bought in one shopping trip (a transaction) form a subset of that item set. If each good is represented by a Boolean variable that records whether it is present, each basket can be represented by a Boolean vector. The purchase patterns found this way can be described by association rules.

Why Is Frequent Pattern or Association Mining an Essential Task in Data Mining?
- Foundation for many essential data mining tasks
  - Association, correlation, causality
  - Sequential patterns, temporal or cyclic association, partial periodicity, spatial and multimedia association
  - Associative classification, cluster analysis, iceberg cube, fascicles (semantic data compression)
- Broad applications
  - Basket data analysis, cross-marketing, catalog design, sale campaign analysis
  - Web log (click stream) analysis, DNA sequence analysis, etc.

Association Rules
The purpose of association analysis is to uncover relationships hidden in the data: given a set of items and a set of records, analyzing the records yields the correlations among the items.
Definition 8-1: Let I = {i1, i2, …, in} be the set of items and let D be the set of all transactions. A transaction T is a subset of I (T ⊆ I), and each transaction is identified by a unique identifier, its TID.

Support and Confidence
Definition 8-2: The support of an association rule X ⇒ Y with respect to the transaction set D is the percentage of transactions in D that contain both X and Y.
A rule that satisfies
  support(X ⇒ Y) ≥ min_sup
  confidence(X ⇒ Y) ≥ min_conf
is called a strong rule; otherwise it is called a weak rule.
Setting the minimum support and minimum confidence makes it possible to see how strongly certain data are associated.

Association Rule Mining
Given a set of items and a collection of records, association rule mining finds the correlations among the items whose confidence and support meet the user-specified minimum confidence and minimum support.

- Quantitative association rule: a rule that describes associations between quantitative items or attributes.
  Example: gender = "female" ⇒ occupation = "secretary" is a Boolean association rule. gender = "female" ⇒ avg(monthly income) = 2300 involves income, which is numeric, so it is a quantitative association rule.
- Example of a rule over a single dimension: for the items a user buys, "coffee ⇒ sugar" involves only the purchased items.
- Example of a rule over two dimensions: gender = "female" ⇒ occupation = "secretary" involves fields from two dimensions.

Classification of Association Rule Mining: by Abstraction Level
Based on the abstraction level of the data in a rule, rules are either single-level or multilevel:
- Single-level association rule: none of the variables involve items or attributes at different abstraction levels.
- Multilevel association rule: the variables involve items or attributes at different abstraction levels.

The Association Rule Mining Process
Definition 8-4: In association rule mining algorithms, a set of items is called an itemset, and an itemset containing k items is called a k-itemset. The number of transactions that contain an itemset is called its occurrence frequency, or simply its frequency or support count. An itemset that satisfies the minimum support is called a frequent itemset.
Association rule mining can therefore be divided into two subproblems:
1. Find all frequent itemsets in the transaction set D according to the minimum support. This is the core of the problem.
2. Generate association rules from the frequent itemsets according to the minimum confidence.

The Apriori Algorithm
Pseudocode:
  Ck: candidate itemset of size k
  Lk: frequent itemset of size k
  L1 = {frequent items};
  for (k = 1; …; k++) do begin
    Ck+1 = candidates generated from Lk;

Generating Association Rules from Frequent Itemsets
Step 2: generate strong association rules from the frequent itemsets, that is, find the rules that satisfy both the minimum support and the minimum confidence.
Confidence is computed as
  Confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)
where support_count(A ∪ B) is the number of transactions containing A ∪ B and support_count(A) is the number of transactions containing A.

Example: suppose the data contain the frequent itemset I = {I1, I2, I5}.
Step 1: generate all non-empty proper subsets of I:
  {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}
Step 2: for each such subset s, output the rule "s ⇒ (I − s)":
  I1 ∧ I2 ⇒ I5   confidence = 2/4 = 50%
  I1 ∧ I5 ⇒ I2   confidence = 2/2 = 100%
  I2 ∧ I5 ⇒ I1   confidence = 2/2 = 100%
  I1 ⇒ I2 ∧ I5   confidence = 2/6 = 33%
  I2 ⇒ I1 ∧ I5   confidence = 2/7 = 29%
  I5 ⇒ I1 ∧ I2   confidence = 2/2 = 100%
If the minimum confidence is set to 70%, only these three rules are output:
  I1 ∧ I5 ⇒ I2   confidence = 2/2 = 100%
  I2 ∧ I5 ⇒ I1   confidence = 2/2 = 100%
  I5 ⇒ I1 ∧ I2   confidence = 2/2 = 100%

Example
Take the transaction set shown in the table, where C[i] is a candidate set and L[i] is a large (frequent) itemset. An item must then appear at least 4 times in the candidate set to qualify as a large item, and a rule's confidence must exceed 70% for it to count as an association rule.

TID   Items
10    a, c, d, e, f
20    a, b, e
30    c, e, f
40    a, c, d, f
50    c, e, f
Min_sup = 2

Mining Frequent Closed Patterns: CHARM
- Use vertical data format: t(AB) = {T1, T12, …}
- Derive closed patterns based on vertical intersections
  - t(X) = t(Y): X and Y always happen together
  - t(X) ⊂ t(Y): a transaction having X always has Y
- Use diffsets to accelerate mining
  - Only keep track of the differences between tid sets
  - t(X) = {T1, T2, T3}, t(Xy) = {T1, T3}
  - Diffset(Xy, X) = {T2}
- M. Zaki. CHARM: An Efficient Algorithm for Closed Association Rule Mining, CS-TR99-10, Rensselaer Polytechnic Institute
- M. Zaki, Fast Vertical Mining Using Diffsets, TR01-1, Department of Computer Science, Rensselaer Polytechnic Institute

Visualization of Association Rules: Pane Graph

Visualization of Association Rules: Rule Graph

Mining Various Kinds of Rules or Regularities
- Multilevel, quantitative association rules, correlation and causality, ratio rules, sequen
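
The two subproblems described in the text (find the frequent itemsets, then derive rules from them using Confidence(A ⇒ B) = support_count(A ∪ B) / support_count(A)) can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation. The function names `apriori` and `generate_rules` are hypothetical. The join, prune and count steps follow the usual textbook form of Apriori, which is an assumption here because the pseudocode in the text is cut off. The five transactions and Min_sup = 2 come from the TID/Items table in the text, and the 70% confidence threshold is borrowed from the worked example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support_count} for every frequent itemset.

    Assumption: min_count is an absolute support count (as in Min_sup = 2).
    """
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep those meeting the threshold.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step: every k-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k)):
                candidates.add(union)
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        # Each non-empty proper subset s yields the rule s => (itemset - s).
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so the
                # lookup frequent[lhs] always succeeds.
                conf = count / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Transactions from the TID/Items table in the text, Min_sup = 2.
transactions = [
    ["a", "c", "d", "e", "f"],  # TID 10
    ["a", "b", "e"],            # TID 20
    ["c", "e", "f"],            # TID 30
    ["a", "c", "d", "f"],       # TID 40
    ["c", "e", "f"],            # TID 50
]
frequent = apriori(transactions, min_count=2)
rules = generate_rules(frequent, min_conf=0.7)

for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"confidence={conf:.0%}")
```

Counting candidates with one pass over the transactions per level keeps the sketch close to the level-wise loop in the pseudocode. A practical implementation would usually replace the pairwise join with a prefix-based join and a faster counting structure.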