【正文】
s No sum(S) ? v (items could be of any value, v ? 0) Yes No No …… 38 ConstraintBased Mining — A General Picture Constraint Antimonotone Monotone Succinct v ? S no yes yes S ? V no yes yes S ? V yes no yes min(S) ? v no yes yes min(S) ? v yes no yes max(S) ? v yes no yes max(S) ? v no yes yes count(S) ? v yes no weakly count(S) ? v no yes weakly sum(S) ? v ( a ? S, a ? 0 ) yes no no sum(S) ? v ( a ? S, a ? 0 ) no yes no range(S) ? v yes no no range(S) ? v no yes no avg(S) ? v, ? ? { ?, ?, ? } convertible convertible no support(S) ? ? yes no no support(S) ? ? no yes no 39 Chapter 7 : Advanced Frequent Pattern Mining ? Pattern Mining: A Road Map ? Pattern Mining in MultiLevel, MultiDimensional Space ? ConstraintBased Frequent Pattern Mining ? Mining HighDimensional Data and Colossal Patterns ? Mining Compressed or Approximate Patterns ? Pattern Exploration and Application ? Summary 40 Mining Colossal Frequent Patterns ? F. Zhu, X. Yan, J. Han, P. S. Yu, and H. Cheng, “Mining Colossal Frequent Patterns by Core Pattern Fusion”, ICDE39。 SrikantVLB’95, Han amp。 Pei. All rights reserved. March 13, 2022 Data Mining: Concepts and Techniques 2 3 Chapter 7 : Advanced Frequent Pattern Mining ? Pattern Mining: A Road Map ? Pattern Mining in MultiLevel, MultiDimensional Space ? ConstraintBased Frequent Pattern Mining ? Mining HighDimensional Data and Colossal Patterns ? Mining Compressed or Approximate Patterns ? Pattern Exploration and Application ? Summary Research on Pattern Mining: A Road Map 4 5 Chapter 7 : Advanced Frequent Pattern Mining ? Pattern Mining: A Road Map ? Pattern Mining in MultiLevel, MultiDimensional Space ? Mining MultiLevel Association ? Mining MultiDimensional Association ? Mining Quantitative Association Rules ? Mining Rare Patterns and Negative Patterns ? ConstraintBased Frequent Pattern Mining ? Mining HighDimensional Data and Colossal Patterns ? Mining Compressed or Approximate Patterns ? Pattern Exploration and Application ? Summary 6 Mining MultipleLevel Association Rules ? Items often form hierarchies ? Flexible support settings ? Items at the lower level are expected to have lower support ? Exploration of shared multilevel mining (Agrawal amp。07. ? We have many algorithms, but can we mine large (., colossal) patterns? ― such as just size around 50 to 100? Unfortunately, not! ? Why not? ― the curse of “downward closure” of frequent patterns ? The “downward closure” property ? Any subpattern of a frequent pattern is frequent. ? Example. If (a1, a2, …, a 100) is frequent, then a1, a2, …, a 100, (a1, a2), (a1, a3), …, (a 1, a100), (a1, a2, a3), … are all frequent! There are about 2100 such frequent itemsets! ? No matter using breadthfirst search (., Apriori) or depthfirst search (FPgrowth), we have to examine so many patterns ? Thus the downward closure property leads to explosion! 41 Closed/maximal patterns may partially alleviate the problem but not really solve it: We often need to mine scattered large patterns! Let the minimum support threshold σ= 20 There are frequent patterns of size 20 Each is closed and maximal patterns = The size of the answer set is exponential to n Colossal Patterns: A Motivating Example T1 = 1 2 3 4 ….. 39 40 T2 = 1 2 3 4 ….. 39 40 : . : . : . : . T40=1 2 3 4 ….. 39 40 ????????2040T1 = 2 3 4 ….. 39 40 T2 = 1 3 4 ….. 39 40 : . : . : . : . T40=1 2 3 4 …… 39 nnn n2/22/ ??????????Then delete the items on the diagonal Let’s make a set of 40 transactions 42 Colossal Pattern Set: Small but Interesting ? It is often the case that only a small number of patterns are colossal, ., of large size ? Colossal patterns are usually attached with greater importance than those of small pattern sizes 43 Mining Colossal Patterns: Motivation and Philosophy ? Motivation: Many realworld tasks need mining colossal patterns ? Microarray analysis in bioinformatics (when support is low) ? Biological sequence patterns ? Biological/sociological/information graph pattern mining ? No hope for pleteness ? If the mining of midsized patterns is explosive in size, there is no hope to find colossal patterns efficiently by insisting “plete set” mining philosophy ? Jumping out of the swamp of the midsized results ? What we may develop is a philosophy that may jump out of the swamp of midsized results that are explosive in size and jump to reach colossal patterns ? Striving for mining almost plete colossal patterns ? The key is to develop a mechanism that may quickly reach colossal patterns and discover most of them 44 Let the minsupport threshold σ= 20 Then there are closed/maximal frequent patterns of size 20 However, there is only one with size greater than 20, (., colossal): α= {41,42,…,79} of size 39 Alas, A Show of Colossal Pattern Mining! ????????2040T1 = 2 3 4 ….. 39 40 T2 = 1 3 4 ….. 39 40 : . : . : . : . T40=1 2 3 4 …… 39 T41= 41 42 43 ….. 79 T42= 41 42 43 ….. 79 : . : . T60= 41 42 43 … 79 The existing fastest mining algorithms (., FPClose, LCM) fail to plete running Our algorithm outputs this colossal pattern in seconds 45 Methodology of PatternFusion Strategy ? PatternFusion traverses the tree in a boundedbreadth way ? Always pushes down a frontier of a boundedsize candidate pool ? Only a fixed number of patterns in the current candidate pool will b