【正文】
temsn “abc” is a frequent patternn Get all transactions having “abc”: DB|abcn “d” is a local frequent item in DB|abc ? abcd is a frequent pattern2023/2/27 星期六 33Data Mining: Concepts and TechniquesConstruct FPtree from a Transaction Database{}f:4 c:1b:1p:1b:1c:3a:3b:1m:2p:2 m:1Header TableItem frequency head f 4c 4a 3b 3m 3p 3min_support = 3TID Items bought (ordered) frequent items100 {f, a, c, d, g, i, m, p} {f, c, a, m, p}200 {a, b, c, f, l, m, o} {f, c, a, b, m}300 {b, f, h, j, o, w} {f, b}400 {b, c, k, s, p} {c, b, p}500 {a, f, c, e, l, p, m, n} {f, c, a, m, p}1. Scan DB once, find frequent 1itemset (single item pattern)2. Sort frequent items in frequency descending order, flist3. Scan DB again, construct FPtreeFlist=fcabmp2023/2/27 星期六 34Data Mining: Concepts and TechniquesBenefits of the FPtree Structuren Completeness n Preserve plete information for frequent pattern miningn Never break a long pattern of any transactionn Compactnessn Reduce irrelevant info—infrequent items are gonen Items in frequency descending order: the more frequently occurring, the more likely to be sharedn Never be larger than the original database (not count nodelinks and the count field)n For Connect4 DB, pression ratio could be over 1002023/2/27 星期六 35Data Mining: Concepts and TechniquesPartition Patterns and Databasesn Frequent patterns can be partitioned into subsets according to flistn Flist=fcabmpn Patterns containing pn Patterns having m but no pn …n Patterns having c but no a nor b, m, pn Pattern fn Completeness and nonredundency2023/2/27 星期六 36Data Mining: Concepts and TechniquesFind Patterns Having P From Pconditional Databasen Starting at the frequent item header table in the FPtreen Traverse the FPtree by following the link of each frequent item pn Accumulate all of transformed prefix paths of item p to form p’s conditional pattern baseConditional pattern basesitem cond. pattern basec f:3a fc:3b fca:1, f:1, c:1m fca:2, fcab:1p fcam:2, cb:1{}f:4 c:1b:1p:1b:1c:3a:3b:1m:2p:2 m:1Header TableItem frequency head f 4c 4a 3b 3m 3p 32023/2/27 星期六 37Data Mining: Concepts and TechniquesFrom Conditional Patternbases to Conditional FPtrees n For each patternbasen Accumulate the count for each item in the basen Construct the FPtree for the frequent items of the pattern basemconditional pattern base:fca:2, fcab:1{}f:3c:3a:3mconditional FPtreeAll frequent patterns relate to mm, fm, cm, am, fcm, fam, cam, fcam? ?{}f:4 c:1b:1p:1b:1c:3a:3b:1m:2p:2 m:1Header TableItem frequency head f 4c 4a 3b 3m 3p 32023/2/27 星期六 38Data Mining: Concepts and TechniquesRecursion: Mining Each Conditional FPtree{}f:3c:3a:3mconditional FPtreeCond. pattern base of “am”: (fc:3){}f:3c:3amconditional FPtreeCond. pattern base of “cm”: (f:3){}f:3cmconditional FPtreeCond. pattern base of “cam”: (f:3){}f:3camconditional FPtree2023/2/27 星期六 39Data Mining: Concepts and TechniquesA Special Case: Single Prefix Path in FPtreen Suppose a (conditional) FPtree T has a shared single prefixpath Pn Mining can be deposed into two partsn Reduction of the single prefix path into one noden Concatenation of the mining results of the two parts?a2:n2a3:n3a1:n1{}b1:m1 C1:k1C2:k2 C3:k3b1:m1 C1:k1C2:k2 C3:k3r1+a2:n2a3:n3a1:n1{}r1 =2023/2/27 星期六 40Data Mining: Concepts and TechniquesMining Frequent Patterns With FPtreesn Idea: Frequent pattern growthn Recursively grow frequent patterns by pattern and database partitionn Method n For each frequent item, construct its conditional patternbase, and then its conditional FPtreen Repeat the process on each newly created conditional FPtree n Until the resulting FPtree is empty, or it contains only one path—single path will generate all the binations of its subpaths, each of which is a frequent pattern2023/2/27 星期六 41Data Mining: Concepts and Techniques算法中文說明2023/2/27 星期六 42Data Mining: Concepts and TechniquesScaling FPgrowth by DB Projectionn FPtree cannot fit in memory?—DB projectionn First partition a database into a set of projected DBsn Then construct and mine FPtree for each projected DBn Parallel projection vs. Partition projection techniquesn Parallel projection is space costly2023/2/27 星期六 43Data Mining: Concepts and TechniquesPartitionbased Projectionn Parallel projection needs a lot of disk space n Partition projection saves itTran. DB fcampfcabmfbcbpfcamppproj DB fcamcbfcammproj DB fcabfcafcabproj DB fcb…aproj DBfc…cproj DBf…fproj DB …amproj DB fcfcfccmproj DB fff…2023/2/27 星期六 44Data Mining: Concepts and TechniquesFPGrowth vs. Apriori: Scalability With the Support ThresholdData set T25I20D10K2023/2/27 星期六 45Data Mining: Concepts and TechniquesFPGrowth vs. TreeProjection: Scalability with the Support ThresholdData set T25I20D100K2023/2/27 星期六 46Data Mining: Concepts and TechniquesWhy Is FPGrowth the Winner?n Divideandconquer: n depose both the mining task and DB according to the frequent patterns obtained so farn leads to focused search of smaller databasesn Other factorsn no candidate generation, no candidate testn pressed database: FPtree structuren no repeated scan of entire database n basic ops—counting local freq items and building sub FPtree, no pattern search and matching2023/2/27 星期六 47Data Mining: Concepts and TechniquesImplications of the Methodology n Mining closed frequent itemsets and maxpatternsn CLOSET (DMKD’00)n Mining sequential patternsn FreeSpan (KDD’00), PrefixSpan (ICDE’01)n Constraintbased mining of fr