【正文】
? Use confidence, correlation, and other measures when possible 21 22 ConstraintBased Frequent Pattern Mining ? Pattern space pruning constraints ? Antimonotonic: If constraint c is violated, its further mining can be terminated ? Monotonic: If c is satisfied, no need to check c again ? Succinct: c must be satisfied, so one can start with the data sets satisfying c ? Convertible: c is not monotonic nor antimonotonic, but it can be converted into it if items in the transaction can be properly ordered ? Data space pruning constraint ? Data succinct: Data space can be pruned at the initial pattern mining process ? Data antimonotonic: If a transaction t does not satisfy c, t can be pruned from its further mining 23 Pattern Space Pruning with AntiMonotonicity Constraints ? A constraint C is antimonotone if the super pattern satisfies C, all of its subpatterns do so too ? In other words, antimonotonicity: If an itemset S violates the constraint, so does any of its superset ? Ex. 1. sum() ? v is antimonotone ? Ex. 2. range() ? 15 is antimonotone ? Itemset ab violates C ? So does every superset of ab ? Ex. 3. sum() ? v is not antimonotone ? Ex. 4. support count is antimonotone: core property used in Apriori TID Transaction 10 a, b, c, d, f 20 b, c, d, f, g, h 30 a, c, d, e, f 40 c, e, f, g TDB (min_sup=2) Item Profit a 40 b 0 c 20 d 10 e 30 f 30 g 20 h 10 24 Pattern Space Pruning with Monotonicity Constraints ? A constraint C is monotone if the pattern satisfies C, we do not need to check C in subsequent mining ? Alternatively, monotonicity: If an itemset S satisfies the constraint, so does any of its superset ? Ex. 1. sum() ? v is monotone ? Ex. 2. min() ? v is monotone ? Ex. 3. C: range() ? 15 ? Itemset ab satisfies C ? So does every superset of ab TID Transaction 10 a, b, c, d, f 20 b, c, d, f, g, h 30 a, c, d, e, f 40 c, e, f, g TDB (min_sup=2) Item Profit a 40 b 0 c 20 d 10 e 30 f 30 g 20 h 10 25 Data Space Pruning with Data Antimonotonicity ? A constraint c is data antimonotone if for a pattern p cannot satisfy a transaction t under c, p’s superset cannot satisfy t under c either ? The key for data antimonotone is recursive data reduction ? Ex. 1. sum() ? v is data antimonotone ? Ex. 2. min() ? v is data antimonotone ? Ex. 3. C: range() ? 25 is data antimonotone ? Itemset {b, c}’s projected DB: ? T10’: {d, f, h}, T20’: {d, f, g, h}, T30’: {d, f, g} ? since C cannot satisfy T10’, T10’ can be pruned TID Transaction 10 a, b, c, d, f, h 20 b, c, d, f, g, h 30 b, c, d, f, g 40 c, e, f, g TDB (min_sup=2) Item Profit a 40 b 0 c 20 d 15 e 30 f 10 g 20 h 5 26 Pattern Space Pruning with Succinctness ? Succinctness: ? Given A1, the set of items satisfying a succinctness constraint C, then any set S satisfying C is based on A1 , ., S contains a subset belonging to A1 ? Idea: Without looking at the transaction database, whether an itemset S satisfies constraint C can be determined based on the selection of items ? min() ? v is succinct ? sum() ? v is not succinct ? Optimization: If C is succinct, C is precounting pushable 27 Na239。 FuVLDB’95) uniform support Milk [support = 10%] 2% Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 5% Level 1 min_sup = 5% Level 2 min_sup = 3% reduced support 7 Multilevel Association: Flexible Support and Redundancy filtering ? Flexible minsupport thresholds: Some items are more valuable but less frequent ? Use nonuniform, groupbased minsupport ? ., {diamond, watch, camera}: %。 Simon Fraser University 169。 2022 Han, Kamber amp。 {bread, milk}: 5%。ve Algorithm: Apriori + Constraint T ID Ite m s100 1 3 4200 2 3 5300 1 2 3 5400 2 5Database D ite m s e t s u p .{ 1 } 2{ 2 } 3{ 3 } 3{ 4 } 1{ 5 } 3i te m s e t s u p .{ 1 } 2{ 2 } 3{ 3 } 3{ 5 } 3Scan D C1 L1 item set{1 2}{1 3}{1 5}{2 3}{2 5}{3 5}ite m s et s up{ 1 2} 1{ 1 3} 2{ 1 5} 1{ 2 3} 2{ 2 5} 3{ 3 5} 2ite m s e t s u p{ 1 3 } 2{ 2 3 } 2{ 2 5 } 3{ 3 5 } 2L2 C2 C2 Scan D C3 L3 item set{2 3 5}Scan D ite m s e t s u p{ 2 3 5 } 2Constraint: Sum{} 5 28 Constrained Apriori : Push a Succinct Constraint Deep T ID Ite m s100 1 3 4200 2 3 5300 1 2 3 5400 2 5Database D ite m s e t s u p .{ 1 } 2{ 2 } 3{ 3 } 3{ 4 } 1{ 5 } 3i te m s e t s u p .{ 1 } 2{ 2 } 3{ 3 } 3{ 5 } 3Scan D C1 L1 item set{1 2}{1 3}{1 5}{2 3}{2 5}{3 5}ite m s et s up{ 1 2} 1{ 1 3} 2{ 1 5} 1{ 2 3} 2{ 2 5} 3{ 3 5} 2ite m s e t s u p{ 1 3 } 2{ 2 3 } 2{ 2 5 } 3{ 3 5 } 2L2 C2 C2 Scan D C3 L3 item set{2 3 5}Scan D ite m s e t s u p{ 2 3 5 } 2Constraint: min{ } = 1 not immediately to be used 29 Constrained FPGrowth: Push a Succinct Constraint Deep Constraint: min{ } = 1 T ID Ite m s100 1 3 4200 2 3 5300 1 2 3 5400 2 5T ID Ite m s100 1 3200 2 3 5300 1 2 3 5400 2 5Remove infrequent length 1 FPTree T ID Ite m s100 3 4300 2 3 51Projected DB No Need to project on 2, 3, or 5 30 Constrained FPGrowth: Push a Data Antimonotonic Constraint Deep Constraint: min{ } = 1 T ID Ite m s100 1 3 4200 2 3 5300 1 2 3 5400 2 5T ID Ite m s100 1 3300 1 3 FPTree Single branch, we are done Remove from data 31 Constrained FPGrowth: Push a Data Antimonotonic Constraint Deep Constraint: range{ } 25 min_sup = 2 FPTree TID Transaction 10 a, c, d, f, h 20 c, d, f, g, h 30 c, d, f, g BProjected DB B FPTree TID Transaction 10 a, b, c, d, f, h 20 b, c, d