【正文】
98, 368379, New York, NY. ? . Zaki. Efficient enumeration of frequent sequences. CIKM’98. Novermber 1998. ? . Garofalakis, R. Rastogi, K. Shim: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. VLDB 1999: 223234, Edinburgh, Scotland. 94 文獻 : 空間 , 多媒體 , 文本和 Web 數(shù)據(jù)庫頻繁模式挖掘 ? K. Koperski, J. Han, and G. B. Marchisio, Mining Spatial and Image Data through 。01), Heidelberg, Germany, April 2022. ? B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE39。95, 314, Taipei, Taiwan. ? R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT’96. ? J. Han, J. Pei, B. MortazaviAsl, Q. Chen, U. Dayal, . Hsu, FreeSpan: Frequent PatternProjected Sequential Pattern Mining, Proc. 2022 Int. Conf. on Knowledge Discovery and Data Mining (KDD39。00), Boston, MA, August 2022. ? R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD39。 A. Pang. ―Exploratory mining and pruning optimizations of constrained association rules.‖ SIGMOD’98 ? J. Pei, J. Han, and L. V. S. Lakshmanan, Mining Frequent Itemsets with Convertible Constraints, Proc. 2022 Int. Conf. on Data Engineering (ICDE39。00, 512521, San Diego, CA, Feb. 2022. ? Y. Fu and J. Han. Metaruleguided mining of association rules in relational databases. KDOOD39。99, 398416, Jerusalem, Israel, Jan. 1999. ? M. Zaki. Generating NonRedundant Association Rules. KDD39。98, 8593, Seattle, Washington. ? J. Pei, J. Han, and R. Mao, CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, Proc. 2022 ACMSIGMOD Int. Workshop on Data Mining and Knowledge Discovery (DMKD39。98, 494502, Orlando, FL, Feb. 1998. ? D. Tsur, J. D. Ullman, S. Abitboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of associationrule mining. SIGMOD39。96, 122133, Bombay, India. ? . Miller and Y. Yang. Association rules over interval data. SIGMOD39。98, 582593, New York, NY. 88 文獻 : 頻繁模式挖掘 (外延 ) ? B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE39。95, 420431, Zurich, Switzerland. ? M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and . Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM39。97, Newport Beach, CA, Aug. 1997. ? M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. Data Mining and Knowledge Discovery, 1:343374, 1997. 87 文獻 : 頻繁模式挖掘 (外延 ) ? S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD39。95, San Jose, CA. ? S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. SIGMOD39。97, Tucson, Arizona. ? . Park, . Chen, and . Yu. An effective hashbased algorithm for mining association rules. SIGMOD39。96, New Orleans, LA. ? T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using twodimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD39。96, 134145, Bombay, India, Sept. 1996. ? . Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. KDD’97. August 1997. 85 文獻 : 頻繁模式挖掘 (性能改進 ) ? S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket analysis. SIGMOD39。95, 407419, Zurich, Switzerland, Sept. 1995. ? R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD39。95, 432443, Zurich, Switzerland. ? C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB39。94 487499, Santiago, Chile. ? J. Han, J. Pei, and Y. Yin: ―Mining frequent patterns without candidate generation”. In Proc. ACMSIGMOD’2022, pp. 112, Dallas, TX, May 2022. ? H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. KDD39。 ? M. Zaki et al. New algorithms for fast discovery of association rules. In KDD’97 ? 掃描一次數(shù)據(jù)集將 水平格式 數(shù)據(jù)轉化為垂直格式 ? 通過頻繁 k項集的 tidlist的交集,計算對應 (k+1)項集的 tidlist 32 頻繁模式挖掘的瓶頸 ? 多遍數(shù)據(jù)庫掃描是 昂貴的 ? 挖掘長模式需要很多遍掃描 , 并產生大量候選 ? 挖掘頻繁模式 i1i2…i 100 ?掃描次數(shù) : 100 ?候選個數(shù) : (1001) + (1002) + … + ( 110000) = 21001 = *1030 ! ? 瓶頸 : 候選產生 測試 ? 能夠避免候選產生嗎 ? 33 挖掘頻繁模式而不產生候選 ? 使用局部頻繁的項 , 由 短模式 增長產生 長模式 ? ―abc‖ 是頻繁模式 ? 得到包含 “ abc‖的所有事務 : DB|abc ? ―d‖ 是 DB|abc 中的局部頻繁項 ? abcd 是頻繁模式 34 由事務數(shù)據(jù)庫構造 FP樹 {} f:4 c:1 b:1 p:1 b:1 c:3 a:3 b:1 m:2 p:2 m:1 Header Table Item frequency head f 4 c 4 a 3 b 3 m 3 p 3 min_support = 3 TID Items bought (ordered) frequent items 100 {f, a, c, d, g, i, m, p} {f, c, a, m, p} 200 {a, b, c, f, l, m, o} {f, c, a, b, m} 300 {b, f, h, j, o, w} {f, b} 400 {b, c, k, s, p} {c, b, p} 500 {a, f, c, e, l, p, m, n} {f, c, a, m, p} 1. 掃描 DB 一次 , 找出頻繁 1itemset (單個項的模式 ) 2. 按頻率的降序將頻繁項排序 , 得到 flist 3. 再次掃描 DB, 構造 FP樹 Flist=fcabmp 35 劃分模式和數(shù)據(jù)庫 ? 可以按照 flist 將頻繁模式劃分成子集 ? Flist=fcabmp ? 包含 p的模式 ? 包含 m 但不包含 p的模式 ? … ? 包含 c 但不包含 a, b, m, p ? 模式 f ? 完全性和非冗余性 36 從 p條件數(shù)據(jù)庫找出含 p的模式 ? 從 FP樹的頻繁項頭表開始 ? 沿著頻繁項 p 的鏈搜索 FP樹 ? 收集項 p 的所有 變換的前綴路徑 形成 p 的模式基 條件 模式基 item cond. pattern base c f:3 a fc:3 b fca:1, f:1, c:1 m fca:2, fcab:1 p fcam:2, cb:1 {} f:4 c:1 b:1 p:1 b:1 c:3 a:3 b:1 m:2 p:2 m:1 頭表 Item frequency head f 4 c 4 a 3 b 3 m 3 p 3 37 Empty Empty f {(f:3)}|c {(f:3)} c {(f:3, c:3)}|a {(fc:3)} a Empty {(fca:1), (f:1), (c:1)} b {(f:3, c:3, a:3)}|m {(fca:2), (fcab:1)} m {(c:3)}|p {(fcam:2), (cb:1)} p 條件 FPtree 條件模式庫 項 通過建立條件模式庫得到頻繁集 38 從條件模式基到條件 FP樹 ? 對于每個條件模式基 ? 累計 條件模