n a database? Unrealistic, because the patterns could be too numerous yet uninteresting
- Data mining should be an interactive process
  - User directs what is to be mined
- Users must be provided with a set of primitives (basic building blocks) to be used to communicate with the data mining system
- Incorporating these primitives in a data mining query language
  - More flexible user interaction
  - Foundation for design of graphical user interface
  - Standardization of data mining industry and practice

Data Mining Query Language
- With a data mining query language, data mining tasks can be submitted to the data mining system in the form of queries.
- Advantages of defining a data mining query language

Primitives that Define a Data Mining Task
- Task-relevant data
  - Database or data warehouse name
  - Database tables or data warehouse cubes
  - Condition for data selection
  - Relevant attributes or dimensions
  - Data grouping criteria
- Type of knowledge to be mined
  - Characterization, discrimination, association, classification, prediction, clustering, outlier analysis, other data mining tasks
- Background knowledge
- Pattern interestingness measurements
- Visualization/presentation of discovered patterns

Data Mining Primitives

Primitive 3: Background Knowledge
- A typical kind of background knowledge: concept hierarchies
- Schema hierarchy
  - e.g., street < city < province_or_state < country
- Set-grouping hierarchy
  - e.g., {20-39} = young, {40-59} = middle_aged
- Operation-derived hierarchy
  - address: login-name < department < university < country
- Rule-based hierarchy
  - low_profit_margin(X) <= price(X, P1) and cost(X, P2) and (P1 - P2) < $50

Primitive 4: Pattern Interestingness Measure
- Simplicity: e.g., (association) rule length, (decision) tree size
- Certainty: e.g., confidence, P(A|B) = #(A and B) / #(B), classification reliability or accuracy, certainty factor, rule strength, rule quality, discriminating weight, etc.
- Utility: potential usefulness, e.g., support (association), noise threshold (description)
- Novelty: not previously known, surprising (used to remove redundant rules, e.g., Illinois vs. Champaign rule implication support ratio)

Primitive 5: Presentation of Discovered Patterns
- Different backgrounds/usages may require different forms of representation
  - e.g., rules, tables, crosstabs, pie/bar chart, etc.
- Concept hierarchy is also important
  - Discovered knowledge might be more understandable when represented at a high level of abstraction
  - Interactive drill up/down, pivoting, slicing and dicing provide different perspectives on the data
- Different kinds of knowledge require different representation: association, classification, clustering, etc.

An Example Query in DMQL

Major Issues in Data Mining (1)
- Mining methodology and user interaction
  - Mining different kinds of knowledge in databases
  - Interactive mining of knowledge at multiple levels of abstraction
  - Incorporation of background knowledge
  - Data mining languages and heuristic data mining
  - Presentation and visualization of data mining results
  - Handling noisy and incomplete data
  - Pattern evaluation: the interestingness problem
- Performance and scalability
  - Efficiency and scalability of data mining algorithms
  - Parallel, distributed, and incremental mining methods

Major Issues in Data Mining (2)
- Issues relating to the diversity of data types
  - Handling relational and complex types of data
  - Mining information from heterogeneous databases and global information systems (WWW)
- Issues relating to applications and social impact
  - Application of discovered knowledge
    - Domain-specific data mining tools
    - Intelligent query answering
    - Process control and decision making
  - Integration of discovered knowledge with existing knowledge: the knowledge fusion problem
  - Protection of data security, integrity, and privacy

Summary
- Data mining: discovering interesting patterns from large amounts of data
- A natural evolution of database technology, in great demand and with wide applications
- The KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
- Mining can be performed on a variety of data repositories
- Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
- Classification of data mining systems
- Major issues in data mining

References
- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2022.
- T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39:58-64, 1996.
- G. Piatetsky-Shapiro, U. Fayyad, and P. Smyth.
  From data mining to knowledge discovery: An overview. In U. M. Fayyad, et al. (eds.), Advances in Knowledge Discovery and Data Mining, 1-35. AAAI/MIT Press, 1996.
- G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991.

Thank you, everyone!
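
Supplementary sketch: Primitives 3 and 4 in code

The set-grouping hierarchy from Primitive 3 and the certainty/utility measures from Primitive 4 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the way any particular data mining system implements these primitives: the customer records, the item names, the thresholds, and all function names (`generalize_age`, `support`, `confidence`, `is_interesting`) are hypothetical. Confidence is computed as on the slide, as the count of records matching both sides of a rule divided by the count matching the condition side.

```python
from fractions import Fraction

# Set-grouping hierarchy from the slides: {20-39} = young, {40-59} = middle_aged.
AGE_GROUPS = {"young": range(20, 40), "middle_aged": range(40, 60)}


def generalize_age(age):
    """Map a raw age to its higher-level concept, or None if no group covers it."""
    for label, ages in AGE_GROUPS.items():
        if age in ages:
            return label
    return None


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))


def confidence(transactions, antecedent, consequent):
    """#(antecedent and consequent) / #(antecedent) for the rule antecedent => consequent.

    Raises ZeroDivisionError if no transaction contains the antecedent.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def is_interesting(transactions, antecedent, consequent, min_support, min_confidence):
    """Keep a rule only if it meets both user-supplied thresholds."""
    rule_items = set(antecedent) | set(consequent)
    return (support(transactions, rule_items) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Hypothetical task-relevant data: (age, purchased item) pairs.
customers = [(25, "laptop"), (31, "laptop"), (38, "phone"), (45, "laptop"), (52, "phone")]

# Generalize the age attribute before mining, using the concept hierarchy.
transactions = [{f"age={generalize_age(age)}", f"buys={item}"} for age, item in customers]

rule_support = support(transactions, {"age=young", "buys=laptop"})
rule_confidence = confidence(transactions, {"age=young"}, {"buys=laptop"})
print(rule_support, rule_confidence)
```

Exact fractions are used instead of floats so that the measures can be compared against thresholds without rounding surprises; a float threshold such as 0.3 still works in the comparisons.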