[Main text]
ing: Practical Machine Learning Tools and Techniques with Java Implementations》, by Ian H. Witten and Eibe Frank (New Zealand)
• Weka – An open source machine learning and data mining framework implemented in Java, developed at the University of Waikato in New Zealand.

Concepts: KDD, ML, OLAP, and DM
• KDD (Knowledge Discovery in Databases) is a multi-step process of knowledge discovery.
• ML (Machine Learning) = KD, not limited to data held in a database.
  Process: mining, data patterns, representation, validation, prediction
• OLAP (Online Analytical Processing) is the online analysis of data in a database.
• DM is used to generate hypotheses, whereas OLAP is used to verify them.

Concepts: DM and DB
• Data preparation accounts for 70% of the work in the data mining process.
• "Database" + "Data mining" = a database that can talk

Concept: Data Mining
• Definition: data mining is the process of extracting latent, valuable knowledge (models or rules) from large amounts of data.
– Key characteristics of data mining:
  • Large amounts of data
  • Discovering previously unknown, hidden information
  • Extracting valuable information
  • Making important business decisions using the information
• Key points about DM/ML
– The data is stored electronically and the search is automated by computer.
– Defined as the process of discovering patterns in data.
– To become aware by information or from observation.
– To be informed of, ascertain.

• [...] (Naïve Bayes)
  • Uses all attributes, assuming that the attributes are independent of one another and equally important.
• Divide and conquer: Constructing decision trees
  • Repeatedly select one attribute on which to split the samples (algorithm: ID )
• Covering algorithms: Constructing rules (algorithm: Prism)
  • Take each class in turn and seek a way of covering all instances in it, at the same time excluding instances not in the class.
  • The covering approach produces a set of rules rather than a decision tree.

Algorithms: The basic methods
• Mining association rules:
– Parameters: coverage (support), accuracy (confidence)
• Linear models (for reference)
– Mainly used for numeric prediction and classification (Line