Data mining systems can also be distinguished by the granularity or level of abstraction of the knowledge mined: generalized knowledge (at a high level of abstraction), primitive-level knowledge (at the raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should support knowledge discovery at multiple levels of abstraction.
Data mining systems can also be categorized as those that mine data regularities (commonly occurring patterns) and those that mine data irregularities (such as exceptions or outliers). In general, concept description, association analysis, classification, prediction, and clustering mine data regularities and reject outliers as noise. These methods can also help detect outliers.
3) Classification according to the techniques used.
Data mining systems can also be classified according to the data mining techniques they employ. These techniques can be described by the degree of user interaction involved (for example, autonomous systems, interactive exploratory systems, query-driven systems) or by the methods of data analysis employed (for example, database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system typically adopts several data mining techniques, or an effective, integrated technique that combines the merits of a few individual approaches.
What is Data Mining?
Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets in a great deal of raw material. Thus, a misnomer that carries both “data” and “mining” became the popular choice. Many other terms carry a similar or slightly different meaning, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases.
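As a rough illustration of the view that data mining is one step within a larger knowledge discovery process, the sketch below chains toy versions of the commonly described stages (cleaning, integration, selection, transformation, mining, evaluation, presentation). Everything in it is an assumption made for illustration: the records, the function names, the use of a per-region average as the "mined pattern", and the threshold of 100.0 as the interestingness measure. It is not taken from the text and is not a real mining algorithm.

```python
from statistics import mean

# Hypothetical toy records from two assumed sources; None marks a missing value.
SOURCE_A = [
    {"customer": "c1", "region": "north", "amount": 120.0},
    {"customer": "c2", "region": "north", "amount": None},
    {"customer": "c3", "region": "south", "amount": 80.0},
]
SOURCE_B = [
    {"customer": "c4", "region": "south", "amount": 100.0},
    {"customer": "c5", "region": "north", "amount": 140.0},
    {"customer": "c6", "region": "east", "amount": 30.0},
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for src in sources for r in src]


def select(records, regions):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["region"] in regions]


def transform(records):
    """Data transformation: group amounts by region, ready for aggregation."""
    grouped = {}
    for r in records:
        grouped.setdefault(r["region"], []).append(r["amount"])
    return grouped


def mine(grouped):
    """Data mining (toy stand-in): average amount per region."""
    return {region: mean(amounts) for region, amounts in grouped.items()}


def evaluate(patterns, threshold):
    """Pattern evaluation: keep patterns at or above an assumed threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}


def present(patterns):
    """Knowledge presentation: render the surviving patterns as text lines."""
    return [f"{region}: average amount {avg:.1f}"
            for region, avg in sorted(patterns.items())]


def run_pipeline():
    data = integrate(clean(SOURCE_A), clean(SOURCE_B))
    data = select(data, regions={"north", "south"})
    patterns = mine(transform(data))
    return present(evaluate(patterns, threshold=100.0))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Keeping each stage as its own small function mirrors the idea that the mining step can be swapped out or rerun while the surrounding preparation and evaluation stages stay in place.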
Knowledge discovery consists of an iterative sequence of the following steps:
1) Data cleaning: to remove noise or irrelevant data.
2) Data integration: where multiple data sources may be combined.
3) Data selection: where data relevant to the analysis task are retrieved from the database.
4) Data transformation: where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations.
5) Data mining: an essential process where intelligent methods are applied in order to extract data patterns.
6) Pattern evaluation: to identify the truly interesting patterns representing knowledge, based on some interestingness measures.
7) Knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.
The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one, since it uncovers hidden patterns for evaluation. We agree that data mining is a know