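Data mining is the technique of discovering latent patterns in large volumes of data, and it is currently one of the most active areas of computer science research. Data mining systems are likewise moving from the first and second generations toward the development of third- and fourth-generation systems. Data mining systems are the bridge between data mining research and applications and play a major role in spreading data mining techniques. This thesis studies the key techniques in designing and implementing third-generation data mining systems, proposes a unified framework, and designs and implements a data mining application platform based on third-generation techniques. The work offers both theoretical and practical guidance for building and developing data mining systems.

2) Constructs a novel data mining architecture that divides data mining into five layers, including the data layer, algorithm layer, business logic layer, and industry presentation layer. It argues that a general-purpose platform cannot solve domain-specific problems: an application platform should be built by combining it with the business logic of each domain and then deployed in concrete industry applications.

Proposes an association rule algorithm with negative attributes and TESP, a sequential pattern algorithm with temporal features. TESP introduces the concept of the temporal features of a sequential pattern: while finding the patterns, it also reports their temporal features, and it lets users constrain these features before mining, which makes sequential pattern mining more flexible and useful. Design and implementation optimizations are given for algorithms such as the decision tree algorithm SLIQ and Local Outlier Factor (LOF) detection.

5) Designs and implements the data extraction, transformation, and loading tool DMETL, the association rule tool ARMiner, the data mining tool set DMiner, and the customer intelligence analysis system CIAS.

Research on the Data Mining Application Platform and Its Key Techniques (Fudan University doctoral dissertation)

Abstract

Data mining is the process of extracting previously unknown, actionable information from very large databases, and it is currently an active field of computer science research. After more than ten years of development, the emphasis of research is shifting from discovery techniques to system applications, with growing attention to the integration of multiple discovery strategies and techniques and to cross-disciplinary work. Data mining systems are moving from the first and second generations to the third and fourth. Data mining is an application-oriented, interdisciplinary field, and its techniques and theories are driven by applications. Data mining systems are the bridge between data mining research and applications and play an important role in popularizing data mining techniques.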
An urgent problem in data mining systems research is how to bring existing algorithms together under a uniform framework that integrates with specific domains, and how to build data mining systems that different kinds of users can accept. In this thesis, we study the key techniques in designing and implementing third-generation data mining systems, propose a uniform framework, and design and implement a Data Mining Application Platform based on third-generation techniques. The work may serve as theoretical and practical guidance for the construction and development of data mining systems. Our main contributions are summarized here:

1) We divide the development of data mining systems into four generations from the technical perspective and three phases from the evolutionary perspective, infer the trend that data mining systems should be integrated with applications, and put forward the concept of the Data Mining Application Platform.

2) We design a novel data mining system architecture that divides data mining into five layers, including the data layer, algorithm layer, business rule layer, and business presentation layer. We extend the CRISP-DM data mining process model by adding support for user roles and a closed loop, and then design the framework and architecture of the Data Mining Application Platform. We conclude that a universal platform cannot solve domain-specific problems and that the application platform should be constructed through