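Change the model's algorithm. For example, when searching for target customers, each customer's static attributes must be represented with specific symbols, and data of different types must be grouped and classified. The choice of data largely determines the model that is finally built. The training set should contain a sufficient amount of data, drawn from the mass of available data so that it covers all data sources, data types, data content, and data formats.

The CRISP process is shown in the figure [7]:

Figure: The CRISP process

The data mining process can be divided into the following steps: problem formulation, data selection, data transformation, data mining, model evaluation, and result analysis.

The data mining process is divided into five stages: sampling, description, preprocessing, modeling, and evaluation of the mining results.

Data mining is an interdisciplinary research field that combines statistics, database technology, object-oriented methods, artificial intelligence, high-performance computing, machine learning, knowledge engineering, information retrieval, and data visualization. It can not only query historical information but also search that information for latent relationships and then analyze it at a higher level, extracting valuable, latent models, knowledge, patterns, and rules. In the process, it can predict the future from what has already been found, helping decision makers adjust to the market and ultimately make sound decisions.

II. Overview of Data Mining and Customer Classification

(I) About Data Mining

In an era that is "data rich but knowledge poor", people analyze existing data at a deeper level in order to make better use of it.

Analysis of the relevant material shows that the Sina Weibo platform often brings in non-target customers such as children and the elderly, along with a large amount of fabricated, false customer information.

From these aspects it can be seen that data mining technology plays an important role in customer classification today.

For enterprises, it helps reduce costs and improve competitiveness, and it helps them "go global" and exchange and obtain information quickly.

Data mining is the process of extracting latent, useful knowledge and information from large volumes of previously unknown, fuzzy, incomplete, and random data.

Keywords: decision tree; target customers; CART; Sina Weibo; data mining

ABSTRACT

Our society is now full of information. With the rapid development of data warehouse and data mining technology, competition on network platforms increases day by day, so customer management has become one of the most important issues.

This paper draws on the theory, technology, and methods of data mining and takes the classification tree as the main modeling idea. It converts Microblog customer information into attribute-conclusion form and applies the CART classification-tree algorithm, which is based on minimizing the Gini index.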
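As a rough illustration of the Gini-based criterion mentioned above, the following minimal sketch computes the Gini impurity of a set of class labels and picks the binary split with the lowest weighted impurity. It is not the thesis's implementation: the function names `gini` and `best_binary_split`, the `age_group` attribute, and the toy records are all hypothetical and chosen only for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(rows, labels, feature):
    """Try every split of the form `row[feature] == value` versus the rest.

    Returns (value, weighted_gini) for the split with the smallest weighted
    Gini impurity, or None if no split separates the data.
    """
    n = len(rows)
    best = None
    for value in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] == value]
        right = [y for row, y in zip(rows, labels) if row[feature] != value]
        if not left or not right:
            continue  # this value does not split the data
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (value, score)
    return best


# Hypothetical toy records (not data from the thesis).
rows = [
    {"age_group": "adult"},
    {"age_group": "adult"},
    {"age_group": "child"},
    {"age_group": "senior"},
]
labels = ["target", "target", "non-target", "non-target"]

print(gini(labels))
print(best_binary_split(rows, labels, "age_group"))
```

A full CART procedure would apply this search recursively over all attributes to grow the tree and then prune it; the sketch covers only the choice of a single split.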
By building, pruning, and assessing the tree, the customers are classified. As a result, target and non-target customers are distinguished rapidly and accurately.

Based on the data mining results, the Microblog target-customer model is adjusted to finally obtain an optimized model. Combined with the data warehouse model and applied in practice, it can greatly improve efficiency; in other words, both customers and the company will benefit greatly.

Key words: Decision Tree; Searching Target Customers; CART; Microblog; Data Mining

Contents

Abstract (摘要)
ABSTRACT
I. Foreword
  (I) Research background
  (II) Purpose of the topic
II. Overview of Data Mining and Customer Classification
  (I) About data mining
    1. The concept of data mining and its operating process
    2. Common data mining techniques
  (II) About customer classification
    1. The concept of customer classification
    2. The significance of Sina Weibo customer classification
    3. The Sina Weibo customer operation workflow
    4. Specific applications in Sina Weibo customer classification
    5. Sina Weibo customer categories and their characteristics
III. The CART Algorithm and Its Application to Sina Weibo Customer Classification
  (I) Introduction to the CART algorithm
  (II) Advantages, disadvantages, and applicability of the CART algorithm
  (III) Application of the CART algorithm to Sina Weibo customer classification
    1. Problem definition
    2. Data preparation
    3. Data transformation
    4. The application process of the CART algorithm
IV. Analysis of the Sina Weibo Customer Classification Results
  (I) Customer categories and the corresponding marketing strategies
  (II) Shortcomings of the CART algorithm and improvements
Conclusion
References
Acknowledgements

I. Foreword

With the rapid development of communication technology, China's internet has changed fundamentally. Compared with their foreign counterparts, domestic communication platforms face a brand-new, globalized, and more fiercely competitive market environment.

Abstract (摘要)

In today's highly information-driven era, with the rapid development of data mining technology and data warehouses, the number of users communicating through network platforms grows day by day, and customer classification has become a primary problem to be solved.

The model has a short response time and high accuracy. Applied in practice, it can greatly improve the efficiency of customer classification, so that both enterprises and individuals benefit.

Although current database technology can query, analyze, and summarize data efficiently, it still cannot discover latent rules and relationships, and therefore cannot predict future trends well. This has led to the phenomenon of being "data rich but knowledge poor" [1], and this need gave rise to data mining.

The spread of the Sina Weibo platform, a new bellwether for online information exchange, has broken the limits of time and space, changed the form of communication, and accelerated the flow of information throughout society.

Zikmund (齊克芒德) holds that "successful managers must understand both marketing concepts and information system architecture in order to consistently form, and successfully apply, a comprehensive, reliable, and complete view of the customer." Enterprises must establish a customer management system suited to themselves, build a data warehouse, combine the customer relationship management system effectively with data mining technology, and analyze in depth the data warehouse that stores large amounts of customer information, so as to improve market competitiveness, obtain effective information that benefits business operations, win new customers, have existing customers generate more profit, and retain valuable customers
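[2]. To reduce costs and improve the production efficiency of Sina Weibo, accurately finding target customers among a huge number of customers has become an important problem in urgent need of a solution. This thesis uses data mining methods and model analysis to build a customer screening model and to study this problem in depth. Unlike traditional analysis methods, data mining (Data Mining, DM) mines information and discovers knowledge without an explicit hypothesis. It is one step in knowledge discovery in databases (Knowledge Discovery in Databases, KDD), and obtains latent, valuable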