[Main Text]
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning:
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
  - There are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- The training set S contains s_i tuples of class C_i, for i = 1, ..., m
- The expected information needed to classify an arbitrary tuple:
  I(s_1, ..., s_m) = -sum_{i=1}^{m} (s_i/s) log2(s_i/s)
- The entropy of attribute A with values {a_1, a_2, ..., a_v}:
  E(A) = sum_{j=1}^{v} ((s_1j + ... + s_mj)/s) * I(s_1j, ..., s_mj)
- The information gained by branching on attribute A:
  Gain(A) = I(s_1, ..., s_m) - E(A)

Attribute Selection by Information Gain Computation
- Class P: buys_computer = "yes"
- Class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age: the term (5/14) * I(2, 3) means that "age <= 30" has 5 out of 14 samples, with 2 yes's and 3 no's. Summing the analogous terms for the other age ranges gives E(age), and Gain(age) = I(9, 5) - E(age).
- Similarly, the gains for income, student, and credit_rating are computed, and the attribute with the highest gain becomes the test attribute.
- (A small runnable sketch of this computation appears below, after the discussion of overfitting.)

Other Attribute Selection Measures
- Gini index (CART, IBM IntelligentMiner)
  - All attributes are assumed continuous-valued
  - Assume there exist several possible split values for each attribute
  - May need other tools, such as clustering, to get the possible split values
  - Can be modified for categorical attributes

Gini Index (IBM IntelligentMiner)
- If a data set T contains examples from n classes, the gini index gini(T) is defined as
  gini(T) = 1 - sum_{j=1}^{n} p_j^2,
  where p_j is the relative frequency of class j in T.
- If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index of the split data is defined as
  gini_split(T) = (N1/N) * gini(T1) + (N2/N) * gini(T2)
- The attribute that provides the smallest gini_split(T) is chosen to split the node (need to enumerate all possible splitting points for each attribute).

Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand
- Example:
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31...40" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = "<=30" AND credit_rating = "fair" THEN buys_computer = "no"

Avoid Overfitting in Classification
- Overfitting: an induced tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - Poor accuracy for unseen samples
- Two approaches to avoid overfitting:
  - Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    - Difficult to choose an appropriate threshold
  - Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the "best pruned tree"
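The following is a minimal Python sketch of the information-gain computation referenced above. The helper names (info, info_gain) are illustrative, and the reconstructed 14-tuple age/buys_computer sample assumes the standard AllElectronics counts (the slide only states the age <= 30 partition: 5 of 14, with 2 yes and 3 no); it is a sketch, not the slides' own code.

```python
import math
from collections import Counter

def info(class_counts):
    """Expected information I(s1, ..., sm) = -sum p_i * log2(p_i) from per-class counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

def info_gain(examples, attribute, label):
    """Gain(A) = I(s1, ..., sm) - E(A) for splitting `examples` (list of dicts) on `attribute`."""
    base = info(list(Counter(ex[label] for ex in examples).values()))
    n = len(examples)
    expected = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        expected += len(subset) / n * info(list(Counter(ex[label] for ex in subset).values()))
    return base - expected

# Age distribution: 5 tuples with age <= 30 (2 yes / 3 no), as stated on the slide;
# the 31...40 (4 yes) and >40 (3 yes / 2 no) counts are assumed here for illustration.
data = ([{"age": "<=30",    "buys_computer": "yes"}] * 2 +
        [{"age": "<=30",    "buys_computer": "no"}]  * 3 +
        [{"age": "31...40", "buys_computer": "yes"}] * 4 +
        [{"age": ">40",     "buys_computer": "yes"}] * 3 +
        [{"age": ">40",     "buys_computer": "no"}]  * 2)

print(info([9, 5]))                             # I(9, 5) is about 0.940
print(info_gain(data, "age", "buys_computer"))  # Gain(age) is about 0.247 (0.940 - 0.694 with rounded terms)
```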
Approaches to Determine the Final Tree Size
- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation
- Use all the data for training,
  - but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node may improve the entire distribution
- Use the minimum description length (MDL) principle:
  - halt growth of the tree when the encoding is minimized

Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes
  - Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals (a split-point sketch follows at the end of this section)
- Handle missing attribute values
  - Assign the most common value of the attribute
  - Assign a probability to each of the possible values
- Attribute construction
  - Create new attributes based on existing ones that are sparsely represented
  - This reduces fragmentation, repetition, and replication

Classification in Large Databases
- Classification: a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
- Why decision tree induction in data mining?
  - relatively faster learning speed (than other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - comparable classification accuracy with other methods

Scalable Decision Tree Induction Methods in Data Mining Studies
- SLIQ (EDBT'96, Mehta et al.)
  - builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, J. Shafer et al.)
  - constructs an attribute list data structure
- PUBLIC (VLDB'98, Rastogi & Shim)
  - integrates tree splitting and tree pruning: stops growing the tree earlier
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti)
  - separates the scalability aspects from the criteria that determine the quality of the tree
  - builds an AVC-list (attribute, value, class label)

Data Cube-Based Decisio
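As a companion to the Gini index formulas and the continuous-valued-attribute enhancement above, here is a minimal Python sketch (not the algorithm of any of the systems cited) that enumerates candidate split points of a continuous attribute and picks the threshold minimizing gini_split(T). The function names and the toy ages/labels data are made up for illustration.

```python
from collections import Counter

def gini(class_counts):
    """gini(T) = 1 - sum_j p_j^2, computed from a list of per-class counts."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def best_split_point(values, labels):
    """Enumerate candidate thresholds (midpoints between consecutive distinct sorted
    values) and return (gini_split, threshold) minimizing
    gini_split(T) = N1/N * gini(T1) + N2/N * gini(T2)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_gini, best_threshold = float("inf"), None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = Counter(label for _, label in pairs[:i])
        right = Counter(label for _, label in pairs[i:])
        g = (i / n) * gini(list(left.values())) + ((n - i) / n) * gini(list(right.values()))
        if g < best_gini:
            best_gini, best_threshold = g, threshold
    return best_gini, best_threshold

# Toy, made-up data: a continuous age attribute and class labels.
ages   = [23, 25, 30, 35, 38, 42, 45, 50]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]
print(best_split_point(ages, labels))  # here the best threshold is 32.5
```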