Apply Model to Test Data

Test data:
  Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Decision tree (model):
  Refund
    Yes: NO
    No:  MarSt
      Married: NO
      Single, Divorced: TaxInc
        < 80K: NO
        > 80K: YES

Starting at the root, the test record follows Refund = No, then MarSt = Married, and reaches a leaf labeled NO. Assign Cheat to "No".

Decision Tree Principles

Basic algorithm (greedy):
  Top-down
  Divide and conquer
  Discrete values
  Recursive
  Heuristic rules

Stopping conditions for splitting:
  All samples belong to the same class
  No attributes remain

Pseudocode
  1. Check for the above base cases.
  2. For each attribute a, find the normalized information gain ratio from splitting on a.
  3. Let a_best be the attribute with the highest normalized information gain.
  4. Create a decision node that splits on a_best.
  5. Recur on the sublists obtained by splitting on a_best, and add those nodes as children of node.

Example: Algorithm Walkthrough

  Tid  Refund  Marital Status  Taxable Income  Cheat
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

1. samples = {1,2,3,4,5,6,7,8,9,10}
   attribute_list = {Refund, MarSt, TaxInc}

Suppose Refund is selected as the best splitting attribute:

2. Refund = Yes: samples = {1,4,7}
   attribute_list = {MarSt, TaxInc}
3. Refund = No: samples = {2,3,5,6,8,9,10}
   attribute_list = {MarSt, TaxInc}
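
The walkthrough above can be sketched in Python. This is a minimal illustration, not the slides' own implementation: it encodes the ten training records, reproduces the split on Refund, computes the gain ratio from the pseudocode for the two discrete attributes, and applies the tree to the test record. The names RECORDS, partition, gain_ratio, and classify are hypothetical. TaxInc is left out of the gain-ratio computation because it is continuous, and sending a value of exactly 80K to the NO branch is an assumption.

```python
from collections import Counter
from math import log2

# Training records from the example table: Tid -> (Refund, MarSt, TaxInc in K, Cheat)
RECORDS = {
    1: ("Yes", "Single", 125, "No"),
    2: ("No", "Married", 100, "No"),
    3: ("No", "Single", 70, "No"),
    4: ("Yes", "Married", 120, "No"),
    5: ("No", "Divorced", 95, "Yes"),
    6: ("No", "Married", 60, "No"),
    7: ("Yes", "Divorced", 220, "No"),
    8: ("No", "Single", 85, "Yes"),
    9: ("No", "Married", 75, "No"),
    10: ("No", "Single", 90, "Yes"),
}
ATTR_INDEX = {"Refund": 0, "MarSt": 1}  # discrete attributes only (assumption)
CLASS_INDEX = 3


def entropy(tids):
    """Entropy (in bits) of the Cheat label over the given record ids."""
    counts = Counter(RECORDS[t][CLASS_INDEX] for t in tids)
    n = len(tids)
    return -sum(c / n * log2(c / n) for c in counts.values())


def partition(tids, attr):
    """Group record ids by their value of a discrete attribute."""
    groups = {}
    for t in sorted(tids):
        groups.setdefault(RECORDS[t][ATTR_INDEX[attr]], []).append(t)
    return groups


def gain_ratio(tids, attr):
    """Information gain of splitting on attr, normalized by the split information."""
    n = len(tids)
    groups = partition(tids, attr)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(tids) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def classify(refund, marst, taxinc):
    """Walk the decision tree from the 'Apply Model to Test Data' slide."""
    if refund == "Yes":
        return "No"
    if marst == "Married":
        return "No"
    # Single or Divorced: split on taxable income at 80K.
    # Assumption: exactly 80K falls on the NO side.
    return "Yes" if taxinc > 80 else "No"


all_tids = list(RECORDS)
print(partition(all_tids, "Refund"))
for attr in ATTR_INDEX:
    print(attr, round(gain_ratio(all_tids, attr), 4))
print(classify("No", "Married", 80))
```

On this data the sketch yields the same sample sets as steps 2 and 3, {1,4,7} for Refund = Yes and {2,3,5,6,8,9,10} for Refund = No, and the test record (Refund = No, Married, 80K) is classified as "No".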