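… tuple (or sample) belongs to one of a set of predefined classes, as determined by a class label attribute.
The set of these tuples (or samples) is called the training set and is used to build the model. Because the class label of every training sample is provided, this is called supervised learning.
The final model is expressed as a decision tree, classification rules, or mathematical formulas.
Model application: classify data objects whose class is unknown.

Classification analysis, step 1: build the model

Classification analysis, step 2: apply the model

Classification analysis: an example
Classify the creditworthiness of credit card holders.
Record set: the cardholder records.
Labels: good, average, poor (credit rating).
First assign each cardholder a label, that is, a credit rating.
Then describe the characteristics of the records in each class (cardholders with the same credit rating).

So the manager of the chain promptly rearranged the shelves: beer was placed next to the baby diaper shelves, snacks such as potato chips were put between the two, and the everyday items that men need were placed nearby as well.

GROUPING

    SELECT '…' = SUM(units_sold),
           model,   '…' = GROUPING(model),
           theyear, '…' = GROUPING(theyear),
           color,   '…' = GROUPING(color)
    FROM my_cube
    GROUP BY model, theyear, color WITH CUBE

GROUPING is an aggregate function that produces an additional column. The column is 1 when the row was added by the CUBE or ROLLUP operator, and 0 when the row was not produced by CUBE or ROLLUP.

CUBE

    select Model, Year, Color, sum(Sales)
    from Sales
    group by Model, Year, Color with cube

Total rows = (number of models + 1) * (number of years + 1) * (number of colors + 1) = (2 + 1) * (3 + 1) * (3 + 1) = 48
The +1 in each factor is the ALL value that CUBE adds for that dimension.

CUBE

    SELECT SUM(units_sold), model, theyear, color
    FROM my_cube
    GROUP BY model, theyear, color WITH CUBE

Drill down

    SELECT Model, ALL, ALL, SUM(Sales) FROM Sales WHERE Model = '…' GROUP BY Model
    UNION
    SELECT Model, Year, ALL, SUM(Sales) FROM Sales WHERE Model = '…' GROUP BY Model, Year
    UNION
    SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = '…' GROUP BY Model, Year, Color

Users want to use cross tabs.

F() G() H()

Red Brick extensions
Ntile: sorts all tuples by value and splits them into n consecutive ranges, each holding the same number of tuples; the query below returns the average of each range.

    select percentile, avg(salary)
    from EMP
    group by N_tile(salary, 10) as percentile

Ratio_To_Total: the share of each group's sum in the overall total.
Rank: the position of a value among all values in the column.

TOP
"I don't ask for much, just n rows."

    select [ top n [ percent ] [ with ties ] ] select_list

    select top 5 title_id, price, type from titles
    select top 5 title_id, price, type from titles order by price desc
    select top 5 WITH TIES title_id, price, type from titles order by price desc
    select top 30 PERCENT title_id, price, type from titles order by price desc

Histogram
The comparison operators in this query were lost in the source; the version below assumes three salary ranges split at 1/3 and 2/3 of the maximum salary, and uses avg(salary) where the source has avg(*).

    select 1, avg(salary) from EMP
    where salary >= (select max(salary) from EMP) * 2 / 3
    union
    select 2, avg(salary) from EMP
    where salary >= (select max(salary) from EMP) / 3
      and salary <  (select max(salary) from EMP) * 2 / 3
    union
    select 3, avg(salary) from EMP
    where salary <  (select max(salary) from EMP) / 3

Rank
For each row, count the distinct grades that are greater than or equal to its own grade, so the highest grade gets rank 1.

    select T1.S, GRADE,
           (select count(distinct T2.GRADE)
            from SC as T2
            where T1.GRADE <= T2.GRADE) as rank
    from SC as T1
    where GRADE is not null
    order by rank

Median
Returns the grade of the student ranked in the middle.

    declare @temp int, @median int
    set @temp = (select count(*) from SC) / 2
    declare my_curs cursor for
        select GRADE from SC order by GRADE
    open my_curs
    -- skip the first count/2 rows
    while (@temp > 0)
    begin
        fetch next from my_curs into @median
        set @temp = @temp - 1
    end
    -- the next row is the middle one
    fetch next from my_curs into @median
    close my_curs
    deallocate my_curs

Gap
Returns the difference between every pair of adjacent grades.

    -- rank 1 is the lowest grade, so DIFF is never negative
    create view rankgrade (GRADE, graderank) as
        select distinct GRADE,
               (select count(distinct GRADE)
                from SC as T1
                where T1.GRADE <= T2.GRADE) as rank
        from SC as T2

    select G1 = V1.GRADE, G2 = V2.GRADE, DIFF = (V2.GRADE - V1.GRADE)
    from rankgrade as V1
    left outer join rankgrade as V2
        on (V2.graderank = V1.graderank + 1)

Skyline: introducing the problem
Find a hotel that is cheap and close to the beach.
The system cannot decide which hotels are best, but it can return all the candidate (interesting) hotels, that is, those that are not worse than some other hotel on both dimensions. This set is called the Skyline.

Skyline: introducing the problem
Point x dominates point y if x is no worse than y on every dimension and better than y on at least one dimension.
The hotel (price=50, distance=0.8) dominates (price=100, distance=1.0).

Skyline: buildings that are taller and closer to the river

Wanting it both ways ("eat in the east house, sleep in the west house"): more handsome and richer

Properties of the Skyline
Given a set M and a monotone scoring function R, if p ∈ M maximizes R, then p is in the Skyline of M. However you weigh price against distance, your favorite hotel is always in the Skyline.
For every point p in the Skyline there is a monotone scoring function that p maximizes. In other words, the Skyline contains no hotel that nobody would prefer.
Dominance is transitive: if p dominates q and q dominates r, then p dominates r.

SQL extension with Skyline

    SELECT … FROM … WHERE …
    GROUP BY … HAVING …
    SKYLINE OF [ DISTINCT ] d1 [ MIN | MAX | DIFF ], … , dn [ MIN | MAX | DIFF ]
    TOP …
    ORDER BY …

SKYLINE OF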
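
As a minimal sketch of the dominance test defined earlier in this section: the SKYLINE OF clause is a proposed extension that common SQL engines generally do not offer, but for two dimensions that are both minimized, the same set can be computed in plain SQL with NOT EXISTS. The example below runs such a query through Python's standard sqlite3 module. The table name hotel, its columns, the hotel names, and every row except the two (price, distance) pairs quoted in the text are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical data: rows A and B are the two hotels from the text,
# rows C, D and E are invented for this sketch.
HOTELS = [
    ("A", 50, 0.8),
    ("B", 100, 1.0),
    ("C", 40, 2.0),
    ("D", 80, 0.5),
    ("E", 90, 0.7),
]

# A hotel h is in the Skyline if no other hotel o dominates it, i.e. no o is
# at least as good on both dimensions and strictly better on at least one.
# Both dimensions are treated as MIN (smaller is better).
SKYLINE_SQL = """
SELECT h.name, h.price, h.distance
FROM hotel AS h
WHERE NOT EXISTS (
    SELECT 1
    FROM hotel AS o
    WHERE o.price <= h.price
      AND o.distance <= h.distance
      AND (o.price < h.price OR o.distance < h.distance)
)
ORDER BY h.price
"""


def skyline(rows):
    """Load rows into an in-memory table and return the Skyline rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE hotel (name TEXT, price REAL, distance REAL)")
        conn.executemany("INSERT INTO hotel VALUES (?, ?, ?)", rows)
        return conn.execute(SKYLINE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in skyline(HOTELS):
        print(row)
```

With this sample data, hotel B is dropped because A dominates it, and E is dropped because D dominates it. The self-join compares every pair of rows, so this form is only meant to make the definition concrete, not to be an efficient Skyline algorithm.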