Data Mining: Concepts and Techniques, Chapter 2: Getting to Know Your Data

2023-04-06 07:50:08
 

te) of the vectors

Euclidean (L2) distance:
$$d(i,j) = \sqrt{|x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2}$$

Manhattan (L1) distance:
$$d(i,j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \cdots + |x_{ip}-x_{jp}|$$

61 Example: Minkowski Distance

point   attribute 1   attribute 2
x1      1             2
x2      3             5
x3      2             0
x4      4             5

Dissimilarity Matrices

Manhattan (L1)
L1      x1      x2      x3      x4
x1      0
x2      5       0
x3      3       6       0
x4      6       1       7       0

Euclidean (L2)
L2      x1      x2      x3      x4
x1      0
x2      3.61    0
x3      2.24    5.1     0
x4      4.24    1       5.39    0

Supremum (L∞)
L∞      x1      x2      x3      x4
x1      0
x2      3       0
x3      2       5       0
x4      3       1       5       0

62 Ordinal Variables
• An ordinal variable can be discrete or continuous
• Order is important, e.g., rank
• Can be treated like interval-scaled variables:
    • replace $x_{if}$ by its rank $r_{if} \in \{1, \ldots, M_f\}$
    • map the range of each variable onto [0, 1] by replacing the i-th object in the f-th variable by
      $$z_{if} = \frac{r_{if} - 1}{M_f - 1}$$
    • compute the dissimilarity using methods for interval-scaled variables

63 Attributes of Mixed Type
• A database may contain all attribute types
    • Nominal, symmetric binary, asymmetric binary, numeric, ordinal
• A weighted formula can be used to combine their effects:
  $$d(i,j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$
    • f is binary or nominal: $d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$, or $d_{ij}^{(f)} = 1$ otherwise
    • f is numeric: use the normalized distance
    • f is ordinal:
        • compute ranks $r_{if}$ and $z_{if} = \frac{r_{if} - 1}{M_f - 1}$
        • treat $z_{if}$ as interval-scaled

64 Cosine Similarity
• A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as a keyword) or phrase in the document.
• Other vector objects: gene features in microarrays, …
• Applications: information retrieval, biologic taxonomy, gene feature mapping, ...
• Cosine measure: if $d_1$ and $d_2$ are two vectors (e.g., term-frequency vectors), then
  $$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$$
  where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$. Written out by components:
  $$\cos(x, y) = \frac{\sum_{i=1}^{p} x_i y_i}{\sqrt{\sum_{i=1}^{p} x_i^2} \, \sqrt{\sum_{i=1}^{p} y_i^2}}$$

65 Example: Cosine Similarity
• $\cos(d_1, d_2) = (d_1 \cdot d_2) / (\|d_1\| \, \|d_2\|)$, where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$
• Ex: Find the similarity between documents 1 and 2.

d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)

d1 · d2 = 5*3 + 0*0 + 3*2 + 0*0 + 2*1 + 0*1 + 0*1 + 2*1 + 0*0 + 0*1 = 25
||d1|| = (5*5 + 0*0 + 3*3 + 0*0 + 2*2 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 ≈ 6.481
||d2|| = (3*3 + 0*0 + 2*2 + 0*0 + 1*1 + 1*1 + 0*0 + 1*1 + 0*0 + 1*1)^0.5 = (17)^0.5 ≈ 4.123
cos(d1, d2) = 25 / (6.481 × 4.123) ≈ 0.94

66 Summary
• Data attribute types: nominal, binary, ordinal, interval-scaled, ratio-scaled
• Many types of data sets, e.g., numerical, text, graph, Web, image.
• Gain insight into the data by:
    • Basic statistical data description: central tendency, dispersion, graphical displays
    • Data visualization: map data onto graphical primitives
    • Measure data similarity
• The steps above are the beginning of data preprocessing.
• Many methods have been developed, but this is still an active area of research.

…
    • Sum, count
• Algebraic measure
    • A measure that can be computed by applying a function to one or more distributive measures
• Holistic measure
    • A measure that must be computed on the entire data set

13 Measuring the Central Tendency
• Mean (algebraic measure) (sample vs. population):
  $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad \mu = \frac{\sum x}{N}$$
  Note: n is the sample size, N is the population size.
    • Weighted arithmetic mean:
      $$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
    • Trimmed mean: chop off the extreme high and low values
• Median:
    • The middle value of the ordered set if the number of values is odd, otherwise the average of the middle two values
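The Minkowski dissimilarity matrices (slide 61) and the cosine similarity example (slide 65) can be checked numerically. The following is a minimal sketch in Python using only the standard library. The function names `minkowski`, `dissimilarity_matrix`, and `cosine_similarity` are illustrative and do not come from the slides, and rounding to two decimals is an assumption made to match the precision shown in the tables.

```python
import math


def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length vectors.

    h = 1 gives the Manhattan (L1) distance, h = 2 the Euclidean (L2)
    distance, and h = math.inf the supremum (L-infinity) distance.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(h):
        # Supremum distance: the largest per-attribute difference.
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)


def dissimilarity_matrix(points, h):
    """Lower-triangular dissimilarity matrix, one list per row.

    Values are rounded to two decimals (an assumption, chosen to match
    the precision of the slide tables).
    """
    return [
        [round(minkowski(points[i], points[j], h), 2) for j in range(i + 1)]
        for i in range(len(points))
    ]


def cosine_similarity(x, y):
    """Cosine of the angle between x and y: dot product over the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# The four points x1..x4 from the Minkowski example.
points = [(1, 2), (3, 5), (2, 0), (4, 5)]

for name, h in [("Manhattan (L1)", 1),
                ("Euclidean (L2)", 2),
                ("Supremum (L-inf)", math.inf)]:
    print(name)
    for row in dissimilarity_matrix(points, h):
        print("  ", row)

# The two term-frequency vectors from the cosine similarity example.
d1 = (5, 0, 3, 0, 2, 0, 0, 2, 0, 0)
d2 = (3, 0, 2, 0, 1, 1, 0, 1, 0, 1)
print("cos(d1, d2) =", round(cosine_similarity(d1, d2), 2))
```

Handling `h = math.inf` as a separate branch avoids raising numbers to an infinite power. For real workloads a library routine such as those in `scipy.spatial.distance` would likely be a better choice than this hand-written version.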