
Data Mining: Concepts and Techniques, Chapter 2: Getting to Know Your Data (archived version)

2025-05-06 07:50

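Example

- Gender is a symmetric attribute.
- The remaining attributes are asymmetric binary.
- Let the values Y and P be 1, and the value N be 0.

| Name | Gender | Fever | Cough | Test-1 | Test-2 | Test-3 | Test-4 |
|------|--------|-------|-------|--------|--------|--------|--------|
| Jack | M      | Y     | N     | P      | N      | N      | N      |
| Mary | F      | Y     | N     | P      | N      | P      | N      |
| Jim  | M      | Y     | P     | N      | N      | N      | N      |

Counting only the asymmetric attributes, each dissimilarity is the number of mismatches divided by the number of attributes that are not 0 in both objects:

$$d(\text{jack}, \text{mary}) = \frac{0 + 1}{2 + 0 + 1} = 0.33$$

$$d(\text{jack}, \text{jim}) = \frac{1 + 1}{1 + 1 + 1} = 0.67$$

$$d(\text{jim}, \text{mary}) = \frac{1 + 2}{1 + 1 + 2} = 0.75$$

Standardizing Numeric Data

- Z-score: $z = \frac{x - \mu}{\sigma}$
  - $x$: the raw score to be standardized, $\mu$: the population mean, $\sigma$: the standard deviation
  - It measures the distance between the raw score and the population mean in units of the standard deviation.
  - It is negative ("-") when the raw score is below the mean and positive ("+") when it is above.
- An alternative: calculate the mean absolute deviation

$$s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \dots + |x_{nf} - m_f|\right)$$

  where

$$m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \dots + x_{nf}\right)$$

- Standardized measure (z-score):

$$z_{if} = \frac{x_{if} - m_f}{s_f}$$

- Using the mean absolute deviation is more robust than using the standard deviation.

Example: Data Matrix and Dissimilarity Matrix

Data Matrix

| point | attribute1 | attribute2 |
|-------|------------|------------|
| x1    | 1          | 2          |
| x2    | 3          | 5          |
| x3    | 2          | 0          |
| x4    | 4          | 5          |

Dissimilarity Matrix (with Euclidean Distance)

|    | x1   | x2  | x3   | x4 |
|----|------|-----|------|----|
| x1 | 0    |     |      |    |
| x2 | 3.61 | 0   |      |    |
| x3 | 2.24 | 5.1 | 0    |    |
| x4 | 4.24 | 1   | 5.39 | 0  |

Distance on Numeric Data: Minkowski Distance

- Minkowski distance: a popular distance measure

$$d(i, j) = \left(|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \dots + |x_{ip} - x_{jp}|^h\right)^{1/h}$$

  where $i = (x_{i1}, x_{i2}, \dots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \dots, x_{jp})$ are two $p$-dimensional data points, and $h$ is the order (the distance so defined is also called the $L_h$ norm).
- Properties
  - $d(i, j) > 0$ if $i \neq j$, and $d(i, i) = 0$ (positive definiteness)
  - $d(i, j) = d(j, i)$ (symmetry)
  - $d(i, j) \leq d(i, k) + d(k, j)$ (triangle inequality)
- A distance that satisfies these properties is a metric.

Special Cases of Minkowski Distance

- $h = 1$: Manhattan (city block, $L_1$ norm) distance
  - E.g., the Hamming distance: the number of bits that are different between two binary vectors

$$d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$$

- $h = 2$: Euclidean ($L_2$ norm) distance

$$d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2}$$

- $h \to \infty$: supremum ($L_{\max}$ norm, $L_\infty$ norm) distance
  - This is the maximum difference between any component (attribute) of the vectors:

$$d(i, j) = \max_{f} |x_{if} - x_{jf}|$$

Example: Minkowski Distance

Dissimilarity matrices for the four points x1 = (1, 2), x2 = (3, 5), x3 = (2, 0), x4 = (4, 5) of the data matrix above:

Manhattan ($L_1$)

|    | x1 | x2 | x3 | x4 |
|----|----|----|----|----|
| x1 | 0  |    |    |    |
| x2 | 5  | 0  |    |    |
| x3 | 3  | 6  | 0  |    |
| x4 | 6  | 1  | 7  | 0  |

Euclidean ($L_2$)

|    | x1   | x2  | x3   | x4 |
|----|------|-----|------|----|
| x1 | 0    |     |      |    |
| x2 | 3.61 | 0   |      |    |
| x3 | 2.24 | 5.1 | 0    |    |
| x4 | 4.24 | 1   | 5.39 | 0  |

Supremum ($L_\infty$)

|    | x1 | x2 | x3 | x4 |
|----|----|----|----|----|
| x1 | 0  |    |    |    |
| x2 | 3  | 0  |    |    |
| x3 | 2  | 5  | 0  |    |
| x4 | 3  | 1  | 5  | 0  |

Ordinal Variables

- An ordinal variable can be discrete or continuous.
- Order is important, e.g., rank.
- Can be treated like interval-scaled variables:
  - Replace $x_{if}$ by its rank $r_{if} \in \{1, \dots, M_f\}$.
  - Map the range of each variable onto [0, 1] by replacing the $i$-th object in the $f$-th variable by

$$z_{if} = \frac{r_{if} - 1}{M_f - 1}$$

  - Compute the dissimilarity using methods for interval-scaled variables.

Attributes of Mixed Type

- A database may contain all attribute types: nominal, symmetric binary, asymmetric binary, numeric, ordinal.
- One may use a weighted formula to combine their effects:

$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}$$

  where the indicator $\delta_{ij}^{(f)}$ is 0 if $x_{if}$ or $x_{jf}$ is missing, or if $x_{if} = x_{jf} = 0$ and $f$ is asymmetric binary, and 1 otherwise.
- $f$ is binary or nominal: $d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$, or $d_{ij}^{(f)} = 1$ otherwise.
- $f$ is numeric: use the normalized distance.
- $f$ is ordinal:
  - Compute the ranks $r_{if}$ and $z_{if} = \frac{r_{if} - 1}{M_f - 1}$.
  - Treat $z_{if}$ as interval-scaled.

Cosine Similarity

- A document can be represented by thousands of attributes, each recording the frequency of a particular word (such as a keyword) or phrase in the document.
- Other vector objects: gene features in microarrays, ...
- Applications: information retrieval, biological taxonomy, gene feature mapping, ...
- Cosine measure: if $d_1$ and $d_2$ are two vectors (e.g., term-frequency vectors), then

$$\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \, \|d_2\|}$$

  where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$.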
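
The worked numbers in the slides can be checked with a few lines of Python. What follows is a minimal sketch, not part of the original slides. It recomputes the Manhattan, Euclidean and supremum dissimilarity matrices for the four example points, and the three asymmetric binary dissimilarities for Jack, Mary and Jim. The function names `minkowski`, `dissimilarity_matrix` and `asymmetric_binary` are illustrative choices. The 0/1 vectors assume the slide's encoding (Y and P as 1, N as 0) and leave out the symmetric Gender attribute.

```python
import math


def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length points.

    h = 1 gives Manhattan, h = 2 Euclidean, h = math.inf the supremum distance.
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(h):
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)


def dissimilarity_matrix(points, h):
    """Lower-triangular dissimilarity matrix, rounded to two decimals."""
    return [
        [round(minkowski(points[i], points[j], h), 2) for j in range(i + 1)]
        for i in range(len(points))
    ]


def asymmetric_binary(a, b):
    """Mismatches divided by all positions that are not 0 in both vectors.

    Assumes at least one position is 1 in a or b (otherwise the ratio is undefined).
    """
    q = sum(1 for u, v in zip(a, b) if u == 1 and v == 1)  # 1-1 matches
    r = sum(1 for u, v in zip(a, b) if u == 1 and v == 0)  # 1-0 mismatches
    s = sum(1 for u, v in zip(a, b) if u == 0 and v == 1)  # 0-1 mismatches
    return (r + s) / (q + r + s)


# The four points x1..x4 from the data matrix.
points = [(1, 2), (3, 5), (2, 0), (4, 5)]
manhattan = dissimilarity_matrix(points, 1)
euclidean = dissimilarity_matrix(points, 2)
supremum = dissimilarity_matrix(points, math.inf)

# Fever, Cough, Test-1..Test-4 with Y/P -> 1 and N -> 0 (Gender omitted).
jack = [1, 0, 1, 0, 0, 0]
mary = [1, 0, 1, 0, 1, 0]
jim = [1, 1, 0, 0, 0, 0]

for name, matrix in [("L1", manhattan), ("L2", euclidean), ("Linf", supremum)]:
    print(name, matrix)
print(round(asymmetric_binary(jack, mary), 2),
      round(asymmetric_binary(jack, jim), 2),
      round(asymmetric_binary(jim, mary), 2))
```

Treating the supremum case as a separate branch avoids raising values to an infinite power. For any finite `h` the same function covers both Manhattan and Euclidean distance.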