(... overview of data)
- Projection to Latent Structures (PLS)

...s to form the scores u.

Notation
- Each observation has values of t (and u).
- Each variable has values of p (and w and c).
- One component consists of one t and one p (PCA), or of t, p, w, u, c (PLS). The total number of components is A.
- Model: the data are approximated by a plane or hyperplane (the model) with as many dimensions as components extracted.
- DModX: also called distance to the model; the distance of a given observation to the model plane.
- T2: Hotelling's T2, a combination of all the scores (t) of all A components. T2 measures how far an observation is from the center of a PC or PLS model.

Notation
- R2X: the fraction of the variation of the X variables explained by the model.
- R2Y: the fraction of the variation of the Y variables explained by the model.
- Q2X: the fraction of the variation of the X variables predicted by the model.
- Q2Y: the fraction of the variation of the Y variables predicted by the model.

MVA - SIMCA Road Map
Methods available
- Preprocessing

... the new summarizing variables (coordinates in the hyperplane of X-space)
- u: the Y scores in PLS

A typical data analysis situation
- 12 jam samples were made from berries of various cultivars picked at various times of the season.
- Several parameters (sensory measurements) were measured on each sample.

Data set: Raspberry Jams
What samples are similar/dissimilar to each other?

Sample comparison according to 1 variable: Redness
What about the 11 other parameters?

Sample comparison according to 2 variables: Redness and colour
What about the 10 other parameters?
[Figure: scatter plot of REDNESS vs. COLOUR for samples C1_H1, C2_H1, C3_H1, C4_H1, C1_H2, C2_H2, C3_H2, C4_H2, C1_H3, C2_H3, C3_H3, C4_H3. Plot statistics: Elements 12, Slope 0.786692, Offset 1.569409, Correlation 0.982515, RMSED 0.530907, SED 0.392544, Bias 0.374983]

Sample comparison according to 3 variables: Redness, colour and R. Smell
What about the 9 other parameters?
[Figure: three-dimensional scatter plot of REDNESS, COLOUR and R.SMELL, samples numbered 1 to 12]

Sample comparison according to all 12 variables: multivariate model (PCA)
Map of samples
[Figure: PCA score plot, PC1 vs. PC2, RESULT1, X-expl: 58%, 28%, samples C1_H1 to C4_H3]

Sample comparison according to all 12 variables: multivariate model (PCA)
Map of variables
[Figure: PCA X-loadings plot, PC1 vs. PC2, RESULT1, X-expl: 58%, 28%; variables REDNESS, COLOUR, SHININES, R.SMELL, R.FLAV, SWEETNES, SOURNESS, BITTERNE, OFF-FLAV, JUICINES, THICKNES, CHEW.RES]

Sample comparison according to all 12 variables: multivariate model (PCA)
Map of samples &

This plot is called the loading plot.
The last few principal components, by contrast, are only weakly correlated with the original variables.
- The larger the correlation coefficient (in absolute value), the better the principal component represents that variable.
- Let x1, x2, x3, x4, x5, x6 denote the six original variables and y1, y2, y3, y4, y5, y6 the new principal components. The six original variables are then related to the first and second principal components y1, y2 by
    xi = ai1 y1 + ai2 y2,  i = 1, ..., 6
- These coefficients are called principal component loadings. Each one is the correlation coefficient between a principal component and the corresponding original variable.

What kind of combination is it?

Component Matrix (a)
                       Component
            1      2      3      4      5      6
MATH      .806   .353   .040   .468   .021   .068
PHYS      .674   .531   .454   .240   .001   .006
CHEM      .675   .513   .499   .181   .002   .003
LITERAT   .893   .306   .004   .037   .077   .320
HISTORY   .825   .435   .002   .079   .342   .083
ENGLISH   .836   .425   .000   .074   .276   .197
Extraction Method: Principal Component Analysis.
a. 6 components extracted.

- Each column here gives the coefficients (proportions) of one principal component as a linear combination of the original variables.
The later eigenvalues contribute less and less.
- The Initial Eigenvalues here are the lengths of the six principal axes, also called eigenvalues (the eigenvalues of the correlation matrix of the data).
What is the criterion?
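The criterion is that the principal axes represented by the selected components together account for most of the total length of all the principal axes.
- Just as a two-dimensional ellipse has two principal axes and a three-dimensional ellipsoid has three, there are as many principal components as there are variables.
- Note: as in the two-dimensional case, the principal axes of a high-dimensional ellipsoid are mutually perpendicular.
- The multivariate case is similar to the two-dimensional one: the data form a high-dimensional ellipsoid.
- If the variable along the major axis carries most of the information in the data, it can replace the original two variables (the minor dimension is discarded), and the dimension reduction is complete.
- However, the coordinate axes are usually not parallel to the major and minor axes of the ellipse.
Along the minor axis the data vary little. In the extreme case where the minor axis degenerates to a point, only the major-axis direction can explain the variation of the points, and the two dimensions reduce to one.
We would like to represent the six-dimensional space by a lower-dimensional one.

Questions this example raises
- Can the six variables be represented by one or two composite variables?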
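
The selection criterion can be checked against the Component Matrix of the six-subject example. The sketch below is illustrative only. It assumes the loadings are the correlations between variables and components from a PCA of the correlation matrix, as the text describes. Under that assumption the sum of squared loadings in each column equals the eigenvalue of that component, so any signs lost in the table do not matter. The function names and the 80% threshold are hypothetical choices, not taken from the slides.

```python
# Loadings transcribed from the Component Matrix (rows: variables, columns: components 1-6).
# Only squared values are used, so the sign of each loading is irrelevant here.
LOADINGS = {
    "MATH":    [0.806, 0.353, 0.040, 0.468, 0.021, 0.068],
    "PHYS":    [0.674, 0.531, 0.454, 0.240, 0.001, 0.006],
    "CHEM":    [0.675, 0.513, 0.499, 0.181, 0.002, 0.003],
    "LITERAT": [0.893, 0.306, 0.004, 0.037, 0.077, 0.320],
    "HISTORY": [0.825, 0.435, 0.002, 0.079, 0.342, 0.083],
    "ENGLISH": [0.836, 0.425, 0.000, 0.074, 0.276, 0.197],
}


def eigenvalues_from_loadings(loadings):
    """Return the column sums of squared loadings (one value per component)."""
    rows = list(loadings.values())
    n_components = len(rows[0])
    return [sum(row[k] ** 2 for row in rows) for k in range(n_components)]


def cumulative_fraction(eigenvalues):
    """Return the running share of the total axis length covered by components 1..k."""
    total = sum(eigenvalues)
    running = 0.0
    shares = []
    for value in eigenvalues:
        running += value
        shares.append(running / total)
    return shares


def components_needed(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share reaches the threshold.

    The 0.80 default is an assumed cut-off for illustration, not a rule from the slides.
    """
    for k, share in enumerate(cumulative_fraction(eigenvalues), start=1):
        if share >= threshold:
            return k
    return len(eigenvalues)


eigs = eigenvalues_from_loadings(LOADINGS)
cum = cumulative_fraction(eigs)

if __name__ == "__main__":
    for k, (value, share) in enumerate(zip(eigs, cum), start=1):
        print(f"component {k}: eigenvalue {value:.3f}, cumulative share {share:.1%}")
    print("components needed at 80%:", components_needed(eigs))
```

With the table values, the first two eigenvalues come out at roughly 3.74 and 1.13, which together cover about 81% of the total of 6. This is consistent with summarizing the six subjects by the first two principal components.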