Cross-Validation: My Summary (archived copy)


k-fold cross-validation (the n samples are first split into k groups of n/k samples each; one group is held out as the validation set and the remaining groups form "training set 2"): train the NN, SVM, or other learner on training set 2, then use the validation set to measure the error rate of the resulting classifier (classification is used as the example here; the same should hold for regression). Next, select another n/k samples as the validation set, take the rest as training set 2, and loop until every group of n/k samples has served as the validation set once. Compare the error rates obtained in each loop, and take the parameter setting with the lowest error rate as the optimal one. The whole process is repeated K times, and the average test error over the K runs is taken as the generalization error. In short: randomly partition the training sample set into K subsets (usually K equal parts), train on K-1 of them to obtain a decision function, and test that decision function on the remaining subset. Leave-one-out (LOO) is the special case of k-fold cross-validation with k = n. (A runnable sketch of this loop appears after this section.)

What is k-fold cross-validation mainly for? We know that the upper bound on the expected risk is influenced by two factors: first the size n of the training set, and second the VC dimension h (the standard form of this bound is reproduced after this section). To summarize: when the training samples are linearly separable, all samples can be classified correctly (this is exactly the famous condition y_i (w · x_i + b) >= 1), i.e. the empirical risk R_emp is 0; on that premise, maximizing the classification margin (i.e. minimizing Phi(w) = (1/2) ||w||^2) gives the classifier the best generalization performance.

So much for the linearly separable case; in practice the data are very often not linearly separable, so what is the difference? The essential difference is that we no longer know whether the data are separable, and misclassified samples are allowed (I still have not fully understood this point). Precisely because misclassified samples are allowed, the soft-margin separating hyperplane is the maximum-margin hyperplane obtained after setting those misclassified samples aside.

Here the RBF kernel is chosen as the SVM kernel function: K(x, y) = exp(-γ ||x - y||^2), with γ > 0. (4) Use cross-validation to find the best parameters C and γ (see the grid-search sketch after this section). In this way every subset of the whole training set is predicted exactly once, and the cross-validation accuracy is the average, over the k runs, of the percentage of correctly classified data. Reference: 5. When choosing a kernel function, RBF is a good first choice. Of course, the sampling in step 3 introduces some variance, so the accuracy obtained in step 8 is not necessarily the best; you can therefore repeat steps 3-8 to obtain the accuracies of several models and take the best one as the model's accuracy (or take the mean of all the accuracies).

2. The two subsets must be sampled uniformly from the complete set. In practice, though, 2-fold CV (2CV) is rarely used, mainly because the training set then has too few samples to represent the distribution of the population, so the recognition rate in the test stage tends to show a marked drop. No random factor affects the experimental data, which ensures the experiment can be replicated. The test recognition rate of k-CV is the average of the recognition rates of the k classifiers obtained by EA training on the k corresponding test sets. Use the validation set to measure the error rate of the resulting classifier or regressor.

[Figure 26: Cross-validation checks how well a model generalizes to new data.] Fig. 26 shows an example of cross-validation performing better than residual error. The data set in the top two graphs is a simple underlying function with significant noise; cross-validation tells us that broad smoothing is best. The data set in the bottom two graphs is a complex underlying function with no noise; cross-validation tells us that very little smoothing is best for this data set. Now we return to the question of choosing a good metacode for the data set. [Screenshots of the tool's File/Open/Edit Metacode dialogs for the metacodes A90:9, L90:9, and L10:9, each with Model LOOPredict, are omitted here.] LOOPredict goes through the entire data set and makes LOO predictions for each point. At the bottom of the page it shows the summary statistics, including mean LOO error, RMS LOO error, and information about the data point with the largest error. The mean absolute LOOXVEs for the three metacodes given above (the same three used to generate the graphs in fig. 25) are …, …, and …. Those values show that global linear regression is the best metacode of those three, which agrees with our intuitive feeling from looking at the plots in fig. 25. If you repeat the above operation on data set … you'… (…; Ripley, 1996, p. 73).

Choice of cross-validation method
+++++++++++++++++++++++++++++++++

Cross-validation can be used simply to estimate the generalization error of a given model, or it can be used for model selection by choosing one of several models that has the smallest estimated generalization error. For example, you might use cross-validation to choose the number of hidden units, or you could use cross-validation to choose a subset of the inputs (subset selection). A subset that contains all relevant inputs will be called a good subset, while the subset that contains all relevant inputs but no others will be called the best subset. Note that subsets are good and best in an asymptotic sense (as the number of training cases goes to infinity). With a small training set, it is possible that a subset that is smaller than the best subset may provide better generalization error.
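A minimal sketch of the k-fold loop described above, in Python with NumPy. The names train_fn and error_fn are hypothetical placeholders for whatever learner (NN, SVM, …) and error measure you plug in; nothing here is specific to the original notes.

```python
import numpy as np

def k_fold_cv(X, y, train_fn, error_fn, k=10, seed=0):
    """Estimate generalization error with k-fold cross-validation.

    The n samples are randomly split into k (near-)equal folds; each fold
    serves once as the validation set while the other k-1 folds form the
    training set.  The mean validation error over the k runs is returned.
    Setting k = n gives leave-one-out (LOO) cross-validation.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)   # k groups of ~n/k indices
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train_fn(X[train_idx], y[train_idx])   # e.g. fit an SVM
        errors.append(error_fn(model, X[val_idx], y[val_idx]))
    return float(np.mean(errors))
```

To compare parameter settings as the notes describe, call k_fold_cv once per setting and keep the one with the lowest returned error.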
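For reference, the bound on the expected risk alluded to above, in its standard textbook (Vapnik) form; this exact formula is not quoted in the original notes, so treat it as the usual statement rather than the author's. With probability at least 1 - η over a draw of n training samples, for a hypothesis class of VC dimension h:

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

This makes the two factors named above explicit: the confidence term shrinks as n grows and grows with h.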
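A sketch of step (4), as one common way to do it: scikit-learn's GridSearchCV evaluating an exponentially spaced grid of (C, γ) pairs by 5-fold cross-validation with an RBF-kernel SVC. The data here are synthetic placeholders, and the grid ranges follow the usual libsvm-guide convention rather than anything stated in the notes.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(200, 5)                      # placeholder data
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # placeholder labels

# Exponentially spaced grid over C and gamma.
param_grid = {
    "C":     [2.0**p for p in range(-5, 16, 2)],
    "gamma": [2.0**p for p in range(-15, 4, 2)],
}

# 5-fold cross-validation for every (C, gamma) pair; the RBF kernel is
# K(x, y) = exp(-gamma * ||x - y||^2).
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```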
Leave-one-out cross-validation often works well for estimating generalization error for continuous error functions such as the mean squared error, but it may perform poorly for discontinuous error functions such as the number of misclassified cases. In the latter case, k-fold cross-validation is preferred. But if k gets too small, the error estimate is pessimistically biased because of the difference in training-set size between the full-sample analysis and the cross-validation analyses. (For model-selection purposes, this bias can actually help.) … the best subset you're stuck with. So there is no conflict between Shao and Kearns, but there is a conflict between the two goals of choosing the best subset and getting the best generalization in split-sample validation.

Bootstrapping
+++++++++++++

Bootstrapping seems to work better than cross-validation in many cases (Efron, 1983). In the simplest form of bootstrapping, instead of repeatedly analyzing subsets of the data, you repeatedly analyze subsamples of the data. Each subsample is a random sample with replacement from the full sample. Depending on what you want to do, anywhere from 50 to 2000 subsamples might be used. There are many more sophisticated bootstrap methods that can be used not only for estimating generalization error but also for estimating confidence bounds for network outputs (Efron and Tibshirani, 1993). For estimating generalization error in classification problems, the .632+ bootstrap (an improvement on the popular .632 bootstrap) is one of the currently favored methods that has the advantage of performing well even when there is severe overfitting. Use of bootstrapping for NNs is described in Baxt and White (1995), Tibshirani (1996), and Masters (1995). However, the results obtained so far are not very thorough, and it is known that bootstrapping does not work well for some other methodologies such as empirical decision trees (Breiman, Friedman, Olshen, and Stone, 1984; Shao and Tu, 1995). Hence, these results …
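A minimal sketch of the simplest form of bootstrapping described above, assuming NumPy: each subsample is drawn with replacement from the full sample, and the fitted model is scored on the points that did not make it into the subsample (out-of-bag scoring, which is my choice here; it is the plain estimate, not the .632+ correction the FAQ favors). train_fn and error_fn are hypothetical placeholders.

```python
import numpy as np

def bootstrap_error(X, y, train_fn, error_fn, n_boot=200, seed=0):
    """Simple bootstrap estimate of generalization error.

    Draws n_boot subsamples with replacement from the full sample
    (50 to 2000 are typical), fits a model on each subsample, and
    scores it on the points left out of that subsample.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)        # indices sampled with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # points not drawn ("out of bag")
        if len(oob) == 0:
            continue                             # skip the rare all-covered draw
        model = train_fn(X[boot], y[boot])
        errors.append(error_fn(model, X[oob], y[oob]))
    return float(np.mean(errors))
```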