Model complexity

A more complex model has a greater chance of being fitted accidentally by errors in the data, so model complexity must be taken into account when evaluating a model.

Regularization

- Intuition: small values for the parameters give a "simpler" hypothesis, which is less prone to overfitting.
- L2 and L1 regularization:
  - L2: easy to optimize, closed-form solution
  - L1: induces sparsity
- (Both objectives and their contrast are written out in the sketches at the end of this section.)

More than two classes?

Comments on least squares classification:
- Not the best approach to classification problems.
- But it is easy to train, has a closed-form solution, and can be combined with many classical learning principles.
- It extends to more than two classes, e.g. via one-hot targets (see the multiclass sketch at the end of this section).

Cross-validation

- Basic idea: if a model overfits somewhat (i.e., is sensitive to the training data), then the model is unstable; removing part of the data would significantly change the fitted result.
- Therefore we first hold out part of the data, fit on the remaining data, and then test on the held-out data (see the k-fold sketch at the end of this section).

Learning framework

Model/parameter learning paradigm:
- Choose a model class: Naïve Bayes, kNN, decision tree, a loss/regularization combination
- Model selection: cross-validation
- Training: optimization
- Testing
- (An end-to-end sketch of these four steps closes this section.)

Summary

Supervised learning:
(1) Classification: Naïve Bayes model, decision tree, least squares classification
(2) Regression: least squares regression

Exercise

- Prove that for any training set containing no conflicting data (i.e., no two examples with identical feature vectors but different labels), there must exist a decision tree consistent with the training set (i.e., with zero training error).
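To back the regularization bullets above, here is the standard squared-loss formulation the slides appear to assume; the symbols X, y, w, and λ are our notation, not the slides'. L2 (ridge) admits the closed-form minimizer claimed on the slide, while L1 (lasso) does not in general:

```latex
% Ridge (L2): closed-form solution, as claimed on the slide
J_{\mathrm{ridge}}(w) = \lVert Xw - y \rVert_2^2 + \lambda \lVert w \rVert_2^2,
\qquad
w^{\star} = (X^{\top} X + \lambda I)^{-1} X^{\top} y
% Lasso (L1): sparsity-inducing; no closed form in general
J_{\mathrm{lasso}}(w) = \lVert Xw - y \rVert_2^2 + \lambda \lVert w \rVert_1
```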
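A minimal numerical sketch of the same L2-vs-L1 contrast, assuming NumPy and scikit-learn are available; the synthetic data, the λ values, and the zero threshold are illustrative assumptions, not from the slides:

```python
# Sketch: ridge has a closed-form solution; lasso tends to zero out coefficients.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.5, 0.5]          # only 3 informative features
y = X @ true_w + 0.1 * rng.normal(size=100)

# L2 (ridge): w = (X^T X + lambda I)^{-1} X^T y, solved in closed form.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# L1 (lasso): no closed form; solved iteratively (coordinate descent here).
w_lasso = Lasso(alpha=0.1).fit(X, y).coef_

print("ridge nonzeros:", np.sum(np.abs(w_ridge) > 1e-6))  # typically all 10
print("lasso nonzeros:", np.sum(np.abs(w_lasso) > 1e-6))  # typically ~3 (sparse)
```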
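For the "more than two classes" question, a common least-squares recipe is to regress one-hot targets and predict by argmax; the slides do not spell out this construction, so treat it as one standard choice rather than the lecture's definitive method:

```python
# Sketch: least squares classification for K classes via one-hot targets.
# Easy to train (one closed-form solve), though not the best classifier.
import numpy as np

def fit_least_squares_classifier(X, y, n_classes):
    """Fit W by regressing one-hot labels: W = argmin ||X W - T||_F^2."""
    T = np.eye(n_classes)[y]                    # one-hot target matrix (n, K)
    W, *_ = np.linalg.lstsq(X, T, rcond=None)   # closed-form least squares
    return W

def predict(X, W):
    """Predict the class whose score x^T W_k is largest."""
    return np.argmax(X @ W, axis=1)

# Tiny synthetic 3-class problem (illustrative only).
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [3, 0], [0, 3]])
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
Xb = np.hstack([X, np.ones((90, 1))])           # append bias column

W = fit_least_squares_classifier(Xb, y, n_classes=3)
print("training accuracy:", np.mean(predict(Xb, W) == y))
```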
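The hold-out idea behind cross-validation, written out as a from-scratch k-fold loop: repeatedly remove part of the data, fit on the rest, and test on the held-out part. The fit/score callables and the fold count are illustrative assumptions:

```python
import numpy as np

def k_fold_cv(X, y, fit, score, k=5, seed=0):
    """Return per-fold test scores for a fit/score pair of callables."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]                                    # held-out data
        train = np.concatenate(folds[:i] + folds[i + 1:])  # remaining data
        model = fit(X[train], y[train])                    # fit on the rest
        scores.append(score(model, X[test], y[test]))      # test on held-out
    return np.array(scores)

# Example with the closed-form ridge solver from the earlier sketch:
lam = 1.0
fit = lambda X, y: np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
score = lambda w, X, y: -np.mean((X @ w - y) ** 2)         # negative MSE

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
print("fold scores:", k_fold_cv(X, y, fit, score))
```

An unstable (overfit) model shows up here as high variance across the fold scores, which is exactly the sensitivity-to-removed-data phenomenon the slide describes.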
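Finally, the four-step learning paradigm traced end-to-end; the decision-tree model class, the max_depth grid, and the iris data are our illustrative choices, not the lecture's:

```python
# Sketch of the model/parameter learning paradigm:
# (1) choose a model class, (2) model selection by cross-validation,
# (3) training by optimization, (4) testing on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model_class = DecisionTreeClassifier(random_state=0)  # (1) model class
search = GridSearchCV(model_class,                    # (2) selection via 5-fold CV
                      param_grid={"max_depth": [1, 2, 3, 5, None]},
                      cv=5)
search.fit(X_tr, y_tr)                                # (3) training/optimization
print("selected depth:", search.best_params_)
print("test accuracy:", search.score(X_te, y_te))     # (4) testing
```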