the examples are not so intuitive
- The book An Introduction to Support Vector Machines by Cristianini and Shawe-Taylor
  - Not introductory level, but its explanation of Mercer's theorem is better than in the literature above
- Neural Networks and Learning Machines by Haykin
  - Contains a nice introductory chapter on SVM

- H. Yu, J. Yang, and J. Han. Classifying large data sets using SVM with hierarchical clusters. KDD'03.

References (2)
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. John Wiley, 2001.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
- S. Haykin. Neural Networks and Learning Machines. Prentice Hall, 2008.
- D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 1995.
- V. Kecman. Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic. MIT Press, 2001.
- W. Li, J. Han, and J. Pei. CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. ICDM'01.

DDPMine Efficiency: Runtime
[Figure: runtime comparison of PatClass, Harmony, and DDPMine. PatClass: the ICDE'07 pattern classification algorithm.]

Summary
- Effective and advanced classification methods
  - Bayesian belief networks (probabilistic networks)
  - Backpropagation (neural networks)
  - Support vector machines (SVM)
  - Pattern-based classification
  - Other classification methods: lazy learners (k-NN, case-based reasoning), genetic algorithms, rough set and fuzzy set approaches
- Additional topics on classification
  - Multiclass classification
  - Semi-supervised classification
  - Active learning
  - Transfer learning

References (1)
- C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
- C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2): 121-168, 1998.
- H. Cheng, X. Yan, J. Han, and C.-W. Hsu. Discriminative Frequent Pattern Analysis for Effective Classification. ICDE'07.
- N. Cristianini and J. Shawe-Taylor. Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.
- A. J. Dobson. An Introduction to Generalized Linear Models. Chapman & Hall, 1990.

Han, SDM'03)
- Generates predictive rules (FOIL-like analysis), but covered tuples are kept with reduced weight so that they can still be used to build new rules
- Prediction uses the best k rules, ranked by expected accuracy
- More efficient (fewer rules are generated), with accuracy similar to CMAR

Frequent Patterns vs. Single Features
[Fig. 1. Information Gain vs. Pattern Length. Panels: (a) Austral, (b) Cleve, (c) Sonar]
- The discriminative power of some frequent patterns is higher than that of single features.

Empirical Results
[Fig. 2. Information Gain vs. Pattern Frequency. Panels: (a) Austral, (b) Breast, (c) Sonar. Information gain is plotted against support; curves: InfoGain and IG_UpperBnd]

Feature Selection
- Given a set of frequent patterns, some patterns are non-discriminative or redundant, and they can cause overfitting
- We want to select the discriminative patterns and remove the redundant ones
- The notion of Maximal Marginal Relevance (MMR) is borrowed
  - A document has high marginal relevance if it is both relevant to the query and contains minimal marginal similarity to previously selected documents

Experimental Results

Scalability Tests

Frequent-Pattern-Based Classification
- H. Cheng, X. Yan, J. Han, and C.-W. Hsu, "Discriminative Frequent Pattern Analysis for Effective Classification", ICDE'07

vice versa
- Other methods, e.g., joint probability distribution of features and labels

Active Learning
- Class labels are expensive to obtain
- Active learner: queries a human (oracle) for labels
- Pool-based approach: uses a pool of unlabeled data
  - L: the subset of D that is labeled; U: the pool of unlabeled data in D
  - A query function carefully selects one or more tuples from U and requests their labels from an oracle (a human annotator)
  - The newly labeled samples are added to L, and a model is learned from L
  - Goal: achieve high accuracy using as few labeled data as possible
- Evaluated using learning curves: accuracy as a function of the number of instances queried (the number of tuples to be queried should be small)
- Research issue: how to choose the data tuples to be queried?
  - Uncertainty sampling: choose the least certain ones
  - Reduce the version space, the subset of hypotheses consistent with the training data
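
The pool-based loop with uncertainty sampling can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any cited paper: the 1-D nearest-centroid model, the logistic confidence score, and the names `train_centroids`, `prob_positive`, `uncertainty_sampling`, and `active_learn` are hypothetical choices made for the example, and the `oracle` function stands in for a human annotator.

```python
import math


def train_centroids(labeled):
    """Fit a toy model: the mean of the 1-D points in each class."""
    totals = {}
    for x, y in labeled:
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + x, n + 1)
    return {y: s / n for y, (s, n) in totals.items()}


def prob_positive(model, x):
    """Rough confidence that x is in class 1 (an assumed scoring rule)."""
    d0 = abs(x - model[0])  # distance to the class-0 centroid
    d1 = abs(x - model[1])  # distance to the class-1 centroid
    return 1.0 / (1.0 + math.exp(-(d0 - d1)))


def uncertainty_sampling(model, pool):
    """Query function: pick the tuple whose prediction is least certain."""
    return min(pool, key=lambda x: abs(prob_positive(model, x) - 0.5))


def active_learn(labeled, pool, oracle, budget):
    """Pool-based active learning with a fixed query budget."""
    L = list(labeled)  # labeled set L (assumed to contain both classes)
    U = list(pool)     # unlabeled pool U
    queried = []
    model = train_centroids(L)
    for _ in range(budget):
        if not U:
            break
        x = uncertainty_sampling(model, U)
        U.remove(x)
        L.append((x, oracle(x)))    # ask the oracle for the label
        queried.append(x)
        model = train_centroids(L)  # relearn from the enlarged L
    return model, L, queried


if __name__ == "__main__":
    def oracle(x):
        # Hypothetical annotator: the true boundary is at 5.0.
        return 1 if x >= 5.0 else 0

    model, L, queried = active_learn(
        [(0.0, 0), (10.0, 1)], [1.0, 4.0, 4.9, 6.0, 9.0], oracle, budget=2
    )
    print("queried:", queried)
    print("centroids:", model)
```

Recording the model's accuracy after each query in this loop would give the learning curve mentioned above; the queries concentrate near the current decision boundary, which is where the model is least certain.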