- Classification by Neural Networks
- Classification by Support Vector Machines (SVM)
- Classification based on concepts from association rule mining
- Other Classification Methods
- Prediction
- Classification accuracy
- Summary

2023/2/27 (Saturday)  Data Mining: Concepts and Techniques

Classification vs. Prediction
- Classification:
  - predicts categorical class labels (discrete or nominal)
  - constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical Applications
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis

Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: for classifying future or unknown objects
  - Estimate accuracy of the model
    - The known label of each test sample is compared with the classified result from the model
    - Accuracy rate is the percentage of test set samples that are correctly classified by the model
    - The test set must be independent of the training set; otherwise overfitting goes undetected and the accuracy estimate is too optimistic
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Classification Process (1): Model Construction
[Figure: Training Data -> Classification Algorithms -> Classifier (Model)]
Example rule produced by the classifier:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
[Figure: the Classifier is applied to Testing Data and to Unseen Data]
Unseen data: (Jeff, Professor, 4). Tenured?
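The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule is the one shown on the slide (its comparison operator was lost in extraction and is assumed here to be `>`), while the labeled test tuples, the function names `tenured_rule` and `accuracy`, and the 0.7 acceptance threshold are all hypothetical.

```python
from typing import Callable, List, Tuple

# A sample is (name, rank, years); the class label is "yes" or "no" (tenured).
Sample = Tuple[str, str, int]


def tenured_rule(sample: Sample) -> str:
    """Step 1 result: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.

    The '>' operator is an assumption; it is missing from the extracted slide.
    """
    _, rank, years = sample
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


def accuracy(model: Callable[[Sample], str],
             test_set: List[Tuple[Sample, str]]) -> float:
    """Step 2a: fraction of test samples whose known label matches the model."""
    correct = sum(1 for sample, label in test_set if model(sample) == label)
    return correct / len(test_set)


# Hypothetical labeled test set, invented for illustration and kept
# separate from whatever data the rule was learned from.
test_set = [
    (("Ann", "Assistant Prof", 2), "no"),
    (("Bob", "Associate Prof", 7), "no"),   # the rule gets this one wrong
    (("Cho", "Professor", 5), "yes"),
    (("Dee", "Assistant Prof", 7), "yes"),
]

ACCEPTABLE = 0.7  # assumed threshold; the slides do not give one

acc = accuracy(tenured_rule, test_set)  # 3 of 4 correct -> 0.75
if acc >= ACCEPTABLE:
    # Step 2b: classify a tuple whose label is unknown (from the slide).
    print("Jeff tenured?", tenured_rule(("Jeff", "Professor", 4)))
```

Keeping `test_set` disjoint from the training data is what makes `acc` a usable estimate of how the rule behaves on unseen tuples.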
Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Chapter 7: Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian Classification
- Classification by Neural Networks
- Classification by Support Vector Machines (SVM)
- Classification based on concepts from association rule mining
- Other Classification Methods
- Prediction
- Classification accuracy
- Summary

Issues Regarding Classification and Prediction (1): Data Preparation
- Data cleaning
  - Preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection)
  - Remove the irrelevant or redundant attributes
- Data transformation
  - Generalize and/or normalize data

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability
  - time to construct the model
  - time to use the model
- Robustness
  - handling noise and missing values
- Scalability
  - efficiency in disk-resident databases
- Interpretability:
  - understanding and insight provided by the model
- Goodness of rules
  - decision tree size
  - compactness of classification rules
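The three data-preparation steps listed under Issues (1) can be illustrated with a small sketch. The slides do not prescribe specific techniques, so the choices here are assumptions made for illustration: mean imputation for missing values, dropping constant columns as a very crude form of relevance analysis, and min-max scaling for normalization. The function names and the sample rows are hypothetical.

```python
from typing import List, Optional


def fill_missing_with_mean(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Data cleaning: replace None with the column mean.

    Assumes every column has at least one known value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        known = [r[j] for r in rows if r[j] is not None]
        means.append(sum(known) / len(known))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def drop_constant_columns(rows: List[List[float]]) -> List[List[float]]:
    """Relevance analysis (crude): an attribute that never varies
    cannot help separate the classes, so remove it."""
    keep = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Data transformation: rescale each column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[0.0 if hi == lo else (v - lo) / (hi - lo)
             for v, lo, hi in zip(r, lows, highs)]
            for r in rows]


# Hypothetical rows: (age, income, constant flag); one age is missing.
raw = [
    [25.0, 30000.0, 1.0],
    [None, 50000.0, 1.0],
    [45.0, 70000.0, 1.0],
]

prepared = min_max_normalize(drop_constant_columns(fill_missing_with_mean(raw)))
print(prepared)
```

The order matters in this sketch: imputation runs first so that the later steps never see `None`, and constant columns are dropped before scaling so that no column has a zero range.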
Training Dataset
This follows an example from Quinlan's ID3.

Output: A Decision Tree for "buys_computer"
age?
- <=30: student?
  - no: no
  - yes: yes
- 30..40: yes
- >40: credit rating?
  - excellent: no
  - fair: yes

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
  - Tree is constructed in a top-down recursive divide-and-conquer manner
  - At start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
  - There are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- S contains $s_i$ tuples of class $C_i$ for $i = 1, \dots, m$, with $s = \sum_{i=1}^{m} s_i$ tuples in total
- Information required to classify an arbitrary tuple:
  $I(s_1, \dots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$
- Entropy of attribute A with values $\{a_1, a_2, \dots, a_v\}$, where $s_{ij}$ is the number of tuples of class $C_i$ with $A = a_j$:
  $E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \dots + s_{mj}}{s} \, I(s_{1j}, \dots, s_{mj})$
- Information gained by branching on attribute A:
  $\mathrm{Gain}(A) = I(s_1, \dots, s_m) - E(A)$
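The information-gain measure and the greedy, top-down induction procedure described above can be sketched as follows. This is a minimal illustration, not Quinlan's implementation: the function names (`info`, `gain`, `build_tree`, `classify`) and the four-row dataset are hypothetical, attributes are assumed to be categorical, and no pruning is done.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, dict]  # a leaf is a class label; an inner node is a dict


def info(labels: List[str]) -> float:
    """Information needed to classify a tuple: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain(rows: List[Row], labels: List[str], attr: str) -> float:
    """Information gained by branching on attr: info(S) minus the
    weighted information of the partitions induced by attr."""
    expected = 0.0
    for value in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        expected += len(subset) / len(labels) * info(subset)
    return info(labels) - expected


def build_tree(rows: List[Row], labels: List[str], attrs: List[str]) -> Tree:
    if len(set(labels)) == 1:        # all samples belong to the same class
        return labels[0]
    if not attrs:                    # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    node = {"attribute": best, "branches": {}}
    # Only values that occur are branched on, so a partition is never
    # empty and the "no samples left" case cannot arise in this sketch.
    for value in sorted({r[best] for r in rows}):
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in pairs], [y for _, y in pairs], rest)
    return node


def classify(tree: Tree, row: Row) -> str:
    """Follow branches until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Hypothetical toy data (not the slide's training table, which was lost).
rows = [
    {"age": "<=30", "student": "no"},
    {"age": "<=30", "student": "yes"},
    {"age": ">40", "student": "no"},
    {"age": ">40", "student": "yes"},
]
labels = ["no", "yes", "no", "yes"]

tree = build_tree(rows, labels, ["age", "student"])
print(tree)
```

On this toy data `student` separates the classes perfectly (gain 1.0) while `age` tells nothing (gain 0.0), so the greedy step picks `student` at the root and both branches become leaves at once.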