[Main Text]
pon MPs.

The MOUCLAS algorithm

Two steps:
Step 1. Discovery of the frequent, accurate, and reliable MPs.
Step 2. Construction of a classifier, called DeMP, based on MPs.

The core of the first step of the MOUCLAS algorithm is to find all cluster_rules that satisfy minsup, minconf, and minR. Let C denote the dataset D after dimensionality reduction. A cluster_rule represents an MP, namely a rule of the form cluset → y, where cluset is a set of itemsets from a cluster Cluster(C)t and y is a class label, y ∈ Y.

Algorithm of the first step: Mouclas Mining frequent, accurate, and reliable Mouclas patterns (MPs)
- Input: A training transaction database, D; minimum support threshold (minsup); minimum confidence threshold (minconf); minimum reliability threshold (minR)
- Output: A set of frequent, accurate, and reliable Mouclas patterns (MPs)
- Methods:
(1) Reduce the dimensionality of transactions d, which efficiently reduces the data size by removing irrelevant or redundant attributes (or dimensions) from the training data.
(2) Identify the clusters of database C for all transactions d after dimensionality reduction on attributes Aj in database C, based on the Mountain function, which is a fuzzy set membership function and is especially capable of transforming quantitative attribute values in transactions into linguistic terms.
(3) Generate a set of MPs that are frequent, accurate, and reliable, that is, MPs that satisfy the user-specified minimum support (minsup), minimum confidence (minconf), and minimum reliability (minR) constraints.

    X = reduceDim(I);                          // reduce the dimensionality on the set of all items I in D
    Cluster(C)t = genCluster(C);               // identify the complete clusters of C
    for each Cluster(C)t do
        E = genClusterrules(cluset, class);    // generate a set of candidate cluster_rules
        for each transaction d ∈ C do
            Ed = genSubClusterrules(E, d);     // find all the cluster_rules in E whose cluset are supported by d
            for each e ∈ Ed do
                e.clusupCount++;               // accumulate the clusupCount of the cluset of cluster_rule e
                if d.class = e.class then e.cisupCount++   // accumulate the cisupCount of cluster_rule e supported by d
            end
        end
        F = {e ∈ E | e.cisupCount ≥ minsup};   // construct the set of frequent cluster_rules
        MP = genRules(F);                      // generate MP using the genRules function by minconf and minR
    end
    MPs = ∪ MP;                                // discover the final set of MPs

The task of the second step of the MOUCLAS algorithm:
- Use a heuristic method to generate a classifier, named DeMP, in which the discovered MPs cover D and are organized in decreasing precedence based on their confidence and support.
- Let R be the set of frequent, accurate, and reliable MPs generated in the previous step, and let MPdefault_class denote the default class, which has the lowest precedence. The DeMP classifier can then be presented in the form <MP1, MP2, …, MPn, MPdefault_class>, where MPi ∈ R, i = 1 to n, MPa ≻ MPb if n ≥ b > a ≥ 1 and a, b ∈ i, and C ⊆ ∪ cluset of MPi.

Algorithm: Mouclas constructing DeMP Classifier
- Input: A training database after dimensionality reduction, C; the set of frequent, accurate, and reliable Mouclas patterns (MPs)
- Output: DeMP Classifier
- Methods:
(1) Identify the order of all discovered MPs based on the definition of precedence and sequence them in decreasing precedence order.
(2) Determine possible MPs for the DeMP classifier from R, following the descending sequence of MPs.
(3) Discard the MPs that cannot contribute to improving the accuracy of the DeMP classifier, and keep the final set of MPs to construct the DeMP classifier.

    R = sort(R);                               // sort MPs based on their precedence
    for each MP ∈ R in sequence do
        temp = ∅;
        for each transaction d ∈ C do
            if d satisfies the cluset of MP then
                store d.ID in temp;
            if MP correctly classifies d then
                insert MP at the end of L;
                delete the transactions whose IDs are in temp from C;
                select a default class for the current L;   // determine the default class based on the majority class of the remaining transactions in C
        end
        compute the total number of errors of L;   // compute the total number of errors made by the current L and the default class
    end
    Find the first MP in L with the lowest total number of errors and discard all the MPs after that MP in L;
    Add the default class associated with the above-mentioned first MP to the end of L;
    DeMP classifier = L

Example of MOUCLAS application

The well logging data sets include attributes (well logging curves) of GR (gamma ray), RDEV (deep resistivity), RMEV (shallow resistivity), RXO (flushed zone resistivity), RHOB (bulk density), NPHI (neutron porosity), PEF (photoelectric factor) and DT (sonic travel time). A hypothetically useful MP may suggest a relation between well logging data and the class label of oil/gas formation since.

Mining Distance-based Association Rules

- Binning methods do not capture the semantics of interval data
- Distance-based partitioning, more meaningful discretization considering:
  - density/number of points in an interval
  - ―closen