tial patterns, emerging patterns, temporal associations, partial periodicity
- Classification, clustering, iceberg cubes, etc.

2023/2/27 (Saturday), slide 57 · Data Mining: Concepts and Techniques

Multiple-Level Association Rules
- Items often form a hierarchy.
- Flexible support settings: items at lower levels are expected to have lower support.
- The transaction database can be encoded based on dimensions and levels.
- Explore shared multi-level mining.

Figure: uniform vs. reduced support
  Item hierarchy: Milk [support = 10%] → 2% Milk [support = 6%], Skim Milk [support = 4%]
  Uniform support: Level 1 min_sup = 5%, Level 2 min_sup = 5%
  Reduced support: Level 1 min_sup = 5%, Level 2 min_sup = 3%

ML/MD Associations with Flexible Support Constraints
- Why flexible support constraints?
  - Real-life occurrence frequencies vary greatly (e.g., diamonds, watches, and pens in a shopping basket).
  - Uniform support may not be an interesting model.
- A flexible model:
  - The lower the level, the more dimensions combined, and the longer the pattern, the smaller the support usually is.
  - General rules should be easy to specify and understand.
  - Special items and special groups of items may be specified individually and given higher priority.

Multidimensional Association
- Single-dimensional rules:
  buys(X, "milk") ⇒ buys(X, "bread")
- Multidimensional rules: ≥ 2 dimensions or predicates
  - Inter-dimension association rules (no repeated predicates):
    age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  - Hybrid-dimension association rules (repeated predicates):
    age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
- Categorical attributes: finite number of possible values, no ordering among values
- Quantitative attributes: numeric, with an implicit ordering among values

Multilevel Association: Redundancy Filtering
- Some rules may be redundant due to "ancestor" relationships between items.
- Example:
  milk ⇒ wheat bread [support = 8%, confidence = 70%]
  2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
- We say the first rule is an ancestor of the second rule.
- A rule is redundant if its support is close to the "expected" value based on the rule's ancestor.

Multilevel Mining: Progressive Deepening
- A top-down, progressive deepening approach:
  - First mine high-level frequent items: milk (15%), bread (10%).
  - Then mine their lower-level "weaker" frequent itemsets: 2% milk (5%), wheat bread (4%).
- Different min_support thresholds across levels lead to different algorithms:
  - If the same min_support is used at all levels, toss itemset t if any of t's ancestors is infrequent.
  - If min_support is reduced at lower levels, examine only those descendants whose ancestor's support is frequent/non-negligible.

Techniques for Mining MD Associations
- Search for frequent k-predicate sets:
  - Example: {age, occupation, buys} is a 3-predicate set.
- Techniques can be categorized by how quantitative attributes, such as age, are treated:
  1. Using static discretization of quantitative attributes
     - Quantitative attributes are statically discretized by using predefined concept hierarchies.
  2. Quantitative association rules
     - Quantitative attributes are dynamically discretized into "bins" based on the distribution of the data.
  3. Distance-based association rules
     - This is a dynamic discretization process that considers the distance between data points.

Static Discretization of Quantitative Attributes
- Attributes are discretized prior to mining using a concept hierarchy.
- Numeric values are replaced by ranges.
- In a relational database, finding all frequent k-predicate sets will require k or k+1 table scans.
- A data cube is well suited for mining:
  - The cells of an n-dimensional cuboid correspond to the predicate sets.
  - Mining from data cubes can be much faster.

Figure: lattice of cuboids: (); (age), (income), (buys); (age, income), (age, buys), (income, buys); (age, income, buys)

Quantitative Association Rules
  age(X, "30-34") ∧ income(X, "24K-48K") ⇒ buys(X, "high resolution TV")
- Numeric attributes are dynamically discretized such that the confidence or compactness of the rules mined is maximized.
- 2-D quantitative association rules: A_quan1 ∧ A_quan2 ⇒ A_cat
- Cluster "adjacent" association rules to form general rules using a 2-D grid.

Mining Distance-based Association Rules
- Binning methods do not capture the semantics of interval data.
- Distance-based partitioning gives a more meaningful discretization by considering:
  - the density/number of points in an interval
  - the "closeness" of points in an interval

Interestingness Measure: Correlations (Lift)
- play basketball ⇒ eat cereal [40%, 66.7%] is misleading:
  - The overall percentage of students eating cereal is 75%, which is higher than 66.7%.
- play basketball ⇒ not eat cereal [20%, 33.3%] is more accurate, although it has lower support and confidence.
- Measure of dependent/correlated events: lift(A, B) = P(A ∧ B) / (P(A) · P(B))

                 Basketball   Not basketball   Sum (row)
  Cereal         2000         1750             3750
  Not cereal     1000         250              1250
  Sum (col.)     3000         2000             5000

Chapter 6: Mining Association Rules in Large Databases
- Association rule mining
- Algorithms for scalable mining of (single-dimensional Boolean) association rules in transactional databases
- Mining various kinds of association/correlation rules
- Constraint-based association mining
- Sequential pattern mining
- Applications/extensions of frequent pattern mining
- Summary
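The uniform vs. reduced support figure on the Multiple-Level Association Rules slide, combined with the top-down progressive deepening strategy, can be sketched as follows. This is a minimal illustration under assumptions: item supports are taken as given (from the milk example) rather than counted from transactions, and the dict HIERARCHY and the function frequent_by_level are hypothetical names, not part of the source.

```python
from typing import Dict, List

# Supports from the slide's milk example (fraction of transactions).
SUPPORT: Dict[str, float] = {"milk": 0.10, "2% milk": 0.06, "skim milk": 0.04}

# HYPOTHETICAL encoding of the two-level hierarchy: level-1 item -> children.
HIERARCHY: Dict[str, List[str]] = {"milk": ["2% milk", "skim milk"]}


def frequent_by_level(min_sup: List[float]) -> List[List[str]]:
    """Top-down progressive deepening over a two-level item hierarchy.

    min_sup[0] is the level-1 threshold, min_sup[1] the level-2 threshold.
    A level-2 item is examined only if its level-1 ancestor is frequent.
    """
    level1 = [item for item in HIERARCHY if SUPPORT[item] >= min_sup[0]]
    level2 = [
        child
        for parent in level1          # only descendants of frequent ancestors
        for child in HIERARCHY[parent]
        if SUPPORT[child] >= min_sup[1]
    ]
    return [level1, level2]


print(frequent_by_level([0.05, 0.05]))  # uniform: [['milk'], ['2% milk']]
print(frequent_by_level([0.05, 0.03]))  # reduced: [['milk'], ['2% milk', 'skim milk']]
```

With uniform support, Skim Milk (4%) is lost at level 2; lowering the level-2 threshold to 3% recovers it, which is the motivation for reduced support at lower levels.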
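The "expected value" test on the Redundancy Filtering slide can be made concrete with a short sketch. The slide does not say how the expected support is computed; this assumes it is the ancestor rule's support scaled by the descendant item's share of its parent, and it assumes (not stated on the slide) that 2% milk accounts for roughly one quarter of all milk sold. The 20% relative tolerance and both function names are hypothetical choices.

```python
import math


def expected_support(ancestor_support: float, descendant_share: float) -> float:
    """Expected support of a descendant rule, ASSUMING the descendant item
    accounts for `descendant_share` of its parent item's occurrences."""
    return ancestor_support * descendant_share


def is_redundant(actual_support: float, expected: float, rel_tol: float = 0.2) -> bool:
    """Flag a rule whose support is within rel_tol of the expectation
    (relative tolerance as defined by math.isclose)."""
    return math.isclose(actual_support, expected, rel_tol=rel_tol)


# milk => wheat bread has support 8% (from the slide).
# ASSUMPTION: 2% milk is roughly one quarter of all milk sold.
expected = expected_support(0.08, 0.25)
print(is_redundant(0.02, expected))  # True: 2% milk => wheat bread adds little
print(is_redundant(0.05, expected))  # False: far above what the ancestor predicts
```

Under these assumptions the second rule on the slide (support 2%) is exactly what its ancestor predicts, so it would be filtered as redundant.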
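The Static Discretization slide says numeric values are replaced by ranges before mining and that each cuboid cell corresponds to a predicate set. A minimal in-memory sketch of that idea follows; the age ranges in AGE_RANGES, the toy records, and the absolute threshold MIN_COUNT are all made up for illustration, and predicate sets are counted with a Counter rather than read from a real data cube.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterator, Tuple

Predicate = Tuple[str, str]

# HYPOTHETICAL concept hierarchy for age: predefined ranges replace raw ages.
AGE_RANGES = [(19, 25, "19-25"), (26, 29, "26-29"), (30, 34, "30-34")]


def age_range(age: int) -> str:
    """Statically discretize a numeric age into its predefined range label."""
    for low, high, label in AGE_RANGES:
        if low <= age <= high:
            return label
    return "other"


def predicate_sets(age: int, occupation: str, item: str) -> Iterator[Tuple[Predicate, ...]]:
    """Yield every non-empty predicate set of one discretized record.

    Each set corresponds to one cell of one cuboid in the cube lattice.
    """
    preds = (("age", age_range(age)), ("occupation", occupation), ("buys", item))
    for k in range(1, len(preds) + 1):
        yield from combinations(preds, k)


# Made-up toy records: (age, occupation, item bought).
RECORDS = [
    (20, "student", "coke"),
    (23, "student", "coke"),
    (22, "student", "popcorn"),
    (31, "engineer", "high resolution TV"),
]

counts = Counter(ps for rec in RECORDS for ps in predicate_sets(*rec))
MIN_COUNT = 2  # HYPOTHETICAL absolute support threshold
frequent: Dict[Tuple[Predicate, ...], int] = {
    ps: c for ps, c in counts.items() if c >= MIN_COUNT
}
for ps, c in sorted(frequent.items(), key=lambda kv: len(kv[0])):
    print(c, ps)
```

On this toy data the only frequent 3-predicate set is {age 19-25, occupation student, buys coke}, which has the shape of the inter-dimension rule on the Multidimensional Association slide.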
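Clustering "adjacent" 2-D quantitative rules on a grid (Quantitative Association Rules slide) can be approximated as below. Assumptions: each cell (i, j) stands for an age bin i and an income bin j for which a rule A_quan1 ∧ A_quan2 ⇒ A_cat has already passed its support and confidence thresholds; adjacent cells are merged by a simple connected-component search, and each group is reported as the bounding box of its bins. This is a simplification, not the exact clustering procedure the slide refers to, and the bounding box of an irregular group can cover cells that did not qualify.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (age bin index, income bin index)
Box = Tuple[Tuple[int, int], Tuple[int, int]]


def cluster_adjacent(cells: Set[Cell]) -> List[Box]:
    """Merge 4-adjacent qualifying grid cells; return each group's bounding box.

    Each box is ((min_age_bin, max_age_bin), (min_income_bin, max_income_bin)).
    """
    remaining = set(cells)
    boxes: List[Box] = []
    while remaining:
        start = remaining.pop()
        group = [start]
        queue = deque([start])
        while queue:  # breadth-first search over grid neighbours
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    group.append(nb)
                    queue.append(nb)
        xs = [c[0] for c in group]
        ys = [c[1] for c in group]
        boxes.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return sorted(boxes)


# HYPOTHETICAL cells where (age bin, income bin) => buys(X, "high resolution TV") held.
QUALIFYING = {(0, 1), (0, 2), (1, 1), (1, 2), (4, 0)}
print(cluster_adjacent(QUALIFYING))  # [((0, 1), (1, 2)), ((4, 4), (0, 0))]
```

Here the four adjacent cells collapse into one more general rule covering age bins 0-1 and income bins 1-2, while the isolated cell stays a separate rule.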
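The numbers on the Correlations (Lift) slide follow directly from the basketball/cereal contingency table. The sketch below recomputes support, confidence, and lift = P(A ∧ B) / (P(A) · P(B)) from raw counts; the function name rule_stats is illustrative only.

```python
from typing import Tuple


def rule_stats(n_ab: int, n_a: int, n_b: int, n: int) -> Tuple[float, float, float]:
    """Support, confidence and lift of A => B from raw counts."""
    support = n_ab / n
    confidence = n_ab / n_a
    lift = support / ((n_a / n) * (n_b / n))
    return support, confidence, lift


N = 5000           # all students in the table
BASKETBALL = 3000  # students who play basketball

# play basketball => eat cereal: 2000 do both; 3750 eat cereal overall.
s1, c1, l1 = rule_stats(2000, BASKETBALL, 3750, N)
# play basketball => not eat cereal: 1000 do both; 1250 do not eat cereal.
s2, c2, l2 = rule_stats(1000, BASKETBALL, 1250, N)

print(f"{s1:.1%} {c1:.1%} lift={l1:.3f}")  # 40.0% 66.7% lift=0.889
print(f"{s2:.1%} {c2:.1%} lift={l2:.3f}")  # 20.0% 33.3% lift=1.333
```

A lift below 1 indicates negative correlation and a lift above 1 positive correlation, which is why the lower-support rule play basketball ⇒ not eat cereal is the more informative one.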