8-1 Data Warehousing and Data Mining (Complete Edition)

  

… 1 − 1/k) if each class has the same number of instances.
Formula fragment: $\sum_{i=1}^{k} p_i^2$

© Silberschatz, Korth and Sudarshan, Database System Concepts, 6th Edition

More Warehouse Design Issues
- Data cleansing
  - E.g., correct mistakes in addresses (misspellings, zip code errors)
  - Merge address lists from different sources and purge duplicates
- How to propagate updates
  - Warehouse schema may be a (materialized) view of schema from data sources
- What data to summarize
  - Raw data may be too large to store online
  - Aggregate values (totals/subtotals) often suffice
  - Queries on raw data can often be transformed by the query optimizer to use aggregate values

Chapter 20: Data Analysis

Warehouse Schemas
- Dimension values are usually encoded using small integers and mapped to full values via dimension tables
- Resultant schema is called a star schema
- More complicated schema structures
  - Snowflake schema: multiple levels of dimension tables
  - Constellation: multiple fact tables

Best Splits (Cont.)
- Another measure of purity is the entropy measure, which is defined as
  $\mathrm{entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i$
- When a set S is split into multiple sets $S_i$, i = 1, 2, …, r, we can measure the purity of the resultant set of sets as:
  $\mathrm{purity}(S_1, S_2, \ldots, S_r) = \sum_{i=1}^{r} \frac{|S_i|}{|S|}\, \mathrm{purity}(S_i)$
- The information gain due to a particular split of S into $S_i$, i = 1, 2, …, r:
  $\mathrm{Information\text{-}gain}(S, \{S_1, S_2, \ldots, S_r\}) = \mathrm{purity}(S) - \mathrm{purity}(S_1, S_2, \ldots, S_r)$

Other Types of Classifiers
- Neural classifiers are studied in artificial intelligence and are not covered here
- Bayesian classifiers use Bayes' theorem, which says
  $p(c_j \mid d) = \dfrac{p(d \mid c_j)\, p(c_j)}{p(d)}$
  where
  $p(c_j \mid d)$ = probability of instance d being in class $c_j$,
  $p(d \mid c_j)$ = probability of generating instance d given class $c_j$,
  $p(c_j)$ = probability of occurrence of class $c_j$, and
  $p(d)$ = probability of instance d occurring

Finding Association Rules
- We are generally only interested in association rules with reasonably high support (e.g., support of 2% or greater)
- Naïve …

Other Types of Mining
- Text mining: application of data mining to textual documents
  - Cluster Web pages to find related pages
  - Cluster pages a user has visited to organize their visit history
  - Classify Web pages automatically into a Web directory
- Data visualization systems help users examine large volumes of data and detect patterns visually
  - Can visually encode large amounts of information on a single screen
  - Humans are very good at detecting visual patterns

End of Chapter

Finding Support
- Determine support of itemsets via a single pass on the set of transactions
  - Large itemsets: sets with a high count at the end of the pass
- If memory is not enough to hold all counts for all itemsets, use multiple passes, considering only some itemsets in each pass.
- Optimization: once an itemset is eliminated because its count (support) is too small, none of its supersets needs to be considered.
- The a priori technique to find large itemsets:
  - Pass 1: count support of all sets with just 1 item. Eliminate those items with low support
  - Pass i: candidates: every set of i items such that all its (i − 1)-item su…
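The a priori passes listed under Finding Support can be sketched in Python roughly as follows. This is a minimal illustration, not the textbook's code. The slide text is cut off in the "Pass i" step, so the candidate rule used here (keep an i-item set only if all of its (i − 1)-item subsets are large) is an assumption based on the standard Apriori algorithm. The function name `apriori`, the `min_support` fraction, and the sample baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    transactions: iterable of item collections (hypothetical input format).
    min_support: fraction of transactions, e.g. 0.02 for 2%.
    """
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}

    # Pass 1: count support of all sets with just 1 item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(large)

    i = 2
    while large:
        # Pass i candidates: i-item sets whose (i-1)-item subsets are all
        # large (assumed rule; the source text is truncated at this point).
        items = sorted({x for s in large for x in s})
        candidates = [
            frozenset(c)
            for c in combinations(items, i)
            if all(frozenset(sub) in large for sub in combinations(c, i - 1))
        ]
        # One pass over the transactions to count each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(large)
        i += 1
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "milk", "cereal"},
        {"bread", "cereal"},
        {"milk"},
    ]
    for itemset, support in apriori(baskets, 0.5).items():
        print(sorted(itemset), support)
```

Pruning candidates before counting is what the optimization bullet describes: once an itemset falls below the support threshold, no superset of it is ever counted.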