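The Data Mining slide lists regression formulae as a prediction mechanism: given a set of known input/output mappings for an unknown function, predict the function's result for a new parameter value. This is a minimal sketch under the simplest assumption, a straight line fitted by ordinary least squares. The helper names `fit_line` and `predict` and the sample data are hypothetical; practical regression tools fit far richer models.

```python
"""Least-squares line fit: a minimal regression sketch (hypothetical helper names)."""


def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x is identical and no line is defined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Joint variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate the fitted line at a new parameter value x."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    # Known mappings of the "unknown" function (here it happens to be y = 1 + 2x).
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model)              # (1.0, 2.0)
    print(predict(model, 5))  # 11.0
```

Classification differs from this only in its output. Instead of a numeric function value, it predicts a class label for the new item.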
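The Warehouse Schemas slide describes encoding dimension values as small integers that map to full values through dimension tables, which yields a star schema around a fact table. The following sketch assumes an in-memory star schema with one fact table (`sales_fact`) and two dimension tables (`product_dim`, `store_dim`). All names and figures are hypothetical. It answers the kind of decision-support question raised in the overview slide, total sales for each product category and each region, by joining each fact row to its dimension tables and summing the measure.

```python
"""Star-schema aggregation sketch (hypothetical tables and data)."""
from collections import defaultdict

# Dimension tables: small integer keys mapped to full values.
product_dim = {1: "electronics", 2: "clothing"}  # product_id -> category
store_dim = {10: "north", 20: "south"}           # store_id -> region

# Fact table: each row holds dimension keys plus a measure (sales amount).
sales_fact = [
    (1, 10, 250.0),
    (2, 10, 80.0),
    (1, 20, 120.0),
    (1, 10, 50.0),
    (2, 20, 40.0),
]


def total_sales_by_category_and_region(facts, products, stores):
    """Join each fact row to its dimension tables and sum sales per (category, region)."""
    totals = defaultdict(float)
    for product_id, store_id, amount in facts:
        key = (products[product_id], stores[store_id])
        totals[key] += amount
    return dict(totals)


if __name__ == "__main__":
    result = total_sales_by_category_and_region(sales_fact, product_dim, store_dim)
    for (category, region), total in sorted(result.items()):
        print(f"{category:12s} {region:6s} {total:8.2f}")
```

In SQL, the same question is a join of the fact table with its dimension tables followed by a GROUP BY on category and region. In a snowflake schema, a dimension table such as `product_dim` would itself reference further dimension tables.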
©Silberschatz, Korth and Sudarshan Database System Concepts 6th Edition

Data Mining
• Data mining is the process of semiautomatically analyzing large databases to find useful patterns
• Prediction based on past history
  • Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ...) and past history
  • Predict if a pattern of phone calling card usage is likely to be fraudulent
• Some examples of prediction mechanisms:
  • Classification
    • Given a new item whose class is unknown, predict to which class it belongs
  • Regression formulae
    • Given a set of mappings for an unknown function, predict the function result for a new parameter value

Warehouse Schemas
• Dimension values are usually encoded using small integers and mapped to full values via dimension tables
• The resultant schema is called a star schema
  • More complicated schema structures:
    • Snowflake schema: multiple levels of dimension tables
    • Constellation: multiple fact tables

Design Issues
• When and how to gather data
  • Source-driven architecture: data sources transmit new information to the warehouse, either continuously or periodically (e.g., at night)
  • Destination-driven architecture: the warehouse periodically requests new information from data sources
  • Keeping the warehouse exactly synchronized with data sources (e.g., using two-phase commit) is too expensive
    • It is usually OK to have slightly out-of-date data at the warehouse
    • Data/updates are periodically downloaded from online transaction processing (OLTP) systems
• What schema to use
  • Schema integration

Data Warehousing
• Data sources often store only current data, not historical data
• Corporate decision making requires a unified view of all organizational data, including historical data
• A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site
  • Greatly simplifies querying and permits study of historical trends
  • Shifts the decision-support query load away from transaction processing systems

Decision Support Systems
• Decision-support systems are used to make business decisions, often based on data collected by online transaction-processing systems.
• Examples of business decisions:
  • What items to stock?
  • What insurance premium to charge?
  • To whom to send advertisements?
• Examples of data used for making decisions:
  • Retail sales transaction details
  • Customer profiles (income, age, gender, etc.)

Chapter 20: Data Analysis
• Decision Support Systems
• Data Warehousing
• Data Mining
  • Classification
  • Association Rules
  • Clustering

Decision-Support Systems: Overview
• Data analysis tasks are simplified by specialized tools and SQL extensions
  • Example tasks:
    • For each product category and each region, what were the total sales in the last quarter, and how do they compare with the same quarter last year?
    • As above, for each product category and each customer category
• Statistical analysis packages (e.g., S++) can be interfaced with databases
  • Statistical analysis is a large field, but it is not covered here
• Data mining seeks to discover knowledge automatically, in the form of statistical rules and patterns, from large databases.
• A data warehouse archives information gathered from multiple sources and stores it under a unified schema, at a single site.
  • Important fo