DW Projects
- 20% (Meta) to 70% (OTR, DWN) of DW projects fail
- High failure rate for non-business-driven initiatives
- Very few systems meet the expectations of the business
- Failure is due to "soft" issues, not to technology
- Massive upside to successful projects (100% to 20xx+% ROI)
- 99% politics, 1% technology

References
- Inmon, "Building the Data Warehouse", John Wiley and Sons, 1996.
- Ladley, John, "Operational Data Stores: Building an Effective Strategy", Data Warehouse: Practical Advice from the Experts, Prentice Hall, Englewood Cliffs, NJ, 1997.
- Gardner, Stephen R., "Building the Data Warehouse", Communications of the ACM, September 1998, Volume 41, Number 9, 52-60.
- Douglas Hackney, DW101: A Practical Overview, 20xx.
- Pieter R. Mimno, "The Big Picture: How Brio Competes in the Data Warehousing Market", Presentation to Brio Technology, August 4, 1998.
- Alex Berson, Stephen Smith, Kurt Thearling, "Building Data Mining Applications for CRM", McGraw-Hill, 1999.
- Martin Staudt, Anca Vaduva, Thomas Vetterli, "The Role of Metadata for Data Warehousing", 20xx.
- Ken Rudin, Christopher K. Buss, Ryan Sousa, "Data Warehouse Performance", John Wiley & Sons.

[Alex Berson et al., 1999]
- Technical metadata
  - Information about the warehouse's data, used by data warehouse designers and administrators to carry out warehouse development and management tasks.

[Figure: analytic application architecture. Recoverable labels: front- and back-office OLTP; e-business systems; external information providers; federated data warehouse and data mart systems; metadata interchange; DW templates; data profiling; OLAP; decision engine models, rules and metrics; analytic applications; CRM analytics and reporting; HR analytics and reporting; EPM analytics and reporting; EKP (Enterprise Knowledge Management Portal); informed decisions]

- Partitioning
  - Data is distributed across separate physical units that can each be processed independently.
- Granularity
  - The level of detail or summarization at which data is held in the data warehouse's units of data.

Microsoft DTS.

Four basic characteristics of an ODS: subject-oriented, integrated, volatile (updatable), and current or near-current.

Data Warehousing and Data Mining: An Overview
Concepts, Architecture, Trends, Applications
Presenter: Zhu Jianqiu
June 7, 20xx

Outline
- Data warehouse concepts
- Data warehouse architecture and components
- Data warehouse design
- Data warehouse technology (how it differs from database technology)
- Data warehouse performance
- Data warehouse applications
- Overview of data mining applications
- Data mining techniques and trends
- Data mining application platform (project proposal to the Science and Technology Commission)

Data Warehouse Concepts
- Basic concepts
- Some misconceptions about data warehouses

Basic Concepts: Data Warehouse
- A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions [Inmon,1996].
- A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end users on an integrated platform [Ladley,1997].
- A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure [Appleton,1995][Haley,1997][Gardner 1998].

Basic Concepts: Data Warehouse Characteristics [Inmon,1996]
- Subject-oriented
  - The tables of one subject area are sourced from several operational applications (for example, the customer subject draws on order processing, accounts receivable, accounts payable, ...)
  - Typical subject areas: customer, product, transaction, account
  - A subject area is implemented as a set of related tables
  - Related tables are linked through a common key (for example, Customer ID)
  - Every key carries a time element (from-date to to-date; monthly accumulation; single date; ...)
  - Data within a subject can be stored on different media (summary level, detail level, multiple granularities)
- Integrated
  - Data extraction, cleansing, transformation, and loading
- Non-volatile
  - Data is added in batches; data already in the warehouse does not change
- Time-variant (time dimension)
- Supports management decisions

Basic Concepts: Data Mart, ODS
- Data Mart
  - A small data warehouse serving a department or workgroup.
- Operational Data Store
  - ODS (operational data store): a collection of data that supports the enterprise's day-to-day, enterprise-wide applications. It is a new data environment distinct from the DB, and a hybrid form obtained by extending the DW.

Basic Concepts: ETL, Metadata, Granularity, Partitioning
- ETL
  - ETL (Extract/Transform/Load): tools that extract, transform, and load data, e.g. IBM Visual Warehouse.
- Metadata
  - Data about data, used to build, maintain, manage, and use the data warehouse; it is especially important in a data warehouse.
- The higher the level of detail, the smaller the granularity.

Some Misconceptions about Data Warehouses
- Data warehouse and OLAP
  - Star schema data model
  - Multidimensional analysis
- A data warehouse is not a virtual concept
- Data warehouse and normalization theory
  - Denormalization is required

Data Warehouse Architecture and Components
- Architecture
- ETL tools
- Metadata repository and metadata management
- Data access and analysis tools

Architecture [Pieter ,1998]
[Figure: data warehouse architecture. Recoverable labels: Source Databases (Relational, Appl. Package); Data Extraction, Transformation, Load; Extract, Transform and Load; Data Cleansing Tool; Data Modeling Tool; Warehouse Admin. Tools; Central Metadata; Central Data Warehouse; Mid Tier; Architected Data Marts; Data Mart; Local Metadata; Metadata Exchange; MDB; Data Access and Analysis; End-User DW Tools]
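
The ETL flow and the notion of granularity from the basic-concepts slides can be illustrated with a minimal Python sketch. It is not taken from the presentation or from any of the tools named on the slides; the sample rows, the field names, and the function names (`extract`, `cleanse`, `transform`, `load`, `monthly_summary`) are all hypothetical. It assumes one operational source keyed by a customer ID, appends rows in a batch without updating existing ones, and then rolls the detail rows up to a coarser grain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows from an operational order-processing system.
ORDERS = [
    {"cust": " c001 ", "day": date(1998, 8, 3), "amount": "120.50"},
    {"cust": "C002", "day": date(1998, 8, 4), "amount": "80.00"},
    {"cust": "C001", "day": date(1998, 9, 1), "amount": "40.00"},
    {"cust": "", "day": date(1998, 9, 2), "amount": "15.00"},  # no key
]


def extract(source):
    """Extract: read rows from an operational source."""
    yield from source


def cleanse(rows):
    """Cleanse: normalise the customer key and drop rows that lack one."""
    for row in rows:
        key = row["cust"].strip().upper()
        if key:
            yield {**row, "cust": key}


def transform(rows):
    """Transform: rename fields and convert amounts to integer cents."""
    for row in rows:
        cents = round(float(row["amount"]) * 100)
        yield {"customer_id": row["cust"], "day": row["day"], "cents": cents}


def load(warehouse, rows):
    """Load: append a batch; rows already in the warehouse are not changed."""
    warehouse.extend(rows)
    return warehouse


def monthly_summary(detail_rows):
    """Coarser grain: one total per customer per month instead of per order."""
    totals = defaultdict(int)
    for row in detail_rows:
        key = (row["customer_id"], row["day"].year, row["day"].month)
        totals[key] += row["cents"]
    return dict(totals)


warehouse = load([], transform(cleanse(extract(ORDERS))))
summary = monthly_summary(warehouse)
```

Here `warehouse` holds the detail level (one row per order, the finest grain), while `summary` holds the same facts at monthly granularity. Keeping both corresponds to the "summary level, detail level" storage mentioned under the subject-oriented characteristic.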