【正文】
1. Differences between operational database systems and data warehouses Since most people are familiar with mercial relational database systems, it is easy to understand what a data warehouse is by paring these two kinds of systems. The major task of online operational database systems is to perform online transaction and query processing. These systems are called online transaction processing (OLTP) systems. They cover most of the daytoday operations of an anization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers in the role of data analysis and decision making. Such systems can anize and present data in various formats in order to acmodate the diverse needs of the different users. These systems are known as online analytical processing (OLAP) systems. The major distinguishing features between OLTP and OLAP are summarized as follows. (1). Users and system orientation: An OLTP system is customeroriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is marketoriented and is used for data analysis by knowledge workers, including managers, executives, and analysts. (2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making. (3). Database design: An OLTP system usually adopts an entityrelationship (ER) data model and an application oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subjectoriented database design. (4). View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different anizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an anization. OLAP systems also deal with information that originates from different anizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media. (5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly readonly operations (since most data warehouses store historical rather than uptodate information), although many could be plex queries. Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on. 2. But, why have a separate data warehouse? “Since operational databases store huge amounts of data, you observe, “why not perform online analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse? A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned queries. On the other hand, data warehouse queries are often plex. They involve the putation of large groups of data at summarized levels, and may require the use of special data anization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks. Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs readonly access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system. Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from plete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases. 數(shù)據(jù)倉庫 數(shù)據(jù)倉庫為商務(wù)運(yùn)作提供結(jié)構(gòu)與工具,以便系統(tǒng)地組織、理解和使用數(shù)據(jù)進(jìn)行決策。大量組織機(jī)構(gòu)已經(jīng)發(fā)現(xiàn),在當(dāng)今這個(gè)充滿競(jìng)爭(zhēng)、快速發(fā)展的世界,數(shù)據(jù)倉庫是一個(gè)有價(jià)值的工具。在過去的幾年中,許多公司已花費(fèi)數(shù)百萬美元,建立企業(yè)范圍的數(shù)據(jù)倉庫。許多人感到,隨著工業(yè)競(jìng)爭(zhēng)的加劇,數(shù)據(jù)倉庫成了必備的最新營(yíng)銷武器 —— 通過更多地了解客戶需求而保住客戶的途徑。 “那么”,你可能會(huì)充滿神秘地問,“到底什么是數(shù)據(jù)倉庫?” 數(shù)據(jù)倉庫已被多種方式定義,使得很難嚴(yán)格地定義它。寬松地講,數(shù)據(jù)倉庫是一個(gè)數(shù)據(jù)庫,它與組織機(jī)構(gòu)的操作數(shù)據(jù)庫分別 維護(hù)。數(shù)據(jù)倉庫系統(tǒng)允許將各種應(yīng)用系統(tǒng)集成在一起,為統(tǒng)一的歷史數(shù)據(jù)分析提供堅(jiān)實(shí)的平臺(tái),對(duì)信息處理提供支持。 按照 W. H. Inmon,一位數(shù)據(jù)倉庫系統(tǒng)構(gòu)造方面的領(lǐng)頭建筑師的說法,“數(shù)據(jù)倉庫是一個(gè)面向主題的、集成的、時(shí)變的、非易失的數(shù)據(jù)集合,支持管理決策制定”。這個(gè)簡(jiǎn)短、全面的定義指出了數(shù)據(jù)倉庫的主要特征。四個(gè)關(guān)鍵詞,面向主題的、集成的、時(shí)變的、非易失的,將數(shù)據(jù)倉庫與其它數(shù)據(jù)存儲(chǔ)系統(tǒng)(如,關(guān)系數(shù)據(jù)庫系統(tǒng)、事務(wù)處理系統(tǒng)、和文件系統(tǒng))相區(qū)別。讓我們進(jìn)一步看看這些關(guān)鍵特征。 (1)、 面向主題的:數(shù)據(jù)倉庫圍繞一些主 題,如顧客、供應(yīng)商、產(chǎn)品和銷售組織。數(shù)據(jù)倉庫關(guān)注決策者的數(shù)據(jù)建模與分析,而不是構(gòu)造組織機(jī)構(gòu)的日常操作和