【正文】
機磁盤存取 ?不包括寫輸出到磁盤的代價 ?記算法 A的代價估算為 EA 169。 等于?A(r)的大小 ? SC(A, r): 關(guān)系 r 中屬性 A的 選擇基數(shù) , 滿足 A上等值條件的平均記錄數(shù) ? 若 r 中元組在文件中 連續(xù)存放 , 則 : ????????? rfrnrb169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition What You Should Know ? Query optimization is critical for any mercial DBMS ? good/bad can be orders of magnitude ? The 3 ponents of the SystemR style optimization ? Plan space definition ? Cost estimation ? Search algorithm ? huge number of alternative, semantically equivalent plans ? Ideal goal: ? map a declarative query to the most efficient plan ? Conventional wisdom: avoid bad plans ? State of the art: ? industry: most optimizers are SystemR style ? academic: always a core database research topic 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Selectivity Factors: Range Selection ? What is assumed in these formulas? ? column value1: ?F = (maxValue value1) / rangeOfValue ? column in value1:value2 ?F = (value2 value1) / rangeOfValue 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Cost Estimation ? Input: simple DB statistics ? of tuples amp。 delay it in the plan –Q: how to avoid Cartesian products? ?May miss an optimal plan! 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Space of Query Plans ? Selections: ?algorithms: sequential, index scan ? Joins: ?algorithms: nestedloop, sort merge, hash ? Ordering/Grouping: ?can an “interesting order” be produced by join/selections? ?algorithm: sorting, hashbased ? They interleave with each other! 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Optimization: Different Strategies ? Optimal approach: ?Enumerate(枚舉 ) each possible plan ?Measure its performance by running it ?Pick the fastest one ? Heuristics approach: ?fixed heuristics all the way through plan construction ?.: always nested loop joins, indexed relation as inner ?.: order relations from smallest to biggest 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition SQL Constructs ? SECLECT (DISTINCT) list of columns ? FROM list of tables ? WHERE list of Boolean Factors ? GROUP BY list of columns ? HAVING list of Boolean Factors ? ORDER BY list of columns 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition 概述 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition 主要內(nèi)容 ?概述 ?用于代價估算的統(tǒng)計信息 ?關(guān)系代數(shù)表達式的轉(zhuǎn)換 ?基于代價的優(yōu)化算法 ?物化視圖與視圖維護 169。169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition 第 6章 查詢優(yōu)化 ?本章內(nèi)容參考 ?數(shù)據(jù)庫概念 (第四版 ) by A. Silberschatz ?Chapter 14 Optimization ?簡介 +自學(xué) ?本章要解決的關(guān)鍵問題 ?如何找到具有最低求值代價的求值計劃 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition 概述 ? 一個給定查詢有多種可選擇的求值方法 ?等價表達式 ?一個操作有若干不同算法 (Chapter 5) ? 一個查詢求值方法的好壞帶來的代價差別可能是巨大的 ? 例如 : 執(zhí)行 r X s 后續(xù)選擇 = 比執(zhí)行一個同樣條件的連接慢得多 ? 需要估算操作的代價 ?依賴于數(shù)據(jù)庫維護的統(tǒng)計信息 ?例如元組數(shù) , 連接屬性的不同值的數(shù)目 , 等等 ?需要對中間結(jié)果估算統(tǒng)計信息以便對復(fù)雜表達式計算代價 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition 概述 ? 對一個表達式的查詢求值方案的生成涉及幾個步驟 : ?利用等價規(guī)則將一個表達式轉(zhuǎn)換成另一個等價的表達式 (Annotate)結(jié)果表達式以得到其他查詢計劃 ? 整個過程稱為基于代價的優(yōu)化 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition SQL Semantics ? Take Cartesian product of FROM tables ? Project only those referenced columns ? WHERE: apply all filters in WHERE ? GROUP BY: form groups on results ? HAVING: apply filter to groups ? ORDER BY: make sure results in right order ? DISTINCT: remove duplicates ? Q: Is this “operational semantics” efficient? ? Different plans: mainly different in the first three 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Costbased Optimization ? Plan space: what is the space of query plans? ? Cost estimation: how to estimate the cost, without executing each? ? Search algorithm: how to search the space, as guided by cost estimates 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Huge Space! Assumptions to Help ? Typical assumptions to help reduce the space: ?Projections: ?pushed down to reduce of columns ?Selections: ?pushed down to reduce of rows ?Joins: ?leftdeep joins ?avoid Cartesian products。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Cost/Size Estimation ? Accurate relatively ?goal is to pare plans, not to predict exact cost ?more of an art than an exact science ? Each operator: input size, cost, output size ?estimate cost based on input size ?estimate output size (for next operator) or selectivity ?selectivity = ratio of output to input 169。 disk pages ? of distinct values per column ?statistics updated periodically ? Assumption of attribute/predicate independence ?When no estimate available, use magic number ? New/Alternative approaches: ?sampling, histogram of DB 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition Search the Plan Space ? Baseline: exhaustive search ?enumerate all binations, and pare their cost ? Search method parameters: ?plan tree development: ?construction: bottomup, topdown ?modification: improve a somehowconstructed tree ?algorithms: ?heuristic selections: make choices based on heuristics ?branch and bound: search bounded by the current best tree ?hill climbing: find nearby plans with lowest cost ?dynamic programming: construction by greedy selections 169。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edition 用于代價估算的統(tǒng)計信息 (回顧 )(10/14) ? nr : 關(guān)系 r 中的元組個數(shù) ? br : 包含 r 中元組的塊數(shù) ? sr : r 中元組的大小 ? fr : r 的塊因子 — 即 , 能放入一塊之中的 r 的元組數(shù) ? V(A, r): r 中出現(xiàn)的 A屬性上的不同值的個數(shù) 。Silberschatz, Korth and Sudarshan Database System Concepts 3rd Edi