数据库查询语句的内部执行逻辑
- Turn SQL into a logical plan, which is represented as algebraic laws
- Turn logical query plan into a physical one
Query optimization can be done in 2 ways above:
1.Logical Query Optimization(关系代数优化,启发式查询)
2.Physical Query Optimization(IO成本最低优化,这里我们主要讨论物理层面上的查询过程)
Logical/physical operators
Logical operator: what they do
union, join, selection, projection, group by
Physical operator: how they do(implementation)
- scanning, hashing, sorting, and index-based
- methods for implementing join can be: nested loop join, sort-merge join, hash join, index join
- different algorithm has different requirements on the amount of available memory & different costs
Each operation is implemented by 3 functions:(operation简单的说就是对数据操作的过程)
- Open: sets up the data structures and performs initializations
- GetNext: returns the next tuple of the result.
- Close: ends the operations. Cleans up the data structures
Cost Model
Notation:
- M = number of blocks/pages that are available in the main memory
- B(R) = number of blocks holding in relation R
- T(R) = number of tuples in the relation R
- V(R, a) = number of distinct values of the attribute a in relation R
Assumption:
- only consider I/O cost( ignore CPU computation cost)
- don't include the cost of writing the result (最后一步的结果写出不计在内)
- table(relation) is clustered, that is for relation R, the cost of scanning the whole table is B(R)
Physical operator classification
- One-pass algorithm
- Nested-Loop Join algorithm
- Two-pass algorithm (sorting & hashing)
- K-pass algorithm
- Index-based algorithm
One-pass algorithm
- Read the data only once from disk
- Usually, require at least one of the input relations fits in main memory
主要是处理relation比较小的情况,内存空间足够,一次读入处理完成
例子: 两个relation相交
- Read S into M-2 buffers and build a search structure
- Read each block of R, and for each tuple t of R, see if t is also in S --- 有一张表必须足够小到可以完全放进内存中
- If so, copy t to the output; if not, ignore t
这里需要保证内存的空间最够,也就是(M-2)>= min(B(R),B(S)),保证一次能够将某一个relation完全读入,两外的一个input page作为另一个relation的读入,一个output page作为处理结果输出
Nested-loop join algorithm
- Read one relation only once, while the other will be read repeatedly from disk (one to many mode)
example:
- Tuple-based Nested Loop Join Join R ⋈ S
Assumption: neither relation is clustered
for each tuple r in R do
for each tuple s in S do
if r and s join then output (r,s)
total cost: T(R)*T(S)
- Block-based Nested Loop Join Join R ⋈ S
Assumption:
each relation is clustered;
memory size M pages;
B(R)<=B(S) and B(R)>m 表明内存无法一次完全读取一张表
for each (M-2) blocks br of R do
for each block bs of S do
for each tuple r in br do
for each tuple s in bs do
if r and s join then output(r,s)
Cost:
– Read R once: cost B(R)
– Outer loop runs B(R)/(M-2) times, and each time need to read S: costs B(R)B(S)/(M-2)
– Total cost: B(R) + B(R)B(S)/(M-2)
R as outer loop Cost: B(R) + B(R)B(S)/(M-2)
S as outer loop Cost: B(S) + B(R)B(S)/(M-2)
=> It is better to iterate over the smaller relation first
Suppose M = 102 blocks (i.e., pages), B(R) = 1,000 blocks, B(S) = 5,000 blocks
以R作为outer loop,每次读取100个blocks进入内存,也就是说B(R)这张大表被分割成50个每个大小为100pages的小表,然后对每个表进行一次内部循环,内部循环就是对其中的每个记录去S这张大表中检索,由于有50次,每一次的内部循环都需要去读取整个B(S) ,所以最后的cost为 1000(外部循环读取的成本) + 50*5,000 (内部循环读取的成本)
- What is the minimum memory requirement?
Answer: 3
Two-pass algorithm based on sorting
- First pass: read data from disk, process it, write it to the disk
- Second pass: read the data for further processing
为什么需要two-pass? an operation can not be completed in one pass,主要就是空间的问题
1.Simple example: B(R) distinct operation(duplicate elimination)
步骤:1. sort first 2. eliminate duplicate
pass1: read B(R), sort it into runs with M pages and write Cost: 2B(R)
pass2: merge M-1 runs, include each tuple only once Cost: B(R)
Assumption:
number of runs from pass1: B/M
B/M must be less of than M-1 because we can only proceed M-1 ways merging in pass2
so
B/M <= M-1
roughly B <= M^2
2.Sort-Merge Join => 基于Sort-Merge的join其实是有问题的,想想为什么?
(Assumption: memory buffer is enough to hold join tuples for at least one relation)
3.Simple Sort-based Join
这里和Sort-Merge Join的区别在于,首先将B(R)和B(S)进行merge-sort得到每一个表的一个runs,Sort-Merge Join只是根据内存的大小对B(R),B(S)进行排序得到B/M个runs
所以一般而言Simple Sort-based Join比较保险
Assumption:
B(R) <= M(M-1) , B(S) <= M(M-1)
and at least one set of the tuples with a common value for the join attributes fit in M-2 pages
Note that we only need 1 page buffer for the other relation
Two-pass algorithm based on Hashing
hash idea:
All the tuples of input relations get to the same bucket
Reduce the size of input relations by a factor of M
Perform the operation by working on a bucket at a time
Index-Based algorithm
useful for selection operations
Selection on equality a=v(R) (ignore the cost of reading index block and here we consider the worst time)
Clustered index on attribute a => cost: B(R)/V(R,a)
Unclustered index on a => cost: T(R)/V(R,a)
Example: B(R) = 2000, T(R) = 100,000, V(R, a) = 20, compute the cost of a=v(R)
Cost of using table scan:
– If R is clustered: B(R) = 2000 I/Os
– If R is unclustered: T(R) = 100,000 I/Os
Cost of index-based selection:
– If index is clustered: B(R)/V(R,a) = 100 I/Os
– If index is unclustered: T(R)/V(R,a) = 500 I/Os
Index-Based Join
Assumption:
1.S has an index on the join attribute
2.R is clustered
Step: iterate over each tuple in R, fetch corresponding tuple from S
Cost:
if the index is clustered:B(R) + T(R)B(S)/V(S,a)
if the index is unclustered: B(R) + T(R)T(S)/V(S,a)
if both R and S have a clustered index on the join attribute
Then can perform a sort-merge join where sorting is already done (for free)
Cost: B(R) + B(S)