查询处理 Query Processing¶
Basic Steps¶
- Parsing and translation translate the query into its internal form. This is then translated into relational algebra. Parser checks syntax, verifies relations
- Optimization Amongst all equivalent evaluation plans choose the one with lowest cost.
- Evaluation The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.


- 选择运算尽量先做(往下推)
衡量Query¶
-
$ t_s $:number of seek
-
$ t_t $:number of block transfer(read & write)
-
Cost for b block transfers plus S seeks
b * $ t_t $ + S * $ t_s $
-

-

-

-
对select的条件进行排序:外部排序
-


- 性能:

- $ (b_r/M) $ 为归并段数,$ log_{M-1}(b_r/M) $ 为轮次,2br为每次的传输消耗,最后一次+br,如果要写回磁盘就+2br。

- 外部排序(External Merge Sort) 中,给一个段run分配bb 块(而不是1块)作为缓冲,可以减少每轮合并(merge)的seek次数,但也可能增加merge的轮数。对于确定的关系大小br 和确定的内存块数M,理论上应该有一个最佳的bb取值,使得算法代价最小。

-

-
Join Operation
-
Nested-loop join
- 两重循环
- 代价:
- nr * bs + br block transfers
- nr + br seeks
- nr是记录数,block中含有多个记录
-
Block nested-loop join

- 代价:
- Worst case estimate: br * bs + br block transfers + 2 * br seeks Each block in the inner relation s is read once for each block in the outer relation
- Best case: br + bs block transfers + 2 seeks.(内存足够大,每个表只要进入内存一次)
-
小关系作为外关系更好
-

- 内存有M块的情况:留1块作为output的缓存,外关系给M-2块,内关系反正每次要seek,只给1块。
-
Indexed nested-loop join

- 多块就是br/(M-2)
- 外关系小(nr小)的时候选这种方法
-
Merge-join
- 两个关系已经有序

-
Hash-join
