查询处理 Query Processing¶

Basic Steps¶

Parsing and translation translate the query into its internal form. This is then translated into relational algebra. Parser checks syntax, verifies relations
Optimization Amongst all equivalent evaluation plans choose the one with lowest cost.
Evaluation The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
选择运算尽量先做（往下推）

$ t_s $：number of seek
$ t_t $：number of block transfer(read & write)
Cost for b block transfers plus S seeks

b * $ t_t $ + S * $ t_s $
对select的条件进行排序：外部排序
性能：
$ (b_r/M) $ 为归并段数，$ log_{M-1}(b_r/M) $ 为轮次，2br为每次的传输消耗，最后一次+br，如果要写回磁盘就+2br。
外部排序（External Merge Sort）中，给一个段run分配bb 块（而不是1块）作为缓冲，可以减少每轮合并（merge）的seek次数，但也可能增加merge的轮数。对于确定的关系大小br 和确定的内存块数M，理论上应该有一个最佳的bb取值，使得算法代价最小。
Join Operation
Nested-loop join
- 两重循环
- 代价：
- nr * bs + br block transfers
- nr + br seeks
- nr是记录数，block中含有多个记录
Block nested-loop join
- 代价：
- Worst case estimate: br * bs + br block transfers + 2 * br seeks Each block in the inner relation s is read once for each block in the outer relation
- Best case: br + bs block transfers + 2 seeks.（内存足够大，每个表只要进入内存一次）
- 小关系作为外关系更好
- 内存有M块的情况：留1块作为output的缓存，外关系给M-2块，内关系反正每次要seek，只给1块。
Indexed nested-loop join
- 多块就是br/(M-2)
- 外关系小(nr小)的时候选这种方法
Merge-join
- 两个关系已经有序
Hash-join