跳转至

查询处理 Query Processing

Basic Steps

  • Parsing and translation translate the query into its internal form. This is then translated into relational algebra. Parser checks syntax, verifies relations
  • Optimization Amongst all equivalent evaluation plans choose the one with lowest cost.
  • Evaluation The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
  • image-20230612001751736
  • image-20230612003553870
  • 选择运算尽量先做(往下推)

衡量Query

  • $ t_s $:number of seek

  • $ t_t $:number of block transfer(read & write)

  • Cost for b block transfers plus S seeks

    b * $ t_t $ + S * $ t_s $

  • image-20230612004605830

  • image-20230612004618669

  • image-20230612004920316

  • 对select的条件进行排序:外部排序

  • image-20230618161811420

  • image-20230618161824886
  • 性能:
  • image-20230618161847075
  • $ (b_r/M) $ 为归并段数,$ log_{M-1}(b_r/M) $ 为轮次,2br为每次的传输消耗,最后一次+br,如果要写回磁盘就+2br。
  • image-20230618163335699
  • 外部排序(External Merge Sort) 中,给一个段run分配bb 块(而不是1块)作为缓冲,可以减少每轮合并(merge)的seek次数,但也可能增加merge的轮数。对于确定的关系大小br 和确定的内存块数M,理论上应该有一个最佳的bb取值,使得算法代价最小。
  • image-20230618163544585
  • image-20230618164207925

  • Join Operation

  • Nested-loop join

    • 两重循环
    • 代价:
    • nr * bs + br block transfers
    • nr + br seeks
    • nr是记录数,block中含有多个记录
  • Block nested-loop join

    • image-20230618164914151
    • 代价:
    • Worst case estimate: br * bs + br block transfers + 2 * br seeks Each block in the inner relation s is read once for each block in the outer relation
    • Best case: br + bs block transfers + 2 seeks.(内存足够大,每个表只要进入内存一次)
    • 小关系作为外关系更好

    • image-20230618165335108

    • 内存有M块的情况:留1块作为output的缓存,外关系给M-2块,内关系反正每次要seek,只给1块。
  • Indexed nested-loop join

    • image-20230618170934436
    • 多块就是br/(M-2)
    • 外关系小(nr小)的时候选这种方法
  • Merge-join

    • 两个关系已经有序
    • image-20230618205150053
  • Hash-join

  • image-20230619142752872