MySQL multi-table join order computation: notes on the greedy algorithm

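Algorithm flow:

1. Sort the tables that take part in the join by their estimated row count (rows).

2. Then search for the best join order with a depth-first traversal.

Note: if the number of tables is small, the search degenerates into exhaustive enumeration. If there are too many tables, the search only probes down to the configured depth and then stops.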
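Step 1 can be illustrated with a minimal Python sketch. It assumes ascending order (fewest rows first) and ignores anything else the real comparator may take into account; the `Table` type, the table names, and `presort_by_rows` are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class Table(NamedTuple):
    name: str
    rows: int  # estimated row count (hypothetical numbers below)


def presort_by_rows(tables):
    """Return the tables ordered by ascending estimated row count."""
    return sorted(tables, key=lambda t: t.rows)


tables = [Table("orders", 50_000), Table("countries", 200), Table("users", 3_000)]
ordered = presort_by_rows(tables)
print([t.name for t in ordered])
```

Starting the search from small tables tends to produce a cheap complete plan early, which gives the later pruning step a tight bound to compare against.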

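Idea:

At every step the tables are split into two sets: the set A of tables that are already joined, and the set A' of tables that are not yet joined. The join order inside A is already fixed and its cost has already been computed.

Each step takes the best partial plan for A and tries it against every table in A'. The trying is written recursively, i.e. it is a depth-first traversal (and yes, the code is hard to read).

At the start A is empty. After one call to BestExtension, the best table to join next has been found and is put into A.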

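Level N:

BestExtension is implemented recursively. Before recursing, it saves a copy of the current order of the rows-ordered table array, and restores that order once the recursive calls at this level have finished.

This guarantees that probing does not destroy the order of that array.

The implementation is a for loop: each iteration takes one table, swaps it into position idx, removes it from the unjoined set, and passes the remaining tables A' to the recursive call.

...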
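The save, swap, and restore pattern described above can be sketched as follows. This is an illustrative Python model, not the MySQL code: a plain list stands in for `join->best_ref`, and `probe` and `visit` are hypothetical names. Cost calculation and pruning are left out so that only the bookkeeping is visible.

```python
def probe(best_ref, idx, visit):
    """Enumerate join orders in place.

    best_ref[:idx] plays the role of set A (already joined),
    best_ref[idx:] plays the role of set A' (not yet joined).
    """
    if idx == len(best_ref):
        visit(list(best_ref))  # reached the bottom: one complete join order
        return
    saved = best_ref[idx:]  # save the order of the unjoined tail
    for pos in range(idx, len(best_ref)):
        # Swap the candidate table into slot idx, so it becomes part of A.
        best_ref[idx], best_ref[pos] = best_ref[pos], best_ref[idx]
        probe(best_ref, idx + 1, visit)
    best_ref[idx:] = saved  # restore the tail before returning to the caller


seen = []
order = ["t1", "t2", "t3"]
probe(order, 0, seen.append)
print(len(seen), order)
```

Because every level restores its own tail before returning, the caller always sees the array in the state it had before the call, which is why a single swap per iteration is enough to give each table its turn at slot idx.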

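Level 0: after several recursive stack frames, the A' set that is finally passed in contains only one table, say T7. The best access path for T7 is computed and its cost is added to the cost passed in (the cost of the join order in set A). At this point the depth-first traversal has reached the bottom of the recursion ("bottoming out"). The cost of the best plan so far is still DBL_MAX, so after the comparison the current plan naturally replaces DBL_MAX and becomes the best plan and best cost.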

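Level 1: after bottoming out, control returns to the level above, where A' contains 2 tables. The other table is probed and the search bottoms out again, or the cost already exceeds the current best plan and the probe is abandoned. Control then returns to the level above.

...

Back at level N, the A' set has grown accordingly.

The higher the level, the more unjoined tables there are; at the bottom level there is exactly 1 unjoined table.

Probing rule: if the cost accumulated so far already reaches or exceeds the current best cost, stop probing this branch. Otherwise keep probing until the bottom is reached, which yields the cost of a complete join order in the current branch. If that cost is the best so far, it replaces the current best; otherwise return to the level above and continue probing.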
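The probing rule is a form of branch and bound, and a small Python sketch may make it concrete. Everything here is an assumption made for illustration: the row counts in `ROWS`, the toy cost model (the cost of joining a table is the current row count times the table's rows, with no selectivity), and the `search` function itself. `math.inf` stands in for DBL_MAX.

```python
import math

ROWS = {"a": 10, "b": 2, "c": 5}  # hypothetical estimated row counts


def search(plan, cost, fanout, remaining, best):
    """Depth-first search over join orders with cost-based pruning."""
    for t in sorted(remaining):
        step = fanout * ROWS[t]  # toy cost of joining t next
        new_cost = cost + step
        if new_cost >= best["cost"]:
            best["pruned"] += 1  # already no better than the best plan: stop here
            continue
        rest = remaining - {t}
        if rest:
            # In this toy model the row count after the join equals the step cost.
            search(plan + [t], new_cost, step, rest, best)
        else:
            # Bottomed out with a cheaper complete plan: replace the best plan.
            best["cost"], best["plan"] = new_cost, plan + [t]


best = {"cost": math.inf, "plan": None, "pruned": 0}
search([], 0, 1, frozenset(ROWS), best)
print(best["plan"], best["cost"], best["pruned"])
```

With these numbers the first complete plan found is a, b, c with cost 130; later branches either improve on it or are cut off, and the search ends with the order b, c, a at cost 112.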

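There are two extreme cases:

1) When the number of tables to join does not exceed search_depth, the search degenerates into a depth-first exhaustive enumeration that determines the optimal execution plan.

2) When search_depth = 1, the function degenerates into an "extremely" greedy algorithm: at each step it takes the cheapest table among the remaining ones and uses it to extend the current execution plan.

All other cases fall somewhere between these two.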

Source outline:

1. Top level:

procedure greedy_search

input: remaining_tables

output: pplan;

{

    pplan = <>;

    do {

        (t, a) = best_extension(pplan, remaining_tables);

        pplan = concat(pplan, (t, a));

        remaining_tables = remaining_tables - t;

    } while (remaining_tables != {})

    return pplan;

}

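Here (t, a) means that each call to best_extension returns the next table t to join, together with the access method a chosen for it. In the code above, the execution plan is extended by the function best_extension. pplan starts out empty, and when the do loop ends the final execution plan is returned.

2. Core algorithm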
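The top-level loop can be made runnable with a short Python sketch. The `best_extension` below is only a hypothetical stand-in that picks the remaining table with the fewest rows and always reports the access method "scan"; the real function performs the limited-depth search outlined in the next part. The row counts are made up.

```python
ROWS = {"a": 10, "b": 2, "c": 5}  # hypothetical estimated row counts


def best_extension(pplan, remaining_tables):
    """Stand-in: return (t, a) for the cheapest next table in this toy model."""
    t = min(remaining_tables, key=lambda name: ROWS[name])
    return t, "scan"  # the access method is a placeholder


def greedy_search(remaining_tables):
    pplan = []  # the partial plan starts out empty
    remaining = set(remaining_tables)
    while remaining:
        t, a = best_extension(pplan, remaining)
        pplan.append((t, a))  # pplan = concat(pplan, (t, a))
        remaining.remove(t)  # remaining_tables = remaining_tables - t
    return pplan


print(greedy_search(ROWS))
```

The sketch uses a `while` loop instead of the pseudocode's do/while so that an empty input simply returns an empty plan.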

procedure best_extension_by_limited_search(

    pplan in, // in, partial plan of tables-joined-so-far

    pplan_cost, // in, cost of pplan

    remaining_tables, // in, set of tables not referenced in pplan

    best_plan_so_far, // in/out, best plan found so far

    best_plan_so_far_cost,// in/out, cost of best_plan_so_far

    search_depth) // in, maximum size of the plans being considered

{

    // Save the original order of the rows-ordered table array

    memcpy(saved_refs, join->best_ref + idx, sizeof(JOIN_TAB*) * (join->tables - idx));

    // Recursively try every table in A'

    for each table T from remaining_tables

    {

        // Each iteration swaps the table being probed into slot idx; the table at idx is the one placed into set A for this probe

        swap_variables(JOIN_TAB*, join->best_ref[idx], *pos);

        // Calculate the cost of using table T as above

        cost = complex-series-of-calculations;

        // Add the cost to the cost so far.

        pplan_cost+= cost;

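        // Skip this table if the cost already reaches or exceeds the best cost so far

        if (pplan_cost >= best_plan_so_far_cost)

            continue;

        // Compute the best access method for the current table (scan, seek)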

        pplan= expand pplan by best_access_method;

        remaining_tables= remaining_tables - table T;

        if (remaining_tables is not an empty set

            and search_depth > 1)

        {

          best_extension_by_limited_search(pplan, pplan_cost,

            remaining_tables,

            best_plan_so_far,

            best_plan_so_far_cost,

            search_depth - 1);

        }

        else

        {

            best_plan_so_far_cost= pplan_cost;

            best_plan_so_far= pplan;

        }

    }

fun_end:

    // Restore the order of the rows-ordered table array

    memcpy(join->best_ref + idx, saved_refs, sizeof(JOIN_TAB*) * (join->tables-idx));

}
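To see how search_depth changes the result, here is a runnable Python sketch of the pseudocode above combined with the top-level loop. It is a simplified model under stated assumptions, not the MySQL implementation: `ROWS`, `SEL`, and the cost model in `extend` are invented, and the outer loop is assumed to keep only the first table of the best extension found in each round.

```python
import math

ROWS = {"a": 10, "b": 12, "c": 100}  # hypothetical estimated row counts
SEL = {frozenset("bc"): 0.125}  # hypothetical selectivity of joining b with c


def extend(plan, fanout, t):
    """Toy cost model: return (step_cost, new_fanout) for joining t next."""
    step = fanout * ROWS[t]
    sel = math.prod(SEL.get(frozenset((t, u)), 1.0) for u in plan)
    return step, step * sel


def limited_search(plan, cost, fanout, remaining, best, depth):
    for t in sorted(remaining):
        step, new_fanout = extend(plan, fanout, t)
        new_cost = cost + step  # local value, so sibling tables are unaffected
        if new_cost >= best["cost"]:
            continue  # no better than the best plan so far
        rest = remaining - {t}
        if rest and depth > 1:
            limited_search(plan + [t], new_cost, new_fanout, rest, best, depth - 1)
        else:
            best["cost"], best["plan"] = new_cost, plan + [t]


def greedy_search(tables, search_depth):
    plan, cost, fanout = [], 0.0, 1.0
    remaining = frozenset(tables)
    while remaining:
        best = {"cost": math.inf, "plan": None}
        limited_search(plan, cost, fanout, remaining, best, search_depth)
        t = best["plan"][len(plan)]  # keep only the first table of the best extension
        step, fanout = extend(plan, fanout, t)
        cost += step
        plan = plan + [t]
        remaining -= {t}
    return plan, cost


print(greedy_search(ROWS, 1))  # purely greedy
print(greedy_search(ROWS, 3))  # depth covers all tables: exhaustive
```

With these made-up numbers, search_depth = 1 starts with the smallest table and ends with the order a, b, c at cost 12130, while search_depth = 3 looks far enough ahead to notice the selective b-c join and finds b, c, a at cost 2712.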
