calcite物化视图基于规则查询改写原理解析

1. 术语定义

物化视图:将视图的查询结果物化保存下来的结果。
物化视图 QueryRel: 生成物化视图的SQL关系表达式(查询语句)。
物化视图 TableRel:生成物化视图结果存储的关系表达式(存储物化视图的tableScan算子)。
COMPLETE : 查询表模型和物化视图表模型完全相同,比如查询引用了a,b,c三张表,物化视图也引用了a,b,c三张表。
VIEW_PARTIAL:查询表模型完全包含物化视图表模型,比如查询引用了a,b,c三张表,物化视图也引用了a,b两张表。
QUERY_PARTIAL: 物化试图表模型完全包含查询表模型,比如查询引用了a,b两张表,物化视图引用了a,b,c三张表。

2. 背景

物化视图指将SQL查询的结果保存下来。
查询使用物化视图改写是一种有效的加速方式,即将查询语句的全部或者部分改写成物化视图进行加速。
物化视图和查询完全等效可以直接命中查询,查询和物化视图不完全等效的情形下需要通过条件补偿,聚合上拉等方式,使用物化视图对于查询关系代数局部进行改写。

3. 问题定义

怎样通过基于规则的方式,使用物化视图对查询表达式进行局部改写?或者整体替换?

4. 概述

Calcite中基于规则UnifyRule查询改写的主要原理就是通过循环遍历查询SQL的RelNode关系表达式和生成物化视图QueryRelNode表达式,基于RelNode关系表达式命中对应的UnifyRule规则,如果匹配match UnifyRule规则,就调用对应规则的apply方法,使用物化结果的TableRel表达式对于查询SQL关系表达式进行改写。

5.流程图

5.1 结构关系示意图

在SubstitutionVistor中会使用UnifyRule,使用Target MutableNode 对于 Query MutableNode进行改写


结构关系示意图

5.2 视图替换流程图

图中相同的颜色表示相同的节点,视图替换核心流程示意图


物化视图匹配流程图.png

6.核心组件

UnifyRule:

使用物化视图target对查询关系表达式进行改写的规则,下面是源码到注释

  /** Rule that attempts to match a query relational expression
   * against a target relational expression.
   *
   * 

The rule declares the query and target types; this allows the * engine to fire only a few rules in a given context.

*/

UnifyRule子类如下

UnifyRule子类

SubstitutionVisitor:

替换查询关系表达式树的核心类,使用从下而上的替换算法,可以进行一定的改写和条件补偿,查询关系表达式和物化结果查询表达式不必完全相等

/**
 * Substitutes part of a tree of relational expressions with another tree.
 * 

Uses a bottom-up matching algorithm. Nodes do not need to be identical. * At each level, returns the residue.

*/

MutableRel

关系表达式RelNode在进行视图替换之前,会首先转换成MutableRel,之后使用MutableRel在SubstitutionVisitor中进行查询改写,当改写完成后,会把MutableRel再转成RelNode,它和RelNode是等价的,并且记录了它在父节点中的位置,便于视图替换的时候进行便利和回溯

/** Mutable equivalent of {@link RelNode}.
 *
 * 

Each node has mutable state, and keeps track of its parent and position * within parent. */

核心方法

org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.RelNode)

7. 视图替换核心流程

7.1替换过程数据

这里选择一个聚合上拉的例子分析下基于规则视图改写机制

物化视图SQL语句

select C,D, count(A) from "@jingda".employees
GROUP BY C,D

物化视图查询关系表达式

  LogicalAggregate(group=[{0, 1}], EXPR$2=[COUNT($2)])
    LogicalProject(C=[$2], D=[$3], A=[$0])
      ScanCrel(table=["@jingda".employees], columns=[`A`, `B`, `C`, `D`, `E`, `F`], splits=[1])

物化视图结果存储算子

LogicalProject(C=[$0], D=[$1], EXPR$2=[CAST($2):BIGINT NOT NULL])
  ScanCrel(table=["__accelerator"."7db4b655-d381-4cc8-ba6f-adc2c40d0153"."479ce684-efd6-4420-8a5b-68350789b8bb"], columns=[`C`, `D`, `EXPR$2`], splits=[3])

查询语句SQL语句

select D, count(A) from "@jingda".employees
GROUP BY D

查询语句关系表达式

 LogicalAggregate(group=[{0}], EXPR$1=[COUNT($1)])
  LogicalProject(D=[$3], A=[$0])
    ScanCrel(table=["@jingda".employees], columns=[`A`, `B`, `C`, `D`, `E`, `F`], splits=[1])

改写后的SQL语句示意

select D, sum(A) FROM
(select C,D, count(A) from "@jingda".employees GROUP BY C,D)

改写后的关系表达式
这个地方可以看到,查询关系表达式已经使用了提前物化好的结果进行了改写

LogicalAggregate(group=[{1}], EXPR$1=[$SUM0($2)])
  LogicalProject(C=[$0], D=[$1], EXPR$2=[CAST($2):BIGINT NOT NULL])
    ScanCrel(table=["__accelerator"."7db4b655-d381-4cc8-ba6f-adc2c40d0153"."479ce684-efd6-4420-8a5b-68350789b8bb"], columns=[`C`, `D`, `EXPR$2`], splits=[3])

7.2 数据流转图

数据流转图.png

图中初始为Query为查询的SQL语句,Target为生成物化视图的SQL语句,Replacement为物化视图存储的位置算子

经过第一轮是命中了CalcToCalcUnifyRule规则,对于底层下面的关系表达式进行改写变成了

Calc(program: (expr#0..2=[{inputs}], D=[$t1], A=[$t2]))
 Calc(program: (expr#0..5=[{inputs}], C=[$t2], D=[$t3], A=[$t0]))
   Scan(table: [@rp_test, employees])

第二轮是命中了AggregateOnCalcToAggregateUnifyRule规则,对于底层下面的关系表达式进行改写变成了

Aggregate(groupSet: {1}, groupSets: [{1}], calls: [$SUM0($2)])
  Aggregate(groupSet: {0, 1}, groupSets: [{0, 1}], calls: [COUNT($2)])

最后把整个查询的关系表达式改写成

Holder
  Aggregate(groupSet: {1}, groupSets: [{1}], calls: [$SUM0($2)])
    Project(projects: [$0, $1, CAST($2):BIGINT NOT NULL])
      Scan(table: [__accelerator, 1c4b39df-c7c2-4e40-aebb-dfa87faa80a9, 14c0517b-10e6-4d66-92d3-f68e451c4216])

7.3 核心代码分析

核心的代码在org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel)中

for (;;) {
      int count = 0;
      MutableRel queryDescendant = query;
    outer:
      while (queryDescendant != null) {
        for (Replacement r : attempted) {
          // 如果当前查询节点已经使用物化视图进行了替换,就搜索queryDescendant的另一个分支
          if (r.stopTrying && queryDescendant == r.after) {
            // This node has been replaced by previous iterations in the
            // hope to match its ancestors and stopTrying indicates
            // there's no need to be matched again.
            queryDescendant = MutableRels.preOrderTraverseNext(queryDescendant);
            continue outer;
          }
        }
        final MutableRel next = MutableRels.preOrderTraverseNext(queryDescendant);
        final MutableRel childOrNext =
            queryDescendant.getInputs().isEmpty()
                ? next : queryDescendant.getInputs().get(0);
        // 对于当前queryDescendant,遍历所有物化视图的关系表达式节点
        for (MutableRel targetDescendant : targetDescendants) {
         // 根据关系表达式节点获取可用的规则UnifyRule
          for (UnifyRule rule
              : applicableRules(queryDescendant, targetDescendant)) {
            UnifyRuleCall call =
                rule.match(this, queryDescendant, targetDescendant);
            if (call != null) {
              // 执行规则
              final UnifyResult result = rule.apply(call);
              if (result != null) {
                // 说明找到了匹配的物化视图,处理局部视图替换的逻辑
                ++count;
                attempted.add(
                    new Replacement(result.call.query, result.result, result.stopTrying));
                result.call.query.replaceInParent(result.result);

                // Replace previous equivalents with new equivalents, higher up
                // the tree.
                for (int i = 0; i < rule.slotCount; i++) {
                  Collection equi = equivalents.get(slots[i]);
                  if (!equi.isEmpty()) {
                    equivalents.remove(slots[i], equi.iterator().next());
                  }
                }
                assert rowTypesAreEquivalent(result.result, result.call.query, Litmus.THROW);
                equivalents.put(result.result, result.call.query);
                // 如果待改写的节点等于物化视图结果,进行改写替换
                if (targetDescendant == target) {
                  // A real substitution happens. We purge the attempted
                  // replacement list and add them into substitution list.
                  // Meanwhile we stop matching the descendants and jump
                  // to the next subtree in pre-order traversal.
                  if (!target.equals(replacement)) {
                    Replacement r = replace(
                        query.getInput(), target, replacement.clone());
                    assert r != null
                        : rule + "should have returned a result containing the target.";
                    attempted.add(r);
                  }
                  substitutions.add(ImmutableList.copyOf(attempted));
                  attempted.clear();
                  queryDescendant = next;
                  continue outer;
                }
                // We will try walking the query tree all over again to see
                // if there can be any substitutions after the replacement
                // attempt.
                break outer;
              }
            }
          }
        }
        queryDescendant = childOrNext;
      }
      // Quit the entire loop if:
      // 1) we have walked the entire query tree with one or more successful
      //    substitutions, thus count != 0 && attempted.isEmpty();
      // 2) we have walked the entire query tree but have made no replacement
      //    attempt, thus count == 0 && attempted.isEmpty();
      // 3) we had done some replacement attempt in a previous walk, but in
      //    this one we have not found any potential matches or substitutions,
      //    thus count == 0 && !attempted.isEmpty().
      if (count == 0 || attempted.isEmpty()) {
        break;
      }
    }
    if (!attempted.isEmpty()) {
      // We had done some replacement attempt in the previous walk, but that
      // did not lead to any substitutions in this walk, so we need to recover
      // the replacement.
      undoReplacement(attempted);
    }
    return substitutions;

8. 查询改写技术总结

查询改写在业界大概的分类有三种技术

  • 基于结构信息改写
  • 基于规则视图替换
  • 基于语法改写
    本文介绍的是基于规则的视图替换技术,核心就是寻找查询关系表达式和物化视图表达式的相同视图,进行局部改写和替换,后面会介绍基于结构信息改写的技术特性,下面是三种技术的简单对比。
查询改写技术对比

原创不易,转载请注明出处,谢谢!

你可能感兴趣的:(calcite物化视图基于规则查询改写原理解析)