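1. Terminology
Materialized view: the stored result of executing a view's query.
Materialized view QueryRel: the relational expression of the SQL query that defines the materialized view.
Materialized view TableRel: the relational expression that reads the stored result of the materialized view (the tableScan operator over the table that stores it).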
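COMPLETE: the query and the materialized view reference exactly the same tables. For example, the query references tables a, b, and c, and the materialized view also references a, b, and c.
VIEW_PARTIAL: the tables referenced by the query fully contain those referenced by the materialized view. For example, the query references a, b, and c, while the materialized view references only a and b.
QUERY_PARTIAL: the tables referenced by the materialized view fully contain those referenced by the query. For example, the query references a and b, while the materialized view references a, b, and c.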
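The three modes above amount to a set comparison between the tables referenced on each side. The following is a minimal sketch of that comparison, not Calcite's own code: the class name MatchModalitySketch, the method classify, and the extra NONE value (for table sets where neither side contains the other) are hypothetical names introduced only for illustration.

```java
import java.util.Set;

final class MatchModalitySketch {
  // NONE is a hypothetical addition for table sets that do not contain each other.
  enum Modality { COMPLETE, VIEW_PARTIAL, QUERY_PARTIAL, NONE }

  // Classify by comparing the table sets referenced by the query and by the view.
  static Modality classify(Set<String> queryTables, Set<String> viewTables) {
    if (queryTables.equals(viewTables)) {
      return Modality.COMPLETE;
    }
    if (queryTables.containsAll(viewTables)) {
      return Modality.VIEW_PARTIAL;   // query tables contain the view tables
    }
    if (viewTables.containsAll(queryTables)) {
      return Modality.QUERY_PARTIAL;  // view tables contain the query tables
    }
    return Modality.NONE;
  }

  public static void main(String[] args) {
    System.out.println(classify(Set.of("a", "b", "c"), Set.of("a", "b", "c"))); // COMPLETE
    System.out.println(classify(Set.of("a", "b", "c"), Set.of("a", "b")));      // VIEW_PARTIAL
    System.out.println(classify(Set.of("a", "b"), Set.of("a", "b", "c")));      // QUERY_PARTIAL
  }
}
```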
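2. Background
A materialized view stores the result of a SQL query.
Rewriting queries to use materialized views is an effective way to speed them up: all or part of the query is rewritten to read from the materialized view.
When the materialized view is fully equivalent to the query, the query can be answered from it directly. When the two are not fully equivalent, the materialized view can still be used to rewrite part of the query's relational algebra, using techniques such as condition compensation and aggregate pull-up.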
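To make condition compensation concrete, here is a small sketch under assumed data. The view filter (x > 5), the query filter (x > 10), and all class and method names are hypothetical. Because every row satisfying x > 10 also satisfies x > 5, the query can be answered from the view by applying the query's own filter as a residual condition on the view's rows.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FilterCompensationSketch {
  // Hypothetical materialized view: SELECT x FROM t WHERE x > 5
  static List<Integer> materialize(List<Integer> t) {
    return t.stream().filter(x -> x > 5).collect(Collectors.toList());
  }

  // Original query evaluated on the base table: SELECT x FROM t WHERE x > 10
  static List<Integer> queryDirect(List<Integer> t) {
    return t.stream().filter(x -> x > 10).collect(Collectors.toList());
  }

  // Rewritten query: read the view and apply the compensating filter x > 10
  static List<Integer> queryFromView(List<Integer> view) {
    return view.stream().filter(x -> x > 10).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> t = List.of(3, 7, 12, 20);
    System.out.println(queryDirect(t));                 // [12, 20]
    System.out.println(queryFromView(materialize(t)));  // [12, 20]
  }
}
```

The rewrite is only valid in this direction: a view filtered by x > 10 could not answer a query that asks for x > 5, because the rows between 5 and 10 were never stored.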
3. Problem Statement
How can a rule-based approach use a materialized view to rewrite part of a query expression, or to replace it entirely?
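4. Overview
Rule-based (UnifyRule) query rewriting in Calcite works by iterating over the RelNode relational expression of the query SQL and the QueryRel expression that defines the materialized view. For each pair of nodes it looks up the applicable UnifyRule rules. If a rule's match succeeds, the rule's apply method is called, and the query's relational expression is rewritten using the materialized view's TableRel expression.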
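5. Diagrams
5.1 Structure Diagram
SubstitutionVisitor uses UnifyRule instances to rewrite the Query MutableNode with the Target MutableNode.
5.2 View Substitution Flowchart
Core flow of view substitution. Nodes drawn in the same color in the figure are the same node.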
6. Core Components
UnifyRule:
A rule that rewrites the query relational expression using the materialized view target. The Javadoc in the source reads:
/** Rule that attempts to match a query relational expression
* against a target relational expression.
*
* The rule declares the query and target types; this allows the
* engine to fire only a few rules in a given context.
*/
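The subclasses of UnifyRule are as follows.
SubstitutionVisitor:
The core class for substituting part of the query's relational expression tree. It uses a bottom-up matching algorithm and can perform some rewriting and condition compensation, so the query expression and the materialized view's query expression do not have to be identical.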
/**
* Substitutes part of a tree of relational expressions with another tree.
* Uses a bottom-up matching algorithm. Nodes do not need to be identical.
* At each level, returns the residue.
*/
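MutableRel:
Before view substitution, the RelNode relational expression is first converted to a MutableRel. SubstitutionVisitor performs the rewrite on the MutableRel tree, and once the rewrite is complete the MutableRel is converted back to a RelNode. A MutableRel is equivalent to a RelNode, and it also records its parent and its position within that parent, which makes traversal and backtracking during view substitution easier.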
/** Mutable equivalent of {@link RelNode}.
*
* Each node has mutable state, and keeps track of its parent and position
* within parent.
*/
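The value of tracking the parent and the position within the parent is that a node can be swapped out in place without searching the tree. The sketch below is a toy model of that idea, not Calcite's MutableRel. The class Node and its members are hypothetical names, and the operator names are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class MutableNodeSketch {
  static final class Node {
    final String name;
    final List<Node> inputs = new ArrayList<>();
    Node parent;          // null for the root
    int ordinalInParent;  // index of this node in parent.inputs

    Node(String name, Node... children) {
      this.name = name;
      for (Node child : children) {
        child.parent = this;
        child.ordinalInParent = inputs.size();
        inputs.add(child);
      }
    }

    // Swap this node for another one in the parent's input list, in place.
    void replaceInParent(Node replacement) {
      if (parent == null) {
        throw new IllegalStateException("the root has no parent");
      }
      parent.inputs.set(ordinalInParent, replacement);
      replacement.parent = parent;
      replacement.ordinalInParent = ordinalInParent;
      parent = null;  // this node is now detached
    }

    @Override public String toString() {
      return inputs.isEmpty() ? name : name + inputs;
    }
  }

  public static void main(String[] args) {
    Node agg = new Node("Aggregate", new Node("Project", new Node("Scan(employees)")));
    Node root = new Node("Holder", agg);
    agg.replaceInParent(new Node("Scan(mv)"));
    System.out.println(root);  // Holder[Scan(mv)]
  }
}
```

Wrapping the tree in a Holder root, as in this toy example, means even the top query operator has a parent, so it can be replaced the same way as any other node.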
Core method
org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.RelNode)
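7. Core View Substitution Flow
7.1 Data for the Substitution Walkthrough
This section uses an aggregate pull-up example to analyze how rule-based view rewriting works.
Materialized view SQL statement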
select C,D, count(A) from "@jingda".employees
GROUP BY C,D
Materialized view query relational expression
LogicalAggregate(group=[{0, 1}], EXPR$2=[COUNT($2)])
  LogicalProject(C=[$2], D=[$3], A=[$0])
    ScanCrel(table=["@jingda".employees], columns=[`A`, `B`, `C`, `D`, `E`, `F`], splits=[1])
Operator that reads the stored materialized view result
LogicalProject(C=[$0], D=[$1], EXPR$2=[CAST($2):BIGINT NOT NULL])
  ScanCrel(table=["__accelerator"."7db4b655-d381-4cc8-ba6f-adc2c40d0153"."479ce684-efd6-4420-8a5b-68350789b8bb"], columns=[`C`, `D`, `EXPR$2`], splits=[3])
Query SQL statement
select D, count(A) from "@jingda".employees
GROUP BY D
Query relational expression
LogicalAggregate(group=[{0}], EXPR$1=[COUNT($1)])
  LogicalProject(D=[$3], A=[$0])
    ScanCrel(table=["@jingda".employees], columns=[`A`, `B`, `C`, `D`, `E`, `F`], splits=[1])
Sketch of the rewritten SQL statement
select D, sum(cnt) FROM
(select C, D, count(A) as cnt from "@jingda".employees GROUP BY C, D)
GROUP BY D
Rewritten relational expression
Here the query's relational expression has been rewritten to read the pre-materialized result:
LogicalAggregate(group=[{1}], EXPR$1=[$SUM0($2)])
  LogicalProject(C=[$0], D=[$1], EXPR$2=[CAST($2):BIGINT NOT NULL])
    ScanCrel(table=["__accelerator"."7db4b655-d381-4cc8-ba6f-adc2c40d0153"."479ce684-efd6-4420-8a5b-68350789b8bb"], columns=[`C`, `D`, `EXPR$2`], splits=[3])
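The rewrite is valid because a COUNT grouped by (C, D) can be rolled up to a COUNT grouped by D by summing the partial counts. The following sketch checks this on a few in-memory rows. The sample data and all class and method names are hypothetical, and plain Java maps stand in for the tables.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountRollupSketch {
  static final class Employee {
    final String a, c, d;  // a may be null; COUNT(A) ignores nulls
    Employee(String a, String c, String d) { this.a = a; this.c = c; this.d = d; }
  }

  // Original query: SELECT D, COUNT(A) FROM employees GROUP BY D
  static Map<String, Long> queryDirect(List<Employee> rows) {
    Map<String, Long> out = new TreeMap<>();
    for (Employee e : rows) {
      out.merge(e.d, e.a != null ? 1L : 0L, Long::sum);
    }
    return out;
  }

  // Materialized view: SELECT C, D, COUNT(A) FROM employees GROUP BY C, D
  static Map<List<String>, Long> materialize(List<Employee> rows) {
    Map<List<String>, Long> out = new HashMap<>();
    for (Employee e : rows) {
      out.merge(List.of(e.c, e.d), e.a != null ? 1L : 0L, Long::sum);
    }
    return out;
  }

  // Rewritten query: SELECT D, SUM(cnt) FROM mv GROUP BY D
  static Map<String, Long> queryFromView(Map<List<String>, Long> mv) {
    Map<String, Long> out = new TreeMap<>();
    for (Map.Entry<List<String>, Long> e : mv.entrySet()) {
      out.merge(e.getKey().get(1), e.getValue(), Long::sum);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Employee> rows = List.of(
        new Employee("a1", "c1", "d1"),
        new Employee("a2", "c2", "d1"),
        new Employee(null, "c1", "d1"),
        new Employee("a3", "c1", "d2"));
    System.out.println(queryDirect(rows));                 // {d1=2, d2=1}
    System.out.println(queryFromView(materialize(rows)));  // {d1=2, d2=1}
  }
}
```

The same reasoning would not carry over to an aggregate such as COUNT(DISTINCT A), since partial distinct counts cannot simply be added together.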
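7.2 Data Flow Diagram
In the figure, Query is initially the SQL statement of the query, Target is the SQL statement that defines the materialized view, and Replacement is the operator that reads the stored materialized view.
The first round matches the CalcToCalcUnifyRule rule, which rewrites the lower part of the relational expression into:
Calc(program: (expr#0..2=[{inputs}], D=[$t1], A=[$t2]))
  Calc(program: (expr#0..5=[{inputs}], C=[$t2], D=[$t3], A=[$t0]))
    Scan(table: [@rp_test, employees])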
The second round matches the AggregateOnCalcToAggregateUnifyRule rule, which rewrites the relational expression into:
Aggregate(groupSet: {1}, groupSets: [{1}], calls: [$SUM0($2)])
  Aggregate(groupSet: {0, 1}, groupSets: [{0, 1}], calls: [COUNT($2)])
Finally, the entire query relational expression is rewritten into:
Holder
  Aggregate(groupSet: {1}, groupSets: [{1}], calls: [$SUM0($2)])
    Project(projects: [$0, $1, CAST($2):BIGINT NOT NULL])
      Scan(table: [__accelerator, 1c4b39df-c7c2-4e40-aebb-dfa87faa80a9, 14c0517b-10e6-4d66-92d3-f68e451c4216])
7.3 Core Code Analysis
The core code is in org.apache.calcite.plan.SubstitutionVisitor#go(org.apache.calcite.rel.mutable.MutableRel):
for (;;) {
  int count = 0;
  MutableRel queryDescendant = query;
outer:
  while (queryDescendant != null) {
    for (Replacement r : attempted) {
      // If this query node has already been replaced using the materialized
      // view, move on to the next branch after queryDescendant.
      if (r.stopTrying && queryDescendant == r.after) {
        // This node has been replaced by previous iterations in the
        // hope to match its ancestors and stopTrying indicates
        // there's no need to be matched again.
        queryDescendant = MutableRels.preOrderTraverseNext(queryDescendant);
        continue outer;
      }
    }
    final MutableRel next = MutableRels.preOrderTraverseNext(queryDescendant);
    final MutableRel childOrNext =
        queryDescendant.getInputs().isEmpty()
            ? next : queryDescendant.getInputs().get(0);
    // For the current queryDescendant, iterate over every node of the
    // materialized view's relational expression.
    for (MutableRel targetDescendant : targetDescendants) {
      // Look up the UnifyRule rules applicable to this pair of nodes.
      for (UnifyRule rule
          : applicableRules(queryDescendant, targetDescendant)) {
        UnifyRuleCall call =
            rule.match(this, queryDescendant, targetDescendant);
        if (call != null) {
          // Apply the rule.
          final UnifyResult result = rule.apply(call);
          if (result != null) {
            // A match against the materialized view was found; handle the
            // partial view replacement.
            ++count;
            attempted.add(
                new Replacement(result.call.query, result.result, result.stopTrying));
            result.call.query.replaceInParent(result.result);
            // Replace previous equivalents with new equivalents, higher up
            // the tree.
            for (int i = 0; i < rule.slotCount; i++) {
              Collection<MutableRel> equi = equivalents.get(slots[i]);
              if (!equi.isEmpty()) {
                equivalents.remove(slots[i], equi.iterator().next());
              }
            }
            assert rowTypesAreEquivalent(result.result, result.call.query, Litmus.THROW);
            equivalents.put(result.result, result.call.query);
            // If the matched target node is the root of the materialized
            // view's query, perform the actual substitution.
            if (targetDescendant == target) {
              // A real substitution happens. We purge the attempted
              // replacement list and add them into substitution list.
              // Meanwhile we stop matching the descendants and jump
              // to the next subtree in pre-order traversal.
              if (!target.equals(replacement)) {
                Replacement r = replace(
                    query.getInput(), target, replacement.clone());
                assert r != null
                    : rule + "should have returned a result containing the target.";
                attempted.add(r);
              }
              substitutions.add(ImmutableList.copyOf(attempted));
              attempted.clear();
              queryDescendant = next;
              continue outer;
            }
            // We will try walking the query tree all over again to see
            // if there can be any substitutions after the replacement
            // attempt.
            break outer;
          }
        }
      }
    }
    queryDescendant = childOrNext;
  }
  // Quit the entire loop if:
  // 1) we have walked the entire query tree with one or more successful
  //    substitutions, thus count != 0 && attempted.isEmpty();
  // 2) we have walked the entire query tree but have made no replacement
  //    attempt, thus count == 0 && attempted.isEmpty();
  // 3) we had done some replacement attempt in a previous walk, but in
  //    this one we have not found any potential matches or substitutions,
  //    thus count == 0 && !attempted.isEmpty().
  if (count == 0 || attempted.isEmpty()) {
    break;
  }
}
if (!attempted.isEmpty()) {
  // We had done some replacement attempt in the previous walk, but that
  // did not lead to any substitutions in this walk, so we need to recover
  // the replacement.
  undoReplacement(attempted);
}
return substitutions;
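The loop above walks the query tree in pre-order with two kinds of step: childOrNext descends into the first input, while MutableRels.preOrderTraverseNext moves to the next node outside the current subtree. The sketch below models that behavior on a toy tree. It is inferred from how the loop uses the method, so treat it as an assumption rather than Calcite's implementation. Node, nextSkippingSubtree, and walk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PreOrderWalkSketch {
  static final class Node {
    final String name;
    final List<Node> inputs = new ArrayList<>();
    Node parent;
    int ordinalInParent;

    Node(String name, Node... children) {
      this.name = name;
      for (Node child : children) {
        child.parent = this;
        child.ordinalInParent = inputs.size();
        inputs.add(child);
      }
    }
  }

  // Next node in pre-order that is not below `node`: the nearest right
  // sibling of the node itself or of one of its ancestors, or null if none.
  static Node nextSkippingSubtree(Node node) {
    Node parent = node.parent;
    int ordinal = node.ordinalInParent + 1;
    while (parent != null) {
      if (parent.inputs.size() > ordinal) {
        return parent.inputs.get(ordinal);
      }
      node = parent;
      parent = node.parent;
      ordinal = node.ordinalInParent + 1;
    }
    return null;
  }

  // Full pre-order walk: descend into the first input when there is one,
  // otherwise jump to the next node outside the current subtree.
  static List<String> walk(Node root) {
    List<String> order = new ArrayList<>();
    Node current = root;
    while (current != null) {
      order.add(current.name);
      current = current.inputs.isEmpty()
          ? nextSkippingSubtree(current) : current.inputs.get(0);
    }
    return order;
  }

  public static void main(String[] args) {
    Node left = new Node("Project", new Node("ScanA"));
    Node right = new Node("Filter", new Node("ScanB"));
    Node root = new Node("Join", left, right);
    System.out.println(walk(root));                      // [Join, Project, ScanA, Filter, ScanB]
    System.out.println(nextSkippingSubtree(left).name);  // Filter
  }
}
```

Skipping a subtree is what the `continue outer` branches rely on: once a node has been substituted, its descendants are not matched again and the walk resumes at the next branch.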
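8. Summary of Query Rewriting Techniques
Query rewriting techniques in the industry fall into roughly three categories:
- Rewriting based on structural information
- Rule-based view substitution
- Syntax-based rewriting
This article covers rule-based view substitution. Its core idea is to find the parts that the query's relational expression and the materialized view's expression have in common, and then rewrite or replace those parts. The characteristics of rewriting based on structural information will be covered later. A brief comparison of the three techniques follows.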