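SubLink: subqueries versus sublinks, and how they differ. A subquery is a sub-select that does not sit inside an expression; a sublink is a sub-select that sits inside an expression such as IN or EXISTS.
If the sub-select appears as a range table entry, it is a subquery;
if it appears as an expression, it is a sublink.
A sub-select that follows the FROM keyword is a subquery; a sub-select that appears in a constraint such as WHERE/ON, or in the projection (select list), is a sublink.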
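Sublink pull-up: the optimizer tries to process ANY and EXISTS sublinks as semi-joins or anti-semi-joins.
Pull-up requires all of the conditions below; if any one of them is violated, the sublink cannot be pulled up.
Right operand of the sublink: it must not contain any Var from the upper-level query.
Left operand of the sublink:
It must contain at least one Var that refers to the upper-level query; otherwise the sublink can be evaluated on its own and does not need to be joined with the upper level.
It must not reference relations that belong to higher query levels.
It must not contain volatile functions.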
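The semi-join and anti-semi-join semantics mentioned above can be checked on a toy data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in engine, the rows in t01/t02 are made up, and it compares query results only. It does not show what the PostgreSQL or Kingbase planner actually does internally.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t01 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE t02 (t01id INTEGER, v1 INTEGER);
    INSERT INTO t01 VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO t02 VALUES (1, 10), (1, 20), (3, 30);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# EXISTS sublink: the sub-select refers to a.id from the upper level.
exists_ids = ids("""
    SELECT a.id FROM t01 a
    WHERE EXISTS (SELECT 1 FROM t02 b WHERE b.t01id = a.id)""")

# ANY sublink written as IN (equivalent to "= ANY").
any_ids = ids("SELECT a.id FROM t01 a WHERE a.id IN (SELECT t01id FROM t02)")

# Semi-join written by hand: each t01 row is returned at most once,
# even though t02 holds two matching rows for id = 1.
semi_join_ids = ids("""
    SELECT a.id FROM t01 a
    JOIN (SELECT DISTINCT t01id FROM t02) b ON b.t01id = a.id""")

# NOT EXISTS sublink and the matching anti-semi-join written by hand.
not_exists_ids = ids("""
    SELECT a.id FROM t01 a
    WHERE NOT EXISTS (SELECT 1 FROM t02 b WHERE b.t01id = a.id)""")
anti_join_ids = ids("""
    SELECT a.id FROM t01 a
    LEFT JOIN t02 b ON b.t01id = a.id
    WHERE b.t01id IS NULL""")

print(exists_ids, any_ids, semi_join_ids)
print(not_exists_ids, anti_join_ids)
```

All three "matching" forms return the same ids, and both "non-matching" forms return the same ids, which is why the sublink forms can be replaced by a join without changing the result.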
Pull-up of a simple subquery:
select * from t01 as a, (select * from t02) as b where a.id = b.t01id and a.c1 = 100;
is transformed into:
select * from t01 as a, t02 as b where a.id = b.t01id and a.c1 = 100;
explain
select * from t01 as a,
  (select * from t02) as b
where a.id = b.t01id
  and a.c1 = 100;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop  (cost=9.55..3118.70 rows=899 width=55)
  ->  Bitmap Heap Scan on t01 a  (cost=4.35..32.11 rows=9 width=41)
        Recheck Cond: (c1 = 100)
        ->  Bitmap Index Scan on idx_t01_c1  (cost=0.00..4.35 rows=9 width=0)
              Index Cond: (c1 = 100)
  ->  Bitmap Heap Scan on t02  (cost=5.20..341.95 rows=100 width=14)
        Recheck Cond: (t01id = a.id)
        ->  Bitmap Index Scan on idx_t02_t01id  (cost=0.00..5.17 rows=100 width=0)
              Index Cond: (t01id = a.id)
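That the nested form and the flattened form of the simple subquery return the same rows can be verified with a small script. This is a sketch under assumptions: sqlite3 stands in for the real database, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t01 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE t02 (t01id INTEGER, v1 INTEGER);
    INSERT INTO t01 VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO t02 VALUES (1, 10), (1, 20), (3, 30);
""")

# Original form: t02 is wrapped in a subquery in the FROM clause.
nested = sorted(conn.execute("""
    SELECT a.id, a.c1, b.t01id, b.v1
    FROM t01 AS a, (SELECT * FROM t02) AS b
    WHERE a.id = b.t01id AND a.c1 = 100"""))

# Pulled-up form: the subquery is replaced by the base table.
flat = sorted(conn.execute("""
    SELECT a.id, a.c1, b.t01id, b.v1
    FROM t01 AS a, t02 AS b
    WHERE a.id = b.t01id AND a.c1 = 100"""))

print(nested == flat, flat)
```

Because the subquery adds no filtering, grouping, or ordering of its own, flattening it changes only how the join can be planned, not the result.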
When a subquery contains set operations, aggregates, or sort/limit/with/group clauses, pull-up is still supported if the filter on the key column uses a constant.
explain
select * from t01 as a,
  (select t01id, count(*) tups from t02 group by t01id) as b
where a.id = b.t01id
  and a.id = 100;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop  (cost=0.71..17.98 rows=100 width=53)
  ->  Index Scan using t01_pkey on t01 a  (cost=0.29..8.30 rows=1 width=41)
        Index Cond: (id = 100)
  ->  GroupAggregate  (cost=0.42..7.67 rows=100 width=12)
        Group Key: t02.t01id
        ->  Index Only Scan using idx_t02_t01id on t02  (cost=0.42..6.17 rows=100 width=4)
              Index Cond: (t01id = 100)
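In the plan above the constant on a.id ends up as an index condition on t02.t01id inside the aggregate. Why that is safe can be sketched as follows: since a.id = b.t01id and a.id is a constant, filtering the group key by the same constant inside the subquery cannot change the result. The script uses sqlite3 and invented rows, and the constant 1 replaces the document's 100 only because the toy table is small.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t01 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE t02 (t01id INTEGER, v1 INTEGER);
    INSERT INTO t01 VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO t02 VALUES (1, 10), (1, 20), (3, 30);
""")

# Constant filter written only on the outer key column.
outer_filter = conn.execute("""
    SELECT a.id, b.tups
    FROM t01 AS a,
         (SELECT t01id, count(*) AS tups FROM t02 GROUP BY t01id) AS b
    WHERE a.id = b.t01id AND a.id = 1""").fetchall()

# Same constant pushed by hand onto the group key inside the subquery.
pushed_down = conn.execute("""
    SELECT a.id, b.tups
    FROM t01 AS a,
         (SELECT t01id, count(*) AS tups FROM t02
          WHERE t01id = 1 GROUP BY t01id) AS b
    WHERE a.id = b.t01id AND a.id = 1""").fetchall()

print(outer_filter, pushed_down)
```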
Not every subquery can be pulled up. A subquery that contains set operations, aggregates, sort/limit/with/group clauses, volatile functions, an empty FROM, and so on cannot be pulled up when the filter on the key column is not a constant. For example:
explain
select * from t01 as a,
  (select t01id, count(*) tups from t02 group by t01id) as b
where a.id = b.t01id
  and a.c1 = 100;
QUERY PLAN
-----------------------------------------------------------------------------------
Hash Join  (cost=20467.23..20693.85 rows=9 width=53)
  Hash Cond: (t02.t01id = a.id)
  ->  HashAggregate  (cost=20435.00..20535.16 rows=10016 width=12)
        Group Key: t02.t01id
        ->  Seq Scan on t02  (cost=0.00..15435.00 rows=1000000 width=4)
  ->  Hash  (cost=32.11..32.11 rows=9 width=41)
        ->  Bitmap Heap Scan on t01 a  (cost=4.35..32.11 rows=9 width=41)
              Recheck Cond: (c1 = 100)
              ->  Bitmap Index Scan on idx_t01_c1  (cost=0.00..4.35 rows=9 width=0)
                    Index Cond: (c1 = 100)
For a subquery that contains set operations, aggregates, sort/limit/with/group clauses, volatile functions, an empty FROM, and so on, and whose key column is filtered by a non-constant, pull-up can still be achieved by rewriting the SQL.
The LATERAL keyword lets the join condition be written inside the subquery, so that the subquery is executed in a loop, once per outer row.
explain
select * from t01 as a,
  lateral (select t01id, count(*) tups, sum(b.v1) v1 from t02 as b where a.id = b.t01id group by t01id)
where a.c1 = 100;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop  (cost=9.56..3390.67 rows=900 width=85)
  ->  Bitmap Heap Scan on t01 a  (cost=4.35..32.11 rows=9 width=41)
        Recheck Cond: (c1 = 100)
        ->  Bitmap Index Scan on idx_t01_c1  (cost=0.00..4.35 rows=9 width=0)
              Index Cond: (c1 = 100)
  ->  GroupAggregate  (cost=5.21..371.17 rows=100 width=44)
        Group Key: b.t01id
        ->  Bitmap Heap Scan on t02 b  (cost=5.21..369.17 rows=101 width=10)
              Recheck Cond: (a.id = t01id)
              ->  Bitmap Index Scan on idx_t02_t01id  (cost=0.00..5.18 rows=101 width=0)
                    Index Cond: (t01id = a.id)
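The "executed in a loop" behaviour of LATERAL can be imitated by hand: fetch the outer rows, then run the aggregate once per outer row with that row's id as a parameter. The sketch below does this with sqlite3 (which has no LATERAL keyword, so the loop lives in Python) on invented rows, and checks that the result matches the plain join against the grouped subquery.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t01 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE t02 (t01id INTEGER, v1 INTEGER);
    INSERT INTO t01 VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO t02 VALUES (1, 10), (1, 20), (3, 30);
""")

# Outer side: only the rows that pass the filter on c1.
outer_rows = conn.execute(
    "SELECT id, c1 FROM t01 WHERE c1 = 100 ORDER BY id").fetchall()

# Inner side: one parameterised aggregate per outer row, like a nested loop.
lateral_rows = []
for a_id, a_c1 in outer_rows:
    inner = conn.execute(
        """SELECT t01id, count(*), sum(v1) FROM t02
           WHERE t01id = ? GROUP BY t01id""", (a_id,))
    for t01id, tups, v1 in inner:
        lateral_rows.append((a_id, a_c1, t01id, tups, v1))

# Reference: aggregate all of t02 first, then join.
joined_rows = conn.execute("""
    SELECT a.id, a.c1, b.t01id, b.tups, b.v1
    FROM t01 AS a
    JOIN (SELECT t01id, count(*) AS tups, sum(v1) AS v1
          FROM t02 GROUP BY t01id) AS b ON a.id = b.t01id
    WHERE a.c1 = 100
    ORDER BY a.id""").fetchall()

print(lateral_rows, joined_rows)
```

The loop only aggregates the t02 rows that belong to the selected t01 rows, whereas the reference query aggregates the whole table before joining; the outer row with id 2 has no t02 rows and therefore produces no output in either form.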
Use ANY with ARRAY to turn the join condition into a stable expression:
explain
with a as (select * from t01 as a
  where a.c1 = 100)
select * from a,
  (select t01id, count(*) tups, sum(v1) v1 from t02 group by t01id) as b
where a.id = b.t01id
  and b.t01id = any (array (select id from a));
QUERY PLAN
-----------------------------------------------------------------------------------
Hash Join  (cost=2659.27..2684.79 rows=43 width=84)
  Hash Cond: (t02.t01id = a.id)
  CTE a
    ->  Bitmap Heap Scan on t01 a_1  (cost=4.35..32.11 rows=9 width=41)
          Recheck Cond: (c1 = 100)
          ->  Bitmap Index Scan on idx_t01_c1  (cost=0.00..4.35 rows=9 width=0)
                Index Cond: (c1 = 100)
  InitPlan 2 (returns $1)
    ->  CTE Scan on a a_2  (cost=0.00..0.18 rows=9 width=4)
  ->  HashAggregate  (cost=2626.68..2638.63 rows=956 width=44)
        Group Key: t02.t01id
        ->  Bitmap Heap Scan on t02  (cost=52.08..2619.15 rows=1005 width=10)
              Recheck Cond: (t01id = ANY ($1))
              ->  Bitmap Index Scan on idx_t02_t01id  (cost=0.00..51.83 rows=1005 width=0)
                    Index Cond: (t01id = ANY ($1))
  ->  Hash  (cost=0.18..0.18 rows=9 width=40)
        ->  CTE Scan on a  (cost=0.00..0.18 rows=9 width=40)
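In the plan above the array of ids is computed once (InitPlan) and then used as a parameter when scanning t02. The same two-step idea can be sketched outside the planner: first collect the key values, then pass them as bound parameters to the aggregate. The script is an illustration only. It uses sqlite3, which has no ARRAY type, so an IN list with placeholders plays the role of `= any (array (...))`, and the rows are invented.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t01 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE t02 (t01id INTEGER, v1 INTEGER);
    INSERT INTO t01 VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO t02 VALUES (1, 10), (1, 20), (3, 30);
""")

# Step 1: evaluate the key list once, like the InitPlan in the plan above.
key_ids = [row[0] for row in conn.execute(
    "SELECT id FROM t01 WHERE c1 = 100 ORDER BY id")]

# Step 2: restrict the aggregate to those keys via bound parameters.
placeholders = ", ".join("?" for _ in key_ids)
filtered = conn.execute(f"""
    SELECT a.id, b.tups, b.v1
    FROM t01 AS a
    JOIN (SELECT t01id, count(*) AS tups, sum(v1) AS v1
          FROM t02 WHERE t01id IN ({placeholders})
          GROUP BY t01id) AS b ON a.id = b.t01id
    WHERE a.c1 = 100
    ORDER BY a.id""", key_ids).fetchall()

# Reference: the same join without the extra key filter.
unfiltered = conn.execute("""
    SELECT a.id, b.tups, b.v1
    FROM t01 AS a
    JOIN (SELECT t01id, count(*) AS tups, sum(v1) AS v1
          FROM t02 GROUP BY t01id) AS b ON a.id = b.t01id
    WHERE a.c1 = 100
    ORDER BY a.id""").fetchall()

print(key_ids, filtered, unfiltered)
```

The extra filter is redundant for correctness, since the join condition already restricts the keys; its purpose is to let the aggregate read only the relevant part of t02.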
To make the query easier to share and extend, the joined table can be embedded in the subquery, and the resulting subquery can be used to create a common view.
create view v_0102 as select (a).*, tups
from (select a, count(*) tups, sum(v1) v1
      from t02 b
      join t01 a on b.t01id = a.id
      group by b.t01id, a) t;
explain
select * from v_0102
where c1 = 100;
QUERY PLAN
-----------------------------------------------------------------------------------
Subquery Scan on t  (cost=141.10..168.10 rows=900 width=48)
  ->  GroupAggregate  (cost=141.10..159.10 rows=900 width=109)
        Group Key: b.t01id, a.*
        ->  Sort  (cost=141.10..143.35 rows=900 width=69)
              Sort Key: b.t01id, a.*
              ->  Nested Loop  (cost=4.78..96.94 rows=900 width=69)
                    ->  Bitmap Heap Scan on t01 a  (cost=4.35..32.11 rows=9 width=69)
                          Recheck Cond: (c1 = 1000)
                          ->  Bitmap Index Scan on idx_t01_c1  (cost=0.00..4.35 rows=9 width=0)
                                Index Cond: (c1 = 1000)
                    ->  Index Only Scan using idx_t02_t01id on t02 b  (cost=0.42..6.19 rows=101 width=4)
                          Index Cond: (t01id = a.id)
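The view approach can be tried on a toy data set as well. The sketch below is an approximation rather than a copy of the view above: sqlite3 has no whole-row `(a).*` syntax, so the view lists the t01 columns explicitly and groups by them, and the rows are invented. It checks that filtering the view on c1 gives the same rows as writing the join and aggregate inline.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t01 (id INTEGER PRIMARY KEY, c1 INTEGER);
    CREATE TABLE t02 (t01id INTEGER, v1 INTEGER);
    INSERT INTO t01 VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO t02 VALUES (1, 10), (1, 20), (3, 30);

    -- The join lives inside the view, so t01 columns can be filtered
    -- by callers without repeating the join logic.
    CREATE VIEW v_0102 AS
    SELECT a.id, a.c1, count(*) AS tups, sum(b.v1) AS v1
    FROM t02 AS b
    JOIN t01 AS a ON b.t01id = a.id
    GROUP BY a.id, a.c1;
""")

# Caller filters on a t01 column exposed by the view.
view_rows = conn.execute(
    "SELECT id, c1, tups FROM v_0102 WHERE c1 = 100 ORDER BY id").fetchall()

# Reference: the same logic written inline.
direct_rows = conn.execute("""
    SELECT a.id, a.c1, count(*) AS tups
    FROM t02 AS b
    JOIN t01 AS a ON b.t01id = a.id
    WHERE a.c1 = 100
    GROUP BY a.id, a.c1
    ORDER BY a.id""").fetchall()

print(view_rows, direct_rows)
```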
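Applications that access the database through SQL use subqueries heavily. Compared with joining two tables directly, a subquery is clearer in structure and in reasoning. In complex statements especially, a subquery carries more complete and more self-contained semantics, so the SQL expresses the business logic more clearly and is easier to understand. This is why subqueries are so widely used.
Subquery (SubQuery): corresponds to a range table entry in the query parse tree. Put more simply, it is a standalone SELECT statement that appears after FROM/JOIN.
Sublink (SubLink): corresponds to an expression in the query parse tree. Put more simply, it is a sub-select that appears in a WHERE/ON clause or in the select list.
In short, as far as the query parse tree is concerned, a SubQuery is essentially a range table entry, while a SubLink is essentially an expression.
In analytical systems and hybrid transactional/analytical systems, the most commonly used sublinks are exist_sublink and any_sublink, and the Kingbase optimizer applies a dedicated optimization to them (sublink pull-up). Because subqueries can be used so flexibly in SQL, statements can end up with overly complex subqueries that cause performance problems.
A complex SQL subquery can be rewritten into a sublink so that sublink pull-up applies. A pulled-up sublink can improve performance on small amounts of data, but as the data volume grows its execution time can greatly exceed that of a HASH JOIN over the full data set.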