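ClickHouse can optimize simple queries automatically. EXPLAIN SYNTAX shows a query after these syntax-level rewrites, so you can see what was and was not optimized and tune the query by hand. Complex queries usually require combining this with other techniques.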
CREATE TABLE deleteme
(
`number` UInt64
)
ENGINE = MergeTree
PARTITION BY number % 10
ORDER BY number AS
SELECT number
FROM numbers(100000000)
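Because the table is partitioned by number % 10, the 100 million rows should be spread evenly over 10 partitions. As a quick sanity check, a query along these lines against the system.parts table lists the rows per partition. This is a minimal sketch that assumes the table lives in the current database; the number of active parts per partition depends on how far background merges have progressed.

```sql
-- Rows and active data parts per partition of the deleteme table.
-- Assumes deleteme was created in the current database.
SELECT
    partition,
    sum(rows) AS total_rows,
    count() AS active_parts
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'deleteme'
  AND active
GROUP BY partition
ORDER BY partition
```

Each of the partitions 0 through 9 should report 10 million rows, since numbers(100000000) yields every integer from 0 to 99999999.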
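This statement creates a table with 10 partitions. Next, we run a query that only needs one of them, the partition where number % 10 = 1.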
SELECT sum(number)
FROM
(
SELECT number
FROM deleteme
)
WHERE (number % 10) = 1
Query id: 5a187605-22e9-4303-b2d7-b429d38832f3
┌─────sum(number)─┐
│ 499999960000000 │
└─────────────────┘
1 rows in set. Elapsed: 0.065 sec. Processed 10.00 million rows, 80.00 MB (154.64 million rows/s., 1.24 GB/s.)
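ClickHouse only had to read 10 million rows.
If you run EXPLAIN SYNTAX, you will see that ClickHouse rewrote the query automatically and pushed the WHERE filter down into the subquery. That is why it only needs to process 1/10 of the table.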
EXPLAIN SYNTAX
SELECT sum(number)
FROM
(
SELECT number
FROM deleteme
)
WHERE (number % 10) = 1
Query id: babfaeff-5904-486f-9c6a-caad90392e0f
┌─explain─────────────────────┐
│ SELECT sum(number) │
│ FROM │
│ ( │
│ SELECT number │
│ FROM deleteme │
│ WHERE (number % 10) = 1 │
│ ) │
│ WHERE (number % 10) = 1 │
└─────────────────────────────┘
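EXPLAIN SYNTAX only shows the rewritten query text. To see whether the pushed-down filter actually prunes partitions, one option is the plan-level EXPLAIN with the indexes setting enabled. The sketch below reuses the same query; the exact layout of the output differs between ClickHouse versions, so no output is shown here.

```sql
-- Plan-level EXPLAIN: with indexes = 1, the MergeTree read step reports
-- how many parts and granules remain after partition and primary key pruning.
EXPLAIN indexes = 1
SELECT sum(number)
FROM
(
    SELECT number
    FROM deleteme
)
WHERE (number % 10) = 1
```

In the output, look for the partition pruning entry under the read step and compare the selected parts with the total parts.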
Now let's try a more complex query that involves a self-join:
SELECT sum(number) AS n
FROM deleteme AS a
INNER JOIN
(
SELECT number
FROM deleteme
) AS b ON a.number = b.number
WHERE (a.number % 10) = 1
Query id: c7ca27eb-783b-4296-aed5-6da585f0da51
┌───────────────n─┐
│ 499999960000000 │
└─────────────────┘
1 rows in set. Elapsed: 10.693 sec. Processed 110.00 million rows, 880.00 MB (10.29 million rows/s., 82.30 MB/s.)
This time ClickHouse processed 110 million rows. We use EXPLAIN SYNTAX again to see how ClickHouse rewrote the query:
EXPLAIN SYNTAX
SELECT sum(number) AS n
FROM deleteme AS a
INNER JOIN
(
SELECT number
FROM deleteme
) AS b ON a.number = b.number
WHERE (a.number % 10) = 1
Query id: edd566cb-f7f2-4e5b-b5b1-c7aa1812feec
┌─explain─────────────────────┐
│ SELECT sum(number) AS n │
│ FROM deleteme AS a │
│ ALL INNER JOIN │
│ ( │
│ SELECT number │
│ FROM deleteme │
│ ) AS b ON number = b.number │
│ WHERE (number % 10) = 1 │
└─────────────────────────────┘
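This time ClickHouse did not push the WHERE condition down into the subquery to pre-filter it, so it read all 100 million rows to evaluate the subquery, in addition to the 10 million rows read for the filtered left side. Based on what EXPLAIN SYNTAX shows, we rewrite the query by hand and move the WHERE condition into the subquery: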
SELECT sum(number) AS n
FROM deleteme AS a
INNER JOIN
(
SELECT number
FROM deleteme
WHERE (number % 10) = 1
) AS b ON a.number = b.number
WHERE (a.number % 10) = 1
Query id: aba46ec5-1948-4ea9-a6a4-fcca895eea08
┌───────────────n─┐
│ 499999960000000 │
└─────────────────┘
1 rows in set. Elapsed: 1.857 sec. Processed 20.00 million rows, 160.00 MB (10.77 million rows/s., 86.17 MB/s.)
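The result is the same, but the query ran almost six times faster (1.857 sec versus 10.693 sec) and read 20 million rows instead of 110 million.
Suppose we need both the sum and the count of a column. The usual syntax is: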
SELECT sum(ColumnA), count(ColumnA) FROM my_table
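However, this evaluates sum and count as two separate aggregate functions over the same column. ClickHouse provides the sumCount function, which computes both in a single pass and returns a tuple of the form (sum, count):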
SELECT sumCount(ColumnA) FROM my_table
┌─sumCount(ColumnA)─┐
│ (122,14)          │
└───────────────────┘
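Since sumCount returns a tuple, the individual values can be read with tupleElement. The following sketch runs against the built-in numbers table function rather than my_table, so it can be tried without any setup; the aliases sc, total, cnt, and average are illustrative names only.

```sql
-- numbers(10) yields 0..9, so the sum is 45 and the count is 10.
SELECT
    sumCount(number) AS sc,          -- tuple (sum, count)
    tupleElement(sc, 1) AS total,    -- first element: the sum
    tupleElement(sc, 2) AS cnt,      -- second element: the count
    total / cnt AS average           -- same value avg(number) would give
FROM numbers(10)
```

The average derived this way is 45 / 10 = 4.5, which is the relationship the fused rewrite of avg relies on.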
We can have ClickHouse apply this rewrite automatically by enabling the optimize_syntax_fuse_functions setting, and then verify the result with EXPLAIN SYNTAX:
EXPLAIN SYNTAX SELECT sum(ColumnA), count(ColumnA) FROM my_table
Here is the example from the official documentation:
CREATE TABLE fuse_tbl(a Int8, b Int8) Engine = Log;
SET optimize_syntax_fuse_functions = 1;
EXPLAIN SYNTAX SELECT sum(a), sum(b), count(b), avg(b) from fuse_tbl FORMAT TSV;
The result is:
SELECT
sum(a),
sumCount(b).1,
sumCount(b).2,
(sumCount(b).1) / (sumCount(b).2)
FROM fuse_tbl
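Although ClickHouse can optimize some queries automatically, it cannot do so for every complex query. Analyzing the EXPLAIN SYNTAX output together with the query log gives you more ideas for optimizing complex queries.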
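As a sketch of the query log side of that analysis, the two versions of the self-join can be compared through system.query_log using the query ids printed by the client earlier. This assumes query logging is enabled on the server. Log entries are flushed periodically, so recent queries may take a few seconds to appear unless SYSTEM FLUSH LOGS is run first.

```sql
-- Compare the original self-join with the manually optimized version.
-- The query ids are the ones shown in the client output above.
SELECT
    query_id,
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS read_size,
    formatReadableSize(memory_usage) AS peak_memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_id IN (
      'c7ca27eb-783b-4296-aed5-6da585f0da51',  -- join without the pushed-down filter
      'aba46ec5-1948-4ea9-a6a4-fcca895eea08'   -- join with the filter moved into the subquery
  )
ORDER BY event_time
```

Comparing read_rows and memory usage between the two entries shows where the manual rewrite saved work, beyond the elapsed time alone.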