Optimizing ClickHouse Queries with EXPLAIN SYNTAX

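ClickHouse optimizes simple queries automatically. EXPLAIN SYNTAX shows a query as it looks after those syntax-level rewrites, which tells you where a manual rewrite can still improve performance. Complex queries usually call for combining it with other tools.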

Example 1

CREATE TABLE deleteme
(
    `number` UInt64
)
ENGINE = MergeTree
PARTITION BY number % 10
ORDER BY number AS
SELECT number
FROM numbers(100000000)

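This statement creates a table with 10 partitions, one per value of number % 10. The next query reads only one of them, the partition where number % 10 = 1.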

SELECT sum(number)
FROM
(
    SELECT number
    FROM deleteme
)
WHERE (number % 10) = 1

Query id: 5a187605-22e9-4303-b2d7-b429d38832f3

┌─────sum(number)─┐
│ 499999960000000 │
└─────────────────┘

1 rows in set. Elapsed: 0.065 sec. Processed 10.00 million rows, 80.00 MB (154.64 million rows/s., 1.24 GB/s.)

The statistics show that ClickHouse read only 10M of the table's 100M rows.
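One way to confirm that 10M rows corresponds to exactly one partition is to look at the partition layout in system.parts. This is a minimal sketch; it assumes the deleteme table lives in the current database, which the article does not state.

```sql
-- Rows per partition of the deleteme table created above.
-- system.parts has one row per data part; only active parts hold current data.
-- Assumption: deleteme was created in the current database.
SELECT
    partition,
    count() AS parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'deleteme'
  AND active
GROUP BY partition
ORDER BY partition
```

Because numbers(100000000) spreads evenly over number % 10, each of the 10 partitions should report 10,000,000 rows.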

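Running the query through EXPLAIN SYNTAX shows that ClickHouse optimized it automatically by pushing the WHERE filter down into the subquery. That is why it only had to process 1/10 of the table.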

EXPLAIN SYNTAX
SELECT sum(number)
FROM
(
    SELECT number
    FROM deleteme
)
WHERE (number % 10) = 1

Query id: babfaeff-5904-486f-9c6a-caad90392e0f

┌─explain─────────────────────┐
│ SELECT sum(number)          │
│ FROM                        │
│ (                           │
│     SELECT number           │
│     FROM deleteme           │
│     WHERE (number % 10) = 1 │
│ )                           │
│ WHERE (number % 10) = 1     │
└─────────────────────────────┘
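EXPLAIN SYNTAX only shows the rewritten text. To see how much data ClickHouse expects to read, EXPLAIN ESTIMATE can be run on the same query. This is a sketch, assuming a ClickHouse version recent enough to support EXPLAIN ESTIMATE; it works for MergeTree-family tables such as deleteme.

```sql
-- Reports the estimated number of parts, rows, and marks to be read
-- per table, without executing the query.
EXPLAIN ESTIMATE
SELECT sum(number)
FROM
(
    SELECT number
    FROM deleteme
)
WHERE (number % 10) = 1
```

If the filter reaches the table scan, the estimated row count should be close to the 10M rows reported in the statistics above rather than the full 100M.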

Now try a more complex query that involves a self-join:

SELECT sum(number) AS n
FROM deleteme AS a
INNER JOIN
(
    SELECT number
    FROM deleteme
) AS b ON a.number = b.number
WHERE (a.number % 10) = 1

Query id: c7ca27eb-783b-4296-aed5-6da585f0da51

┌───────────────n─┐
│ 499999960000000 │
└─────────────────┘

1 rows in set. Elapsed: 10.693 sec. Processed 110.00 million rows, 880.00 MB (10.29 million rows/s., 82.30 MB/s.)

This time ClickHouse processed 110M rows. Running EXPLAIN SYNTAX again shows what ClickHouse did with the query:

EXPLAIN SYNTAX
SELECT sum(number) AS n
FROM deleteme AS a
INNER JOIN
(
    SELECT number
    FROM deleteme
) AS b ON a.number = b.number
WHERE (a.number % 10) = 1

Query id: edd566cb-f7f2-4e5b-b5b1-c7aa1812feec

┌─explain─────────────────────┐
│ SELECT sum(number) AS n     │
│ FROM deleteme AS a          │
│ ALL INNER JOIN              │
│ (                           │
│     SELECT number           │
│     FROM deleteme           │
│ ) AS b ON number = b.number │
│ WHERE (number % 10) = 1     │
└─────────────────────────────┘

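This time ClickHouse did not push the WHERE condition down to pre-filter the subquery, so it read all 100M rows to evaluate the subquery, on top of the 10M rows for the filtered left side. Because the join condition is a.number = b.number, the same filter can safely be applied to b. Based on the EXPLAIN SYNTAX output, we rewrite the query by hand and move the WHERE condition into the subquery: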

SELECT sum(number) AS n
FROM deleteme AS a
INNER JOIN
(
    SELECT number
    FROM deleteme
    WHERE (number % 10) = 1
) AS b ON a.number = b.number
WHERE (a.number % 10) = 1

Query id: aba46ec5-1948-4ea9-a6a4-fcca895eea08

┌───────────────n─┐
│ 499999960000000 │
└─────────────────┘

1 rows in set. Elapsed: 1.857 sec. Processed 20.00 million rows, 160.00 MB (10.77 million rows/s., 86.17 MB/s.)

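The result is the same, but the query read 20M rows instead of 110M and finished in 1.857 s instead of 10.693 s, which is roughly 5.8 times faster.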
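The two join variants can also be compared after the fact through system.query_log. The sketch below assumes query logging is enabled on the server and reuses the two query ids printed above; on your own server, substitute the ids of your own runs.

```sql
-- Make sure recent queries have been written to system.query_log.
SYSTEM FLUSH LOGS;

-- Compare rows read, bytes read, and duration of the two join queries.
SELECT
    query_id,
    read_rows,
    read_bytes,
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_id IN (
      'c7ca27eb-783b-4296-aed5-6da585f0da51',  -- join without the filter in the subquery
      'aba46ec5-1948-4ea9-a6a4-fcca895eea08'   -- join with the filter moved into the subquery
  )
ORDER BY event_time
```

Comparing read_rows this way is less sensitive to machine load than comparing elapsed time alone.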

Example 2

Suppose we need both the sum and the count of a column. The usual way to write this is:

SELECT sum(ColumnA), count(ColumnA) FROM my_table

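Written this way, sum and count are evaluated as two separate aggregate functions over the same column. ClickHouse provides the sumCount function, which computes both at once and returns them as a tuple (sum, count):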

SELECT sumCount(ColumnA) FROM my_table

┌─sumCount(ColumnA)─┐
│ (122,14)          │
└───────────────────┘
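Since sumCount returns a tuple, the two values usually need to be unpacked before further use. A minimal sketch, using the built-in numbers table function instead of the article's my_table so that it runs anywhere; the aliases sc, total, and cnt are illustrative names:

```sql
-- tupleElement(t, n) returns the n-th element of tuple t (1-based).
SELECT
    tupleElement(sc, 1) AS total,  -- the sum
    tupleElement(sc, 2) AS cnt     -- the count
FROM
(
    SELECT sumCount(number) AS sc
    FROM numbers(10)
)
-- numbers(10) yields 0 through 9, so total = 45 and cnt = 10
```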

ClickHouse can also apply this rewrite automatically when the optimize_syntax_fuse_functions setting is enabled. EXPLAIN SYNTAX lets us verify it:

EXPLAIN SYNTAX SELECT sum(ColumnA), count(ColumnA) FROM my_table

Here is the example from the official documentation:

CREATE TABLE fuse_tbl(a Int8, b Int8) Engine = Log;
SET optimize_syntax_fuse_functions = 1;
EXPLAIN SYNTAX SELECT sum(a), sum(b), count(b), avg(b) from fuse_tbl FORMAT TSV;

The output is:

SELECT
    sum(a),
    sumCount(b).1,
    sumCount(b).2,
    (sumCount(b).1) / (sumCount(b).2)
FROM fuse_tbl

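Summary

ClickHouse optimizes some queries automatically, but not every complex query. Analyzing the EXPLAIN SYNTAX output together with the query log gives you more leads for optimizing complex queries.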
