Reading SQL Query Plans from ClickHouse Logs

As of this writing, ClickHouse does not provide a native EXPLAIN syntax for viewing query plans in any official release. The corresponding Pull Request on GitHub has been implemented, but it has not shipped yet, so you would have to build the latest source yourself to enjoy that convenience. In the meantime, we can read each SQL statement's query plan indirectly from ClickHouse's logs with the help of clickhouse-client. The command takes the following form:
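(For reference, in releases where that Pull Request has since landed, the feature surfaces as an EXPLAIN statement. A sketch of the merged syntax; check your server version before relying on it:

EXPLAIN SELECT ...;           -- query plan
EXPLAIN PIPELINE SELECT ...;  -- execution pipeline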

clickhouse-client -h <host> --port <port> --password <password> --send_logs_level=trace <<< "
-- SQL statement here
" > /dev/null

Here, the send_logs_level parameter sets the log level to trace, <<< feeds the SQL statement to clickhouse-client as a here-string, and > /dev/null discards the result set so that only the logs remain visible.
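Alternatively, if you are already inside an interactive clickhouse-client session, the same trace logs can be streamed to the client by raising the log level for the session:

SET send_logs_level = 'trace';
-- subsequent queries in this session will print trace logs alongside their results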

Without further ado, let's look at a simple example.

~ clickhouse-client -h localhost --port 9070 --password ck2020 --send_logs_level=trace <<< "
SELECT event_type,category_id,merchandise_id FROM ods.analytics_access_log
WHERE ts_date = '2020-08-09' AND site_id = 10037
" > /dev/null

[bigdata-ck-test001] 2020.08.09 21:45:06.872889 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  executeQuery: (from 127.0.0.1:55270) SELECT event_type, category_id, merchandise_id FROM ods.analytics_access_log WHERE (ts_date = '2020-08-09') AND (site_id = 10037)
[bigdata-ck-test001] 2020.08.09 21:45:06.873151 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(site_id = 10037) AND (ts_date = '2020-08-09')" moved to PREWHERE
[bigdata-ck-test001] 2020.08.09 21:45:06.873577 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): Key condition: (column 1 in [10037, 10037]), unknown, and
[bigdata-ck-test001] 2020.08.09 21:45:06.873596 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [18483, 18483]), and
[bigdata-ck-test001] 2020.08.09 21:45:06.873705 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): Selected 8 parts by date, 8 parts by key, 49 marks to read from 39 ranges
[bigdata-ck-test001] 2020.08.09 21:45:06.873923 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): Reading approx. 345563 rows with 8 streams
[bigdata-ck-test001] 2020.08.09 21:45:06.874091 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 21:45:06.874172 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  executeQuery: Query pipeline:
Union
 Expression × 8
  Expression
   MergeTreeThread

[bigdata-ck-test001] 2020.08.09 21:45:06.879051 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 21:45:06.879070 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 21:45:06.879174 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  executeQuery: Read 332795 rows, 8.67 MiB in 0.006 sec., 53309149 rows/sec., 1.36 GiB/sec.
[bigdata-ck-test001] 2020.08.09 21:45:06.879199 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  MemoryTracker: Peak memory usage (for query): 7.41 MiB.

In this table, ts_date is the partitioning (PARTITION BY) column and site_id is the index (ORDER BY) column. Let's go through the log line by line.

  • condition "(site_id = 10037) AND (ts_date = '2020-08-09')" moved to PREWHERE
    This line shows that both WHERE predicates were optimized into PREWHERE. PREWHERE is an optimization specific to the MergeTree engine family: rows are first filtered on the specified columns, and only then are the selected columns actually read, which can improve efficiency considerably. We normally do not need to write PREWHERE by hand, because the optimize_move_to_prewhere setting is enabled by default, letting ClickHouse apply the optimization automatically (the explicit form is shown in the sketch after this list).

  • Key condition: (column 1 in [10037, 10037]), unknown
    This line shows that the query used the index column site_id, with the range [10037, 10037].

  • MinMax index condition: unknown, (column 0 in [18483, 18483])
    This line shows that the query used the partitioning column ts_date (note that a partitioning column automatically carries a minmax index), with the range [18483, 18483]. Why this number? Because ClickHouse stores the Date type as the number of days since 1970-01-01 (see the verification in the sketch after this list).

  • Selected 8 parts by date, 8 parts by key, 49 marks to read from 39 ranges
    This line shows how many data parts the query must scan, how many sparse-index marks it reads, and across how many index ranges (per-part mark counts can be inspected via system.parts, as in the sketch after this list).

  • Reading approx. 345563 rows with 8 streams
    This line shows approximately how many rows will be scanned and how many parallel streams (threads) are used.

  • Query pipeline:
    The lines below are the query plan's pipeline, where × 8 means 8 threads processing in parallel. The two Expression steps may look cryptic; in fact, the lower Expression corresponds to the WHERE clause and the upper one to the SELECT clause.

Union
 Expression × 8
  Expression
   MergeTreeThread
  • Read 332795 rows, 8.67 MiB in 0.006 sec., 53309149 rows/sec., 1.36 GiB/sec.
  • Peak memory usage (for query): 7.41 MiB.
    These two lines show the number of rows actually read during execution, the data volume, the throughput, and the peak memory usage of the query.
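A few quick sanity checks related to the points above. The first query is an illustrative hand-written PREWHERE rewrite of the example query (normally unnecessary, since optimize_move_to_prewhere does it for us); the second confirms the Date-to-day-number mapping; the third inspects the parts and marks of the scanned table via the standard system.parts table:

-- explicit PREWHERE form of the example query (ClickHouse applies this automatically)
SELECT event_type, category_id, merchandise_id
FROM ods.analytics_access_log
PREWHERE site_id = 10037 AND ts_date = '2020-08-09';

-- Date is stored as days since 1970-01-01; for 2020-08-09 this returns 18483
SELECT toUInt16(toDate('2020-08-09'));

-- per-part row and mark counts for the table being scanned
SELECT partition, name, rows, marks
FROM system.parts
WHERE database = 'ods' AND table = 'analytics_access_log' AND active;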

As you can see, the query plans in ClickHouse's logs do give us leads for optimizing SQL; the statistics on index usage and on the amount of data scanned are especially direct.

Next, let's look at an example joining two tables.

~ clickhouse-client -h localhost --port 9070 --password ck2020 --send_logs_level=trace <<< "
SELECT a.column_type,a.category_id,a.merchandise_id,b.price
FROM ods.analytics_access_log a
LEFT JOIN ods.ms_order_done b
ON a.ts_date = b.ts_date AND a.merchandise_id = b.merchandise_id
WHERE a.ts_date = '2020-08-09' AND a.site_id = 10037 AND b.rebate_amount > 0
" > /dev/null

[bigdata-ck-test001] 2020.08.09 22:07:25.906404 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  executeQuery: (from 127.0.0.1:55676) SELECT a.column_type, a.category_id, a.merchandise_id, b.price FROM ods.analytics_access_log AS a LEFT JOIN ods.ms_order_done AS b ON (a.ts_date = b.ts_date) AND (a.merchandise_id = b.merchandise_id) WHERE (a.ts_date = '2020-08-09') AND (a.site_id = 10037) AND (b.rebate_amount > 0)
[bigdata-ck-test001] 2020.08.09 22:07:25.906737 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(site_id = 10037) AND (ts_date = '2020-08-09')" moved to PREWHERE
[bigdata-ck-test001] 2020.08.09 22:07:25.907057 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  Join: setSampleBlock: rebate_amount Int64 Int64(size = 0), price Int64 Int64(size = 0), b.merchandise_id Int64 Int64(size = 0), b.ts_date Date UInt16(size = 0)
[bigdata-ck-test001] 2020.08.09 22:07:25.907641 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  Join: setSampleBlock: rebate_amount Int64 Int64(size = 0), price Int64 Int64(size = 0), b.merchandise_id Int64 Int64(size = 0), b.ts_date Date UInt16(size = 0)
[bigdata-ck-test001] 2020.08.09 22:07:25.908164 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): Key condition: (column 1 in [10037, 10037]), unknown, and, unknown, and, (column 1 in [10037, 10037]), unknown, and, and
[bigdata-ck-test001] 2020.08.09 22:07:25.908182 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [18483, 18483]), and, unknown, and, unknown, (column 0 in [18483, 18483]), and, and
[bigdata-ck-test001] 2020.08.09 22:07:25.908328 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): Selected 6 parts by date, 6 parts by key, 51 marks to read from 40 ranges
[bigdata-ck-test001] 2020.08.09 22:07:25.908551 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): Reading approx. 359443 rows with 6 streams
[bigdata-ck-test001] 2020.08.09 22:07:25.908711 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 22:07:25.908920 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  executeQuery: Query pipeline:
Expression
 CreatingSets
  Lazy
  Union
   Expression × 6
    Filter
     Expression
      MergeTreeThread

[bigdata-ck-test001] 2020.08.09 22:07:25.909040 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  CreatingSetsBlockInputStream: Creating join.
[bigdata-ck-test001] 2020.08.09 22:07:25.909162 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): Key condition: unknown
[bigdata-ck-test001] 2020.08.09 22:07:25.909172 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): MinMax index condition: unknown
[bigdata-ck-test001] 2020.08.09 22:07:25.909314 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): Selected 86 parts by date, 86 parts by key, 843 marks to read from 86 ranges
[bigdata-ck-test001] 2020.08.09 22:07:25.910543 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): Reading approx. 6395624 rows with 20 streams
[bigdata-ck-test001] 2020.08.09 22:07:25.911586 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 22:07:26.403970 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  CreatingSetsBlockInputStream: Created. Join with 519299 entries from 6118166 rows. In 0.495 sec.
[bigdata-ck-test001] 2020.08.09 22:07:26.441105 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441143 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441161 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441176 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441308 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  executeQuery: Read 6474417 rows, 160.76 MiB in 0.535 sec., 12104900 rows/sec., 300.56 MiB/sec.
[bigdata-ck-test001] 2020.08.09 22:07:26.441329 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  MemoryTracker: Peak memory usage (for query): 410.84 MiB.

We won't repeat what was analyzed above. Focus on the lines ods.ms_order_done (SelectExecutor): Key condition: unknown and ods.ms_order_done (SelectExecutor): MinMax index condition: unknown, along with the lines that follow: with the SQL written as above, the JOIN predicates were not pushed down, so the right table's date-partition filter and WHERE filter were applied only after a full table scan, which is extremely wasteful.

Next, let's push the predicates down by hand, writing them inside subqueries, and look at the query plan once more.

~ clickhouse-client -h localhost --port 9070 --password ck2020 --send_logs_level=trace <<< "
SELECT a.column_type,a.category_id,a.merchandise_id,b.price
FROM (
  SELECT column_type,category_id,merchandise_id
  FROM ods.analytics_access_log
  WHERE ts_date = '2020-08-09' AND site_id = 10037
) a
LEFT JOIN (
  SELECT merchandise_id,price
  FROM ods.ms_order_done
  WHERE ts_date = '2020-08-09' AND rebate_amount > 0
) b
ON a.merchandise_id = b.merchandise_id
" > /dev/null

[bigdata-ck-test001] 2020.08.09 22:15:49.269429 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  executeQuery: (from 127.0.0.1:55686) SELECT a.column_type, a.category_id, a.merchandise_id, b.price FROM (SELECT column_type, category_id, merchandise_id FROM ods.analytics_access_log WHERE (ts_date = '2020-08-09') AND (site_id = 10037)) AS a LEFT JOIN (SELECT merchandise_id, price FROM ods.ms_order_done WHERE (ts_date = '2020-08-09') AND (rebate_amount > 0)) AS b ON a.merchandise_id = b.merchandise_id
## ...omitted...
[bigdata-ck-test001] 2020.08.09 22:15:49.272530 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  executeQuery: Query pipeline:
Expression
 CreatingSets
  Lazy
  Union
   Expression × 8
    Expression
     Expression
      Expression
       MergeTreeThread

[bigdata-ck-test001] 2020.08.09 22:15:49.272668 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  CreatingSetsBlockInputStream: Creating join.
[bigdata-ck-test001] 2020.08.09 22:15:49.272942 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): Key condition: unknown, unknown, and, unknown, and
[bigdata-ck-test001] 2020.08.09 22:15:49.272965 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): MinMax index condition: (column 0 in [18483, 18483]), unknown, and, (column 0 in [18483, 18483]), and
[bigdata-ck-test001] 2020.08.09 22:15:49.272990 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): Selected 5 parts by date, 5 parts by key, 23 marks to read from 5 ranges
[bigdata-ck-test001] 2020.08.09 22:15:49.273175 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): Reading approx. 163922 rows with 5 streams
[bigdata-ck-test001] 2020.08.09 22:15:49.273335 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 22:15:49.280033 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  CreatingSetsBlockInputStream: Created. Join with 10788 entries from 143210 rows. In 0.007 sec.
[bigdata-ck-test001] 2020.08.09 22:15:49.305967 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.305990 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.306014 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.306023 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.306143 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  executeQuery: Read 510445 rows, 12.52 MiB in 0.037 sec., 13918334 rows/sec., 341.43 MiB/sec.
[bigdata-ck-test001] 2020.08.09 22:15:49.306167 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  MemoryTracker: Peak memory usage (for query): 58.70 MiB.

As you can see, simply by rewriting the SQL, the right table's data now gets partition pruning and predicate filtering before the join, and both the amount of data processed and the memory footprint drop dramatically. It also shows that ClickHouse's SQL optimizer is rather weak, so we need to pay attention to this ourselves.

Finally, since the logs are only produced when the query is actually executed, add a LIMIT clause to cap the amount of data returned whenever the expected result set is large.
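For example, the same command shape as before with a LIMIT appended (a minimal sketch):

clickhouse-client -h <host> --port <port> --password <password> --send_logs_level=trace <<< "
SELECT event_type, category_id, merchandise_id FROM ods.analytics_access_log
WHERE ts_date = '2020-08-09' AND site_id = 10037
LIMIT 100
" > /dev/null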

I need to get up early tomorrow to keep grinding. Good night, everyone!
