Reading SQL Query Plans from ClickHouse Logs

As of this writing, ClickHouse does not provide a native EXPLAIN syntax for viewing query plans in any official release. The corresponding Pull Request on GitHub has been implemented, but it has not shipped yet, so you would have to build the latest source yourself to enjoy that convenience. In the meantime, we can read each SQL statement's query plan indirectly from ClickHouse's logs with the help of clickhouse-client. The command takes the following form:
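(For reference, in releases where that Pull Request has since landed, the feature surfaces as an EXPLAIN statement. A sketch of the merged syntax; check your server version before relying on it:

EXPLAIN SELECT ...;           -- query plan
EXPLAIN PIPELINE SELECT ...;  -- execution pipeline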

clickhouse-client -h <host> --port <port> --password <password> --send_logs_level=trace <<< "
-- SQL statement here
" > /dev/null

Here, the send_logs_level parameter sets the log level to trace, <<< feeds the SQL statement to clickhouse-client as a here-string, and > /dev/null discards the result set so that only the logs remain visible.
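Alternatively, if you are already inside an interactive clickhouse-client session, the same trace logs can be streamed to the client by raising the log level for the session:

SET send_logs_level = 'trace';
-- subsequent queries in this session will print trace logs alongside their results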

Without further ado, let's look at a simple example.

~ clickhouse-client -h localhost --port 9070 --password ck2020 --send_logs_level=trace <<< "
SELECT event_type,category_id,merchandise_id FROM ods.analytics_access_log
WHERE ts_date = '2020-08-09' AND site_id = 10037
" > /dev/null

[bigdata-ck-test001] 2020.08.09 21:45:06.872889 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  executeQuery: (from 127.0.0.1:55270) SELECT event_type, category_id, merchandise_id FROM ods.analytics_access_log WHERE (ts_date = '2020-08-09') AND (site_id = 10037)
[bigdata-ck-test001] 2020.08.09 21:45:06.873151 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(site_id = 10037) AND (ts_date = '2020-08-09')" moved to PREWHERE
[bigdata-ck-test001] 2020.08.09 21:45:06.873577 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): Key condition: (column 1 in [10037, 10037]), unknown, and
[bigdata-ck-test001] 2020.08.09 21:45:06.873596 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [18483, 18483]), and
[bigdata-ck-test001] 2020.08.09 21:45:06.873705 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): Selected 8 parts by date, 8 parts by key, 49 marks to read from 39 ranges
[bigdata-ck-test001] 2020.08.09 21:45:06.873923 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  ods.analytics_access_log (SelectExecutor): Reading approx. 345563 rows with 8 streams
[bigdata-ck-test001] 2020.08.09 21:45:06.874091 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 21:45:06.874172 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  executeQuery: Query pipeline:
Union
 Expression × 8
  Expression
   MergeTreeThread

[bigdata-ck-test001] 2020.08.09 21:45:06.879051 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 21:45:06.879070 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 21:45:06.879174 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  executeQuery: Read 332795 rows, 8.67 MiB in 0.006 sec., 53309149 rows/sec., 1.36 GiB/sec.
[bigdata-ck-test001] 2020.08.09 21:45:06.879199 {40825efb-8afe-4e67-b443-585e77da49d6} [ 141 ]  MemoryTracker: Peak memory usage (for query): 7.41 MiB.

In this table, ts_date is the partitioning (PARTITION BY) column and site_id is the index (ORDER BY) column. Let's go through the log line by line.

  • condition "(site_id = 10037) AND (ts_date = '2020-08-09')" moved to PREWHERE
    This line shows that both WHERE predicates were optimized into PREWHERE. PREWHERE is an optimization specific to the MergeTree engine family: rows are first filtered on the specified columns, and only then are the selected columns actually read, which can improve efficiency considerably. We normally do not need to write PREWHERE by hand, because the optimize_move_to_prewhere setting is enabled by default, letting ClickHouse apply the optimization automatically (the explicit form is shown in the sketch after this list).

  • Key condition: (column 1 in [10037, 10037]), unknown
    This line shows that the query used the index column site_id, with the range [10037, 10037].

  • MinMax index condition: unknown, (column 0 in [18483, 18483])
    This line shows that the query used the partitioning column ts_date (note that a partitioning column automatically carries a minmax index), with the range [18483, 18483]. Why this number? Because ClickHouse stores the Date type as the number of days since 1970-01-01 (see the verification in the sketch after this list).

  • Selected 8 parts by date, 8 parts by key, 49 marks to read from 39 ranges
    This line shows how many data parts the query must scan, how many sparse-index marks it reads, and across how many index ranges (per-part mark counts can be inspected via system.parts, as in the sketch after this list).

  • Reading approx. 345563 rows with 8 streams
    This line shows approximately how many rows will be scanned and how many parallel streams (threads) are used.

  • Query pipeline:
    The lines below are the query plan's pipeline, where × 8 means 8 threads processing in parallel. The two Expression steps may look cryptic; in fact, the lower Expression corresponds to the WHERE clause and the upper one to the SELECT clause.

Union
 Expression × 8
  Expression
   MergeTreeThread
  • Read 332795 rows, 8.67 MiB in 0.006 sec., 53309149 rows/sec., 1.36 GiB/sec.
  • Peak memory usage (for query): 7.41 MiB.
    These two lines show the number of rows actually read during execution, the data volume, the throughput, and the peak memory usage of the query.
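A few quick sanity checks related to the points above. The first query is an illustrative hand-written PREWHERE rewrite of the example query (normally unnecessary, since optimize_move_to_prewhere does it for us); the second confirms the Date-to-day-number mapping; the third inspects the parts and marks of the scanned table via the standard system.parts table:

-- explicit PREWHERE form of the example query (ClickHouse applies this automatically)
SELECT event_type, category_id, merchandise_id
FROM ods.analytics_access_log
PREWHERE site_id = 10037 AND ts_date = '2020-08-09';

-- Date is stored as days since 1970-01-01; for 2020-08-09 this returns 18483
SELECT toUInt16(toDate('2020-08-09'));

-- per-part row and mark counts for the table being scanned
SELECT partition, name, rows, marks
FROM system.parts
WHERE database = 'ods' AND table = 'analytics_access_log' AND active;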

As you can see, the query plans in ClickHouse's logs do give us leads for optimizing SQL; the statistics on index usage and on the amount of data scanned are especially direct.

Next, let's look at an example joining two tables.

~ clickhouse-client -h localhost --port 9070 --password ck2020 --send_logs_level=trace <<< "
SELECT a.column_type,a.category_id,a.merchandise_id,b.price
FROM ods.analytics_access_log a
LEFT JOIN ods.ms_order_done b
ON a.ts_date = b.ts_date AND a.merchandise_id = b.merchandise_id
WHERE a.ts_date = '2020-08-09' AND a.site_id = 10037 AND b.rebate_amount > 0
" > /dev/null

[bigdata-ck-test001] 2020.08.09 22:07:25.906404 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  executeQuery: (from 127.0.0.1:55676) SELECT a.column_type, a.category_id, a.merchandise_id, b.price FROM ods.analytics_access_log AS a LEFT JOIN ods.ms_order_done AS b ON (a.ts_date = b.ts_date) AND (a.merchandise_id = b.merchandise_id) WHERE (a.ts_date = '2020-08-09') AND (a.site_id = 10037) AND (b.rebate_amount > 0)
[bigdata-ck-test001] 2020.08.09 22:07:25.906737 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(site_id = 10037) AND (ts_date = '2020-08-09')" moved to PREWHERE
[bigdata-ck-test001] 2020.08.09 22:07:25.907057 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  Join: setSampleBlock: rebate_amount Int64 Int64(size = 0), price Int64 Int64(size = 0), b.merchandise_id Int64 Int64(size = 0), b.ts_date Date UInt16(size = 0)
[bigdata-ck-test001] 2020.08.09 22:07:25.907641 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  Join: setSampleBlock: rebate_amount Int64 Int64(size = 0), price Int64 Int64(size = 0), b.merchandise_id Int64 Int64(size = 0), b.ts_date Date UInt16(size = 0)
[bigdata-ck-test001] 2020.08.09 22:07:25.908164 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): Key condition: (column 1 in [10037, 10037]), unknown, and, unknown, and, (column 1 in [10037, 10037]), unknown, and, and
[bigdata-ck-test001] 2020.08.09 22:07:25.908182 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): MinMax index condition: unknown, (column 0 in [18483, 18483]), and, unknown, and, unknown, (column 0 in [18483, 18483]), and, and
[bigdata-ck-test001] 2020.08.09 22:07:25.908328 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): Selected 6 parts by date, 6 parts by key, 51 marks to read from 40 ranges
[bigdata-ck-test001] 2020.08.09 22:07:25.908551 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  ods.analytics_access_log (SelectExecutor): Reading approx. 359443 rows with 6 streams
[bigdata-ck-test001] 2020.08.09 22:07:25.908711 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 22:07:25.908920 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  executeQuery: Query pipeline:
Expression
 CreatingSets
  Lazy
  Union
   Expression × 6
    Filter
     Expression
      MergeTreeThread

[bigdata-ck-test001] 2020.08.09 22:07:25.909040 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  CreatingSetsBlockInputStream: Creating join.
[bigdata-ck-test001] 2020.08.09 22:07:25.909162 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): Key condition: unknown
[bigdata-ck-test001] 2020.08.09 22:07:25.909172 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): MinMax index condition: unknown
[bigdata-ck-test001] 2020.08.09 22:07:25.909314 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): Selected 86 parts by date, 86 parts by key, 843 marks to read from 86 ranges
[bigdata-ck-test001] 2020.08.09 22:07:25.910543 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  ods.ms_order_done (SelectExecutor): Reading approx. 6395624 rows with 20 streams
[bigdata-ck-test001] 2020.08.09 22:07:25.911586 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 22:07:26.403970 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 94 ]  CreatingSetsBlockInputStream: Created. Join with 519299 entries from 6118166 rows. In 0.495 sec.
[bigdata-ck-test001] 2020.08.09 22:07:26.441105 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441143 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441161 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441176 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:07:26.441308 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  executeQuery: Read 6474417 rows, 160.76 MiB in 0.535 sec., 12104900 rows/sec., 300.56 MiB/sec.
[bigdata-ck-test001] 2020.08.09 22:07:26.441329 {c9dd6fb6-6173-455c-a7c2-5839169a7003} [ 164 ]  MemoryTracker: Peak memory usage (for query): 410.84 MiB.

We won't repeat what was analyzed above. Focus on the lines ods.ms_order_done (SelectExecutor): Key condition: unknown and ods.ms_order_done (SelectExecutor): MinMax index condition: unknown, along with the lines that follow: with the SQL written as above, the JOIN predicates were not pushed down, so the right table's date-partition filter and WHERE filter were applied only after a full table scan, which is extremely wasteful.

Next, let's push the predicates down by hand, writing them inside subqueries, and look at the query plan once more.

~ clickhouse-client -h localhost --port 9070 --password ck2020 --send_logs_level=trace <<< "
SELECT a.column_type,a.category_id,a.merchandise_id,b.price
FROM (
  SELECT column_type,category_id,merchandise_id
  FROM ods.analytics_access_log
  WHERE ts_date = '2020-08-09' AND site_id = 10037
) a
LEFT JOIN (
  SELECT merchandise_id,price
  FROM ods.ms_order_done
  WHERE ts_date = '2020-08-09' AND rebate_amount > 0
) b
ON a.merchandise_id = b.merchandise_id
" > /dev/null

[bigdata-ck-test001] 2020.08.09 22:15:49.269429 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  executeQuery: (from 127.0.0.1:55686) SELECT a.column_type, a.category_id, a.merchandise_id, b.price FROM (SELECT column_type, category_id, merchandise_id FROM ods.analytics_access_log WHERE (ts_date = '2020-08-09') AND (site_id = 10037)) AS a LEFT JOIN (SELECT merchandise_id, price FROM ods.ms_order_done WHERE (ts_date = '2020-08-09') AND (rebate_amount > 0)) AS b ON a.merchandise_id = b.merchandise_id
## ...omitted...
[bigdata-ck-test001] 2020.08.09 22:15:49.272530 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  executeQuery: Query pipeline:
Expression
 CreatingSets
  Lazy
  Union
   Expression × 8
    Expression
     Expression
      Expression
       MergeTreeThread

[bigdata-ck-test001] 2020.08.09 22:15:49.272668 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  CreatingSetsBlockInputStream: Creating join.
[bigdata-ck-test001] 2020.08.09 22:15:49.272942 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): Key condition: unknown, unknown, and, unknown, and
[bigdata-ck-test001] 2020.08.09 22:15:49.272965 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): MinMax index condition: (column 0 in [18483, 18483]), unknown, and, (column 0 in [18483, 18483]), and
[bigdata-ck-test001] 2020.08.09 22:15:49.272990 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): Selected 5 parts by date, 5 parts by key, 23 marks to read from 5 ranges
[bigdata-ck-test001] 2020.08.09 22:15:49.273175 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  ods.ms_order_done (SelectExecutor): Reading approx. 163922 rows with 5 streams
[bigdata-ck-test001] 2020.08.09 22:15:49.273335 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  InterpreterSelectQuery: FetchColumns -> Complete
[bigdata-ck-test001] 2020.08.09 22:15:49.280033 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 51 ]  CreatingSetsBlockInputStream: Created. Join with 10788 entries from 143210 rows. In 0.007 sec.
[bigdata-ck-test001] 2020.08.09 22:15:49.305967 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.305990 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.306014 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waiting for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.306023 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  UnionBlockInputStream: Waited for threads to finish
[bigdata-ck-test001] 2020.08.09 22:15:49.306143 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  executeQuery: Read 510445 rows, 12.52 MiB in 0.037 sec., 13918334 rows/sec., 341.43 MiB/sec.
[bigdata-ck-test001] 2020.08.09 22:15:49.306167 {3ebb7bc0-0a5b-49b7-87a2-7972a566a22e} [ 127 ]  MemoryTracker: Peak memory usage (for query): 58.70 MiB.

As you can see, simply by rewriting the SQL, the right table's data now gets partition pruning and predicate filtering before the join, and both the amount of data processed and the memory footprint drop dramatically. It also shows that ClickHouse's SQL optimizer is rather weak, so we need to pay attention to this ourselves.

Finally, since the logs are only produced when the query is actually executed, add a LIMIT clause to cap the amount of data returned whenever the expected result set is large.
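For example, the same command shape as before with a LIMIT appended (a minimal sketch):

clickhouse-client -h <host> --port <port> --password <password> --send_logs_level=trace <<< "
SELECT event_type, category_id, merchandise_id FROM ods.analytics_access_log
WHERE ts_date = '2020-08-09' AND site_id = 10037
LIMIT 100
" > /dev/null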

I need to get up early tomorrow to keep grinding. Good night, everyone!
