ClickHouse 中有没有一些能够 “安家立命” 的运维 SQL 语句。我想对于这个问题很多朋友都会有兴趣,所以就在这里做一个简单的分享。
在 ClickHouse 默认的 system 数据库下(databse),拥有众多的系统表。我们对 ClickHouse 运行状态的各种信息,就主要来自于这些系统表。
接下来就列举一些常用的运维 SQL 语句。
当前连接数
众所周知,CH 对外暴露的原生接口分为 TCP 和 HTTP 两类,通过 system.metrics 即可查询当前的 TCP、HTTP 与内部副本的连接数。
ch7.nauu.com :) SELECT * FROM system.metrics WHERE metric LIKE '%Connection';
SELECT *
FROM system.metrics
WHERE metric LIKE '%Connection'
┌─metric────────────────┬─value─┬─description─────────────────────────────────────────────────────────┐
│ TCPConnection │ 2 │ Number of connections to TCP server (clients with native interface) │
│ HTTPConnection │ 1 │ Number of connections to HTTP server │
│ InterserverConnection │ 0 │ Number of connections from other replicas to fetch parts │
└───────────────────────┴───────┴─────────────────────────────────────────────────────────────────────┘
当前正在执行的查询
通过 system.processes 可以查询目前正在执行的查询,例如:
ch7.nauu.com :) SELECT query_id, user, address, query FROM system.processes ORDER BY query_id;
SELECT
query_id,
user,
address,
query
FROM system.processes
ORDER BY query_id ASC
┌─query_id─────────────────────────────┬─user────┬─address────────────┬─query─────────────────────────────────────────────────────────────────────────────┐
│ 203f1d0e-944e-472d-8d8f-bae548ff9899 │ default │ ::ffff:10.37.129.4 │ SELECT query_id, user, address, query FROM system.processes ORDER BY query_id ASC │
│ fb7fba85-b2a0-4271-87ff-22da97ae511b │ default │ ::ffff:10.37.129.4 │ INSERT INTO hits_v1 FORMAT TSV │
└──────────────────────────────────────┴─────────┴────────────────────┴───────────────────────────────────────────────────────────────────────────────────┘
可以看到,CH 目前正在执行两条语句,其中第 2 条是 INSERT 查询正在写入数据。
终止查询
通过 KILL QUERY 语句,可以终止正在执行的查询:
KILL QUERY WHERE query_id = 'query_id'
例如,终止刚才的 INSERT 查询 :
ch7.nauu.com :) KILL QUERY WHERE query_id='ff695827-dbf5-45ad-9858-a853946ea140';
KILL QUERY WHERE query_id = 'ff695827-dbf5-45ad-9858-a853946ea140' ASYNC
Ok.
0 rows in set. Elapsed: 0.024 sec.
众所周知,除了常规的 SELECT 和 INSERT 之外,在 ClickHouse 中还存在一类被称作 Mutation 的操作,也就是 ALTER DELETE 和 ALTER UPDATE。
对于 Mutation 操作, ClickHouse 专门提供了 system.mutations 用于查询,例如:
ch7.nauu.com :) SELECT database, table, mutation_id, command, create_time, is_done FROM system.mutations;
SELECT
database,
table,
mutation_id,
command,
create_time,
is_done
FROM system.mutations
┌─database─┬─table──────┬─mutation_id────┬─command──────────────────┬─────────create_time─┬─is_done─┐
│ default │ testcol_v9 │ mutation_2.txt │ DELETE WHERE ID = 'A003' │ 2020-06-29 01:15:04 │ 1 │
└──────────┴────────────┴────────────────┴──────────────────────────┴─────────────────────┴─────────┘
1 rows in set. Elapsed: 0.002 sec.
同样的,可以使用 KILL MUTATION 终止正在执行的 Mutation 操作:
KILL MUTATION WHERE mutation_id = 'mutation_id';
存储空间统计
查询 CH 各个存储路径的空间:
ch5.nauu.com :) SELECT name,path,formatReadableSize(free_space) AS free,formatReadableSize(total_space) AS total,formatReadableSize(keep_free_space) AS reserved FROM system.disks
SELECT
name,
path,
formatReadableSize(free_space) AS free,
formatReadableSize(total_space) AS total,
formatReadableSize(keep_free_space) AS reserved
FROM system.disks
┌─name──────┬─path──────────────┬─free──────┬─total─────┬─reserved─┐
│ default │ /chbase/data/ │ 36.35 GiB │ 49.09 GiB │ 0.00 B │
│ disk_cold │ /chbase/cloddata/ │ 35.35 GiB │ 48.09 GiB │ 1.00 GiB │
│ disk_hot1 │ /chbase/data/ │ 36.35 GiB │ 49.09 GiB │ 0.00 B │
│ disk_hot2 │ /chbase/hotdata1/ │ 36.35 GiB │ 49.09 GiB │ 0.00 B │
└───────────┴───────────────────┴───────────┴───────────┴──────────┘
4 rows in set. Elapsed: 0.001 sec.
各数据库占用空间统计
ch7.nauu.com :) SELECT database, formatReadableSize(sum(bytes_on_disk)) on_disk FROM system.parts GROUP BY database;
SELECT
database,
formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
GROUP BY database
┌─database─┬─on_disk──┐
│ system │ 1.59 MiB │
│ default │ 3.60 GiB │
└──────────┴──────────┘
个列字段占用空间统计
每个列字段的压缩大小、压缩比率以及该列的每行数据大小的占比
SELECT
database,
table,
column,
any(type),
sum(column_data_compressed_bytes) AS compressed,
sum(column_data_uncompressed_bytes) AS uncompressed,
round(uncompressed / compressed, 2) AS ratio,
compressed / sum(rows) AS bpr,
sum(rows)
FROM system.parts_columns
WHERE active AND database != 'system'
GROUP BY
database,
table,
column
ORDER BY
database ASC,
table ASC,
column ASC
┌─database─┬─table────────┬─column─────────────────────┬─any(type)──────────────────────────────┬─compressed─┬─uncompressed─┬──ratio─┬───────────────────bpr─┬─sum(rows)─┐
│ default │ hits_v1 │ AdvEngineID │ UInt8 │ 351534 │ 26621706 │ 75.73 │ 0.013204788603705563 │ 26621706 │
│ default │ hits_v1 │ Age │ UInt8 │ 7543552 │ 26621706 │ 3.53 │ 0.2833609536518809 │ 26621706 │
│ default │ hits_v1 │ BrowserCountry │ FixedString(2) │ 6549379 │ 53243412 │ 8.13 │ 0.24601650247358303 │ 26621706 │
│ default │ hits_v1 │ BrowserLanguage │ FixedString(2) │ 2819085 │ 53243412 │ 18.89 │ 0.10589422781545255 │ 26621706 │
│ default │ hits_v1 │ CLID │ UInt32 │ 2311006 │ 106486824 │ 46.08 │ 0.08680908729140048 │ 26621706 │
│ default │ hits_v1 │ ClientEventTime │ DateTime │ 98518704 │ 106486824 │ 1.08 │ 3.7006908573026838 │ 26621706 │
│ default │ hits_v1 │ ClientIP │ UInt32 │ 25120766 │ 106486824 │ 4.24 │ 0.9436196913901761 │ 26621706 │
│ default │ hits_v1 │ ClientIP6 │ FixedString(16) │ 25088558 │ 425947296 │ 16.98 │ 0.9424098515699934 │ 26621706 │
│ default │ hits_v1 │ ClientTimeZone │ Int16 │ 8487148 │ 53243412 │ 6.27 │ 0.3188055641512982 │ 26621706 │
│ default │ hits_v1 │ CodeVersion │ UInt32 │ 11976952 │ 106486824 │ 8.89 │ 0.4498942329240658 │ 26621706 │
│ default │ hits_v1 │ ConnectTiming │ Int32 │ 27937373 │ 106486824 │ 3.81 │ 1.0494208372671534 │ 26621706 │
│ default │ hits_v1 │ CookieEnable │ UInt8 │ 202718 │ 26621706 │ 131.32 │ 0.007614763681936838 │ 26621706 │
│ default │ hits_v1 │ CounterClass │ Int8 │ 425492 │ 26621706 │ 62.57 │ 0.015982897564866805 │ 26621706 │
...
慢查询
SELECT
user,
client_hostname AS host,
client_name AS client,
formatDateTime(query_start_time, '%T') AS started,
query_duration_ms / 1000 AS sec,
round(memory_usage / 1048576) AS MEM_MB,
result_rows AS RES_CNT,
result_bytes / 1048576 AS RES_MB,
read_rows AS R_CNT,
round(read_bytes / 1048576) AS R_MB,
written_rows AS W_CNT,
round(written_bytes / 1048576) AS W_MB,
query
FROM system.query_log
WHERE type = 2
ORDER BY query_duration_ms DESC
LIMIT 10
┌─user────┬─host─────────┬─client────────────┬─started──┬────sec─┬─MEM_MB─┬──RES_CNT─┬────────────────RES_MB─┬────R_CNT─┬─R_MB─┬───W_CNT─┬─W_MB─┬─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ default │ ch7.nauu.com │ ClickHouse client │ 01:05:03 │ 51.434 │ 1031 │ 8873898 │ 8706.51146697998 │ 0 │ 0 │ 8873898 │ 8707 │ INSERT INTO hits_v1 FORMAT TSV │
│ default │ ch7.nauu.com │ ClickHouse client │ 01:01:48 │ 43.511 │ 1031 │ 8873898 │ 8706.51146697998 │ 0 │ 0 │ 8873898 │ 8707 │ INSERT INTO hits_v1 FORMAT TSV │
│ default │ ch7.nauu.com │ ClickHouse client │ 17:12:04 │ 11.12 │ 1801 │ 18874398 │ 446.8216323852539 │ 6291466 │ 351 │ 0 │ 0 │ SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id ORDER BY v ASC │
│ default │ ch7.nauu.com │ ClickHouse client │ 17:13:28 │ 3.992 │ 1549 │ 18874398 │ 446.8216323852539 │ 6291466 │ 351 │ 0 │ 0 │ SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id │
│ default │ ch7.nauu.com │ ClickHouse client │ 17:13:12 │ 3.976 │ 1549 │ 18874398 │ 446.8216323852539 │ 6291466 │ 351 │ 0 │ 0 │ SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id │
│ default │ ch7.nauu.com │ ClickHouse client │ 01:25:39 │ 3.962 │ 1549 │ 18874398 │ 446.8216323852539 │ 6291466 │ 351 │ 0 │ 0 │ SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id │
│ default │ ch7.nauu.com │ ClickHouse client │ 04:32:29 │ 3.114 │ 1542 │ 10000000 │ 219.82192993164062 │ 10500000 │ 231 │ 0 │ 0 │ SELECT user_id, argMax(score, create_time) AS score, argMax(deleted, create_time) AS deleted, max(create_time) AS ctime FROM test_a GROUP BY user_id HAVING deleted = 0 │
│ default │ ch7.nauu.com │ ClickHouse client │ 02:59:56 │ 3.03 │ 1544 │ 10000000 │ 219.75380992889404 │ 10500000 │ 231 │ 0 │ 0 │ SELECT user_id, argMax(score, create_time) AS score, argMax(is_update, create_time) AS is_update, max(create_time) AS ctime FROM test_a GROUP BY user_id │
│ default │ ch7.nauu.com │ ClickHouse client │ 02:54:01 │ 3.019 │ 1543 │ 10000000 │ 219.3450927734375 │ 10500000 │ 230 │ 0 │ 0 │ SELECT user_id, argMax(score, create_time) AS score, argMax(delete, create_time) AS delete, max(create_time) AS ctime FROM test_a GROUP BY user_id │
│ default │ │ │ 03:03:12 │ 2.857 │ 1543 │ 10 │ 0.0002269744873046875 │ 10500000 │ 231 │ 0 │ 0 │ SELECT * FROM view_test_a limit 10 │
└─────────┴──────────────┴───────────────────┴──────────┴────────┴────────┴──────────┴───────────────────────┴──────────┴──────┴─────────┴──────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
10 rows in set. Elapsed: 0.017 sec. Processed 1.44 thousand rows, 200.81 KB (83.78 thousand rows/s., 11.68 MB/s.)
副本预警监控
通过下面的 SQL 语句对副本进行预警监控,其中各个预警的变量可以根据自身情况调整。
SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE is_readonly
OR is_session_expired
OR future_parts > 30
OR parts_to_check > 20
OR queue_size > 30
OR inserts_in_queue > 20
OR log_max_index - log_pointer > 20
OR total_replicas < 2
OR active_replicas < total_replicas
┌─database─┬─table───────────────────────┬─is_leader─┬─total_replicas─┬─active_replicas─┐
│ default │ replicated_sales_12 │ 0 │ 0 │ 0 │
│ default │ test_fetch │ 0 │ 0 │ 0 │
│ default │ test_sharding_simple2_local │ 0 │ 0 │ 0 │
└──────────┴─────────────────────────────┴───────────┴────────────────┴─────────────────┘
好了,今天的分享就到这里。对于 CH 日常的运维 SQL 远不止这些,这里也只是抱砖引玉啦。
如果这篇文章对你有帮助,欢迎 订阅、转发、在看 三连击 :)