ClickHouse 运维DBA实用场景

ClickHouse 运维DBA实用场景

  • ClickHouse 运维DBA实用场景
    • 当前连接数
    • 当前正在执行的查询
    • 终止查询
    • 存储空间统计
      • 各数据库占用空间统计
      • 个列字段占用空间统计
    • 慢查询
    • 副本预警监控

ClickHouse 运维DBA实用场景

在 ClickHouse 中有没有一些能够 “安家立命” 的运维 SQL 语句,为此做一次简单的分享。
在 ClickHouse 默认的 system 数据库下(databse),拥有众多的系统表。我们对 ClickHouse 运行状态的各种信息,就主要来自于这些系统表。

当前连接数

CK 对外暴露的原生接口分为 TCP 和 HTTP 两类,通过 system.metrics 即可查询当前的 TCP、HTTP 与内部副本的连接数。

SELECT *
FROM system.metrics
WHERE metric LIKE '%Connection';

┌─metric────────────────┬─value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ TCPConnection	        │     9 │ Number of connections to TCP server (clients with native interface), also included server-server distributed query connections │
│ MySQLConnection	    │     0 │ Number of client connections using MySQL protocol                                                                              │
│ HTTPConnection	    │     1 │ Number of connections to HTTP server                                                                                           │
│ InterserverConnection	│     0 │ Number of connections from other replicas to fetch parts                                                                       │
│ PostgreSQLConnection	│     0 │ Number of client connections using PostgreSQL protocol                                                                         │
└───────────────────────┴───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

当前正在执行的查询

通过 system.processes 可以查询目前正在执行的查询

SELECT 
    query_id, 
    user, 
    address, 
    query
FROM system.processes
ORDER BY query_id ASC;

┌─query_id─────────────────────────────┬─user────┬─address────────────┬─query─────────────────────────────────────────────────────────────────────────────┐
│ 203f1d0e-944e-472d-8d8f-bae548ff9899 │ default │ ::ffff:10.37.119.4SELECT query_id, user, address, query FROM system.processes ORDER BY query_id ASC │
│ fb7fba85-b2a0-4271-87ff-22da97ae511b │ default │ ::ffff:10.37.119.4INSERT INTO hits_v1 FORMAT TSV                                                    │
└──────────────────────────────────────┴─────────┴────────────────────┴───────────────────────────────────────────────────────────────────────────────────┘

可以看到,CH 目前正在执行两条语句,其中第 2 条是 INSERT 查询正在写入数据。

终止查询

通过 KILL QUERY 语句,可以终止正在执行的查询:

KILL QUERY WHERE query_id = 'query_id'

例如,终止刚才的 INSERT 查询:

KILL QUERY WHERE query_id='ff695827-dbf5-45ad-9858-a853946ea140';

KILL QUERY WHERE query_id = 'ff695827-dbf5-45ad-9858-a853946ea140' ASYNC

Ok.

0 rows in set. Elapsed: 0.024 sec.

众所周知,除了常规的 SELECTINSERT 之外,在 ClickHouse 中还存在一类被称作 Mutation 的操作,也就是 ALTER DELETEALTER UPDATE

对于 Mutation 操作, ClickHouse 专门提供了 system.mutations 用于查询,例如:

SELECT 
    database, 
    table, 
    mutation_id, 
    command, 
    create_time, 
    is_done
FROM system.mutations;

┌─database─┬─table──────┬─mutation_id────┬─command──────────────────┬─────────create_time─┬─is_done─┐
│ default  │ testcol_v9 │ mutation_2.txt │ DELETE WHERE ID = 'A003'2020-06-29 01:15:041 │
└──────────┴────────────┴────────────────┴──────────────────────────┴─────────────────────┴─────────┘

1 rows in set. Elapsed: 0.002 sec.

同样的,可以使用 KILL MUTATION 终止正在执行的 Mutation 操作:

KILL MUTATION WHERE mutation_id = 'mutation_id';

存储空间统计

查询 CH 各个存储路径的空间:

SELECT 
    name, 
    path, 
    formatReadableSize(free_space) AS free, 
    formatReadableSize(total_space) AS total, 
    formatReadableSize(keep_free_space) AS reserved
FROM system.disks;

┌─name──────┬─path──────────────┬─free──────┬─total─────┬─reserved─┐
│ default/chbase/data/36.35 GiB │ 49.09 GiB │ 0.00 B   │
│ disk_cold │ /chbase/cloddata/35.35 GiB │ 48.09 GiB │ 1.00 GiB │
│ disk_hot1 │ /chbase/data/36.35 GiB │ 49.09 GiB │ 0.00 B   │
└───────────┴───────────────────┴───────────┴───────────┴──────────┘

4 rows in set. Elapsed: 0.001 sec.

各数据库占用空间统计

SELECT 
    database, 
    formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
GROUP BY database;

┌─database─┬─on_disk──┐
│ system   │ 1.59 MiB │
│ default3.60 GiB │
└──────────┴──────────┘

个列字段占用空间统计

每个列字段的压缩大小、压缩比率以及该列的每行数据大小的占比

SELECT 
    database, 
    table, 
    column, 
    any(type), 
    sum(column_data_compressed_bytes) AS compressed, 
    sum(column_data_uncompressed_bytes) AS uncompressed, 
    round(uncompressed / compressed, 2) AS ratio, 
    compressed / sum(rows) AS bpr, 
    sum(rows)
FROM system.parts_columns
WHERE active AND database != 'system'
GROUP BY 
    database, 
    table, 
    column
ORDER BY 
    database ASC, 
    table ASC, 
    column ASC;

┌─database─┬─table────────┬─column─────────────────────┬─any(type)──────────────────────────────┬─compressed─┬─uncompressed─┬──ratio─┬───────────────────bpr─┬─sum(rows)─┐
│ default  │ hits_v1      │ AdvEngineID                │ UInt8                                  │     3515342662170675.730.01320478860370556326621706 │
│ default  │ hits_v1      │ Age                        │ UInt8                                  │    7543552266217063.530.283360953651880926621706 │
│ default  │ hits_v1      │ BrowserCountry             │ FixedString(2)6549379532434128.130.2460165024735830326621706 │
│ default  │ hits_v1      │ BrowserLanguage            │ FixedString(2)28190855324341218.890.1058942278154525526621706 │
│ default  │ hits_v1      │ CLID                       │ UInt32                                 │    231100610648682446.080.0868090872914004826621706 │
│ default  │ hits_v1      │ ClientEventTime            │ DateTime985187041064868241.083.700690857302683826621706 │
│ default  │ hits_v1      │ ClientIP                   │ UInt32                                 │   251207661064868244.240.943619691390176126621706 │
│ default  │ hits_v1      │ ClientIP6                  │ FixedString(16)2508855842594729616.980.942409851569993426621706 │
│ default  │ hits_v1      │ ClientTimeZone             │ Int16                                  │    8487148532434126.270.318805564151298226621706 │
│ default  │ hits_v1      │ CodeVersion                │ UInt32                                 │   119769521064868248.890.449894232924065826621706 │
│ default  │ hits_v1      │ ConnectTiming              │ Int32                                  │   279373731064868243.811.049420837267153426621706 │
│ default  │ hits_v1      │ CookieEnable               │ UInt8                                  │     20271826621706131.320.00761476368193683826621706 │
│ default  │ hits_v1      │ CounterClass               │ Int8                                   │     4254922662170662.570.01598289756486680526621706...

慢查询

SELECT 
    user, 
    client_hostname AS host, 
    client_name AS client, 
    formatDateTime(query_start_time, '%T') AS started, 
    query_duration_ms / 1000 AS sec, 
    round(memory_usage / 1048576) AS MEM_MB, 
    result_rows AS RES_CNT, 
    result_bytes / 1048576 AS RES_MB, 
    read_rows AS R_CNT, 
    round(read_bytes / 1048576) AS R_MB, 
    written_rows AS W_CNT, 
    round(written_bytes / 1048576) AS W_MB, 
    query
FROM system.query_log
WHERE type = 2
ORDER BY query_duration_ms DESC
LIMIT 10;

┌─user────┬─host─────────┬─client────────────┬─started──┬────sec─┬─MEM_MB─┬──RES_CNT─┬────────────────RES_MB─┬────R_CNT─┬─R_MB─┬───W_CNT─┬─W_MB─┬─query───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ default │ ch7.nauu.com │ ClickHouse client │ 01:05:0351.434103188738988706.511466979980088738988707INSERT INTO hits_v1 FORMAT TSV                                                                                                                                          │
│ default │ ch7.nauu.com │ ClickHouse client │ 01:01:4843.511103188738988706.511466979980088738988707INSERT INTO hits_v1 FORMAT TSV                                                                                                                                          │
│ default │ ch7.nauu.com │ ClickHouse client │ 17:12:0411.12180118874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id ORDER BY v ASC                                              │
│ default │ ch7.nauu.com │ ClickHouse client │ 17:13:283.992154918874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id                                                             │
│ default │ ch7.nauu.com │ ClickHouse client │ 17:13:123.976154918874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id                                                             │
│ default │ ch7.nauu.com │ ClickHouse client │ 01:25:393.962154918874398446.8216323852539629146635100SELECT id, arrayJoin(arrayConcat(groupArray(a), groupArray(b), groupArray(c))) AS v FROM test_y GROUP BY id                                                             │
│ default │ ch7.nauu.com │ ClickHouse client │ 04:32:293.114154210000000219.821929931640621050000023100SELECT user_id, argMax(score, create_time) AS score, argMax(deleted, create_time) AS deleted, max(create_time) AS ctime FROM test_a GROUP BY user_id HAVING deleted = 0 │
│ default │ ch7.nauu.com │ ClickHouse client │ 02:59:563.03154410000000219.753809928894041050000023100SELECT user_id, argMax(score, create_time) AS score, argMax(is_update, create_time) AS is_update, max(create_time) AS ctime FROM test_a GROUP BY user_id                │
│ default │ ch7.nauu.com │ ClickHouse client │ 02:54:013.019154310000000219.34509277343751050000023000SELECT user_id, argMax(score, create_time) AS score, argMax(delete, create_time) AS delete, max(create_time) AS ctime FROM test_a GROUP BY user_id                      │
│ default │              │                   │ 03:03:122.8571543100.00022697448730468751050000023100SELECT * FROM view_test_a limit 10                                                                                                                                      │
└─────────┴──────────────┴───────────────────┴──────────┴────────┴────────┴──────────┴───────────────────────┴──────────┴──────┴─────────┴──────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

10 rows in set. Elapsed: 0.017 sec. Processed 1.44 thousand rows, 200.81 KB (83.78 thousand rows/s., 11.68 MB/s.) 

副本预警监控

通过下面的 SQL 语句对副本进行预警监控,其中各个预警的变量可以根据自身情况调整。

SELECT database, table, is_leader, total_replicas, active_replicas 
  FROM system.replicas 
 WHERE is_readonly 
    OR is_session_expired 
    OR future_parts > 30 
    OR parts_to_check > 20 
    OR queue_size > 30 
    OR inserts_in_queue > 20 
    OR log_max_index - log_pointer > 20 
    OR total_replicas < 2 
    OR active_replicas < total_replicas

┌─database─┬─table───────────────────────┬─is_leader─┬─total_replicas─┬─active_replicas─┐
│ default  │ replicated_sales_12         │         000 │
│ default  │ test_fetch                  │         000 │
│ default  │ test_sharding_simple2_local │         000 │
└──────────┴─────────────────────────────┴───────────┴────────────────┴─────────────────┘

你可能感兴趣的:(ClickHouse,运维,dba,数据库)