【ClickHouse】查看数据库容量和表大小的方法(system.parts各种操作方法)

        clickhouse有system.parts系统表记录表相关元数据,可以通过该表对clickhouse上所有表进行查询表大小、行数等操作。

1.查看数据库容量

select
    sum(rows) as row,--总行数
    formatReadableSize(sum(data_uncompressed_bytes)) as ysq,--原始大小
    formatReadableSize(sum(data_compressed_bytes)) as ysh,--压缩大小
    round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100, 0) ys_rate--压缩率
from system.parts

结果:

row ysq ysh ys_rate
6.72E+10 11.19 TiB 2.68 TiB 24

2.查看表的各个指标

select database,
       table,
       sum(bytes) as size,
       sum(rows) as rows,
       min(min_date) as min_date,
       max(max_date) as max_date,
       sum(bytes_on_disk) as bytes_on_disk,
       sum(data_uncompressed_bytes) as data_uncompressed_bytes,
       sum(data_compressed_bytes) as data_compressed_bytes,
       (data_compressed_bytes / data_uncompressed_bytes) * 100 as compress_rate,
       max_date - min_date as days,
       size / (max_date - min_date) as avgDaySize
  from system.parts
 where active
   and database = 'database'
   and table = 'tablename'
 group by database, table

结果为:这种结果显示的大小size是字节,我们如何转换为常见的MB和GB呢?

database table size rows min_date max_date bytes_on_disk data_uncompressed_bytes data_compressed_bytes compress_rate days avgDaySize
database tablename 8.49E+08 6105087 2020/5/13 2020/5/14 8.49E+08 4.89E+09 8.45E+08 17.27185 1 8.49E+08
select
    database,
    table,
    formatReadableSize(size) as size,
    formatReadableSize(bytes_on_disk) as bytes_on_disk,
    formatReadableSize(data_uncompressed_bytes) as data_uncompressed_bytes,
    formatReadableSize(data_compressed_bytes) as data_compressed_bytes,
    compress_rate,
    rows,
    days,
    formatReadableSize(avgDaySize) as avgDaySize
from
(
    select
        database,
        table,
        sum(bytes) as size,
        sum(rows) as rows,
        min(min_date) as min_date,
        max(max_date) as max_date,
        sum(bytes_on_disk) as bytes_on_disk,
        sum(data_uncompressed_bytes) as data_uncompressed_bytes,
        sum(data_compressed_bytes) as data_compressed_bytes,
        (data_compressed_bytes / data_uncompressed_bytes) * 100 as compress_rate,
        max_date - min_date as days,
        size / (max_date - min_date) as avgDaySize
    from system.parts
    where active 
     and database = 'database'
     and table = 'tablename'
    group by
        database,
        table
)

结果:这就转换为常见的单位了:

database table size bytes_on_disk data_uncompressed_bytes data_compressed_bytes compress_rate rows days avgDaySize
database tablename 266.34 MiB 266.34 MiB 1.32 GiB 265.23 MiB 19.66019 2065215 0 inf YiB

上面过程可以看到,最终都用表进行了聚合,为什么会这样呢?

以一个简单的例子来看,我们最常见的是查看表分区,下面来看下不进行聚合的结果:

select partition
  from system.parts
 where active
   and database = 'database'
   and table = 'tablename'

结果为:这是因为在CH中,和我们hive表不一样,hive表一个分区只会有一条记录,但CH不是,每个分区分为了不同的marks

【ClickHouse】查看数据库容量和表大小的方法(system.parts各种操作方法)_第1张图片

因此,我们要实现和hive一样查分区的功能时,要对表进行聚合查看。

3.跟踪分区

SELECT database,
       table,
       count() AS parts,
       uniq(partition) AS partitions,
       sum(marks) AS marks,
       sum(rows) AS rows,
       formatReadableSize(sum(data_compressed_bytes)) AS compressed,
       formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
       round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100.,2) AS percentage
  FROM system.parts
 WHERE active
   and database = 'database'
   and table = 'tablename'
 GROUP BY database, table

结果为:

database table parts partitions marks rows compressed uncompressed percentage
database tablename 4 1 261 2065215 265.23 MiB 1.32 GiB 19.66

 4.检查数据大小

SELECT table,
       formatReadableSize(sum(data_compressed_bytes)) AS tc,
       formatReadableSize(sum(data_uncompressed_bytes)) AS tu,
       round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100,2) AS ratio
  FROM system.columns
 WHERE database = 'database'
   and table = 'table'
 GROUP BY table
 ORDER BY sum(data_compressed_bytes) ASC

结果:

table tc tu ratio
table 265.23 MiB 1.32 GiB 19.66

5.查看表中列的数据大小

SELECT column,
       any(type),
       sum(column_data_compressed_bytes) AS compressed,
       sum(column_data_uncompressed_bytes) AS uncompressed,
       sum(rows)
  FROM system.parts_columns
 WHERE database = 'database'
   and table = 'table'
   AND active
 GROUP BY column
 ORDER BY column ASC

结果:

【ClickHouse】查看数据库容量和表大小的方法(system.parts各种操作方法)_第2张图片

以上就是对表和库的一些基本操作了。 

你可能感兴趣的:(数据库)