ClickHouse multi-disk, multi-path storage policies

Contents

  • 1. Background
  • 2. Hands-on practice
    • 2.1 Modify config.xml
    • 2.2 Restart the clickhouse server
    • 2.3 Verify the configuration
    • 2.4 Using the jbod policy
    • 2.5 Using the hot/cold policy
    • 2.6 Moving partition parts

1. Background

When we installed the ClickHouse cluster earlier, the data path was configured as a single path. Here we learn how to configure storage paths across multiple disks, which spreads the I/O pressure and can improve both write and read performance.

[root@clickhouse1 ~]# 
[root@clickhouse1 ~]# lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0  100G  0 disk 
├─sda1                   8:1    0    1G  0 part /boot
└─sda2                   8:2    0   99G  0 part 
  ├─centos_centos-root 253:0    0   50G  0 lvm  /
  ├─centos_centos-swap 253:1    0    2G  0 lvm  [SWAP]
  └─centos_centos-home 253:2    0   47G  0 lvm  /home
sr0                     11:0    1  973M  0 rom  
[root@clickhouse1 ~]#

From the output above we can see that only one disk is mounted in our test environment, so we simulate multi-disk storage with multiple paths on a single disk.

2. Hands-on practice

2.1 Modify config.xml

Add the following configuration to config.xml on all four servers (nested inside the configuration root element):

    <storage_configuration>
        <disks>
            <disk_hot1>
                <path>/root/clickhouse_storage/hot1/</path>
            </disk_hot1>
            <disk_hot2>
                <path>/root/clickhouse_storage/hot2/</path>
            </disk_hot2>
            <disk_cold>
                <path>/root/clickhouse_storage/cold/</path>
                <keep_free_space_bytes>1073741824</keep_free_space_bytes>
            </disk_cold>
        </disks>

        <policies>
            <jbod_police>
                <volumes>
                    <jbod>
                        <disk>disk_hot1</disk>
                        <disk>disk_hot2</disk>
                    </jbod>
                </volumes>
            </jbod_police>

            <hot_cold_police>
                <volumes>
                    <hot>
                        <disk>disk_hot1</disk>
                        <disk>disk_hot2</disk>
                        <max_data_part_size_bytes>1048576</max_data_part_size_bytes>
                    </hot>
                    <cold>
                        <disk>disk_cold</disk>
                    </cold>
                </volumes>
                <move_factor>0.2</move_factor>
            </hot_cold_police>
        </policies>
    </storage_configuration>

Notes:

  • keep_free_space_bytes: optional; the amount of disk space that ClickHouse reserves and leaves unused on the disk
  • The jbod policy configures a single volume; parts (e.g. 202107_0_0_0) are written to its disks in round-robin order. Suitable when several disks are mounted but not combined into RAID
  • The hot/cold policy configures two volumes, hot and cold. A new part is stored on the hot volume unless its size exceeds max_data_part_size_bytes (optional), in which case it is written to the cold volume. Separately, when the free space on a volume falls below the fraction move_factor (optional, default 0.1), parts are moved to the next volume in order. Suitable when both SSDs and HDDs are mounted
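A table opts into one of these policies through its storage_policy setting. To confirm which policy a table actually uses, system.tables exposes a storage_policy column; a quick check (a sketch, using the table names created later in this article):

```sql
SELECT database, name, storage_policy
FROM system.tables
WHERE name IN ('jbod_table_local', 'hot_cold_table_local');
```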

2.2 Restart the clickhouse server

Create the new paths on all four servers:

[root@clickhouse1 ~]# mkdir -p /root/clickhouse_storage/hot1
[root@clickhouse1 ~]# mkdir /root/clickhouse_storage/hot2
[root@clickhouse1 ~]# mkdir /root/clickhouse_storage/cold

Restart the clickhouse server on all four machines:

[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse stop
Found pid = 2660 in the list of running processes.
The process with pid = 2660 is running.
Sent terminate signal to process with pid 2660.
Waiting for server to stop
Found pid = 2660 in the list of running processes.
The process with pid = 2660 is running.
Waiting for server to stop
Found pid = 2660 in the list of running processes.
The process with pid = 2660 is running.
Waiting for server to stop
Found pid = 2660 in the list of running processes.
The process with pid = 2660 is running.
Waiting for server to stop
Now there is no clickhouse-server process.
Server stopped
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse-server --config=/etc/clickhouse-server/config.xml --daemon start
[root@clickhouse1 ~]#

2.3 Verify the configuration

clickhouse1 :)
clickhouse1 :) select name, path, formatReadableSize(free_space) as free, formatReadableSize(total_space) as total, formatReadableSize(keep_free_space) as reserved from system.disks;

SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS reserved
FROM system.disks

Query id: 3bcffa44-6427-450e-8b0a-de067b6d7706

┌─name──────┬─path───────────────────────────┬─free──────┬─total─────┬─reserved─┐
│ default   │ /root/clickhouse/              │ 43.59 GiB │ 49.98 GiB │ 0.00 B   │
│ disk_cold │ /root/clickhouse_storage/cold/ │ 42.59 GiB │ 48.98 GiB │ 1.00 GiB │
│ disk_hot1 │ /root/clickhouse_storage/hot1/ │ 43.59 GiB │ 49.98 GiB │ 0.00 B   │
│ disk_hot2 │ /root/clickhouse_storage/hot2/ │ 43.59 GiB │ 49.98 GiB │ 0.00 B   │
└───────────┴────────────────────────────────┴───────────┴───────────┴──────────┘

4 rows in set. Elapsed: 0.004 sec. 

clickhouse1 :)
clickhouse1 :) select policy_name, volume_name, volume_priority, disks, formatReadableSize(max_data_part_size) max_data_part_size, move_factor from system.storage_policies;

SELECT
    policy_name,
    volume_name,
    volume_priority,
    disks,
    formatReadableSize(max_data_part_size) AS max_data_part_size,
    move_factor
FROM system.storage_policies

Query id: aa9e26e7-0580-44ae-bfb3-d9855414c44e

┌─policy_name─────┬─volume_name─┬─volume_priority─┬─disks─────────────────────┬─max_data_part_size─┬─move_factor─┐
│ default         │ default     │               1 │ ['default']               │ 0.00 B             │           0 │
│ hot_cold_police │ hot         │               1 │ ['disk_hot1','disk_hot2'] │ 1.00 MiB           │         0.2 │
│ hot_cold_police │ cold        │               2 │ ['disk_cold']             │ 0.00 B             │         0.2 │
│ jbod_police     │ jbod        │               1 │ ['disk_hot1','disk_hot2'] │ 0.00 B             │           0 │
└─────────────────┴─────────────┴─────────────────┴───────────────────────────┴────────────────────┴─────────────┘

4 rows in set. Elapsed: 0.014 sec. 

clickhouse1 :)

2.4 Using the jbod policy

Create the tables

clickhouse1 :)
clickhouse1 :) create table jbod_table_local on cluster sharding_ha(
:-] id UInt64
:-] ) engine = ReplicatedMergeTree('/clickhouse/tables/jbod_table/{shard}', '{replica}')
:-] order by id
:-] settings storage_policy = 'jbod_police';

CREATE TABLE jbod_table_local ON CLUSTER sharding_ha
(
    `id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/jbod_table/{shard}', '{replica}')
ORDER BY id
SETTINGS storage_policy = 'jbod_police'

Query id: 9ad62172-2dee-49c3-ba46-bff047c9635e

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                3 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse3 │ 9000 │      0 │       │                   2 │                1 │
│ clickhouse1 │ 9000 │      0 │       │                   1 │                1 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse4 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.451 sec. 

clickhouse1 :)
clickhouse1 :) create table jbod_table_all on cluster sharding_ha(
:-] id UInt64
:-] ) engine = Distributed(sharding_ha, default, jbod_table_local, rand());

CREATE TABLE jbod_table_all ON CLUSTER sharding_ha
(
    `id` UInt64
)
ENGINE = Distributed(sharding_ha, default, jbod_table_local, rand())

Query id: 46efcdf3-466c-4c6e-a694-cda568ca68f1

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                3 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse3 │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse4 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.081 sec. 

clickhouse1 :)

First insert

clickhouse1 :)
clickhouse1 :) insert into jbod_table_all select rand() from numbers(20);

INSERT INTO jbod_table_all SELECT rand()
FROM numbers(20)

Query id: 3f933193-0edd-4d64-8129-ed9a68b016db

Ok.

0 rows in set. Elapsed: 0.023 sec. 

clickhouse1 :)
clickhouse1 :) select name, disk_name, active from system.parts where table = 'jbod_table_local';

SELECT
    name,
    disk_name,
    active
FROM system.parts
WHERE table = 'jbod_table_local'

Query id: d0a90599-1c40-4055-b088-55b06d5b1676

┌─name──────┬─disk_name─┬─active─┐
│ all_0_0_0 │ disk_hot1 │      1 │
└───────────┴───────────┴────────┘

1 rows in set. Elapsed: 0.004 sec. 

clickhouse1 :)

Second insert

clickhouse1 :)
clickhouse1 :) insert into jbod_table_all select rand() from numbers(20);

INSERT INTO jbod_table_all SELECT rand()
FROM numbers(20)

Query id: 55875762-df39-4e55-b2f3-adf4550e2741

Ok.

0 rows in set. Elapsed: 0.027 sec. 

clickhouse1 :)
clickhouse1 :) select name, disk_name, active from system.parts where table = 'jbod_table_local';

SELECT
    name,
    disk_name,
    active
FROM system.parts
WHERE table = 'jbod_table_local'

Query id: dd2b6285-b082-4984-8903-e074d002794a

┌─name──────┬─disk_name─┬─active─┐
│ all_0_0_0 │ disk_hot1 │      1 │
│ all_1_1_0 │ disk_hot2 │      1 │
└───────────┴───────────┴────────┘

2 rows in set. Elapsed: 0.005 sec. 

clickhouse1 :) 

Force a merge of the parts

clickhouse1 :)
clickhouse1 :) optimize table jbod_table_local on cluster sharding_ha final;

OPTIMIZE TABLE jbod_table_local ON CLUSTER sharding_ha FINAL

Query id: a58f3b6e-6d46-4a99-bfe4-bf5405300073

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                1 │
│ clickhouse3 │ 9000 │      0 │       │                   2 │                1 │
│ clickhouse4 │ 9000 │      0 │       │                   1 │                1 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse1 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.203 sec. 

clickhouse1 :)
clickhouse1 :) select name, disk_name, active from system.parts where table = 'jbod_table_local';

SELECT
    name,
    disk_name,
    active
FROM system.parts
WHERE table = 'jbod_table_local'

Query id: ee17f981-d309-4f44-b7e2-b9220cda45dc

┌─name──────┬─disk_name─┬─active─┐
│ all_0_0_0 │ disk_hot1 │      0 │
│ all_0_1_1 │ disk_hot1 │      1 │
│ all_1_1_0 │ disk_hot2 │      0 │
└───────────┴───────────┴────────┘

3 rows in set. Elapsed: 0.005 sec. 

clickhouse1 :) 
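As the transcript shows, the jbod policy round-robins new parts across disk_hot1 and disk_hot2, and the merged part is written to whichever disk comes next in the rotation. A compact way to inspect how active parts are distributed per disk (a sketch against system.parts):

```sql
SELECT disk_name, count() AS part_count, sum(rows) AS total_rows
FROM system.parts
WHERE table = 'jbod_table_local' AND active
GROUP BY disk_name
ORDER BY disk_name;
```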

2.5 Using the hot/cold policy

Create the tables

clickhouse1 :) 
clickhouse1 :) create table hot_cold_table_local on cluster sharding_ha(
:-] id UInt64
:-] ) engine = ReplicatedMergeTree('/clickhouse/tables/hot_cold_table/{shard}', '{replica}')
:-] order by id
:-] settings storage_policy = 'hot_cold_police';

CREATE TABLE hot_cold_table_local ON CLUSTER sharding_ha
(
    `id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/hot_cold_table/{shard}', '{replica}')
ORDER BY id
SETTINGS storage_policy = 'hot_cold_police'

Query id: 8331bece-656a-4dba-b73a-834103d507c9

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                2 │
│ clickhouse4 │ 9000 │      0 │       │                   2 │                2 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse3 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.233 sec. 

clickhouse1 :) 
clickhouse1 :) create table hot_cold_table_all on cluster sharding_ha(
:-] id UInt64
:-] ) engine = Distributed(sharding_ha, default, hot_cold_table_local, rand());

CREATE TABLE hot_cold_table_all ON CLUSTER sharding_ha
(
    `id` UInt64
)
ENGINE = Distributed(sharding_ha, default, hot_cold_table_local, rand())

Query id: 03bf7af8-261e-4dad-b0c3-b267dc8738b2

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse3 │ 9000 │      0 │       │                   3 │                2 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse4 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.090 sec. 

clickhouse1 :)

First insert

clickhouse1 :)
clickhouse1 :) insert into hot_cold_table_all select rand() from numbers(200000);

INSERT INTO hot_cold_table_all SELECT rand()
FROM numbers(200000)

Query id: 9bafe2c8-f937-46d4-a29d-86ff7f35f9e6

Ok.

0 rows in set. Elapsed: 0.073 sec. Processed 200.00 thousand rows, 1.60 MB (2.76 million rows/s., 22.04 MB/s.)

clickhouse1 :)
clickhouse1 :) select name, disk_name, formatReadableSize(bytes_on_disk) size, active from system.parts where table = 'hot_cold_table_local';

SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size,
    active
FROM system.parts
WHERE table = 'hot_cold_table_local'

Query id: f7924c10-31fe-4d18-9736-4e095ae68c32

┌─name──────┬─disk_name─┬─size───────┬─active─┐
│ all_0_0_0 │ disk_hot1 │ 540.15 KiB │      1 │
└───────────┴───────────┴────────────┴────────┘

1 rows in set. Elapsed: 0.002 sec. 

clickhouse1 :)

Second insert

clickhouse1 :)
clickhouse1 :) insert into hot_cold_table_all select rand() from numbers(200000);

INSERT INTO hot_cold_table_all SELECT rand()
FROM numbers(200000)

Query id: 84846376-3d19-448e-8f8a-242f2e2b999d

Ok.

0 rows in set. Elapsed: 0.057 sec. Processed 200.00 thousand rows, 1.60 MB (3.48 million rows/s., 27.84 MB/s.)

clickhouse1 :)
clickhouse1 :) select name, disk_name, formatReadableSize(bytes_on_disk) size, active from system.parts where table = 'hot_cold_table_local';

SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size,
    active
FROM system.parts
WHERE table = 'hot_cold_table_local'

Query id: 6c8081f1-5fd0-4ba3-a997-6f19db554e45

┌─name──────┬─disk_name─┬─size───────┬─active─┐
│ all_0_0_0 │ disk_hot1 │ 540.15 KiB │      1 │
│ all_1_1_0 │ disk_hot2 │ 539.10 KiB │      1 │
└───────────┴───────────┴────────────┴────────┘

2 rows in set. Elapsed: 0.003 sec. 

clickhouse1 :)

Force a merge of the parts

clickhouse1 :)
clickhouse1 :) optimize table hot_cold_table_local on cluster sharding_ha final;

OPTIMIZE TABLE hot_cold_table_local ON CLUSTER sharding_ha FINAL

Query id: ec7435f0-7ef9-4cf0-b944-9d0b5f4973d8

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse3 │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse4 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.294 sec. 

clickhouse1 :)
clickhouse1 :) select name, disk_name, formatReadableSize(bytes_on_disk) size, active from system.parts where table = 'hot_cold_table_local';

SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size,
    active
FROM system.parts
WHERE table = 'hot_cold_table_local'

Query id: 33d7721b-d953-4325-9742-400dcf5a7c95

┌─name──────┬─disk_name─┬─size───────┬─active─┐
│ all_0_0_0 │ disk_hot1 │ 540.15 KiB │      0 │
│ all_0_1_1 │ disk_cold │ 1.01 MiB   │      1 │
│ all_1_1_0 │ disk_hot2 │ 539.10 KiB │      0 │
└───────────┴───────────┴────────────┴────────┘

3 rows in set. Elapsed: 0.013 sec. 

clickhouse1 :) 
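The merge produced a 1.01 MiB part, which exceeds the hot volume's max_data_part_size_bytes of 1 MiB, so ClickHouse wrote the merged part directly to the cold volume. A query like the following (a sketch) summarizes how much active data each disk holds:

```sql
SELECT disk_name, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'hot_cold_table_local' AND active
GROUP BY disk_name;
```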

2.6 Moving partition parts

clickhouse1 :) 
clickhouse1 :) alter table hot_cold_table_local on cluster sharding_ha move part 'all_0_1_1' to disk 'disk_hot2';

ALTER TABLE hot_cold_table_local ON CLUSTER sharding_ha
    MOVE PART 'all_0_1_1' TO DISK 'disk_hot2'

Query id: 8a367d9e-2e9c-4dc6-acc3-55f6ff363de4

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse3 │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse1 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse4 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.150 sec. 

clickhouse1 :)
clickhouse1 :) select name, disk_name, formatReadableSize(bytes_on_disk) size, active from system.parts where table = 'hot_cold_table_local';

SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size,
    active
FROM system.parts
WHERE table = 'hot_cold_table_local'

Query id: 31754c1a-8671-4af9-ba18-44fac1263161

┌─name──────┬─disk_name─┬─size───────┬─active─┐
│ all_0_0_0 │ disk_hot1 │ 540.15 KiB │      0 │
│ all_0_1_1 │ disk_cold │ 1.01 MiB   │      1 │
│ all_1_1_0 │ disk_hot2 │ 539.10 KiB │      0 │
└───────────┴───────────┴────────────┴────────┘

3 rows in set. Elapsed: 0.003 sec. 

clickhouse1 :)

From the output above we can see that part 'all_0_1_1' was not moved to disk 'disk_hot2'. This is because the ON CLUSTER operation only took effect on the shard, not on the replicas, so the move has to be executed locally on each node.

clickhouse1 :)
clickhouse1 :) alter table hot_cold_table_local on cluster sharding_ha move part 'all_0_1_1' to volume 'cold';

ALTER TABLE hot_cold_table_local ON CLUSTER sharding_ha
    MOVE PART 'all_0_1_1' TO VOLUME 'cold'

Query id: b9a0384e-990b-4173-afaf-ab1a9ee8f481

┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse2 │ 9000 │      0 │       │                   3 │                2 │
│ clickhouse1 │ 9000 │      0 │       │                   2 │                2 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
┌─host────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse3 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse4 │ 9000 │      0 │       │                   0 │                0 │
└─────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.237 sec. 

clickhouse1 :)
clickhouse1 :) select name, disk_name, formatReadableSize(bytes_on_disk) size, active from system.parts where table = 'hot_cold_table_local';

SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS size,
    active
FROM system.parts
WHERE table = 'hot_cold_table_local'

Query id: 985623b6-55f0-49cd-83cb-8cb94a658196

┌─name──────┬─disk_name─┬─size─────┬─active─┐
│ all_0_1_1 │ disk_cold │ 1.01 MiB │      1 │
└───────────┴───────────┴──────────┴────────┘

1 rows in set. Elapsed: 0.004 sec. 

clickhouse1 :)

ON CLUSTER operations only affect the shards, not the replicas; if a part was not moved, run the statement locally on the affected node.
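Besides moving a single part, a whole partition can be relocated with MOVE PARTITION, using the same TO DISK / TO VOLUME forms. A sketch (for the unpartitioned tables in this article, the partition expression is tuple()):

```sql
ALTER TABLE hot_cold_table_local MOVE PARTITION tuple() TO VOLUME 'hot';
```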
