Written By: Xinyao Tian
[Original article. Please credit the author and source when reposting.]
This document describes how to configure and deploy a ClickHouse cluster with 3 shards and 2 replicas on three physical hosts.
Because this plan involves running multiple ClickHouse instances on a single physical node, the directory permissions, process resources, and runtime environments of the instances must all be separated, which makes the setup fairly involved;
this standalone document therefore records the detailed steps.
In theory, a 3-shard / 2-replica ClickHouse cluster built on three physical nodes can fully parallelize work across the compute resources of all 3 nodes while tolerating the outage of at most 1 host;
correspondingly, because each shard is configured with 2 replicas for fault tolerance, the plan consumes twice the storage.
According to the official ClickHouse documentation, the concepts of Replica, Shard, and ClickHouse Keeper are defined as follows.
Replica: A copy of data. ClickHouse always has at least one copy of your data, and so the minimum number of replicas is one. This is an important detail, you may not be used to counting the original copy of your data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.
Shard: A subset of data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the sharding key, and is defined when you create the distributed table. The sharding key can be random or as an output of a hash function. The deployment examples involving sharding will use rand() as the sharding key, and will provide further information on when and how to choose a different sharding key.
ClickHouse Keeper: ClickHouse Keeper provides the coordination system for data replication and distributed DDL queries execution. ClickHouse Keeper is compatible with Apache ZooKeeper.
Note: to keep the focus clear, this document records only the key configuration items. Other routine ClickHouse settings (e.g. the ZooKeeper-related configuration) are not recorded here, but they still need to be configured carefully.
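For completeness, the ZooKeeper section that every instance's configuration file needs would look roughly like the sketch below; the hosts zk-1 / zk-2 / zk-3 are placeholders for this deployment's actual ZooKeeper ensemble, which these notes do not record:

<zookeeper>
    <node>
        <host>zk-1</host>
        <port>2181</port>
    </node>
    <node>
        <host>zk-2</host>
        <port>2181</port>
    </node>
    <node>
        <host>zk-3</host>
        <port>2181</port>
    </node>
</zookeeper>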
Because we need to run multiple ClickHouse instances on the same physical node, each instance needs its own configuration file.
Here we copy ClickHouse's default configuration file, config.xml, and create an additional configuration file, config-9100.xml.
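The copy itself can be made along these lines (a sketch assuming the default /etc/clickhouse-server layout):

cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config-9100.xml
chown clickhouse:clickhouse /etc/clickhouse-server/config-9100.xml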
[root@p0-lpsm-rf1 clickhouse-server]# ls -l | grep config
-rw-r--r-- 1 clickhouse clickhouse 58471 Jun 27 15:51 config-9100.xml
drwxr-xr-x 2 clickhouse clickhouse 24 Jun 19 17:36 config.d
-rw-r--r-- 1 clickhouse clickhouse 59784 Jun 27 15:41 config.xml
The ClickHouse "3-shard, 2-replica" deployment mode enables fault tolerance while keeping queries efficient: with the configuration below, the service keeps running even if at most 1 of the 3 hosts goes down.
The shard and replica configuration is shown below. Note that the port numbers of the replicas on the same host must not conflict, otherwise ClickHouse will report errors.
This is a cluster-level setting, so it is identical in the configuration files of all processes. Configure it in both config.xml and config-9100.xml:
<remote_servers>
    <!-- 3 shards, each with 2 replicas spread across hosts.
         Tag names reconstructed following the standard ClickHouse remote_servers schema;
         the leading 1/2 values per replica are taken to be replica priorities. -->
    <production_cluster_3s2r>
        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <priority>1</priority>
                <host>p0-lpsm-rf1</host>
                <port>9000</port>
            </replica>
            <replica>
                <priority>2</priority>
                <host>p0-lpsm-rf2</host>
                <port>9100</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <priority>1</priority>
                <host>p0-lpsm-rf2</host>
                <port>9000</port>
            </replica>
            <replica>
                <priority>2</priority>
                <host>p0-lpsm-rf3</host>
                <port>9100</port>
            </replica>
        </shard>
        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <replica>
                <priority>1</priority>
                <host>p0-lpsm-rf3</host>
                <port>9000</port>
            </replica>
            <replica>
                <priority>2</priority>
                <host>p0-lpsm-rf1</host>
                <port>9100</port>
            </replica>
        </shard>
    </production_cluster_3s2r>
</remote_servers>
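Once the instances are up later on, this layout can be cross-checked from any instance against the system.clusters system table:

SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'production_cluster_3s2r';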
Configure the first ClickHouse instance (config.xml) to use the following ports (9000 / 8123 / 9009):
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<interserver_http_port>9009</interserver_http_port>
Configure the second ClickHouse instance (config-9100.xml) to use the following ports (9100 / 8124 / 9010):
<http_port>8124</http_port>
<tcp_port>9100</tcp_port>
<interserver_http_port>9010</interserver_http_port>
Create the underlying ClickHouse data directory, so that the storage of the two instances is isolated:
[root@p0-lpsm-rf1 clickhouse]# mkdir /data/clickhouse-9100
[root@p0-lpsm-rf1 clickhouse]# chown -R clickhouse:clickhouse /data/clickhouse-9100
Create the location for the ClickHouse log files:
[root@p0-lpsm-rf1 clickhouse-server]# mkdir /data/clickhouse-9100/logs
[root@p0-lpsm-rf1 clickhouse-server]# chown -R clickhouse:clickhouse /data/clickhouse-9100/logs/
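The notes above only show the commands for the 9100 instance; if the first instance's directories under /data/clickhouse-9000 do not exist yet, they would be created the same way (a hypothetical mirror of the commands above):

mkdir -p /data/clickhouse-9000/logs
chown -R clickhouse:clickhouse /data/clickhouse-9000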
Modify the data directory locations in the first instance's configuration file (config.xml):
<path>/data/clickhouse-9000</path>
<tmp_path>/data/clickhouse-9000/tmp/</tmp_path>
<user_files_path>/data/clickhouse-9000/user_files/</user_files_path>
Modify the log locations in config.xml:
<logger>
    <level>information</level>
    <log>/data/clickhouse-9000/logs/clickhouse-server.log</log>
    <errorlog>/data/clickhouse-9000/logs/clickhouse-server.err.log</errorlog>
    <size>100M</size>
    <count>10</count>
</logger>
Likewise, modify the data directory locations in the second instance's configuration file (config-9100.xml):
<path>/data/clickhouse-9100</path>
<tmp_path>/data/clickhouse-9100/tmp/</tmp_path>
<user_files_path>/data/clickhouse-9100/user_files/</user_files_path>
Modify the log locations in config-9100.xml:
<logger>
    <level>information</level>
    <log>/data/clickhouse-9100/logs/clickhouse-server.log</log>
    <errorlog>/data/clickhouse-9100/logs/clickhouse-server.err.log</errorlog>
    <size>100M</size>
    <count>10</count>
</logger>
The macros setting is different for every one of the 6 instances across the 3 nodes, so it must be configured carefully; a mistake here will mix data between replicas and cause serious failures.
(The per-host labels below follow the remote_servers layout above; tag names follow the standard ClickHouse macros schema.)

On p0-lpsm-rf1, in config.xml (the 9000 instance):

<macros>
    <shard>01</shard>
    <replica>01</replica>
    <cluster>production_cluster_3s2r</cluster>
</macros>

On p0-lpsm-rf1, in config-9100.xml (the 9100 instance):

<macros>
    <shard>03</shard>
    <replica>02</replica>
    <cluster>production_cluster_3s2r</cluster>
</macros>

On p0-lpsm-rf2, in config.xml (the 9000 instance):

<macros>
    <shard>02</shard>
    <replica>01</replica>
    <cluster>production_cluster_3s2r</cluster>
</macros>

On p0-lpsm-rf2, in config-9100.xml (the 9100 instance):

<macros>
    <shard>01</shard>
    <replica>02</replica>
    <cluster>production_cluster_3s2r</cluster>
</macros>

On p0-lpsm-rf3, in config.xml (the 9000 instance):

<macros>
    <shard>03</shard>
    <replica>01</replica>
    <cluster>production_cluster_3s2r</cluster>
</macros>

On p0-lpsm-rf3, in config-9100.xml (the 9100 instance):

<macros>
    <shard>02</shard>
    <replica>02</replica>
    <cluster>production_cluster_3s2r</cluster>
</macros>
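These macros are substituted into the ZooKeeper paths of replicated tables. For illustration only (this statement is not a step in this setup), a table that references them explicitly would be declared like this:

CREATE TABLE db_test.example ON CLUSTER production_cluster_3s2r
(
    `id` UInt64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/db_test/example', '{replica}')
ORDER BY id

The bare ReplicatedMergeTree used later in this document relies instead on the server's default path template, which also expands {shard} and {replica}.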
At this point, the configuration files for all 6 ClickHouse instances are essentially complete.
Because we will start 2 ClickHouse instances on each host, the startup commands also need to be kept separate.
Enter the systemd unit directory and create a new service unit by copying the existing one:

cp /etc/systemd/system/clickhouse-server.service /etc/systemd/system/clickhouse-server-9100.service
[root@p0-lpsm-rf2 system]# pwd
/etc/systemd/system
[root@p0-lpsm-rf2 system]# ls -l | grep clickhouse-server
-rw-r--r-- 1 root root 965 Jun 27 15:06 clickhouse-server-9100.service
-rw-r--r-- 1 root root 950 Jun 19 15:15 clickhouse-server.service
Edit the contents of the new unit file /etc/systemd/system/clickhouse-server-9100.service. Note that the ExecStart and EnvironmentFile entries have been changed to point at the 9100-specific configuration.
[root@p0-lpsm-rf2 system]# cat /etc/systemd/system/clickhouse-server-9100.service
[Unit]
Description=ClickHouse Server (analytic DBMS for big data)
Requires=network-online.target
# NOTE: that After/Wants=time-sync.target is not enough, you need to ensure
# that the time was adjusted already, if you use systemd-timesyncd you are
# safe, but if you use ntp or some other daemon, you should configure it
# additionaly.
After=time-sync.target network-online.target
Wants=time-sync.target
[Service]
Type=simple
User=clickhouse
Group=clickhouse
Restart=always
RestartSec=30
RuntimeDirectory=clickhouse-server
ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/config-9100.xml --pid-file=/run/clickhouse-server/clickhouse-server-9100.pid
# Minus means that this file is optional.
EnvironmentFile=-/etc/default/clickhouse-9100
LimitCORE=infinity
LimitNOFILE=500000
CapabilityBoundingSet=CAP_NET_ADMIN CAP_IPC_LOCK CAP_SYS_NICE
[Install]
# ClickHouse should not start from the rescue shell (rescue.target).
WantedBy=multi-user.target
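After creating the new unit file, reload systemd so that it picks the unit up, and optionally enable it to start on boot (standard systemd commands):

sudo systemctl daemon-reload
sudo systemctl enable clickhouse-server-9100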
Start the two ClickHouse instances as separate services:
# Start the ClickHouse 9000 instance (stop / start / restart as needed)
sudo systemctl stop clickhouse-server
sudo systemctl start clickhouse-server
sudo systemctl restart clickhouse-server
# Start the ClickHouse 9100 instance (stop / start / restart as needed)
sudo systemctl stop clickhouse-server-9100
sudo systemctl start clickhouse-server-9100
sudo systemctl restart clickhouse-server-9100
Check that the processes of both instances are running:
[root@p0-lpsm-rf1 clickhouse-server]# ps -ef | grep clickhouse
clickho+ 58120 1 0 15:43 ? 00:00:00 clickhouse-watchdog --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickho+ 58123 58120 2 15:43 ? 00:01:32 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickho+ 64780 1 0 15:52 ? 00:00:00 clickhouse-watchdog --config=/etc/clickhouse-server/config-9100.xml --pid-file=/run/clickhouse-server/clickhouse-server-9100.pid
clickho+ 64783 64780 1 15:52 ? 00:00:50 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config-9100.xml --pid-file=/run/clickhouse-server/clickhouse-server-9100.pid
root 144479 7356 0 16:45 pts/0 00:00:00 grep --color=auto clickhouse
Check that the ports are listening as expected:
[root@p0-lpsm-rf1 clickhouse-server]# netstat -ntlp | grep clickhouse
tcp 0 0 0.0.0.0:9009 0.0.0.0:* LISTEN 58123/clickhouse-se
tcp 0 0 127.0.0.1:9010 0.0.0.0:* LISTEN 64783/clickhouse-se
tcp 0 0 0.0.0.0:8123 0.0.0.0:* LISTEN 58123/clickhouse-se
tcp 0 0 127.0.0.1:8124 0.0.0.0:* LISTEN 64783/clickhouse-se
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 58123/clickhouse-se
tcp 0 0 127.0.0.1:9100 0.0.0.0:* LISTEN 64783/clickhouse-se
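As an optional smoke test, each instance can also be queried directly with clickhouse-client:

clickhouse-client --port=9000 --query "SELECT 1"
clickhouse-client --port=9100 --query "SELECT 1"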
P.S. If problems occur during startup, check the log files /data/clickhouse-9000/logs/clickhouse-server.err.log and /data/clickhouse-9100/logs/clickhouse-server.err.log.
Finally, verify the cluster's functionality, following the official tutorial "Verify ClickHouse cluster functionality":
CREATE DATABASE db_test ON CLUSTER production_cluster_3s2r
DROP TABLE IF EXISTS db_test.test_table01 ON CLUSTER production_cluster_3s2r
CREATE TABLE db_test.test_table01 ON CLUSTER production_cluster_3s2r
(
`id` UInt64,
`column1` String
)
ENGINE = ReplicatedMergeTree
ORDER BY id
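The tutorial below writes to the replicated table directly, which exercises a single shard and its replica. To spread data over all 3 shards, one would additionally create a Distributed table on top; this is not part of the original steps, but a sketch (using rand() as the sharding key, as the documentation excerpt above suggests) would look like:

CREATE TABLE db_test.test_table01_dist ON CLUSTER production_cluster_3s2r
AS db_test.test_table01
ENGINE = Distributed(production_cluster_3s2r, db_test, test_table01, rand())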
On p0-lpsm-rf1:

INSERT INTO db_test.test_table01 (id, column1) VALUES (1, 'abc');

On p0-lpsm-rf2:

SELECT * FROM db_test.test_table01

On p0-lpsm-rf1:

INSERT INTO db_test.test_table01 (id, column1) VALUES (2, 'def');
Stop one ClickHouse server node by running an operating system command similar to the command used to start the node. If you used systemctl start to start the node, then use systemctl stop to stop it.
Insert more data on the running node:
INSERT INTO db_test.test_table01 (id, column1) VALUES (3, 'ghi');
SELECT * FROM db_test.test_table01
As the sessions below show, after inserting data on p0-lpsm-rf2:9000, the inserted rows can be read directly by logging in to p0-lpsm-rf3:9100, because that instance is configured as the replica of the same shard; this provides a good fault-tolerance mechanism.
Log in to p0-lpsm-rf2:9000 and insert data:
[root@p0-lpsm-rf2 clickhouse-server]# clickhouse-client --port=9000
ClickHouse client version 22.3.2.1.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.3.2 revision 54455.
p0-lpsm-rf2 :) INSERT INTO db_test.test_table01 (id, column1) VALUES (1, 'abc');
INSERT INTO db_test.test_table01 (id, column1) FORMAT Values
Query id: c731d404-ff01-43eb-912e-e2671e8dbafb
Ok.
1 rows in set. Elapsed: 0.022 sec.
p0-lpsm-rf2 :) SELECT * FROM db_test.test_table01
SELECT *
FROM db_test.test_table01
Query id: b74b8f78-5b97-4eb9-956d-27f6fc556f3d
┌─id─┬─column1─┐
│ 1 │ abc │
└────┴─────────┘
Log in to p0-lpsm-rf3:9100 and query the data. Because these two instances are replicas of the same shard, the data has been replicated automatically, and the query returns the row directly:
[root@p0-lpsm-rf3 clickhouse-server]# clickhouse-client --port=9100
ClickHouse client version 22.3.2.1.
Connecting to localhost:9100 as user default.
Connected to ClickHouse server version 22.3.2 revision 54455.
p0-lpsm-rf3 :) SELECT * FROM db_test.test_table01
SELECT *
FROM db_test.test_table01
Query id: 1fbf3994-b4fe-44d7-99f6-26aa2ead5e5a
┌─id─┬─column1─┐
│ 1 │ abc │
└────┴─────────┘
1 rows in set. Elapsed: 0.001 sec.
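Replication state can also be inspected on any instance through the system.replicas system table, for example:

SELECT database, table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'test_table01';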
This document has described how to configure and deploy a 3-shard, 2-replica ClickHouse cluster on three physical hosts.
Building a ClickHouse cluster this way fully parallelizes work across the compute power of the 3 machines while replicating every shard across hosts, providing the technical foundation for ClickHouse high availability and fault recovery.