Cluster Planning
Cluster Architecture
Here, "Distribute" refers to the machines on which the distributed tables are created. In this article the Distribute role is deployed on its own machines, but it can also be assigned to every Replica machine, i.e. the same distributed table is created on all replica machines with CREATE TABLE tbl ON CLUSTER 'cluster_name'.
Role Assignment
This installation uses a 5-node ZooKeeper ensemble; a cluster of that size is not strictly required for installing ClickHouse.
Installation Steps
Base environment preparation
1. Install clustershell
$ yum install -y clustershell
$ vi /etc/clustershell/groups
all: clickhouse-node-[01-14]
replica1:clickhouse-node-[07,10,13]
replica2:clickhouse-node-[08,11,14]
distributed:clickhouse-node-[06,09,12]
chproxy:clickhouse-node-[06,09,12]
$ clush -a 'uptime'
2. Passwordless SSH login
$ chmod 755 ~/.ssh
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ vi /etc/ssh/sshd_config
# enable public-key authentication; ssh-copy-id requires this
PubkeyAuthentication yes
$ service sshd restart
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@xxxx
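To push the key to every node in one pass, the ssh-copy-id step can be wrapped in a loop (a sketch; it assumes the clickhouse-node-01 ... clickhouse-node-14 hostnames from the groups file above are resolvable):
$ for i in $(seq -w 1 14); do ssh-copy-id -i ~/.ssh/id_rsa.pub root@clickhouse-node-$i; done
$ clush -a 'hostname'   # should now run on every node without a password prompt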
Install ClickHouse
1. Install the RPM packages
Install curl on all machines; it is needed to download the repository setup script and packages
clush -g all -b 'yum install -y curl'
On the replica, distributed and chproxy machines, download and run the ClickHouse repository setup script provided by packagecloud.io
clush -g replica1,replica2,distributed,chproxy -b 'curl -s https://packagecloud.io/install/repositories/altinity/clickhouse/script.rpm.sh | sudo bash'
Install clickhouse-server and clickhouse-client on the replica and distributed machines
# check available packages
$ clush -g replica1,replica2,distributed -b 'sudo yum list "clickhouse*"'
# install
$ clush -g replica1,replica2,distributed -b 'sudo yum install -y clickhouse-server clickhouse-client clickhouse-compressor'
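It is worth confirming that every node ended up with the same package versions before going further; a minimal check (the -b option merges identical output, so version drift stands out immediately):
$ clush -g replica1,replica2,distributed -b 'rpm -q clickhouse-server clickhouse-client'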
2. Adjust the ulimit configuration
$ vi /etc/security/limits.d/clickhouse.conf
# core file size: the maximum core dump the process may write when it crashes
clickhouse soft core 1073741824
clickhouse hard core 1073741824
3. Modify the startup script
$ vi /etc/init.d/clickhouse-server
CLICKHOUSE_LOGDIR=/data/clickhouse/logs
4. Modify the cluster configuration
Edit /etc/clickhouse-server/config.xml according to the settings below; only the elements that need to be changed or replaced are shown
<logger>
    <level>trace</level>
    <log>/data/clickhouse/logs/server.log</log>
    <errorlog>/data/clickhouse/logs/error.log</errorlog>
    <size>1000M</size>
    <count>10</count>
</logger>
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<interserver_http_port>9009</interserver_http_port>
<listen_host>0.0.0.0</listen_host>
<path>/data/clickhouse/</path>
<tmp_path>/data/clickhouse/tmp/</tmp_path>
<users_config>users.xml</users_config>
<default_profile>default</default_profile>
<default_database>default</default_database>
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
Create /etc/clickhouse-server/metrika.xml with the following content
<yandex>
    <clickhouse_remote_servers>
        <cluster_1> <!-- cluster name; cluster_1 is a placeholder, pick your own -->
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-node-07</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>6lYaUiFi</password>
                </replica>
                <replica>
                    <host>clickhouse-node-08</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>6lYaUiFi</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-node-10</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>6lYaUiFi</password>
                </replica>
                <replica>
                    <host>clickhouse-node-11</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>6lYaUiFi</password>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-node-13</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>6lYaUiFi</password>
                </replica>
                <replica>
                    <host>clickhouse-node-14</host>
                    <port>9000</port>
                    <user>default</user>
                    <password>6lYaUiFi</password>
                </replica>
            </shard>
        </cluster_1>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>clickhouse-node-01</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>clickhouse-node-02</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>clickhouse-node-03</host>
            <port>2181</port>
        </node>
        <node index="4">
            <host>clickhouse-node-04</host>
            <port>2181</port>
        </node>
        <node index="5">
            <host>clickhouse-node-05</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros> <!-- host_name and shard_number are replaced per node in step 5 -->
        <layer>cluster-1</layer>
        <replica>host_name</replica>
        <shard>shard_number</shard>
    </macros>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>
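With the macros above in place, the ZooKeeper path and replica name of a replicated table can be written once and expanded per node instead of being spelled out literally on each host; a sketch of the engine clause, assuming the layer/shard/replica macro names defined above:
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/image_label', '{replica}', insert_time, (label_id, insert_time), 8192)
The example table later in this article spells the path and replica name out literally instead.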
Replace the contents of /etc/clickhouse-server/users.xml with the following
<yandex>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
        </default>
        <readonly>
            <max_memory_usage>10000000000</max_memory_usage>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
            <readonly>1</readonly>
        </readonly>
    </profiles>
    <quotas>
        <default>
            <interval>
                <duration>3600</duration>
                <queries>0</queries>
                <errors>0</errors>
                <result_rows>0</result_rows>
                <read_rows>0</read_rows>
                <execution_time>0</execution_time>
            </interval>
        </default>
    </quotas>
    <users>
        <default>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks><ip>::/0</ip></networks>
            <profile>default</profile>
            <quota>default</quota>
        </default>
        <readonly>
            <password_sha256_hex>967f3bf355dddfabfca1c9f5cab39352b2ec1cd0b05f9e1e6b8f629705fe7d6e</password_sha256_hex>
            <networks><ip>::/0</ip></networks>
            <profile>readonly</profile>
            <quota>default</quota>
        </readonly>
    </users>
</yandex>
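The password_sha256_hex entries hold the SHA-256 digest of the plain-text password (6lYaUiFi is used throughout this article). If you pick a different password, the hash can be regenerated like this (a sketch; replace the example password with your own):
$ echo -n '6lYaUiFi' | sha256sum | awk '{print $1}'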
5. Sync the configuration
Give the clickhouse user a login shell
$ clush -g replica1,replica2,distributed -b 'usermod -s /bin/bash clickhouse'
Create the ClickHouse data directories under /data/clickhouse/
$ clush -g replica1,replica2,distributed -b 'mkdir /data/clickhouse/logs -p'
$ clush -g replica1,replica2,distributed -b 'chown clickhouse.clickhouse /data/clickhouse/ -R'
Copy the configuration files to every ClickHouse machine
$ clush -g replica1,replica2,distributed -b --copy /etc/security/limits.d/clickhouse.conf --dest /etc/security/limits.d/
$ clush -g replica1,replica2,distributed -b --copy /etc/init.d/clickhouse-server --dest /etc/init.d
$ clush -g replica1,replica2,distributed -b --copy /etc/clickhouse-server/config.xml --dest /etc/clickhouse-server/
$ clush -g replica1,replica2,distributed -b --copy /etc/clickhouse-server/users.xml --dest /etc/clickhouse-server/
$ clush -g replica1,replica2,distributed -b --copy /etc/clickhouse-server/metrika.xml --dest /etc/clickhouse-server/
Replace the per-host placeholders
# replace hostname
$ clush -g replica1,replica2,distributed -b 'sed -i "s/host_name/"$HOSTNAME"/" /etc/clickhouse-server/metrika.xml'
# replace shard_number
$ clush -w clickhouse-node-[06-08] -b 'sed -i "s/shard_number/1/" /etc/clickhouse-server/metrika.xml'
$ clush -w clickhouse-node-[09-11] -b 'sed -i "s/shard_number/2/" /etc/clickhouse-server/metrika.xml'
$ clush -w clickhouse-node-[12-14] -b 'sed -i "s/shard_number/3/" /etc/clickhouse-server/metrika.xml'
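Before restarting, it is worth checking that no placeholder survived the replacement (a quick sketch; every node should print OK):
$ clush -g replica1,replica2,distributed -b 'grep -n "host_name\|shard_number" /etc/clickhouse-server/metrika.xml || echo OK'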
6. Restart the service
# restart server
$ clush -g replica1,replica2,distributed -b 'service clickhouse-server restart'
# login with password
$ clickhouse-client -h 127.0.0.1 -d default -m -u default --password 6lYaUiFi
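Once all nodes are up, the topology defined in metrika.xml can be verified from any node by querying the standard system.clusters table; with the configuration above it should list 6 entries (3 shards x 2 replicas):
$ clickhouse-client -h 127.0.0.1 -u default --password 6lYaUiFi --query 'SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters'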
Log on to one of the replica machines and create the local table (the monchickey database must exist first)
$ clickhouse-client -h 127.0.0.1 -m -u default --password 6lYaUiFi
CREATE DATABASE IF NOT EXISTS monchickey;
CREATE TABLE monchickey.image_label (
label_id UInt32,
label_name String,
insert_time Date
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/01-01/image_label', 'cluster01-01-1', insert_time, (label_id, insert_time), 8192);
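The ReplicatedMergeTree table above only holds the local shard's data. To query across all shards, a Distributed table is created on the Distribute machines (or on every node with ON CLUSTER, as mentioned in the cluster-planning section). A minimal sketch, assuming the cluster in metrika.xml is named cluster_1 and the distributed table is called image_label_all (both names are placeholders), run from a clickhouse-client session on each Distribute machine:
CREATE DATABASE IF NOT EXISTS monchickey;
CREATE TABLE monchickey.image_label_all (
    label_id UInt32,
    label_name String,
    insert_time Date
) ENGINE = Distributed(cluster_1, monchickey, image_label, rand());
Writes sent to the Distributed table are spread across the three shards by rand(), while reads fan out to one replica per shard.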
Install chproxy
1. Download chproxy
https://github.com/Vertamedia/chproxy/releases
$ mkdir -p /data/chproxy
$ cd /data/chproxy
$ wget https://github.com/Vertamedia/chproxy/releases/download/1.13.0/chproxy-linux-amd64-1.13.0.tar.gz
$ tar -xzvf chproxy-*.gz
chproxy-linux-amd64
2. Configuration file
Create /data/chproxy/config.yml
hack_me_please: true can be set at the top of the file to turn off chproxy's config security checks
server:
http:
listen_addr: ":9090"
allowed_networks: ["172.0.0.0/8"]
users:
- name: "distributed-write"
to_cluster: "distributed-write"
to_user: "default"
- name: "replica-write"
to_cluster: "replica-write"
to_user: "default"
- name: "distributed-read"
to_cluster: "distributed-read"
to_user: "readonly"
max_concurrent_queries: 6
max_execution_time: 1m
clusters:
- name: "replica-write"
replicas:
- name: "replica1"
nodes: ["clickhouse-node-07:8123", "clickhouse-node-10:8123", "clickhouse-node-13:8123"]
- name: "replica2"
nodes: ["clickhouse-node-08:8123", "clickhouse-node-11:8123", "clickhouse-node-14:8123"]
users:
- name: "default"
password: "6lYaUiFi"
- name: "distributed-write"
nodes: [
"clickhouse-node-06:8123",
"clickhouse-node-09:8123",
"clickhouse-node-12:8123"
]
users:
- name: "default"
password: "6lYaUiFi"
- name: "distributed-read"
nodes: [
"clickhouse-node-06:8123",
"clickhouse-node-09:8123",
"clickhouse-node-12:8123"
]
users:
- name: "readonly"
password: "6lYaUiFi"
caches:
- name: "shortterm"
dir: "/data/chproxy/cache/shortterm"
max_size: 150Mb
expire: 130s
Create /data/chproxy/restart.sh
$ vim /data/chproxy/restart.sh
#!/bin/bash
cd "$(dirname "$0")"
ps -ef | grep '[c]hproxy-linux-amd64' | awk '{print $2}' | xargs -r kill -9
sudo -u chproxy nohup ./chproxy-linux-amd64 -config=./config/config.yml >> ./logs/chproxy.out 2>&1 &
3. Install on the distributed machines
Create the chproxy user
$ clush -g distributed -b 'useradd chproxy'
Create the directories
$ clush -g distributed -b 'mkdir -p /data/chproxy/logs'
$ clush -g distributed -b 'mkdir -p /data/chproxy/config'
$ clush -g distributed -b 'mkdir -p /data/chproxy/cache/shortterm'
$ clush -g distributed -b 'mkdir -p /data/chproxy/cache/longterm'
Distribute the files
$ clush -g distributed -b --copy /data/chproxy/chproxy-linux-amd64 --dest /data/chproxy/
$ clush -g distributed -b --copy /data/chproxy/config.yml --dest /data/chproxy/config/
$ clush -g distributed -b --copy /data/chproxy/restart.sh --dest /data/chproxy/
Fix the directory ownership
$ clush -g distributed -b 'chown -R chproxy.chproxy /data/chproxy'
4. Start chproxy
$ clush -g distributed -b 'bash /data/chproxy/restart.sh'
$ clush -g distributed -b 'ps -ef | grep chproxy' # check
Check the HTTP interfaces
# clickhouse
$ echo 'SELECT 1' | curl 'http://localhost:8123/?user=default&password=6lYaUiFi' --data-binary @-
1
echo 'SELECT 1' | curl 'http://default:6lYaUiFi@localhost:8123/' --data-binary @-
1
echo 'SELECT 1' | curl 'http://readonly:6lYaUiFi@localhost:8123/' --data-binary @-
1
# chproxy
echo 'SELECT 1' | curl 'http://clickhouse-node-06:9090/?user=distributed-read&password=' --data-binary @-
1
echo 'SELECT 1' | curl 'http://clickhouse-node-06:9090/?user=distributed-write&password=' --data-binary @-
1
Monitoring
ClickHouse monitoring dashboard template (Grafana)
https://github.com/Vertamedia/clickhouse-grafana
Storage space used by each database and table
SELECT
database,
table,
formatReadableSize ( sum( bytes ) ) AS size
FROM
system.parts
WHERE
active
GROUP BY database,table
ORDER BY sum( bytes ) DESC
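Replication health can be monitored the same way, using the standard system.replicas table (run on a replica node; a non-zero absolute_delay or a growing queue_size points at a lagging replica):
SELECT
    database,
    table,
    is_leader,
    is_readonly,
    absolute_delay,
    queue_size
FROM
    system.replicas
ORDER BY absolute_delay DESC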
Problems Encountered
Problem #1
Operation:
Starting clickhouse-server directly reports a ulimit-related error:
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Error:
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = Exception: Cannot set max size of core file to 1073741824, e.what() = Exception
Solution:
$ vi /etc/security/limits.d/clickhouse.conf
# core file size: the maximum core dump the process may write when it crashes
clickhouse soft core 1073741824
clickhouse hard core 1073741824
Problem #2
Operation:
$ clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Error:
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Logging trace to console
2019.02.13 15:15:36.539294 [ 1 ] {} : Starting ClickHouse 19.1.6 with revision 54413
2019.02.13 15:15:36.543324 [ 1 ] {} Application: starting up
2019.02.13 15:15:36.547676 [ 1 ] {} Application: DB::Exception: Effective user of the process (root) does not match the owner of the data (clickhouse). Run under 'sudo -u clickhouse'.
2019.02.13 15:15:36.547714 [ 1 ] {} Application: shutting down
2019.02.13 15:15:36.547729 [ 1 ] {} Application: Uninitializing subsystem: Logging Subsystem
2019.02.13 15:15:36.547809 [ 2 ] {} BaseDaemon: Stop SignalListener thread
Solution:
$ sudo -u clickhouse clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Problem #3
Operation:
$ sudo -u clickhouse clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Error:
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
Couldn't save preprocessed config to /var/lib/clickhouse//preprocessed_configs/config.xml: Access to file denied: /var/lib/clickhouse//preprocessed_configs/config.xml
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Poco::Exception. Code: 1000, e.code() = 13, e.displayText() = Access to file denied: /var/log/clickhouse-server/clickhouse-server.log, e.what() = Access to file denied
Solution:
chown -R clickhouse /var/log/clickhouse-server/
Problem #4
Operation:
$ sudo -u clickhouse clickhouse-server --config-file=/etc/clickhouse-server/config.xml
Error:
Logging trace to /var/log/clickhouse-server/clickhouse-server.log
Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
Logging trace to console
2019.02.13 16:39:48.812708 [ 1 ] {} : Starting ClickHouse 19.1.6 with revision 54413
2019.02.13 16:39:48.815644 [ 1 ] {} Application: starting up
2019.02.13 16:39:48.819798 [ 1 ] {} Application: rlimit on number of file descriptors is 262144
2019.02.13 16:39:48.819827 [ 1 ] {} Application: Initializing DateLUT.
2019.02.13 16:39:48.819850 [ 1 ] {} Application: Initialized DateLUT with time zone `Asia/Shanghai'.
2019.02.13 16:39:48.820256 [ 1 ] {} Application: Configuration parameter 'interserver_http_host' doesn't exist or exists and empty. Will use 'clickhouse-node-13' as replica host.
2019.02.13 16:39:48.822770 [ 1 ] {} ConfigReloader: Loading config `/data/clickhouse/users.xml'
Include not found: networks
Include not found: networks
2019.02.13 16:39:48.823764 [ 1 ] {} Application: Loading metadata.
2019.02.13 16:39:48.829479 [ 1 ] {} Application: Loaded metadata.
2019.02.13 16:39:48.829592 [ 1 ] {} BackgroundProcessingPool: Create BackgroundProcessingPool with 16 threads
2019.02.13 16:39:48.829762 [ 3 ] {} DDLWorker: Started DDLWorker thread
2019.02.13 16:39:48.834746 [ 3 ] {} ZooKeeper: initialized, hosts: clickhouse-node-03:2181,clickhouse-node-02:2181,clickhouse-node-05:2181,clickhouse-node-01:2181,clickhouse-node-04:2181
2019.02.13 16:39:48.834875 [ 1 ] {} Application: It looks like the process has no CAP_NET_ADMIN capability, 'taskstats' performance statistics will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_net_admin=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems. It also doesn't work if you run clickhouse-server inside network namespace as it happens in some containers.
2019.02.13 16:39:48.835531 [ 1 ] {} Application: Listen [::1]: 99: Net Exception: Cannot assign requested address: [::1]:8123 If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in element of configuration file. Example for disabled IPv6: 0.0.0.0 . Example for disabled IPv4: ::
2019.02.13 16:39:48.835636 [ 1 ] {} Application: Listening http://127.0.0.1:8123
2019.02.13 16:39:48.835684 [ 1 ] {} Application: Listening tcp: 127.0.0.1:9000
2019.02.13 16:39:48.835734 [ 1 ] {} Application: Listening interserver http: 127.0.0.1:9009
2019.02.13 16:39:48.836105 [ 1 ] {} Application: Available RAM = 31.26 GiB; physical cores = 8; threads = 16.
2019.02.13 16:39:48.836120 [ 1 ] {} Application: Ready for connections.
2019.02.13 16:39:48.838717 [ 3 ] {} DDLWorker: Processing tasks
2019.02.13 16:39:48.838977 [ 3 ] {} DDLWorker: Waiting a watch
2019.02.13 16:39:50.838820 [ 23 ] {} ConfigReloader: Loading config `/data/clickhouse/config.xml'
Solution:
$ vim /etc/clickhouse-server/config.xml
<listen_host>0.0.0.0</listen_host>
Problem #5
Operation:
$ clush -g replica1,replica2 -b 'service clickhouse-server stop'
Problem:
The processes bound to port 8123 cannot be stopped
Solution:
$ lsof -i :8123 | grep clickhouse | awk '{print $2}' | xargs kill -9
# or
$ service clickhouse-server forcestop
References
- ClickHouse集群搭建从0到1 (ClickHouse cluster setup from 0 to 1)
- https://zhuanlan.zhihu.com/p/34669883
- https://clustershell.readthedocs.io/en/latest/index.html
- https://github.com/Altinity/clickhouse-rpm-install
- https://github.com/Vertamedia/chproxy
- https://www.altinity.com/blog/2017/6/5/clickhouse-data-distribution
- https://www.altinity.com/blog/2018/5/10/circular-replication-cluster-topology-in-clickhouse
- http://jackpgao.github.io/
- https://hzkeung.com/2018/06/21/clickhouse-cluster-install