Clickhouse 单机版 及 Clickhouse 集群的安装及搭建

Clickhouse 单机版及集群的搭建

本文详细描述了如何搭建 Clickhouse 单机版及 Clickhouse 集群。

Clickhouse 单机版搭建

本小节简述了如何搭建 Clickhouse 的单机版。该部分内容也会成为后续搭建 Clickhouse 集群的先决条件,请务必认真对待。

安装包拷贝至相应目录

将下载好的 clickhouse 安装包利用命令分发至待安装主机上。安装包可以到 Clickhouse 官网 下载至本地。

[hadoop@p0-tklcdh-clickhouse01 software]$ pwd
/home/hadoop/software
[hadoop@p0-tklcdh-clickhouse01 software]$ ls -l
total 1112624
-rw-r--r-- 1 hadoop hadoop     39866 Sep 28 13:57 clickhouse-client-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop 179545907 Sep 28 13:57 clickhouse-common-static-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop 774024509 Sep 28 13:57 clickhouse-common-static-dbg-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop     61901 Sep 28 13:57 clickhouse-server-22.3.2.2.tgz
-rw-r--r-- 1 hadoop hadoop 185646832 Sep 28 11:40 jdk-8u181-linux-x64.tar.gz
[hadoop@p0-tklcdh-clickhouse01 software]$ scp clickhouse-* [email protected]:`pwd`
[hadoop@p0-tklcdh-clickhouse01 software]$ scp clickhouse-* [email protected]:`pwd

自行编写的安装脚本

/homehadoop/software 中创建一个新脚本 sudo vim /homehadoop/software/install.sh 并将下面的内容拷贝至脚本。

也可以参考官方网站的 Installation Guide 进行安装。

#!/bin/bash

# Remember to switch Linux user to root

sudo mkdir /home/hadoop/clickhouse
sudo cp /home/hadoop/software/clickhouse-* /home/hadoop/clickhouse

cd /home/hadoop/clickhouse

tar -xzvf "clickhouse-common-static-22.3.2.2.tgz"
tar -xzvf "clickhouse-common-static-dbg-22.3.2.2.tgz"
tar -xzvf "clickhouse-server-22.3.2.2.tgz"
tar -xzvf "clickhouse-client-22.3.2.2.tgz"

rm -f /home/hadoop/clickhouse/*.tgz

sudo "clickhouse-common-static-22.3.2.2/install/doinst.sh"
sudo "clickhouse-common-static-dbg-22.3.2.2/install/doinst.sh"
sudo "clickhouse-server-22.3.2.2/install/doinst.sh"
sudo "clickhouse-client-22.3.2.2/install/doinst.sh"

# Remember to generate a openssl certification
sudo openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 3650 -nodes -x509 -keyout /etc/clickhouse-servee-server/server.crt

# Set config to ensure only IPv4 is usable
sudo echo "0.0.0.0" > /etc/clickhouse-server/config.d/listen.xml

拷贝完毕后,更改脚本权限 chmod 777 sudo vim /homehadoop/software/install.sh 并执行脚本 /homehadoop/software/install.sh。根据提示输入相应的指令并完成安装。

检验单机 Clickhouse 是否成功安装

使用如下命令启动 clickhouse server:

sudo clickhouse start

在 server 成功启动后,使用 clickhouse-client 配合在安装时指定的 password 进行登录

clickhouse-client --password

如观测到如下输出则可表明 Clickhouse 在单机上运行成功

[root@p0-tklcdh-clickhouse03 software]# clickhouse-client --password
ClickHouse client version 22.3.2.1.
Password for user (default): 
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.3.2 revision 54455.

p0-tklcdh-clickhouse01 :) show databases;

SHOW DATABASES

Query id: d9a6c3f9-1aa8-42fd-92e4-e29de6d32dd4

┌─name───────────────┐
│ INFORMATION_SCHEMA │
│ default            │
│ information_schema │
│ system             │
└────────────────────┘

4 rows in set. Elapsed: 0.002 sec.

p0-tklcdh-clickhouse03 :) exit
Bye.

Clickhouse 分布式集群搭建

本小节在上一个小节的基础上配合 Zookeeper 开始搭建一个 Clickhouse 集群。

安装前置需求

  • 确保每一台要组成 Clickhouse 集群的节点上都按上述方法成功安装了 Clickhouse
  • 至少已经准备好一个可用的 Zookeeper 集群,并确保所有 Clickhouse 节点都可以访问 Zookeeper 集群的相应端口
  • /etc/hosts 中各个 Clickhouse 节点和 Zookeeper 节点的 hostname-IP 映射全部配置完毕,并每台机器可以相互 ping 通,相应端口不要被占用
  • 待配置文件 /etc/clickhouse-server/config.xml 完成配置后,将配置文件全部替换至各个 Clickhouse 节点的相应位置

集群化运行修改配置文件

Clickhouse 官方文档给出了配置分布式集群的方法。详见 Clickhouse Clusters

为了能够使 Clickhouse 以集群的方式顺利运行,需要在 /etc/clickhouse-server/config.xml 配置文件的相应位置替换如下配置项:

  1. remote_servers 的相关配置。特别注意,配置 remote_servers 时需要确保每台节点上的 /etc/hosts 中都有相应的域名-IP映射关系,否则节点之间无法互通。
	<remote_servers>
		<example_cluster>
			<shard>
				
				<weight>1weight>
				
				<internal_replication>falseinternal_replication>
				<replica>
					<host>p0-tklcdh-clickhouse01host>
					<port>9000port>
				replica>
			shard>
			<shard>
				<weight>1weight>
				<internal_replication>falseinternal_replication>
				<replica>
					<host>p0-tklcdh-clickhouse02host>
					<port>9000port>
				replica>
			shard>
			<shard>
				<weight>1weight>
				<internal_replication>falseinternal_replication>
				<replica>
					<host>p0-tklcdh-clickhouse03host>
					<port>9000port>
				replica>
			shard>
		example_cluster>
	remote_servers>
  1. zookeeper 的相关配置。特别注意,配置 zookeeper 时需要确保每台节点上的 /etc/hosts 中都有相应的域名-IP映射关系,否则该节点无法找到 Zookeeper 服务,也就无法获取从其他节点传来的任务指令。详见 Data Replication 和 Clickhouse 官方 Zookeeper 配置指南
	<zookeeper>
		<node>
			<host>p0-tklfbirp-cdh-zk01host>
			<port>2181port>
		node>
		<node>
			<host>p0-tklfbirp-cdh-zk02host>
			<port>2181port>
		node>
		<node>
			<host>p0-tklfbirp-cdh-zk03host>
			<port>2181port>
		node>
		<node>
			<host>p0-tklfbirp-cdh-zk04host>
			<port>2181port>
		node>
		<node>
			<host>p0-tklfbirp-cdh-zk05host>
			<port>2181port>
		node>
	zookeeper>
  1. macros 的相关配置。具体作用请见官方文档说明 macros 和 Replicated*MergeTree parameters
	<macros>
        <shard>01shard>
        <replica>p0-tklcdh-clickhouse01replica>
    macros>
  1. 修改数据存储位置
    
    <path>/data02/clickhouse/path>

    
    <tmp_path>/data02/clickhouse/tmp/tmp_path>

	
    <user_files_path>/data02/clickhouse/user_files/user_files_path>

分发配置文件至其余 Clickhouse 节点

使用 scp 命令或将添加完集群配置的配置文件 /etc/clickhouse-server/config.xml 手动拷贝至所有 Clickhouse 节点的相应位置。

重启各个 Clickhouse 节点的服务

使用如下命令在各个 Clickhouse 节点上重新启动修改完配置文件的 clickhouse server:

sudo clickhouse restart

查看 Clickhouse 集群信息

在 Clickhouse Cmd 中使用相关命令查看元数据的相应信息。

p0-tklcdh-clickhouse01 :) select * from system.clusters

SELECT *
FROM system.clusters

Query id: 8fac3aac-c543-4fae-97b2-58b278089675

┌─cluster─────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name──────────────┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┐
│ example_cluster │         1 │            1 │           1 │ p0-tklcdh-clickhouse01 │ 10.129.81.113 │ 9000 │        1 │ default │                  │            0 │               0 │                       0 │
│ example_cluster │         2 │            1 │           1 │ p0-tklcdh-clickhouse02 │ 10.129.81.114 │ 9000 │        0 │ default │                  │            0 │               0 │                       0 │
│ example_cluster │         3 │            1 │           1 │ p0-tklcdh-clickhouse03 │ 10.129.81.115 │ 9000 │        0 │ default │                  │            0 │               0 │                       0 │
└─────────────────┴───────────┴──────────────┴─────────────┴────────────────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┘

3 rows in set. Elapsed: 0.002 sec.

在主节点上使用 clickhouse-client 连接并验证分布式搭建是否成功

使用如下命令连接至 clickhouse 的主节点,并运行一个分布式建表语句检查 Clickhouse 的分布式搭建是否成功。详见 Creating Replicated Tables

建表语句如下:

CREATE TABLE IF NOT EXISTS table_test01
ON CLUSTER example_cluster
(
    UserID UInt32,
	ver UInt16
) 
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/01/table_test01', 'p0-tklcdh-clickhouse01-01', ver)
PRIMARY KEY UserID;

执行结果如下:

[root@p0-tklcdh-clickhouse01 clickhouse-client-22.3.2.2]# clickhouse-client --password
ClickHouse client version 22.3.2.1.
Password for user (default): 
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 22.3.2 revision 54455.

p0-tklcdh-clickhouse01 :) CREATE TABLE IF NOT EXISTS table_test01
                          ON CLUSTER example_cluster
                          (
                              UserID UInt32,
                          ^Iver UInt16
                          ) 
                          ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/01/table_test01', 'p0-tklcdh-clickhouse01-01', ver)
                          PRIMARY KEY UserID;

CREATE TABLE IF NOT EXISTS table_test01 ON CLUSTER example_cluster
(
    `UserID` UInt32,
    `ver` UInt16
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/01/table_test01', 'p0-tklcdh-clickhouse01-01', ver)
PRIMARY KEY UserID

Query id: 90055fe0-7443-4c65-8aa1-4cb09becc393

┌─host───────────────────┬─port─┬─status─┬─error─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─num_hosts_remaining─┬─num_hosts_active─┐
│ p0-tklcdh-clickhouse02 │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/01/table_test01/replicas/p0-tklcdh-clickhouse01-01 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.2.1) │                   2 │                0 │
│ p0-tklcdh-clickhouse03 │ 9000 │      0 │                                                                                                                                                                       │                   1 │                0 │
│ p0-tklcdh-clickhouse01 │ 9000 │    253 │ Code: 253. DB::Exception: Replica /clickhouse/tables/01/table_test01/replicas/p0-tklcdh-clickhouse01-01 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.2.1) │                   0 │                0 │
└────────────────────────┴──────┴────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴──────────────────┘
→ Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.)  0%
3 rows in set. Elapsed: 0.120 sec. 

Received exception from server (version 22.3.2):
Code: 253. DB::Exception: Received from localhost:9000. DB::Exception: There was an error on [p0-tklcdh-clickhouse02:9000]: Code: 253. DB::Exception: Replica /clickhouse/tables/01/table_test01/replicas/p0-tklcdh-clickhouse01-01 already exists. (REPLICA_IS_ALREADY_EXIST) (version 22.3.2.1). (REPLICA_IS_ALREADY_EXIST)

如果显示以上信息则可以初步表明分布式搭建已经成功,所有 Clickhouse 节点组成了一个 Clickhouse 集群并由 Zookeeper 进行任务协同。(由于本例中之前已经做过了一次该操作,故这里有一个 Replica 已经存在的 Exception,可忽略。)

Clickhouse 集群指标测试

通过官方给定的测试用例,可以系统性的测试 Clickhouse 集群搭建的是否成功,以及查询性能等诸多指标。详见 Clickhouse Testing

备注

  • Clickhouse 默认日志存储路径,如果发现问题可以及时登录每台 Clickhouse 节点上进行日志分析。
[root@p0-tklcdh-clickhouse02 clickhouse-server]# ls /var/log/clickhouse-server/
clickhouse-server.err.log  clickhouse-server.log  stderr.log  stdout.log

References

  • Clickhouse tgz 包下载路径
  • Installation Guide
  • Clickhouse Clusters
  • Zookeeper 配置指南
  • Data Replication
  • macros
  • Replicated*MergeTree parameters
  • Creating Replicated Tables
  • Clickhouse Testing
  • Creating a ClickHouse cluster - Part I: Sharding
  • Creating a ClickHouse cluster - Part II: Replication

附录:配置文件全部内容

(略) 数据合规,信息安全,无法透露请谅解

你可能感兴趣的:(Datawarehouse,clickhouse,hadoop,大数据)