A built-in MySQL synchronization feature for ClickHouse has long been highly anticipated. Until now it was handled by open-source tools such as Bifrost, mysql-clickhouse-replication, and clickhouse-mysql-data-reader.
This month the ClickHouse team finally shipped a stable release that includes the sync feature, 20.8.x. Notably, the feature was contributed by two Chinese engineers, BohuTANG and zhang1024.
That said, word on the grapevine is that the sync feature in this stable release still has a few rough edges, so today's test uses the latest testing build rather than the stable one.
All release tags are listed here:
https://github.com/ClickHouse/ClickHouse/tags
The latest testing tag is v20.10.1.4635-testing.
Pre-built RPM packages are available from the build report:
https://clickhouse-builds.s3.yandex.net/0/109fd9d6d7a0e053bc8272dbef25febb9ba9efc1/clickhouse_build_check/report.html
Find the Linux rpm packages there and download them.
# Grabbed a test environment and checked the current ClickHouse version:
[root@bj79 ~]# rpm -qa|grep click
clickhouse-common-static-20.3.5.21-2.x86_64
clickhouse-client-20.3.5.21-2.noarch
clickhouse-server-20.3.5.21-2.noarch
# Stop the ClickHouse service
[root@bj79 ~]# service clickhouse-server status
clickhouse-server service is running
[root@bj79 ~]#
[root@bj79 ~]# service clickhouse-server stop
Stop clickhouse-server service: DONE
# Prepare the upgrade packages
[root@bj79 ch_20]# ll
total 138620
-rw-r--r-- 1 root root 125126 Sep 14 11:28 clickhouse-client-20.10.1.4635-2.noarch.rpm
-rw-r--r-- 1 root root 141662768 Sep 14 11:30 clickhouse-common-static-20.10.1.4635-2.x86_64.rpm
-rw-r--r-- 1 root root 149103 Sep 14 11:29 clickhouse-server-20.10.1.4635-2.noarch.rpm
# Upgrade the packages
[root@bj79 ch_20]# rpm -Uvh clickhouse*.rpm
Preparing... ################################# [100%]
Updating / installing...
1:clickhouse-common-static-20.10.1.################################# [ 17%]
2:clickhouse-client-20.10.1.4635-2 ################################# [ 33%]
3:clickhouse-server-20.10.1.4635-2 ################################# [ 50%]
Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/
Cleaning up / removing...
4:clickhouse-server-20.3.5.21-2 ################################# [ 67%]
5:clickhouse-client-20.3.5.21-2 ################################# [ 83%]
6:clickhouse-common-static-20.3.5.2################################# [100%]
[root@bj79 ch_20]#
# Start the service and confirm the version has been upgraded
[root@bj79 ch_20]# systemctl start clickhouse-server
[root@bj79 ch_20]#
[root@bj79 ch_20]# clickhouse-client -m
ClickHouse client version 20.10.1.4635 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 20.10.1 revision 54440.
bj79 :)
bj79 :) show databases;
SHOW DATABASES
┌─name───────────────────────────┐
│ _temporary_and_external_tables │
│ default │
│ system │
└────────────────────────────────┘
3 rows in set. Elapsed: 0.002 sec.
# On MySQL: create the replication user
GRANT REPLICATION SLAVE, REPLICATION CLIENT, RELOAD, SELECT ON *.* TO 'ch_repl'@'%' IDENTIFIED BY 'Ch_repl_123';
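Note that on MySQL 8.0 the combined GRANT ... IDENTIFIED BY form above is no longer accepted; there the account has to be created with CREATE USER first and then granted the privileges. Apart from the account, MaterializeMySQL also expects row-format binlogs with full row images and mysql_native_password as the default authentication plugin (the same variables the engine checks in the general log shown later), plus GTID mode, since replication progress is tracked by GTID as the metadata file later shows. A quick manual check could look like this (a sketch, not part of the original post):

-- MySQL 8.0 variant: create the account first, then grant
-- CREATE USER 'ch_repl'@'%' IDENTIFIED WITH mysql_native_password BY 'Ch_repl_123';
-- GRANT REPLICATION SLAVE, REPLICATION CLIENT, RELOAD, SELECT ON *.* TO 'ch_repl'@'%';

-- verify the binlog/GTID prerequisites
SHOW VARIABLES WHERE Variable_name IN
  ('log_bin', 'binlog_format', 'binlog_row_image',
   'gtid_mode', 'default_authentication_plugin');
-- expected: log_bin=ON, binlog_format=ROW, binlog_row_image=FULL,
--           gtid_mode=ON, default_authentication_plugin=mysql_native_password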
# Create a test table:
CREATE TABLE `t` (
`id` int(11) NOT NULL,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`),
KEY `b` (`b`)
) ENGINE=InnoDB;
# Generate 100,000 rows of test data:
delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=100000)do
    insert into t values(i, i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
# The execution is quite slow:
[root@localhost][caihao_test]> call idata();
Query OK, 1 row affected (6 min 48.02 sec)
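The procedure is slow mainly because every iteration is its own auto-committed single-row INSERT, so each of the 100,000 rows pays a full commit. A variant that wraps the whole loop in one transaction (a sketch, not used in the original test; the name idata_fast is just an illustration) is usually far quicker:

delimiter ;;
create procedure idata_fast()
begin
  declare i int default 1;
  start transaction;              -- one commit for the whole batch instead of 100,000
  while(i<=100000)do
    insert into t values(i, i, i);
    set i=i+1;
  end while;
  commit;
end;;
delimiter ;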
# System load during the run (screenshot in the original post)
# To make observation easier, enable general_log on MySQL
[root@localhost][caihao_test]> set global general_log=1;
Query OK, 0 rows affected (0.00 sec)
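The statements captured this way are written to the file pointed at by general_log_file; a quick way to locate it (sketch):

show variables like 'general_log%';
-- general_log_file gives the path of the file the "MySQL log" section below was taken from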
# Syntax:
CREATE DATABASE ckdb ENGINE = MaterializeMySQL('[mysql-host]:[mysql-port]', '[mysql-database]', '[mysql-user]', '[mysql-password]');
# Start the synchronization:
bj79 :) SET allow_experimental_database_materialize_mysql=1;
SET allow_experimental_database_materialize_mysql = 1
Ok.
0 rows in set. Elapsed: 0.001 sec.
bj79 :) CREATE DATABASE caihao_test ENGINE = MaterializeMySQL('192.168.0.77:3306', 'caihao_test', 'ch_repl', 'Ch_repl_123');
CREATE DATABASE caihao_test
ENGINE = MaterializeMySQL('192.168.0.77:3306', 'caihao_test', 'ch_repl', 'Ch_repl_123')
Ok.
0 rows in set. Elapsed: 0.056 sec.
# Check the row count after the initial sync:
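A simple check (a sketch) is to compare the row counts on both ends:

-- MySQL side
select count(*) from caihao_test.t;      -- expect 100000
-- ClickHouse side
SELECT count() FROM caihao_test.t;       -- should also return 100000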
# On the MySQL side, insert another 2,020 rows:
delimiter ;;
create procedure inc_idata()
begin
  declare i int;
  set i=100001;
  while(i<=102020)do
    insert into t values(i, i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
call inc_idata();
# The data on both sides matches; not a single incremental row is missing
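This can be confirmed with a quick query that should return the same result on both sides (a sketch):

-- run on MySQL and on ClickHouse; after the incremental load both should return 102020, 102020
SELECT count(*), max(id) FROM caihao_test.t;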
# Compare the binlog positions on both sides:
# Source MySQL:
[root@localhost][(none)]> show master status\G
*************************** 1. row ***************************
File: binlog.000001
Position: 32936647
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 431b0ec6-3561-11ea-a9a1-00163e0ec877:1-100115
1 row in set (0.00 sec)
# Target ClickHouse:
[root@bj79 caihao_test]# more /var/lib/clickhouse/metadata/caihao_test/.metadata
Version: 2
Binlog File: binlog.000001
Executed GTID: 431b0ec6-3561-11ea-a9a1-00163e0ec877:1-100115
Binlog Position: 32936647
Data Version: 1
The author briefly documented the synchronization workflow in the following issue:
https://github.com/ClickHouse/ClickHouse/issues/4006
The logs of the synchronization above also reveal some of the intermediate steps:
----------------- MySQL log -----------------
2020-09-15T15:09:56.963739+08:00 1056 Query SELECT @@global.read_only
2020-09-15T15:09:57.503217+08:00 24476 Connect ch_repl@<client host> on caihao_test using SSL/TLS
2020-09-15T15:09:57.503523+08:00 24476 Query SET NAMES utf8
2020-09-15T15:09:57.503971+08:00 24476 Query SHOW VARIABLES WHERE (Variable_name = 'log_bin' AND upper(Value) = 'ON') OR (Variable_name = 'binlog_format' AND upper(Value) = 'ROW') OR (Variable_name = 'binlog_row_image' AND upper(Value) = 'FULL') OR (Variable_name = 'default_authentication_plugin' AND upper(Value) = 'MYSQL_NATIVE_PASSWORD')
2020-09-15T15:09:57.504854+08:00 24476 Query SELECT version() AS version
2020-09-15T15:09:57.505242+08:00 24476 Query FLUSH TABLES
2020-09-15T15:09:57.520117+08:00 24476 Query FLUSH TABLES WITH READ LOCK
2020-09-15T15:09:57.520408+08:00 24476 Query SHOW MASTER STATUS
2020-09-15T15:09:57.520756+08:00 24476 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-09-15T15:09:57.520985+08:00 24476 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2020-09-15T15:09:57.521274+08:00 24476 Query SELECT TABLE_NAME AS table_name FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'caihao_test'
2020-09-15T15:09:57.521723+08:00 24476 Query SHOW CREATE TABLE caihao_test.t
2020-09-15T15:09:57.522263+08:00 24476 Query UNLOCK TABLES
2020-09-15T15:09:57.531608+08:00 24476 Query SELECT * FROM caihao_test.t
2020-09-15T15:09:57.614572+08:00 24476 Query COMMIT
2020-09-15T15:09:57.615380+08:00 24477 Connect ch_repl@<client host> on using TCP/IP
2020-09-15T15:09:57.615574+08:00 24477 Query SET @master_binlog_checksum = 'CRC32'
2020-09-15T15:09:57.615742+08:00 24477 Query SET @master_heartbeat_period = 1000000000
2020-09-15T15:09:57.615997+08:00 24477 Binlog Dump GTID Log: '' Pos: 4 GTIDs: '431b0ec6-3561-11ea-a9a1-00163e0ec877:1-100115'
----------------- ClickHouse log -----------------
2020.09.15 15:09:57.491745 [ 26310 ] {5b265ba7-183c-4399-85ce-c7f696cf1d9b} executeQuery: (from 127.0.0.1:57674) CREATE DATABASE caihao_test ENGINE = MaterializeMySQL('192.168.0.77:3306', 'caihao_test', 'ch_repl', 'Ch_repl_123');
2020.09.15 15:09:57.491890 [ 26310 ] {5b265ba7-183c-4399-85ce-c7f696cf1d9b} ContextAccess (default): Access granted: CREATE DATABASE ON caihao_test.*
2020.09.15 15:09:57.494874 [ 26310 ] {5b265ba7-183c-4399-85ce-c7f696cf1d9b} DatabaseOrdinary (caihao_test): Total 0 tables and 0 dictionaries.
2020.09.15 15:09:57.494890 [ 26310 ] {5b265ba7-183c-4399-85ce-c7f696cf1d9b} DatabaseOrdinary (caihao_test): Starting up tables.
2020.09.15 15:09:57.494899 [ 26310 ] {5b265ba7-183c-4399-85ce-c7f696cf1d9b} Application: MYSQL: Connecting to [email protected]:3306 as user ch_repl
2020.09.15 15:09:57.504937 [ 26310 ] {5b265ba7-183c-4399-85ce-c7f696cf1d9b} MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.09.15 15:09:57.505069 [ 26310 ] {} TCPHandler: Processed in 0.013530009 sec.
2020.09.15 15:09:57.525443 [ 26340 ] {} MemoryTracker: Peak memory usage (for query): 0.00 B.
2020.09.15 15:09:57.525945 [ 26340 ] {} executeQuery: (internal) /*Materialize MySQL step 1: execute MySQL DDL for dump data*/ EXTERNAL DDL FROM MySQL(caihao_test, caihao_test) CREATE TABLE `t` ( `id` int(11) NOT NULL, `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `a` (`a`), KEY `b` (`b`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
2020.09.15 15:09:57.526682 [ 26340 ] {} executeQuery: (internal) /* Rewritten MySQL DDL Query */ CREATE TABLE caihao_test.t (`id` Int32, `a` Nullable(Int32), `b` Nullable(Int32), `_sign` Int8() MATERIALIZED 1, `_version` UInt64() MATERIALIZED 1) ENGINE = ReplacingMergeTree(_version) PARTITION BY intDiv(id, 4294967) ORDER BY (id, assumeNotNull(a), assumeNotNull(b))
2020.09.15 15:09:57.527544 [ 26340 ] {} caihao_test.t: Loading data parts
2020.09.15 15:09:57.527710 [ 26340 ] {} caihao_test.t: Loaded data parts (0 items)
2020.09.15 15:09:57.530654 [ 26340 ] {} executeQuery: (internal) /*Materialize MySQL step 1: execute dump data*/ INSERT INTO t(id, a, b) VALUES
2020.09.15 15:09:57.610028 [ 26340 ] {} DiskLocal: Reserving 2.19 MiB on disk `default`, having unreserved 127.86 GiB.
2020.09.15 15:09:57.614024 [ 26340 ] {} caihao_test.t: Renaming temporary part tmp_insert_0_1_1_0 to 0_1_1_0.
2020.09.15 15:09:57.614110 [ 26340 ] {} MaterializeMySQLSyncThread(caihao_test): Materialize MySQL step 1: dump t, 0.00 rows, 0.00 B in 0.081625235 sec., 0.00 rows/sec., 0.00 B/sec.
2020.09.15 15:09:57.614171 [ 26340 ] {} MaterializeMySQLSyncThread: MySQL dump database position:
=== Binlog Position ===
Binlog: binlog.000001
Position: 32936647
GTIDSets: 431b0ec6-3561-11ea-a9a1-00163e0ec877:1-100115
2020.09.15 15:09:57.690952 [ 26340 ] {} MaterializeMySQLSyncThread: Skip MySQL event:
=== RotateEvent ===
Timestamp: 0
Event Type: 4
Server ID: 49
Event Size: 44
Log Pos: 0
Flags: 32
Position: 4
Next Binlog: binlog.000001
2020.09.15 15:09:57.690980 [ 26340 ] {} MaterializeMySQLSyncThread: Skip MySQL event:
=== FormatDescriptionEvent ===
Timestamp: 1599787087
Event Type: 15
Server ID: 49
Event Size: 119
Log Pos: 123
Flags: 0
Binlog Version: 4
Server Version:
Create Timestamp: 0
Event Header Len: 19
2020.09.15 15:09:57.691013 [ 26340 ] {} MaterializeMySQLSyncThread: Skip MySQL event:
=== PreviousGTIDsEvent ===
Timestamp: 1599787087
Event Type: 35
Server ID: 49
Event Size: 31
Log Pos: 154
Flags: 128
[DryRun Event]
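The rewritten DDL in the ClickHouse log also shows how the data is stored: the MySQL table becomes a ReplacingMergeTree with two extra materialized columns, _version (used to keep only the newest version of a row) and _sign (1 for a live row, -1 for a deleted one). A small follow-up experiment (not part of the original test) to confirm that UPDATE and DELETE replicate as well could look like this:

-- MySQL side
update t set a = 0 where id = 1;
delete from t where id = 2;

-- ClickHouse side: SELECTs against a MaterializeMySQL table are rewritten to read the
-- latest _version and to skip rows with _sign = -1, so the plain query below should
-- already reflect the update and no longer return id = 2
SELECT * FROM caihao_test.t WHERE id IN (1, 2);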
If you have read this far, thanks for sticking with it.