Preface: what canal is and what it can do are outside the scope of this article; please consult the relevant documentation yourself.
We run the same production application in two locations, used by staff at both sites. A unified management portal is now required, which raises the problem of keeping the user-related tables in sync between the two sites; here we use canal to do that synchronization.
Make sure MySQL has binlog enabled and that the binlog format is ROW (row-based replication):
mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.00 sec)
mysql> show variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | MIXED |
+---------------+-------+
1 row in set (0.01 sec)
mysql> select version();
+------------+
| version()  |
+------------+
| 5.7.28-log |
+------------+
1 row in set (0.00 sec)
As shown above, my binlog format is MIXED (mixed mode), so it needs to be changed to ROW:
vim /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Enable binlog and set the binlog file base name; the generated files will be named master-binlog.000001, and so on
log_bin=master-binlog
# Replication (binlog) format; there are three options.
# 1. STATEMENT (statement-based replication)
#    Pros: records only the SQL statements rather than every row change, keeping the binlog small and saving I/O.
#    Cons: in some cases causes master-slave inconsistency, e.g. when statements use non-deterministic functions such as last_insert_id() or now().
# 2. ROW (row-based replication)
#    Records every row change (what the row looked like before and after), so functions like last_insert_id() and now() replicate correctly.
#    Cons: produces a large amount of log data, especially during ALTER TABLE, which can make the log explode.
# 3. MIXED (mixed-mode replication)
#    Combines STATEMENT and ROW: STATEMENT is used by default,
#    and MySQL switches to ROW for statements that cannot be replicated safely as statements.
binlog_format=ROW
# Server ID; any value that does not conflict with other replication nodes
server-id=1
# Sync the binlog to disk on every write
sync-binlog=1
# Automatically purge old binlog files; the default 0 disables purging, 1 means files older than one day are deleted
expire_logs_days=1
# 省略其它配置....
Restart the MySQL service to apply the changes:
systemctl restart mysqld
mysql> show variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.00 sec)
Create a dedicated canal user on the source database and grant it replication privileges:
mysql> create user canal identified by 'canal';
Query OK, 0 rows affected (1.02 sec)
mysql> grant select, replication slave, replication client on *.* to 'canal'@'%' identified by 'canal';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.02 sec)
Download canal-deployer from the official site or from Baidu Cloud (extraction code: 7nop). After downloading, upload it to the server and extract it to the target directory:
[root@10-9-102-16 local]# tar -xvf canal.deployer-1.1.4.tar.gz -C /opt/canal/canal-deployer
[root@10-9-102-16 canal-deployer]# pwd
/opt/canal/canal-deployer
[root@10-9-102-16 canal-deployer]# ls
bin conf lib logs
[root@10-9-102-16 canal-deployer]# cd conf
[root@10-9-102-16 conf]# ls
canal_local.properties canal.properties example logback.xml metrics spring
[root@10-9-102-16 conf]#
Modify the canal.properties configuration file:
#################################################
######### destinations #############
#################################################
# canal instance names, comma-separated; each name corresponds to a directory under conf/
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
# enable instance auto-scan and set how often to scan, in seconds
canal.auto.scan = true
canal.auto.scan.interval = 5
Modify the instance configuration file (conf/example/instance.properties):
#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0
# enable gtid use true/false
# whether to enable GTID mode
canal.instance.gtidon=false
# position info
# address of the source MySQL instance
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
# rds oss binlog
# only needed when reading binlog from Alibaba Cloud RDS
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=
# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=
# username/password
# username/password used to connect to the source database (the canal user created earlier)
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
# table regex
# Comma-separated regexes selecting which tables to sync.
# Common examples:
# 1. all tables: .*  or  .*\\..*
# 2. all tables in the canal schema: canal\\..*
# 3. tables in canal whose names start with "canal": canal\\.canal.*
# 4. a single table in the canal schema: canal.test1
# 5. several rules combined (comma-separated): canal\\..*,mysql.test1,mysql.test2
canal.instance.filter.regex=test\\..*
# table black regex
# blacklist regex: tables that must NOT be synced
canal.instance.filter.black.regex=
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#################################################
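The filter rules above are ordinary Java regular expressions matched against the schema.table name, with the comma-separated list split by canal itself. Below is a minimal stdlib sketch of that matching logic; the simplified matcher and the sample table names are illustrative only, not canal's actual internal filter implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FilterRegexDemo {

    // Simplified stand-in for canal's filtering: split the comma-separated
    // rule list and full-match each rule against "schema.table".
    static boolean matches(String filter, String schemaDotTable) {
        for (String rule : filter.split(",")) {
            if (Pattern.matches(rule, schemaDotTable)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String filter = "test\\..*"; // same rule as in instance.properties
        List<String> tables = Arrays.asList("test.user", "test.order", "test2.user");
        for (String t : tables) {
            // test.user and test.order match, test2.user does not
            System.out.println(t + " -> " + matches(filter, t));
        }
    }
}
```

Running this shows why `test\\..*` pulls in every table of the test schema while leaving test2 alone.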
Start the canal-server:
cd /opt/canal/canal-deployer/bin
./startup.sh
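Once canal-server is up, it listens on TCP port 11111 by default (canal.port in canal.properties). A quick way to verify reachability from Java before wiring up the client; the host and port here are the defaults assumed throughout this article:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CanalPortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("canal-server reachable: " + isReachable("127.0.0.1", 11111, 2000));
    }
}
```

If this prints false, check logs/canal/canal.log before going any further.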
Next, build a Spring Boot client to consume the change events. Add these dependencies to pom.xml:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.4</version>
</dependency>
package com.broada.canal.client;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.common.utils.AddressUtils;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.InvalidProtocolBufferException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;
import javax.annotation.PostConstruct;
import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.TimeUnit;
import com.alibaba.otter.canal.protocol.CanalEntry.Column;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange;
import com.alibaba.otter.canal.protocol.CanalEntry.RowData;
/**
* @description:
* @author: hui.cheng
* @create: 2021-02-22 16:03
**/
@Component
@Slf4j
public class CanalClient {

    @PostConstruct
    public void start() throws Exception {
        startCanalTask();
    }

    private void startCanalTask() throws Exception {
        int batchSize = 1000;
        int emptyCount = 0;
        CanalConnector connector = null;
        try {
            connector = CanalConnectors.newSingleConnector(
                    new InetSocketAddress(AddressUtils.getHostIp(), 11111), "example", "", "");
            connector.connect();
            connector.subscribe("test\\..*,lagou\\..*");
            connector.rollback();
            int totalEmptyCount = 120;
            while (emptyCount < totalEmptyCount) {
                Message message = connector.getWithoutAck(batchSize); // fetch up to batchSize entries
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    emptyCount++;
                    log.info("===== no database changes detected =====");
                    TimeUnit.MILLISECONDS.sleep(2000);
                } else {
                    emptyCount = 0;
                    printEntry(message.getEntries());
                }
                connector.ack(batchId); // acknowledge the batch
                // connector.rollback(batchId); // on processing failure, roll the batch back
            }
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        } finally {
            if (connector != null) {
                connector.disconnect();
            }
        }
    }

    private void printEntry(List<Entry> entries) throws InvalidProtocolBufferException {
        for (Entry entry : entries) {
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN
                    || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }
            // A RowChange carries all row-level changes of one binlog event
            RowChange rowChange;
            try {
                rowChange = RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry.toString(), e);
            }
            // event type: insert/update/delete
            EventType eventType = rowChange.getEventType();
            log.info(String.format("================ binlog[%s:%s] , name[%s,%s] , eventType : %s",
                    entry.getHeader().getLogfileName(),
                    entry.getHeader().getLogfileOffset(),
                    entry.getHeader().getSchemaName(),
                    entry.getHeader().getTableName(),
                    eventType));
            if (rowChange.getIsDdl()) {
                log.info("DDL statement: {}", rowChange.getSql());
            }
            // iterate over every changed row in the RowChange
            List<RowData> rowDatasList = rowChange.getRowDatasList();
            for (RowData rowData : rowDatasList) {
                switch (eventType) {
                    case INSERT:
                        log.info("INSERT, row after change:");
                        printColumn(rowData.getAfterColumnsList());
                        break;
                    case DELETE:
                        log.info("DELETE, row before change:");
                        printColumn(rowData.getBeforeColumnsList());
                        break;
                    case UPDATE:
                        log.info("UPDATE, row before change:");
                        printColumn(rowData.getBeforeColumnsList());
                        log.info("UPDATE, row after change:");
                        printColumn(rowData.getAfterColumnsList());
                        break;
                    default:
                        break;
                }
            }
        }
    }

    private void printColumn(List<Column> columns) {
        for (Column column : columns) {
            log.info("column: {}, value: {}, updated: {}", column.getName(), column.getValue(), column.getUpdated());
        }
    }
}
Note: when connecting with the Java client, if you pass a subscription regex after creating the CanalConnector, e.g. connector.subscribe("test\\..*,lagou\\..*");, that client-side filter takes precedence and the canal.instance.filter.regex configured in instance.properties no longer takes effect. Call connector.subscribe() with no arguments if you want the server-side filter to apply.
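A related pitfall: the doubled backslashes in instance.properties are consumed by the properties-file escape rules, not by the regex engine. Assuming the file is read with java.util.Properties semantics, this small stdlib sketch shows what value actually comes out of the file (the string literal below doubles each backslash again because it is Java source):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscapeDemo {

    // Parses a line exactly as it appears in instance.properties:
    //   canal.instance.filter.regex=test\\..*
    static String parsedFilter() {
        Properties p = new Properties();
        try {
            p.load(new StringReader("canal.instance.filter.regex=test\\\\..*"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Properties collapses \\ into a single backslash,
        // so the effective regex is test\..*
        return p.getProperty("canal.instance.filter.regex");
    }

    public static void main(String[] args) {
        System.out.println(parsedFilter());
    }
}
```

So the regex canal ultimately applies is test\..*, matching a literal dot after the schema name.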
As the output shows, after data changes in the database the CanalClient observes the change events, and with that information you can replicate the changes to any target data source, such as MySQL. However, canal already ships a client-side adapter, canal-adapter, that consumes upstream messages (from kafka, RocketMQ, or canal-server directly) and writes downstream to MySQL, Elasticsearch, or HBase. With canal-adapter you can move and synchronize data between multiple data sources with little or no code.
Next, let's use canal-adapter to complete the synchronization.
First download canal.adapter-1.1.4 from the official site or from Baidu Cloud (extraction code: po5u). After downloading, upload it to the server and extract it to the target directory:
tar -xvf canal.adapter-1.1.4.tar.gz -C /opt/canal/canal-adapter
cd /opt/canal/canal-adapter/conf
vim application.yml
Modify the project configuration file application.yml as follows:
server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  # we are syncing to MySQL directly, so use TCP mode
  mode: tcp # kafka rocketMQ
  # address of the canal-server
  canalServerHost: 127.0.0.1:11111
#  zookeeperHosts: slave1:2181
#  mqServers: 127.0.0.1:9092 #or rocketmq
#  flatMessage: true
  batchSize: 500
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  # connection info for the source MySQL server
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
      username: root
      password: root
  canalAdapters:
  - instance: example # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: rdb
        # connection info for the target MySQL server
        key: mysql1
        properties:
          jdbc.driverClassName: com.mysql.jdbc.Driver
          jdbc.url: jdbc:mysql://127.0.0.1:3306/test2?useUnicode=true
          jdbc.username: root
          jdbc.password: root
#      - name: rdb
#        key: oracle1
#        properties:
#          jdbc.driverClassName: oracle.jdbc.OracleDriver
#          jdbc.url: jdbc:oracle:thin:@localhost:49161:XE
#          jdbc.username: mytest
#          jdbc.password: m121212
#      - name: rdb
#        key: postgres1
#        properties:
#          jdbc.driverClassName: org.postgresql.Driver
#          jdbc.url: jdbc:postgresql://localhost:5432/postgres
#          jdbc.username: postgres
#          jdbc.password: 121212
#          threads: 1
#          commitSize: 3000
#      - name: hbase
#        properties:
#          hbase.zookeeper.quorum: 127.0.0.1
#          hbase.zookeeper.property.clientPort: 2181
#          zookeeper.znode.parent: /hbase
#      - name: es
#        hosts: 127.0.0.1:9300 # 127.0.0.1:9200 for rest mode
#        properties:
#          mode: transport # or rest
#          # security.auth: test:123456 # only used for rest mode
#          cluster.name: elasticsearch
Next, configure the table mappings between the two MySQL servers; we will use the user table as the test case.
Each table mapping corresponds to one yml file under the conf/rdb directory:
cd /opt/canal/canal-adapter/conf/rdb
vim test_user.yml
dataSourceKey: defaultDS
destination: example
groupId: g1
outerAdapterKey: mysql1
concurrent: true
dbMapping:
  database: test
  table: user
  targetTable: test2.user
  targetPk:
    id: id
  mapAll: true
  targetColumns:
    id:
    name:
    role_id:
    c_time:
    test1:
  etlCondition: "where c_time>={}"
  commitBatch: 3000 # batch commit size

## Mirror schema synchronize config
#dataSourceKey: defaultDS
#destination: example
#groupId: g1
#outerAdapterKey: mysql1
#concurrent: true
#dbMapping:
#  mirrorDb: true
#  database: mytest
Start the canal-adapter:
cd /opt/canal/canal-adapter/bin
./startup.sh
Data state before the changes:
After running insert, delete, and update statements against test.user, you can see the corresponding rows in test2.user being updated in sync. Observe the canal-adapter log:
2021-02-23 14:11:34.553 [main] INFO c.a.otter.canal.adapter.launcher.CanalAdapterApplication - Started CanalAdapterApplication in 5.216 seconds (JVM running for 5.961)
2021-02-23 14:11:35.444 [Thread-4] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to connect destination: example <=============
2021-02-23 14:11:35.579 [Thread-4] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to subscribe destination: example <=============
2021-02-23 14:11:35.586 [Thread-4] INFO c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Subscribe destination: example succeed <=============
2021-02-23 14:11:35.732 [pool-8-thread-1] INFO c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":2,"name":"scott1"}],"database":"test","destination":"example","es":1614060454000,"groupId":null,"isDdl":false,"old":[{"name":"scott"}],"pkNames":["id"],"sql":"","table":"user","ts":1614060695624,"type":"UPDATE"}
2021-02-23 14:11:35.767 [pool-4-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":2,"name":"scott1"},"database":"test","destination":"example","old":{"name":"scott"},"table":"user","type":"UPDATE"}
2021-02-23 14:12:16.848 [pool-8-thread-1] INFO c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":3,"name":"canal"}],"database":"test","destination":"example","es":1614060736000,"groupId":null,"isDdl":false,"old":null,"pkNames":["id"],"sql":"","table":"user","ts":1614060736848,"type":"DELETE"}
2021-02-23 14:12:16.853 [pool-2-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":3,"name":"canal"},"database":"test","destination":"example","old":null,"table":"user","type":"DELETE"}
2021-02-23 14:12:38.066 [pool-8-thread-1] INFO c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":2,"name":"scott"}],"database":"test","destination":"example","es":1614060757000,"groupId":null,"isDdl":false,"old":[{"name":"scott1"}],"pkNames":["id"],"sql":"","table":"user","ts":1614060758066,"type":"UPDATE"}
2021-02-23 14:12:38.072 [pool-4-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":2,"name":"scott"},"database":"test","destination":"example","old":{"name":"scott1"},"table":"user","type":"UPDATE"}
2021-02-23 14:12:51.713 [pool-8-thread-1] INFO c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":3,"name":"canal"}],"database":"test","destination":"example","es":1614060771000,"groupId":null,"isDdl":false,"old":null,"pkNames":["id"],"sql":"","table":"user","ts":1614060771713,"type":"INSERT"}
2021-02-23 14:12:51.717 [pool-2-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":3,"name":"canal"},"database":"test","destination":"example","old":null,"table":"user","type":"INSERT"}
This completes the setup: using Alibaba's canal, changes in database A are now synchronized to database B.