MySQL Master-Slave Data Synchronization with Alibaba's canal

Table of Contents

    • 1. Background
    • 2. Deploying the canal server
      • 2.1 Check the MySQL configuration
      • 2.2 Create a canal MySQL user and grant binlog access
      • 2.3 Install canal-deployer
      • 2.4 Start the canal service
    • 3. Connecting to canal from a Java client
      • 3.1 Observing the test
    • 4. Synchronizing data with canal-adapter
      • 4.1 Download and install
      • 4.2 Key configuration changes
      • 4.3 Start the canal-adapter service
      • 4.4 Test the result

Preface: what canal is and what it can do are outside the scope of this article; please consult the relevant documentation yourself.


1. Background

The same production application is deployed at two sites, serving users in each location. We now need a unified management portal, which requires keeping the user-related tables in sync between the sites; here we use canal to do that synchronization.

2. Deploying the canal server

2.1 Check the MySQL configuration

Make sure MySQL has binlog enabled and that the binlog format is ROW (row-based replication):

mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | MIXED |
+---------------+-------+
1 row in set (0.01 sec)

mysql> select version();
+------------+
| version()  |
+------------+
| 5.7.28-log |
+------------+
1 row in set (0.00 sec)

As shown above, my binlog format is MIXED (mixed mode) and needs to be changed to ROW:

vim /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Enable binlog and set the base name of the binlog files;
# the generated files will be named master-binlog.000001, .000002, ...
log_bin=master-binlog

# Replication format; there are three options:
# 1. STATEMENT (statement-based replication)
#      Pro: only the SQL statements are logged, not every changed row,
#      which keeps the binlog small and saves I/O.
#      Con: in some cases master and slave can diverge, e.g. when a statement
#      uses non-deterministic functions such as last_insert_id() or now().
# 2. ROW (row-based replication)
#      Every changed row is logged with its before and after image, so the
#      last_insert_id()/now() replication problems do not occur.
#      Con: it produces much more log data, especially on ALTER TABLE.
# 3. MIXED (mixed replication)
#      Combines STATEMENT and ROW: binlog entries are written as STATEMENT
#      by default, and MySQL switches to ROW for statements that cannot be
#      replicated safely as statements.
binlog_format=ROW

# Server id; any value that does not clash with other nodes will do
server-id=1

# Flush the binlog to disk after every write
sync_binlog=1

# Automatically purge old binlog files; the default 0 disables purging,
# 1 means files older than one day are deleted
expire_logs_days=1

# ... other settings omitted ...

Restart the MySQL service:

systemctl restart mysqld
mysql> show variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.00 sec)
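If a restart is inconvenient, MySQL 5.7 can also switch the format at runtime. Note that SET GLOBAL only affects sessions opened afterwards and does not survive a restart, so the my.cnf change above is still needed as the durable fix:

```sql
-- Takes effect for new sessions only and is lost on restart;
-- keep binlog_format=ROW in my.cnf as the permanent setting.
SET GLOBAL binlog_format = 'ROW';
```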

2.2 Create a canal MySQL user and grant binlog access

mysql> create user canal identified by 'canal';
Query OK, 0 rows affected (1.02 sec)

mysql> grant select, replication slave, replication client on *.* to 'canal'@'%' identified by 'canal';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.02 sec)
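Before moving on, it is worth confirming that the account actually received the replication privileges:

```sql
-- Should list SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.*
SHOW GRANTS FOR 'canal'@'%';
```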

2.3 Install canal-deployer

Download canal-deployer from the official releases page or from Baidu Cloud (extraction code: 7nop). Upload the archive to the server and extract it to the target directory:

[root@10-9-102-16 local]# tar -xvf canal.deployer-1.1.4.tar.gz -C /opt/canal/canal-deployer
[root@10-9-102-16 canal-deployer]# pwd
/opt/canal/canal-deployer
[root@10-9-102-16 canal-deployer]# ls
bin  conf  lib  logs
[root@10-9-102-16 canal-deployer]# cd conf
[root@10-9-102-16 conf]# ls
canal_local.properties  canal.properties  example  logback.xml  metrics  spring
[root@10-9-102-16 conf]# 

Modify the canal.properties configuration file:

#################################################
######### 		destinations		#############
#################################################
# canal instance names, comma-separated; an instance name is the name of a folder under conf/
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
# enable instance auto-scan and set the scan interval (seconds)
canal.auto.scan = true
canal.auto.scan.interval = 5

Modify the instance configuration file (conf/example/instance.properties):

#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid use true/false
# whether GTID mode is enabled
canal.instance.gtidon=false

# position info
# address of the source MySQL instance
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
# only needed when reading binlog from an Alibaba Cloud RDS instance
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
# username/password canal uses to connect to the source database
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
# which tables to subscribe, multiple regexes separated by commas
# Common examples:
# 1. all tables: .*  or  .*\\..*
# 2. all tables under the canal schema: canal\\..*
# 3. tables under canal whose names start with "canal": canal\\.canal.*
# 4. a single table in the canal schema: canal.test1
# 5. combined rules: canal\\..*,mysql.test1,mysql.test2 (comma separated)
canal.instance.filter.regex=test\\..*
# table black regex

# tables to exclude from subscription (blacklist)
canal.instance.filter.black.regex=
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#################################################
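canal matches each filter regex against the full schema.table name of a binlog event. As a quick sanity check of your patterns, here is a small Python sketch of that matching rule (the helper matches_filter is hypothetical; canal itself does this in Java, but the regex semantics are the same):

```python
import re

def matches_filter(filter_regex: str, schema: str, table: str) -> bool:
    """Mimic canal's table filter: the event passes if any of the
    comma-separated patterns matches the full "schema.table" name."""
    full_name = f"{schema}.{table}"
    return any(re.fullmatch(p, full_name) for p in filter_regex.split(","))

# canal.instance.filter.regex=test\\..* subscribes every table in schema "test"
print(matches_filter(r"test\..*", "test", "user"))                # True
print(matches_filter(r"test\..*", "test2", "user"))               # False
# combined rules, comma separated
print(matches_filter(r"test\..*,mysql.test1", "mysql", "test1"))  # True
```

Note that in the properties file the backslash is doubled (test\\..*) because of Java properties-file escaping; the effective regex is test\..*.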

2.4 Start the canal service

cd /opt/canal/canal-deployer/bin
./startup.sh

Once started, you can confirm the server came up by checking logs/canal/canal.log.

3. Connecting to canal from a Java client

Add the Maven dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.4</version>
</dependency>

package com.broada.canal.client;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.common.utils.AddressUtils;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.InvalidProtocolBufferException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.alibaba.otter.canal.protocol.CanalEntry.Column;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange;
import com.alibaba.otter.canal.protocol.CanalEntry.RowData;

/**
 * @description:
 * @author: hui.cheng
 * @create: 2021-02-22 16:03
 **/
@Component
@Slf4j
public class CanalClient {

    @PostConstruct
    public void start() throws Exception {
        startCanalTask();
    }

    private void startCanalTask() throws Exception {
        int batchSize = 1000;
        int emptyCount = 0;

        CanalConnector connector = null;
        try {
            connector = CanalConnectors.newSingleConnector(new InetSocketAddress(AddressUtils.getHostIp(), 11111), "example", "", "");
            connector.connect();
            connector.subscribe("test\\..*,lagou\\..*");
            connector.rollback();

            int totalEmptyCount = 120;
            while (emptyCount < totalEmptyCount) {

                Message message = connector.getWithoutAck(batchSize);   // fetch up to batchSize entries without acknowledging
                long batchId = message.getId();

                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    emptyCount++;
                    log.info("===== no database changes detected =====");
                    TimeUnit.MILLISECONDS.sleep(2000);

                } else {
                    emptyCount = 0;
                    printEntry(message.getEntries());
                }

                connector.ack(batchId); // acknowledge the batch
                // connector.rollback(batchId); // on failure, roll the batch back
            }

        } catch (Exception e) {
            log.error(e.getMessage(), e);

        } finally {

            if (connector != null) {
                connector.disconnect();
            }
        }


    }

    private void printEntry(List<Entry> entries) throws InvalidProtocolBufferException {

        for (Entry entry : entries) {
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }

            // a RowChange holds every detail of one row-level change
            RowChange rowChange = null;

            try {
                rowChange = RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry.toString(), e);
            }

            // operation type: INSERT / UPDATE / DELETE
            EventType eventType = rowChange.getEventType();
            log.info(String.format("================ binlog[%s:%s] , name[%s,%s] , eventType : %s",
                    entry.getHeader().getLogfileName(),
                    entry.getHeader().getLogfileOffset(),
                    entry.getHeader().getSchemaName(),
                    entry.getHeader().getTableName(),
                    eventType));

            if (rowChange.getIsDdl()) {
                log.info("DDL statement: {}", rowChange.getSql());
            }

            // iterate over each changed row in the RowChange
            List<RowData> rowDatasList = rowChange.getRowDatasList();
            for (RowData rowData : rowDatasList) {

                switch (eventType) {
                    case INSERT:
                        log.info("INSERT; row after change:");
                        printColumn(rowData.getAfterColumnsList());
                        break;
                    case DELETE:
                        log.info("DELETE; row before change:");
                        printColumn(rowData.getBeforeColumnsList());
                        break;
                    case UPDATE:
                        log.info("UPDATE; row before change:");
                        printColumn(rowData.getBeforeColumnsList());
                        log.info("UPDATE; row after change:");
                        printColumn(rowData.getAfterColumnsList());
                        break;
                    default:
                        break;
                }
            }
        }
    }

    private void printColumn(List<Column> columns) {
        for (Column column : columns) {
            log.info("column: {}, value: {}, updated: {}", column.getName(), column.getValue(), column.getUpdated());
        }
    }
}

3.1 Observing the test

Note: when the Java client specifies a subscription pattern after creating the CanalConnector, e.g. connector.subscribe("test\\..*,lagou\\..*");, the canal.instance.filter.regex configured in instance.properties no longer takes effect.

After we change data in the database, the canal client observes the change events; with those events in hand we could push the data on to a target data source such as MySQL ourselves. However, canal already ships a client adapter, canal-adapter, that consumes from upstream sources (Kafka, RocketMQ, or canal-server directly) and writes downstream to MySQL, Elasticsearch, or HBase, making it easy to move and synchronize data between data sources.

Next we use canal-adapter to complete the synchronization.


4. Synchronizing data with canal-adapter

4.1 Download and install

Download canal.adapter-1.1.4 from the official releases page or from Baidu Cloud (extraction code: po5u), then upload it to the server and extract it to the target directory:

tar -xvf canal.adapter-1.1.4.tar.gz -C /opt/canal/canal-adapter
cd /opt/canal/canal-adapter/conf
vim application.yml

4.2 Key configuration changes

Modify the main configuration file application.yml:

server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
#  we are syncing into MySQL, so TCP mode is used here
  mode: tcp # kafka rocketMQ
#  address of the canal-server instance
  canalServerHost: 127.0.0.1:11111
#  zookeeperHosts: slave1:2181
#  mqServers: 127.0.0.1:9092 #or rocketmq
#  flatMessage: true
  batchSize: 500
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
#  connection details of the source MySQL server
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
      username: root
      password: root
  canalAdapters:
  - instance: example # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: rdb
#      connection details of the target MySQL server
        key: mysql1
        properties:
          jdbc.driverClassName: com.mysql.jdbc.Driver
          jdbc.url: jdbc:mysql://127.0.0.1:3306/test2?useUnicode=true
          jdbc.username: root
          jdbc.password: root
#      - name: rdb
#        key: oracle1
#        properties:
#          jdbc.driverClassName: oracle.jdbc.OracleDriver
#          jdbc.url: jdbc:oracle:thin:@localhost:49161:XE
#          jdbc.username: mytest
#          jdbc.password: m121212
#      - name: rdb
#        key: postgres1
#        properties:
#          jdbc.driverClassName: org.postgresql.Driver
#          jdbc.url: jdbc:postgresql://localhost:5432/postgres
#          jdbc.username: postgres
#          jdbc.password: 121212
#          threads: 1
#          commitSize: 3000
#      - name: hbase
#        properties:
#          hbase.zookeeper.quorum: 127.0.0.1
#          hbase.zookeeper.property.clientPort: 2181
#          zookeeper.znode.parent: /hbase
#      - name: es
#        hosts: 127.0.0.1:9300 # 127.0.0.1:9200 for rest mode
#        properties:
#          mode: transport # or rest
#          # security.auth: test:123456 #  only used for rest mode
#          cluster.name: elasticsearch

Next, configure the table mappings between the two MySQL servers; we use the user table as the test case.

Each table mapping corresponds to one yml file, stored under the conf/rdb directory:

cd /opt/canal/canal-adapter/conf/rdb
vim test_user.yml
dataSourceKey: defaultDS
destination: example
groupId: g1
outerAdapterKey: mysql1
concurrent: true
dbMapping:
  database: test
  table: user
  targetTable: test2.user
  targetPk:
    id: id
  mapAll: true
  targetColumns:
    id:
    name:
    role_id:
    c_time:
    test1:
  etlCondition: "where c_time>={}"
  commitBatch: 3000 # batch commit size


## Mirror schema synchronize config
#dataSourceKey: defaultDS
#destination: example
#groupId: g1
#outerAdapterKey: mysql1
#concurrent: true
#dbMapping:
#  mirrorDb: true
#  database: mytest
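Besides the incremental sync, canal-adapter exposes a REST endpoint for a one-off full import (ETL) of a mapped table, which is handy for initializing the target before incremental changes start flowing. Assuming the adapter runs locally on the port 8081 configured above, with the mapping file above, a sketch:

```shell
# Full-sync the table mapped by test_user.yml through the "mysql1" rdb adapter
curl -X POST "http://127.0.0.1:8081/etl/rdb/mysql1/test_user.yml"
```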

4.3 Start the canal-adapter service

cd /opt/canal/canal-adapter/bin
./startup.sh

4.4 Test the result

State of the data before the change (screenshot omitted):

After running INSERT, DELETE, and UPDATE statements against test.user, the corresponding rows in test2.user are updated in step. The canal-adapter log shows the changes flowing through:
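The adapter log below was produced by a sequence of statements along these lines on the source (reconstructed from the log entries; the exact table definition is assumed):

```sql
UPDATE test.user SET name = 'scott1' WHERE id = 2;
DELETE FROM test.user WHERE id = 3;
UPDATE test.user SET name = 'scott'  WHERE id = 2;
INSERT INTO test.user (id, name) VALUES (3, 'canal');
```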

2021-02-23 14:11:34.553 [main] INFO  c.a.otter.canal.adapter.launcher.CanalAdapterApplication - Started CanalAdapterApplication in 5.216 seconds (JVM running for 5.961)
2021-02-23 14:11:35.444 [Thread-4] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to connect destination: example <=============
2021-02-23 14:11:35.579 [Thread-4] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Start to subscribe destination: example <=============
2021-02-23 14:11:35.586 [Thread-4] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterWorker - =============> Subscribe destination: example succeed <=============
2021-02-23 14:11:35.732 [pool-8-thread-1] INFO  c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":2,"name":"scott1"}],"database":"test","destination":"example","es":1614060454000,"groupId":null,"isDdl":false,"old":[{"name":"scott"}],"pkNames":["id"],"sql":"","table":"user","ts":1614060695624,"type":"UPDATE"}
2021-02-23 14:11:35.767 [pool-4-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":2,"name":"scott1"},"database":"test","destination":"example","old":{"name":"scott"},"table":"user","type":"UPDATE"}
2021-02-23 14:12:16.848 [pool-8-thread-1] INFO  c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":3,"name":"canal"}],"database":"test","destination":"example","es":1614060736000,"groupId":null,"isDdl":false,"old":null,"pkNames":["id"],"sql":"","table":"user","ts":1614060736848,"type":"DELETE"}
2021-02-23 14:12:16.853 [pool-2-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":3,"name":"canal"},"database":"test","destination":"example","old":null,"table":"user","type":"DELETE"}
2021-02-23 14:12:38.066 [pool-8-thread-1] INFO  c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":2,"name":"scott"}],"database":"test","destination":"example","es":1614060757000,"groupId":null,"isDdl":false,"old":[{"name":"scott1"}],"pkNames":["id"],"sql":"","table":"user","ts":1614060758066,"type":"UPDATE"}
2021-02-23 14:12:38.072 [pool-4-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":2,"name":"scott"},"database":"test","destination":"example","old":{"name":"scott1"},"table":"user","type":"UPDATE"}
2021-02-23 14:12:51.713 [pool-8-thread-1] INFO  c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"id":3,"name":"canal"}],"database":"test","destination":"example","es":1614060771000,"groupId":null,"isDdl":false,"old":null,"pkNames":["id"],"sql":"","table":"user","ts":1614060771713,"type":"INSERT"}
2021-02-23 14:12:51.717 [pool-2-thread-1] DEBUG c.a.o.canal.client.adapter.rdb.service.RdbSyncService - DML: {"data":{"id":3,"name":"canal"},"database":"test","destination":"example","old":null,"table":"user","type":"INSERT"}

This completes using Alibaba's canal to synchronize data from database A to database B.

