This post uses Alibaba's open-source canal as the data synchronization tool.
Architecture
Broadly, there are two approaches:
Approach | Implementation | Pros | Cons |
---|---|---|---|
1 | canal + MQ | 1. With an MQ (RocketMQ or Kafka), clients in many languages can consume the stream. 2. The MQ provides retry / re-consume semantics on failure. | 1. Adds a dependency on MQ middleware. 2. Ordering of binlog consumption through the MQ is less flexible than with a canal client; see the binlog-ordering section below. |
2 | canal + canal client | 1. No extra middleware needed. 2. canal's ack mechanism enables re-consuming after failures. | 1. The client pulls, so latency may be higher than with MQ push. 2. Likely somewhat more code than the MQ approach. |
This post implements both. If your company already has a unified platform for consuming binlogs, canal + MQ is the cleaner, more decoupled option.
Preparation
- Enable binlog writing and set binlog-format to ROW mode in my.cnf:

```ini
log-bin=mysql-bin   # enable binlog (on by default in MySQL 8)
binlog-format=ROW   # ROW mode (appears to be the default in MySQL 8)
server_id=1         # required for MySQL replication; must not collide with canal's slaveId
```
- Check that binlog is enabled:

```sql
show binary logs;
```
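You can also double-check the binlog settings by querying the server variables directly (standard MySQL statements):

```sql
-- confirm binlog is on and in ROW mode
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
```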
- Grant the canal MySQL account the privileges required to act as a MySQL slave; if the account already exists you can grant directly.
Mind your password strength: a password that fails MySQL's strength policy will make the statement error out. The mysql_native_password clause is for MySQL 8.0, which may otherwise fail with a caching_sha2_password auth error.

```sql
CREATE USER canal IDENTIFIED WITH mysql_native_password BY 'canal1234';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;
```
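After creating the account, it is worth verifying that the grants took effect (a standard MySQL statement):

```sql
-- should list SELECT, REPLICATION SLAVE, REPLICATION CLIENT on *.*
SHOW GRANTS FOR 'canal'@'%';
```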
Starting canal
- Download canal from the releases page: https://github.com/alibaba/canal/releases
- Enter the canal.deployer directory.
- Edit the config file conf/example/instance.properties:
```properties
## mysql serverId -- must not collide with any existing MySQL server_id
canal.instance.mysql.slaveId = 1234
# position info -- change to your own database
canal.instance.master.address = 127.0.0.1:3306
canal.instance.master.journal.name =
canal.instance.master.position =
canal.instance.master.timestamp =
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
# username/password -- change to your own database credentials
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal1234
canal.instance.defaultDatabaseName =
canal.instance.connectionCharset = UTF-8
# table regex
canal.instance.filter.regex = .*\\..*
```
Notes:
1. slaveId must not collide with the server_id configured in my.cnf above (canal works by presenting itself to MySQL as a slave). The main settings to change are address, dbUsername, and dbPassword.
2. canal.instance.connectionCharset is the database's character set expressed as a Java charset name, e.g. UTF-8, GBK, ISO-8859-1.
3. If the machine has only one CPU, set canal.instance.parser.parallel to false.
- Start:

```shell
sh bin/startup.sh
```

If startup fails with "Could not find first log file name in binary log index file", remove the stale metadata: rm -rf conf/example/meta.dat. The startup output shows that the config file in use is canal.properties and that the destination is example.
- Check the instance log:

```shell
tail -100f logs/example/example.log
```

- Check the server log:

```shell
tail -100f logs/canal/canal.log
```

- Stop:

```shell
sh bin/stop.sh
```
Using the canal client
The official client demo: https://github.com/alibaba/canal/wiki/ClientExample
pom dependencies

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.sample</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>canal.sample</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba.otter</groupId>
            <artifactId>canal.client</artifactId>
            <version>1.1.0</version>
        </dependency>
        <!-- the examples below also use Jedis (redis.clients:jedis) and,
             for the RocketMQ variant, fastjson and rocketmq-client -->
    </dependencies>
</project>
```
CanalClientMysql2Redis
```java
package com.alibaba.otter.simple;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.common.utils.AddressUtils;
import com.alibaba.otter.canal.common.utils.NamedThreadFactory;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Pulls binlog events from canal and syncs them into Redis.
 *
 * @author tangzihao
 * @date 2021/4/26 9:08 AM
 */
public class CanalClientMysql2Redis {

    private volatile boolean running = true;

    private final ExecutorService executor = new ThreadPoolExecutor(
            Runtime.getRuntime().availableProcessors(),
            Runtime.getRuntime().availableProcessors() * 2,
            60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(500),
            new NamedThreadFactory("canal-redis-thread"),
            new ThreadPoolExecutor.CallerRunsPolicy());

    private final Jedis jedis;

    public CanalClientMysql2Redis() {
        JedisPoolConfig config = new JedisPoolConfig();
        // max idle connections -- size to your workload, within the Redis instance's connection limit
        config.setMaxIdle(200);
        // max total connections -- size to your workload, within the Redis instance's connection limit
        config.setMaxTotal(300);
        config.setTestOnBorrow(false);
        config.setTestOnReturn(false);
        String host = "127.0.0.1";
        JedisPool jedisPool = new JedisPool(config, host, 6379, 3000);
        jedis = jedisPool.getResource();
        if (!"PONG".equals(jedis.ping())) {
            throw new RuntimeException("failed to connect to redis");
        }
    }

    public void clientStart() {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress(AddressUtils.getHostIp(), 11111), "example", "", "");
        int batchSize = 1000;
        try {
            connector.connect();
            connector.subscribe(".*\\..*");
            connector.rollback();
            while (running) {
                final Message message = connector.getWithoutAck(batchSize);
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    try {
                        Thread.sleep(1000L);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                } else {
                    // Write to Redis asynchronously -- faster than writing synchronously,
                    // but on failure we can no longer retry via connector.rollback
                    executor.submit(() -> handleBinlog(message.getEntries()));
                }
                connector.ack(batchId);
            }
        } finally {
            connector.disconnect();
        }
    }

    private void handleBinlog(List<CanalEntry.Entry> entries) {
        for (CanalEntry.Entry entry : entries) {
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN
                    || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }
            CanalEntry.RowChange rowChange;
            try {
                rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry, e);
            }
            CanalEntry.EventType eventType = rowChange.getEventType();
            // The header carries the binlog file name and offset, plus the schema and table name
            System.out.printf("================%s binlog[%s:%s] , name[%s,%s] , eventType : %s%n================",
                    Thread.currentThread().getName(),
                    entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
                    entry.getHeader().getSchemaName(), entry.getHeader().getTableName(),
                    eventType);
            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                if (eventType == CanalEntry.EventType.DELETE) {
                    Map<String, String> columnValueMap = getColumnValue(rowData.getBeforeColumnsList());
                    String id = columnValueMap.get("id");
                    jedis.del(id);
                } else if (eventType == CanalEntry.EventType.INSERT) {
                    Map<String, String> columnValueMap = getColumnValue(rowData.getAfterColumnsList());
                    String id = columnValueMap.get("id");
                    columnValueMap.remove("id");
                    jedis.hmset(id, columnValueMap);
                } else {
                    // For updates, take only the id and the changed columns to avoid a needless full rewrite
                    Map<String, String> columnValueMap = getColumnValueOnlyUpdate(rowData.getAfterColumnsList());
                    String id = columnValueMap.get("id");
                    columnValueMap.remove("id");
                    jedis.hmset(id, columnValueMap);
                }
            }
        }
    }

    private Map<String, String> getColumnValue(List<CanalEntry.Column> columns) {
        Map<String, String> paramMap = new HashMap<>(columns.size());
        for (CanalEntry.Column column : columns) {
            paramMap.put(column.getName(), column.getValue());
        }
        return paramMap;
    }

    private Map<String, String> getColumnValueOnlyUpdate(List<CanalEntry.Column> columns) {
        Map<String, String> paramMap = new HashMap<>(columns.size());
        for (CanalEntry.Column column : columns) {
            if ("id".equals(column.getName()) || column.getUpdated()) {
                paramMap.put(column.getName(), column.getValue());
            }
        }
        return paramMap;
    }

    public void stopCanalClient() {
        this.running = false;
    }

    public static void main(String[] args) {
        CanalClientMysql2Redis canalClientMysql2Redis = new CanalClientMysql2Redis();
        canalClientMysql2Redis.clientStart();
    }
}
```
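The comment in clientStart notes that writing asynchronously gives up the ability to retry via connector.rollback. The ack/rollback contract itself can be illustrated with a tiny stand-in queue. This class is hypothetical and only mirrors the shape of getWithoutAck/ack/rollback; it is not the canal API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal stand-in illustrating canal's ack/rollback semantics:
// a batch leaves the queue only when acked; rollback makes it deliverable again.
public class AckRollbackSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private String inFlight;

    public void offer(String batch) {          // a batch arriving from the parser
        pending.add(batch);
    }
    public String getWithoutAck() {            // like connector.getWithoutAck
        inFlight = pending.poll();
        return inFlight;
    }
    public void ack() {                        // like connector.ack(batchId)
        inFlight = null;
    }
    public void rollback() {                   // like connector.rollback(batchId)
        if (inFlight != null) {
            pending.addFirst(inFlight);        // redeliver on next get
            inFlight = null;
        }
    }
}
```

This is why the synchronous variant can safely retry: it only acks after the handler succeeds, and rolls back on exception so the same batch is delivered again.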
Writing to Redis via RocketMQ
Edit the config files
- Edit instance.properties:

```properties
## mysql serverId
canal.instance.mysql.slaveId = 1234
# position info -- change to your own database
canal.instance.master.address = 127.0.0.1:3306
canal.instance.master.journal.name =
canal.instance.master.position =
canal.instance.master.timestamp =
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
# username/password -- change to your own database credentials
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal1234
canal.instance.defaultDatabaseName =
canal.instance.connectionCharset = UTF-8
# table regex
canal.instance.filter.regex = .*\\..*
# dynamic topic: route each matched table to its own topic, named <database>_<table>
canal.mq.dynamicTopic=.*\\..*
canal.mq.partition=0
```
- Edit canal.properties:

```properties
canal.serverMode = RocketMQ
canal.mq.servers = 127.0.0.1:9876
canal.mq.retries = 0
canal.mq.batchSize = 16384
canal.mq.maxRequestSize = 1048576
canal.mq.lingerMs = 100
canal.mq.bufferMemory = 33554432
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
canal.mq.flatMessage = true
canal.mq.compressionType = none
canal.mq.acks = all
#canal.mq.properties. =
canal.mq.producerGroup = test
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local
# aliyun mq namespace
#canal.mq.namespace =
```
Starting RocketMQ

```shell
# start the nameserver
/Users/zihao/Documents/servers/rocketmq-all-4.3.1-bin-release/bin/mqnamesrv &
# start the broker
/Users/zihao/Documents/servers/rocketmq-all-4.3.1-bin-release/bin/mqbroker -c /Users/zihao/Documents/servers/rocketmq-all-4.3.1-bin-release/conf/broker.conf &
# start the rocketmq console
nohup java -jar /Users/zihao/Documents/servers/rocketmq-externals-master/rocketmq-console/target/rocketmq-console-ng-1.0.1.jar &
```
Verify the hookup
Insert a row into the test.company table, then inspect the message in the RocketMQ console.
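With canal.mq.flatMessage = true, the message body is canal's flat-message JSON. A rough sketch of what an INSERT on test.company might look like (the field values here are illustrative, not captured output):

```json
{
  "data": [{"id": "1", "name": "acme", "city": "hangzhou"}],
  "database": "test",
  "table": "company",
  "type": "INSERT",
  "isDdl": false,
  "es": 1619366400000,
  "ts": 1619366400123
}
```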
Consume the topic and write to Redis
RocketmqMysql2Redis
```java
package com.alibaba.otter.simple;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.alibaba.otter.model.BinlogModel;
import com.alibaba.otter.model.Company;
import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
import org.apache.rocketmq.client.exception.MQClientException;
import org.apache.rocketmq.common.consumer.ConsumeFromWhere;
import org.apache.rocketmq.common.message.MessageExt;
import redis.clients.jedis.Jedis;

import java.util.HashMap;
import java.util.Map;

/**
 * @author tangzihao
 * @date 2021/4/26 10:19 PM
 */
public class RocketmqMysql2Redis {

    private static final Integer MAX_RETRIES = 3;

    public static void main(String[] args) throws MQClientException {
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("test_company-consumer");
        consumer.setNamesrvAddr("127.0.0.1:9876");
        consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_LAST_OFFSET);
        consumer.subscribe("test_company", "*");
        consumer.registerMessageListener((MessageListenerConcurrently) (msgs, context) -> {
            MessageExt msg = msgs.get(0);
            String bodyString = new String(msg.getBody());
            try {
                System.out.printf("%s Receive New Messages: %s %n", Thread.currentThread().getName(), bodyString);
                handleBinlog(bodyString, jedis);
                System.out.println("topic:" + msg.getTopic());
                return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
            } catch (Exception e) {
                e.printStackTrace();
                if (msg.getReconsumeTimes() >= MAX_RETRIES) {
                    // record the failure ourselves and compensate later; stop rocketmq from retrying
                    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
                }
                System.out.println("retry attempt " + msg.getReconsumeTimes());
            }
            return ConsumeConcurrentlyStatus.RECONSUME_LATER;
        });
        consumer.start();
        System.out.printf("Consumer Started.%n");
    }

    private static void handleBinlog(String bodyString, Jedis jedis) {
        BinlogModel<Company> binlogModel = JSON.parseObject(bodyString, new TypeReference<BinlogModel<Company>>() {
        });
        String eventType = binlogModel.getType();
        Company company = binlogModel.getData().get(0);
        String id = String.valueOf(company.getId());
        if ("DELETE".equals(eventType)) {
            jedis.del(id);
        } else {
            Map<String, String> paramMap = new HashMap<>();
            paramMap.put("name", company.getName());
            paramMap.put("city", company.getCity());
            paramMap.put("domain", company.getDomain());
            paramMap.put("email", company.getEmail());
            paramMap.put("sdate", company.getSdate());
            jedis.hmset(id, paramMap);
        }
    }
}
```
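The BinlogModel class referenced above is not shown in this post. A minimal sketch of what it could look like, assuming it simply mirrors canal's flat-message JSON fields (data, database, table, type, es, ts, isDdl); the original class is the author's own, so the field names here are assumptions:

```java
import java.util.List;

// Hypothetical sketch of the BinlogModel used above, assuming it maps
// canal's flat-message JSON fields one-to-one.
public class BinlogModel<T> {
    private List<T> data;     // row images carried by the event
    private String database;  // schema name
    private String table;     // table name
    private String type;      // INSERT / UPDATE / DELETE
    private Long es;          // binlog event time (ms)
    private Long ts;          // time canal emitted the message (ms)
    private Boolean isDdl;    // true for DDL statements

    public List<T> getData() { return data; }
    public void setData(List<T> data) { this.data = data; }
    public String getDatabase() { return database; }
    public void setDatabase(String database) { this.database = database; }
    public String getTable() { return table; }
    public void setTable(String table) { this.table = table; }
    public String getType() { return type; }
    public void setType(String type) { this.type = type; }
    public Long getEs() { return es; }
    public void setEs(Long es) { this.es = es; }
    public Long getTs() { return ts; }
    public void setTs(Long ts) { this.ts = ts; }
    public Boolean getIsDdl() { return isDdl; }
    public void setIsDdl(Boolean isDdl) { this.isDdl = isDdl; }
}
```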
Binlog ordering
The official docs discuss the ordering guarantees: https://github.com/alibaba/canal/wiki/Canal-Kafka-RocketMQ-QuickStart
In the configuration above I set canal.mq.partition=0, i.e. a single partition (called a consume queue in RocketMQ). The console confirms that every message is sent to the queue=0 consume queue.
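If you need more consumer parallelism while still keeping ordering per row, the canal wiki describes routing to multiple partitions hashed by primary key; a sketch (the table and column values are illustrative):

```properties
# send to N consume queues, routing by primary key so one row's changes stay ordered
canal.mq.partitionsNum=3
canal.mq.partitionHash=test.company:id
```

Ordering is then guaranteed per key rather than globally, so consumers can scale out without reordering updates to the same row.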