Recently at work I have been migrating our CDC (change data capture) platform from Canal to Debezium. One question came up during the migration: how do we move an instance running on Canal over to Debezium losslessly?
Here, "lossless" means: the Debezium connector picks up exactly at the position the canal instance recorded in zk when it stopped, so CDC messages are synchronized with not one message more and not one message less.
Debezium itself supports resuming from where it last stopped, but it cannot (at least I did not find a way; corrections welcome) start consuming from an arbitrary, user-specified binlog file position.
A lossless cutover avoids corrupting stateful real-time computations downstream during the migration.
The breakthrough: a Debezium connector can resume from where it last stopped. In other words, the connector's position has already been persisted somewhere, and on the next restart Debezium reads the MySQL binlog position from that store and continues syncing from there.
When Kafka Connect starts, it creates its internal topics in Kafka; the one that matters here is connect-offsets (the name comes from the offset.storage.topic setting).
The connect-offsets topic records each Debezium connector's position, for example:
["MySqlConnector0320",{"server":"dbz"}], {"ts_sec":1584948756,"file":"mysql-bin.000009","pos":212537220,"row":1,"server_id":1,"event":2}
The key of this record contains:
[
"MySqlConnector0320",
{
"server": "dbz"
}
]
The value of this record contains:
{
"ts_sec": 1584948756,
"file": "mysql-bin.000009",
"pos": 212537220,
"row": 1,
"server_id": 1,
"event": 2
}
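These offset records can be inspected straight from the topic; a quick check, assuming the stock Kafka console tools and the same brokers used later in this post:
kafka-console-consumer.sh --bootstrap-server broker1:9092 --topic connect-offsets --from-beginning --property print.key=true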
So… the problem becomes simple! All I have to do is produce a record into connect-offsets ahead of time, carrying the binlog position for the connector I want to migrate losslessly, and Debezium will consume CDC events from exactly that position.
Fetch the position recorded in zk when the canal instance stopped:
get /otter/canal/destinations/test_xx/1001/cursor
{
"@type": "com.alibaba.otter.canal.protocol.position.LogPosition",
"identity": {
"slaveId": -1,
"sourceAddress": {
"address": "192.168.110.222",
"port": 3306
}
},
"postion": {
"gtid": "",
"included": false,
"journalName": "mysql-bin.000009",
"position": 212549837,
"serverId": 1,
"timestamp": 1584950198000
}
}
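The mapping from Canal's cursor to Debezium's offset is mechanical: journalName → file, position → pos, serverId → server_id, and timestamp (milliseconds in Canal) becomes ts_sec (seconds in Debezium). Note that "postion" (sic) is how Canal itself spells the field. A minimal sketch of the conversion, assuming Jackson is on the classpath (the class name CursorToOffset is mine):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CursorToOffset {
    public static void main(String[] args) throws Exception {
        // Cursor JSON as read from zk, shortened to the fields we need.
        String cursorJson = "{\"postion\":{\"journalName\":\"mysql-bin.000009\","
                + "\"position\":212549837,\"serverId\":1,\"timestamp\":1584950198000}}";
        JsonNode postion = new ObjectMapper().readTree(cursorJson).get("postion");
        // Canal stores the timestamp in milliseconds; Debezium's ts_sec is in seconds.
        long tsSec = postion.get("timestamp").asLong() / 1000;
        // row and event follow the sample connect-offsets record shown earlier.
        String offsetValue = String.format(
                "{\"ts_sec\":%d,\"file\":\"%s\",\"pos\":%d,\"row\":1,\"server_id\":%d,\"event\":2}",
                tsSec,
                postion.get("journalName").asText(),
                postion.get("position").asLong(),
                postion.get("serverId").asLong());
        System.out.println(offsetValue);
    }
}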
Construct the Debezium connector position message
key:
[
"MySqlConnector0323", -- debezium connector name
{
"server": "dbz" -- database.server.name
}
]
value:
{
"ts_sec": 1584950198000,
"file": "mysql-bin.000009", -- binlog name
"pos": 212549837, -- 指定canal 停止时的位置
"row": 1,
"server_id": 1,
"event": 2
}
Use a Kafka producer to send the position record to the connect-offsets topic.
Note: the connect-offsets topic holds raw bytes, so the producer uses ByteArraySerializer. Also, Kafka Connect looks up offsets by exact key, so the connector name and database.server.name in the key must match the new connector's configuration byte-for-byte.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

/**
 * Produces a hand-crafted offset record into connect-offsets so that the new
 * Debezium connector resumes from the position at which Canal stopped.
 *
 * @author xk
 * @date 2020/3/23 15:18
 */
public class ProducerTest {

    public static void main(String[] args) {
        new ProducerTest().sendMessages();
    }

    public void sendMessages() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        // The key must match byte-for-byte what Kafka Connect itself writes:
        // [connector name, {"server": database.server.name}], no extra whitespace.
        String keyMessage = "[\"MySqlConnector0323\",{\"server\":\"dbz\"}]";
        // The value carries the binlog position taken from Canal's cursor in zk;
        // Canal's millisecond timestamp is converted to seconds for ts_sec.
        String valueMessage = "{\"ts_sec\":1584950198,\"file\":\"mysql-bin.000009\",\"pos\":212549837,\"row\":1,\"server_id\":1,\"event\":2}";

        try (Producer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("connect-offsets",
                    keyMessage.getBytes(StandardCharsets.UTF_8),
                    valueMessage.getBytes(StandardCharsets.UTF_8)));
            producer.flush();
        }
    }
}
Once the position record has been written, create the Debezium connector:
name=MySqlConnector0323
connector.class=io.debezium.connector.mysql.MySqlConnector
database.user=debezium
database.server.id=10020
tasks.max=1
database.history.kafka.bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
database.history.kafka.topic=dbhistory.inventory_xk_0324
database.server.name=dbz
database.port=3306
include.schema.changes=true
database.hostname=192.168.111.111
database.password=debezium
database.whitelist=inventory
snapshot.mode=schema_only_recovery
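If Kafka Connect runs in distributed mode, the same configuration is submitted as JSON to the Connect REST API instead of a properties file; a sketch, assuming a (hypothetical) Connect host connect-host:8083:
curl -X POST -H "Content-Type: application/json" http://connect-host:8083/connectors -d '{
  "name": "MySqlConnector0323",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "192.168.111.111",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "debezium",
    "database.server.id": "10020",
    "database.server.name": "dbz",
    "database.whitelist": "inventory",
    "database.history.kafka.bootstrap.servers": "broker1:9092,broker2:9092,broker3:9092",
    "database.history.kafka.topic": "dbhistory.inventory_xk_0324",
    "include.schema.changes": "true",
    "snapshot.mode": "schema_only_recovery"
  }
}'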
Note: snapshot.mode=schema_only_recovery is the key setting here. The connector is brand new and has no database history topic yet, so this mode makes Debezium take a schema-only snapshot to rebuild its history topic and then stream from the offset it finds in connect-offsets, instead of performing a full data snapshot.
In the end, connector MySqlConnector0323 resumed consuming from position
file = mysql-bin.000009
pos = 212549837
Problem solved!