Building a Real-Time Data Warehouse on a Big Data Platform from 0 to 1 - 14: Maxwell & Canal Comparison

  • Overview
  • Environment
  • Database configuration
  • Installation
  • Configuration
  • Startup commands
  • Output
  • HA
  • Monitoring web UI
  • Summary

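Overview

The previous two chapters covered installing and using Maxwell and Canal. Honestly, this was the first time I had heard of either tool.
So here is a comparison of the two from a newcomer's point of view.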

Environment

Tool          Version
Linux         CentOS 7
JDK           1.8
Scala         2.11
MariaDB       10.3
ZooKeeper     3.5.8
Kafka         2.4.1
Flink         1.13
Maxwell       latest
canal-server  v1.1.5

Database configuration

The database configuration is identical for both tools; after that, create a separate user for each.

[root@server111 software]# vim /etc/my.cnf.d/server.cnf
[server]
server_id=1
log-bin=master
binlog_format=row

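The grants are the same too, because under the hood both tools pose as a replica (slave) of the database and read its binlog.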

MariaDB [(none)]> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
MariaDB [(none)]> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'maxwell'@'%';

Installation

Both Maxwell and canal-server can be deployed with Docker.

[root@server113 ~]# docker pull zendesk/maxwell
[root@server113 ~]# docker pull canal/canal-server:v1.1.5
[root@server113 ~]# docker images
REPOSITORY           TAG       IMAGE ID       CREATED        SIZE
zendesk/maxwell      latest    f46af4abbe00   4 months ago   888MB
canal/canal-server   v1.1.5    0c7f1d62a7d8   5 months ago   874MB

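Configuration

Maxwell needs only a handful of Kafka settings. Canal needs somewhat more, because its Kafka settings live in the canal.mq.* options that are shared by all of its MQ backends.
One Canal parameter that impressed me is canal.mq.dynamicTopic: it picks the topic dynamically, so each table can go to a topic named after it, or a group of tables can share one topic. Maxwell is less flexible here: as far as I know its kafka_topic option accepts the %{database} and %{table} placeholders for per-table topics, but sending an arbitrary group of tables to one named topic would need routing code downstream.
Otherwise the two configurations are much the same.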

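canal.mq.dynamicTopic expression rules
Since canal 1.1.3 the supported format is schema or schema.table, with multiple entries separated by commas or semicolons.
Example 1: test\.test matches a single table and sends it to the topic named test_test
Example 2: .*\..* matches every table; each table is sent to a topic named after that table
Example 3: test matches a database; all tables in that database are sent to the topic named after the database
Example 4: test\..* matches by expression; each matching table is sent to a topic named after that table
Example 5: test,test1\.test1 gives several expressions; tables in the test database go to the topic test, the table test1\.test1 goes to the topic test1_test1, and all other tables go to the default canal.mq.topic value
For more flexibility, a topic name can be given for a matching rule, in the format topicName:schema or topicName:schema.table
Example 1: test:test\.test matches a single table and sends it to the topic named test
Example 2: test:.*\..* matches every table; because a topic is given, every table is sent to the topic test
Example 3: test:test matches a database; all tables in that database are sent to the topic test
Example 4: testA:test\..* matches by expression; the matching tables are sent to the topic testA
Example 5: test0:test,test1:test1\.test1 gives several expressions; tables in the test database go to the topic test0, the table test1\.test1 goes to the topic test1, and all other tables go to the default canal.mq.topic value
Source: https://github.com/alibaba/canal/wiki/Canal-Kafka-RocketMQ-QuickStart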
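
To make the routing rules above concrete, here is a minimal Python sketch that resolves a topic name the way the examples describe. It illustrates the documented rules only and is not Canal's actual implementation: the names parse_rules and resolve_topic are made up for this example, it assumes that a pattern containing an escaped dot (\.) is a schema.table pattern, and it assumes the first matching rule wins.

```python
import re


def parse_rules(expr):
    """Split a dynamicTopic-style expression into (topic_or_None, pattern) pairs."""
    rules = []
    for item in re.split(r"[,;]", expr):
        item = item.strip()
        if not item:
            continue
        topic, sep, pattern = item.partition(":")
        if not sep:  # no "topicName:" prefix, the topic is derived from the match
            topic, pattern = None, item
        rules.append((topic, pattern))
    return rules


def resolve_topic(rules, database, table, default_topic):
    """Return the topic for database.table, or default_topic if no rule matches."""
    for topic, pattern in rules:
        if r"\." in pattern:
            # Assumed: an escaped dot marks a schema.table pattern.
            matched = re.fullmatch(pattern, f"{database}.{table}") is not None
            derived = f"{database}_{table}"
        else:
            # Schema-only pattern: every table of the database shares one topic.
            matched = re.fullmatch(pattern, database) is not None
            derived = database
        if matched:
            return topic or derived
    return default_topic


# Example 5 from the first list: derived topic names.
derived_rules = parse_rules(r"test,test1\.test1")
# Example 5 from the second list: explicit topic names.
named_rules = parse_rules(r"test0:test,test1:test1\.test1")

print(resolve_topic(derived_rules, "test", "tmp", "canal"))
print(resolve_topic(derived_rules, "test1", "test1", "canal"))
print(resolve_topic(named_rules, "other", "t", "canal"))
```

The same idea could be used on the consumer side of a single Maxwell topic when a group of tables has to end up in one named topic.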


Startup commands

The Maxwell command is as follows, short and clear:

[root@server113 ~]# docker run -it --rm zendesk/maxwell bin/maxwell \
--user='maxwell' \
--password='maxwell' \
--host='192.168.1.111' \
--producer=kafka \
--kafka.bootstrap.servers='192.168.1.112:9092' \
--kafka_topic=maxwell \
--filter="include:test.test"

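Canal wraps startup in a run.sh script, which has to be downloaded separately. The script sets up some environment and, in the end, also creates a container.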

[root@server113 ~]# ./run.sh \
-e canal.instance.master.address=192.168.1.111:3306 \
-e canal.instance.dbUsername=canal \
-e canal.instance.dbPassword=canal \
-e canal.serverMode=kafka \
-e canal.mq.servers=192.168.1.110:9092 \
-e canal.mq.topic=canal \
-e canal.mq.flatMessage=true \
-e canal.instance.filter.regex='test.test'

Canal has a filter parameter too: canal.instance.filter.regex.

Output

Start Maxwell and Canal at the same time and compare their output.

-- Run the following SQL; the output each tool produced (as seen in IDEA) follows
insert into test values (5,'hello world');
update test set  name='hello flink' where id = 5;
CREATE TABLE tmp (id int ,name varchar(255));
insert into tmp values (5,'hello world');
delete from test where id = 5;
delete from tmp where id = 5;

maxwell
maxwell-json

{"database":"test","table":"test","type":"insert","ts":1633583847,"xid":3387,"commit":true,"data":{"id":5,"name":"hello world"}}
{"database":"test","table":"test","type":"update","ts":1633583856,"xid":3406,"commit":true,"data":{"id":5,"name":"hello flink"},"old":{"name":"hello world"}}
{"database":"test","table":"tmp","type":"insert","ts":1633583869,"xid":3446,"commit":true,"data":{"id":5,"name":"hello world"}}
{"database":"test","table":"test","type":"delete","ts":1633583872,"xid":3459,"commit":true,"data":{"id":5,"name":"hello flink"}}
{"database":"test","table":"tmp","type":"delete","ts":1633583875,"xid":3471,"commit":true,"data":{"id":5,"name":"hello world"}}
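
As a rough illustration of how a consumer might use these messages, the Python sketch below replays Maxwell events into an in-memory copy of the rows. The function name apply_maxwell_event is invented for this example, and treating the id column as the row key is an assumption (the test tables here have no primary key). The three sample lines are copied from the output above.

```python
import json


def apply_maxwell_event(state, line, key_column="id"):
    """Apply one Maxwell JSON line to state, a dict keyed by (database, table, key)."""
    event = json.loads(line)
    row = event.get("data")
    if row is None:  # events without a row payload are ignored
        return state
    table_key = (event["database"], event["table"])
    old_key = event.get("old", {}).get(key_column)
    if old_key is not None:  # an update changed the key column itself
        state.pop(table_key + (old_key,), None)
    if event["type"] == "delete":
        state.pop(table_key + (row[key_column],), None)
    elif event["type"] in ("insert", "update"):
        state[table_key + (row[key_column],)] = row
    return state


SAMPLE_EVENTS = [
    '{"database":"test","table":"test","type":"insert","ts":1633583847,'
    '"xid":3387,"commit":true,"data":{"id":5,"name":"hello world"}}',
    '{"database":"test","table":"test","type":"update","ts":1633583856,'
    '"xid":3406,"commit":true,"data":{"id":5,"name":"hello flink"},'
    '"old":{"name":"hello world"}}',
    '{"database":"test","table":"test","type":"delete","ts":1633583872,'
    '"xid":3459,"commit":true,"data":{"id":5,"name":"hello flink"}}',
]

state = {}
for line in SAMPLE_EVENTS:
    apply_maxwell_event(state, line)
# The row was inserted, updated and then deleted, so nothing remains.
print(state)
```

Note that for an update, data holds the full new row while old holds only the columns that changed.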

canal
canal-json

{"data":null,"database":"","es":1633583841000,"id":111,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583842271 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583832128","sqlType":null,"table":"","ts":1633583842449,"type":"QUERY"}
{"data":null,"database":"","es":1633583842000,"id":112,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 150354, last_heartbeat_read = 1633583842271, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583842271, gtid_set = null, binlog_file = 'master.000007', binlog_position=150354","sqlType":null,"table":"","ts":1633583843392,"type":"QUERY"}
{"data":null,"database":"","es":1633583847000,"id":113,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"insert into test values (5,'hello world')","sqlType":null,"table":"","ts":1633583848189,"type":"QUERY"}
{"data":[{"id":"5","name":"hello world"}],"database":"test","es":1633583847000,"id":113,"isDdl":false,"mysqlType":{"id":"int(11)","name":"varchar(255)"},"old":null,"pkNames":null,"sql":"","sqlType":{"id":4,"name":12},"table":"test","ts":1633583848190,"type":"INSERT"}
{"data":null,"database":"","es":1633583847000,"id":114,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 151173, last_heartbeat_read = 1633583842271, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583842271, gtid_set = null, binlog_file = 'master.000007', binlog_position=151173","sqlType":null,"table":"","ts":1633583848515,"type":"QUERY"}
{"data":null,"database":"","es":1633583848000,"id":115,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583849321 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583842271","sqlType":null,"table":"","ts":1633583849450,"type":"QUERY"}
{"data":null,"database":"","es":1633583849000,"id":116,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 152120, last_heartbeat_read = 1633583849321, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583849321, gtid_set = null, binlog_file = 'master.000007', binlog_position=152120","sqlType":null,"table":"","ts":1633583850372,"type":"QUERY"}
{"data":null,"database":"","es":1633583856000,"id":117,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update test set  name='hello flink' where id = 5","sqlType":null,"table":"","ts":1633583857240,"type":"QUERY"}
{"data":[{"id":"5","name":"hello flink"}],"database":"test","es":1633583856000,"id":117,"isDdl":false,"mysqlType":{"id":"int(11)","name":"varchar(255)"},"old":[{"name":"hello world"}],"pkNames":null,"sql":"","sqlType":{"id":4,"name":12},"table":"test","ts":1633583857240,"type":"UPDATE"}
{"data":null,"database":"","es":1633583856000,"id":118,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 152964, last_heartbeat_read = 1633583849321, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583849321, gtid_set = null, binlog_file = 'master.000007', binlog_position=152964","sqlType":null,"table":"","ts":1633583857547,"type":"QUERY"}
{"data":null,"database":"","es":1633583857000,"id":119,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583858372 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583849321","sqlType":null,"table":"","ts":1633583858505,"type":"QUERY"}
{"data":null,"database":"","es":1633583858000,"id":120,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 153911, last_heartbeat_read = 1633583858372, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583858372, gtid_set = null, binlog_file = 'master.000007', binlog_position=153911","sqlType":null,"table":"","ts":1633583859460,"type":"QUERY"}
{"data":null,"database":"","es":1633583865000,"id":121,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT into `schemas` SET base_schema_id = 6, deltas = '[{\"type\":\"table-create\",\"database\":\"test\",\"table\":\"tmp\",\"def\":{\"database\":\"test\",\"charset\":\"latin1\",\"table\":\"tmp\",\"columns\":[{\"type\":\"int\",\"name\":\"id\",\"signed\":true},{\"type\":\"varchar\",\"name\":\"name\",\"charset\":\"latin1\"}],\"primary-key\":[]}}]', binlog_file = 'master.000007', binlog_position = 154566, server_id = 1, charset = 'latin1', version = 4, position_sha = '9c0805d22ba59ec470ed90769bbd0a115db979b5', gtid_set = null, last_heartbeat_read = 1633583858372","sqlType":null,"table":"","ts":1633583866315,"type":"QUERY"}
{"data":null,"database":"","es":1633583865000,"id":122,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 154677, last_heartbeat_read = 1633583858372, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583858372, gtid_set = null, binlog_file = 'master.000007', binlog_position=154677","sqlType":null,"table":"","ts":1633583866421,"type":"QUERY"}
{"data":null,"database":"","es":1633583866000,"id":123,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583867400 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583858372","sqlType":null,"table":"","ts":1633583867551,"type":"QUERY"}
{"data":null,"database":"","es":1633583867000,"id":124,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 156642, last_heartbeat_read = 1633583867400, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583867400, gtid_set = null, binlog_file = 'master.000007', binlog_position=156642","sqlType":null,"table":"","ts":1633583868475,"type":"QUERY"}
{"data":null,"database":"","es":1633583868000,"id":125,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583869417 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583867400","sqlType":null,"table":"","ts":1633583869610,"type":"QUERY"}
{"data":null,"database":"","es":1633583869000,"id":126,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 157589, last_heartbeat_read = 1633583869417, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583869417, gtid_set = null, binlog_file = 'master.000007', binlog_position=157589","sqlType":null,"table":"","ts":1633583870540,"type":"QUERY"}
{"data":null,"database":"","es":1633583869000,"id":127,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"insert into tmp values (5,'hello world')","sqlType":null,"table":"","ts":1633583870649,"type":"QUERY"}
{"data":null,"database":"","es":1633583870000,"id":128,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 158406, last_heartbeat_read = 1633583869417, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583869417, gtid_set = null, binlog_file = 'master.000007', binlog_position=158406","sqlType":null,"table":"","ts":1633583871577,"type":"QUERY"}
{"data":null,"database":"","es":1633583871000,"id":129,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583872436 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583869417","sqlType":null,"table":"","ts":1633583872503,"type":"QUERY"}
{"data":null,"database":"","es":1633583872000,"id":130,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 159353, last_heartbeat_read = 1633583872436, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583872436, gtid_set = null, binlog_file = 'master.000007', binlog_position=159353","sqlType":null,"table":"","ts":1633583873633,"type":"QUERY"}
{"data":null,"database":"","es":1633583872000,"id":130,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"delete from test where id = 5","sqlType":null,"table":"","ts":1633583873633,"type":"QUERY"}
{"data":[{"id":"5","name":"hello flink"}],"database":"test","es":1633583872000,"id":130,"isDdl":false,"mysqlType":{"id":"int(11)","name":"varchar(255)"},"old":null,"pkNames":null,"sql":"","sqlType":{"id":4,"name":12},"table":"test","ts":1633583873633,"type":"DELETE"}
{"data":null,"database":"","es":1633583873000,"id":131,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 160160, last_heartbeat_read = 1633583872436, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583872436, gtid_set = null, binlog_file = 'master.000007', binlog_position=160160","sqlType":null,"table":"","ts":1633583874556,"type":"QUERY"}
{"data":null,"database":"","es":1633583874000,"id":132,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583875456 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583872436","sqlType":null,"table":"","ts":1633583875483,"type":"QUERY"}
{"data":null,"database":"","es":1633583875000,"id":133,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"delete from tmp where id = 5","sqlType":null,"table":"","ts":1633583876204,"type":"QUERY"}
{"data":null,"database":"","es":1633583875000,"id":134,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 161330, last_heartbeat_read = 1633583875456, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583875456, gtid_set = null, binlog_file = 'master.000007', binlog_position=161330","sqlType":null,"table":"","ts":1633583876526,"type":"QUERY"}
{"data":null,"database":"","es":1633583876000,"id":135,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"update `heartbeats` set `heartbeat` = 1633583877467 where `server_id` = 1 and `client_id` = 'maxwell' and `heartbeat` = 1633583875456","sqlType":null,"table":"","ts":1633583877481,"type":"QUERY"}
{"data":null,"database":"","es":1633583877000,"id":136,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"INSERT INTO `positions` set server_id = 1, gtid_set = null, binlog_file = 'master.000007', binlog_position = 162277, last_heartbeat_read = 1633583877467, client_id = 'maxwell' ON DUPLICATE KEY UPDATE last_heartbeat_read = 1633583877467, gtid_set = null, binlog_file = 'master.000007', binlog_position=162277","sqlType":null,"table":"","ts":1633583878621,"type":"QUERY"}
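
Canal's flat messages are shaped differently from Maxwell's: data and old are lists, column values arrive as strings, and es appears to carry the binlog event time in milliseconds. The Python sketch below shows one possible way to turn a Canal flat message into Maxwell-like records and to skip the QUERY entries. The function name canal_rows and the output layout are choices made for this example, not part of either tool. Both sample lines are copied from the output above.

```python
import json


def canal_rows(line):
    """Yield one Maxwell-like dict per changed row in a Canal flat message."""
    msg = json.loads(line)
    if msg["data"] is None:  # QUERY entries carry only the SQL text, no rows
        return
    olds = msg["old"] or [None] * len(msg["data"])
    for row, old in zip(msg["data"], olds):
        record = {
            "database": msg["database"],
            "table": msg["table"],
            "type": msg["type"].lower(),
            "ts": msg["es"] // 1000,  # milliseconds to seconds
            "data": row,
        }
        if old is not None:
            record["old"] = old
        yield record


QUERY_LINE = (
    '{"data":null,"database":"","es":1633583872000,"id":130,"isDdl":false,'
    '"mysqlType":null,"old":null,"pkNames":null,'
    '"sql":"delete from test where id = 5","sqlType":null,"table":"",'
    '"ts":1633583873633,"type":"QUERY"}'
)
UPDATE_LINE = (
    '{"data":[{"id":"5","name":"hello flink"}],"database":"test","es":1633583856000,'
    '"id":117,"isDdl":false,"mysqlType":{"id":"int(11)","name":"varchar(255)"},'
    '"old":[{"name":"hello world"}],"pkNames":null,"sql":"","sqlType":{"id":4,"name":12},'
    '"table":"test","ts":1633583857240,"type":"UPDATE"}'
)

for sample in (QUERY_LINE, UPDATE_LINE):
    for record in canal_rows(sample):
        print(record)
```

Only the UPDATE line produces a record; its ts matches the ts of the corresponding Maxwell update event, while id stays the string "5" because no type conversion is attempted here.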

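Judging from the results, the Maxwell filter had no effect at all: events for test.tmp still came through. A likely cause is that Maxwell includes everything by default, so an include rule on its own excludes nothing; its documentation pairs it with an exclude, for example --filter='exclude: *.*, include: test.test'.
The Canal filter only suppressed the row data of non-matching tables; the QUERY entry carrying the raw SQL of each statement was still sent to Kafka.
Maxwell did not emit the DDL statement. I am not sure whether I missed a setting; the output_ddl option, which is off by default, looks like the one to try.
Canal can capture database creation, but it did not capture operations on tables (such as the CREATE TABLE above).
In terms of payload, Canal's messages are somewhat richer, but for plain data synchronization Maxwell's output is entirely sufficient.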
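
Since neither stream was trimmed to test.test alone in this run, one fallback is to filter on the consumer side. The Python sketch below relies only on the database and table fields that both formats share; the names WANTED and is_wanted are invented for this example. The sample lines are copied from the output above.

```python
import json

# Tables the consumer cares about, as (database, table) pairs.
WANTED = {("test", "test")}


def is_wanted(line, wanted=WANTED):
    """Return True if a Maxwell or Canal JSON line belongs to a wanted table."""
    msg = json.loads(line)
    return (msg.get("database"), msg.get("table")) in wanted


SAMPLES = [
    # Maxwell insert into test.test
    '{"database":"test","table":"test","type":"insert","ts":1633583847,'
    '"xid":3387,"commit":true,"data":{"id":5,"name":"hello world"}}',
    # Maxwell insert into test.tmp
    '{"database":"test","table":"tmp","type":"insert","ts":1633583869,'
    '"xid":3446,"commit":true,"data":{"id":5,"name":"hello world"}}',
    # Canal QUERY entry, which has empty database and table fields
    '{"data":null,"database":"","es":1633583872000,"id":130,"isDdl":false,'
    '"mysqlType":null,"old":null,"pkNames":null,'
    '"sql":"delete from test where id = 5","sqlType":null,"table":"",'
    '"ts":1633583873633,"type":"QUERY"}',
]

kept = [line for line in SAMPLES if is_wanted(line)]
print(len(kept))
```

Because Canal's QUERY entries have empty database and table fields, this check drops them without any special casing.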

HA

Both Canal and Maxwell provide configuration options for high availability. See their official documentation for details.

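Monitoring web UI

Maxwell ships with built-in monitoring that only needs a few extra settings.
Canal has to be integrated with Prometheus as its monitoring component.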

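Summary

Maxwell is small and convenient, and simple to configure and use.
Canal comes with a whole ecosystem, a complete feature set, and a rich set of configuration options.
As the saying goes, no tool is good or bad in itself, only suitable or not. All I need is the path from the database to Kafka, so Maxwell fully meets my needs and there is no reason to spend extra effort digging into Canal.

Pointers from any experts passing by would be much appreciated.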
