这是一开始的部署图,没有加消息队列,消息队列后面再加的
canal的原理是基于mysql binlog技术,所以这里一定需要开启mysql的binlog写入功能,建议配置binlog模式为row.**针对阿里云RDS账号默认已经有binlog dump权限,不需要任何权限或者binlog设置,可以直接跳过这一步**
[mysqld]
log-bin=mysql-bin #添加这一行就ok
binlog-format=ROW #选择row模式
server_id=1 #配置mysql replaction需要定义,不能和canal的slaveId重复
canal的原理是模拟自己为mysql slave,所以这里一定需要做为mysql slave的相关权限.
CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;
canal的安装
下载地址
https://github.com/alibaba/canal/releases
最好下载最新版本,新版本会修复一些bug
下载后直接解压缩,进入canal文件,目录结构如下
drwxr-xr-x. 2 root root 4096 7月 10 15:31 bin
drwxr-xr-x. 4 root root 4096 7月 10 16:40 conf
drwxr-xr-x. 2 root root 4096 5月 8 15:52 lib
drwxr-xr-x. 4 root root 4096 7月 2 18:42 logs
主要需要修改两个配置文件 conf/canal.properties ,conf/example/instance.properties
vi conf/canal.properties
#################################################
######### common argument #############
#################################################
canal.id= 1
canal.ip=
canal.port= 11111
canal.zkServers=127.0.0.1:2181
# flush data to zk
canal.zookeeper.flush.period = 1000
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024
## meory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
## detecing config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false
# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size = 1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60
# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30
# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB
# binlog ddl isolation
canal.instance.get.ddl.isolation = false
#################################################
######### destinations #############
#################################################
canal.destinations= example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
vi conf/example/instance.properties
#################################################
## mysql serverId
canal.instance.mysql.slaveId=1234
# position info
canal.instance.master.address=147.18.1.52:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
# table meta tsdb info
canal.instance.tsdb.enable=true
canal.instance.tsdb.dir=${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url=jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
canal.instance.tsdb.dbUsername=canal
canal.instance.tsdb.dbPassword=canal
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
# username/password
canal.instance.dbUsername=root
canal.instance.dbPassword=jhtest124
canal.instance.defaultDatabaseName=userdb
canal.instance.connectionCharset=UTF-8
# table regex
canal.instance.filter.regex=userdb.tshop,shj_cate_db.tdishes,shj_cate_db.tdishestype,shj_cate_db.tattachindishes
# table black regex
canal.instance.filter.black.regex=
#################################################
上面红色背景的配置都是比较重要的配置,其它配置使用默认的就好
canal.properties
canal.zkServers=127.0.0.1:2181 //在使用HA 模式时,需要配置该属性,用于协调主备canal server以及记录日志信息消费情况(后面会详细说明)
instance.properties
canal.instance.mysql.slaveId=1234 // 这个配置不能与其它的canal服务器的配置重复
canal.instance.master.address=147.18.1.52:3306 //mysql的ip和端口(也可以是域名加端口,比如阿里云服务的rds)
canal.instance.dbUsername=root
canal.instance.dbPassword=jhtest124
//mysql具有足够权限(SELECT, REPLICATION SLAVE, REPLICATION CLIENT)的用户名,密码
canal.instance.filter.regex=userdb.tshop,shj_cate_db.tdishes,shj_cate_db.tdishestype,shj_cate_db.tattachindishes
//一般情况下,这个参数不配置的时候canal也可以正常运行,但是mysql为阿里云rds时会报错,贴一段canal源码吧,如下:
问题在于,如果不配置canal.instance.filter.regex,在第2步时,canal会将mysql master所有的表,包括view视图,问题就在这了,在对rds执行(show create table + 视图)的时候会报没有权限错误,就是你去设置用户权限也不行,rds就没有开放这个权限.
设置了canal.instance.filter.regex这个参数后,第2步在for循环中,会将不在参数中的表过滤掉,这样就这处理业务关心的表.
/**
* 初始化的时候dump一下表结构
*/
private boolean dumpTableMeta(MysqlConnection connection, final CanalEventFilter filter) {
try {
ResultSetPacket packet = connection.query("show databases");//1
List schemas = new ArrayList();
for (String schema : packet.getFieldValues()) {
schemas.add(schema);
}
for (String schema : schemas) {
packet = connection.query("show tables from `" + schema + "`");//2
List tables = new ArrayList();
for (String table : packet.getFieldValues()) {
String fullName = schema + "." + table;
if (blackFilter == null || !blackFilter.filter(fullName)) {
if (filter == null || filter.filter(fullName)) {
tables.add(table);
}
}
}
if (tables.isEmpty()) {
continue;
}
StringBuilder sql = new StringBuilder();
for (String table : tables) {
sql.append("show create table `" + schema + "`.`" + table + "`;");//3
}
List packets = connection.queryMulti(sql.toString());
for (ResultSetPacket onePacket : packets) {
if (onePacket.getFieldValues().size() > 1) {
String oneTableCreateSql = onePacket.getFieldValues().get(1);
memoryTableMeta.apply(INIT_POSITION, schema, oneTableCreateSql, null);//4
}
}
}
return true;
} catch (IOException e) {
throw new CanalParseException(e);
}
}
配置好了后,执行bin目录下的startup.sh就跑起来了.
/otter/canal/
|- destinations
|- example/
| |- cluster ==>[172.18.0.52:11111,172.18.0.50:11111]
| |- running ==>{"active":true,"address":"172.18.0.52:11111","cid":1}
| |- 1001
| |-running ==> {"active":true,"address":"172.18.0.53:62744","clientId":1001}
| |-cursor==>
{"@type":"com.alibaba.otter.canal.protocol.position.LogPosition","identity":{"slaveId":-1,"sourceAddress":{"address":"172.18.0.52","port":3306}},"postion":{"included":false,"journalName":"mysql-bin.000002","position":124582810,"
serverId":1,"timestamp":1530538222000}}
/otter/canal/destinations/example/cluster 记录canal服务端集群信息,具体就是canal服务器的ip和端口集合
/otter/canal/destinations/example/running 正在运行的canal服务器节点信息
/otter/canal/destinations/example/1001/running canal客户端节点信息
/otter/canal/destinations/example/1001/cursor canal客户端的变更日志游标
/otter/canal/destinations/example/running 节点记录正在执行任务的canal server的ip和端口,canal client就是通过这个节点的数据然后再去访问canal server的.
canal server 在执行的时候会将默认情况下是将内网地址写入这个节点了,如果canal client和canal server 不在一个子网,那就不能访问到canal server.
conf/canal.properties 有canal.ip 的参数,可以配置,但仍然会有问题.当时我是修改了源码,重新编译后才搞定的,canal最新版本现在好像已经修复了.
canal客户端的使用直接参考官网
今天就到这吧,以后再更