Flink CDC in Practice: Syncing MongoDB to MySQL

Introduction

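Faced with complex business scenarios, a company may run several different databases, which complicates data exchange between services as well as data analysis. Data synchronization therefore plays an important role. The industry has many mature synchronization components; those that support real-time synchronization include Canal, Maxwell, and Debezium. Flink, as a real-time processing engine, makes data synchronization quick and convenient through SQL. This post walks through syncing MongoDB to MySQL as an example, using Flink 1.13.5.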

Setting Up the MongoDB Environment

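The Flink MongoDB CDC Connector is built on MongoDB Change Streams, so a standalone MongoDB instance is not supported. MongoDB offers two cluster deployment modes, replica sets and sharded clusters: a replica set is comparable to MySQL primary-replica replication, while a sharded cluster stores data partitioned across multiple instances. For this demo I deployed a replica set in Docker.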

  1. Create three containers

docker run --name mongo0 -p 27000:27017 -d mongo --replSet "mg-cdc"
docker run --name mongo1 -p 27001:27017 -d mongo --replSet "mg-cdc"
docker run --name mongo2 -p 27002:27017 -d mongo --replSet "mg-cdc"

  2. Enter the mongo0 container

docker exec -it mongo0 /bin/bash

  3. Open the mongosh client and configure the replica set

bin/mongosh

// Run ifconfig on the host machine to find its IP address
config = {"_id":"mg-cdc",
          "members":[
          {"_id":0,host:"192.168.1.9:27000"},
          {"_id":1,host:"192.168.1.9:27001"},
          {"_id":2,host:"192.168.1.9:27002"}
          ]
}
 
rs.initiate(config)
rs.status()
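
The config document above has to agree with the earlier steps in two places: its `_id` must equal the name passed to `--replSet`, and each member's port must be the host port mapped by `docker run -p`. As a rough sketch, the same document can be rendered from those inputs so the two never drift apart. The class and method names here are hypothetical helpers for illustration, not part of MongoDB or Flink.

```java
import java.util.StringJoiner;

class ReplicaSetConfig {
    /**
     * Renders a replica set config document with one member per port,
     * all on the same host. Member _id values are assigned 0, 1, 2, ...
     */
    static String render(String replSetName, String host, int... ports) {
        StringJoiner members = new StringJoiner(",");
        for (int i = 0; i < ports.length; i++) {
            members.add(String.format("{\"_id\":%d,\"host\":\"%s:%d\"}", i, host, ports[i]));
        }
        return String.format("{\"_id\":\"%s\",\"members\":[%s]}", replSetName, members);
    }

    public static void main(String[] args) {
        // Values taken from this post: set name mg-cdc, host ports 27000-27002.
        String config = render("mg-cdc", "192.168.1.9", 27000, 27001, 27002);
        System.out.println("rs.initiate(" + config + ")");
    }
}
```

The printed line can be pasted into mongosh in place of the hand-written config.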

If the containers have been restarted, running rs.initiate(config) again reports an error:

MongoServerError: already initialized

The replica set has to be reconfigured instead:

rs.reconfig(config)

This may fail with the following error:

MongoServerError: New config is rejected :: caused by :: replSetReconfig should only be run on a writable PRIMARY. Current state REMOVED;

Given this error, rerun the command with the force option:

rs.reconfig(config, {force:true})

  4. Create a new MongoDB user for Flink MongoDB CDC to use

use admin;
db.createUser({
  user: "flinkuser",
  pwd: "flinkpw",
  roles: [
    { role: "read", db: "admin" },
    { role: "readAnyDatabase", db: "admin" }
  ]
});
  5. Test the change stream

use wlapp;
cursor = db.plan_joined_user.watch()
cursor.next()

Flink CDC Implementation in Code

  1. Dependencies

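<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-connector-mongodb-cdc</artifactId>
    <version>2.2.1</version>
</dependency>

<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-sql-connector-mongodb-cdc</artifactId>
    <version>2.2.1</version>
</dependency>

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>connect-api</artifactId>
    <version>2.7.0</version>
</dependency>

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc_2.12</artifactId>
    <version>1.13.5</version>
</dependency>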

  2. Code

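import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class FlinkCdcSync {
    // Illustrative values (assumptions): the original post does not show these constants.
    private static final long BUFFER_TIMEOUT_MS = 100L;
    private static final long CHECKPOINT_INTERVAL_MS = 2 * 60 * 1000L;
    private static final long CHECKPOINT_TIMEOUT_MS = 10 * 60 * 1000L;
    private static final long CHECKPOINT_MIN_PAUSE_MS = 60 * 1000L;

    public static void main(String[] args) {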
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setBufferTimeout(BUFFER_TIMEOUT_MS);
        env.enableCheckpointing(CHECKPOINT_INTERVAL_MS, CheckpointingMode.AT_LEAST_ONCE);
        env.getCheckpointConfig().setCheckpointTimeout(CHECKPOINT_TIMEOUT_MS);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(CHECKPOINT_MIN_PAUSE_MS);
        env.setRestartStrategy(RestartStrategies.failureRateRestart(3, Time.of(5, TimeUnit.MINUTES),Time.of(10, TimeUnit.SECONDS)));
        final StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        tableEnv.executeSql("CREATE TABLE mongo_plan_joined_user (" +
                "  _id STRING," +
                "  plan_id STRING," +
                "  user_id STRING," +
                "  invite_share_log_id STRING," +
                "  joined_time STRING," +
                "  target_value STRING," +
                "  PRIMARY KEY(_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'mongodb-cdc'," +
                "  'hosts' = '127.0.0.1:27000,127.0.0.1:27001,127.0.0.1:27002'," +
                "  'username' = 'flinkuser'," +
                "  'password' = 'flinkpw'," +
                "  'database' = 'wlapp'," +
                "  'collection' = 'plan_joined_user'" +
                ")");

        tableEnv.executeSql("CREATE TABLE mysql_plan_joined_user (" +
                "  id STRING," +
                "  plan_id STRING," +
                "  user_id STRING," +
                "  invite_share_log_id STRING," +
                "  joined_time STRING," +
                "  target_value STRING," +
                "  PRIMARY KEY (id) NOT ENFORCED" +
                ") WITH (" +
                "   'connector' = 'jdbc'," +
                "   'url' = 'jdbc:mysql://localhost:3306/wlapp'," +
                "   'table-name' = 'plan_joined_user'," +
                "   'driver' = 'com.mysql.cj.jdbc.Driver'," +
                "   'username' = 'root'," +
                "   'password' = '123456'," +
                "   'scan.fetch-size' = '200'" +
                ")");

        tableEnv.executeSql("insert into mysql_plan_joined_user select * from mongo_plan_joined_user");
    }
}
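
One detail worth spelling out: the final INSERT uses SELECT *, and Flink SQL matches the query's columns to the sink's columns by position, so `_id` ends up in `id` only because both tables declare their columns in the same order. Below is a minimal sketch of generating the statement with an explicit select list instead, so the mapping is visible and a reordered DDL is easier to spot. `InsertSqlBuilder` and `build` are hypothetical names for illustration, not part of the Flink API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class InsertSqlBuilder {
    /**
     * Builds "INSERT INTO sink SELECT ... FROM source".
     * The map goes from source column to sink column and must be
     * iterated in the sink table's declared column order.
     */
    static String build(String sink, String source, Map<String, String> columns) {
        StringJoiner select = new StringJoiner(", ");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String src = e.getKey();
            String dst = e.getValue();
            // Alias only when the names differ, e.g. _id AS id.
            select.add(src.equals(dst) ? src : src + " AS " + dst);
        }
        return "INSERT INTO " + sink + " SELECT " + select + " FROM " + source;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which is the sink column order here.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("_id", "id");
        columns.put("plan_id", "plan_id");
        columns.put("user_id", "user_id");
        columns.put("invite_share_log_id", "invite_share_log_id");
        columns.put("joined_time", "joined_time");
        columns.put("target_value", "target_value");
        System.out.println(build("mysql_plan_joined_user", "mongo_plan_joined_user", columns));
    }
}
```

The resulting string could be passed to tableEnv.executeSql in place of the SELECT * statement. The aliases do not change how Flink matches columns; they only document the intended mapping.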

Flink CDC Implementation in SQL

  1. Required jars
  • avro-1.11.0.jar
  • kafka-clients-2.7.0.jar
  • connect-api-2.7.0.jar
  • flink-sql-connector-mongodb-cdc-2.2.1.jar
  • flink-connector-jdbc_2.11-1.13.5.jar
  • mysql-connector-java-8.0.29.jar
  2. Notes
  • The kafka-clients version has to be compatible: 2.7.0 was used locally, and using 1.1.1 along the way produced this error
java.lang.NoClassDefFoundError: com/ververica/cdc/debezium/internal/FlinkOffsetBackingStore
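  • The Flink client's lib directory should not contain flink-connector-mongodb-cdc-2.2.1.jar or flink-connector-debezium-2.2.1.jar, because flink-sql-connector-mongodb-cdc-2.2.1.jar already bundles both as shaded dependencies; otherwise the following error is thrown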
java.lang.NoClassDefFoundError: com/ververica/cdc/debezium/internal/FlinkOffsetBackingStore
    at com.ververica.cdc.debezium.DebeziumSourceFunction.run(DebeziumSourceFunction.java:369)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
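
The second note can be turned into a quick pre-flight check. The sketch below lists the jars in a lib directory and flags the thin connector jars whenever the shaded SQL connector jar is also present. The jar name prefixes come from this post; the class name, method name, and the default directory `lib` are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class FlinkLibCheck {
    // Jars already shaded into the SQL connector jar, per the note above.
    private static final String[] REDUNDANT_PREFIXES = {
        "flink-connector-mongodb-cdc-", "flink-connector-debezium-"
    };
    private static final String SHADED_JAR_PREFIX = "flink-sql-connector-mongodb-cdc-";

    /** Returns the jar names that duplicate classes bundled in the shaded SQL connector jar. */
    static List<String> findConflicts(List<String> jarNames) {
        List<String> conflicts = new ArrayList<>();
        boolean hasShadedJar = jarNames.stream().anyMatch(n -> n.startsWith(SHADED_JAR_PREFIX));
        if (!hasShadedJar) {
            return conflicts; // nothing to clash with
        }
        for (String name : jarNames) {
            for (String prefix : REDUNDANT_PREFIXES) {
                if (name.startsWith(prefix)) {
                    conflicts.add(name);
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; "lib" is an assumed default, pass the real path as the first argument.
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        List<String> names = new ArrayList<>();
        if (Files.isDirectory(lib)) {
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(lib, "*.jar")) {
                for (Path jar : jars) {
                    names.add(jar.getFileName().toString());
                }
            }
        }
        for (String name : findConflicts(names)) {
            System.out.println("remove from lib: " + name);
        }
    }
}
```

It only compares file names, so it will not catch the same classes arriving through a differently named jar.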
  3. Code

SET 'table.dml-sync' = 'false';
SET 'state.backend' = 'filesystem';
SET 'state.checkpoints.dir' = 'hdfs://namespace/user/flink/checkpoints';
SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';
SET 'execution.checkpointing.interval' = '2min';
SET 'execution.checkpointing.min-pause' = '1min';
SET 'execution.checkpointing.max-concurrent-checkpoints' = '1';
SET 'execution.checkpointing.prefer-checkpoint-for-recovery' = 'true';
SET 'execution.runtime-mode' = 'streaming';

CREATE TABLE mongo_plan_joined_user (
    _id STRING,
    plan_id STRING,
    user_id STRING,
    invite_share_log_id STRING,
    joined_time STRING,
    target_value STRING,
    PRIMARY KEY(_id) NOT ENFORCED
) WITH (
    'connector' = 'mongodb-cdc',
    'hosts' = '127.0.0.1:27000,127.0.0.1:27001,127.0.0.1:27002',
    'username' = 'flinkuser',
    'password' = 'flinkpw',
    'database' = 'wlapp',
    'collection' = 'plan_joined_user'
);

CREATE TABLE mysql_plan_joined_user (
    id STRING,
    plan_id STRING,
    user_id STRING,
    invite_share_log_id STRING,
    joined_time STRING,
    target_value STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://127.0.0.1:3306/data',
    'table-name' = 'plan_joined_user',
    'driver' = 'com.mysql.cj.jdbc.Driver',
    'username' = 'root',
    'password' = '123456',
    'scan.fetch-size' = '200'
);

INSERT INTO mysql_plan_joined_user SELECT * FROM mongo_plan_joined_user;

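This example is a simple test of syncing MongoDB to MySQL with Mongo CDC. Pitfalls encountered later in production will be added here as updates.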
