One-time sync of a single MySQL table to MySQL with Flink SQL

Environment:

flink 1.15.2

mysql 5.7    Note: binlog does not need to be enabled, because the data is fetched with plain table queries, not change logs

Tested behavior for source and target tables with/without a primary key (id) and with/without duplicate (id) rows:

        Source has no primary key but contains duplicate rows, target has no primary key: the data syncs completely. (Each additional run adds another full copy to the target.)

        Source has no primary key but contains duplicate rows, target has a primary key: the job runs without errors, but the data does not make it into the target.

        Source has no primary key, target has no primary key: the data syncs completely. (Each additional run adds another full copy.)

        Source has a primary key, target has no primary key: the data syncs completely. (Each additional run adds another full copy.)

        Source has a primary key, target has a primary key: the data syncs completely, but only the first run succeeds; later runs report no error, yet no data gets in because of primary-key conflicts.
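For that last case, the Flink JDBC connector offers a way around the primary-key conflicts: when the Flink sink DDL declares a primary key, the connector writes in upsert mode (on MySQL it issues INSERT ... ON DUPLICATE KEY UPDATE), so re-running the job updates existing rows instead of silently failing. A sketch of such a sink definition, reusing the connection options from the demo below and assuming the MySQL target table also defines id as its primary key:

```sql
-- Declaring a primary key (NOT ENFORCED) switches the JDBC sink into
-- upsert mode, so repeated runs update existing rows instead of
-- colliding on duplicate keys.
CREATE TABLE mysql_sink (
  id INT,
  username STRING,
  password STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'url' = 'jdbc:mysql://localhost:3306/test',
  'username' = 'root',
  'password' = 'root',
  'table-name' = 'user_new'
);
```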

Runs locally on Windows 11 in IDEA.

Official docs: MySQL CDC Connector — CDC Connectors for Apache Flink® documentation

Note that the official docs cover CDC; the one-shot batch sync shown here is not a CDC feature.

Maven dependencies:

 
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <flink.version>1.15.2</flink.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-jdbc</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.29</version>
        </dependency>
    </dependencies>

MySQL DDL:

Source table: user

Target table: user_new

CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `username` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,

  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `user_new` (
  `id` int(11) NOT NULL,
  `username` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Demo:


import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class MysqlToMysqlFullData {
    public static void main(String[] args) {
        // 1. Create the stream execution environment
        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();
        senv.setParallelism(4);
        EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
        // 2. Create the table environment in batch mode on top of it
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(senv, settings);

        String sourceTable = "CREATE TABLE mysql_source (" +
                "  id INT,\n" +
                "  username STRING,\n" +
                "  password STRING\n" +
                ") WITH (\n" +
                "'connector' = 'jdbc',\n" +
                "'driver' = 'com.mysql.cj.jdbc.Driver',\n" +
                "'url' = 'jdbc:mysql://localhost:3306/test',\n" +
                "'username' = 'root',\n" +
                "'password' = 'root',\n" +
                "'table-name' = 'user'\n" +
                ")";
        tEnv.executeSql(sourceTable);
        tEnv.executeSql("select * from mysql_source").print();
        String sinkTable = "CREATE TABLE mysql_sink (" +
                "  id INT,\n" +
                "  username STRING,\n" +
                "  password STRING\n" +
                ") WITH (\n" +
                "'connector' = 'jdbc',\n" +
                "'driver' = 'com.mysql.cj.jdbc.Driver',\n" +
                "'url' = 'jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true',\n" +
                "'username' = 'root',\n" +
                "'password' = 'root',\n" +
                "'table-name' = 'user_new'\n" +
                ")";
        // Options supported by the JDBC connector:
        //connection.max-retry-timeout
        //connector
        //driver
        //lookup.cache.caching-missing-key
        //lookup.cache.max-rows
        //lookup.cache.ttl
        //lookup.max-retries
        //password
        //property-version
        //scan.auto-commit
        //scan.fetch-size
        //scan.partition.column
        //scan.partition.lower-bound
        //scan.partition.num
        //scan.partition.upper-bound
        //sink.buffer-flush.interval
        //sink.buffer-flush.max-rows
        //sink.max-retries
        //sink.parallelism
        //table-name
        //url
        //username
        tEnv.executeSql(sinkTable);
        // Block until the batch insert job finishes; executeSql submits the
        // INSERT asynchronously, so without await() main() may exit before
        // the job completes when run locally from the IDE.
        try {
            tEnv.executeSql("insert into mysql_sink select id,username,password from mysql_source").await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
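Among the options listed in the comment above, the scan.partition.* group lets the source be read in parallel: Flink splits the table into ranges over a numeric column and issues one query per range. A sketch (all four scan.partition options must be set together; the bounds below are assumed values and should roughly cover the actual id range of the user table):

```sql
-- Split the source read into 4 parallel range queries over the id column.
-- The lower/upper bounds are assumptions for this demo's data.
CREATE TABLE mysql_source_parallel (
  id INT,
  username STRING,
  password STRING
) WITH (
  'connector' = 'jdbc',
  'driver' = 'com.mysql.cj.jdbc.Driver',
  'url' = 'jdbc:mysql://localhost:3306/test',
  'username' = 'root',
  'password' = 'root',
  'table-name' = 'user',
  'scan.partition.column' = 'id',
  'scan.partition.num' = '4',
  'scan.partition.lower-bound' = '1',
  'scan.partition.upper-bound' = '100000'
);
```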

Note: every run of the job executes INSERT INTO again, so the data is duplicated on each run. There is no overwrite support here yet; to clear the target table before inserting, you still have to execute a separate SQL statement through the JDBC driver yourself. As in the DataStream approach, you can use the JDBC driver to fetch a table's DDL and to empty the target table.
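A minimal sketch of that pre-run cleanup: a statement like the following, executed against MySQL through a plain JDBC connection (or any MySQL client) before submitting the Flink job, keeps repeated runs from accumulating copies:

```sql
-- Empty the target table so the next sync run starts from scratch.
TRUNCATE TABLE user_new;
```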

INSERT OVERWRITE is supported for the Filesystem connector and Hive tables, which generally have no primary key; other connectors such as JDBC, Elasticsearch, and HBase do not currently support INSERT OVERWRITE.
