Environment:
Flink 1.15.2
MySQL 5.7. Note: binlog does not need to be enabled, because the data is obtained by querying the table directly (not by reading the changelog).
Tested combinations of source and target tables with/without a primary key (id) and with/without duplicate (id) values:
Source table has no primary key but contains duplicate rows, target table has no primary key: all data is synced. (Syncing multiple times leaves multiple copies in the target table.)
Source table has no primary key but contains duplicate rows, target table has a primary key: the program runs without errors, but the data is not written to the target.
Source table has no primary key, target table has no primary key: all data is synced. (Syncing multiple times leaves multiple copies in the target table.)
Source table has a primary key, target table has no primary key: all data is synced. (Syncing multiple times leaves multiple copies in the target table.)
Source table has a primary key, target table has a primary key: all data is synced. (When syncing multiple times, only the first run succeeds; later runs do not report errors, but no data gets in because of primary-key conflicts. See the sketch below for a possible upsert workaround.)
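For that last case, a possible workaround (a sketch based on the JDBC connector's documented upsert behavior, not something tested in this post): if the Flink sink DDL declares a primary key with NOT ENFORCED, and the MySQL table user_new also defines PRIMARY KEY (id), the connector writes in upsert mode, so repeated runs update existing rows instead of hitting duplicate-key conflicts.

-- Sketch of an upsert-mode sink DDL (assumes user_new has PRIMARY KEY (id) on the MySQL side)
CREATE TABLE mysql_sink_upsert (
 id INT,
 username STRING,
 password STRING,
 PRIMARY KEY (id) NOT ENFORCED
) WITH (
 'connector' = 'jdbc',
 'driver' = 'com.mysql.cj.jdbc.Driver',
 'url' = 'jdbc:mysql://localhost:3306/test',
 'username' = 'root',
 'password' = 'root',
 'table-name' = 'user_new'
);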
Run locally from IDEA on Windows 11.
Official documentation reference: MySQL CDC Connector — CDC Connectors for Apache Flink® documentation
That page covers CDC; the one-off batch sync shown here is not a CDC feature.
Maven dependencies:
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <flink.version>1.15.2</flink.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-runtime-web</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-table-planner_2.12</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-jdbc</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.29</version>
    </dependency>
</dependencies>
MySQL DDL:
Source table: user
Target table: user_new
CREATE TABLE `user` (
  `id` int(11) NOT NULL,
  `username` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `user_new` (
  `id` int(11) NOT NULL,
  `username` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
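To give the job something to copy, you can seed the source table with a few test rows; the values below are made up purely for illustration.

-- Hypothetical test data (not part of the original post)
INSERT INTO `user` (`id`, `username`, `password`) VALUES
  (1, 'tom', '123456'),
  (2, 'jerry', '123456'),
  (3, 'spike', '123456');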
Demo:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class MysqlToMysqlFullData {
    public static void main(String[] args) {
        // 1. Get the stream execution environment
        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();
        senv.setParallelism(4);
        EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
        // 2. Create the table environment on top of it
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(senv, settings);
        String sourceTable = "CREATE TABLE mysql_source (" +
                " id INT,\n" +
                " username STRING,\n" +
                " password STRING\n" +
                ") WITH (\n" +
                "'connector' = 'jdbc',\n" +
                "'driver' = 'com.mysql.cj.jdbc.Driver',\n" +
                "'url' = 'jdbc:mysql://localhost:3306/test',\n" +
                "'username' = 'root',\n" +
                "'password' = 'root',\n" +
                "'table-name' = 'user'\n" +
                ")";
        tEnv.executeSql(sourceTable);
        // Print the source data to verify the connection
        tEnv.executeSql("select * from mysql_source").print();
        String sinkTable = "CREATE TABLE mysql_sink (" +
                " id INT,\n" +
                " username STRING,\n" +
                " password STRING\n" +
                ") WITH (\n" +
                "'connector' = 'jdbc',\n" +
                "'driver' = 'com.mysql.cj.jdbc.Driver',\n" +
                "'url' = 'jdbc:mysql://localhost:3306/test?rewriteBatchedStatements=true',\n" +
                "'username' = 'root',\n" +
                "'password' = 'root',\n" +
                "'table-name' = 'user_new'\n" +
                ")";
        // Options supported by the JDBC connector:
        // connection.max-retry-timeout
        // connector
        // driver
        // lookup.cache.caching-missing-key
        // lookup.cache.max-rows
        // lookup.cache.ttl
        // lookup.max-retries
        // password
        // property-version
        // scan.auto-commit
        // scan.fetch-size
        // scan.partition.column
        // scan.partition.lower-bound
        // scan.partition.num
        // scan.partition.upper-bound
        // sink.buffer-flush.interval
        // sink.buffer-flush.max-rows
        // sink.max-retries
        // sink.parallelism
        // table-name
        // url
        // username
        tEnv.executeSql(sinkTable);
        // Submits the INSERT job; it runs asynchronously (call .await() on the returned TableResult to block until it finishes)
        tEnv.executeSql("insert into mysql_sink select id,username,password from mysql_source");
    }
}
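One group of options from the commented list above is worth calling out: the scan.partition.* settings split the source read into parallel range queries, so the parallelism of 4 set in the demo can actually be used on the source side. A sketch with illustrative bounds (all four options must be set together, and the partition column must be numeric, date, or timestamp):

CREATE TABLE mysql_source_partitioned (
 id INT,
 username STRING,
 password STRING
) WITH (
 'connector' = 'jdbc',
 'driver' = 'com.mysql.cj.jdbc.Driver',
 'url' = 'jdbc:mysql://localhost:3306/test',
 'username' = 'root',
 'password' = 'root',
 'table-name' = 'user',
 -- the bounds below are illustrative; use the real min/max of id in your table
 'scan.partition.column' = 'id',
 'scan.partition.num' = '4',
 'scan.partition.lower-bound' = '1',
 'scan.partition.upper-bound' = '10000'
);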
Note: every time the program runs, INSERT INTO appends the data again, so the target is duplicated once per run. There is no overwrite behavior here; if you want to empty the target table before inserting, you still have to execute a separate SQL statement through the JDBC driver. Borrowing from the DataStream approach, you can use the JDBC driver to fetch the table's CREATE statement and to clear the table data, as sketched at the end of this post.
INSERT OVERWRITE is supported for the Filesystem connector and Hive tables, which generally have no primary key. Other connectors such as JDBC, Elasticsearch, and HBase do not currently support INSERT OVERWRITE.
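A minimal sketch of that cleanup step, assuming the same connection settings as the demo: run the statements below over a plain JDBC connection (java.sql.DriverManager / Statement) before submitting the Flink INSERT job, so repeated runs do not accumulate duplicate rows.

-- Hypothetical cleanup statements (not in the original post), executed over plain JDBC before the Flink job
SHOW CREATE TABLE user_new;   -- fetch the target table's DDL if you need to recreate it
TRUNCATE TABLE user_new;      -- clear existing rows before the next INSERT INTO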