Soft | Version |
---|---|
MySQL | 5.7.34 |
Flink | 1.14.5 |
Doris | 1.1.0 |
Reference: https://www.runoob.com/mysql/mysql-install.html
Flink CDC subscribes to MySQL's binlog to sync data to Doris in real time, so the MySQL binlog must be enabled.
vim /etc/my.cnf
log-bin=mysql-bin
binlog_format=Row
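Note that the Flink MySQL CDC connector also expects a unique `server-id` on the MySQL side. A minimal my.cnf sketch (the `server-id` value of 1 is an example; pick any ID that is unique within your replication topology):

```ini
[mysqld]
# Enable binlog with row-based format, required by Flink CDC
log-bin=mysql-bin
binlog_format=Row
# A unique server id is required for binlog replication clients
server-id=1
```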
Restart MySQL:
systemctl restart mysqld
Check whether the binlog is enabled:
mysql> show variables like "log_bin";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.00 sec)
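Since the configuration above sets binlog_format=Row, it is worth confirming that as well, because Flink CDC requires row-based binlog:

```sql
show variables like 'binlog_format';
-- should show: binlog_format | ROW
```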
mysql>
Log in to MySQL:
mysql -h 127.0.0.1 -P 3306 -uroot
Create the database and table:
create database example_db;
CREATE TABLE `test_cdc` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4;
Write test data:
insert into test_cdc (id,name) values(1,'1');
Verify the test data:
mysql> select * from test_cdc;
+----+------+
| id | name |
+----+------+
|  1 | 1    |
+----+------+
1 row in set (0.00 sec)
mysql>
For Doris installation, see the official docs: https://doris.apache.org/zh-CN/docs/get-starting/
mysql -h 127.0.0.1 -P 9030 -uroot
create database example_db;
CREATE TABLE IF NOT EXISTS example_db.expamle_tbl
(
`id` LARGEINT NOT NULL COMMENT "用户id",
`name` VARCHAR(50) NOT NULL COMMENT "用户昵称"
)
UNIQUE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
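The table uses Doris's Unique Key model, so rows with the same `id` are merged and the latest write wins; this is what lets the Flink sink apply MySQL updates as upserts. A quick sketch of the behavior (run against the Doris MySQL-protocol port, 9030):

```sql
INSERT INTO example_db.expamle_tbl VALUES (1, 'a');
INSERT INTO example_db.expamle_tbl VALUES (1, 'b');
-- SELECT * FROM example_db.expamle_tbl now returns a single row: (1, 'b')
```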
1. Download Scala (note: version 2.12.15 is used here):
mkdir -p /opt
cd /opt/
wget https://downloads.lightbend.com/scala/2.12.15/scala-2.12.15.tgz
tar -xzvf scala-2.12.15.tgz
2. Configure Scala:
vim /etc/profile
export SCALA_HOME=/opt/scala-2.12.15
export PATH=$PATH:$SCALA_HOME/bin
# Make the configuration take effect
. /etc/profile
3. Verify the Scala version:
[root@localhost ~]# scala -version
Scala code runner version 2.12.15 -- Copyright 2002-2021, LAMP/EPFL and Lightbend, Inc.
[root@localhost ~]#
Download Flink (note: the Scala 2.12 build is used here):
cd /opt/
wget https://dlcdn.apache.org/flink/flink-1.14.5/flink-1.14.5-bin-scala_2.12.tgz
tar -xzvf flink-1.14.5-bin-scala_2.12.tgz
Here 1.14.5 is the Flink version and 2.12 is the Scala version.
Add the Flink CDC dependency jars to Flink's lib directory:
cd /opt/flink-1.14.5/lib
# Download the mysql-cdc connector
wget https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.1/flink-sql-connector-mysql-cdc-2.2.1.jar
# Download the flink-doris-connector
wget https://repo.maven.apache.org/maven2/org/apache/doris/flink-doris-connector-1.14_2.12/1.1.0/flink-doris-connector-1.14_2.12-1.1.0.jar
Note:
Flink® CDC Version | Flink® Version |
---|---|
1.0.0 | 1.11.* |
1.1.0 | 1.11.* |
1.2.0 | 1.12.* |
1.3.0 | 1.12.* |
1.4.0 | 1.13.* |
2.0.* | 1.13.* |
2.1.* | 1.13.* |
2.2.* | 1.13.*, 1.14.* |
The Flink CDC connector jars for each version are available at: https://github.com/ververica/flink-cdc-connectors/releases
Download
flink-sql-connector-mongodb-cdc-2.2.1.jar
flink-sql-connector-mysql-cdc-2.2.1.jar
flink-sql-connector-oceanbase-cdc-2.2.1.jar
flink-sql-connector-oracle-cdc-2.2.1.jar
flink-sql-connector-postgres-cdc-2.2.1.jar
flink-sql-connector-sqlserver-cdc-2.2.1.jar
flink-sql-connector-tidb-cdc-2.2.1.jar
The flink-doris-connector is available at: https://repo.maven.apache.org/maven2/org/apache/doris/flink-doris-connector-1.14_2.12/
Here 1.14 is the supported Flink version and 2.12 is the supported Scala version.
[root@localhost flink-1.14.5]# cd /opt/flink-1.14.5/
[root@localhost flink-1.14.5]# bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host localhost.localdomain.
Starting taskexecutor daemon on host localhost.localdomain.
./bin/sql-client.sh embedded
SET 'sql-client.execution.result-mode' = 'tableau';
SET 'execution.checkpointing.interval' = '10s';
CREATE TABLE cdc_mysql_source (
  id INT,
  name VARCHAR,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = '127.0.0.1',
  'port' = '3306',
  'username' = 'root',
  'password' = '',
  'database-name' = 'example_db',
  'table-name' = 'test_cdc'
);

CREATE TABLE doris_sink (
  id INT,
  name STRING
) WITH (
  'connector' = 'doris',
  'fenodes' = '127.0.0.1:8030',
  'table.identifier' = 'example_db.expamle_tbl',
  'username' = 'root',
  'password' = '',
  'sink.enable-2pc' = 'false',
  'sink.label-prefix' = 'doris_label'
);
insert into doris_sink select id,name from cdc_mysql_source;
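Because the Doris table uses the Unique Key model, updates on the MySQL side are propagated as upserts. For example, assuming the job above is running, an update to the source row should appear in Doris after the next checkpoint (the 10s interval set earlier):

```sql
-- Run in MySQL (port 3306):
update test_cdc set name = 'one' where id = 1;
-- Then in Doris (port 9030), after the checkpoint fires,
-- select * from expamle_tbl; should show name = 'one' for id = 1
```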
Note: on first startup, Flink CDC takes a full snapshot of the historical data; once the full sync completes, it switches to incremental sync.
mysql -h 127.0.0.1 -P 9030 -uroot
mysql> use example_db
Database changed
mysql>
mysql> select * from expamle_tbl;
+------+------+
| id | name |
+------+------+
|    1 | 1    |
+------+------+
1 row in set (0.02 sec)
2022-09-05 19:46:30,792 INFO org.apache.doris.flink.sink.writer.DorisStreamLoad [] - load Result {
"TxnId": 1125,
"Label": "doris_label_0_1662378388355",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1,
"NumberLoadedRows": 1,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 8,
"LoadTimeMs": 123,
"BeginTxnTimeMs": 2,
"StreamLoadPutTimeMs": 108,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 2256,
"CommitAndPublishTimeMs": 60
}
The log shows that Flink wrote the data successfully.
1. Write incremental test data to MySQL:
insert into test_cdc (id,name) values(2,'2');
2. Incremental sync verification: query the data in Doris:
mysql -h 127.0.0.1 -P 9030 -uroot
mysql> use example_db
Database changed
mysql>
mysql> select * from expamle_tbl;
+------+------+
| id | name |
+------+------+
| 1 | 1 |
|    2 | 2    |
+------+------+
2 rows in set (0.02 sec)
Finally, a plug for my books:
1. 《图解Spark 大数据快速分析实战(异步图书出品)》 https://item.jd.com/13613302.html
2. 《Offer来了:Java面试核心知识点精讲(第2版)(博文视点出品)》 https://item.jd.com/13200939.html
3. 《Offer来了:Java面试核心知识点精讲(原理篇)(博文视点出品)》 https://item.jd.com/12737278.html
4. 《Offer来了:Java面试核心知识点精讲(框架篇)(博文视点出品)》 https://item.jd.com/12868220.html