【Flink 1.12】深度解析 Flink Upsert-kafka

image.png

测试数据准备

在正式开始之前,请先下载好上述所需要的文件。我们首先用命令docker-compose up -d启动docker。我们可以利用以下命令从 Terminal 进入 Mysql 容器之中,并插入相应的数据。

docker exec -it  mysql bash -c 'mysql -uroot -p123456'

在 Mysql 中执行以下命令:

CREATE DATABASE flink;
USE flink;

CREATE TABLE users (
  user_id BIGINT,
  user_name VARCHAR(1000),
  region VARCHAR(1000)
);

INSERT INTO users VALUES 
(1, 'Timo', 'Berlin'),
(2, 'Tom', 'Beijing'),
(3, 'Apple', 'Beijing');

现在,我们利用Sql client在Flink中创建相应的表。

CREATE TABLE users (
  user_id BIGINT,
  user_name STRING,
  region STRING
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'database-name' = 'flink',
  'table-name' = 'users',
  'username' = 'root',
  'password' = '123456'
);

CREATE TABLE pageviews (
  user_id BIGINT,
  page_id BIGINT,
  view_time TIMESTAMP(3),
  proctime AS PROCTIME()
) WITH (
  'connector' = 'kafka',
  'topic' = 'pageviews',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);

并利用Flink 往 Kafka中灌入相应的数据

INSERT INTO pageviews VALUES
  (1, 101, TO_TIMESTAMP('2020-11-23 15:00:00')),
  (2, 104, TO_TIMESTAMP('2020-11-23 15:00:01.00'));

将 left join 结果写入 Kafka

我们首先测试是否能将Left join的结果灌入到 Kafka 之中。

首先,我们在 Sql client 中创建相应的表

CREATE TABLE enriched_pageviews (
  user_id BIGINT,
  user_region STRING,
  page_id BIGINT,
  view_time TIMESTAMP(3),
  WATERMARK FOR view_time as view_time - INTERVAL '5' SECOND,
  PRIMARY KEY (user_id, page_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'enriched_pageviews',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);

并利用以下语句将left join的结果插入到kafka对应的topic之中。

INSERT INTO enriched_pageviews
SELECT pageviews.user_id, region, pageviews.page_id, pageviews.view_time
FROM pageviews
LEFT JOIN users ON pageviews.user_id = users.user_id;

当作业跑起来后,我们可以另起一个 Terminal 利用命令docker exec -it kafka bash 进入kafka所在的容器之中。 Kafka的安装路径在于/opt/kafka,利用以下命令,我们可以打印topic内的数据./kafka-console-consumer.sh --bootstrap-server kafka:9094 --topic "enriched_pageviews" --from-beginning --property print.key=true

#预期结果
{"user_id":1,"page_id":101} {"user_id":1,"user_region":null,"page_id":101,"view_time":"2020-11-23 15:00:00"}
{"user_id":2,"page_id":104} {"user_id":2,"user_region":null,"page_id":104,"view_time":"2020-11-23 15:00:01"}
{"user_id":1,"page_id":101} null
{"user_id":1,"page_id":101} {"user_id":1,"user_region":"Berlin","page_id":101,"view_time":"2020-11-23 15:00:00"}
{"user_id":2,"page_id":104} null
{"user_id":2,"page_id":104} {"user_id":2,"user_region":"Beijing","page_id":104,"view_time":"2020-11-23 15:00:01"}

Left join中,右流发现左流没有join上但已经发射了,此时会发送DELETE消息,而非UPDATE-BEFORE消息清理之前发送的消息。详见org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator#processElement

我们可以进一步在mysql中删除或者修改一些数据,来观察进一步的变化。

UPDATE users SET region = 'Beijing' WHERE user_id = 1;

DELETE FROM users WHERE user_id = 1;

将聚合结果写入 Kafka

我们进一步测试将聚合的结果写入到 Kafka 之中。

在Sql client 中构建以下表

CREATE TABLE pageviews_per_region (
  user_region STRING,
  cnt BIGINT,
  PRIMARY KEY (user_region) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'pageviews_per_region',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
)

我们再用以下命令将数据插入到upsert-kafka之中。

INSERT INTO pageviews_per_region
SELECT
  user_region,
  COUNT(*)
FROM enriched_pageviews
WHERE user_region is not null
GROUP BY user_region;

我们可以通过以下命令查看 Kafka 中对应的数据

./kafka-console-consumer.sh --bootstrap-server kafka:9094 --topic "pageviews_per_region" --from-beginning --property print.key=true

# 预期结果
{"user_region":"Berlin"}    {"user_region":"Berlin","cnt":1}
{"user_region":"Beijing"}   {"user_region":"Beijing","cnt":1}
{"user_region":"Berlin"}    null
{"user_region":"Beijing"}   {"user_region":"Beijing","cnt":2}
{"user_region":"Beijing"}   {"user_region":"Beijing","cnt":1}

你可能感兴趣的:(【Flink 1.12】深度解析 Flink Upsert-kafka)