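Flink recently released version 1.12, which brings the upsert-kafka connector and a more complete unified stream/batch mechanism, so I could not wait to try it. I had been busy for a while and had not touched Flink in a long time. A recent requirement happened to need windows, so I decided to review Flink and catch the 1.12 wave at the same time. I opened the official site to read up on the release and brushed up on the window mechanism while I was there.
Honestly, my understanding of windows had always stayed at the usage level. This time I had room to dig deeper, hence this article. I think anyone who wants to go deep on Flink cannot get around windows and watermarks. Over the weekend I wrote a few demos as a review, and it really was a case of learning something new by revisiting the old: earlier requirements were always rushed, so I never got to go deep, and playing with it again I found many points I had never noticed before. First, here is the test data: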
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771480000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771481000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771482000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771483000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771484000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771485000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771486000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771487000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771488000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771489000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771490000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771491000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771492000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771493000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771494000"}
{"orderId":"10001","shopName":"TestData","userID":"10007","amount":10.00,"sum":1,"orderStatus":1,"orderTime":"1607771495000"}
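import java.time.Duration;
import java.util.Properties;

import com.fasterxml.jackson.databind.ObjectMapper; // assumed: Jackson's ObjectMapper
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// GlobalPublicVariables, FlinkContextUtil and OrderInformation are my own project classes.
public class FlinkDemo02 {
    public static final String KAFKA_GROUP_ID = "test";
    public static final String KAFKA_TOPIC = "flink";

    public static void main(String[] args) {
        Properties pro = new Properties();
        pro.setProperty("bootstrap.servers", GlobalPublicVariables.KAFKA_SERVER);
        pro.setProperty("group.id", KAFKA_GROUP_ID);
        final StreamExecutionEnvironment env = FlinkContextUtil.getFlinkEnv(true, true);
        DataStreamSource<String> input = env.addSource(
                new FlinkKafkaConsumer<>(KAFKA_TOPIC, new SimpleStringSchema(), pro).setStartFromLatest());
        // TODO: not finished yet
        // Parse each JSON line, then take the event time from the orderTime field (epoch milliseconds).
        SingleOutputStreamOperator<OrderInformation> data = input
                .map((line) -> {
                    return new ObjectMapper().readValue(line, OrderInformation.class);
                })
                .returns(TypeInformation.of(OrderInformation.class))
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<OrderInformation>forBoundedOutOfOrderness(Duration.ofMillis(1000))
                        .withTimestampAssigner((orderInformation, timestamp) ->
                                Long.parseLong(orderInformation.getOrderTime())))
                .startNewChain();
        // Requirement: total the amount every five seconds.
        // The requirement was phrased as "every five seconds, total the last five seconds",
        // i.e. sliding rather than tumbling; this demo uses a 5 s tumbling event-time window.
        SingleOutputStreamOperator<OrderInformation> sum = data
                .windowAll(TumblingEventTimeWindows.of(Time.milliseconds(5000)))
                .sum("amount")
                .startNewChain();
        sum.print();
        try {
            // Print the execution plan
            System.out.println(env.getExecutionPlan());
            env.execute("TestFLink");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}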
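I had always assumed that a Flink event-time window starts at the timestamp of the first record that arrives, but my tests showed otherwise. Here is what I ran into. The code above uses a five-second window, with a watermark that tolerates bounded out-of-orderness. The window fires when:
eventTime - maximum out-of-orderness (3 s here; note that the code above sets Duration.ofMillis(1000)) >= window end time
At that point the window closes. Then we feed in the data and look at the output: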
// When the data starts at 1607771480000, the output is 50
OrderInformation(orderId=10001, shopName=TestData, userID=10007, amount=50.0, sum=1, orderStatus=1, orderTime=1607771480000)
// When the data starts at 1607771481000, the output is 40, which looked very strange
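To make the firing condition concrete, here is a minimal sketch in plain Java. It rests on my understanding that the bounded-out-of-orderness strategy emits a watermark of (largest timestamp seen - delay - 1 ms) and that an event-time window [start, end) fires once the watermark reaches end - 1. The class and method names are made up for illustration and are not Flink APIs, and the sketch ignores that Flink emits watermarks periodically rather than per record.

```java
import java.time.Duration;

public class WatermarkFiringSketch {

    // Assumed watermark rule: largest event time seen so far, minus the delay, minus 1 ms.
    static long watermark(long maxEventTime, Duration maxOutOfOrderness) {
        return maxEventTime - maxOutOfOrderness.toMillis() - 1;
    }

    // Assumed firing rule: the window [start, end) fires once the watermark reaches end - 1.
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }

    public static void main(String[] args) {
        Duration delay = Duration.ofMillis(1000);   // same delay as in the job above
        long windowEnd = 1607771485000L;            // end of the first 5 s window
        // Walk through the test timestamps and report whether the window would fire.
        for (long ts = 1607771480000L; ts <= 1607771487000L; ts += 1000) {
            System.out.println(ts + " -> fires: " + fires(windowEnd, watermark(ts, delay)));
        }
    }
}
```

Under these assumptions, with a 1000 ms delay the first window fires when the record stamped 1607771486000 arrives; with a 3 s delay it would take the record stamped 1607771488000.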
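As I saw it, the window should have fired once the timestamps reached the one ending in 9000, and the result should still have been 50, but that is not what I got. So I searched on Baidu. Either few people care about this question, or my search keywords were wrong, or everyone else understood it correctly from the start and never read it the way I did. In the end I went to read the source code.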
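The source shows that when Flink assigns an element to a window, it first computes a start time for that window. In the version I read this happens in TimeWindow.getWindowStartWithOffset, and the start is computed as timestamp - (timestamp - offset + windowSize) % windowSize.
Let's try that logic ourselves: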
@Test
public void testFlinkWindowTimeStamp() {
    long timestamp = 1607771481000L;
    long offset = 0;
    long windowSize = 5000;
    System.out.println((timestamp - (timestamp - offset + windowSize) % windowSize));
}
// Output: 1607771480000
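To see how this start-time rule explains the 50 versus 40 results, here is a small plain-Java sketch that applies the same formula to every test timestamp and totals the amounts per window. It does not use Flink at all; the class name and the sumPerWindow helper are made up for illustration, and the 10.0 amount and one-second spacing are taken from the test data above.

```java
import java.util.Map;
import java.util.TreeMap;

public class WindowAssignmentSketch {

    // Same formula as in the test above.
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    // Totals a fixed amount per record, keyed by the start of the window each record falls into.
    static Map<Long, Double> sumPerWindow(long firstTs, long lastTs, long stepMs,
                                          double amount, long windowSize) {
        Map<Long, Double> sums = new TreeMap<>();
        for (long ts = firstTs; ts <= lastTs; ts += stepMs) {
            sums.merge(windowStart(ts, 0L, windowSize), amount, Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // Data starting at ...480000: five records land in the first window.
        System.out.println(sumPerWindow(1607771480000L, 1607771495000L, 1000L, 10.0, 5000L));
        // Data starting at ...481000: only four records land in the same first window.
        System.out.println(sumPerWindow(1607771481000L, 1607771495000L, 1000L, 10.0, 5000L));
    }
}
```

In both runs the first window starts at 1607771480000. When the data begins one second later, that window simply holds one record fewer, so it totals 40 instead of 50.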
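So windows are aligned to multiples of the window size (plus the offset), not to the first record. The record stamped 1607771481000 therefore falls into the window [1607771480000, 1607771485000), which then holds only four records and sums to 40.
After running this, all I wanted to tell myself was: from now on, dig as deep as you can when learning something instead of reading an article and using it straight away, and always be rigorous. That is the end of this article. If you got something out of it, that is the best reward for me.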