Window Lateness and Side-Output Data Handling

1. The AllowedLateness API: delaying window closure

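The allowedLateness method is called on a WindowedStream. It takes a lateness duration, and that duration decides when the window is really closed; it is added on top of the watermark delay. For example, take a watermark delay of 2 s, an allowedLateness of 2 s, and a 10 s tumbling window. Without allowedLateness, the 0-10 s window would close as soon as an event with a timestamp of 12 s or later arrives. allowedLateness extends that by another 2 s, so the 0-10 s window is only truly closed at 14 s. The window computation is still triggered at 12 s, and from 12 s up to 14 s every late element that belongs to this window triggers the computation again.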
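The arithmetic in that example can be written down as a small plain-Java sketch. It does not use Flink at all; the class and method names are made up for illustration, and it assumes the watermark is computed as "largest timestamp seen so far, minus the delay, minus 1 ms", which is how the bounded-out-of-orderness strategy used later in this post behaves.

```java
// Plain-Java sketch of the timing rules above; no Flink dependency.
// Class and method names are hypothetical, chosen for this illustration only.
final class LatenessTimeline {

    /** Watermark after seeing maxEventTs, assuming "max timestamp - delay - 1 ms". */
    static long watermarkFor(long maxEventTs, long watermarkDelayMs) {
        return maxEventTs - watermarkDelayMs - 1;
    }

    /** Smallest event timestamp that makes the window ending at windowEnd fire for the first time. */
    static long firstFiringEventTs(long windowEnd, long watermarkDelayMs) {
        return windowEnd + watermarkDelayMs;
    }

    /** Smallest event timestamp after which the window is really closed and its state dropped. */
    static long closingEventTs(long windowEnd, long watermarkDelayMs, long allowedLatenessMs) {
        return windowEnd + watermarkDelayMs + allowedLatenessMs;
    }

    public static void main(String[] args) {
        long windowEnd = 10_000L;       // the 0-10 s tumbling window
        long watermarkDelay = 2_000L;   // watermark delay of 2 s
        long allowedLateness = 2_000L;  // allowedLateness of 2 s

        long fireTs = firstFiringEventTs(windowEnd, watermarkDelay);
        long closeTs = closingEventTs(windowEnd, watermarkDelay, allowedLateness);

        System.out.println("first firing at event ts " + fireTs + " ms"
                + " (watermark " + watermarkFor(fireTs, watermarkDelay) + ")");
        System.out.println("closed at event ts " + closeTs + " ms"
                + " (watermark " + watermarkFor(closeTs, watermarkDelay) + ")");
    }
}
```

With the numbers from the example, the first firing needs an event at 12000 ms and the real close needs one at 14000 ms, which is the 12 s / 14 s split described above.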

2. The OutputTag API: side output

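The OutputTag API makes sure that data arriving after a window has closed can still be retrieved. Once the allowedLateness period has passed, the window is closed for good, and any later element that falls into that window's range is routed to the side output identified by the OutputTag.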

                context.collect(new Event("A", "/user", 1000L));
                Thread.sleep(3000);
                context.collect(new Event("B", "/prod", 6500L));
                Thread.sleep(3000);
                context.collect(new Event("C", "/cart", 4000L));
                Thread.sleep(3000);
                context.collect(new Event("D", "/user", 7500L));
                // 7500L moves the watermark past 5 s, so the 0-5 s window fires here
                System.out.println("window fires ~ ");
                Thread.sleep(3000);
                context.collect(new Event("E", "/cente", 8500L));
                Thread.sleep(3000);
                context.collect(new Event("F", "/cente", 4000L));
                Thread.sleep(3000);
                context.collect(new Event("G", "/cente", 9200L));
                Thread.sleep(3000);
                context.collect(new Event("H", "/cente", 1000L));
                Thread.sleep(3000);
                context.collect(new Event("I", "/cente", 1500L));
                Thread.sleep(3000);

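Now define a 5 s tumbling window with a watermark delay of 2 s and an allowedLateness of 2 s. The 0-5 s window is then only closed once an event with a timestamp of 9 s or later arrives (that is, once the watermark passes 7 s), so the last two records go to the OutputTag side output. When the second 4000L record arrives, the window is still open, and the window computation is triggered once more.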

The result matches the expectation exactly.

[Figure 1: Window lateness and side-output data handling]
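The same walkthrough can be reproduced without starting a Flink job. The following is only a sketch in plain Java with hypothetical names. It assumes the watermark has caught up with the largest timestamp seen before the next record arrives, which holds here because the source sleeps 3 s between records. It also assumes a record counts as too late once the window's last timestamp plus allowedLateness is at or behind the watermark.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the example above; not Flink code.
// Class name, method name and labels are hypothetical, for illustration only.
final class LatenessSimulation {

    /**
     * Classifies each record as ON_TIME (window not fired yet), LATE_REFIRE
     * (window already fired but still open, so it fires again) or SIDE_OUTPUT
     * (window closed, record goes to the late-data side output).
     */
    static List<String> classify(long[] timestamps, long windowSizeMs,
                                 long watermarkDelayMs, long allowedLatenessMs) {
        List<String> result = new ArrayList<>();
        long maxTs = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;   // no watermark before the first record
        for (long ts : timestamps) {
            // Tumbling window that contains ts (timestamps are non-negative here)
            long windowStart = ts - (ts % windowSizeMs);
            long windowMaxTs = windowStart + windowSizeMs - 1;

            if (windowMaxTs + allowedLatenessMs <= watermark) {
                result.add("SIDE_OUTPUT");
            } else if (windowMaxTs <= watermark) {
                result.add("LATE_REFIRE");
            } else {
                result.add("ON_TIME");
            }

            // Assumed watermark rule: largest timestamp seen - delay - 1 ms
            maxTs = Math.max(maxTs, ts);
            watermark = maxTs - watermarkDelayMs - 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps of records A..I from the example source
        long[] timestamps = {1000L, 6500L, 4000L, 7500L, 8500L, 4000L, 9200L, 1000L, 1500L};
        List<String> labels = classify(timestamps, 5_000L, 2_000L, 2_000L);
        for (int i = 0; i < timestamps.length; i++) {
            System.out.println((char) ('A' + i) + " @ " + timestamps[i] + " -> " + labels.get(i));
        }
    }
}
```

In this model record F (the second 4000L) is the only one that re-fires the 0-5 s window, and H and I are the two records that end up in the side output.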

Complete code:

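import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

import java.sql.Timestamp;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Event and Env are the author's own classes and are not shown in this post:
// Event exposes a public long timestamp field, Env.getEnv() returns the StreamExecutionEnvironment.
public class WindowOutputTest {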

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = Env.getEnv();

        DataStreamSource<Event> dataStreamSource = env.addSource(new SourceFunction<Event>() {
            @Override
            public void run(SourceContext<Event> context) throws Exception {
                context.collect(new Event("A", "/user", 1000L));
                Thread.sleep(3000);
                context.collect(new Event("B", "/prod", 6500L));
                Thread.sleep(3000);
                context.collect(new Event("C", "/cart", 4000L));
                Thread.sleep(3000);
                context.collect(new Event("D", "/user", 7500L));
                // 7500L moves the watermark past 5 s, so the 0-5 s window fires here
                System.out.println("window fires ~ ");
                Thread.sleep(3000);
                context.collect(new Event("E", "/cente", 8500L));
                Thread.sleep(3000);
                context.collect(new Event("F", "/cente", 4000L));
                Thread.sleep(3000);
                context.collect(new Event("G", "/cente", 9200L));
                Thread.sleep(3000);
                context.collect(new Event("H", "/cente", 1000L));
                Thread.sleep(3000);
                context.collect(new Event("I", "/cente", 1500L));
                Thread.sleep(3000);
            }

            @Override
            public void cancel() {

            }
        });

        //operator
        SingleOutputStreamOperator<Event> operator = dataStreamSource.assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(2)) // watermark delay of 2 s
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event event, long l) {
                                return event.timestamp;
                            }
                        })
        );

        OutputTag<Event> eventOutputTag = new OutputTag<Event>("late") {
        };

        WindowedStream<Event, Boolean, TimeWindow> windowedStream = operator.keyBy(d -> true)
                .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.SECONDS)))
                .allowedLateness(Time.of(2, TimeUnit.SECONDS))
                .sideOutputLateData(eventOutputTag);


        SingleOutputStreamOperator<String> windowAgg = windowedStream.aggregate(new AggregateFunction<Event, Long, Long>() {
            @Override
            public Long createAccumulator() {
                return 0L;
            }

            @Override
            public Long add(Event event, Long acc) {
                return acc + 1;
            }

            @Override
            public Long getResult(Long acc) {
                return acc;
            }

            @Override
            public Long merge(Long aLong, Long acc1) {
                // Combine two partial counts (only called for merging windows such as session windows)
                return aLong + acc1;
            }
        }, new ProcessWindowFunction<Long, String, Boolean, TimeWindow>() {
            @Override
            public void process(Boolean key, Context context, Iterable<Long> iterable, Collector<String> collector) throws Exception {
                long start = context.window().getStart();
                long end = context.window().getEnd();
                collector.collect(new Timestamp(start) + " ~ " + new Timestamp(end) + " ===> " + iterable.iterator().next());
            }
        });

        windowAgg.print("window result ");

        // Fetch the late records from the side output
        DataStream<Event> sideOutput = windowAgg.getSideOutput(eventOutputTag);
        sideOutput.print("side output:-> ");


        env.execute();

    }

}
