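Sliding window use case: for example, a travel itinerary card that reports the cities visited in the last 14 days. Each new result covers the last 13 days of the previous window plus 1 new day of data.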
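To make the itinerary scenario concrete, here is a minimal plain-Java sketch (no Flink involved) of a trailing 14-day window over visits. The class name `ItineraryWindow`, the `Visit` type, the day indexes and the city names are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

final class ItineraryWindow {
    /** One visit: a day index (0, 1, 2, ...) and the city visited on that day. */
    static final class Visit {
        final int day;
        final String city;

        Visit(int day, String city) {
            this.day = day;
            this.city = city;
        }
    }

    private final int windowDays;
    private final Deque<Visit> recent = new ArrayDeque<>();

    ItineraryWindow(int windowDays) {
        this.windowDays = windowDays;
    }

    /** Adds a visit (days must be non-decreasing) and returns the distinct cities in the trailing window. */
    Set<String> add(Visit visit) {
        recent.addLast(visit);
        // Keep only days in (visit.day - windowDays, visit.day]; evict everything older.
        while (recent.peekFirst().day <= visit.day - windowDays) {
            recent.removeFirst();
        }
        Set<String> cities = new LinkedHashSet<>();
        for (Visit v : recent) {
            cities.add(v.city);
        }
        return cities;
    }

    public static void main(String[] args) {
        ItineraryWindow window = new ItineraryWindow(14);
        System.out.println(window.add(new Visit(0, "Beijing")));   // [Beijing]
        System.out.println(window.add(new Visit(5, "Shanghai")));  // [Beijing, Shanghai]
        System.out.println(window.add(new Visit(14, "Hangzhou"))); // [Shanghai, Hangzhou]
    }
}
```

On day 14 the visit from day 0 falls out of the window, which is the "last 13 days plus 1 new day" overlap described above.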
Sliding windows based on processing time
(1) Data source
Emits one record per second
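import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class IntegerSource implements SourceFunction<Integer> {
    // Set to false by cancel() so that the loop in run() can exit
    private volatile boolean running = true;
    private int i = 0;

    @Override
    public void run(SourceContext<Integer> ctx) throws Exception {
        while (running) {
            ctx.collect(i++);
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}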
(2) Example
@Test
public void slidingProcessingTimeWindowsTest() throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
    DataStreamSource<Integer> source = env.addSource(new IntegerSource());
    // Processing-time sliding window: size 3 s, slide 1 s, so every second it aggregates the last 3 s of data
    source.windowAll(SlidingProcessingTimeWindows.of(Time.seconds(3), Time.seconds(1)))
            .process(new ProcessAllWindowFunction<Integer, Integer, TimeWindow>() {
                @Override
                public void process(Context context, Iterable<Integer> elements, Collector<Integer> out) throws Exception {
                    int sum = 0;
                    for (Integer next : elements) {
                        sum += next;
                        // Prints "element: <value>, processing time: <now>"
                        System.out.println("元素: " + next + " ,处理时间:" + new Date());
                    }
                    // Emit the sum of all elements in this window
                    out.collect(sum);
                }
            })
            .print();
    env.execute("SlidingProcessingTimeWindows");
}
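Result: each window contains the last 2 s of data from the previous window plus 1 s of new data. In the output, 元素 means "element" and 处理时间 means "processing time"; the `N>` prefix is the index of the parallel subtask that printed the sum.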
元素: 0 ,处理时间:Sat Apr 16 23:45:19 CST 2022
7> 0
元素: 0 ,处理时间:Sat Apr 16 23:45:20 CST 2022
元素: 1 ,处理时间:Sat Apr 16 23:45:20 CST 2022
8> 1
元素: 0 ,处理时间:Sat Apr 16 23:45:21 CST 2022
元素: 1 ,处理时间:Sat Apr 16 23:45:21 CST 2022
元素: 2 ,处理时间:Sat Apr 16 23:45:21 CST 2022
1> 3
元素: 1 ,处理时间:Sat Apr 16 23:45:22 CST 2022
元素: 2 ,处理时间:Sat Apr 16 23:45:22 CST 2022
元素: 3 ,处理时间:Sat Apr 16 23:45:22 CST 2022
2> 6
元素: 2 ,处理时间:Sat Apr 16 23:45:23 CST 2022
元素: 3 ,处理时间:Sat Apr 16 23:45:23 CST 2022
元素: 4 ,处理时间:Sat Apr 16 23:45:23 CST 2022
3> 9
元素: 3 ,处理时间:Sat Apr 16 23:45:24 CST 2022
元素: 4 ,处理时间:Sat Apr 16 23:45:24 CST 2022
元素: 5 ,处理时间:Sat Apr 16 23:45:24 CST 2022
4> 12
元素: 4 ,处理时间:Sat Apr 16 23:45:25 CST 2022
元素: 5 ,处理时间:Sat Apr 16 23:45:25 CST 2022
元素: 6 ,处理时间:Sat Apr 16 23:45:25 CST 2022
5> 15
Processing-time sliding window with an offset
Example
@Test
public void slidingProcessingTimeWindowsWithOffsetTest() throws Exception {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
    DataStreamSource<Integer> source = env.addSource(new IntegerSource());
    // Processing-time sliding window: size 4 s, slide 2 s, offset 1 s
    source.windowAll(SlidingProcessingTimeWindows.of(Time.seconds(4), Time.seconds(2), Time.seconds(1)))
            .process(new ProcessAllWindowFunction<Integer, Integer, TimeWindow>() {
                @Override
                public void process(Context context, Iterable<Integer> elements, Collector<Integer> out) throws Exception {
                    int sum = 0;
                    for (Integer next : elements) {
                        sum += next;
                        // Prints "element: <value>, processing time: <now>"
                        System.out.println("元素: " + next + " ,处理时间:" + format.format(new Date()));
                    }
                    // Emit the sum of all elements in this window
                    out.collect(sum);
                }
            })
            .print();
    env.execute("slidingProcessingTimeWindowsWithOffset");
}
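Result: each window covers the last 2 s of the previous window plus 2 s of new data, and all window boundaries are shifted by 1 s, so the windows fire on odd seconds (:37, :39, :41, ...) instead of even ones.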
元素: 0 ,处理时间:2022-04-16 23:58:37
元素: 1 ,处理时间:2022-04-16 23:58:37
2> 1
元素: 0 ,处理时间:2022-04-16 23:58:39
元素: 1 ,处理时间:2022-04-16 23:58:39
元素: 2 ,处理时间:2022-04-16 23:58:39
元素: 3 ,处理时间:2022-04-16 23:58:39
3> 6
元素: 2 ,处理时间:2022-04-16 23:58:41
元素: 3 ,处理时间:2022-04-16 23:58:41
元素: 4 ,处理时间:2022-04-16 23:58:41
元素: 5 ,处理时间:2022-04-16 23:58:41
4> 14
元素: 4 ,处理时间:2022-04-16 23:58:43
元素: 5 ,处理时间:2022-04-16 23:58:43
元素: 6 ,处理时间:2022-04-16 23:58:43
元素: 7 ,处理时间:2022-04-16 23:58:43
5> 22
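The window boundaries seen above can be reproduced with a little arithmetic. The following is a minimal sketch of how a timestamp maps to sliding windows when an offset is applied. It is an illustration of the idea rather than Flink's actual code, and the class and method names (`SlidingWindowAssigner`, `lastWindowStart`, `assign`) are hypothetical. Timestamps are given in whole seconds for readability.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowAssigner {
    /** Start of the newest window containing ts; window starts are aligned to offset + k * slide. */
    static long lastWindowStart(long ts, long slide, long offset) {
        return ts - Math.floorMod(ts - offset, slide);
    }

    /** All [start, end) windows of the given size that contain ts, newest first. */
    static List<long[]> assign(long ts, long size, long slide, long offset) {
        List<long[]> windows = new ArrayList<>();
        // Walk backwards one slide at a time while the window still covers ts.
        for (long start = lastWindowStart(ts, slide, offset); start > ts - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // size = 4 s, slide = 2 s, offset = 1 s, element arriving at second 36
        for (long[] w : assign(36, 4, 2, 1)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")"); // [35, 39) then [33, 37)
        }
    }
}
```

With the 1 s offset, an element arriving at second 36 falls into the windows ending at seconds 37 and 39, which matches the odd-second firing times in the output. With an offset of 0 the same element would land in [36, 40) and [34, 38).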
Sliding windows based on element count
Example
@Test
public void slidingCountWindowsTest() throws Exception {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
    DataStreamSource<Integer> source = env.addSource(new IntegerSource());
    // Each window holds up to 5 elements and slides by 2 elements
    source.countWindowAll(5, 2)
            .process(new ProcessAllWindowFunction<Integer, Integer, GlobalWindow>() {
                @Override
                public void process(Context context, Iterable<Integer> elements, Collector<Integer> out) throws Exception {
                    int sum = 0;
                    for (Integer next : elements) {
                        sum += next;
                        // Prints "element: <value>, processing time: <now>"
                        System.out.println("元素: " + next + " ,处理时间:" + format.format(new Date()));
                    }
                    // Emit the sum of all elements in this window
                    out.collect(sum);
                }
            })
            .print();
    env.execute("slidingCountWindows");
}
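Result: a window fires after every 2 new elements and contains at most the latest 5 elements, so the first two windows are only partially filled.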
元素: 0 ,处理时间:2022-04-17 00:08:12
元素: 1 ,处理时间:2022-04-17 00:08:12
4> 1
元素: 0 ,处理时间:2022-04-17 00:08:14
元素: 1 ,处理时间:2022-04-17 00:08:14
元素: 2 ,处理时间:2022-04-17 00:08:14
元素: 3 ,处理时间:2022-04-17 00:08:14
5> 6
元素: 1 ,处理时间:2022-04-17 00:08:16
元素: 2 ,处理时间:2022-04-17 00:08:16
元素: 3 ,处理时间:2022-04-17 00:08:16
元素: 4 ,处理时间:2022-04-17 00:08:16
元素: 5 ,处理时间:2022-04-17 00:08:16
6> 15
元素: 3 ,处理时间:2022-04-17 00:08:18
元素: 4 ,处理时间:2022-04-17 00:08:18
元素: 5 ,处理时间:2022-04-17 00:08:18
元素: 6 ,处理时间:2022-04-17 00:08:18
元素: 7 ,处理时间:2022-04-17 00:08:18
7> 25
元素: 5 ,处理时间:2022-04-17 00:08:20
元素: 6 ,处理时间:2022-04-17 00:08:20
元素: 7 ,处理时间:2022-04-17 00:08:20
元素: 8 ,处理时间:2022-04-17 00:08:20
元素: 9 ,处理时间:2022-04-17 00:08:20
8> 35
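The firing pattern above (fire every 2 elements, keep the latest 5) can be simulated without Flink. This is a minimal sketch under that assumption; `CountSlidingWindowSketch` and `slidingSums` are hypothetical names, and the code only imitates the observed behaviour rather than Flink's internal trigger and evictor classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CountSlidingWindowSketch {
    /** Emits the sum of the latest `size` elements after every `slide` new elements. */
    static List<Integer> slidingSums(int[] input, int size, int slide) {
        Deque<Integer> buffer = new ArrayDeque<>();
        List<Integer> sums = new ArrayList<>();
        int sinceLastFire = 0;
        for (int value : input) {
            buffer.addLast(value);
            // Evict the oldest element once the buffer exceeds the window size.
            if (buffer.size() > size) {
                buffer.removeFirst();
            }
            // Fire after every `slide` new elements.
            if (++sinceLastFire == slide) {
                sinceLastFire = 0;
                int sum = 0;
                for (int v : buffer) {
                    sum += v;
                }
                sums.add(sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] input = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(slidingSums(input, 5, 2)); // [1, 6, 15, 25, 35]
    }
}
```

The printed sums are the same as the values emitted by the five windows in the output above.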
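Sliding windows based on event time (EventTime)
(1) Data source
import java.util.Date;
import java.util.Random;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class EventElementSource implements SourceFunction<EventElement> {
    // Set to false by cancel() so that the loop in run() can exit
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<EventElement> ctx) throws Exception {
        int id = 0;
        Random random = new Random();
        while (running) {
            // Event time = now plus a random 0-5 s, so the timestamps arrive out of order
            long time = new Date().getTime() + random.nextInt(5000);
            Date date = new Date(time);
            EventElement eventElement = new EventElement(id++, time, date);
            // Prints "generated at: <element>"
            System.out.println("生成时间:" + eventElement);
            ctx.collect(eventElement);
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}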
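The `EventElement` class itself is not shown in the article. Here is a minimal sketch of what it might look like, inferred only from the constructor call, the `getId()` and `getTime()` calls, and the `EventElement(id=..., time=..., date=...)` format in the output. The field types, the `getDate()` getter and the hand-written `toString()` are assumptions; the original may well be generated by an annotation library instead.

```java
import java.util.Date;

// Hypothetical reconstruction; declare it public in its own file if other packages need it.
final class EventElement {
    private final int id;     // sequence number assigned by the source
    private final long time;  // event time in epoch milliseconds
    private final Date date;  // the same instant as a Date, for readable output

    EventElement(int id, long time, Date date) {
        this.id = id;
        this.time = time;
        this.date = date;
    }

    public int getId() {
        return id;
    }

    public long getTime() {
        return time;
    }

    public Date getDate() {
        return date;
    }

    @Override
    public String toString() {
        // Same layout as the lines printed in the results
        return "EventElement(id=" + id + ", time=" + time + ", date=" + date + ")";
    }
}
```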
(2) Example
@Test
public void slidingEventTimeWindowsTest() throws Exception {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
            .setParallelism(1);
    // Add the data source
    env.addSource(new EventElementSource())
            // Assign timestamps and watermarks from the time carried in each element.
            // forMonotonousTimestamps() assumes ascending timestamps, so an element that
            // arrives behind the watermark is dropped from any window that has already fired.
            .assignTimestampsAndWatermarks(WatermarkStrategy.<EventElement>forMonotonousTimestamps()
                    .withTimestampAssigner((element, recordTimestamp) -> element.getTime()))
            // Event-time sliding window: size 3 s, slide 2 s
            .windowAll(SlidingEventTimeWindows.of(Time.seconds(3), Time.seconds(2)))
            .process(new ProcessAllWindowFunction<EventElement, Integer, TimeWindow>() {
                @Override
                public void process(Context context, Iterable<EventElement> elements, Collector<Integer> out) throws Exception {
                    int id = 0;
                    for (EventElement next : elements) {
                        id = next.getId();
                        // Prints "element: <element>, processing time: <now>"
                        System.out.println("元素: " + next + " ,处理时间:" + format.format(new Date()));
                    }
                    // Emit the id of the last element iterated in this window
                    out.collect(id);
                }
            })
            .print();
    env.execute("slidingEventTimeWindows");
}
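Result: the value emitted for each window is the id of the last element iterated. The watermark trails the highest timestamp seen so far, so elements 5 and 10 never appear in any window: by the time they arrived, every window they belonged to had already fired, and they were dropped as late data. In the output, 生成时间 means "generated at".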
生成时间:EventElement(id=0, time=1650125744834, date=Sun Apr 17 00:15:44 CST 2022)
生成时间:EventElement(id=1, time=1650125742511, date=Sun Apr 17 00:15:42 CST 2022)
生成时间:EventElement(id=2, time=1650125745653, date=Sun Apr 17 00:15:45 CST 2022)
元素: EventElement(id=0, time=1650125744834, date=Sun Apr 17 00:15:44 CST 2022) ,处理时间:2022-04-17 00:15:42
元素: EventElement(id=1, time=1650125742511, date=Sun Apr 17 00:15:42 CST 2022) ,处理时间:2022-04-17 00:15:42
1
生成时间:EventElement(id=3, time=1650125744252, date=Sun Apr 17 00:15:44 CST 2022)
生成时间:EventElement(id=4, time=1650125748848, date=Sun Apr 17 00:15:48 CST 2022)
元素: EventElement(id=0, time=1650125744834, date=Sun Apr 17 00:15:44 CST 2022) ,处理时间:2022-04-17 00:15:44
元素: EventElement(id=2, time=1650125745653, date=Sun Apr 17 00:15:45 CST 2022) ,处理时间:2022-04-17 00:15:44
元素: EventElement(id=3, time=1650125744252, date=Sun Apr 17 00:15:44 CST 2022) ,处理时间:2022-04-17 00:15:44
3
生成时间:EventElement(id=5, time=1650125745767, date=Sun Apr 17 00:15:45 CST 2022)
生成时间:EventElement(id=6, time=1650125749420, date=Sun Apr 17 00:15:49 CST 2022)
元素: EventElement(id=4, time=1650125748848, date=Sun Apr 17 00:15:48 CST 2022) ,处理时间:2022-04-17 00:15:46
4
生成时间:EventElement(id=7, time=1650125751065, date=Sun Apr 17 00:15:51 CST 2022)
元素: EventElement(id=4, time=1650125748848, date=Sun Apr 17 00:15:48 CST 2022) ,处理时间:2022-04-17 00:15:47
元素: EventElement(id=6, time=1650125749420, date=Sun Apr 17 00:15:49 CST 2022) ,处理时间:2022-04-17 00:15:47
6
生成时间:EventElement(id=8, time=1650125753373, date=Sun Apr 17 00:15:53 CST 2022)
元素: EventElement(id=7, time=1650125751065, date=Sun Apr 17 00:15:51 CST 2022) ,处理时间:2022-04-17 00:15:48
7
生成时间:EventElement(id=9, time=1650125752809, date=Sun Apr 17 00:15:52 CST 2022)
生成时间:EventElement(id=10, time=1650125751337, date=Sun Apr 17 00:15:51 CST 2022)
生成时间:EventElement(id=11, time=1650125756070, date=Sun Apr 17 00:15:56 CST 2022)
元素: EventElement(id=8, time=1650125753373, date=Sun Apr 17 00:15:53 CST 2022) ,处理时间:2022-04-17 00:15:51
元素: EventElement(id=9, time=1650125752809, date=Sun Apr 17 00:15:52 CST 2022) ,处理时间:2022-04-17 00:15:51
9
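Elements 5 and 10 are generated but never show up in any window. The following plain-Java sketch replays the 12 timestamps from the output to show why. It rests on two assumptions: the watermark is the highest timestamp seen so far minus 1 ms, and it advances immediately after each element (Flink emits watermarks periodically, but with elements 1 s apart the effect here is the same). `LateElementSketch` and `droppedIds` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class LateElementSketch {
    /** Returns the ids (array indexes) of elements for which every containing window is already closed. */
    static List<Integer> droppedIds(long[] timestamps, long size, long slide) {
        List<Integer> dropped = new ArrayList<>();
        long watermark = Long.MIN_VALUE;
        for (int id = 0; id < timestamps.length; id++) {
            long ts = timestamps[id];
            boolean assigned = false;
            // Enumerate every [start, start + size) window that contains ts.
            for (long start = ts - Math.floorMod(ts, slide); start > ts - size; start -= slide) {
                // A window is still open while its last timestamp (end - 1) is ahead of the watermark.
                if (start + size - 1 > watermark) {
                    assigned = true;
                }
            }
            if (!assigned) {
                dropped.add(id);
            }
            // Assumed monotonous-timestamp rule: the watermark trails the max timestamp by 1 ms.
            watermark = Math.max(watermark, ts - 1);
        }
        return dropped;
    }

    public static void main(String[] args) {
        long[] timestamps = {
            1650125744834L, 1650125742511L, 1650125745653L, 1650125744252L,
            1650125748848L, 1650125745767L, 1650125749420L, 1650125751065L,
            1650125753373L, 1650125752809L, 1650125751337L, 1650125756070L
        };
        // Window size 3 s, slide 2 s, both in milliseconds
        System.out.println(droppedIds(timestamps, 3000, 2000)); // [5, 10]
    }
}
```

Element 5 (timestamp :45.767) belongs only to the window [:44, :47), which had already fired once element 4 pushed the watermark to :48.847. Element 10 is lost the same way.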