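These notes were compiled while studying "Flink入门与实战" and the NetEase Cloud Classroom (网易云课堂) course "Flink大数据项目实战".
1. A stream has three notions of time: Event Time, Ingestion Time, and Processing Time.
2. How the three notions of time relate: Event Time is when the record was produced at its origin, Ingestion Time is when it enters Flink at the source, and Processing Time is the machine time at which an operator handles it.
3. Setting the time characteristic: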
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
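Data often arrives out of order when Flink processes a stream. A window cannot wait indefinitely for stragglers, so a mechanism is needed to guarantee that the window computation is triggered once a certain point has passed. That mechanism is the watermark.
Watermarks and timestamps only need to be specified when using Event Time. Both are measured in milliseconds.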
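To make the mechanism concrete, here is a minimal plain-Java sketch (no Flink dependency) of how a watermark decides when an event-time window may fire. The class name `WatermarkSketch`, the 3,500 ms bound, and the sample timestamps are illustrative assumptions. The firing rule (the watermark has reached the window's last millisecond, `end - 1`) follows how Flink's event-time trigger behaves, but this is not Flink's own code.

```java
// Plain-Java sketch, not Flink code: names and numbers are illustrative assumptions.
final class WatermarkSketch {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Record the event timestamp (milliseconds) of an arriving element. */
    void onEvent(long eventTimestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
    }

    /** Watermark = largest timestamp seen so far minus the tolerated disorder. */
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }

    /** A window [start, end) may fire once the watermark reaches end - 1. */
    boolean canFire(long windowEndMs) {
        return currentWatermark() >= windowEndMs - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(3_500);
        wm.onEvent(8_000);   // watermark becomes 4_500
        wm.onEvent(6_000);   // out-of-order element: watermark stays at 4_500
        System.out.println(wm.canFire(10_000)); // false
        wm.onEvent(13_499);  // watermark becomes 9_999
        System.out.println(wm.canFire(10_000)); // true
    }
}
```

The out-of-order element at 6,000 ms does not pull the watermark back, which is what lets the window close at a predictable point.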
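Watermarks in an in-order stream:
Watermarks in an out-of-order stream:
Watermarks in a stream with parallelism greater than one:
When an operator has multiple inputs, its watermark is the minimum of the watermarks across all of those inputs.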
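The minimum rule can be sketched in plain Java as follows. This is an illustration under assumptions, not Flink's implementation: `MultiInputWatermark` and `onWatermark` are hypothetical names, and inputs that have not yet sent a watermark are modelled as `Long.MIN_VALUE`.

```java
import java.util.Arrays;

// Plain-Java sketch, not Flink code: class and method names are hypothetical.
final class MultiInputWatermark {
    private final long[] inputWatermarks;
    private long operatorWatermark = Long.MIN_VALUE;

    MultiInputWatermark(int numberOfInputs) {
        inputWatermarks = new long[numberOfInputs];
        Arrays.fill(inputWatermarks, Long.MIN_VALUE); // no watermark received yet
    }

    /** Record a watermark from one input and return the operator's watermark. */
    long onWatermark(int inputIndex, long watermarkMs) {
        // A single input's watermark never moves backwards.
        inputWatermarks[inputIndex] = Math.max(inputWatermarks[inputIndex], watermarkMs);
        long min = Arrays.stream(inputWatermarks).min().getAsLong();
        // The operator follows the slowest input and never moves backwards either.
        operatorWatermark = Math.max(operatorWatermark, min);
        return operatorWatermark;
    }

    public static void main(String[] args) {
        MultiInputWatermark op = new MultiInputWatermark(2);
        System.out.println(op.onWatermark(0, 5_000)); // still Long.MIN_VALUE: input 1 is silent
        System.out.println(op.onWatermark(1, 3_000)); // 3000
        System.out.println(op.onWatermark(1, 9_000)); // 5000, now limited by input 0
    }
}
```

One consequence worth noting: a single idle or slow input holds back the watermark of the whole operator.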
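Watermarks can be generated in one of two places:
a. Immediately after the data is received from the source
b. After an operation such as map/filter (timestamp assigner / watermark generator)
Sample code: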
package com.zzh.testWindow;

import com.zzh.testJoin.Transcript;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.sql.Timestamp;

public class testWindow {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        // Use event time as the time characteristic
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        DataStream<Transcript> dataStream = env.fromElements(getTranscriptDataSource());
        // Assign timestamps and watermarks after an operator (here, the filter)
        DataStream<Transcript> dataStreamWithTimeStamp = dataStream.filter(new FilterFunction<Transcript>() {
            @Override
            public boolean filter(Transcript transcript) throws Exception {
                // Keep only records with a score above 60
                return transcript.getScore() > 60;
            }
        }).assignTimestampsAndWatermarks(new MyWaterMark(3500));
        // 10-second tumbling event-time window over the whole (non-keyed) stream
        dataStreamWithTimeStamp.timeWindowAll(Time.seconds(10)).reduce(new ReduceFunction<Transcript>() {
            @Override
            public Transcript reduce(Transcript lastData, Transcript newData) throws Exception {
                System.out.println(lastData);
                System.out.println(newData);
                System.out.println("=====================");
                // Pairwise running average of the two scores, not the true mean of the window
                lastData.setScore((lastData.getScore() + newData.getScore()) / 2);
                return lastData;
            }
        }).print();
        env.execute("finish");
    }

    // Sample data: the last two records carry earlier timestamps than the ones before them
    private static Transcript[] getTranscriptDataSource() {
        return new Transcript[]{
                new Transcript("1", "张三", "语文", 100, Timestamp.valueOf("2020-07-01 11:01:01").getTime()),
                new Transcript("2", "李四", "语文", 78, Timestamp.valueOf("2020-07-01 11:03:01").getTime()),
                new Transcript("3", "王五", "语文", 99, Timestamp.valueOf("2020-07-01 11:03:04").getTime()),
                new Transcript("4", "赵六", "语文", 81, Timestamp.valueOf("2020-07-01 11:03:09").getTime()),
                new Transcript("5", "钱七", "语文", 59, Timestamp.valueOf("2020-07-01 11:01:10").getTime()),
                new Transcript("6", "马二", "语文", 97, Timestamp.valueOf("2020-07-01 11:01:12").getTime()),
        };
    }
}
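The job above groups elements into 10-second tumbling event-time windows. As a rough illustration of how a timestamp maps to its window, the sketch below computes the window boundaries for non-negative timestamps with no window offset. `TumblingWindowSketch` and its methods are hypothetical helpers, and the timestamps are made-up millisecond values rather than the sample data.

```java
// Plain-Java sketch, not Flink code: helper names are hypothetical.
final class TumblingWindowSketch {
    /** Start of the window [start, start + sizeMs) containing timestampMs (requires timestampMs >= 0). */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Exclusive end of the window containing timestampMs. */
    static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }

    public static void main(String[] args) {
        long size = 10_000; // 10-second windows, in milliseconds
        for (long ts : new long[]{9_999, 10_000, 12_345}) {
            System.out.println(ts + " -> [" + windowStart(ts, size) + ", " + windowEnd(ts, size) + ")");
        }
    }
}
```

Because the end is exclusive, a timestamp of exactly 10,000 ms belongs to the window starting at 10,000 ms, not to the one before it.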
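a. with periodic watermarks
Overview:
getCurrentWatermark() is called periodically. If the returned watermark is non-null and greater than the previous watermark, it is emitted downstream.
Characteristics:
The call interval is configured with ExecutionConfig.setAutoWatermarkInterval(), in milliseconds.
Sample code: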
package com.zzh.testWindow;

import com.zzh.testJoin.Transcript;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

public class MyWaterMark implements AssignerWithPeriodicWatermarks<Transcript> {
    // Largest event timestamp seen so far, in milliseconds
    private long currentMaxTimeStamp;
    // Maximum tolerated out-of-orderness, in milliseconds
    private long timeBounded;

    public MyWaterMark(long timeBounded) {
        this.timeBounded = timeBounded;
    }

    @Nullable
    @Override
    public Watermark getCurrentWatermark() {
        // A watermark is only emitted downstream when it exceeds the previous one,
        // so derive it from the largest timestamp seen so far minus the bound
        return new Watermark(this.currentMaxTimeStamp - this.timeBounded);
    }

    @Override
    public long extractTimestamp(Transcript transcript, long l) {
        // Track the largest timestamp seen so far
        long currentTimeStamp = transcript.getTime();
        this.currentMaxTimeStamp = Math.max(currentTimeStamp, this.currentMaxTimeStamp);
        return currentTimeStamp;
    }
}
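The interplay between the periodic call and the "only emit if greater" rule can be simulated without Flink. In the sketch below, `PeriodicEmitterSketch` and `onPeriodicTick()` are hypothetical names; the tick stands in for the timer that fires once per auto-watermark interval, and the 3,500 ms bound and timestamps are made-up values.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch, not Flink code: names and values are illustrative assumptions.
final class PeriodicEmitterSketch {
    private final long boundMs;
    private long currentMaxTimestamp = Long.MIN_VALUE;
    private long lastEmitted = Long.MIN_VALUE;
    private final List<Long> emitted = new ArrayList<>();

    PeriodicEmitterSketch(long boundMs) {
        this.boundMs = boundMs;
    }

    /** Called for every element: remember the largest timestamp seen. */
    void onElement(long timestampMs) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, timestampMs);
    }

    /** Called once per interval: emit only if the watermark has advanced. */
    void onPeriodicTick() {
        if (currentMaxTimestamp == Long.MIN_VALUE) {
            return; // no elements yet, nothing to emit
        }
        long candidate = currentMaxTimestamp - boundMs;
        if (candidate > lastEmitted) {
            lastEmitted = candidate;
            emitted.add(candidate);
        }
    }

    List<Long> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        PeriodicEmitterSketch sketch = new PeriodicEmitterSketch(3_500);
        sketch.onElement(10_000);
        sketch.onPeriodicTick();  // emits 6500
        sketch.onPeriodicTick();  // no new data: nothing emitted
        sketch.onElement(9_000);  // out of order: max timestamp unchanged
        sketch.onPeriodicTick();  // nothing emitted
        sketch.onElement(12_000);
        sketch.onPeriodicTick();  // emits 8500
        System.out.println(sketch.emitted()); // [6500, 8500]
    }
}
```

Several elements may arrive between two ticks, so the number of watermarks depends on the interval rather than on the number of elements.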
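b. with punctuated watermarks
Characteristics:
checkAndGetNextWatermark() is called for every element, right after extractTimestamp(), so a watermark can be triggered by the data itself instead of by a timer.
Sample code: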
package com.zzh.testWindow;

import com.zzh.testJoin.Transcript;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

import javax.annotation.Nullable;

public class PunctuatedWaterMark implements AssignerWithPunctuatedWatermarks<Transcript> {
    @Nullable
    @Override
    public Watermark checkAndGetNextWatermark(Transcript transcript, long l) {
        // l is the timestamp just returned by extractTimestamp for this element;
        // returning null means no watermark is emitted for it
        return transcript.getTime() > 0 ? new Watermark(l) : null;
    }

    @Override
    public long extractTimestamp(Transcript transcript, long l) {
        return transcript.getTime();
    }
}
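For contrast with the periodic variant, the following plain-Java sketch emits a watermark only when an element itself signals one. Everything here is hypothetical: the `Event` type, its `isMarker` flag, and the sample values are assumptions made for illustration, and the static method only borrows the name `checkAndGetNextWatermark` for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch, not Flink code: the Event type and its marker flag are hypothetical.
final class PunctuatedSketch {
    static final class Event {
        final long timestampMs;
        final boolean isMarker; // true if this element should trigger a watermark

        Event(long timestampMs, boolean isMarker) {
            this.timestampMs = timestampMs;
            this.isMarker = isMarker;
        }
    }

    /** Returns the watermark to emit for this element, or null to emit none. */
    static Long checkAndGetNextWatermark(Event event, long lastEmitted) {
        if (event.isMarker && event.timestampMs > lastEmitted) {
            return event.timestampMs;
        }
        return null;
    }

    /** Feeds the events through in order and collects the emitted watermarks. */
    static List<Long> run(List<Event> events) {
        List<Long> emitted = new ArrayList<>();
        long lastEmitted = Long.MIN_VALUE;
        for (Event event : events) {
            Long watermark = checkAndGetNextWatermark(event, lastEmitted);
            if (watermark != null) {
                lastEmitted = watermark;
                emitted.add(watermark);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(1_000, false),
                new Event(2_000, true),   // emits 2000
                new Event(1_500, true),   // marker, but would move backwards: skipped
                new Event(3_000, false),
                new Event(4_000, true));  // emits 4000
        System.out.println(run(events)); // [2000, 4000]
    }
}
```

Emitting a watermark for every single element, as the sample class above effectively does, is valid but adds overhead downstream, which is why a marker condition is often preferable.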
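Flink also ships ready-made assigners. When timestamps arrive in ascending order:
.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Transcript>() {
    @Override
    public long extractAscendingTimestamp(Transcript element) {
        return element.getTime();
    }
});
When elements may arrive out of order by a bounded amount (here, up to 10 seconds):
.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Transcript>(Time.seconds(10)) {
    @Override
    public long extractTimestamp(Transcript element) {
        return element.getTime();
    }
});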
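Sample code for handling late data with allowedLateness and a side output:
// The trailing {} creates an anonymous subclass so that Flink can capture the element type
OutputTag<Transcript> lateOutputTag = new OutputTag<Transcript>("late-date") {};
dataStreamWithTimeStamp.timeWindowAll(Time.seconds(10)).
        allowedLateness(Time.seconds(10)).
        sideOutputLateData(lateOutputTag).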
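How an element is treated depends on where the watermark stands relative to its window when the element arrives. The plain-Java sketch below classifies the three cases. `LatenessSketch`, `Outcome`, and `classify` are hypothetical names; the comparison against the window's last millisecond plus the allowed lateness reflects my understanding of Flink's window operator and should be read as an approximation, not as its source code.

```java
// Plain-Java sketch, not Flink code: names are hypothetical.
final class LatenessSketch {
    enum Outcome { ON_TIME, LATE_BUT_ACCEPTED, SIDE_OUTPUT }

    /**
     * Classify an element that belongs to the window ending (exclusively) at windowEndMs,
     * given the watermark at the moment the element arrives.
     */
    static Outcome classify(long windowEndMs, long allowedLatenessMs, long watermarkMs) {
        long maxTimestamp = windowEndMs - 1; // last millisecond inside the window
        if (watermarkMs < maxTimestamp) {
            return Outcome.ON_TIME;            // window has not fired yet
        }
        if (watermarkMs < maxTimestamp + allowedLatenessMs) {
            return Outcome.LATE_BUT_ACCEPTED;  // window fired, state is still kept
        }
        return Outcome.SIDE_OUTPUT;            // window state is gone: route to the late tag
    }

    public static void main(String[] args) {
        long windowEnd = 10_000;       // window [0, 10000)
        long allowedLateness = 10_000; // matches allowedLateness(Time.seconds(10))
        System.out.println(classify(windowEnd, allowedLateness, 5_000));  // ON_TIME
        System.out.println(classify(windowEnd, allowedLateness, 12_000)); // LATE_BUT_ACCEPTED
        System.out.println(classify(windowEnd, allowedLateness, 19_999)); // SIDE_OUTPUT
    }
}
```

A late-but-accepted element causes the window to fire again with the updated result, so downstream consumers should expect more than one output per window.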