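Requirement: sliding-window statistics on a keyed (keyBy) stream. Every incoming record should trigger the window computation, and if no record arrives the window should still fire once every 60 s.
None of the built-in triggers covers both conditions, so the only option is a custom Trigger.
Straight to the code: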
package com.tc.flink.demo.stream;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.tc.flink.conf.KafkaConfig;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

import java.util.Properties;

public class WindowCountOrTimeTrigger {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

        // Kafka consumer settings. String values are used throughout because
        // Properties.getProperty() returns null for non-String entries.
        Properties propsConsumer = new Properties();
        propsConsumer.setProperty("bootstrap.servers", KafkaConfig.KAFKA_BROKER_LIST);
        propsConsumer.setProperty("group.id", "trafficwisdom-streaming");
        propsConsumer.setProperty("enable.auto.commit", "false");
        propsConsumer.setProperty("max.poll.records", "1000");

        FlinkKafkaConsumer011<String> consumer =
                new FlinkKafkaConsumer011<>("topic-test", new SimpleStringSchema(), propsConsumer);
        consumer.setStartFromLatest();

        DataStream<String> stream = env.addSource(consumer);
        stream.print();

        // Parse each JSON message into (itemId, 1); unparsable messages are dropped by the filter.
        DataStream<Tuple2<String, Integer>> exposure = stream.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                try {
                    JSONObject jsonObject = JSON.parseObject(value);
                    String itemId = jsonObject.getString("itemId");
                    return new Tuple2<>(itemId, 1);
                } catch (Exception e) {
                    return Tuple2.of(null, null);
                }
            }
        }).filter(tuple2 -> tuple2.f0 != null);

        // timeWindow(size) creates a tumbling 5-minute window; pass a second
        // argument (the slide) to get a sliding window instead.
        // The trigger fires on every element (count 1) or once per minute.
        DataStream<Tuple2<String, Integer>> result = exposure
                .keyBy(0)
                .timeWindow(Time.minutes(5))
                .trigger(TimeCountTrigger.of(1, Time.minutes(1)))
                .sum(1);
        result.print();

        env.execute();
    }
}
The custom Trigger:
package com.tc.flink.demo.stream;
import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;
/**
 * A {@link Trigger} that fires once the count of elements in a pane reaches the given count,
 * or when the given processing-time interval has elapsed.
 *
 * @param <W> The type of {@link Window Windows} on which this trigger can operate.
 */
@PublicEvolving
public class TimeCountTrigger<W extends Window> extends Trigger<Object, W>
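The body of the trigger class is cut off above. To make the intended behaviour concrete, here is a minimal plain-Java model of the "count OR interval" decision for a single key and window. It does not use the Flink API and is not the original TimeCountTrigger: the class name CountOrIntervalModel, its methods and the simple `now + interval` timer are all assumptions made for illustration.

```java
/**
 * Plain-Java model (no Flink dependency) of a "count OR interval" trigger
 * for one key and one window. All names here are hypothetical.
 */
final class CountOrIntervalModel {

    enum Decision { CONTINUE, FIRE }

    private final long maxCount;
    private final long intervalMs;
    private long count = 0;       // elements seen since the last count-based fire
    private long nextTimer = -1;  // pending processing-time timer, -1 if none

    CountOrIntervalModel(long maxCount, long intervalMs) {
        this.maxCount = maxCount;
        this.intervalMs = intervalMs;
    }

    /** Called for every element; now is the current processing time in ms. */
    Decision onElement(long now) {
        if (nextTimer < 0) {
            // First element: schedule the periodic fire (simplified, unaligned).
            nextTimer = now + intervalMs;
        }
        count++;
        if (count >= maxCount) {
            count = 0;            // start counting again after firing
            return Decision.FIRE;
        }
        return Decision.CONTINUE;
    }

    /** Called when the clock reaches time; fires only if the timer is due. */
    Decision onProcessingTime(long time) {
        if (nextTimer < 0 || time < nextTimer) {
            return Decision.CONTINUE;
        }
        nextTimer += intervalMs;  // re-arm for the next interval
        return Decision.FIRE;
    }

    long pendingTimer() {
        return nextTimer;
    }

    public static void main(String[] args) {
        CountOrIntervalModel m = new CountOrIntervalModel(1, 60_000);
        System.out.println(m.onElement(5_000));         // FIRE: count reached 1
        System.out.println(m.onProcessingTime(65_000)); // FIRE: interval elapsed
        System.out.println(m.onProcessingTime(90_000)); // CONTINUE: next timer is 125000
    }
}
```

With a max count of 1, every element fires immediately, and the timer keeps firing once per interval even when nothing arrives, which matches the alternating onMaxCount / onProcessingTime pattern one would expect in the job's log output.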
Simulated messages sent to the topic:
{"action":"exposure","itemId":"1cdlTUJUCidYgcUQALhCpg==","time":"2019-11-06 16:01:00"}
{"action":"exposure","itemId":"LXFMRmnKqM7JiV75KQt+GQ==","time":"2019-11-06 16:02:00"}
{"action":"exposure","itemId":"LXFMRmnKqM7JiV75KQt+GQ==","time":"2019-11-06 16:03:00"}
The output looks like this:
3> {"action":"exposure","itemId":"1cdlTUJUCidYgcUQALhCpg==","time":"2019-11-06 16:01:00"}
onMaxCount ....
1> (1cdlTUJUCidYgcUQALhCpg==,1)
onProcessingTime ....
1> (1cdlTUJUCidYgcUQALhCpg==,1)
3> {"action":"exposure","itemId":"LXFMRmnKqM7JiV75KQt+GQ==","time":"2019-11-06 16:02:00"}
onMaxCount ....
4> (LXFMRmnKqM7JiV75KQt+GQ==,1)
onProcessingTime ....
4> (LXFMRmnKqM7JiV75KQt+GQ==,1)
onProcessingTime ....
1> (1cdlTUJUCidYgcUQALhCpg==,1)
3> {"action":"exposure","itemId":"LXFMRmnKqM7JiV75KQt+GQ==","time":"2019-11-06 16:03:00"}
onMaxCount ....
4> (LXFMRmnKqM7JiV75KQt+GQ==,2)
onProcessingTime ....
onProcessingTime ....
1> (1cdlTUJUCidYgcUQALhCpg==,1)
4> (LXFMRmnKqM7JiV75KQt+GQ==,2)
onProcessingTime ....
onProcessingTime ....
clear ....
clear ....
Note: TimeCountTrigger is essentially a merge of org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger and org.apache.flink.streaming.api.windowing.triggers.CountTrigger.
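One detail worth knowing when borrowing from ContinuousProcessingTimeTrigger: to my understanding it does not schedule the first fire at "now + interval" but rounds the current processing time down to an interval boundary and fires at the next boundary. The sketch below reproduces only that arithmetic in plain Java; the class and method names are hypothetical and nothing here calls Flink.

```java
/**
 * Computes the next interval-aligned fire time, the way a continuous
 * processing-time trigger is assumed to align its timers.
 * Class and method names are hypothetical.
 */
final class FireTimeAlignment {

    /** Rounds now down to a multiple of intervalMs and returns the next boundary. */
    static long nextFireTimestamp(long now, long intervalMs) {
        long start = now - (now % intervalMs);
        return start + intervalMs;
    }

    public static void main(String[] args) {
        long interval = 60_000L;
        System.out.println(nextFireTimestamp(125_000L, interval)); // 180000
        System.out.println(nextFireTimestamp(180_000L, interval)); // 240000
    }
}
```

Because of this alignment, the first periodic fire can arrive sooner than a full interval after the first element, so the gap between an onMaxCount line and the next onProcessingTime line in the log is not necessarily 60 s.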