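Code example: consume Kafka from Flink and inspect the contents of a ListState to work out what each StateTtlConfig parameter actually does.
Kafka producer code: it sends records for two keys, one record per key per second.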
for (int i = 0; i < 100; i++) {
    // Record for key 1: values 0..99
    JSONObject object = new JSONObject();
    object.put("id", 1);
    object.put("value", i);
    String s = object.toJSONString();
    // get() blocks until the broker has acknowledged the record
    kafkaProducer.send(new ProducerRecord<>("test_topic_partition_one", s.getBytes(StandardCharsets.UTF_8))).get();
    // Record for key 2: values 100..199
    object = new JSONObject();
    object.put("id", 2);
    object.put("value", 100 + i);
    s = object.toJSONString();
    kafkaProducer.send(new ProducerRecord<>("test_topic_partition_one", s.getBytes(StandardCharsets.UTF_8))).get();
    Thread.sleep(1000);
}
Flink job that consumes the Kafka topic:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10 * 1000);
KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")
        .setProperties(properties)
        .setTopics("test_topic_partition_one")
        .setGroupId("my-group")
        .setStartingOffsets(OffsetsInitializer.latest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();
DataStreamSource<String> kafkaSource = env
        .fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source")
        .setParallelism(2);
// Parse each JSON record into an (id, value) tuple
DataStream<Tuple2<String, Integer>> dataStream = kafkaSource.map(
        new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String value) throws Exception {
                JSONObject object = JSONObject.parseObject(value);
                return new Tuple2<>(object.getString("id"), object.getInteger("value"));
            }
        });
DataStream<String> resultStream = dataStream
        .keyBy(value -> value.f0) // group by the first field (the key)
        .process(new ListValueProcess());
// Print the results
resultStream.print();
// Needed to actually launch the job; the job name here is arbitrary
env.execute("state-ttl-demo");
The ListValueProcess state function:
@Override
public void processElement(Tuple2<String, Integer> value, KeyedProcessFunction<String, Tuple2<String, Integer>, String>.Context ctx, Collector<String> out) throws Exception {
    // Append the incoming value to the ListState
    listState.add(value.f1);
    // Read every element currently visible in the ListState and emit them
    String key = value.f0;
    List<Integer> list = new ArrayList<>();
    for (Integer integer : listState.get()) {
        list.add(integer);
    }
    String result = "key:" + key + ", value:" + list;
    // Emit the result
    out.collect(result);
}

@Override
public void open(Configuration parameters) throws Exception {
    super.open(parameters);
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.seconds(10))
            .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
            .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
            .build();
    // Initialize the ListState.
    // Each key has its own ListState,
    // which stores multiple values for that key.
    ListStateDescriptor<Integer> integerListStateDescriptor = new ListStateDescriptor<>("my-list-state", Integer.class);
    integerListStateDescriptor.enableTimeToLive(ttlConfig);
    listState = getRuntimeContext().getListState(integerListStateDescriptor);
}
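As the code shows, StateTtlConfig has three main settings:
The time-to-live passed to newBuilder: how long the state is kept.
setUpdateType: the policy for refreshing the TTL, either OnCreateAndWrite or OnReadAndWrite.
setStateVisibility: the visibility of expired state, either ReturnExpiredIfNotCleanedUp or NeverReturnExpired.
In this example the state TTL is 10 s.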
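OnCreateAndWrite: the TTL timestamp is refreshed when the state is created or written.
OnReadAndWrite: the TTL timestamp is refreshed when the state is created, written, or read.
ReturnExpiredIfNotCleanedUp: state that has expired but has not yet been removed can still be read.
NeverReturnExpired: state can no longer be read once it has expired.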
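To make the four combinations concrete before looking at the real output, here is a minimal sketch in plain Java. It is a toy model, not Flink code: the class `TtlListSketch`, its enums, and the `replay` helper are all hypothetical names, the clock is simulated, and no cleanup pass is modeled. It only imitates the semantics described above for one key that receives one write and one read per second with a 10 s TTL.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a list whose elements carry a last-access timestamp.
// It only imitates the semantics described above; it is not Flink code.
class TtlListSketch {
    enum UpdateType { ON_CREATE_AND_WRITE, ON_READ_AND_WRITE }
    enum Visibility { RETURN_EXPIRED_IF_NOT_CLEANED_UP, NEVER_RETURN_EXPIRED }

    private final long ttlMillis;
    private final UpdateType updateType;
    private final Visibility visibility;
    // Each entry is {value, lastAccessMillis}
    private final List<long[]> entries = new ArrayList<>();

    TtlListSketch(long ttlMillis, UpdateType updateType, Visibility visibility) {
        this.ttlMillis = ttlMillis;
        this.updateType = updateType;
        this.visibility = visibility;
    }

    // A write always stamps the new element with the current time.
    void add(int value, long nowMillis) {
        entries.add(new long[] {value, nowMillis});
    }

    // A read hides expired elements only under NEVER_RETURN_EXPIRED, and
    // refreshes the timestamp of live elements only under ON_READ_AND_WRITE.
    List<Integer> get(long nowMillis) {
        List<Integer> visible = new ArrayList<>();
        for (long[] e : entries) {
            boolean expired = nowMillis - e[1] >= ttlMillis;
            if (expired && visibility == Visibility.NEVER_RETURN_EXPIRED) {
                continue;
            }
            visible.add((int) e[0]);
            if (!expired && updateType == UpdateType.ON_READ_AND_WRITE) {
                e[1] = nowMillis;
            }
        }
        return visible;
    }

    // Replays the experiment for one key: one write and one read per second.
    // Returns what the last read sees. No cleanup pass is modeled here.
    static List<Integer> replay(UpdateType u, Visibility v, int seconds) {
        TtlListSketch state = new TtlListSketch(10_000, u, v);
        List<Integer> last = new ArrayList<>();
        for (int i = 0; i < seconds; i++) {
            long now = i * 1000L;
            state.add(i, now);
            last = state.get(now);
        }
        return last;
    }

    public static void main(String[] args) {
        for (UpdateType u : UpdateType.values()) {
            for (Visibility v : Visibility.values()) {
                System.out.println(u + " + " + v + " -> " + replay(u, v, 15));
            }
        }
    }
}
```

In this model, replaying 15 seconds leaves only the values 5 to 14 visible for OnCreateAndWrite with NeverReturnExpired. The other three combinations still return 0 to 14, but for different reasons: with ReturnExpiredIfNotCleanedUp nothing has been cleaned up yet, and with OnReadAndWrite every read keeps every element alive.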
Sample results:
With OnCreateAndWrite and ReturnExpiredIfNotCleanedUp:
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124]
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]
1> key:1, value:[6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
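As the output shows, the state removes expired elements periodically rather than at the moment they expire, so an element may remain visible for longer than the 10 s TTL.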
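The batch-wise shrinking seen above can be imitated with a small plain-Java sketch. This is a toy model under stated assumptions, not a description of how Flink schedules cleanup: the class `LazyCleanupSketch` and the `cleanupEverySeconds` knob are hypothetical, and the 12-second interval is chosen only for illustration. The point is that, when expired elements are returned until a cleanup pass runs, the visible list grows past the TTL window and then drops in one step.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: expired elements stay readable until a cleanup pass removes them.
class LazyCleanupSketch {
    static final long TTL_MILLIS = 10_000;

    // Returns the visible list size after each one-second step.
    // cleanupEverySeconds is a hypothetical knob, not a Flink setting.
    static List<Integer> visibleSizes(int seconds, int cleanupEverySeconds) {
        List<Long> writeTimes = new ArrayList<>(); // creation time of each element
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < seconds; i++) {
            long now = i * 1000L;
            if (i > 0 && i % cleanupEverySeconds == 0) {
                // Cleanup pass: physically drop everything older than the TTL.
                writeTimes.removeIf(t -> now - t >= TTL_MILLIS);
            }
            writeTimes.add(now);
            // ReturnExpiredIfNotCleanedUp: whatever is still stored is returned.
            sizes.add(writeTimes.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(visibleSizes(30, 12));
    }
}
```

With one write per second and a cleanup every 12 simulated seconds, the list in this model reaches 21 elements just before the second cleanup and falls back to 10 right after it, which is the same sawtooth shape as the key:1 output above.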
With OnCreateAndWrite and NeverReturnExpired:
1> key:2, value:[109, 110, 111, 112, 113, 114, 115, 116, 117]
1> key:1, value:[10, 11, 12, 13, 14, 15, 16, 17, 18]
1> key:2, value:[110, 111, 112, 113, 114, 115, 116, 117, 118]
1> key:1, value:[11, 12, 13, 14, 15, 16, 17, 18, 19]
1> key:2, value:[111, 112, 113, 114, 115, 116, 117, 118, 119]
1> key:1, value:[12, 13, 14, 15, 16, 17, 18, 19, 20]
1> key:2, value:[112, 113, 114, 115, 116, 117, 118, 119, 120]
1> key:1, value:[13, 14, 15, 16, 17, 18, 19, 20, 21]
1> key:2, value:[113, 114, 115, 116, 117, 118, 119, 120, 121]
1> key:1, value:[14, 15, 16, 17, 18, 19, 20, 21, 22]
1> key:2, value:[114, 115, 116, 117, 118, 119, 120, 121, 122]
As the output shows, the state only returns values written within the last 10 s.
With OnReadAndWrite and ReturnExpiredIfNotCleanedUp:
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
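As the output shows, the state keeps all of the data: every incoming record reads the list, each read refreshes the TTL, and so nothing ever expires.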
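The flip side of this behavior is that the data survives only as long as reads keep arriving. The following plain-Java sketch illustrates that with a simulated clock. It is a toy model, not Flink code: the class `ReadRefreshSketch` is hypothetical, it expires and hides elements on read (as under NeverReturnExpired), and the 15-second pause is an invented scenario that does not appear in the experiment above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of read-refreshed TTL for a single key; not Flink code.
class ReadRefreshSketch {
    static final long TTL_MILLIS = 10_000;

    // Each entry is {value, lastAccessMillis}
    private final List<long[]> entries = new ArrayList<>();

    void add(int value, long nowMillis) {
        entries.add(new long[] {value, nowMillis});
    }

    // Drops expired elements, then refreshes the timestamp of every survivor.
    List<Integer> get(long nowMillis) {
        entries.removeIf(e -> nowMillis - e[1] >= TTL_MILLIS);
        List<Integer> visible = new ArrayList<>();
        for (long[] e : entries) {
            e[1] = nowMillis;
            visible.add((int) e[0]);
        }
        return visible;
    }

    public static void main(String[] args) {
        ReadRefreshSketch state = new ReadRefreshSketch();
        // Steady traffic: one write and one read per second keeps everything alive.
        for (int i = 0; i < 20; i++) {
            state.add(i, i * 1000L);
            System.out.println(state.get(i * 1000L));
        }
        // Hypothetical 15-second pause with no access: the older elements expire.
        state.add(99, 34_000L);
        System.out.println(state.get(34_000L));
    }
}
```

Under steady one-per-second traffic the model returns the full, ever-growing list, matching the output above. After the simulated pause, only the newly written value 99 is returned.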
With OnReadAndWrite and NeverReturnExpired:
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
1> key:1, value:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
1> key:2, value:[101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]
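Again the state keeps all of the data: every incoming record reads the list, each read refreshes the TTL, and so nothing ever expires.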