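1) TTL update strategy (default: OnCreateAndWrite)
StateTtlConfig.UpdateType.OnCreateAndWrite - the TTL timestamp is refreshed only when the state is created or written
StateTtlConfig.UpdateType.OnReadAndWrite - the TTL timestamp is also refreshed on reads
StateTtlConfig.UpdateType.Disabled - TTL is disabled and the state never expires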
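2) Visibility of data that has expired but has not yet been cleaned up (default: NeverReturnExpired)
StateTtlConfig.StateVisibility.NeverReturnExpired - expired data is never returned
StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp - expired data is returned as long as it has not been cleaned up
With NeverReturnExpired, expired data is never returned, whether or not it has been physically deleted. With ReturnExpiredIfNotCleanedUp, expired data is still returned until it is physically deleted.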
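The interaction between the update strategy and the visibility setting can be easier to follow with a small model. The following is a minimal plain-Java sketch, not Flink's implementation: the class `TtlStoreSketch`, its enums, and the explicit `now` parameter are all hypothetical names introduced for illustration, and the `Disabled` update type is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the TTL semantics described above; this is NOT Flink's implementation. */
final class TtlStoreSketch {
    enum UpdateType { ON_CREATE_AND_WRITE, ON_READ_AND_WRITE }
    enum Visibility { NEVER_RETURN_EXPIRED, RETURN_EXPIRED_IF_NOT_CLEANED_UP }

    /** A stored value plus the timestamp that the TTL is measured from. */
    private static final class Entry {
        final int value;
        long lastAccess;

        Entry(int value, long lastAccess) {
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final UpdateType updateType;
    private final Visibility visibility;

    TtlStoreSketch(long ttlMillis, UpdateType updateType, Visibility visibility) {
        this.ttlMillis = ttlMillis;
        this.updateType = updateType;
        this.visibility = visibility;
    }

    /** Creating or writing a value always resets the timestamp. */
    void put(String key, int value, long now) {
        entries.put(key, new Entry(value, now));
    }

    /** Reads a value at time now; returns null if it is absent or hidden by the visibility rule. */
    Integer get(String key, long now) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        boolean expired = now - entry.lastAccess >= ttlMillis;
        if (expired) {
            // Expired but not yet cleaned up: the visibility setting decides what the caller sees.
            return visibility == Visibility.NEVER_RETURN_EXPIRED ? null : Integer.valueOf(entry.value);
        }
        if (updateType == UpdateType.ON_READ_AND_WRITE) {
            entry.lastAccess = now; // a read also refreshes the timestamp
        }
        return entry.value;
    }

    /** Stands in for physical cleanup: drops every entry that is expired at time now. */
    void cleanUp(long now) {
        entries.values().removeIf(e -> now - e.lastAccess >= ttlMillis);
    }

    public static void main(String[] args) {
        TtlStoreSketch store = new TtlStoreSketch(
                60_000L, UpdateType.ON_READ_AND_WRITE, Visibility.NEVER_RETURN_EXPIRED);
        store.put("a", 1, 0L);
        System.out.println(store.get("a", 30_000L)); // this read refreshes the timestamp
        System.out.println(store.get("a", 80_000L)); // only 50 s have passed since the last read
    }
}
```

In this model, switching the store in `main` to `ON_CREATE_AND_WRITE` makes the second read return null, because the read at 30 s no longer extends the entry's lifetime.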
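3) State cleanup strategies
EMPTY_STRATEGY (empty strategy, no cleanup)
FULL_STATE_SCAN_SNAPSHOT (cleanup in full snapshot): cleanup is activated when a full state snapshot is taken
INCREMENTAL_CLEANUP: incremental cleanup
cleanupIncrementally(int cleanupSize, boolean runCleanupForEveryRecord) (incremental cleanup)
Parameter 1: the maximum number of state entries (keys) checked each time cleanup is triggered
Parameter 2: whether to additionally run incremental cleanup for every processed record
ROCKSDB_COMPACTION_FILTER (RocksDB compaction filter cleanup)
RocksDB periodically runs asynchronous compactions to merge state updates and reduce storage; with this strategy, expired entries are filtered out during those compactions.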
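To make the two parameters of the incremental strategy more concrete, here is a minimal plain-Java sketch of a bounded cleanup pass. It is a model under stated assumptions, not Flink's code: `IncrementalCleanupSketch`, `cleanupStep`, `onRecordProcessed`, and the explicit `now` argument are hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of incremental cleanup; this is NOT Flink's implementation. */
final class IncrementalCleanupSketch {
    private final Map<String, Long> lastUpdate = new HashMap<>(); // key -> last write time
    private final Deque<String> scanOrder = new ArrayDeque<>();   // keys in the order they will be checked
    private final long ttlMillis;
    private final int cleanupSize;
    private final boolean runCleanupForEveryRecord;

    IncrementalCleanupSketch(long ttlMillis, int cleanupSize, boolean runCleanupForEveryRecord) {
        this.ttlMillis = ttlMillis;
        this.cleanupSize = cleanupSize;
        this.runCleanupForEveryRecord = runCleanupForEveryRecord;
    }

    /** Writes a key at time now; a new key is appended to the scan order exactly once. */
    void put(String key, long now) {
        if (lastUpdate.put(key, now) == null) {
            scanOrder.addLast(key);
        }
    }

    /** Checks at most cleanupSize entries and removes the expired ones; returns how many were removed. */
    int cleanupStep(long now) {
        int removed = 0;
        int toVisit = Math.min(cleanupSize, scanOrder.size());
        for (int i = 0; i < toVisit; i++) {
            String key = scanOrder.pollFirst();
            if (now - lastUpdate.get(key) >= ttlMillis) {
                lastUpdate.remove(key);
                removed++;
            } else {
                scanOrder.addLast(key); // still live: check it again on a later pass
            }
        }
        return removed;
    }

    /** Models parameter 2: an extra cleanup pass per processed record, only if enabled. */
    int onRecordProcessed(long now) {
        return runCleanupForEveryRecord ? cleanupStep(now) : 0;
    }

    int size() {
        return lastUpdate.size();
    }

    public static void main(String[] args) {
        IncrementalCleanupSketch state = new IncrementalCleanupSketch(60_000L, 2, true);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            state.put(key, 0L);
        }
        // Every key is expired at 70 s, but each pass checks at most 2 of them.
        System.out.println(state.onRecordProcessed(70_000L));
        System.out.println(state.size());
    }
}
```

The point of the bound is that no single trigger pays for a full scan; the cost is spread over many accesses, so a larger `cleanupSize` reclaims expired entries sooner at the price of more work per trigger.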
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
public class StateAndExpire {
    public static void main(String[] args) throws Exception {
        // Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Read input lines from a socket
        DataStreamSource<String> line = env.socketTextStream("localhost", 8888);
        line.keyBy(e -> e.split(",")[0])
                .process(new ProcessFunction<String, Tuple2<String, Integer>>() {
                    private transient ValueState<Tuple2<String, Integer>> valueState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        // Create the state descriptor
                        ValueStateDescriptor<Tuple2<String, Integer>> valueStateDescriptor =
                                new ValueStateDescriptor<>("word_count",
                                        TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));
                        // Set the state time-to-live to 1 minute
                        StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.minutes(1))
                                // TTL update strategy (default: OnCreateAndWrite):
                                //   OnCreateAndWrite - refreshed only on creation and write
                                //   OnReadAndWrite   - also refreshed on read
                                //   Disabled         - the state never expires
                                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                                // Visibility of expired but not yet cleaned-up data (default: NeverReturnExpired):
                                //   NeverReturnExpired          - never returned, whether or not physically deleted
                                //   ReturnExpiredIfNotCleanedUp - returned until physically deleted
                                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                                // Cleanup strategy:
                                //   EMPTY_STRATEGY - no cleanup
                                //   FULL_STATE_SCAN_SNAPSHOT - cleanup is activated when a full state snapshot is taken
                                //   cleanupIncrementally(int cleanupSize, boolean runCleanupForEveryRecord) - incremental cleanup;
                                //     cleanupSize: max entries checked per trigger; runCleanupForEveryRecord: also run per record
                                //   ROCKSDB_COMPACTION_FILTER - expired entries are dropped during RocksDB's asynchronous compactions
                                .cleanupFullSnapshot()
                                .build();
                        // Attach the TTL configuration to the state descriptor
                        valueStateDescriptor.enableTimeToLive(ttlConfig);
                        // Create the state from the descriptor
                        valueState = getRuntimeContext().getState(valueStateDescriptor);
                    }

                    @Override
                    public void processElement(String input,
                                               ProcessFunction<String, Tuple2<String, Integer>>.Context ctx,
                                               Collector<Tuple2<String, Integer>> out) throws Exception {
                        // Read the current state; null means no state yet, or the state has expired
                        Tuple2<String, Integer> state = valueState.value();
                        String[] fields = input.split(",");
                        int count = Integer.parseInt(fields[1]);
                        if (state == null) {
                            // First record for this key (or the first one after expiry): start a new count
                            valueState.update(new Tuple2<>(fields[0], count));
                            out.collect(new Tuple2<>(fields[0], count));
                        } else {
                            // Add to the running count and write it back, which also refreshes the TTL
                            state.f1 += count;
                            valueState.update(new Tuple2<>(state.f0, state.f1));
                            out.collect(new Tuple2<>(state.f0, state.f1));
                        }
                    }
                }).print();
        env.execute();
    }
}
1) Open port 8888 and enter test data
nc -lk 8888
a,1
b,1
a,2
b,2
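The console output is as follows (the N> prefix is the index of the subtask that printed the record and may differ on your machine):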
6> (a,1)
2> (b,1)
6> (a,3)
2> (b,3)
2) Wait 1 minute. By then the state has expired and is no longer returned, so the counts start over. Enter test data again:
a,1
b,2
The console output is:
6> (a,1)
2> (b,2)
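The two test rounds above can be replayed without a cluster. The following is a minimal plain-Java sketch that mimics the job's per-key sum with a 1-minute TTL refreshed on every write. It is only a model: `TtlWordCountSketch` and the explicit timestamps are hypothetical, and the subtask prefixes (6>, 2>) are not reproduced.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy replay of the test run; models a per-key sum whose TTL is refreshed on each write. */
final class TtlWordCountSketch {
    private static final long TTL_MILLIS = 60_000L;
    // key -> {running sum, time of last write}
    private final Map<String, long[]> state = new HashMap<>();

    /** Processes one "key,count" line at time now and returns the emitted record as text. */
    String process(String line, long now) {
        String[] fields = line.split(",");
        String key = fields[0];
        int delta = Integer.parseInt(fields[1]);
        long[] entry = state.get(key);
        // Expired state is treated as absent, as with NeverReturnExpired.
        boolean live = entry != null && now - entry[1] < TTL_MILLIS;
        long sum = (live ? entry[0] : 0L) + delta;
        state.put(key, new long[] {sum, now}); // writing resets the TTL timestamp
        return "(" + key + "," + sum + ")";
    }

    public static void main(String[] args) {
        TtlWordCountSketch job = new TtlWordCountSketch();
        // Round 1: four records within a few seconds
        System.out.println(job.process("a,1", 0L));
        System.out.println(job.process("b,1", 1_000L));
        System.out.println(job.process("a,2", 2_000L));
        System.out.println(job.process("b,2", 3_000L));
        // Round 2: more than a minute after the last writes
        System.out.println(job.process("a,1", 70_000L));
        System.out.println(job.process("b,2", 71_000L));
    }
}
```

Round 2 starts from scratch because more than 60 seconds separate it from the last write to each key; had the second round arrived within the minute, the sums would have continued from 3.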