Flink state management (1): state cleanup in Flink SQL

Flink state management

  • Implementing state cleanup in Flink SQL

Implementing state cleanup in Flink SQL

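  1. Background: the results of grouped processing are stored as intermediate state, so state keeps growing as the number of grouping keys grows. Most of this state is only useful for a limited time and does not need to be kept forever. For example, when the Top-N syntax is used for deduplication, duplicates usually fall within a bounded interval (say 1 hour or 1 day); once that interval has passed, the corresponding state is no longer needed. As time passes and keys accumulate, the risk of exceeding the available storage keeps rising, so for some continuous queries the state size has to be limited. Whether to limit it depends on the characteristics of the data and on whether the query itself needs a bound on its state.
  2. How to limit the state size, and what the parameters mean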
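// package names as of Flink 1.10+ (earlier releases place StreamTableEnvironment elsewhere)
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableConfig;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
// get the TableConfig of this table environment
TableConfig tConfig = tableEnv.getConfig();
// set the idle state retention: minimum 12 hours, maximum 24 hours
tConfig.setIdleStateRetentionTime(Time.hours(12), Time.hours(24));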

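minIdleStateRetentionTime (minTime): the minimum time a key's state stays idle before it may be removed
maxIdleStateRetentionTime (maxTime): the maximum time a key's state stays idle before it is removed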

public void setIdleStateRetentionTime(Time minTime, Time maxTime)
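To make the two parameters concrete, here is a minimal sketch that computes the window in which an idle key's state can be dropped. IdleRetentionWindow and its methods are hypothetical names used for illustration only, not part of Flink, and the sketch assumes cleanup timers fire on time.

```java
import java.time.Duration;

/** Hypothetical helper (not a Flink class): when may an idle key's state be dropped? */
final class IdleRetentionWindow {

    /** State is kept at least until lastAccess + minTime. */
    static long earliestCleanup(long lastAccessMillis, Duration minTime) {
        return lastAccessMillis + minTime.toMillis();
    }

    /** State is dropped no later than lastAccess + maxTime (assuming timers fire on time). */
    static long latestCleanup(long lastAccessMillis, Duration maxTime) {
        return lastAccessMillis + maxTime.toMillis();
    }

    public static void main(String[] args) {
        long lastAccess = 0L; // the key was last touched at t = 0
        long earliest = earliestCleanup(lastAccess, Duration.ofHours(12));
        long latest = latestCleanup(lastAccess, Duration.ofHours(24));
        System.out.println("earliest cleanup (h): " + Duration.ofMillis(earliest).toHours());
        System.out.println("latest cleanup (h): " + Duration.ofMillis(latest).toHours());
    }
}
```

With the 12 h / 24 h setting used above, a key that is never touched again loses its state somewhere between 12 and 24 hours after its last record.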
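  3. Implementation and source code walkthrough
    Idle state retention in Flink SQL is built on KeyedProcessFunction: for each key, a timer is registered based on the minimum and maximum idle retention times, and the state is cleaned up when the timer fires. A good place to start reading is KeyedProcessFunctionWithCleanupState: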
/**
 * A function that processes elements of a stream, and could cleanup state.
 * @param <K> Type of the key.
 * @param <IN> Type of the input elements.
 * @param <OUT> Type of the output elements.
 */
public abstract class KeyedProcessFunctionWithCleanupState<K, IN, OUT>
    extends KeyedProcessFunction<K, IN, OUT> implements CleanupState {

    private static final long serialVersionUID = 2084560869233898457L;

    private final long minRetentionTime;
    private final long maxRetentionTime;
    protected final boolean stateCleaningEnabled;

    // holds the latest registered cleanup timer
    private ValueState<Long> cleanupTimeState;

    public KeyedProcessFunctionWithCleanupState(long minRetentionTime, long maxRetentionTime) {
        this.minRetentionTime = minRetentionTime;
        this.maxRetentionTime = maxRetentionTime;
        this.stateCleaningEnabled = minRetentionTime > 1;
    }
}

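First, the class carries a flag that says whether idle state cleanup is enabled: stateCleaningEnabled is true when minRetentionTime is greater than 1.
Next, the timer registration logic. To register the idle state cleanup timer, the class calls its own registerProcessingCleanupTimer method, which in turn calls the registerProcessingCleanupTimer default method of the CleanupState interface: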

/**
 * Base interface for clean up state, both for {@link ProcessFunction} and {@link CoProcessFunction}.
 */
public interface CleanupState {

    default void registerProcessingCleanupTimer(
            ValueState<Long> cleanupTimeState,
            long currentTime,
            long minRetentionTime,
            long maxRetentionTime,
            TimerService timerService) throws Exception {

        // last registered timer
        Long curCleanupTime = cleanupTimeState.value();

        // check if a cleanup timer is registered and
        // that the current cleanup timer won't delete state we need to keep
        if (curCleanupTime == null || (currentTime + minRetentionTime) > curCleanupTime) {
            // we need to register a new (later) timer
            long cleanupTime = currentTime + maxRetentionTime;
            // register timer and remember clean-up time
            timerService.registerProcessingTimeTimer(cleanupTime);
            // delete expired timer
            if (curCleanupTime != null) {
                timerService.deleteProcessingTimeTimer(curCleanupTime);
            }
            cleanupTimeState.update(cleanupTime);
        }
    }
}

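When a record arrives for a key, the trigger time of that key's most recently registered timer is read from state. If it is null, or if the current time plus the minimum idle retention time is later than the trigger time of the previously registered timer, a new timer is registered at the current time plus the maximum idle retention time (long cleanupTime = currentTime + maxRetentionTime;).
The previously registered cleanup timer is deleted at the same time. Once the state of a key has been cleaned up, any later record for that key is processed as if it were the first record for that key.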
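The registration rule above can be traced with a small plain-Java model. This is only a sketch for one key: CleanupTimerSketch and its members are hypothetical names, a Long field stands in for ValueState<Long>, a TreeSet stands in for the TimerService, and times are given in hours to keep the numbers readable.

```java
import java.util.TreeSet;

/** Plain-Java model of the cleanup-timer bookkeeping for a single key (no Flink dependency). */
final class CleanupTimerSketch {
    private final long minRetention;
    private final long maxRetention;
    // stands in for ValueState<Long> cleanupTimeState
    private Long cleanupTime;
    // stands in for the timers held by the TimerService
    private final TreeSet<Long> timers = new TreeSet<>();

    CleanupTimerSketch(long minRetention, long maxRetention) {
        this.minRetention = minRetention;
        this.maxRetention = maxRetention;
    }

    /** Mirrors the registration check; returns true if a new timer was registered. */
    boolean onElement(long currentTime) {
        if (cleanupTime == null || currentTime + minRetention > cleanupTime) {
            long newCleanupTime = currentTime + maxRetention;
            timers.add(newCleanupTime);      // register the new (later) timer
            if (cleanupTime != null) {
                timers.remove(cleanupTime);  // delete the timer it replaces
            }
            cleanupTime = newCleanupTime;    // remember the clean-up time
            return true;
        }
        return false; // the existing timer already keeps the state long enough
    }

    Long pendingCleanupTime() {
        return cleanupTime;
    }

    int pendingTimerCount() {
        return timers.size();
    }

    public static void main(String[] args) {
        CleanupTimerSketch key = new CleanupTimerSketch(12, 24); // min 12 h, max 24 h
        for (long t : new long[] {0, 10, 13}) {
            boolean registered = key.onElement(t);
            System.out.println("t=" + t + "h registered=" + registered
                    + " cleanupAt=" + key.pendingCleanupTime() + "h");
        }
    }
}
```

In this run the record at t=0 registers a timer for 24 h. The record at t=10 leaves it alone, because 10 + 12 = 22 does not exceed 24. The record at t=13 replaces it with a timer for 37 h, because 13 + 12 = 25 exceeds 24. Only one timer per key is pending at any time, and a key that receives frequent records re-registers a timer only about once every (max - min) interval rather than on every record.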

Reference:
https://www.cnblogs.com/qiu-hua/p/14471568.html
