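KeyedState is created and managed by a KeyedStateBackend. Flink provides two KeyedStateBackend implementations: one backed by the JVM heap (HeapKeyedStateBackend) and one backed by RocksDB (RocksDBKeyedStateBackend). A KeyedStateBackend does more than create KeyedState. It also implements the SnapshotStrategy interface, which takes snapshots of the state data held in the backend so that the state can be persisted.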
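The two responsibilities (creating state and snapshotting it) can be illustrated with a toy in-memory model. Everything here is hypothetical: `ToyKeyedStateBackend`, `getOrCreateState` and `snapshot` are not Flink APIs. The model also leaves out key groups, namespaces and serialization, and its snapshot is a simple synchronous deep copy.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of a heap-based keyed state backend (not Flink's API).
 * It creates named value states and can snapshot all of them.
 */
public class ToyKeyedStateBackend {
    // state name -> (key -> value); loosely mirrors a registry of state tables
    private final Map<String, Map<String, Long>> states = new HashMap<>();
    // the key that state reads/writes are currently scoped to
    private String currentKey;

    /** Creation responsibility: get-or-create a named value state. */
    public ToyValueState getOrCreateState(String name) {
        states.computeIfAbsent(name, n -> new HashMap<>());
        return new ToyValueState(name);
    }

    public void setCurrentKey(String key) {
        this.currentKey = key;
    }

    /** Snapshot responsibility: deep copy, so later writes do not leak into the snapshot. */
    public Map<String, Map<String, Long>> snapshot() {
        Map<String, Map<String, Long>> copy = new HashMap<>();
        states.forEach((name, table) -> copy.put(name, new HashMap<>(table)));
        return copy;
    }

    /** A value state whose reads and writes apply to the backend's current key. */
    public class ToyValueState {
        private final String name;

        ToyValueState(String name) {
            this.name = name;
        }

        public Long value() {
            return states.get(name).get(currentKey);
        }

        public void update(Long v) {
            states.get(name).put(currentKey, v);
        }
    }

    public static void main(String[] args) {
        ToyKeyedStateBackend backend = new ToyKeyedStateBackend();
        ToyValueState count = backend.getOrCreateState("count");
        backend.setCurrentKey("a");
        count.update(1L);
        Map<String, Map<String, Long>> snap = backend.snapshot();
        count.update(2L); // changes live state only, not the snapshot
        System.out.println(snap.get("count").get("a") + " " + count.value()); // prints "1 2"
    }
}
```

A real backend has to make snapshots cheap and, ideally, asynchronous. The heap backend's copy-on-write structures, described later in this article, exist for that reason.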
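When a Task is initialized, each Task instance creates a StateBackend that manages its state and checkpoint operations. StreamTask then initializes the state of every Operator in its operator chain. The per-operator keyed and operator state backends are created from the Task's StateBackend when each operator's initializeState() runs.
StreamTask#beforeInvoke: creating the Task's state backend and initializing the state of all Operators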
public abstract class StreamTask<OUT, OP extends StreamOperator<OUT>>
        extends AbstractInvokable
        implements AsyncExceptionHandler {
    // ... other fields and methods omitted
    private void beforeInvoke() throws Exception {
        disposedOperators = false;
        LOG.debug("Initializing {}.", getName());
        asyncOperationsThreadPool = Executors.newCachedThreadPool(new ExecutorThreadFactory("AsyncOperations", uncaughtExceptionHandler));
        // Create the state backend of this Task instance
        stateBackend = createStateBackend();
        checkpointStorage = stateBackend.createCheckpointStorage(getEnvironment().getJobID());
        // Initialize the timerService, which is backed by a ScheduledThreadPoolExecutor
        if (timerService == null) {
            ThreadFactory timerThreadFactory =
                new DispatcherThreadFactory(TRIGGER_THREAD_GROUP, "Time Trigger for " + getName());
            // Used by operators to run timed (processing-time) tasks
            timerService = new SystemProcessingTimeService(
                this::handleTimerException,
                timerThreadFactory);
        }
        operatorChain = new OperatorChain<>(this, recordWriter);
        headOperator = operatorChain.getHeadOperator();
        // task specific initialization
        init();
        // save the work of reloading state, etc, if the task is already canceled
        if (canceled) {
            throw new CancelTaskException();
        }
        // -------- Invoke --------
        LOG.debug("Invoking {}", getName());
        // Initialize operator state
        actionExecutor.runThrowing(() -> {
            // Initialize the state of every Operator in this StreamTask, then open it
            initializeStateAndOpen();
        });
    }
    // Initialize the state of all Operators in the chain
    private void initializeStateAndOpen() throws Exception {
        StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
        for (StreamOperator<?> operator : allOperators) {
            if (null != operator) {
                operator.initializeState();
                // Once its state is initialized, open (start) the operator
                operator.open();
            }
        }
    }
}
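The ordering enforced by beforeInvoke() and initializeStateAndOpen() can be checked with a small stand-alone model. It is a hypothetical sketch: `ToyOperator` and `ToyTaskLifecycle` are invented names, and the step that creates the state backend is reduced to a log entry. The model records the order in which the lifecycle steps run.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for StreamOperator with only the two lifecycle hooks. */
interface ToyOperator {
    void initializeState() throws Exception;
    void open() throws Exception;
}

public class ToyTaskLifecycle {
    // Records the order in which lifecycle steps run
    static final List<String> calls = new ArrayList<>();

    // Builds an operator that only logs its lifecycle calls
    static ToyOperator recording(String name) {
        return new ToyOperator() {
            @Override public void initializeState() { calls.add(name + ".initializeState"); }
            @Override public void open() { calls.add(name + ".open"); }
        };
    }

    // Simplified beforeInvoke(): create the backend first, then set up operators
    static void beforeInvoke(ToyOperator[] allOperators) throws Exception {
        calls.add("createStateBackend");
        initializeStateAndOpen(allOperators);
    }

    // Same shape as StreamTask#initializeStateAndOpen: per operator, state first, then open()
    static void initializeStateAndOpen(ToyOperator[] allOperators) throws Exception {
        for (ToyOperator operator : allOperators) {
            if (operator != null) { // same null guard as the original loop
                operator.initializeState();
                operator.open();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        beforeInvoke(new ToyOperator[] {recording("map"), null, recording("sink")});
        System.out.println(calls);
        // prints [createStateBackend, map.initializeState, map.open, sink.initializeState, sink.open]
    }
}
```

The calls interleave per operator. Each operator's state is initialized immediately before that operator's own open(). The loop does not initialize every operator's state first and then open them all.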
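HeapKeyedStateBackend keeps all registered KeyedStates in a Map. The Map goes from each state's name to the StateTable that holds that state's data: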
public class HeapKeyedStateBackend<K> extends AbstractKeyedStateBackend<K> {
    // State name -> StateTable, for every KeyedState registered in this backend
    private final Map<String, StateTable<K, ?, ?>> registeredKVStates;
}
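User-created KeyedState is stored in a StateTable. A StateTable partitions its KeyedState by KeyGroup into multiple StateMaps, one StateMap per KeyGroup. With asynchronous snapshots (the default), the StateMap implementation is CopyOnWriteStateMap. This is essentially a hash table built from an array of buckets, where entries that collide in a bucket are chained in a linked list.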
public abstract class StateTable<K, N, S>
        implements StateSnapshotRestore, Iterable<StateEntry<K, N, S>> {
    // One StateMap per KeyGroup, each storing the KeyedState of that KeyGroup
    protected final StateMap<K, N, S>[] keyGroupedStateMaps;
}
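Routing a key to its per-KeyGroup map can be sketched as follows. Each parallel subtask owns a contiguous range of key groups, so a table looks up its map at index `keyGroup - offset`. Key groups are also the unit in which Flink redistributes keyed state when a job is rescaled, which is why storage is split per group. The sketch makes some assumptions. `ToyStateTable` is hypothetical and uses plain `HashMap`s instead of StateMap. Its `keyGroupFor` is a simplified stand-in: Flink applies a murmur hash to `hashCode()` before taking the value modulo the max parallelism, and this sketch uses a simpler bit mix.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a heap state table split by key group (not Flink's StateTable).
 * The table owns key groups [firstKeyGroup, lastKeyGroup] out of maxParallelism groups.
 */
public class ToyStateTable<K, S> {
    private final int maxParallelism;         // total number of key groups
    private final int keyGroupOffset;         // first key group owned by this table
    private final Map<K, S>[] keyGroupedMaps; // one map per owned key group

    @SuppressWarnings("unchecked")
    public ToyStateTable(int maxParallelism, int firstKeyGroup, int lastKeyGroup) {
        this.maxParallelism = maxParallelism;
        this.keyGroupOffset = firstKeyGroup;
        this.keyGroupedMaps = (Map<K, S>[]) new Map[lastKeyGroup - firstKeyGroup + 1];
        for (int i = 0; i < keyGroupedMaps.length; i++) {
            keyGroupedMaps[i] = new HashMap<>();
        }
    }

    // Assumption: spread hashCode() and reduce it modulo maxParallelism
    // (Flink uses a murmur hash here instead of this simple mix)
    static int keyGroupFor(Object key, int maxParallelism) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return Math.floorMod(h, maxParallelism);
    }

    // Select the map of the key's group, relative to this table's offset
    private Map<K, S> mapFor(K key) {
        int keyGroup = keyGroupFor(key, maxParallelism);
        int pos = keyGroup - keyGroupOffset;
        if (pos < 0 || pos >= keyGroupedMaps.length) {
            throw new IllegalArgumentException("key group " + keyGroup + " is not owned by this table");
        }
        return keyGroupedMaps[pos];
    }

    public void put(K key, S state) {
        mapFor(key).put(key, state);
    }

    public S get(K key) {
        return mapFor(key).get(key);
    }

    // Entries stored for one key group (per-group storage makes per-group snapshots easy)
    public int sizeOfKeyGroup(int keyGroup) {
        return keyGroupedMaps[keyGroup - keyGroupOffset].size();
    }

    public static void main(String[] args) {
        ToyStateTable<String, Long> table = new ToyStateTable<>(128, 0, 127); // owns all 128 groups
        table.put("a", 1L);
        int kg = keyGroupFor("a", 128);
        System.out.println(kg + " " + table.get("a") + " " + table.sizeOfKeyGroup(kg)); // prints "97 1 1"
    }
}
```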
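The elements of CopyOnWriteStateMap are StateMapEntry objects. Like an Entry in HashMap, a StateMapEntry is the object that actually holds the data: the key, the namespace, the state value, and a next pointer to the following entry in the same bucket. CopyOnWriteStateMap is a HashMap-like structure that grows by incremental rehashing. During a resize, entries are moved from primaryTable into incrementalRehashTable a few at a time as the map is accessed, rather than all in one pass.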
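public class CopyOnWriteStateMap<K, N, S> extends StateMap<K, N, S> {
    // Hash table entry, similar to HashMap's Entry
    protected static class StateMapEntry<K, N, S> implements StateEntry<K, N, S> {
        final K key;                  // the key
        final N namespace;            // the namespace (e.g. a window)
        S state;                      // the state value
        StateMapEntry<K, N, S> next;  // next entry in the same bucket chain
        // ... other fields omitted
    }
    // Main bucket array
    private StateMapEntry<K, N, S>[] primaryTable;
    // Larger bucket array that entries are gradually moved into during an incremental rehash
    private StateMapEntry<K, N, S>[] incrementalRehashTable;
    // ... other fields and methods omitted
}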
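The incremental rehash idea can be shown with a small chained hash map. This is a hypothetical sketch that loosely follows the primaryTable / incrementalRehashTable split. It is not CopyOnWriteStateMap's code, and it omits everything copy-on-write (entry and state versions, snapshots) as well as namespaces. `IncrementalRehashMap` and `BUCKETS_MIGRATED_PER_OP` are invented names. Every put or get migrates at most a fixed number of primary buckets. A key whose primary bucket has already been migrated is looked up in the new table.

```java
import java.util.Objects;

/** Hypothetical chained hash map that grows by incremental rehash (no copy-on-write). */
public class IncrementalRehashMap<K, V> {
    // Bucket entry: like StateMapEntry, a node in a singly linked chain
    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private static final int BUCKETS_MIGRATED_PER_OP = 4; // hypothetical tuning knob

    private Entry<K, V>[] primaryTable = newTable(4);
    private Entry<K, V>[] incrementalRehashTable; // non-null only while a rehash is in progress
    private int rehashIndex;                      // primary buckets below this index are migrated
    private int primarySize, rehashSize;

    @SuppressWarnings("unchecked")
    private static <K, V> Entry<K, V>[] newTable(int capacity) {
        return (Entry<K, V>[]) new Entry[capacity];
    }

    // Bucket index for a power-of-two table length
    private static int index(Object key, int length) {
        int h = Objects.hashCode(key);
        return (h ^ (h >>> 16)) & (length - 1);
    }

    // Keys whose primary bucket was already migrated live in the new table
    private Entry<K, V>[] tableFor(Object key) {
        if (incrementalRehashTable != null && index(key, primaryTable.length) < rehashIndex) {
            return incrementalRehashTable;
        }
        return primaryTable;
    }

    // Move a bounded number of primary buckets into the new table
    private void migrateSomeBuckets() {
        if (incrementalRehashTable == null) return;
        for (int n = 0; n < BUCKETS_MIGRATED_PER_OP && rehashIndex < primaryTable.length; n++, rehashIndex++) {
            Entry<K, V> e = primaryTable[rehashIndex];
            primaryTable[rehashIndex] = null;
            while (e != null) {
                Entry<K, V> next = e.next;
                int j = index(e.key, incrementalRehashTable.length);
                e.next = incrementalRehashTable[j]; // relinks in place; a COW map must not mutate nodes a snapshot still sees
                incrementalRehashTable[j] = e;
                primarySize--;
                rehashSize++;
                e = next;
            }
        }
        if (rehashIndex == primaryTable.length) { // all buckets moved: new table becomes primary
            primaryTable = incrementalRehashTable;
            incrementalRehashTable = null;
            primarySize = rehashSize;
            rehashSize = 0;
            rehashIndex = 0;
        }
    }

    public void put(K key, V value) {
        migrateSomeBuckets();
        Entry<K, V>[] tab = tableFor(key);
        int i = index(key, tab.length);
        for (Entry<K, V> e = tab[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) { e.value = value; return; }
        }
        tab[i] = new Entry<>(key, value, tab[i]);
        if (tab == primaryTable) primarySize++; else rehashSize++;
        // Start a rehash into a table twice as large once the load factor exceeds 0.75
        if (incrementalRehashTable == null && primarySize > primaryTable.length * 3 / 4) {
            incrementalRehashTable = newTable(primaryTable.length * 2);
            rehashIndex = 0;
        }
    }

    public V get(K key) {
        migrateSomeBuckets(); // in this sketch, reads also advance the rehash
        Entry<K, V>[] tab = tableFor(key);
        for (Entry<K, V> e = tab[index(key, tab.length)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public int size() {
        return primarySize + rehashSize;
    }

    public static void main(String[] args) {
        IncrementalRehashMap<Integer, Integer> map = new IncrementalRehashMap<>();
        for (int i = 0; i < 100; i++) map.put(i, i * i);
        System.out.println(map.get(42) + " " + map.size()); // prints "1764 100"
    }
}
```

Spreading the migration over later operations bounds the extra work any single access pays for a resize. The alternative is one long pause that copies the whole table.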
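StateBackend is Flink's state backend abstraction. It provides the KeyedStateBackend, the OperatorStateBackend and the CheckpointStorage. Depending on which KeyedStateBackend and CheckpointStorage it creates, there are three state backends. MemoryStateBackend keeps keyed state on the heap and stores checkpoints in JobManager memory. FsStateBackend also keeps keyed state on the heap, but writes checkpoints to a file system such as HDFS. RocksDBStateBackend keeps keyed state in RocksDB and writes checkpoints to a file system.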