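When the "factory container" is used to create the "super component", one of the steps is to create the DispatcherRunner. One of the core arguments for creating the DispatcherRunner is HaServicesJobGraphStoreFactory. It is an implementation of JobGraphStoreFactory and builds a JobGraphStore via the factory pattern.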
// Create the DispatcherRunner; it is started later by the LeaderElectionService.
// The DispatcherRunner is created by the DispatcherRunnerFactory. The Dispatcher component
// depends on the DispatcherRunner to start and run.
// The DispatcherRunner gives the Dispatcher its startup, run, and leader election capabilities.
dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(
        // Obtain the Dispatcher's leader election service from the high availability services
        highAvailabilityServices.getDispatcherLeaderElectionService(),
        fatalErrorHandler,
        // Create the JobGraphStoreFactory implementation, which will create the JobGraphStore.
        // The DispatcherLeaderProcess can "recover JobGraphs" because a JobGraphListener
        // listens to the JobGraphStore: every JobGraph added to or removed from the
        // JobGraphStore is reported to the DispatcherLeaderProcess through the JobGraphListener.
        new HaServicesJobGraphStoreFactory(highAvailabilityServices),
        ioExecutor,
        rpcService,
        partialDispatcherServices);
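Among the JobGraphStore implementations, only ZooKeeperJobGraphStore supports persisting and recovering JobGraphs. Additions to and removals from the JobGraphStore are observed through a JobGraphListener, and the listening party is the DispatcherLeaderProcess. When a JobGraph in the JobGraphStore changes, the JobGraphListener notifies the DispatcherLeaderProcess right away, and the DispatcherLeaderProcess then decides whether the corresponding job needs to be started or stopped.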
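The store and listener relationship can be sketched roughly as follows. This is a minimal in-memory model, not Flink's code: the names `Listener`, `InMemoryStore`, `RecordingProcess`, `onAdded` and `onRemoved` are hypothetical stand-ins, and the sketch notifies on every local change, which is a simplification of what a ZooKeeper-backed store does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the store/listener relationship (not Flink's classes). */
final class JobGraphListenerSketch {

    /** Callback interface playing the role of the listener described above. */
    interface Listener {
        void onAdded(String jobId);

        void onRemoved(String jobId);
    }

    /** In-memory store that notifies its single listener whenever an entry is added or removed. */
    static final class InMemoryStore {
        private final Map<String, String> graphs = new LinkedHashMap<>();
        private Listener listener;

        /** Registers the listener, similar in spirit to starting the store with a listener. */
        void start(Listener listener) {
            this.listener = listener;
        }

        void put(String jobId, String graph) {
            boolean isNew = graphs.put(jobId, graph) == null;
            if (isNew && listener != null) {
                listener.onAdded(jobId);
            }
        }

        void remove(String jobId) {
            if (graphs.remove(jobId) != null && listener != null) {
                listener.onRemoved(jobId);
            }
        }
    }

    /** Stand-in for the leader process: it only records the notifications it receives. */
    static final class RecordingProcess implements Listener {
        final List<String> events = new ArrayList<>();

        @Override
        public void onAdded(String jobId) {
            events.add("added:" + jobId);
        }

        @Override
        public void onRemoved(String jobId) {
            events.add("removed:" + jobId);
        }
    }

    /** Adds two entries, removes one, and returns the notifications seen by the listener. */
    static List<String> demo() {
        InMemoryStore store = new InMemoryStore();
        RecordingProcess process = new RecordingProcess();
        store.start(process);
        store.put("job-1", "graph-1");
        store.put("job-2", "graph-2");
        store.remove("job-1");
        return process.events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the process that reacts to changes is passed into the store when the store is started, so the store never needs to know what the listener does with a notification.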
The JobGraphStoreFactory is already created when the DispatcherRunner is built:
new HaServicesJobGraphStoreFactory(highAvailabilityServices)
When a DispatcherLeaderProcess is created, the JobGraphStore is created along with it (via the factory pattern):
/**
 * Creates a DispatcherLeaderProcess through the DispatcherLeaderProcessFactory.
 */
@Override
public DispatcherLeaderProcess create(UUID leaderSessionID) {
    return SessionDispatcherLeaderProcess.create(
            leaderSessionID,
            dispatcherGatewayServiceFactory,
            // Create the JobGraphStore via the factory
            jobGraphStoreFactory.create(),
            ioExecutor,
            fatalErrorHandler);
}
HaServicesJobGraphStoreFactory is an implementation of JobGraphStoreFactory; the method that actually creates the JobGraphStore is provided by the high availability services:
/**
 * Creates the JobGraphStore.
 */
@Override
public JobGraphStore create() {
    try {
        // HighAvailabilityServices provides the method that creates the JobGraphStore
        return highAvailabilityServices.getJobGraphStore();
    } catch (Exception e) {
        throw new FlinkRuntimeException(
                String.format(
                        "Could not create %s from %s.",
                        JobGraphStore.class.getSimpleName(),
                        highAvailabilityServices.getClass().getSimpleName()),
                e);
    }
}
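One way to read the design above: the runner is handed a factory rather than a store, so no store is created until a leader process is actually created. The sketch below illustrates that deferral with plain Java. `FakeHaServices`, `LeaderProcessFactory` and the string-based "store" are hypothetical stand-ins chosen for illustration, not Flink types.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of deferring store creation behind a factory (not Flink's classes). */
final class DeferredStoreFactorySketch {

    /** Stand-in for the high availability services; counts how many stores were requested. */
    static final class FakeHaServices {
        final AtomicInteger created = new AtomicInteger();

        String getStore() {
            return "store-" + created.incrementAndGet();
        }
    }

    /** Stand-in for the leader process factory: it holds the store factory, not a store. */
    static final class LeaderProcessFactory {
        private final Supplier<String> storeFactory;

        LeaderProcessFactory(Supplier<String> storeFactory) {
            this.storeFactory = storeFactory;
        }

        /** The store is only requested here, once per created leader process. */
        String create(String leaderSessionId) {
            return leaderSessionId + " uses " + storeFactory.get();
        }
    }

    public static void main(String[] args) {
        FakeHaServices haServices = new FakeHaServices();
        // Building the factory does not touch the HA services yet.
        LeaderProcessFactory factory = new LeaderProcessFactory(haServices::getStore);
        System.out.println("stores created so far: " + haServices.created.get());
        System.out.println(factory.create("session-a"));
        System.out.println(factory.create("session-b"));
    }
}
```

In this sketch each call to `create` requests a store of its own, which mirrors `jobGraphStoreFactory.create()` being called inside the leader process factory.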
When SessionDispatcherLeaderProcess starts, it first starts the JobGraphStore, passing itself in as the listener:
/**
 * Starts the JobGraphStore.
 */
private void startServices() {
    try {
        jobGraphStore.start(this);
    } catch (Exception e) {
        throw new FlinkRuntimeException(
                String.format(
                        "Could not start %s when trying to start the %s.",
                        jobGraphStore.getClass().getSimpleName(),
                        getClass().getSimpleName()),
                e);
    }
}
Next, the JobGraphs are recovered asynchronously from the JobGraphStore:
/**
 * Recovers the JobGraphs from the JobGraphStore (this runs asynchronously).
 */
private Collection<JobGraph> recoverJobs() {
    log.info("Recover all persisted job graphs.");
    // Fetch the list of JobIDs from the JobGraphStore
    final Collection<JobID> jobIds = getJobIds();
    final Collection<JobGraph> recoveredJobGraphs = new ArrayList<>();
    for (JobID jobId : jobIds) {
        // Look up the JobGraph for this JobID in the JobGraphStore and add it to the list
        recoveredJobGraphs.add(recoverJob(jobId));
    }
    log.info("Successfully recovered {} persisted job graphs.", recoveredJobGraphs.size());
    // Return the list of recovered JobGraphs
    return recoveredJobGraphs;
}
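The recovery loop itself is blocking; the asynchrony comes from where it is run. A minimal sketch of that idea, assuming the loop is handed to an I/O executor with `CompletableFuture.supplyAsync`: `recoverJobsAsync` and the string "job graphs" are hypothetical names for illustration, and how the real process wires this up is not shown in the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of running a blocking recovery loop on an I/O executor. */
final class AsyncRecoverySketch {

    /** Blocking part: look up one "graph" per job id and collect the results in order. */
    static Collection<String> recoverJobs(Collection<String> jobIds) {
        List<String> recovered = new ArrayList<>();
        for (String jobId : jobIds) {
            // Placeholder for the per-job lookup in the store
            recovered.add("graph-of-" + jobId);
        }
        return recovered;
    }

    /** Asynchronous wrapper: the caller gets a future and is not blocked by the lookup. */
    static CompletableFuture<Collection<String>> recoverJobsAsync(
            Collection<String> jobIds, Executor ioExecutor) {
        return CompletableFuture.supplyAsync(() -> recoverJobs(jobIds), ioExecutor);
    }

    public static void main(String[] args) {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            Collection<String> graphs =
                    recoverJobsAsync(Arrays.asList("a", "b"), ioExecutor).join();
            System.out.println(graphs);
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```

Keeping the lookup off the calling thread matters because reading persisted job graphs may involve slow I/O.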
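Once the JobGraphs have been obtained, a Dispatcher is created to schedule and execute them. In essence, all JobGraphs that need to be recovered are placed into a HashSet held by the Dispatcher. The Dispatcher then iterates over this set, dispatches each JobGraph, and arranges for a JobManager to execute it.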
/**
 * Restarts the recovered JobGraphs so that the Dispatcher schedules and executes them again.
 */
private void startRecoveredJobs() {
    for (JobGraph recoveredJob : recoveredJobs) {
        // Let the Dispatcher schedule the JobGraph again (arrange for a JobManager to run it)
        FutureUtils.assertNoException(runJob(recoveredJob)
                .handle(handleRecoveredJobStartError(recoveredJob.getJobID())));
    }
    // After every recovered JobGraph has been handed to runJob, empty the HashSet in one call
    recoveredJobs.clear();
}
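The iterate-then-clear pattern can be sketched as follows. This is a simplified model under stated assumptions: `MiniDispatcher`, its string "job graphs", and the rule that names starting with `bad` fail to start are all hypothetical, and start failures are only recorded here, whereas the real error handler may react differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of starting every recovered job and then clearing the set. */
final class StartRecoveredJobsSketch {

    static final class MiniDispatcher {
        private final Set<String> recoveredJobs;
        final List<String> started = new ArrayList<>();
        final List<String> failed = new ArrayList<>();

        MiniDispatcher(Collection<String> recovered) {
            // Copy into a HashSet, matching the set-based description above
            this.recoveredJobs = new HashSet<>(recovered);
        }

        /** Pretends to start a job; names starting with "bad" fail (illustrative rule only). */
        private CompletableFuture<Void> runJob(String job) {
            if (job.startsWith("bad")) {
                CompletableFuture<Void> result = new CompletableFuture<>();
                result.completeExceptionally(new IllegalStateException("cannot start " + job));
                return result;
            }
            started.add(job);
            return CompletableFuture.completedFuture(null);
        }

        void startRecoveredJobs() {
            for (String job : recoveredJobs) {
                // Attach an error handler per job; this sketch only records the failure
                runJob(job).handle((ignored, error) -> {
                    if (error != null) {
                        failed.add(job);
                    }
                    return null;
                });
            }
            // All jobs have been handed over, so the whole set is emptied at once
            recoveredJobs.clear();
        }

        int pending() {
            return recoveredJobs.size();
        }
    }

    public static void main(String[] args) {
        MiniDispatcher dispatcher = new MiniDispatcher(Arrays.asList("a", "bad-b", "c"));
        dispatcher.startRecoveredJobs();
        System.out.println("pending after start: " + dispatcher.pending());
    }
}
```

Note that the set is cleared once after the loop, regardless of whether an individual job started successfully; per-job outcomes are handled through the future returned by `runJob`.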