Related articles:
Flink 1.13 Source Code Analysis: Table of Contents
Flink 1.13 Source Code Analysis Primer: The Akka Communication Model
Flink 1.13 Source Code Analysis: JobManager Startup, Starting the WebMonitorEndpoint
Flink 1.13 Source Code Analysis: Flink Job Submission, Part 1
Contents
Preface
1. The JobSubmitHandler parses the JobGraph and hands it to the Dispatcher
2. The Dispatcher receives the JobGraph, then initializes and starts the JobMaster
2.1 Initializing the basic services the JobMaster needs
2.2 The JobMaster leader election process
2.3 Initializing and starting the JobMaster
Summary
In the previous chapter we saw that, inside env.execute, a StreamGraph is built from the collection of Transformations we assembled, the StreamGraph is then converted into a JobGraph, and the JobGraph is persisted. Finally, the JobGraph file, the dependency jars, and some other configuration are packaged into a RequestBody and sent, via the Netty client built inside the RestClient, to the Netty server inside the JobManager's WebMonitorEndpoint, where the URL is parsed and the request is handed to the JobSubmitHandler.
In this chapter we analyze what the JobManager does after it receives the HttpRequest sent by the RestClient.
The JobGraph built on the client side, together with the resources it needs, is sent to the WebMonitorEndpoint. Inside the WebMonitorEndpoint there is a Router that parses the URL, dispatches the request to the handler registered for that URL, and then calls back that handler's handleRequest method. Let's go straight to JobSubmitHandler's handleRequest method:
/*
 TODO Deserialize the JobGraph from the file on disk and hand it over to the Dispatcher
*/
@Override
protected CompletableFuture<JobSubmitResponseBody> handleRequest(
        @Nonnull HandlerRequest<JobSubmitRequestBody, EmptyMessageParameters> request,
        @Nonnull DispatcherGateway gateway)
        throws RestHandlerException {
    // TODO Get the uploaded files from the request, including the serialized JobGraph file, into nameToFile
    final Collection<File> uploadedFiles = request.getUploadedFiles();
    final Map<String, Path> nameToFile =
            uploadedFiles.stream()
                    .collect(Collectors.toMap(File::getName, Path::fromLocalFile));
    if (uploadedFiles.size() != nameToFile.size()) {
        throw new RestHandlerException(
                String.format(
                        "The number of uploaded files was %s than the expected count. Expected: %s Actual %s",
                        uploadedFiles.size() < nameToFile.size() ? "lower" : "higher",
                        nameToFile.size(),
                        uploadedFiles.size()),
                HttpResponseStatus.BAD_REQUEST);
    }
    // TODO Get the request body
    final JobSubmitRequestBody requestBody = request.getRequestBody();
    if (requestBody.jobGraphFileName == null) {
        throw new RestHandlerException(
                String.format(
                        "The %s field must not be omitted or be null.",
                        JobSubmitRequestBody.FIELD_NAME_JOB_GRAPH),
                HttpResponseStatus.BAD_REQUEST);
    }
    // TODO Deserialize the JobGraph
    // TODO As you can see, what the server receives from the client is essentially a JobGraph
    CompletableFuture<JobGraph> jobGraphFuture = loadJobGraph(requestBody, nameToFile);
    // TODO Get the job's own jar files
    Collection<Path> jarFiles = getJarFilesToUpload(requestBody.jarFileNames, nameToFile);
    // TODO Get the job's dependency jars (artifacts)
    Collection<Tuple2<String, Path>> artifacts =
            getArtifactFilesToUpload(requestBody.artifactFileNames, nameToFile);
    // TODO Upload the JobGraph + program jar + dependency jars to the BlobServer
    CompletableFuture<JobGraph> finalizedJobGraphFuture =
            uploadJobGraphFiles(gateway, jobGraphFuture, jarFiles, artifacts, configuration);
    // TODO Hand the JobGraph over to the Dispatcher
    CompletableFuture<Acknowledge> jobSubmissionFuture =
            finalizedJobGraphFuture.thenCompose(
                    // TODO The JobSubmitHandler hands the JobGraph over to the Dispatcher for processing
                    // TODO The gateway here is a proxy object for the Dispatcher
                    jobGraph -> gateway.submitJob(jobGraph, timeout));
    return jobSubmissionFuture.thenCombine(
            jobGraphFuture,
            (ack, jobGraph) -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
}
This method does the following:
1. Gets the uploaded files from the request, including the serialized JobGraph file, and builds the nameToFile map.
2. Extracts the request body.
3. Deserializes the JobGraph from the uploaded file referenced by the request body.
4. Collects the job's own jar files.
5. Collects the job's dependency jars.
6. Uploads the JobGraph, the job's jar, and the dependency jars to the BlobServer.
7. Hands the JobGraph over to the Dispatcher.
Let's first look at how the JobGraph is parsed; open the loadJobGraph method:
private CompletableFuture<JobGraph> loadJobGraph(
        JobSubmitRequestBody requestBody, Map<String, Path> nameToFile)
        throws MissingFileException {
    final Path jobGraphFile =
            getPathAndAssertUpload(
                    requestBody.jobGraphFileName, FILE_TYPE_JOB_GRAPH, nameToFile);
    // TODO Deserialize the JobGraph from the file
    return CompletableFuture.supplyAsync(
            () -> {
                JobGraph jobGraph;
                try (ObjectInputStream objectIn =
                        new ObjectInputStream(
                                jobGraphFile.getFileSystem().open(jobGraphFile))) {
                    jobGraph = (JobGraph) objectIn.readObject();
                } catch (Exception e) {
                    throw new CompletionException(
                            new RestHandlerException(
                                    "Failed to deserialize JobGraph.",
                                    HttpResponseStatus.BAD_REQUEST,
                                    e));
                }
                return jobGraph;
            },
            executor);
}
As you can see, the JobGraph file is read from the file system and deserialized back into a JobGraph.
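Since loadJobGraph is plain Java object deserialization, the round trip is easy to reproduce outside of Flink. Below is a minimal, self-contained sketch in which a String stands in for the JobGraph and the temporary file path is made up for illustration; it shows the client-side write from the previous chapter and the server-side read above:

import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

public class JobGraphFileRoundTrip {
    public static void main(String[] args) throws Exception {
        Path jobGraphFile = Files.createTempFile("job-graph", ".bin");

        // Client side (previous chapter): serialize the object to a file.
        Serializable payload = "pretend-this-is-a-JobGraph";
        try (ObjectOutputStream out =
                new ObjectOutputStream(Files.newOutputStream(jobGraphFile))) {
            out.writeObject(payload);
        }

        // Server side (loadJobGraph above): read it back with ObjectInputStream.
        try (ObjectInputStream in =
                new ObjectInputStream(Files.newInputStream(jobGraphFile))) {
            Object restored = in.readObject();
            System.out.println("Deserialized: " + restored);
        }
    }
}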
Next, let's look at how the JobGraph, the program jar, and the dependency jars are uploaded to the BlobServer; open the uploadJobGraphFiles method:
private CompletableFuture<JobGraph> uploadJobGraphFiles(
        DispatcherGateway gateway,
        CompletableFuture<JobGraph> jobGraphFuture,
        Collection<Path> jarFiles,
        Collection<Tuple2<String, Path>> artifacts,
        Configuration configuration) {
    CompletableFuture<Integer> blobServerPortFuture = gateway.getBlobServerPort(timeout);
    return jobGraphFuture.thenCombine(
            blobServerPortFuture,
            (JobGraph jobGraph, Integer blobServerPort) -> {
                final InetSocketAddress address =
                        new InetSocketAddress(gateway.getHostname(), blobServerPort);
                try {
                    // TODO Blocking (BIO) communication: BlobClient => BlobServer
                    ClientUtils.uploadJobGraphFiles(
                            jobGraph,
                            jarFiles,
                            artifacts,
                            () -> new BlobClient(address, configuration));
                } catch (FlinkException e) {
                    throw new CompletionException(
                            new RestHandlerException(
                                    "Could not upload job files.",
                                    HttpResponseStatus.INTERNAL_SERVER_ERROR,
                                    e));
                }
                return jobGraph;
            });
}
The resource files associated with the JobGraph are uploaded here through a BlobClient to the BlobServer. As mentioned when we covered JobManager startup, the BlobServer also runs an hourly scheduled task that cleans up resource files that are no longer needed.
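The actual cleanup logic lives in the BlobServer and is not shown here, but the idea of an hourly TTL sweep is easy to picture. The following is only a conceptual sketch; the map, the TTL value, and the class name are all illustrative and are not Flink's implementation:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TtlCleanupSketch {
    public static void main(String[] args) {
        // blob key -> last access timestamp in milliseconds
        Map<String, Long> blobAccessTime = new ConcurrentHashMap<>();
        long ttlMillis = TimeUnit.HOURS.toMillis(1);

        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
        // Every hour, drop entries that have not been touched within the TTL,
        // analogous to the periodic BlobServer cleanup mentioned above.
        cleaner.scheduleWithFixedDelay(
                () -> {
                    long now = System.currentTimeMillis();
                    blobAccessTime.entrySet().removeIf(e -> now - e.getValue() > ttlMillis);
                },
                1, 1, TimeUnit.HOURS);
    }
}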
The JobGraph is handed to the Dispatcher by calling a method on the Dispatcher's proxy object. Step into gateway.submitJob and pick the Dispatcher implementation:
@Override
public CompletableFuture<Acknowledge> submitJob(JobGraph jobGraph, Time timeout) {
    log.info("Received JobGraph submission {} ({}).", jobGraph.getJobID(), jobGraph.getName());
    try {
        // TODO Check for duplicate job IDs
        if (isDuplicateJob(jobGraph.getJobID())) {
            return FutureUtils.completedExceptionally(
                    new DuplicateJobSubmissionException(jobGraph.getJobID()));
        } else if (isPartialResourceConfigured(jobGraph)) {
            return FutureUtils.completedExceptionally(
                    new JobSubmissionException(
                            jobGraph.getJobID(),
                            "Currently jobs is not supported if parts of the vertices have "
                                    + "resources configured. The limitation will be removed in future versions."));
        } else {
            // TODO Submit the job; by now the jars and files the JobGraph needs have all been uploaded
            // TODO The JobGraph passed along here is later used to build the ExecutionGraph when the JobMaster starts
            return internalSubmitJob(jobGraph);
        }
    } catch (FlinkException e) {
        return FutureUtils.completedExceptionally(e);
    }
}
By the time execution reaches this point, the jars and other resource files the JobGraph needs have already been uploaded to the BlobServer. Continue into internalSubmitJob(jobGraph):
private CompletableFuture<Acknowledge> internalSubmitJob(JobGraph jobGraph) {
    log.info("Submitting job {} ({}).", jobGraph.getJobID(), jobGraph.getName());
    final CompletableFuture<Acknowledge> persistAndRunFuture =
            // TODO Persist first, then run (bring up the JobMaster): this::persistAndRunJob
            waitForTerminatingJob(jobGraph.getJobID(), jobGraph, this::persistAndRunJob)
                    .thenApply(ignored -> Acknowledge.get());
    return persistAndRunFuture.handleAsync(
            (acknowledge, throwable) -> {
                if (throwable != null) {
                    cleanUpJobData(jobGraph.getJobID(), true);
                    ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(throwable);
                    final Throwable strippedThrowable =
                            ExceptionUtils.stripCompletionException(throwable);
                    log.error(
                            "Failed to submit job {}.", jobGraph.getJobID(), strippedThrowable);
                    throw new CompletionException(
                            new JobSubmissionException(
                                    jobGraph.getJobID(),
                                    "Failed to submit job.",
                                    strippedThrowable));
                } else {
                    return acknowledge;
                }
            },
            ioExecutor);
}
Continue into the this::persistAndRunJob method:
private void persistAndRunJob(JobGraph jobGraph) throws Exception {
    // TODO The server persists the JobGraph here: it is written to a FileSystem (e.g. HDFS),
    //  which returns a state handle, and that handle is then stored in ZooKeeper
    // TODO As mentioned when covering master-node startup, the Dispatcher starts a JobGraphStore service,
    //  and any JobGraphs that have not finished executing are recovered from it first
    // TODO JobGraphWriter = DefaultJobGraphStore
    jobGraphWriter.putJobGraph(jobGraph);
    // TODO Then run the job
    runJob(jobGraph, ExecutionType.SUBMISSION);
}
As mentioned when covering master-node startup, the Dispatcher starts a JobGraphStore service, and any JobGraphs that have not finished executing are recovered from it first. The jobGraphWriter here is that JobGraphStore. Step into jobGraphWriter.putJobGraph(jobGraph) and pick the DefaultJobGraphStore implementation:
@Override
public void putJobGraph(JobGraph jobGraph) throws Exception {
    checkNotNull(jobGraph, "Job graph");
    final JobID jobID = jobGraph.getJobID();
    final String name = jobGraphStoreUtil.jobIDToName(jobID);
    LOG.debug("Adding job graph {} to {}.", jobID, jobGraphStateHandleStore);
    boolean success = false;
    while (!success) {
        synchronized (lock) {
            verifyIsRunning();
            final R currentVersion = jobGraphStateHandleStore.exists(name);
            if (!currentVersion.isExisting()) {
                try {
                    // TODO Store the JobGraph and lock the node
                    jobGraphStateHandleStore.addAndLock(name, jobGraph);
                    addedJobGraphs.add(jobID);
                    success = true;
                } catch (StateHandleStore.AlreadyExistException ignored) {
                    LOG.warn("{} already exists in {}.", jobGraph, jobGraphStateHandleStore);
                }
            } else if (addedJobGraphs.contains(jobID)) {
                try {
                    jobGraphStateHandleStore.replace(name, currentVersion, jobGraph);
                    LOG.info("Updated {} in {}.", jobGraph, getClass().getSimpleName());
                    success = true;
                } catch (StateHandleStore.NotExistException ignored) {
                    LOG.warn("{} does not exists in {}.", jobGraph, jobGraphStateHandleStore);
                }
            } else {
                throw new IllegalStateException(
                        "Trying to update a graph you didn't "
                                + "#getAllSubmittedJobGraphs() or #putJobGraph() yourself before.");
            }
        }
    }
    LOG.info("Added {} to {}.", jobGraph, jobGraphStateHandleStore);
}
This code gathers some information about the job and checks whether it already exists in the store. Step into jobGraphStateHandleStore.addAndLock and pick the ZooKeeper implementation:
@Override
public RetrievableStateHandle<T> addAndLock(String pathInZooKeeper, T state)
        throws PossibleInconsistentStateException, Exception {
    checkNotNull(pathInZooKeeper, "Path in ZooKeeper");
    checkNotNull(state, "State");
    final String path = normalizePath(pathInZooKeeper);
    if (exists(path).isExisting()) {
        throw new AlreadyExistException(
                String.format("ZooKeeper node %s already exists.", path));
    }
    // TODO Store the state on the file system and get back a state handle
    final RetrievableStateHandle<T> storeHandle = storage.store(state);
    // TODO Serialize the state handle into a byte array
    final byte[] serializedStoreHandle = serializeOrDiscard(storeHandle);
    try {
        // TODO Store the serialized handle in ZooKeeper
        writeStoreHandleTransactionally(path, serializedStoreHandle);
        return storeHandle;
    } catch (KeeperException.NodeExistsException e) {
        // Transactions are not idempotent in the curator version we're currently using, so it
        // is actually possible that we've re-tried a transaction that has already succeeded.
        // We've ensured that the node hasn't been present prior executing the transaction, so
        // we can assume that this is a result of the retry mechanism.
        return storeHandle;
    } catch (Exception e) {
        if (indicatesPossiblyInconsistentState(e)) {
            throw new PossibleInconsistentStateException(e);
        }
        // In case of any other failure, discard the state and rethrow the exception.
        storeHandle.discardState();
        throw e;
    }
}
As you can see, the JobGraph is first persisted to an external storage system such as HDFS, which yields a handle; that handle is then stored in ZooKeeper. Keeping only the small handle in ZooKeeper, rather than the JobGraph itself, is done for performance reasons.
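The pattern is: store the large payload on a shared file system and keep only a small pointer (the handle) in ZooKeeper, since ZooKeeper is not designed for large values. Below is a minimal sketch of that pattern using Curator directly; it assumes a ZooKeeper at localhost:2181, and the file and znode paths are made up for illustration rather than Flink's actual layout:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HandleInZooKeeperSketch {
    public static void main(String[] args) throws Exception {
        // 1. Persist the (potentially large) serialized JobGraph to a shared file system.
        Path stateFile = Paths.get("/tmp/flink-ha/job-graph-0001");
        Files.createDirectories(stateFile.getParent());
        Files.write(stateFile, new byte[] {1, 2, 3}); // stands in for the serialized JobGraph

        // 2. Store only the small handle (here simply the file path) in ZooKeeper.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();
        client.create()
                .creatingParentsIfNeeded()
                .forPath("/demo/jobgraphs/0001",
                        stateFile.toString().getBytes(StandardCharsets.UTF_8));
        client.close();
    }
}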
With the JobGraph persisted, the job can now be run. Let's go back to this code:
private void persistAndRunJob(JobGraph jobGraph) throws Exception {
    // TODO The server persists the JobGraph here: it is written to a FileSystem (e.g. HDFS),
    //  which returns a state handle, and that handle is then stored in ZooKeeper
    // TODO As mentioned when covering master-node startup, the Dispatcher starts a JobGraphStore service,
    //  and any JobGraphs that have not finished executing are recovered from it first
    // TODO JobGraphWriter = DefaultJobGraphStore
    jobGraphWriter.putJobGraph(jobGraph);
    // TODO Then run the job
    runJob(jobGraph, ExecutionType.SUBMISSION);
}
Step into the runJob method:
private void runJob(JobGraph jobGraph, ExecutionType executionType) throws Exception {
    Preconditions.checkState(!runningJobs.containsKey(jobGraph.getJobID()));
    long initializationTimestamp = System.currentTimeMillis();
    /*
     TODO Create a JobManagerRunner. This is a launcher that internally builds a DefaultJobMasterServiceProcessFactory.
      Once the JobMaster leader election finishes, that factory does two important things:
      1. It creates the JobMaster instance
      2. While creating the JobMaster, it also turns the JobGraph into an ExecutionGraph
     TODO A Flink cluster contains two master/worker pairs:
      1. Resource management: ResourceManager + TaskExecutor
      2. Job execution:       JobMaster + StreamTask
    */
    JobManagerRunner jobManagerRunner =
            createJobManagerRunner(jobGraph, initializationTimestamp);
    // TODO Add it to the runningJobs map
    runningJobs.put(jobGraph.getJobID(), jobManagerRunner);
    final JobID jobId = jobGraph.getJobID();
    final CompletableFuture<CleanupJobState> cleanupJobStateFuture =
            jobManagerRunner
                    .getResultFuture()
                    .handleAsync(
                            (jobManagerRunnerResult, throwable) -> {
                                Preconditions.checkState(
                                        runningJobs.get(jobId) == jobManagerRunner,
                                        "The job entry in runningJobs must be bound to the lifetime of the JobManagerRunner.");
                                if (jobManagerRunnerResult != null) {
                                    return handleJobManagerRunnerResult(
                                            jobManagerRunnerResult, executionType);
                                } else {
                                    return jobManagerRunnerFailed(jobId, throwable);
                                }
                            },
                            getMainThreadExecutor());
    final CompletableFuture<Void> jobTerminationFuture =
            cleanupJobStateFuture
                    .thenApply(cleanupJobState -> removeJob(jobId, cleanupJobState))
                    .thenCompose(Function.identity());
    FutureUtils.assertNoException(jobTerminationFuture);
    registerJobManagerRunnerTerminationFuture(jobId, jobTerminationFuture);
}
This code first builds a JobManagerRunner, which acts as a launcher; despite the name, it is not the master-node JobManager process we usually talk about. Step into createJobManagerRunner:
JobManagerRunner createJobManagerRunner(JobGraph jobGraph, long initializationTimestamp)
        throws Exception {
    final RpcService rpcService = getRpcService();
    // TODO Build the JobManagerRunner, which wraps a DefaultJobMasterServiceProcessFactory.
    //  That factory builds and starts the JobMaster once leader election has finished.
    JobManagerRunner runner =
            jobManagerRunnerFactory.createJobManagerRunner(
                    jobGraph,
                    configuration,
                    rpcService,
                    highAvailabilityServices,
                    heartbeatServices,
                    jobManagerSharedServices,
                    new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
                    fatalErrorHandler,
                    initializationTimestamp);
    // TODO Start the JobMaster leader election. Once it succeeds, the JobMaster is created and
    //  started from ZooKeeperLeaderElectionDriver's isLeader callback.
    runner.start();
    return runner;
}
As you can see, a JobManagerRunner is built through a factory method and then started.
Let's look at jobManagerRunnerFactory.createJobManagerRunner:
@Override
public JobManagerRunner createJobManagerRunner(
        JobGraph jobGraph,
        Configuration configuration,
        RpcService rpcService,
        HighAvailabilityServices highAvailabilityServices,
        HeartbeatServices heartbeatServices,
        JobManagerSharedServices jobManagerServices,
        JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
        FatalErrorHandler fatalErrorHandler,
        long initializationTimestamp)
        throws Exception {
    checkArgument(jobGraph.getNumberOfVertices() > 0, "The given job is empty");
    final JobMasterConfiguration jobMasterConfiguration =
            JobMasterConfiguration.fromConfiguration(configuration);
    final RunningJobsRegistry runningJobsRegistry =
            highAvailabilityServices.getRunningJobsRegistry();
    // TODO Get the election service in preparation for the JobMaster leader election
    final LeaderElectionService jobManagerLeaderElectionService =
            highAvailabilityServices.getJobManagerLeaderElectionService(jobGraph.getJobID());
    final SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory =
            DefaultSlotPoolServiceSchedulerFactory.fromConfiguration(
                    configuration, jobGraph.getJobType());
    if (jobMasterConfiguration.getConfiguration().get(JobManagerOptions.SCHEDULER_MODE)
            == SchedulerExecutionMode.REACTIVE) {
        Preconditions.checkState(
                slotPoolServiceSchedulerFactory.getSchedulerType()
                        == JobManagerOptions.SchedulerType.Adaptive,
                "Adaptive Scheduler is required for reactive mode");
    }
    final ShuffleMaster<?> shuffleMaster =
            ShuffleServiceLoader.loadShuffleServiceFactory(configuration)
                    .createShuffleMaster(configuration);
    final LibraryCacheManager.ClassLoaderLease classLoaderLease =
            jobManagerServices
                    .getLibraryCacheManager()
                    .registerClassLoaderLease(jobGraph.getJobID());
    final ClassLoader userCodeClassLoader =
            classLoaderLease
                    .getOrResolveClassLoader(
                            jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths())
                    .asClassLoader();
    // TODO Build the DefaultJobMasterServiceFactory, which wraps the basic services the JobMaster needs to start
    final DefaultJobMasterServiceFactory jobMasterServiceFactory =
            new DefaultJobMasterServiceFactory(
                    jobManagerServices.getScheduledExecutorService(),
                    rpcService,
                    jobMasterConfiguration,
                    jobGraph,
                    highAvailabilityServices,
                    slotPoolServiceSchedulerFactory,
                    jobManagerServices,
                    heartbeatServices,
                    jobManagerJobMetricGroupFactory,
                    fatalErrorHandler,
                    userCodeClassLoader,
                    shuffleMaster,
                    initializationTimestamp);
    final DefaultJobMasterServiceProcessFactory jobMasterServiceProcessFactory =
            new DefaultJobMasterServiceProcessFactory(
                    jobGraph.getJobID(),
                    jobGraph.getName(),
                    jobGraph.getCheckpointingSettings(),
                    initializationTimestamp,
                    jobMasterServiceFactory);
    return new JobMasterServiceLeadershipRunner(
            jobMasterServiceProcessFactory,
            jobManagerLeaderElectionService,
            runningJobsRegistry,
            classLoaderLease,
            fatalErrorHandler);
}
This method is fairly long, but its structure is clear: it initializes the basic services the JobMaster needs, such as the JobMaster leader election service jobManagerLeaderElectionService, and it builds a very important component, the DefaultJobMasterServiceProcessFactory, inside which the JobMaster is later initialized and started.
Now let's go back to the previous code and follow how the runner starts.
Step into runner.start():
@Override
public void start() throws Exception {
    LOG.debug("Start leadership runner for job {}.", getJobID());
    // TODO Kick off leader election for this runner
    leaderElectionService.start(this);
}
Then step into leaderElectionService.start and pick the DefaultLeaderElectionService implementation:
@Override
public final void start(LeaderContender contender) throws Exception {
    checkNotNull(contender, "Contender must not be null.");
    Preconditions.checkState(leaderContender == null, "Contender was already set.");
    synchronized (lock) {
        /*
         TODO When called from the WebMonitorEndpoint, the contender is the DispatcherRestEndpoint;
          when called from the ResourceManager, the contender is the ResourceManager;
          when called from the DispatcherRunner, the contender is the DispatcherRunner;
          for the JobMaster election here, the contender is the JobMasterServiceLeadershipRunner.
        */
        leaderContender = contender;
        // TODO Create the election driver (leaderElectionDriver) here
        leaderElectionDriver =
                leaderElectionDriverFactory.createLeaderElectionDriver(
                        this,
                        new LeaderElectionFatalErrorHandler(),
                        leaderContender.getDescription());
        LOG.info("Starting DefaultLeaderElectionService with {}.", leaderElectionDriver);
        running = true;
    }
}
We are back in familiar territory: when we analyzed JobManager startup, the elections of the three core JobManager components all went through this same method. Since this is the JobMaster election, the contender here is the JobMasterServiceLeadershipRunner. Continue into leaderElectionDriverFactory.createLeaderElectionDriver and pick the ZooKeeper implementation:
@Override
public ZooKeeperLeaderElectionDriver createLeaderElectionDriver(
        LeaderElectionEventHandler leaderEventHandler,
        FatalErrorHandler fatalErrorHandler,
        String leaderContenderDescription)
        throws Exception {
    // TODO
    return new ZooKeeperLeaderElectionDriver(
            client,
            latchPath,
            leaderPath,
            leaderEventHandler,
            fatalErrorHandler,
            leaderContenderDescription);
}
Then go into the ZooKeeperLeaderElectionDriver constructor:
public ZooKeeperLeaderElectionDriver(
        CuratorFramework client,
        String latchPath,
        String leaderPath,
        LeaderElectionEventHandler leaderElectionEventHandler,
        FatalErrorHandler fatalErrorHandler,
        String leaderContenderDescription)
        throws Exception {
    this.client = checkNotNull(client);
    this.leaderPath = checkNotNull(leaderPath);
    this.leaderElectionEventHandler = checkNotNull(leaderElectionEventHandler);
    this.fatalErrorHandler = checkNotNull(fatalErrorHandler);
    this.leaderContenderDescription = checkNotNull(leaderContenderDescription);
    leaderLatch = new LeaderLatch(client, checkNotNull(latchPath));
    cache = new NodeCache(client, leaderPath);
    client.getUnhandledErrorListenable().addListener(this);
    running = true;
    // TODO Start the election
    leaderLatch.addListener(this);
    leaderLatch.start();
    /*
     TODO Shortly after the election starts, a response comes back:
      1. If this contender wins, its isLeader method is called back
      2. If it loses, its notLeader method is called back
      Each contender has its own election driver
    */
    cache.getListenable().addListener(this);
    cache.start();
    client.getConnectionStateListenable().addListener(listener);
}
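ZooKeeperLeaderElectionDriver is essentially a thin wrapper around Curator's LeaderLatch recipe. As a standalone illustration of the callback mechanism described in the comment above, here is a minimal LeaderLatch example; it assumes a ZooKeeper at localhost:2181, and the latch path is made up for illustration:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        LeaderLatch leaderLatch = new LeaderLatch(client, "/demo/leader-latch");
        // Once the latch is started, Curator calls back isLeader() on the winner
        // and notLeader() when leadership is lost, just like the driver above.
        leaderLatch.addListener(new LeaderLatchListener() {
            @Override
            public void isLeader() {
                System.out.println("Granted leadership");
            }

            @Override
            public void notLeader() {
                System.out.println("Revoked leadership");
            }
        });
        leaderLatch.start();
    }
}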
Leader election starts inside the ZooKeeperLeaderElectionDriver constructor. As mentioned when we covered JobManager startup, once the election completes successfully, the driver's isLeader method is called back. Let's look at that method directly:
/*
 Election succeeded
*/
@Override
public void isLeader() {
    // TODO Notify the event handler that leadership has been granted
    leaderElectionEventHandler.onGrantLeadership();
}
Then go into leaderElectionEventHandler.onGrantLeadership():
@Override
@GuardedBy("lock")
public void onGrantLeadership() {
    synchronized (lock) {
        if (running) {
            issuedLeaderSessionID = UUID.randomUUID();
            clearConfirmedLeaderInformation();
            if (LOG.isDebugEnabled()) {
                LOG.debug(
                        "Grant leadership to contender {} with session ID {}.",
                        leaderContender.getDescription(),
                        issuedLeaderSessionID);
            }
            /*
             TODO There are 4 kinds of contenders, i.e. LeaderContender has 4 implementations:
              1. Dispatcher         = DefaultDispatcherRunner
              2. JobMaster          = JobMasterServiceLeadershipRunner
              3. ResourceManager    = ResourceManager
              4. WebMonitorEndpoint = WebMonitorEndpoint
            */
            leaderContender.grantLeadership(issuedLeaderSessionID);
        } else {
            if (LOG.isDebugEnabled()) {
                LOG.debug(
                        "Ignoring the grant leadership notification since the {} has "
                                + "already been closed.",
                        leaderElectionDriver);
            }
        }
    }
}
Then go into leaderContender.grantLeadership and pick the JobMasterServiceLeadershipRunner implementation:
@Override
public void grantLeadership(UUID leaderSessionID) {
    // TODO Check that the runner is still in the RUNNING state
    runIfStateRunning(
            // TODO Create and start the JobMaster
            () -> startJobMasterServiceProcessAsync(leaderSessionID),
            "starting a new JobMasterServiceProcess");
}
Then go into startJobMasterServiceProcessAsync:
@GuardedBy("lock")
private void startJobMasterServiceProcessAsync(UUID leaderSessionId) {
    sequentialOperation =
            sequentialOperation.thenRun(
                    // TODO Verify that leadership is still valid
                    () ->
                            runIfValidLeader(
                                    leaderSessionId,
                                    ThrowingRunnable.unchecked(
                                            // TODO Create and start the JobMaster
                                            () ->
                                                    verifyJobSchedulingStatusAndCreateJobMasterServiceProcess(
                                                            leaderSessionId)),
                                    "verify job scheduling status and create JobMasterServiceProcess"));
    handleAsyncOperationError(sequentialOperation, "Could not start the job manager.");
}
As you can see, the leader state is validated first; continue into verifyJobSchedulingStatusAndCreateJobMasterServiceProcess:
@GuardedBy("lock")
private void verifyJobSchedulingStatusAndCreateJobMasterServiceProcess(UUID leaderSessionId)
        throws FlinkException {
    final RunningJobsRegistry.JobSchedulingStatus jobSchedulingStatus =
            getJobSchedulingStatus();
    if (jobSchedulingStatus == RunningJobsRegistry.JobSchedulingStatus.DONE) {
        jobAlreadyDone();
    } else {
        // TODO Create and start the JobMaster
        createNewJobMasterServiceProcess(leaderSessionId);
    }
}
Here the job's scheduling status is checked to see whether the job is already done; then go into createNewJobMasterServiceProcess:
@GuardedBy("lock")
private void createNewJobMasterServiceProcess(UUID leaderSessionId) throws FlinkException {
    Preconditions.checkState(jobMasterServiceProcess.closeAsync().isDone());
    LOG.debug(
            "Create new JobMasterServiceProcess because we were granted leadership under {}.",
            leaderSessionId);
    try {
        // TODO Register the job as RUNNING in the running jobs registry
        runningJobsRegistry.setJobRunning(getJobID());
    } catch (IOException e) {
        throw new FlinkException(
                String.format(
                        "Failed to set the job %s to running in the running jobs registry.",
                        getJobID()),
                e);
    }
    // TODO Create and start the JobMaster
    jobMasterServiceProcess = jobMasterServiceProcessFactory.create(leaderSessionId);
    forwardIfValidLeader(
            leaderSessionId,
            jobMasterServiceProcess.getJobMasterGatewayFuture(),
            jobMasterGatewayFuture,
            "JobMasterGatewayFuture from JobMasterServiceProcess");
    forwardResultFuture(leaderSessionId, jobMasterServiceProcess.getResultFuture());
    confirmLeadership(leaderSessionId, jobMasterServiceProcess.getLeaderAddressFuture());
}
As you can see, the job is first registered as RUNNING; then go into jobMasterServiceProcessFactory.create:
@Override
public JobMasterServiceProcess create(UUID leaderSessionId) {
    // TODO Builds and starts the JobMaster internally
    return new DefaultJobMasterServiceProcess(
            jobId,
            leaderSessionId,
            jobMasterServiceFactory,
            cause -> createArchivedExecutionGraph(JobStatus.FAILED, cause));
}
Then open the DefaultJobMasterServiceProcess constructor:
public DefaultJobMasterServiceProcess(
        JobID jobId,
        UUID leaderSessionId,
        JobMasterServiceFactory jobMasterServiceFactory,
        Function<Throwable, ArchivedExecutionGraph> failedArchivedExecutionGraphFactory) {
    this.jobId = jobId;
    this.leaderSessionId = leaderSessionId;
    // TODO Build and start the JobMaster (asynchronously)
    this.jobMasterServiceFuture =
            jobMasterServiceFactory.createJobMasterService(leaderSessionId, this);
    jobMasterServiceFuture.whenComplete(
            (jobMasterService, throwable) -> {
                if (throwable != null) {
                    final JobInitializationException jobInitializationException =
                            new JobInitializationException(
                                    jobId, "Could not start the JobMaster.", throwable);
                    LOG.debug(
                            "Initialization of the JobMasterService for job {} under leader id {} failed.",
                            jobId,
                            leaderSessionId,
                            jobInitializationException);
                    resultFuture.complete(
                            JobManagerRunnerResult.forInitializationFailure(
                                    new ExecutionGraphInfo(
                                            failedArchivedExecutionGraphFactory.apply(
                                                    jobInitializationException)),
                                    jobInitializationException));
                } else {
                    registerJobMasterServiceFutures(jobMasterService);
                }
            });
}
The JobMaster is built and started here using asynchronous programming, and the whenComplete callback checks whether initialization failed. Step into jobMasterServiceFactory.createJobMasterService:
@Override
public CompletableFuture<JobMasterService> createJobMasterService(
        UUID leaderSessionId, OnCompletionActions onCompletionActions) {
    return CompletableFuture.supplyAsync(
            FunctionUtils.uncheckedSupplier(
                    // TODO Builds and starts the JobMaster internally
                    () -> internalCreateJobMasterService(leaderSessionId, onCompletionActions)),
            executor);
}
Then step into internalCreateJobMasterService:
private JobMasterService internalCreateJobMasterService(
        UUID leaderSessionId, OnCompletionActions onCompletionActions) throws Exception {
    final JobMaster jobMaster =
            new JobMaster(
                    rpcService,
                    JobMasterId.fromUuidOrNull(leaderSessionId),
                    jobMasterConfiguration,
                    ResourceID.generate(),
                    jobGraph,
                    haServices,
                    slotPoolServiceSchedulerFactory,
                    jobManagerSharedServices,
                    heartbeatServices,
                    jobManagerJobMetricGroupFactory,
                    onCompletionActions,
                    fatalErrorHandler,
                    userCodeClassloader,
                    shuffleMaster,
                    lookup ->
                            new JobMasterPartitionTrackerImpl(
                                    jobGraph.getJobID(), shuffleMaster, lookup),
                    new DefaultExecutionDeploymentTracker(),
                    DefaultExecutionDeploymentReconciler::new,
                    initializationTimestamp);
    // TODO JobMaster extends RpcEndpoint, so once it has been started its onStart method is called back
    jobMaster.start();
    return jobMaster;
}
Here the JobMaster is finally created and started. As covered in the earlier Akka chapter, JobMaster extends RpcEndpoint, so once the JobMaster finishes initializing, its onStart lifecycle method is called back. jobMaster.start() itself does nothing substantial: it only sends the endpoint a message telling it that it has started.
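As a rough mental model of that lifecycle (not Flink's actual RpcEndpoint code), start() merely schedules work onto the endpoint's own single-threaded "main thread", and onStart() is the hook subclasses override. A minimal sketch, with illustrative names:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

abstract class LifecycleEndpointSketch {
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

    // start() does no real work itself; it only tells the endpoint's "main thread"
    // to invoke the onStart() lifecycle hook, mirroring the behaviour described above.
    public final void start() {
        mainThread.execute(
                () -> {
                    try {
                        onStart();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
    }

    protected abstract void onStart() throws Exception;
}

With that in mind, let's look at JobMaster's onStart method: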
@Override
protected void onStart() throws JobMasterException {
    try {
        // TODO The JobMaster registers with the ResourceManager, then starts requesting slots
        //  and scheduling/deploying StreamTasks
        startJobExecution();
    } catch (Exception e) {
        final JobMasterException jobMasterException =
                new JobMasterException("Could not start the JobMaster.", e);
        handleJobMasterError(jobMasterException);
        throw jobMasterException;
    }
}
In this method, startJobExecution() takes care of the registration work and of requesting slots. Step into startJobExecution():
private void startJobExecution() throws Exception {
    validateRunsInMainThread();
    // TODO Start a few internal services
    startJobMasterServices();
    log.info(
            "Starting execution of job {} ({}) under job master id {}.",
            jobGraph.getName(),
            jobGraph.getJobID(),
            getFencingToken());
    // TODO Resolve the ExecutionGraph, request slots, and deploy tasks to the TaskExecutors
    startScheduling();
}
Let's first look at startJobMasterServices(); step in:
private void startJobMasterServices() throws Exception {
    try {
        // TODO Start the two heartbeat services
        this.taskManagerHeartbeatManager = createTaskManagerHeartbeatManager(heartbeatServices);
        this.resourceManagerHeartbeatManager =
                createResourceManagerHeartbeatManager(heartbeatServices);
        // start the slot pool make sure the slot pool now accepts messages for this leader
        // TODO Start the slot management service, which internally starts 3 timed tasks
        slotPoolService.start(getFencingToken(), getAddress(), getMainThreadExecutor());
        // job is ready to go, try to establish connection with resource manager
        //   - activate leader retrieval for the resource manager
        //   - on notification of the leader, the connection will be established and
        //     the slot pool will start requesting slots
        // TODO Listen for changes of the ResourceManager's leader address
        resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
    } catch (Exception e) {
        handleStartJobMasterServicesError(e);
    }
}
As you can see, this does three things:
1. Starts two heartbeat services.
2. Starts the slot management service, which internally starts three timed tasks.
3. Starts listening for the ResourceManager's leader address.
Since slot management and scheduling get a dedicated chapter later, we won't analyze slots here. Let's go back up to the caller and look at startScheduling(); clicking all the way through, pick the SchedulerBase implementation:
@Override
public final void startScheduling() {
    mainThreadExecutor.assertRunningInMainThread();
    registerJobMetrics();
    operatorCoordinatorHandler.startAllOperatorCoordinators();
    // TODO Start the actual scheduling
    startSchedulingInternal();
}
Then step into startSchedulingInternal:
@Override
protected void startSchedulingInternal() {
    log.info(
            "Starting scheduling with scheduling strategy [{}]",
            schedulingStrategy.getClass().getName());
    transitionToRunning();
    // TODO Request slots and deploy the StreamTasks
    schedulingStrategy.startScheduling();
}
Then step into schedulingStrategy.startScheduling():
@Override
public void startScheduling() {
    final Set<SchedulingPipelinedRegion> sourceRegions =
            IterableUtils.toStream(schedulingTopology.getAllPipelinedRegions())
                    .filter(this::isSourceRegion)
                    .collect(Collectors.toSet());
    // TODO Request slots and deploy the StreamTasks
    maybeScheduleRegions(sourceRegions);
}
Slot allocation is about to happen here; step into maybeScheduleRegions:
private void maybeScheduleRegions(final Set<SchedulingPipelinedRegion> regions) {
    final List<SchedulingPipelinedRegion> regionsSorted =
            SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder(
                    schedulingTopology, regions);
    final Map consumableStatusCache = new HashMap<>();
    for (SchedulingPipelinedRegion region : regionsSorted) {
        // TODO Request slots and deploy the StreamTasks
        maybeScheduleRegion(region, consumableStatusCache);
    }
}
Then follow maybeScheduleRegion; it ultimately calls schedulerOperations.allocateSlotsAndDeploy, implemented by the DefaultScheduler:
@Override
public void allocateSlotsAndDeploy(
        final List<ExecutionVertexDeploymentOption> executionVertexDeploymentOptions) {
    validateDeploymentOptions(executionVertexDeploymentOptions);
    final Map<ExecutionVertexID, ExecutionVertexDeploymentOption> deploymentOptionsByVertex =
            groupDeploymentOptionsByVertexId(executionVertexDeploymentOptions);
    final List<ExecutionVertexID> verticesToDeploy =
            executionVertexDeploymentOptions.stream()
                    .map(ExecutionVertexDeploymentOption::getExecutionVertexId)
                    .collect(Collectors.toList());
    final Map<ExecutionVertexID, ExecutionVertexVersion> requiredVersionByVertex =
            executionVertexVersioner.recordVertexModifications(verticesToDeploy);
    transitionToScheduled(verticesToDeploy);
    // TODO Request the slots
    final List<SlotExecutionVertexAssignment> slotExecutionVertexAssignments =
            allocateSlots(executionVertexDeploymentOptions);
    final List<DeploymentHandle> deploymentHandles =
            createDeploymentHandles(
                    requiredVersionByVertex,
                    deploymentOptionsByVertex,
                    slotExecutionVertexAssignments);
    // TODO Deploy the tasks
    waitForAllSlotsAndDeploy(deploymentHandles);
}
This method mainly does two things:
1. Allocates the slots.
2. Deploys the tasks.
We will analyze how these are implemented in detail in the later chapter on slot management.
At this point, the JobMaster has been started.
To recap: the JobGraph built on the client side, along with the resources it needs, is sent to the WebMonitorEndpoint. Inside the WebMonitorEndpoint, a Router parses the URL and dispatches the request to the handler registered for it, calling back that handler, in this case the JobSubmitHandler's handleRequest method.
Inside handleRequest, the job information and the resources it needs are extracted from the request, including the JobGraph, the job's own jar, and the job's dependency jars. Once that is done, the JobSubmitHandler hands the JobGraph over to the Dispatcher.
After receiving the JobGraph, the Dispatcher prepares to initialize and start the JobMaster. It first initializes the basic services the JobMaster needs, then builds the important DefaultJobMasterServiceFactory, and then kicks off the JobMaster leader election.
Once the JobMaster election completes, the isLeader method is called back and the JobMaster is initialized. Because JobMaster extends RpcEndpoint, its onStart lifecycle method is called back after initialization completes.
In that onStart lifecycle method, the JobMaster requests slots and deploys the tasks.