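In the earlier article analyzing the source code of client-side job submission, we saw that what the client submits to the JobManager is a JobGraph object. So what does the JobManager's Dispatcher component do once it receives that JobGraph? In this article we walk through that process in the source code.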
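The JobManager receives the submission through its Netty-based REST endpoint, and Netty invokes channelRead0 for every incoming request, so the entry point on the JM side is:
RedirectHandler.channelRead0
==> AbstractHandler.respondAsLeader
==> AbstractHandler.respondToRequest
==> JobSubmitHandler.handleRequest
==> gateway.submitJob(jobGraph, timeout), which actually calls Dispatcher.submitJob. Its source is as follows: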
public CompletableFuture<Acknowledge> submitJob(JobGraph jobGraph, Time timeout) {
    final JobID jobId = jobGraph.getJobID();
    log.info("Submitting job {} ({}).", jobId, jobGraph.getName());
    final RunningJobsRegistry.JobSchedulingStatus jobSchedulingStatus;
    try {
        jobSchedulingStatus = runningJobsRegistry.getJobSchedulingStatus(jobId); // look up the job's status by ID: PENDING, RUNNING or DONE
    } catch (IOException e) {
        return FutureUtils.completedExceptionally(new FlinkException(String.format("Failed to retrieve job scheduling status for job %s.", jobId), e));
    }
    if (jobSchedulingStatus == RunningJobsRegistry.JobSchedulingStatus.DONE || jobManagerRunnerFutures.containsKey(jobId)) {
        return FutureUtils.completedExceptionally(
            new JobSubmissionException(jobId, String.format("Job has already been submitted and is in state %s.", jobSchedulingStatus)));
    } else {
        final CompletableFuture<Acknowledge> persistAndRunFuture = waitForTerminatingJobManager(jobId, jobGraph, this::persistAndRunJob) // persist the job, then run it
            .thenApply(ignored -> Acknowledge.get());
        return persistAndRunFuture.exceptionally(
            (Throwable throwable) -> {
                final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
                log.error("Failed to submit job {}.", jobId, strippedThrowable);
                throw new CompletionException(
                    new JobSubmissionException(jobId, "Failed to submit job.", strippedThrowable));
            });
    }
}
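When persisting or running the job fails, submitJob strips the CompletionException wrapper and rethrows the root cause inside a JobSubmissionException. The following is a minimal Java sketch of that error-handling pattern, not Flink code: SubmitErrorHandling, SubmissionFailedException and strip are hypothetical names, and strip only mimics what ExceptionUtils.stripCompletionException is used for here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for JobSubmissionException (a checked exception).
class SubmissionFailedException extends Exception {
    SubmissionFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SubmitErrorHandling {
    // Unwrap nested CompletionExceptions to reach the root cause.
    static Throwable strip(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    // Acknowledge on success; on failure, rethrow the stripped cause inside a submission exception.
    static CompletableFuture<String> submit(CompletableFuture<Void> persistAndRun) {
        return persistAndRun
            .thenApply(ignored -> "ack")
            .exceptionally(throwable -> {
                Throwable stripped = strip(throwable);
                // The function cannot throw a checked exception, so it travels inside a CompletionException.
                throw new CompletionException(
                    new SubmissionFailedException("Failed to submit job.", stripped));
            });
    }
}
```

Because the function passed to exceptionally cannot throw a checked exception, the checked submission exception has to be carried by an unchecked CompletionException; throwing (instead of returning a value) keeps the returned future failed rather than recovered.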
Next, let's step into the persistAndRunJob method:
private CompletableFuture<Void> persistAndRunJob(JobGraph jobGraph) throws Exception {
    // Store the JobGraph in submittedJobGraphStore. Only in HA mode is it actually
    // persisted (to ZooKeeper); in the other modes this is a no-op.
    submittedJobGraphStore.putJobGraph(new SubmittedJobGraph(jobGraph, null));
    final CompletableFuture<Void> runJobFuture = runJob(jobGraph); // run the job
    return runJobFuture.whenComplete(BiConsumerWithException.unchecked((Object ignored, Throwable throwable) -> {
        if (throwable != null) {
            submittedJobGraphStore.removeJobGraph(jobGraph.getJobID());
        }
    }));
}
private CompletableFuture<Void> runJob(JobGraph jobGraph) {
    Preconditions.checkState(!jobManagerRunnerFutures.containsKey(jobGraph.getJobID()));
    final CompletableFuture<JobManagerRunner> jobManagerRunnerFuture = createJobManagerRunner(jobGraph); // create the JobManagerRunner
    jobManagerRunnerFutures.put(jobGraph.getJobID(), jobManagerRunnerFuture);
    return jobManagerRunnerFuture
        .thenApply(FunctionUtils.nullFn())
        .whenCompleteAsync(
            (ignored, throwable) -> {
                if (throwable != null) {
                    jobManagerRunnerFutures.remove(jobGraph.getJobID());
                }
            },
            getMainThreadExecutor());
}
private CompletableFuture<JobManagerRunner> createJobManagerRunner(JobGraph jobGraph) {
    final RpcService rpcService = getRpcService();
    final CompletableFuture<JobManagerRunner> jobManagerRunnerFuture = CompletableFuture.supplyAsync(
        CheckedSupplier.unchecked(() ->
            jobManagerRunnerFactory.createJobManagerRunner( // ==> DefaultJobManagerRunnerFactory
                ResourceID.generate(),
                jobGraph,
                configuration,
                rpcService,
                highAvailabilityServices,
                heartbeatServices,
                blobServer,
                jobManagerSharedServices,
                new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
                fatalErrorHandler)),
        rpcService.getExecutor());
    return jobManagerRunnerFuture.thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner)); // start the JobManagerRunner
}
private JobManagerRunner startJobManagerRunner(JobManagerRunner jobManagerRunner) throws Exception {
    final JobID jobId = jobManagerRunner.getJobGraph().getJobID();
    jobManagerRunner.getResultFuture().whenCompleteAsync(
        (ArchivedExecutionGraph archivedExecutionGraph, Throwable throwable) -> {
            // check if we are still the active JobManagerRunner by checking the identity
            //noinspection ObjectEquality
            if (jobManagerRunner == jobManagerRunnerFutures.get(jobId).getNow(null)) {
                if (archivedExecutionGraph != null) {
                    jobReachedGloballyTerminalState(archivedExecutionGraph);
                } else {
                    final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
                    if (strippedThrowable instanceof JobNotFinishedException) {
                        jobNotFinished(jobId);
                    } else {
                        jobMasterFailed(jobId, strippedThrowable);
                    }
                }
            } else {
                log.debug("There is a newer JobManagerRunner for the job {}.", jobId);
            }
        }, getMainThreadExecutor());
    jobManagerRunner.start(); // start the runner
    return jobManagerRunner;
}
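startJobManagerRunner only acts on a runner's result if that runner is still the one registered in jobManagerRunnerFutures, compared by reference identity. A minimal Java sketch of that guard, assuming a plain map from job id to the current runner; StaleResultGuard and its counters are hypothetical names, not Flink classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: only the runner currently registered for a job may report its result.
class StaleResultGuard {
    // Stand-in for jobManagerRunnerFutures: job id -> current runner.
    final Map<String, Object> currentRunners = new ConcurrentHashMap<>();
    int handledResults;
    int ignoredResults;

    void onRunnerResult(String jobId, Object runner) {
        // Reference comparison on purpose, like the "==" in startJobManagerRunner:
        // once a newer runner exists for the same job, older results are ignored.
        if (currentRunners.get(jobId) == runner) {
            handledResults++;
        } else {
            ignoredResults++; // corresponds to "There is a newer JobManagerRunner for the job ..."
        }
    }
}
```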
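After the JobManagerRunner is created, it is started. The JobManagerRunner holds a JobMaster object, which is created in the JobManagerRunner's constructor, that is, during the createJobManagerRunner step above.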
Next, let's look at the start method, jobManagerRunner.start():
public void start() throws Exception {
    try {
        leaderElectionService.start(this); // ==> in non-HA mode this is StandaloneLeaderElectionService.start, with this (the JobManagerRunner) as the argument
    } catch (Exception e) {
        log.error("Could not start the JobManager because the leader election service did not start.", e);
        throw new Exception("Could not start the leader election service.", e);
    }
}
// StandaloneLeaderElectionService.start
public void start(LeaderContender newContender) throws Exception {
    if (contender != null) {
        // Service was already started
        throw new IllegalArgumentException("Leader election service cannot be started multiple times.");
    }
    contender = Preconditions.checkNotNull(newContender);
    // directly grant leadership to the given contender
    contender.grantLeadership(HighAvailabilityServices.DEFAULT_LEADER_ID); // ==> calls JobManagerRunner.grantLeadership
}
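In the standalone (non-HA) case there is no real election: the service remembers its single contender and grants it leadership immediately. A minimal Java sketch of that behavior; Contender and StandaloneElection are hypothetical simplifications of LeaderContender and StandaloneLeaderElectionService, and the fixed all-zero UUID is an assumption standing in for HighAvailabilityServices.DEFAULT_LEADER_ID:

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical simplification of LeaderContender.
interface Contender {
    void grantLeadership(UUID leaderSessionId);
}

// Hypothetical simplification of StandaloneLeaderElectionService.
class StandaloneElection {
    // Assumption: a fixed all-zero UUID as the default leader session id.
    static final UUID DEFAULT_LEADER_ID = new UUID(0L, 0L);
    private Contender contender;

    void start(Contender newContender) {
        if (contender != null) {
            throw new IllegalArgumentException("Leader election service cannot be started multiple times.");
        }
        contender = Objects.requireNonNull(newContender);
        // No election takes place: the only contender becomes leader right away.
        contender.grantLeadership(DEFAULT_LEADER_ID);
    }
}
```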
Here the contender is the JobManagerRunner itself, so this in turn calls JobManagerRunner.grantLeadership:
grantLeadership ==> verifyJobSchedulingStatusAndStartJobManager
private void verifyJobSchedulingStatusAndStartJobManager(UUID leaderSessionId) throws Exception {
    final JobSchedulingStatus jobSchedulingStatus = runningJobsRegistry.getJobSchedulingStatus(jobGraph.getJobID());
    if (jobSchedulingStatus == JobSchedulingStatus.DONE) { // the job has already finished
        log.info("Granted leader ship but job {} has been finished. ", jobGraph.getJobID());
        jobFinishedByOther();
    } else {
        log.info("JobManager runner for job {} ({}) was granted leadership with session id {} at {}.",
            jobGraph.getName(), jobGraph.getJobID(), leaderSessionId, getAddress());
        runningJobsRegistry.setJobRunning(jobGraph.getJobID()); // mark the job as RUNNING: kept in memory in standalone mode, stored in ZooKeeper in ZK HA mode
        final CompletableFuture<Acknowledge> startFuture = jobMaster.start(new JobMasterId(leaderSessionId), rpcTimeout); // start the JobMaster
        final CompletableFuture<JobMasterGateway> currentLeaderGatewayFuture = leaderGatewayFuture;
        startFuture.whenCompleteAsync(
            (Acknowledge ack, Throwable throwable) -> {
                if (throwable != null) {
                    handleJobManagerRunnerError(new FlinkException("Could not start the job manager.", throwable));
                } else {
                    confirmLeaderSessionIdIfStillLeader(leaderSessionId, currentLeaderGatewayFuture);
                }
            },
            jobManagerSharedServices.getScheduledExecutorService());
    }
}
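Before starting the JobMaster, the JobManagerRunner consults the RunningJobsRegistry so that a job already marked as finished is not run again. A minimal Java sketch of that check, assuming an in-memory map as the registry (as in standalone mode) and PENDING as the status of unknown jobs; RegistryCheck and RegistryStatus are hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical mirror of the three scheduling states.
enum RegistryStatus { PENDING, RUNNING, DONE }

class RegistryCheck {
    // Stand-in for RunningJobsRegistry, kept in memory as in standalone mode.
    final Map<String, RegistryStatus> registry = new ConcurrentHashMap<>();

    /** Returns true if the JobMaster would be started for this job. */
    boolean onGrantLeadership(String jobId) {
        if (registry.getOrDefault(jobId, RegistryStatus.PENDING) == RegistryStatus.DONE) {
            return false; // already finished (e.g. by an earlier JobManager): do not run it again
        }
        registry.put(jobId, RegistryStatus.RUNNING); // mark RUNNING before starting
        return true; // the real code calls jobMaster.start(...) at this point
    }
}
```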
Next comes the startup of the JobMaster:
JobMaster.start ==> startJobExecution
private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
    validateRunsInMainThread();
    checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");
    if (Objects.equals(getFencingToken(), newJobMasterId)) {
        log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);
        return Acknowledge.get();
    }
    setNewFencingToken(newJobMasterId);
    startJobMasterServices(); // starts the SlotPool and the connection to the ResourceManager (later used to request slots)
    log.info("Starting execution of job {} ({})", jobGraph.getName(), jobGraph.getJobID());
    resetAndScheduleExecutionGraph(); // execute the job
    return Acknowledge.get();
}
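startJobExecution uses the JobMasterId as a fencing token: starting again with the same token is a no-op, and a new token replaces the old one so that messages stamped with a stale token can be rejected. A minimal Java sketch of that idea with java.util.UUID as the token; FencedMaster, accepts and the starts counter are hypothetical, and stale-message rejection is only modeled by accepts:

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical sketch of the fencing-token check at the top of startJobExecution.
class FencedMaster {
    private UUID fencingToken;
    int starts; // how many times services were (re)started

    String startJobExecution(UUID newToken) {
        Objects.requireNonNull(newToken, "The new JobMasterId must not be null.");
        if (Objects.equals(fencingToken, newToken)) {
            return "already started"; // idempotent for the same leader session
        }
        fencingToken = newToken; // messages carrying the old token are no longer accepted
        starts++;                // stands in for starting services and scheduling the graph
        return "started";
    }

    // Models the fencing check applied to incoming messages.
    boolean accepts(UUID token) {
        return Objects.equals(fencingToken, token);
    }
}
```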
This starts the SlotPool inside the JobMaster and connects it to the ResourceManager:
private void startJobMasterServices() throws Exception {
    // start the slot pool make sure the slot pool now accepts messages for this leader
    slotPool.start(getFencingToken(), getAddress()); // the SlotPool is an RPC endpoint
    //TODO: Remove once the ZooKeeperLeaderRetrieval returns the stored address upon start
    // try to reconnect to previously known leader
    reconnectToResourceManager(new FlinkException("Starting JobMaster component.")); // connect to the ResourceManager
    // job is ready to go, try to establish connection with resource manager
    //   - activate leader retrieval for the resource manager
    //   - on notification of the leader, the connection will be established and
    //     the slot pool will start requesting slots
    resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener()); // start leader retrieval for the ResourceManager; the listener connects once the leader is known
}
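startJobMasterServices only kicks off the ResourceManager connection, which is established asynchronously (slot requests issued before it is ready are stashed, as requestNewAllocatedSlot shows further down). The job is then executed via resetAndScheduleExecutionGraph():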
private void resetAndScheduleExecutionGraph() throws Exception {
    validateRunsInMainThread();
    final CompletableFuture<Void> executionGraphAssignedFuture;
    if (executionGraph.getState() == JobStatus.CREATED) {
        executionGraphAssignedFuture = CompletableFuture.completedFuture(null);
    } else {
        suspendAndClearExecutionGraphFields(new FlinkException("ExecutionGraph is being reset in order to be rescheduled."));
        final JobManagerJobMetricGroup newJobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
        final ExecutionGraph newExecutionGraph = createAndRestoreExecutionGraph(newJobManagerJobMetricGroup); // build a new ExecutionGraph
        executionGraphAssignedFuture = executionGraph.getTerminationFuture().handleAsync(
            (JobStatus ignored, Throwable throwable) -> {
                assignExecutionGraph(newExecutionGraph, newJobManagerJobMetricGroup);
                return null;
            },
            getMainThreadExecutor());
    }
    executionGraphAssignedFuture.thenRun(this::scheduleExecutionGraph); // schedule the ExecutionGraph
}
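resetAndScheduleExecutionGraph schedules a graph that is still in the CREATED state right away; otherwise it suspends the old graph and assigns the new one only after the old graph's termination future completes. A minimal sketch of that ordering with plain CompletableFuture; GraphHolder and its fields are hypothetical, graphs are modeled as strings, and the real code uses handleAsync on the main-thread executor instead of the synchronous handle used here:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of "assign the new graph only after the old one has terminated".
class GraphHolder {
    String current = "graph-0";          // the currently assigned graph
    boolean currentIsCreated = true;     // stands in for executionGraph.getState() == JobStatus.CREATED
    final CompletableFuture<Void> oldGraphTermination = new CompletableFuture<>();
    String scheduled;                    // which graph was handed to the scheduler

    void resetAndSchedule(String newGraph) {
        CompletableFuture<Void> assigned;
        if (currentIsCreated) {
            assigned = CompletableFuture.completedFuture(null); // fresh graph: schedule it directly
        } else {
            // The real code suspends the old graph first; here we only wait for its termination.
            assigned = oldGraphTermination.handle((ignored, throwable) -> {
                current = newGraph;      // swap in the new graph only after the old one has terminated
                return null;
            });
        }
        assigned.thenRun(() -> scheduled = current);
    }
}
```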
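The initial ExecutionGraph is already built from the JobGraph when the JobMaster is constructed. If it is still in the CREATED state it is used as is; otherwise a new ExecutionGraph is built from the JobGraph and swapped in. The ExecutionGraph is then scheduled:
==> scheduleExecutionGraph()
==> ExecutionGraph.scheduleForExecution()
==> scheduleEager(slotProvider, allocationTimeout) // eager scheduling: all tasks are scheduled at once (the mode used by streaming jobs)
==> the core method for executing the job. First, it requests resources: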
for (ExecutionJobVertex ejv : getVerticesTopologically()) {
    // these calls are not blocking, they only return futures
    Collection<CompletableFuture<Execution>> allocationFutures = ejv.allocateResourcesForAll(
        slotProvider,
        queued,
        LocationPreferenceConstraint.ALL,
        allPreviousAllocationIds,
        timeout); // request slots
    allAllocationFutures.addAll(allocationFutures);
}
Let's first follow the call path for requesting resources. ejv.allocateResourcesForAll is the allocateResourcesForAll method of ExecutionJobVertex:
public Collection<CompletableFuture<Execution>> allocateResourcesForAll(
        SlotProvider resourceProvider,
        boolean queued,
        LocationPreferenceConstraint locationPreferenceConstraint,
        @Nonnull Set<AllocationID> allPreviousExecutionGraphAllocationIds,
        Time allocationTimeout) {
    final ExecutionVertex[] vertices = this.taskVertices;
    final CompletableFuture<Execution>[] slots = new CompletableFuture[vertices.length];
    // try to acquire a slot future for each execution.
    // we store the execution with the future just to be on the safe side
    for (int i = 0; i < vertices.length; i++) {
        // allocate the next slot (future)
        final Execution exec = vertices[i].getCurrentExecutionAttempt();
        final CompletableFuture<Execution> allocationFuture = exec.allocateAndAssignSlotForExecution( // allocate a slot and assign it to this execution
            resourceProvider,
            queued,
            locationPreferenceConstraint,
            allPreviousExecutionGraphAllocationIds,
            allocationTimeout);
        slots[i] = allocationFuture;
    }
    // all good, we acquired all slots
    return Arrays.asList(slots);
}
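The allocation calls in scheduleEager do not block; they return one future per execution, and deployment waits until all of them have completed. A minimal Java sketch of that wait-for-all step using CompletableFuture.allOf; EagerScheduling is a hypothetical name and slots are modeled as plain strings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class EagerScheduling {
    // Combine one slot future per execution into a single future of all slots, in order.
    static CompletableFuture<List<String>> allocateAll(List<CompletableFuture<String>> slotFutures) {
        return CompletableFuture
            .allOf(slotFutures.toArray(new CompletableFuture<?>[0]))
            .thenApply(ignored -> {
                List<String> slots = new ArrayList<>();
                for (CompletableFuture<String> future : slotFutures) {
                    slots.add(future.join()); // does not block: allOf has already completed
                }
                return slots;
            });
    }
}
```

If any single future fails, allOf completes exceptionally and the thenApply step is skipped, so one failed allocation fails the whole step; that all-or-nothing behavior is what eager scheduling needs.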
==> slotProvider.allocateSlot actually calls the allocateSlot method of an inner class of SlotPool (ProviderAndOwner) to request resources; in other words, the slots are all kept in the SlotPool:
// SlotPool.ProviderAndOwner.allocateSlot
public CompletableFuture<LogicalSlot> allocateSlot(
        SlotRequestId slotRequestId,
        ScheduledUnit task,
        boolean allowQueued,
        SlotProfile slotProfile,
        Time timeout) {
    CompletableFuture<LogicalSlot> slotFuture = gateway.allocateSlot( // request a slot
        slotRequestId,
        task,
        slotProfile,
        allowQueued,
        timeout);
    slotFuture.whenComplete(
        (LogicalSlot slot, Throwable failure) -> {
            if (failure != null) {
                gateway.releaseSlot( // ==> SlotPool
                    slotRequestId,
                    task.getSlotSharingGroupId(),
                    failure);
            }
        });
    return slotFuture;
}
// SlotPool.allocateSlot, the RPC method behind gateway.allocateSlot
public CompletableFuture<LogicalSlot> allocateSlot(
        SlotRequestId slotRequestId,
        ScheduledUnit task,
        SlotProfile slotProfile,
        boolean allowQueuedScheduling,
        Time allocationTimeout) {
    log.debug("Received slot request [{}] for task: {}", slotRequestId, task.getTaskToExecute());
    if (task.getSlotSharingGroupId() == null) { // does the task belong to a slot sharing group?
        return allocateSingleSlot(slotRequestId, slotProfile, allowQueuedScheduling, allocationTimeout);
    } else {
        return allocateSharedSlot(slotRequestId, task, slotProfile, allowQueuedScheduling, allocationTimeout);
    }
}
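In general the slot sharing group is not null, so we follow the shared-slot branch:
==> allocateMultiTaskSlot
==> If suitable resources are available, the slot is returned right here; if not, a new slot is requested from the ResourceManager: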
if (allowQueuedScheduling) { // queued scheduling is allowed and no sufficient slot is available
    ...
    final CompletableFuture<AllocatedSlot> futureSlot = requestNewAllocatedSlot( // as a last resort, request a slot from the ResourceManager
        allocatedSlotRequestId,
        slotProfile.getResourceProfile(),
        allocationTimeout);
    ...
}
private CompletableFuture<AllocatedSlot> requestNewAllocatedSlot(
        SlotRequestId slotRequestId,
        ResourceProfile resourceProfile,
        Time allocationTimeout) {
    final PendingRequest pendingRequest = new PendingRequest(
        slotRequestId,
        resourceProfile);
    // register request timeout
    FutureUtils
        .orTimeout(pendingRequest.getAllocatedSlotFuture(), allocationTimeout.toMilliseconds(), TimeUnit.MILLISECONDS)
        .whenCompleteAsync(
            (AllocatedSlot ignored, Throwable throwable) -> {
                if (throwable instanceof TimeoutException) {
                    timeoutPendingSlotRequest(slotRequestId);
                }
            },
            getMainThreadExecutor());
    if (resourceManagerGateway == null) {
        stashRequestWaitingForResourceManager(pendingRequest); // no ResourceManager connected yet: stash the request
    } else {
        requestSlotFromResourceManager(resourceManagerGateway, pendingRequest); // request the slot from the ResourceManager
    }
    return pendingRequest.getAllocatedSlotFuture();
}
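Taken together, the SlotPool path above serves a request from an idle slot when it can, otherwise creates a pending request with a timeout, and either sends it to the ResourceManager or stashes it until a ResourceManager is connected. A minimal Java sketch of that flow; TinySlotPool and all of its members are hypothetical, slots are plain strings, CompletableFuture.orTimeout (Java 9+) stands in for FutureUtils.orTimeout, and both the failure for non-queued requests and the replay of stashed requests on connect are assumptions of the sketch:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, heavily simplified model of the SlotPool request path (not Flink's API).
class TinySlotPool {
    final Deque<String> availableSlots = new ArrayDeque<>();
    final List<CompletableFuture<String>> stashed = new ArrayList<>();
    final List<CompletableFuture<String>> sentToResourceManager = new ArrayList<>();
    final AtomicInteger timedOut = new AtomicInteger();
    boolean resourceManagerConnected;

    CompletableFuture<String> allocateSlot(boolean allowQueued, long timeoutMillis) {
        String idle = availableSlots.poll();
        if (idle != null) {
            return CompletableFuture.completedFuture(idle); // 1. an idle slot is available
        }
        if (!allowQueued) {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException("No slot available and queuing not allowed."));
            return failed;
        }
        return requestNewAllocatedSlot(timeoutMillis); // 2. fall back to a new slot request
    }

    private CompletableFuture<String> requestNewAllocatedSlot(long timeoutMillis) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        // orTimeout returns the same future and fails it with TimeoutException when the time is up.
        pending.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).whenComplete((slot, t) -> {
            if (t instanceof TimeoutException) {
                timedOut.incrementAndGet(); // the Flink code calls timeoutPendingSlotRequest here
            }
        });
        if (resourceManagerConnected) {
            sentToResourceManager.add(pending); // 3a. send to the ResourceManager
        } else {
            stashed.add(pending);               // 3b. wait for a ResourceManager connection
        }
        return pending;
    }

    void onResourceManagerConnected() {
        resourceManagerConnected = true;
        sentToResourceManager.addAll(stashed); // replay requests that arrived too early
        stashed.clear();
    }
}
```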
==> requestSlotFromResourceManager:
CompletableFuture<Acknowledge> rmResponse = resourceManagerGateway.requestSlot( // calls ResourceManager.requestSlot
    jobMasterId,
    new SlotRequest(jobId, allocationId, pendingRequest.getResourceProfile(), jobManagerAddress),
    rpcTimeout);
==> ResourceManager.requestSlot ==> SlotManager.registerSlotRequest ==> SlotManager.internalRequestSlot, whose source is:
private void internalRequestSlot(PendingSlotRequest pendingSlotRequest) throws ResourceManagerException {
    TaskManagerSlot taskManagerSlot = findMatchingSlot(pendingSlotRequest.getResourceProfile());
    if (taskManagerSlot != null) {
        allocateSlot(taskManagerSlot, pendingSlotRequest); // a matching free slot exists: ask its TaskManager via RPC to allocate it
    } else {
        resourceActions.allocateResource(pendingSlotRequest.getResourceProfile()); // no matching slot: ask ResourceActionsImpl for a new resource
    }
}
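internalRequestSlot first looks for a registered free slot that satisfies the requested resources and only asks for a new worker when none matches. A minimal Java sketch of that decision; TinySlotManager and Slot are hypothetical, and memory in MB is the only resource dimension modeled, an assumption standing in for ResourceProfile matching:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of "use a matching free slot, else request a new worker".
class TinySlotManager {
    // A registered but free TaskManager slot.
    static final class Slot {
        final String id;
        final int memoryMb;

        Slot(String id, int memoryMb) {
            this.id = id;
            this.memoryMb = memoryMb;
        }
    }

    final List<Slot> freeSlots = new ArrayList<>();
    int newWorkersRequested;

    /** Returns the id of the allocated slot, or null if a new worker had to be requested instead. */
    String internalRequestSlot(int requiredMemoryMb) {
        for (Iterator<Slot> it = freeSlots.iterator(); it.hasNext(); ) {
            Slot slot = it.next();
            if (slot.memoryMb >= requiredMemoryMb) { // stands in for ResourceProfile matching
                it.remove();
                return slot.id; // the real code then sends an RPC to the owning TaskExecutor
            }
        }
        newWorkersRequested++; // the real code calls resourceActions.allocateResource(...)
        return null;
    }
}
```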
==>
// ResourceManager.ResourceActionsImpl.allocateResource
public void allocateResource(ResourceProfile resourceProfile) throws ResourceManagerException {
    validateRunsInMainThread();
    startNewWorker(resourceProfile); // in YARN mode: request a YARN container
}
// YarnResourceManager.startNewWorker
public void startNewWorker(ResourceProfile resourceProfile) {
    Preconditions.checkArgument(
        ResourceProfile.UNKNOWN.equals(resourceProfile),
        "The YarnResourceManager does not support custom ResourceProfiles yet. It assumes that all containers have the same resources.");
    requestYarnContainer(); // request a container from YARN
}
This completes the resource-request call chain. Next, let's look at the execution path, going back to ExecutionGraph.scheduleEager:
execution.deploy(); // trigger deployment of the task
==> Execution.deploy:
final TaskDeploymentDescriptor deployment = vertex.createDeploymentDescriptor( // create the task deployment descriptor
    attemptId,
    slot,
    taskRestore,
    attemptNumber);
...
final CompletableFuture<Acknowledge> submitResultFuture = taskManagerGateway.submitTask(deployment, rpcTimeout); // ==> RpcTaskManagerGateway
==> RpcTaskManagerGateway.submitTask
public CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Time timeout) {
    return taskExecutorGateway.submitTask(tdd, jobMasterId, timeout); // ==> TaskExecutor.submitTask
}
==> TaskExecutor.submitTask, which builds the Task and actually runs it:
Task task = new Task(
    jobInformation,
    taskInformation,
    tdd.getExecutionAttemptId(),
    tdd.getAllocationId(),
    tdd.getSubtaskIndex(),
    tdd.getAttemptNumber(),
    tdd.getProducedPartitions(),
    tdd.getInputGates(),
    tdd.getTargetSlotNumber(),
    taskExecutorServices.getMemoryManager(),
    taskExecutorServices.getIOManager(),
    taskExecutorServices.getNetworkEnvironment(),
    taskExecutorServices.getBroadcastVariableManager(),
    taskStateManager,
    taskManagerActions,
    inputSplitProvider,
    checkpointResponder,
    blobCacheService,
    libraryCache,
    fileCache,
    taskManagerConfiguration,
    taskMetricGroup,
    resultPartitionConsumableNotifier,
    partitionStateChecker,
    getRpcService().getExecutor());
log.info("Received task {}.", task.getTaskInfo().getTaskNameWithSubtasks());
boolean taskAdded;
try {
    taskAdded = taskSlotTable.addTask(task);
} catch (SlotNotFoundException | SlotNotActiveException e) {
    throw new TaskSubmissionException("Could not submit task.", e);
}
if (taskAdded) {
    task.startTaskThread(); // the task actually starts running, in its own thread
    return CompletableFuture.completedFuture(Acknowledge.get());
} else {
    final String message = "TaskManager already contains a task for id " +
        task.getExecutionId() + '.';
    log.debug(message);
    throw new TaskSubmissionException(message);
}
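The tail of TaskExecutor.submitTask registers the task in the slot table, refuses a second task with the same execution id, and then runs the accepted task on its own thread. A minimal Java sketch of that behavior; TinyTaskSlotTable is hypothetical, tasks are keyed by a string execution id, and each task is modeled as a plain Runnable on a dedicated Thread:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "add to the slot table, reject duplicates, start a task thread".
class TinyTaskSlotTable {
    private final Map<String, Thread> tasks = new HashMap<>();

    // Register a task; a second task with the same execution id is rejected.
    synchronized boolean addTask(String executionId, Runnable taskBody) {
        if (tasks.containsKey(executionId)) {
            return false; // the real code then throws "TaskManager already contains a task for id ..."
        }
        tasks.put(executionId, new Thread(taskBody, "task-" + executionId));
        return true;
    }

    // Each accepted task runs on its own thread, analogous to task.startTaskThread().
    synchronized Thread startTask(String executionId) {
        Thread thread = tasks.get(executionId);
        thread.start();
        return thread;
    }
}
```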
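To sum up: the JobGraph is submitted to the JM, where the REST handler receives the request and hands it to the Dispatcher. The Dispatcher persists and runs the job by creating and starting a JobManagerRunner, which in turn starts the JobMaster. The JobMaster connects to the ResourceManager and requests slots from its SlotPool; if the SlotPool does not have enough slots, it requests them from the ResourceManager, and if the ResourceManager has no free slots either, it requests new containers from YARN.
Meanwhile the JobGraph has been turned into an ExecutionGraph. Once the slots are acquired, its executions are deployed as physical tasks to the TaskExecutors, where each task runs in its own thread. At this point the job has been submitted and is running.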