20. Flink Source Code Reading: How the JobManager Processes a Submitted JobGraph

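The earlier article on the client-side submission code showed that what the client sends to the JobManager is a JobGraph object. So what does the JobManager's Dispatcher component do once it receives that JobGraph? This article walks through that process in the source code.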

Source code analysis

On the JM side, the Netty-based REST endpoint receives the request in channelRead0, so the entry point is:
RedirectHandler.channelRead0
===> AbstractHandler.respondAsLeader
===> AbstractHandler.respondToRequest
===> JobSubmitHandler.handleRequest
===> gateway.submitJob(jobGraph, timeout), which actually invokes Dispatcher.submitJob, shown below:

public CompletableFuture submitJob(JobGraph jobGraph, Time timeout) {
	final JobID jobId = jobGraph.getJobID();

	log.info("Submitting job {} ({}).", jobId, jobGraph.getName());
	final RunningJobsRegistry.JobSchedulingStatus jobSchedulingStatus;

	try {
		jobSchedulingStatus = runningJobsRegistry.getJobSchedulingStatus(jobId); // look up the status by job ID: PENDING, RUNNING, or DONE
	} catch (IOException e) {
		return FutureUtils.completedExceptionally(new FlinkException(String.format("Failed to retrieve job scheduling status for job %s.", jobId), e));
	}

	if (jobSchedulingStatus == RunningJobsRegistry.JobSchedulingStatus.DONE || jobManagerRunnerFutures.containsKey(jobId)) {
		return FutureUtils.completedExceptionally(
			new JobSubmissionException(jobId, String.format("Job has already been submitted and is in state %s.", jobSchedulingStatus)));
	} else {
		final CompletableFuture persistAndRunFuture = waitForTerminatingJobManager(jobId, jobGraph, this::persistAndRunJob) // persist the JobGraph and run the job
			.thenApply(ignored -> Acknowledge.get());

		return persistAndRunFuture.exceptionally(
			(Throwable throwable) -> {
				final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
				log.error("Failed to submit job {}.", jobId, strippedThrowable);
				throw new CompletionException(
					new JobSubmissionException(jobId, "Failed to submit job.", strippedThrowable));
			});
	}
}
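
The guard at the top of submitJob can be illustrated with plain JDK types. This is only a sketch under stated assumptions: SubmitGuardSketch, its Status enum, and the string job IDs are hypothetical stand-ins for the Dispatcher, RunningJobsRegistry.JobSchedulingStatus, and JobID, and the runner is created synchronously to keep the example small.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the duplicate-submission guard; not Flink code.
final class SubmitGuardSketch {
    enum Status { PENDING, RUNNING, DONE }

    private final Map<String, Status> registry = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<String>> runnerFutures = new ConcurrentHashMap<>();

    CompletableFuture<String> submit(String jobId) {
        Status status = registry.getOrDefault(jobId, Status.PENDING);
        if (status == Status.DONE || runnerFutures.containsKey(jobId)) {
            // Reject: the job already finished or a runner is already registered.
            CompletableFuture<String> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(
                new IllegalStateException("Job " + jobId + " already submitted, state " + status));
            return rejected;
        }
        // Accept: register a (here already completed) runner and acknowledge.
        runnerFutures.put(jobId, CompletableFuture.completedFuture("runner-" + jobId));
        registry.put(jobId, Status.RUNNING);
        return CompletableFuture.completedFuture("ACK");
    }

    public static void main(String[] args) {
        SubmitGuardSketch dispatcher = new SubmitGuardSketch();
        System.out.println(dispatcher.submit("job-1").join());
        System.out.println(dispatcher.submit("job-1").isCompletedExceptionally());
    }
}
```

The second submission of the same ID fails without touching any state, which is the property the Dispatcher relies on to keep a job from being started twice.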

Next, step into the persistAndRunJob method:

private CompletableFuture persistAndRunJob(JobGraph jobGraph) throws Exception {
	submittedJobGraphStore.putJobGraph(new SubmittedJobGraph(jobGraph, null)); // store the JobGraph in the submittedJobGraphStore; only HA mode writes it to ZooKeeper, other modes do nothing

	final CompletableFuture runJobFuture = runJob(jobGraph); // run the job

	return runJobFuture.whenComplete(BiConsumerWithException.unchecked((Object ignored, Throwable throwable) -> {
		if (throwable != null) {
			submittedJobGraphStore.removeJobGraph(jobGraph.getJobID());
		}
	}));
}

private CompletableFuture runJob(JobGraph jobGraph) {
	Preconditions.checkState(!jobManagerRunnerFutures.containsKey(jobGraph.getJobID()));

	final CompletableFuture jobManagerRunnerFuture = createJobManagerRunner(jobGraph); // create the JobManagerRunner

	jobManagerRunnerFutures.put(jobGraph.getJobID(), jobManagerRunnerFuture);

	return jobManagerRunnerFuture
		.thenApply(FunctionUtils.nullFn())
		.whenCompleteAsync(
			(ignored, throwable) -> {
				if (throwable != null) {
					jobManagerRunnerFutures.remove(jobGraph.getJobID());
				}
			},
			getMainThreadExecutor());
}
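
persistAndRunJob and runJob share one pattern: record something up front, then undo it if the asynchronous step fails. A minimal sketch of that pattern follows, assuming a hypothetical RunnerRegistrySketch class with string IDs. Unlike the Dispatcher, it runs the cleanup synchronously with whenComplete instead of on a main-thread executor.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of "register first, remove on failure"; not Flink code.
final class RunnerRegistrySketch {
    final Map<String, CompletableFuture<String>> runnerFutures = new ConcurrentHashMap<>();

    CompletableFuture<Void> run(String jobId, CompletableFuture<String> runnerFuture) {
        // Register the future before it completes so later submissions see the job.
        if (runnerFutures.putIfAbsent(jobId, runnerFuture) != null) {
            throw new IllegalStateException("Runner already registered for " + jobId);
        }
        return runnerFuture
            .<Void>thenApply(ignored -> null)
            .whenComplete((ignored, failure) -> {
                if (failure != null) {
                    // Creation failed: drop the entry so the job can be resubmitted.
                    runnerFutures.remove(jobId);
                }
            });
    }

    public static void main(String[] args) {
        RunnerRegistrySketch sketch = new RunnerRegistrySketch();
        CompletableFuture<String> failing = new CompletableFuture<>();
        sketch.run("job-1", failing);
        failing.completeExceptionally(new RuntimeException("runner creation failed"));
        System.out.println(sketch.runnerFutures.containsKey("job-1"));
    }
}
```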

private CompletableFuture createJobManagerRunner(JobGraph jobGraph) {
	final RpcService rpcService = getRpcService();

	final CompletableFuture jobManagerRunnerFuture = CompletableFuture.supplyAsync(
		CheckedSupplier.unchecked(() ->
			jobManagerRunnerFactory.createJobManagerRunner( // ==> DefaultJobManagerRunnerFactory
				ResourceID.generate(),
				jobGraph,
				configuration,
				rpcService,
				highAvailabilityServices,
				heartbeatServices,
				blobServer,
				jobManagerSharedServices,
				new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
				fatalErrorHandler)),
		rpcService.getExecutor());

	return jobManagerRunnerFuture.thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner)); // start the JobManagerRunner
}

private JobManagerRunner startJobManagerRunner(JobManagerRunner jobManagerRunner) throws Exception {
	final JobID jobId = jobManagerRunner.getJobGraph().getJobID();
	jobManagerRunner.getResultFuture().whenCompleteAsync(
		(ArchivedExecutionGraph archivedExecutionGraph, Throwable throwable) -> {
			// check if we are still the active JobManagerRunner by checking the identity
			//noinspection ObjectEquality
			if (jobManagerRunner == jobManagerRunnerFutures.get(jobId).getNow(null)) {
				if (archivedExecutionGraph != null) {
					jobReachedGloballyTerminalState(archivedExecutionGraph);
				} else {
					final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);

					if (strippedThrowable instanceof JobNotFinishedException) {
						jobNotFinished(jobId);
					} else {
						jobMasterFailed(jobId, strippedThrowable);
					}
				}
			} else {
				log.debug("There is a newer JobManagerRunner for the job {}.", jobId);
			}
		}, getMainThreadExecutor());

	jobManagerRunner.start(); // start the runner

	return jobManagerRunner;
}
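
The identity comparison in the result callback is easy to miss: a runner only acts on its result if it is still the instance registered for that job. A small sketch of the check, assuming a hypothetical ActiveRunnerCheckSketch class and plain Object runners (the added null check is a defensive extra, not something the excerpt above does):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the "am I still the active runner" check; not Flink code.
final class ActiveRunnerCheckSketch {
    final Map<String, CompletableFuture<Object>> runnerFutures = new ConcurrentHashMap<>();

    boolean isStillActive(String jobId, Object runner) {
        CompletableFuture<Object> current = runnerFutures.get(jobId);
        // getNow(null) yields null while the future is incomplete, so a pending runner never matches.
        return current != null && current.getNow(null) == runner;
    }

    public static void main(String[] args) {
        ActiveRunnerCheckSketch sketch = new ActiveRunnerCheckSketch();
        Object oldRunner = new Object();
        Object newRunner = new Object();
        sketch.runnerFutures.put("job-1", CompletableFuture.completedFuture(oldRunner));
        // A newer runner replaces the old one, for example after a resubmission.
        sketch.runnerFutures.put("job-1", CompletableFuture.completedFuture(newRunner));
        System.out.println(sketch.isStillActive("job-1", oldRunner));
        System.out.println(sketch.isStillActive("job-1", newRunner));
    }
}
```

Reference equality is intentional here: two runners for the same job ID are different objects, and only the currently registered one may report the job's final state.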

After the JobManagerRunner is created it is started. The JobManagerRunner holds a JobMaster object, which is also created at this point.
Next, look at the start method, jobManagerRunner.start():

public void start() throws Exception {
	try {
		leaderElectionService.start(this); // ===> in non-HA mode this is StandaloneLeaderElectionService.start, with this passed in as the contender
	} catch (Exception e) {
		log.error("Could not start the JobManager because the leader election service did not start.", e);
		throw new Exception("Could not start the leader election service.", e);
	}
}

public void start(LeaderContender newContender) throws Exception {
	if (contender != null) {
		// Service was already started
		throw new IllegalArgumentException("Leader election service cannot be started multiple times.");
	}

	contender = Preconditions.checkNotNull(newContender);

	// directly grant leadership to the given contender
	contender.grantLeadership(HighAvailabilityServices.DEFAULT_LEADER_ID); // ==> calls JobManagerRunner.grantLeadership
}

Here the contender is the JobManagerRunner, so the call lands in JobManagerRunner.grantLeadership.
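
The standalone case amounts to "whoever registers is the leader". A minimal sketch, assuming a hypothetical Contender interface and a fixed all-zero session ID as a stand-in for HighAvailabilityServices.DEFAULT_LEADER_ID:

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical illustration of a standalone "election"; not Flink code.
final class StandaloneElectionSketch {
    interface Contender {
        void grantLeadership(UUID sessionId);
    }

    // Assumed fixed session ID used when there is no real election.
    static final UUID DEFAULT_LEADER_ID = new UUID(0L, 0L);

    private Contender contender;

    void start(Contender newContender) {
        if (contender != null) {
            throw new IllegalArgumentException("Leader election service cannot be started multiple times.");
        }
        contender = Objects.requireNonNull(newContender);
        // No competition in standalone mode: grant leadership immediately.
        contender.grantLeadership(DEFAULT_LEADER_ID);
    }

    public static void main(String[] args) {
        new StandaloneElectionSketch()
            .start(sessionId -> System.out.println("granted leadership, session " + sessionId));
    }
}
```

With a ZooKeeper-backed service the same grantLeadership callback would fire only after a real election, which is why the runner puts its startup logic behind that callback rather than in start().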

grantLeadership ==> verifyJobSchedulingStatusAndStartJobManager

private void verifyJobSchedulingStatusAndStartJobManager(UUID leaderSessionId) throws Exception {
	final JobSchedulingStatus jobSchedulingStatus = runningJobsRegistry.getJobSchedulingStatus(jobGraph.getJobID());

	if (jobSchedulingStatus == JobSchedulingStatus.DONE) { // the job has already finished
		log.info("Granted leader ship but job {} has been finished. ", jobGraph.getJobID());
		jobFinishedByOther();
	} else {
		log.info("JobManager runner for job {} ({}) was granted leadership with session id {} at {}.",
			jobGraph.getName(), jobGraph.getJobID(), leaderSessionId, getAddress());

		runningJobsRegistry.setJobRunning(jobGraph.getJobID()); // mark the job as RUNNING: kept in memory in standalone mode, stored in ZooKeeper in ZooKeeper HA mode

		final CompletableFuture startFuture = jobMaster.start(new JobMasterId(leaderSessionId), rpcTimeout); // start the JobMaster
		final CompletableFuture currentLeaderGatewayFuture = leaderGatewayFuture;

		startFuture.whenCompleteAsync(
			(Acknowledge ack, Throwable throwable) -> {
				if (throwable != null) {
					handleJobManagerRunnerError(new FlinkException("Could not start the job manager.", throwable));
				} else {
					confirmLeaderSessionIdIfStillLeader(leaderSessionId, currentLeaderGatewayFuture);
				}
			},
			jobManagerSharedServices.getScheduledExecutorService());
	}
}

Next comes the JobMaster startup:

JobMaster.start ===> startJobExecution

private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
	validateRunsInMainThread();

	checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");

	if (Objects.equals(getFencingToken(), newJobMasterId)) {
		log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);

		return Acknowledge.get();
	}

	setNewFencingToken(newJobMasterId);

	startJobMasterServices(); // starts the SlotPool and connects to the ResourceManager (used later to request slots)

	log.info("Starting execution of job {} ({})", jobGraph.getName(), jobGraph.getJobID());

	resetAndScheduleExecutionGraph(); // schedule the job for execution

	return Acknowledge.get();
}
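
The fencing-token comparison makes the start call idempotent for a given leader session. A minimal sketch, assuming a hypothetical FencedStartSketch class that uses a UUID in place of JobMasterId and a counter in place of the real startup work:

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical illustration of an idempotent, fenced start; not Flink code.
final class FencedStartSketch {
    private UUID fencingToken;
    int startCount;

    /** Returns true if startup work ran, false if this token had already been started. */
    boolean start(UUID newToken) {
        Objects.requireNonNull(newToken, "The new token must not be null.");
        if (Objects.equals(fencingToken, newToken)) {
            // Same leader session: nothing to do.
            return false;
        }
        fencingToken = newToken;
        startCount++; // stands in for starting services and scheduling the graph
        return true;
    }

    public static void main(String[] args) {
        FencedStartSketch master = new FencedStartSketch();
        UUID session = UUID.randomUUID();
        System.out.println(master.start(session));
        System.out.println(master.start(session));
        System.out.println(master.start(UUID.randomUUID()));
    }
}
```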

This starts the SlotPool inside the JobMaster and opens communication with the ResourceManager on the JM side:

private void startJobMasterServices() throws Exception {
	// start the slot pool make sure the slot pool now accepts messages for this leader
	slotPool.start(getFencingToken(), getAddress()); // the SlotPool is an RPC service

	//TODO: Remove once the ZooKeeperLeaderRetrieval returns the stored address upon start
	// try to reconnect to previously known leader
	reconnectToResourceManager(new FlinkException("Starting JobMaster component.")); // connect to the ResourceManager

	// job is ready to go, try to establish connection with resource manager
	//   - activate leader retrieval for the resource manager
	//   - on notification of the leader, the connection will be established and
	//     the slot pool will start requesting slots
	resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener()); // start retrieving the ResourceManager leader; the listener establishes the connection once notified
}

Once the SlotPool is started and the ResourceManager connection has been initiated, the job is scheduled via resetAndScheduleExecutionGraph():

private void resetAndScheduleExecutionGraph() throws Exception {
	validateRunsInMainThread();

	final CompletableFuture executionGraphAssignedFuture;

	if (executionGraph.getState() == JobStatus.CREATED) {
		executionGraphAssignedFuture = CompletableFuture.completedFuture(null);
	} else {
		suspendAndClearExecutionGraphFields(new FlinkException("ExecutionGraph is being reset in order to be rescheduled."));
		final JobManagerJobMetricGroup newJobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
		final ExecutionGraph newExecutionGraph = createAndRestoreExecutionGraph(newJobManagerJobMetricGroup); // build the ExecutionGraph

		executionGraphAssignedFuture = executionGraph.getTerminationFuture().handleAsync(
			(JobStatus ignored, Throwable throwable) -> {
				assignExecutionGraph(newExecutionGraph, newJobManagerJobMetricGroup);
				return null;
			},
			getMainThreadExecutor());
	}

	executionGraphAssignedFuture.thenRun(this::scheduleExecutionGraph); // schedule the ExecutionGraph
}
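
If the current ExecutionGraph is no longer in the CREATED state, a new one is built from the JobGraph here; either way the ExecutionGraph is then scheduled:
===> scheduleExecutionGraph()
===> ExecutionGraph.scheduleForExecution()
===> scheduleEager(slotProvider, allocationTimeout); // eager scheduling
===> the core method for running the job

Requesting resources: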
for (ExecutionJobVertex ejv : getVerticesTopologically()) {
	// these calls are not blocking, they only return futures
	Collection<CompletableFuture<Execution>> allocationFutures = ejv.allocateResourcesForAll(
		slotProvider,
		queued,
		LocationPreferenceConstraint.ALL,
		allPreviousAllocationIds,
		timeout); // request slots

	allAllocationFutures.addAll(allocationFutures);
}

Let us first follow the call path for resource allocation.

ejv.allocateResourcesForAll is the allocateResourcesForAll method of ExecutionJobVertex:


public Collection<CompletableFuture<Execution>> allocateResourcesForAll(
		SlotProvider resourceProvider,
		boolean queued,
		LocationPreferenceConstraint locationPreferenceConstraint,
		@Nonnull Set allPreviousExecutionGraphAllocationIds,
		Time allocationTimeout) {
	final ExecutionVertex[] vertices = this.taskVertices;
	final CompletableFuture[] slots = new CompletableFuture[vertices.length];

	// try to acquire a slot future for each execution.
	// we store the execution with the future just to be on the safe side
	for (int i = 0; i < vertices.length; i++) {
		// allocate the next slot (future)
		final Execution exec = vertices[i].getCurrentExecutionAttempt();
		final CompletableFuture allocationFuture = exec.allocateAndAssignSlotForExecution( // allocate a slot and assign it to the execution
			resourceProvider,
			queued,
			locationPreferenceConstraint,
			allPreviousExecutionGraphAllocationIds,
			allocationTimeout);
		slots[i] = allocationFuture;
	}

	// all good, we acquired all slots
	return Arrays.asList(slots);
}
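
Because every allocation call only returns a future, the caller has to combine them before deploying anything. The sketch below shows one way to do that with the JDK's CompletableFuture.allOf, as a stand-in for whatever combinator the scheduler actually uses. AllocateAllSketch and the string "slots" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of waiting for all slot futures; not Flink code.
final class AllocateAllSketch {
    static CompletableFuture<List<String>> allocateAll(List<CompletableFuture<String>> slotFutures) {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(slotFutures.toArray(new CompletableFuture<?>[0]));
        // Once every future is done, collect the results in the original order.
        return all.thenApply(ignored -> {
            List<String> slots = new ArrayList<>();
            for (CompletableFuture<String> future : slotFutures) {
                slots.add(future.join()); // does not block: the future is already complete
            }
            return slots;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        CompletableFuture<List<String>> all = allocateAll(Arrays.asList(first, second));
        first.complete("slot-a");
        System.out.println(all.isDone());
        second.complete("slot-b");
        System.out.println(all.join());
    }
}
```

Eager scheduling needs this all-or-nothing behavior: no task is deployed until every vertex has a slot.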

===>
slotProvider.allocateSlot // actually the allocateSlot method of an inner class of SlotPool; in other words, slots are managed by the SlotPool

public CompletableFuture allocateSlot(
			SlotRequestId slotRequestId,
			ScheduledUnit task,
			boolean allowQueued,
			SlotProfile slotProfile,
			Time timeout) {

		CompletableFuture slotFuture = gateway.allocateSlot( // request a slot
			slotRequestId,
			task,
			slotProfile,
			allowQueued,
			timeout);

		slotFuture.whenComplete(
			(LogicalSlot slot, Throwable failure) -> {
				if (failure != null) {
					gateway.releaseSlot( // ==> SlotPool
						slotRequestId,
						task.getSlotSharingGroupId(),
						failure);
				}
		});

		return slotFuture;
	}

public CompletableFuture allocateSlot(
		SlotRequestId slotRequestId,
		ScheduledUnit task,
		SlotProfile slotProfile,
		boolean allowQueuedScheduling,
		Time allocationTimeout) {

	log.debug("Received slot request [{}] for task: {}", slotRequestId, task.getTaskToExecute());

	if (task.getSlotSharingGroupId() == null) { // is the slot sharing group unset?
		return allocateSingleSlot(slotRequestId, slotProfile, allowQueuedScheduling, allocationTimeout);
	} else {
		return allocateSharedSlot(slotRequestId, task, slotProfile, allowQueuedScheduling, allocationTimeout);
	}
}	
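
The slot sharing group is usually set, so we follow the shared-slot path:
===> allocateMultiTaskSlot
===> If a slot is available it is returned right away; if there are not enough, the SlotPool goes on to request one from the ResourceManager.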

if (allowQueuedScheduling) { // queued scheduling is allowed and no slot is available
			...
				final CompletableFuture futureSlot = requestNewAllocatedSlot( // as a last resort, request a slot from the ResourceManager
					allocatedSlotRequestId,
					slotProfile.getResourceProfile(),
					allocationTimeout);
			...		
}			

private CompletableFuture requestNewAllocatedSlot(
		SlotRequestId slotRequestId,
		ResourceProfile resourceProfile,
		Time allocationTimeout) {

	final PendingRequest pendingRequest = new PendingRequest(
		slotRequestId,
		resourceProfile);

	// register request timeout
	FutureUtils
		.orTimeout(pendingRequest.getAllocatedSlotFuture(), allocationTimeout.toMilliseconds(), TimeUnit.MILLISECONDS)
		.whenCompleteAsync(
			(AllocatedSlot ignored, Throwable throwable) -> {
				if (throwable instanceof TimeoutException) {
					timeoutPendingSlotRequest(slotRequestId);
				}
			},
			getMainThreadExecutor());

	if (resourceManagerGateway == null) {
		stashRequestWaitingForResourceManager(pendingRequest);
	} else {
		requestSlotFromResourceManager(resourceManagerGateway, pendingRequest); // request a slot from the ResourceManager
	}

	return pendingRequest.getAllocatedSlotFuture();
}
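
The branch on resourceManagerGateway == null means requests made before the ResourceManager is connected are parked and replayed later. A minimal sketch of that stash-and-replay idea, assuming a hypothetical PendingSlotRequestSketch class with string request IDs; the timeout handling from the method above is left out on purpose.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of stashing slot requests until a ResourceManager connects; not Flink code.
final class PendingSlotRequestSketch {
    interface ResourceManager {
        void requestSlot(String requestId);
    }

    private final Deque<String> waitingForResourceManager = new ArrayDeque<>();
    private final Map<String, CompletableFuture<String>> pending = new HashMap<>();
    private ResourceManager resourceManager;

    CompletableFuture<String> requestNewSlot(String requestId) {
        CompletableFuture<String> slotFuture = new CompletableFuture<>();
        pending.put(requestId, slotFuture);
        if (resourceManager == null) {
            waitingForResourceManager.add(requestId); // no connection yet: stash
        } else {
            resourceManager.requestSlot(requestId);
        }
        return slotFuture;
    }

    void connect(ResourceManager newResourceManager) {
        resourceManager = newResourceManager;
        while (!waitingForResourceManager.isEmpty()) {
            resourceManager.requestSlot(waitingForResourceManager.poll()); // replay stashed requests
        }
    }

    void offerSlot(String requestId, String slot) {
        CompletableFuture<String> slotFuture = pending.remove(requestId);
        if (slotFuture != null) {
            slotFuture.complete(slot);
        }
    }

    public static void main(String[] args) {
        PendingSlotRequestSketch pool = new PendingSlotRequestSketch();
        CompletableFuture<String> slot = pool.requestNewSlot("req-1");
        pool.connect(requestId -> pool.offerSlot(requestId, "slot-for-" + requestId));
        System.out.println(slot.join());
    }
}
```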

===>

CompletableFuture rmResponse = resourceManagerGateway.requestSlot( // calls ResourceManager.requestSlot
			jobMasterId,
			new SlotRequest(jobId, allocationId, pendingRequest.getResourceProfile(), jobManagerAddress),
			rpcTimeout);
===>
ResourceManager.requestSlot ==> SlotManager.registerSlotRequest ===> internalRequestSlot, shown below:
===>
private void internalRequestSlot(PendingSlotRequest pendingSlotRequest) throws ResourceManagerException {
	TaskManagerSlot taskManagerSlot = findMatchingSlot(pendingSlotRequest.getResourceProfile());

	if (taskManagerSlot != null) {
		allocateSlot(taskManagerSlot, pendingSlotRequest); // a matching slot was found: ask the TaskManager over RPC to allocate it
	} else {
		resourceActions.allocateResource(pendingSlotRequest.getResourceProfile()); // no matching slot: go through ResourceActionsImpl to request more resources
	}
}
===>
public void allocateResource(ResourceProfile resourceProfile) throws ResourceManagerException {
	validateRunsInMainThread();
	startNewWorker(resourceProfile); // request a YARN container
}
public void startNewWorker(ResourceProfile resourceProfile) {
	Preconditions.checkArgument(
		ResourceProfile.UNKNOWN.equals(resourceProfile),
		"The YarnResourceManager does not support custom ResourceProfiles yet. It assumes that all containers have the same resources.");
	requestYarnContainer(); // YarnResourceManager requests a container from YARN
}
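
The last two steps reduce to "use a free registered slot if there is one, otherwise ask the cluster for a new worker". A minimal sketch, assuming a hypothetical SlotMatchSketch class where a map of slot IDs stands in for the registered TaskManager slots and a counter stands in for the container request:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of "match a free slot, else start a new worker"; not Flink code.
final class SlotMatchSketch {
    // slot ID -> whether the slot is currently free
    final Map<String, Boolean> slotIsFree = new LinkedHashMap<>();
    int newWorkerRequests;

    Optional<String> requestSlot() {
        for (Map.Entry<String, Boolean> entry : slotIsFree.entrySet()) {
            if (entry.getValue()) {
                entry.setValue(false); // mark the slot as taken
                return Optional.of(entry.getKey());
            }
        }
        newWorkerRequests++; // stands in for requesting a new container
        return Optional.empty();
    }

    public static void main(String[] args) {
        SlotMatchSketch manager = new SlotMatchSketch();
        manager.slotIsFree.put("slot-0", true);
        System.out.println(manager.requestSlot());
        System.out.println(manager.requestSlot());
        System.out.println(manager.newWorkerRequests);
    }
}
```

In the real flow the unmatched request stays pending and is fulfilled once the new worker registers its slots; the sketch only counts the fallback.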

That completes the resource allocation call chain. Next, the execution path.
Back in the ExecutionGraph.scheduleEager method:

execution.deploy(); // trigger deployment of the task

===> Execution.deploy

final TaskDeploymentDescriptor deployment = vertex.createDeploymentDescriptor( // build the task deployment descriptor
			attemptId,
			slot,
			taskRestore,
			attemptNumber);

...

final CompletableFuture submitResultFuture = taskManagerGateway.submitTask(deployment, rpcTimeout);// ==> RpcTaskManagerGateway
===> 
RpcTaskManagerGateway.submitTask
===> 
public CompletableFuture submitTask(TaskDeploymentDescriptor tdd, Time timeout) {
	return taskExecutorGateway.submitTask(tdd, jobMasterId, timeout);//==> TaskExecutor.submitTask
}

==> TaskExecutor.submitTask builds the Task and actually starts it:

Task task = new Task(
		jobInformation,
		taskInformation,
		tdd.getExecutionAttemptId(),
		tdd.getAllocationId(),
		tdd.getSubtaskIndex(),
		tdd.getAttemptNumber(),
		tdd.getProducedPartitions(),
		tdd.getInputGates(),
		tdd.getTargetSlotNumber(),
		taskExecutorServices.getMemoryManager(),
		taskExecutorServices.getIOManager(),
		taskExecutorServices.getNetworkEnvironment(),
		taskExecutorServices.getBroadcastVariableManager(),
		taskStateManager,
		taskManagerActions,
		inputSplitProvider,
		checkpointResponder,
		blobCacheService,
		libraryCache,
		fileCache,
		taskManagerConfiguration,
		taskMetricGroup,
		resultPartitionConsumableNotifier,
		partitionStateChecker,
		getRpcService().getExecutor());

	log.info("Received task {}.", task.getTaskInfo().getTaskNameWithSubtasks());

	boolean taskAdded;

	try {
		taskAdded = taskSlotTable.addTask(task);
	} catch (SlotNotFoundException | SlotNotActiveException e) {
		throw new TaskSubmissionException("Could not submit task.", e);
	}

	if (taskAdded) {
		task.startTaskThread(); // the task actually starts running

		return CompletableFuture.completedFuture(Acknowledge.get());
	} else {
		final String message = "TaskManager already contains a task for id " +
			task.getExecutionId() + '.';

		log.debug(message);
		throw new TaskSubmissionException(message);
	}
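
The tail of submitTask is "register the task, and only start its thread if the registration succeeded". A minimal sketch with a concurrent map, assuming a hypothetical TaskTableSketch class; it returns false for a duplicate where the excerpt above throws TaskSubmissionException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of "add to the table, then start the thread"; not Flink code.
final class TaskTableSketch {
    private final Map<String, Thread> tasks = new ConcurrentHashMap<>();

    boolean submit(String executionId, Runnable taskBody) {
        Thread taskThread = new Thread(taskBody, "task-" + executionId);
        // putIfAbsent is atomic, so two submissions of the same ID cannot both win.
        if (tasks.putIfAbsent(executionId, taskThread) != null) {
            return false;
        }
        taskThread.start();
        return true;
    }

    public static void main(String[] args) {
        TaskTableSketch table = new TaskTableSketch();
        System.out.println(table.submit("attempt-1", () -> System.out.println("task running")));
        System.out.println(table.submit("attempt-1", () -> { }));
    }
}
```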

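To recap: the JobGraph is submitted to the JM, where the Netty handler receives the request and passes it to the Dispatcher. The Dispatcher persists and runs the job, brings up a JobManagerRunner, and starts the JobMaster, which connects to the ResourceManager and requests slots from the SlotPool. If the SlotPool does not have enough, it asks the ResourceManager; if that is still not enough, the ResourceManager requests containers from YARN.
Once the resources are allocated, the ExecutionGraph built from the JobGraph is turned into physical deployments, down to the individual tasks, and the job is submitted and running.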
