Yarn Source Code Analysis (3) --- ApplicationMaster Startup

Preface

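In the previous post, Yarn Source Code Analysis (2) --- spark-submit, we walked through the full flow of a Spark job from spark-submit, through requesting resources from YARN, to launch. This post looks at how the ApplicationMaster (AM for short) gets started during that flow.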

AM Startup and Container Allocation

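1. In Yarn Source Code Analysis (2), yarnClient ends up calling submitApplication to submit the job. The argument it passes carries the AM launch context, so the AM startup is carried out on the YARN side of this call.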

val containerContext = createContainerLaunchContext(newAppResponse) // wrap the AM launch context
val appContext = createApplicationSubmissionContext(newApp, containerContext) // the application's submission context

yarnClient.submitApplication(appContext) // submit the application

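2. Starting the AM is remarkably complicated and the code is long, so I will pick out only the important parts. Spark packages the context the AM runs in; YARN's event-handling machinery eventually executes that context, which calls back into Spark's AM class. Client mode and cluster mode run different classes, as shown below: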

val amClass =
  if (isClusterMode) { // cluster mode
    Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
  } else { // client mode
    Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
  }

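3. Clearly, then, we should start from yarnClient.submitApplication(appContext) and follow the Hadoop-side code to see how YARN uses this context to launch the AM that Spark packaged. The method is implemented by YarnClientImpl, which in turn calls ApplicationClientProtocol.submitApplication. ApplicationClientProtocol is the key interface YARN uses for RPC communication; I won't go into it here. After submitting, the client enters a while (true) polling loop that waits until the submission completes.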

// request wraps the overall application parameters and launch scripts; submit it to the RM
rmClient.submitApplication(request);

int pollCount = 0;
long startTime = System.currentTimeMillis();
EnumSet<YarnApplicationState> waitingStates =
    EnumSet.of(YarnApplicationState.NEW,
        YarnApplicationState.NEW_SAVING,
        YarnApplicationState.SUBMITTED);
EnumSet<YarnApplicationState> failToSubmitStates =
    EnumSet.of(YarnApplicationState.FAILED,
        YarnApplicationState.KILLED);
while (true) {
  try {
    ApplicationReport appReport = getApplicationReport(applicationId);
    YarnApplicationState state = appReport.getYarnApplicationState();
    if (!waitingStates.contains(state)) {
      if (failToSubmitStates.contains(state)) {
        throw new YarnException("Failed to submit " + applicationId +
            " to YARN : " + appReport.getDiagnostics());
      }
      LOG.info("Submitted application " + applicationId);
      break;
    }
    // ... (the rest of the loop body and the catch clause are omitted from this excerpt)
  }
}
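The polling pattern in this loop can be tried in isolation. Below is a minimal sketch, not the YarnClientImpl code: the State enum is a hypothetical stand-in for YarnApplicationState, and the Supplier stands in for getApplicationReport(...).getYarnApplicationState(). The class name, method name and poll interval are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.function.Supplier;

public class SubmitPollSketch {
    // Hypothetical stand-in for YarnApplicationState (only the states used here).
    public enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, FAILED, KILLED }

    static final EnumSet<State> WAITING =
            EnumSet.of(State.NEW, State.NEW_SAVING, State.SUBMITTED);
    static final EnumSet<State> FAILED_TO_SUBMIT =
            EnumSet.of(State.FAILED, State.KILLED);

    // Polls reportedState until it leaves WAITING and returns the number of polls made.
    // Throws IllegalStateException if the application ended up FAILED or KILLED.
    public static int waitForSubmission(Supplier<State> reportedState, long pollIntervalMillis)
            throws InterruptedException {
        int pollCount = 0;
        while (true) {
            State state = reportedState.get();
            pollCount++;
            if (!WAITING.contains(state)) {
                if (FAILED_TO_SUBMIT.contains(state)) {
                    throw new IllegalStateException("Failed to submit, state=" + state);
                }
                return pollCount;
            }
            Thread.sleep(pollIntervalMillis); // wait before asking again
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A fake "RM" that reports one state per poll.
        Iterator<State> fakeReports = Arrays.asList(
                State.NEW, State.NEW_SAVING, State.SUBMITTED, State.ACCEPTED).iterator();
        int polls = waitForSubmission(fakeReports::next, 10L);
        System.out.println("accepted after " + polls + " polls"); // accepted after 4 polls
    }
}
```

The point of the two sets is that "not waiting any more" and "succeeded" are different questions: the loop only stops polling when the state leaves the waiting set, and only then decides whether that was a failure.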

4. On the RM side, submitApplication is implemented by ClientRMService. I pasted the whole method, so my analysis lives in the code comments.

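 // ApplicationSubmissionContext is validated here for safety: fields that are
 // independent of the RM's configuration are checked here, while those that
 // depend on it are validated in RMAppManager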
String user = null;
try {
     user = UserGroupInformation.getCurrentUser().getShortUserName();
} catch (IOException ie) {
    LOG.warn("Unable to get the current user.", ie);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
    ie.getMessage(), "ClientRMService",
    "Exception in submitting application", applicationId);
    throw RPCUtil.getRemoteException(ie);
 }

 // Check whether the app has already been put into rmContext; if so, simply return the response
if (rmContext.getRMApps().get(applicationId) != null) {
 LOG.info("This is an earlier submitted application: " + applicationId);
 return SubmitApplicationResponse.newInstance();
 }

 // Fall back to the default queue when none is set
if (submissionContext.getQueue() == null) {
 submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
 }
 // Fall back to the default application name when none is set
if (submissionContext.getApplicationName() == null) {
 submissionContext.setApplicationName(
 YarnConfiguration.DEFAULT_APPLICATION_NAME);
 }
 // Fall back to the default application type, or truncate it when it is too long
if (submissionContext.getApplicationType() == null) {
 submissionContext
 .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
 } else {
     if (submissionContext.getApplicationType().length() >                 
     YarnConfiguration.APPLICATION_TYPE_LENGTH) {
     submissionContext.setApplicationType(submissionContext
     .getApplicationType().substring(0,
     YarnConfiguration.APPLICATION_TYPE_LENGTH));
     }
 }

 try {
 // Call RMAppManager to submit the application directly
 // (for background on the ApplicationManager, see my chapter on the basic components)
 rmAppManager.submitApplication(submissionContext,
 System.currentTimeMillis(), user);

 LOG.info("Application with id " + applicationId.getId() + 
 " submitted by user " + user);
 RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
 "ClientRMService", applicationId);
 } catch (YarnException e) {
 LOG.info("Exception in submitting application with id " +
 applicationId.getId(), e);
 RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
 e.getMessage(), "ClientRMService",
 "Exception in submitting application", applicationId);
 throw e;
 }

 SubmitApplicationResponse response = recordFactory
 .newRecordInstance(SubmitApplicationResponse.class);
 return response;
 }

5. From the code above, the next stop is rmAppManager.submitApplication(). Inside it there is a call to createAndPopulateNewRMApp(), so let's look at that:

private RMAppImpl createAndPopulateNewRMApp(
 ApplicationSubmissionContext submissionContext, long submitTime,
 String user, boolean isRecovery) throws YarnException {
 ApplicationId applicationId = submissionContext.getApplicationId();

 // Validate the AM's resource request
ResourceRequest amReq =
 validateAndCreateResourceRequest(submissionContext, isRecovery);

 // Create RMApp
 // This wraps a state machine, one of YARN's core mechanisms: each service acts as its state changes
RMAppImpl application =
 new RMAppImpl(applicationId, rmContext, this.conf,
 submissionContext.getApplicationName(), user,
 submissionContext.getQueue(),
 submissionContext, this.scheduler, this.masterService,
 submitTime, submissionContext.getApplicationType(),
 submissionContext.getApplicationTags(), amReq);

 return application;
 }


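6. Finally, a START event is submitted. We just saw that an RMAppImpl was created; this event drives its state machine. The registration below means: on START, the application moves from NEW to NEW_SAVING and RMAppNewlySavingTransition runs.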

.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
 RMAppEventType.START, new RMAppNewlySavingTransition())
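To make the addTransition registration more concrete, here is a minimal sketch of a table-driven state machine. It is not Hadoop's StateMachineFactory; TinyStateMachineSketch, Arc, and the trace strings are hypothetical names chosen for illustration. The three arcs wired up are the application-level ones this post walks through (START, APP_NEW_SAVED, APP_ACCEPTED).

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a table-driven state machine;
// this is NOT Hadoop's StateMachineFactory.
public class TinyStateMachineSketch {
    public enum AppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED }
    public enum AppEvent { START, APP_NEW_SAVED, APP_ACCEPTED }

    // One arc of the table: the hook to run and the state to land in.
    private static final class Arc {
        final AppState next;
        final Runnable hook;
        Arc(AppState next, Runnable hook) { this.next = next; this.hook = hook; }
    }

    private final Map<AppState, Map<AppEvent, Arc>> table = new EnumMap<>(AppState.class);
    private AppState current = AppState.NEW;
    public final List<String> trace = new ArrayList<>();

    // Registers "in state from, on event on, run hook and move to state to".
    public TinyStateMachineSketch addTransition(AppState from, AppState to,
                                                AppEvent on, Runnable hook) {
        table.computeIfAbsent(from, s -> new EnumMap<AppEvent, Arc>(AppEvent.class))
             .put(on, new Arc(to, hook));
        return this;
    }

    // Looks up the arc for (current state, event), runs its hook, then switches state.
    public AppState handle(AppEvent event) {
        Map<AppEvent, Arc> arcs = table.get(current);
        Arc arc = (arcs == null) ? null : arcs.get(event);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        arc.hook.run();
        current = arc.next;
        return current;
    }

    // Wires up the three application-level arcs discussed in this post.
    public static TinyStateMachineSketch newAppStateMachine() {
        TinyStateMachineSketch sm = new TinyStateMachineSketch();
        sm.addTransition(AppState.NEW, AppState.NEW_SAVING, AppEvent.START,
                () -> sm.trace.add("store application"));
        sm.addTransition(AppState.NEW_SAVING, AppState.SUBMITTED, AppEvent.APP_NEW_SAVED,
                () -> sm.trace.add("add application to scheduler"));
        sm.addTransition(AppState.SUBMITTED, AppState.ACCEPTED, AppEvent.APP_ACCEPTED,
                () -> sm.trace.add("start application attempt"));
        return sm;
    }

    public static void main(String[] args) {
        TinyStateMachineSketch sm = newAppStateMachine();
        System.out.println(sm.handle(AppEvent.START));         // NEW_SAVING
        System.out.println(sm.handle(AppEvent.APP_NEW_SAVED)); // SUBMITTED
        System.out.println(sm.handle(AppEvent.APP_ACCEPTED));  // ACCEPTED
        System.out.println(sm.trace);
    }
}
```

Reading the real code gets easier with this picture in mind: every addTransition line is one row of a lookup table, and "submitting an event" just means looking up the row for the current state and running its hook.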

7. So the thing to read is RMAppNewlySavingTransition(). Its body is short:

private static final class RMAppNewlySavingTransition extends RMAppTransition {
 @Override
 public void transition(RMAppImpl app, RMAppEvent event) {
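 // If recovery is enabled, store the application information in a non-blocking call,
 // so that the RM has saved what it needs to restart the AM after an RM restart
 // without further communication with the client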

LOG.info("Storing application with id " + app.applicationId);
 app.rmContext.getStateStore().storeNewApplication(app);
 }
 }

8. Now let's see what this store method does: it fires a STORE_APP event, which is handled by StoreAppTransition.

public void storeNewApplication(RMApp app) {
 ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
 assert context instanceof ApplicationSubmissionContextPBImpl;
 ApplicationStateData appState =
 ApplicationStateData.newInstance(
 app.getSubmitTime(), app.getStartTime(), context, app.getUser());
 dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
 }

9. Inside StoreAppTransition there is a piece of code that notifies the RM with an APP_NEW_SAVED event, which is handled by AddApplicationToSchedulerTransition:

try {
 store.storeApplicationStateInternal(appId, appState);
 store.notifyApplication(new RMAppEvent(appId,
 RMAppEventType.APP_NEW_SAVED));
 } catch (Exception e) {
 LOG.error("Error storing app: " + appId, e);
 isFenced = store.notifyStoreOperationFailedInternal(e);
 }

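10. This transition is simple and its name says it all: it hands the application over to the scheduler by submitting an APP_ADDED event. We are analyzing the default scheduler, the Capacity Scheduler, so we now look at the code in CapacityScheduler: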

case APP_ADDED:
{
  AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
  String queueName =
      resolveReservationQueueName(appAddedEvent.getQueue(),
          appAddedEvent.getApplicationId(),
          appAddedEvent.getReservationID());
  if (queueName != null) {
    if (!appAddedEvent.getIsAppRecovering()) {
      // Tell the queue that an application was submitted; the queue keeps count of its applications
      addApplication(appAddedEvent.getApplicationId(), queueName,
          appAddedEvent.getUser());
    } else {
      addApplicationOnRecovery(appAddedEvent.getApplicationId(), queueName,
          appAddedEvent.getUser());
    }
  }
  // (from inside addApplication) record the submission, register the application,
  // then submit the APP_ACCEPTED event
  queue.getMetrics().submitApp(user);
  SchedulerApplication application =
      new SchedulerApplication(queue, user);
  applications.put(applicationId, application);
  LOG.info("Accepted application " + applicationId + " from user: " + user
      + ", in queue: " + queueName);
  rmContext.getDispatcher().getEventHandler()
      .handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}

11. As the code above shows, the Capacity Scheduler submits the APP_ACCEPTED event; the state moves from SUBMITTED to ACCEPTED and StartAppAttemptTransition() is triggered.

 .addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
 RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())

12. This transition creates a new RMAppAttempt and submits RMAppAttemptEventType.START, which triggers AttemptStartedTransition(). The attempt object was only just created, so its state machine starts in NEW; the START event moves it to SUBMITTED.

 // Transitions from NEW State
 .addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
 RMAppAttemptEventType.START, new AttemptStartedTransition())

13. Now look at AttemptStartedTransition(); the part that matters is the call to registerAppAttempt:

 // Register the attempt with the AM service (masterService)
 appAttempt.masterService
 .registerAppAttempt(appAttempt.applicationAttemptId);

 // Add the applicationAttempt to the scheduler and inform the scheduler
 // whether to transfer the state from previous attempt.
 appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
 appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));

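14. In registerAppAttempt, the response id is set to -1. This id is incremented on every later exchange with the AM, and the RM uses it to tell whether a request is a duplicate, a new request, or a stale one.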

AllocateResponse response =
 recordFactory.newRecordInstance(AllocateResponse.class);
 // set response id to -1 before application master for the following
 // attemptID get registered
 response.setResponseId(-1);
 LOG.info("Registering app attempt : " + attemptId);
 responseMap.put(attemptId, new AllocateResponseLock(response));
 rmContext.getNMTokenSecretManager().registerApplicationAttempt(attemptId);
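How a single counter can separate duplicate, new, and stale requests is easier to see in a small sketch. The comparison rule below is my reading of the idea and should be treated as an assumption, not as the ApplicationMasterService source: the AM echoes the id of the last response it received, a request one behind the RM's counter is a resend of the previous heartbeat, and anything further behind is stale. ResponseIdSketch, Verdict, register() and onAllocate() are hypothetical names, and resetting the id to 0 on registration is also an assumption.

```java
public class ResponseIdSketch {
    public enum Verdict { NEW_REQUEST, DUPLICATE, STALE }

    // -1 until the AM registers, matching the excerpt above.
    private int lastResponseId = -1;

    // Assumption for this sketch: registering the AM resets the id to 0.
    public void register() {
        lastResponseId = 0;
    }

    // Classifies a heartbeat by the response id the AM says it saw last.
    public Verdict onAllocate(int requestResponseId) {
        if (lastResponseId < 0) {
            throw new IllegalStateException("AM has not registered yet");
        }
        if (requestResponseId + 1 == lastResponseId) {
            return Verdict.DUPLICATE; // AM resent its previous heartbeat
        }
        if (requestResponseId + 1 < lastResponseId) {
            return Verdict.STALE;     // far behind the RM's counter: reject
        }
        lastResponseId++;             // accept and advance the counter
        return Verdict.NEW_REQUEST;
    }

    public static void main(String[] args) {
        ResponseIdSketch rm = new ResponseIdSketch();
        rm.register();
        System.out.println(rm.onAllocate(0)); // NEW_REQUEST
        System.out.println(rm.onAllocate(0)); // DUPLICATE
        System.out.println(rm.onAllocate(1)); // NEW_REQUEST
        System.out.println(rm.onAllocate(0)); // STALE
    }
}
```

Starting at -1 gives the RM a cheap way to notice a heartbeat from an attempt that never registered: no valid request can line up with a negative counter.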

15. Back in AttemptStartedTransition(): its last step submits SchedulerEventType.APP_ATTEMPT_ADDED, which goes back to the Capacity Scheduler:

case APP_ATTEMPT_ADDED:
 {
 AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
 (AppAttemptAddedSchedulerEvent) event;
 addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
 appAttemptAddedEvent.getTransferStateFromPreviousAttempt(),
 appAttemptAddedEvent.getIsAttemptRecovering());
 }

16. Naturally we step into addApplicationAttempt. I picked out part of it: the code below creates a FiCaSchedulerApp, which holds the resource information for launching the AM.

FiCaSchedulerApp attempt =
 new FiCaSchedulerApp(applicationAttemptId, application.getUser(),
 queue, queue.getActiveUsersManager(), rmContext);

17. Once that is set up, it submits the RMAppAttemptEventType.ATTEMPT_ADDED event:

rmContext.getDispatcher().getEventHandler().handle(
 new RMAppAttemptEvent(applicationAttemptId,
 RMAppAttemptEventType.ATTEMPT_ADDED));

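The ATTEMPT_ADDED event moves the attempt out of SUBMITTED. The resulting state is either LAUNCHED_UNMANAGED_SAVING or SCHEDULED, and the state machine continues according to whichever state the transition returns: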

.addTransition(RMAppAttemptState.SUBMITTED, 
 EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
 RMAppAttemptState.SCHEDULED),
 RMAppAttemptEventType.ATTEMPT_ADDED,
 new ScheduleTransition())
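This registration differs from the earlier ones: instead of a single target state it lists a set of allowed states, and the transition itself returns which one to enter. A minimal sketch of that idea follows. It is not Hadoop's implementation; MultiArcSketch, fire() and schedule() are hypothetical names, and the Supplier stands in for the transition hook.

```java
import java.util.EnumSet;
import java.util.function.Supplier;

public class MultiArcSketch {
    public enum AttemptState { SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING }

    // Runs a hook that chooses the post-state and checks the choice
    // against the set of states the arc was registered with.
    public static AttemptState fire(EnumSet<AttemptState> allowed,
                                    Supplier<AttemptState> hook) {
        AttemptState next = hook.get();
        if (!allowed.contains(next)) {
            throw new IllegalStateException("Transition returned disallowed state " + next);
        }
        return next;
    }

    // Mirrors the decision described for ScheduleTransition: a managed AM goes to
    // SCHEDULED, an unmanaged AM goes to LAUNCHED_UNMANAGED_SAVING.
    public static AttemptState schedule(boolean unmanagedAM) {
        return fire(
                EnumSet.of(AttemptState.SCHEDULED, AttemptState.LAUNCHED_UNMANAGED_SAVING),
                () -> unmanagedAM ? AttemptState.LAUNCHED_UNMANAGED_SAVING
                                  : AttemptState.SCHEDULED);
    }

    public static void main(String[] args) {
        System.out.println(schedule(false)); // SCHEDULED
        System.out.println(schedule(true));  // LAUNCHED_UNMANAGED_SAVING
    }
}
```

Declaring the allowed set up front keeps the table readable while still letting the hook decide at run time.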

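18. Next comes ScheduleTransition(). The if condition, subCtx.getUnmanagedAM(), reports whether the AM is unmanaged. If it is true, the RM does not allocate a container for the AM or launch it; the default is false. So in our case the returned state is SCHEDULED.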

ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
// getUnmanagedAM() is true for an unmanaged AM, in which case the RM neither
// allocates a container for the AM nor launches it; the default is false
if (!subCtx.getUnmanagedAM()) {
  // Reset the container count before creating a new attempt: this request is passed
  // to the scheduler, which deducts the count once the AM container is allocated
  appAttempt.amReq.setNumContainers(1);
  appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
  // ANY means any node is acceptable
  appAttempt.amReq.setResourceName(ResourceRequest.ANY);
  appAttempt.amReq.setRelaxLocality(true);

  // Ask the scheduler to allocate resources
  Allocation amContainerAllocation =
      appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
          Collections.singletonList(appAttempt.amReq),
          EMPTY_CONTAINER_RELEASE_LIST, null, null);
  if (amContainerAllocation != null
      && amContainerAllocation.getContainers() != null) {
    assert (amContainerAllocation.getContainers().size() == 0);
  }
  return RMAppAttemptState.SCHEDULED;
} else {
  // save state and then go to LAUNCHED state
  appAttempt.storeAttempt();
  return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}

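19. The code above contains an exciting line: appAttempt.scheduler.allocate(). This is where resource scheduling happens. I won't analyze it in detail here; the AM calls the same interface later to request the remaining Containers, and a later post covers it in depth. We now know that SCHEDULED was returned. From that state, the CONTAINER_ALLOCATED event matches the registration below and triggers AMContainerAllocatedTransition():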

 .addTransition(RMAppAttemptState.SCHEDULED,
 EnumSet.of(RMAppAttemptState.ALLOCATED_SAVING,
 RMAppAttemptState.SCHEDULED),
 RMAppAttemptEventType.CONTAINER_ALLOCATED,
 new AMContainerAllocatedTransition())

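20. See the code block for details. allocate is called here too, but with an empty request list. allocate checks for this: with an empty request it tries to fetch the containers that were already allocated rather than running another round of scheduling.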

// Acquire the AM container from the scheduler.
Allocation amContainerAllocation =
    appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
        EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,
        null);
// At least one container must have been allocated, because CONTAINER_ALLOCATED
// is only emitted after an RMContainer has been constructed and put into
// SchedulerApplication#newlyAllocatedContainers.
// Note that YarnScheduler#allocate is not guaranteed to be able to fetch it:
// the container may be unfetchable for some reason, such as DNS being
// unavailable so that no container token was generated.
// In that case we return to the previous state and keep retrying until the
// AM container is fetched.
if (amContainerAllocation.getContainers().size() == 0) {
  appAttempt.retryFetchingAMContainer(appAttempt);
  return RMAppAttemptState.SCHEDULED;
}

// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers()
    .get(0));
RMContainerImpl rmMasterContainer = (RMContainerImpl)appAttempt.scheduler
    .getRMContainer(appAttempt.getMasterContainer().getId());
rmMasterContainer.setAMContainer(true);
// The node set in NMTokenSecretManager marks whether an NMToken has already
// been issued to the AM for a given node.
// When the AM container was allocated to the RM itself, the node that hosts it
// was marked as having had its NMToken sent.
// So clear the node set, so that later allocate requests from the AM can
// retrieve the corresponding NMToken.
appAttempt.rmContext.getNMTokenSecretManager()
    .clearNodeSetForAttempt(appAttempt.applicationAttemptId);
appAttempt.getSubmissionContext().setResource(
    appAttempt.getMasterContainer().getResource());
appAttempt.storeAttempt();

return RMAppAttemptState.ALLOCATED_SAVING;

21. The transition finally returns ALLOCATED_SAVING. As before, the next step is the transition registered for that state:

.addTransition(RMAppAttemptState.ALLOCATED_SAVING, 
 RMAppAttemptState.ALLOCATED,
 RMAppAttemptEventType.ATTEMPT_NEW_SAVED, new AttemptStoredTransition())

This transition finally shows the launch call: launchAttempt() submits a LAUNCH event.

private static final class AttemptStoredTransition extends BaseTransition {
 @Override
 public void transition(RMAppAttemptImpl appAttempt,
 RMAppAttemptEvent event) {

 appAttempt.registerClientToken();
 appAttempt.launchAttempt();
 }
 }

22. At this point we finally reach the exciting class ApplicationMasterLauncher. A LAUNCH event was just submitted, so the launch() branch runs:

AMLauncherEventType event = appEvent.getType();
 RMAppAttempt application = appEvent.getAppAttempt();
 switch (event) {
 case LAUNCH:
 launch(application);
 break;
 case CLEANUP:
 cleanup(application);
 break;
 default:
 break;
 }

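23. First we need to look at how ApplicationMasterLauncher is initialized and started. It is a child service of the RM; as mentioned in Yarn Source Code Analysis (1) --- RM and NM Service Startup and Heartbeat Communication, the RM initializes and starts its child services one by one. The most important thing here is that it starts a thread to process the relevant events, so let's look at that thread's run method.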

@Override
 protected void serviceInit(Configuration conf) throws Exception {
 int threadCount = conf.getInt(
 YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
 YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
 ThreadFactory tf = new ThreadFactoryBuilder()
 .setNameFormat("ApplicationMasterLauncher #%d")
 .build();
 launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
 TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
 launcherPool.setThreadFactory(tf);

 Configuration newConf = new YarnConfiguration(conf);
 newConf.setInt(CommonConfigurationKeysPublic.
 IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
 conf.getInt(YarnConfiguration.RM_NODEMANAGER_CONNECT_RETIRES,
 YarnConfiguration.DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES));
 setConfig(newConf);
 super.serviceInit(newConf);
 }

 @Override
 protected void serviceStart() throws Exception {
 launcherHandlingThread.start();
 super.serviceStart();
 }

As you can see, the run method takes events off the masterEvents queue one at a time and hands them to the thread pool:

while (!this.isInterrupted()) {
 Runnable toLaunch;
 try {
 toLaunch = masterEvents.take();
 launcherPool.execute(toLaunch);
 } catch (InterruptedException e) {
 LOG.warn(this.getClass().getName() + " interrupted. Returning.");
 return;
 }
 }
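The shape of this service, a single handling thread that drains a blocking queue and passes each item to a fixed-size pool, can be reproduced in a few lines. The sketch below is a simplified illustration under my own assumptions, not the ApplicationMasterLauncher source: LauncherLoopSketch, submit(), stop() and runDemo() are hypothetical names, and the "launchers" are no-op Runnables.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LauncherLoopSketch {
    private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
    private final ExecutorService launcherPool;
    private final Thread handlingThread;

    public LauncherLoopSketch(int threadCount) {
        launcherPool = Executors.newFixedThreadPool(threadCount);
        handlingThread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // Block until a launcher is queued, then hand it to the pool.
                    launcherPool.execute(masterEvents.take());
                } catch (InterruptedException e) {
                    return; // interrupted while waiting: stop the loop
                }
            }
        }, "launcher-handler");
    }

    public void start() { handlingThread.start(); }

    // Counterpart of launch(): queue a runnable launcher for the handling thread.
    public void submit(Runnable launcher) { masterEvents.add(launcher); }

    public void stop() throws InterruptedException {
        handlingThread.interrupt();
        handlingThread.join();      // make sure nothing is submitted after shutdown
        launcherPool.shutdown();
        launcherPool.awaitTermination(5, TimeUnit.SECONDS);
    }

    // Queues n no-op launchers and returns how many of them actually ran.
    public static int runDemo(int n) throws InterruptedException {
        LauncherLoopSketch loop = new LauncherLoopSketch(2);
        loop.start();
        AtomicInteger launched = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            loop.submit(() -> { launched.incrementAndGet(); done.countDown(); });
        }
        done.await(5, TimeUnit.SECONDS);
        loop.stop();
        return launched.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("launched " + runDemo(5)); // launched 5
    }
}
```

Splitting the work this way means the thread that accepts events never blocks on a slow launch; only the pool threads do.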

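24. Now back to the launch() method from before. It calls createRunnableLauncher, which creates an AMLauncher with the AMLauncherEventType.LAUNCH event; the launcher is queued and eventually picked up by the ApplicationMasterLauncher thread.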

private void launch(RMAppAttempt application) {
 Runnable launcher = createRunnableLauncher(application, 
 AMLauncherEventType.LAUNCH);
 masterEvents.add(launcher);
 }

 protected Runnable createRunnableLauncher(RMAppAttempt application, 
 AMLauncherEventType event) {
 Runnable launcher =
 new AMLauncher(context, application, event, getConfig());
 return launcher;
 }

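25. That triggers AMLauncher's run method, which calls its own launch() method and then submits RMAppAttemptEventType.LAUNCHED. That event exists to start the AM monitoring thread, so I won't analyze it; the focus is launch().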

case LAUNCH:
 try {
 LOG.info("Launching master" + application.getAppAttemptId());
 launch();
 handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
 RMAppAttemptEventType.LAUNCHED));
 }

26. Here the AM launch context that was passed in with spark-submit is finally taken out and placed in a StartContainerRequest, and then startContainers is called:

ContainerLaunchContext launchContext =
 createAMContainerLaunchContext(applicationContext, masterContainerID);

 StartContainerRequest scRequest =
 StartContainerRequest.newInstance(launchContext,
 masterContainer.getContainerToken());
 List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
 list.add(scRequest); 
 StartContainersRequest allRequests =
 StartContainersRequest.newInstance(list);

 StartContainersResponse response =
 containerMgrProxy.startContainers(allRequests);

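27. At last the Container that hosts the AM starts. startContainers is implemented by ContainerManagerImpl. It first runs some checks and then calls the key method startContainerInternal(nmTokenIdentifier, containerTokenIdentifier, request). That method is very long, so again I only pick the key parts: it submits an INIT_APPLICATION event. Following the code, we eventually reach RequestResourcesTransition(). I won't analyze resource localization here; look it up if you are interested. In the snippet below, container.sendLaunchEvent() submits the ContainersLauncherEventType.LAUNCH_CONTAINER event, which is handled by the ContainersLauncher class: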

container.sendLaunchEvent();
container.metrics.endInitingContainer();
return ContainerState.LOCALIZED;

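28. containerLauncher is a thread pool, so it is easy to see what happens here: a new ContainerLaunch object is created and submitted to the pool. As mentioned earlier, this is where the AM launch context packaged by Spark is passed in to start the AM.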

      case LAUNCH_CONTAINER:
        Application app =
          context.getApplications().get(
              containerId.getApplicationAttemptId().getApplicationId());

        ContainerLaunch launch =
            new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
              event.getContainer(), dirsHandler, containerManager);
        containerLauncher.submit(launch);
        running.put(containerId, launch);
        break;

29. That is essentially the end of AM startup. To see what the ContainerLaunch task actually does, read its call() method; I won't go through it here.

Summary

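This post covered the whole AM startup path. The code involved really is complicated and touches many other modules, and I did not analyze all of it; trying to cover everything at once would only muddle the thread. I will study the modules I am interested in, such as the state machine and RPC communication, in later Yarn posts. The next post covers how the AM registers with the RM, how the AM requests Containers, and how those Containers are launched.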


Author: 蛋挞

Date: 2018.08.28
