APACHE YARN 2.6.0 CODE REVIEW

Introduction

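1. What are the communication protocols and gateway objects between YARN modules?
2. How is the lifecycle of a YARN app implemented, from client launch to completion?
3. CapacityScheduler and FairScheduler implementation details? //TODO:
4. LinuxContainerExecutor vs Docker? //TODO: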

YARN 2.6.0 does not yet support GPUs. Unless the DRF-based DominantResourceCalculator is enabled, the default DefaultResourceCalculator sizes and allocates Containers based on memory only.
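
To make the difference concrete, here is a minimal sketch of how many containers fit on a node under a memory-only view versus a view that considers every dimension. The class `ResourceFitSketch` and its method names are hypothetical stand-ins written for these notes, not the Hadoop classes, and the arithmetic is only modeled on the behavior described above.

```java
// Hypothetical illustration; not the Hadoop ResourceCalculator API.
final class ResourceFitSketch {

    /** Memory-only view: vcores are ignored entirely. */
    static int fitByMemoryOnly(int availMemMb, int availVcores, int reqMemMb, int reqVcores) {
        return availMemMb / reqMemMb;
    }

    /** Multi-dimension view: the scarcest dimension limits the count. */
    static int fitByAllDimensions(int availMemMb, int availVcores, int reqMemMb, int reqVcores) {
        return Math.min(availMemMb / reqMemMb, availVcores / reqVcores);
    }

    public static void main(String[] args) {
        // Node with 8192 MB and 4 vcores; each container asks for 1024 MB and 2 vcores.
        System.out.println(fitByMemoryOnly(8192, 4, 1024, 2));    // 8
        System.out.println(fitByAllDimensions(8192, 4, 1024, 2)); // 2
    }
}
```

With a memory-only calculator the node above would be handed 8 containers even though its 4 vcores can only back 2 of them, which is why CPU-heavy workloads usually want the dominant-resource variant.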

1. Communication protocols between components

Framework-level protocols

ApplicationClientProtocol [ClientRMService] // clients -> RM
ContainerManagementProtocol [ContainerManagerImpl] // AM|RM -> NM
ApplicationMasterProtocol [ApplicationMasterService] // AM -> RM
ResourceTracker [ResourceTrackerService] // NM -> RM

Application-level protocols

MR: MRClientProtocol [MRClientService] // client -> AM

2. Lifecycle of an on-YARN app

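The Client asks the RM for an AM_ID (the application ID), wraps the request in an ApplicationSubmissionContext, and submits it to the RM. While handling NM heartbeats, the RM schedules an NM to launch the AM. The AM registers with the RM, heartbeats to request resources, and, once resources are granted, asks NMs to start the worker Containers.
The Client traced below is the MR client, but the AM is the simpler distributedshell version; the MR version will be filled in later.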
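
As an orientation aid for the traces that follow, here is a minimal sketch of the RM-side application states as a table-driven state machine. The state names are the `RMAppState` values that appear in these notes; the class `RmAppLifecycleSketch` is hypothetical, and the event names other than `APP_NEW_SAVED` are labels chosen for illustration rather than a claim about the real `RMAppEventType` enum. The real `RMAppImpl` has many more states and transitions.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, heavily reduced model of the RMApp lifecycle.
final class RmAppLifecycleSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING }
    enum Event { START, APP_NEW_SAVED, APP_ACCEPTED, ATTEMPT_REGISTERED }

    // (current state, event) -> next state
    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    static {
        add(State.NEW, Event.START, State.NEW_SAVING);                // app is being persisted
        add(State.NEW_SAVING, Event.APP_NEW_SAVED, State.SUBMITTED);  // handed to the scheduler
        add(State.SUBMITTED, Event.APP_ACCEPTED, State.ACCEPTED);     // first attempt is created
        add(State.ACCEPTED, Event.ATTEMPT_REGISTERED, State.RUNNING); // AM has registered (assumed trigger)
    }

    private static void add(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    private State current = State.NEW;

    /** Applies one event, or throws if the event is not valid in the current state. */
    State handle(Event event) {
        Map<Event, State> row = TABLE.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }
}
```

Feeding the four events in order walks one instance from NEW to RUNNING; any event delivered out of order is rejected, which mirrors how each bracketed `[from -> to]` step in the traces is only valid from its `from` state.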

2.1 Client initializes, submits the Job [`MR`] to the RM, monitors status, and prints progress logs

org.apache.hadoop.examples.WordCount.main
    ->new Job(conf, "word count");
    ->job.setJarByClass(WordCount.class);job.setMapperClass(TokenizerMapper.class);
    ->FileInputFormat.addInputPath(job, new Path(otherArgs[i]))->FileOutputFormat.setOutputPath(job,new Path(otherArgs[otherArgs.length - 1]))
    ->job.waitForCompletion(true)
        ->org.apache.hadoop.mapreduce.Job.submit();
            ->org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(Job.this, cluster);
                ->JobSubmissionFiles.getStagingDir(cluster, conf);//initialize the staging dir /staging/user/.staging
                ->org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID
                    ->client.createApplication().getApplicationSubmissionContext()//request an AM_ID from the RM
                        ->ApplicationClientProtocol.getNewApplication
                            ->[RM]ClientRMService.getNewApplication
                                ->[RM]org.apache.hadoop.yarn.server.utils.BuilderUtils.newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),applicationCounter.incrementAndGet()) //i.e. the RM start time + an auto-incrementing ID
                    ->new org.apache.hadoop.mapred.JobID(identifier, appID.getId());//build the Job_ID from the AM_ID
                ->conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, new Path(jobStagingArea, jobId.toString()));//Configuring job with submitJobDir as the submit dir
                ->get delegation token for the dir
                ->JobResourceUploader.uploadFiles(job, jobSubmitDir);//Upload and configure files, libjars, jobjars, and archives pertaining to * the passed job
                ->JobSubmitter.writeSplits(job, submitJobDir);
                    ->org.apache.hadoop.mapreduce.split.JobSplitWriter.writeNewSplits(conf, splits, out);
                    ->org.apache.hadoop.mapreduce.split.JobSplitWriter.writeJobSplitMetaInfo
                ->conf.set(toFullPropertyName(queue,QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());//write "queue admins of the queue to which job is being submitted"
                ->writeConf(conf, submitJobFile);//Write job file to submit dir
                ->org.apache.hadoop.mapred.YARNRunner.submitJob(jobId, submitJobDir.toString(), job.getCredentials())
                    ->YARNRunner.createApplicationSubmissionContext
                        ->capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB));capability.setVirtualCores;
                        ->localResources.put(MRJobConfig.JOB_CONF_FILE,createApplicationResource(defaultFileContext,jobConfPath, LocalResourceType.FILE))
                        ->localResources.put(MRJobConfig.JOB_JAR, createApplicationResource(FileContext.getFileContext(jobJarPath.toUri(), jobConf),jobJarPath,LocalResourceType.PATTERN)) //jar
                        ->localResources.put(MRJobConfig.JOB_SUBMIT_DIR + "/" + (MRJobConfig.JOB_SPLIT||MRJobConfig.JOB_SPLIT_METAINFO),createApplicationResource(defaultFileContext,new Path(jobSubmitDir, s), LocalResourceType.FILE)//split info
                        ->/bin/java org.apache.hadoop.mapreduce.v2.app.MRAppMaster ..>>ApplicationConstants.LOG_DIR_EXPANSION_VAR  //Setup the command to run the AM
                        ->MRApps.setClasspath(environment, conf);
                        ->//Setup the environment
                        ->ContainerLaunchContext amContainer =ContainerLaunchContext.newInstance(localResources, environment,vargsFinal, null, securityTokens, acls);
                        ->appContext.setApplicationId(applicationId); appContext.setQueue();appContext.setApplicationName
                        ->appContext.setAMContainerSpec(amContainer); // AM Container
                    ->[RM]ClientRMService.submitApplication(appContext)
                        ->[RM]RMAppManager.submitApplication
                            ->[RM]new RMAppImpl() // Create RMApp
                            ->[RM]rmContext.getRMApps().putIfAbsent(applicationId, application)//add to rmContext
                            ->[RMAppState.NEW->RMAppState.NEW_SAVING]RMAppImpl.RMAppNewlySavingTransition.transition()
                                ->StoreAppTransition.transition
                                ->storeApplicationStateInternal(appId, appState);//RMAppEventType.APP_NEW_SAVED
                            ->[RMAppState.NEW_SAVING->RMAppState.SUBMITTED]AddApplicationToSchedulerTransition
                                ->FifoScheduler.handle(new AppAddedSchedulerEvent())//APP_ADDED
                                    ->FifoScheduler.addApplication->AbstractYarnScheduler.applications.put(applicationId, application);
                            ->[RMAppState.SUBMITTED->RMAppState.ACCEPTED] StartAppAttemptTransition
                                ->RMAppImpl.createAndStartNewAttempt
                                    ->RMAppImpl.createNewAttempt
                                        ->new RMAppAttemptImpl
                                        ->RMAppImpl.attempts.put(appAttemptId, attempt);
                                ->[RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED]AttemptStartedTransition
                                    ->ApplicationMasterService.registerAppAttempt(appAttempt.applicationAttemptId)
                                        ->ApplicationMasterService.responseMap.put(attemptId, new AllocateResponseLock(response));
                                    ->FifoScheduler.handle(new AppAttemptAddedSchedulerEvent())//APP_ATTEMPT_ADDED
                                        ->FifoScheduler.addApplicationAttempt
                                            ->[RMAppAttemptState.SUBMITTED, RMAppAttemptState.SCHEDULED] ScheduleTransition
                                                ->appAttempt.amReq.setNumContainers(1);setPriority(AM_CONTAINER_PRIORITY);setResourceName(ResourceRequest.ANY);setRelaxLocality(true);//set up the AM ResourceRequest
                                                ->appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,Collections.singletonList(appAttempt.amReq),EMPTY_CONTAINER_RELEASE_LIST,amBlacklist.getAdditions(),amBlacklist.getRemovals())
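                                                    ->FifoScheduler.allocate //Note: resources are not allocated synchronously here; the request is only recorded, and the returned Allocation holds resources that were assigned asynchronously earlier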
                                                        ->scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens
                                                            ->rmContainer.handle(new RMContainerEvent(rmContainer.getContainerId(),RMContainerEventType.ACQUIRED)) //send an event to each of the newlyAllocatedContainers obtained via NM->scheduler->app
                                                                ->[RMContainerState.ALLOCATED -> RMContainerState.ACQUIRED] AcquiredTransition
                                                                    ->container.containerAllocationExpirer.register(container.getContainerId());
                                                                    ->[RMAppState.ACCEPTED->RMAppState.ACCEPTED] AppRunningOnNodeTransition
                                                                        ->rmAppImpl.ranNodes.add(nodeAddedEvent.getNodeId());
                                                            ->return new ContainersAndNMTokensAllocation(SchedulerApplicationAttempt.newlyAllocatedContainers, nmTokens);//return the Containers allocated to the application [TODO://how newlyAllocatedContainers gets populated, i.e. how the scheduler fills it]
                    ->[RM]ClientRMService.getApplicationReport
                        ->[RM]RMAppImpl.createAndGetApplicationReport
                            ->RMAppImpl.currentAttempt //read the currentAttempt state and return it
        ->org.apache.hadoop.mapreduce.Job.monitorAndPrintJob();//connect to the AM, read the application status, and print logs
            ->while (!isComplete() || !reportedAfterCompletion) 
                ->mapreduce.Job.updateStatus //update MR progress
                    ->YARNRunner.getJobStatus(status.getJobID())
                        ->mapred.ClientServiceDelegate.getJobStatus
                            ->JobReport report = ((GetJobReportResponse) invoke("getJobReport",GetJobReportRequest.class, request)).getJobReport()
                                ->mapred.ClientServiceDelegate.invoke
                                    ->mapred.ClientServiceDelegate.getProxy
                                        ->MRClientProtocol MRClientProxy=application.getTrackingUrl();
                                        ->serviceAddr = NetUtils.createSocketAddrForHost(application.getHost(), application.getRpcPort());//get the AM's host and RPC port
                                        ->instantiateAMProxy(serviceAddr)//build the client->AM proxy
                                ->[AM]MRClientService.getJobReport //the AM reads the job status
                                    ->org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getReport
                ->mapreduce.Job.getTaskCompletionEvents 
                    ->[AM]org.apache.hadoop.mapreduce.v2.app.client.MRClientService.MRClientProtocolHandler.getTaskAttemptCompletionEvents
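
The `getNewApplication` step in the trace above builds the application ID from the RM start timestamp plus an auto-incrementing counter, and the MR client then derives the Job ID from the same two numbers. The sketch below illustrates that composition. `AppIdSketch` is a hypothetical class, and the rendered string formats (`application_<timestamp>_<4-digit id>` and `job_<timestamp>_<4-digit id>`) are stated here as an assumption about how the IDs are printed, not copied from the Hadoop source.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of "RM start time + auto-incrementing ID".
final class AppIdSketch {
    private final long clusterTimestamp;                        // fixed when the RM starts
    private final AtomicInteger applicationCounter = new AtomicInteger(0);

    AppIdSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    /** Hands out the next per-RM sequence number, starting at 1. */
    int nextId() {
        return applicationCounter.incrementAndGet();
    }

    long clusterTimestamp() {
        return clusterTimestamp;
    }

    /** Assumed rendering of an application ID. */
    static String applicationId(long clusterTimestamp, int id) {
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }

    /** Assumed rendering of the Job ID derived from the same pair of numbers. */
    static String jobId(long clusterTimestamp, int id) {
        return String.format(Locale.ROOT, "job_%d_%04d", clusterTimestamp, id);
    }
}
```

Because the counter restarts with the RM while the timestamp changes, the pair stays unique across RM restarts, and the application and its MR job can always be matched by their shared suffix.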

2.2 NM starts and heartbeats to the RM; the Scheduler [`FifoScheduler`] allocates resources and launches the AM container

org.apache.hadoop.yarn.server.nodemanager.NodeManager.main
    ->new NodeManager().initAndStartNodeManager(conf, false) //NodeManager is a CompositeService
        ->NodeManager.serviceInit;
            ->initAndStartRecoveryStore//Recovery-related parts are skipped for now
            ->this.aclsManager = new ApplicationACLsManager(conf);//Token-related parts are skipped as well
            ->ContainerExecutor exec = ReflectionUtils.newInstance(conf.getClass(YarnConfiguration.NM_CONTAINER_EXECUTOR,DefaultContainerExecutor.class, ContainerExecutor.class), conf);
            ->ContainerExecutor.init
                ->nodemanager.LinuxContainerExecutor.init || DockerContainerExecutor
                    ->container-executor --checksetup //run the shell command to check that the setup is healthy
            ->addService(DeletionService)//Service that cleans up NM local files and logs
            ->new NodeHealthCheckerService();//health-check service
            ->new LocalDirsHandlerService()//local directory checks
            ->NodeManager.context=createNMContext(containerTokenSecretManager,nmTokenSecretManager, nmStore)
            ->NodeManager.nodeStatusUpdater = createNodeStatusUpdater(context, dispatcher, nodeHealthChecker);[TODO://]
                ->NodeStatusUpdaterImpl.NodeStatusUpdaterImpl()
                ->NodeStatusUpdaterImpl.serviceInit
                    ->NodeStatusUpdaterImpl.totalResource = Resource.newInstance(memoryMb, virtualCores) //compute this node's resources
                ->NodeStatusUpdaterImpl.serviceStart
                    ->NodeStatusUpdaterImpl.resourceTracker = getRMClient();
                    ->NodeStatusUpdaterImpl.registerWithRM
                        ->RegisterNodeManagerResponse regNMResponse = resourceTracker.registerNodeManager(request);
                            ->[RM]org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager //everything below runs inside the RM
                                ->[RM]this.nodesListManager.isValidNode(host)//Check if this node is a 'valid' node
                                ->[RM]RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,resolve(host), capability, nodeManagerVersion)
                                ->[RM]this.rmContext.getDispatcher().getEventHandler().handle(new RMNodeStartedEvent)
                                    ->[NodeState.NEW, NodeState.RUNNING]AddNodeTransition
                                ->[RM]resourcemanager.ResourceTrackerService.handleNMContainerStatus(status, nodeId);
                                    ->if (container.getContainerState() == ContainerState.RUNNING) { RMNodeImpl.launchedContainers.add(container.getContainerId()) }//add the containers reported by the NM to launchedContainers
                                    ->RMNodeImpl.handleRunningAppOnNode(rmNode, rmNode.context, startEvent.getRunningApplications(), rmNode.nodeId)//handle RunningApplications and dispatch RMAppRunningOnNodeEvent
                                        ->getDispatcher().getEventHandler().handle(new RMAppRunningOnNodeEvent)
                                            ->[ ->RMAppState.RUNNING]AppRunningOnNodeTransition
                                    ->context.getDispatcher().getEventHandler().handle(new NodeAddedSchedulerEvent())
                                        ->FifoScheduler.handle(NODE_ADDED)
                                            ->FifoScheduler.addNode
                                                ->FiCaSchedulerNode schedulerNode = new FiCaSchedulerNode(rMNode, usePortForNodeName);
                                                ->FifoScheduler.nodes.put(nodeManager.getNodeID(), schedulerNode);
                                                ->Resources.addTo(FifoScheduler.clusterResource, nodeManager.getTotalCapability());//add this NM's resources to the cluster total
                                                ->AbstractYarnScheduler.updateMaximumAllocation//update the cluster's maximum allocation
                                            ->AbstractYarnScheduler.recoverContainersOnNode //recoverContainers on that NM
                                    ->context.getDispatcher().getEventHandler().handle(new NodesListManagerEvent(NodesListManagerEventType.NODE_USABLE, rmNode))
                                        ->resourcemanager.NodesListManager.handle(NODE_USABLE)
                                            ->rmContext.getDispatcher().getEventHandler().handle(new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,RMAppNodeUpdateType.NODE_USABLE)
                                                ->[RMAppState.* -> RMAppState.*] RMAppNodeUpdateTransition
                                                    ->RMAppImpl.processNodeUpdate
                                                        ->RMAppImpl.updatedNodes.add(node);//notify RMAppImpl of the NM state change
                    ->NodeStatusUpdaterImpl.startStatusUpdater //start the heartbeat thread, which also triggers the RM Scheduler
                        ->response = resourceTracker.nodeHeartbeat(request);
                            ->↓[RM]↓ ResourceTrackerService.nodeHeartbeat//everything below runs inside the RM
                                ->this.nodesListManager.isValidNode()//1. Check if it's a valid (i.e. not excluded) node
                                ->RMNode rmNode = this.rmContext.getRMNodes().get(nodeId);//2. Check if it's a registered node
                                ->ResourceTrackerService.nmLivelinessMonitor.receivedPing(nodeId);//record the heartbeat
                                ->//3. Check if it's a 'fresh' heartbeat i.e. not duplicate heartbeat
                                ->RMNodeImpl.updateNodeHeartbeatResponseForCleanup(nodeHeartBeatResponse);
                                    ->response.addAllContainersToCleanup(new ArrayList(this.containersToClean));response.addAllApplicationsToCleanup(this.finishedApplications);response.addContainersToBeRemovedFromNM(new ArrayList(this.containersToBeRemovedFromNM));
                                ->rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent)//send the RMNodeEventType.STATUS_UPDATE event
                                    ->[NodeState.RUNNING, NodeState.RUNNING]StatusUpdateWhenHealthyTransition
                                        ->RMNodeImpl.handleContainerStatus(statusEvent.getContainers());//update state inside RMNodeImpl
                                            ->RMNodeImpl.nodeUpdateQueue.add(new UpdatedContainerInfo(newlyLaunchedContainers,completedContainers))//queue the container changes in nodeUpdateQueue for the Scheduler to consume in the next step
                                        ->context.getDispatcher().getEventHandler().handle(new NodeUpdateSchedulerEvent(rmNode))//the node update triggers Scheduler resource allocation
                                            ->FifoScheduler.handle(NODE_UPDATE)
                                                ->FifoScheduler.nodeUpdate
                                                    ->RMNodeImpl.pullContainerUpdates //drains RMNodeImpl.nodeUpdateQueue, which earlier heartbeats filled
                                                    ->AbstractYarnScheduler.containerLaunchedOnNode
                                                        ->rmContainer.handle(new RMContainerEvent(containerId,RMContainerEventType.LAUNCHED));// Processing the newly launched containers: send the LAUNCHED event to the Container [TODO://]
                                                    ->FifoScheduler.completedContainer(RMContainerEventType.FINISHED)
                                                        ->application.containerCompleted(rmContainer, containerStatus, event);// Inform the application
                                                            ->rmContainer.handle(new RMContainerFinishedEvent(containerId,containerStatus, event));
                                                                ->[RMContainerState.RUNNING, RMContainerState.COMPLETED] FinishedTransition
                                                                    ->eventHandler.handle(new RMAppAttemptContainerFinishedEvent(container.appAttemptId, finishedEvent.getRemoteContainerStatus(),container.getAllocatedNode()))//update the app attempt
                                                                        ->[RMAppAttemptState.RUNNING -> RMAppAttemptState.*]RMAppAttemptImpl.ContainerFinishedTransition.transition [TODO://trace the Container's effect on the APP later]
                                                        ->FiCaSchedulerNode.releaseContainer(container);
                                                    ->FifoScheduler.assignContainers(node);//finally, assign resources on this node
                                                        ->//pick apps in FIFO order, then allocate by priority
                                                        ->FifoScheduler.assignContainersOnNode //satisfy node-local, rack-local, then off-switch requests in turn
                                                            ->FifoScheduler.assignNodeLocalContainers 
                                                            ->FifoScheduler.assignRackLocalContainers
                                                            ->FifoScheduler.assignOffSwitchContainers
                                                                ->FifoScheduler.assignContainer(FiCaSchedulerNode node, FiCaSchedulerApp application, Priority priority, int assignableContainers, ResourceRequest request, NodeType type)
                                                                    ->Container container =BuilderUtils.newContainer(containerId, nodeId, node.getRMNode().getHttpAddress(), capability, priority, null);// new capability Container 
                                                                    ->FiCaSchedulerApp.allocate(type, node, priority, request, container) 
                                                                        ->RMContainer rmContainer = new RMContainerImpl(container, this.getApplicationAttemptId(), node.getNodeID(),appSchedulingInfo.getUser(), this.rmContext)
                                                                            ->//更新SchedulerApplicationAttempt.newlyAllocatedContainers、SchedulerApplicationAttempt.liveContainers
                                                                            ->rmContainer.handle(new RMContainerEvent(container.getId(), RMContainerEventType.START)
                                                                                ->[RMContainerState.NEW->RMContainerState.ALLOCATED] ContainerStartedTransition
                                                                                    ->eventHandler.handle(new RMAppAttemptContainerAllocatedEvent()) //RMAppAttemptEventType.CONTAINER_ALLOCATED
                                                                                        ->[RMAppAttemptState.SCHEDULED->RMAppAttemptState.ALLOCATED_SAVING] AMContainerAllocatedTransition
                                                                                            ->Allocation amContainerAllocation =appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,null);//pick up the newly allocated container
                                                                                            ->appAttempt.setMasterContainer() // Set the masterContainer
                                                                                            ->appAttempt.storeAttempt();
                                                                                                ->RMStateStore.storeNewApplicationAttempt
                                                                                                    ->dispatcher.getEventHandler().handle(new RMStateStoreAppAttemptEvent(attemptState))
                                                                                                        ->RMStateStore.StoreAppAttemptTransition
                                                                                                            ->RMStateStore.notifyApplicationAttempt(new RMAppAttemptEvent(attemptState.getAttemptId(),RMAppAttemptEventType.ATTEMPT_NEW_SAVED))
                                                                                                                ->[RMAppAttemptState.ALLOCATED_SAVING, RMAppAttemptState.ALLOCATED]AttemptStoredTransition() 
                                                                                                                    ->RMAppAttemptImpl.launchAttempt -> eventHandler.handle(new AMLauncherEvent(AMLauncherEventType.LAUNCH, this));//emit the launch-AM event
                                                                                                                        ->ApplicationMasterLauncher.handle(LAUNCH)->ApplicationMasterLauncher.createRunnableLauncher -> new AMLauncher
                                                                                                                        ->the ApplicationMasterLauncher thread keeps reading masterEvents and starts AMLauncher threads
                                                                                                                            ->resourcemanager.amlauncher.AMLauncher.run
                                                                                                                                ->resourcemanager.amlauncher.AMLauncher.launch();
                                                                                                                                    ->StartContainersResponse response = containerMgrProxy.startContainers(allRequests);
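                                                                                                                                        ->[NM]ContainerManagerImpl.startContainers //the NM launches the AM Container
                                                                                                                                ->handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),RMAppAttemptEventType.LAUNCHED)) //at this point the RM has told the NM to start the AM and waits for the AM to register, which moves the AppAttempt to RUNNING [TODO://the subsequent flow is covered below]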

                                                                    ->FiCaSchedulerNode.allocateContainer(rmContainer)
                        ->response.getNodeAction()->SHUTDOWN | RESYNC //whether the NM should shut down or resync
                        ->removeOrTrackCompletedContainersFromContext(response.getContainersToBeRemovedFromNM());//remove Containers the RM no longer needs
                        ->dispatcher.getEventHandler().handle(new CMgrCompletedContainersEvent(containersToCleanup,CMgrCompletedContainersEvent.Reason.BY_RESOURCEMANAGER))
                            ->dispatcher.getEventHandler().handle(new ContainerKillEvent(container,ContainerExitStatus.KILLED_BY_RESOURCEMANAGER,"Container Killed by ResourceManager")) //send the ContainerKillEvent
                        ->heartbeatMonitor.wait(nextHeartBeatInterval);
            ->NodeResourceMonitor nodeResourceMonitor = createNodeResourceMonitor();//later YARN versions run resource monitoring in a dedicated thread
            ->NodeManager.containerManager = createContainerManager(context, exec, del, nodeStatusUpdater,this.aclsManager, dirsHandler) [TODO://]
                ->ContainerManagerImpl.ContainerManagerImpl() //constructor
                    ->ContainerManagerImpl.resourceLocalizationService = createResourceLocalizationService(exec, deletionContext, context);
                        ->ResourceLocalizationService.serviceInit
                            ->ResourceLocalizationService.publicRsrc = new LocalResourcesTrackerImpl
                            ->cleanUpLocalDirs;initializeLocalDirs;initializeLogDirs;//initialize the local directories
                            ->localizationServerAddress= "localizer.address"//Address where the localizer IPC is
                            ->localizerTracker = createLocalizerTracker(conf)
                                ->LocalizerTracker.LocalizerTracker()
                                    ->this.publicLocalizer = new PublicLocalizer(conf); 
                                    ->this.privLocalizers = privLocalizers; //a newly created instance
                                ->LocalizerTracker.serviceStart
                                    ->publicLocalizer.start()
                                        ->ResourceLocalizationService.PublicLocalizer.run()
                                            ->//loop: take finished downloads from PublicLocalizer.queue, emit ResourceLocalizedEvent, and trigger the LocalizedResource state transition
                                                ->[ResourceState.DOWNLOADING, ResourceState.LOCALIZED] FetchSuccessTransition
                    ->ContainerManagerImpl.containersLauncher = createContainersLauncher(context, exec);
                    ->ContainerManagerImpl.auxiliaryServices = new AuxServices();[TODO://look at AuxServices shuffle later]
                        ->
                    ->ContainerManagerImpl.containersMonitor = new ContainersMonitorImpl(exec, dispatcher, this.context);//in practice it appears to monitor memory only [TODO://look at the monitoring details later]
                        ->ContainersMonitorImpl.serviceInit//initialize CPU and memory settings such as maxPmemAllottedForContainers and maxVCoresAllottedForContainers
                        ->ContainersMonitorImpl.serviceStart
                            ->MonitoringThread.run
                                ->new ContainerKillEvent //if a container exceeds its resource limits, emit a kill event
                                    ->[->ContainerState.KILLING] KillTransition
                ->ContainerManagerImpl.serviceInit
                    ->new LogAggregationService(this.dispatcher, context,deletionService, dirsHandler)[TODO://look at LogAggregation later]
                ->ContainerManagerImpl.serviceStart
                    ->ContainerManagerImpl.server = rpc.getServer(ContainerManagementProtocol.class, this, initialAddress, serverConf, this.context.getNMTokenSecretManager(),conf.getInt(YarnConfiguration.NM_CONTAINER_MGR_THREAD_COUNT, YarnConfiguration.DEFAULT_NM_CONTAINER_MGR_THREAD_COUNT));//create the NM's IPC server
            ->WebServer webServer = createWebServer(context, containerManager.getContainersMonitor(), this.aclsManager, dirsHandler); //start the NM web server, port 8042 by default
        ->NodeManager.serviceStart;
            ->super.serviceStart(); //do nothing
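
The key pattern in sections 2.1 and 2.2 is that `allocate` never assigns anything itself: it records the ask and returns whatever earlier node heartbeats already assigned, while the actual assignment happens inside the `NODE_UPDATE` handling. The following is a minimal sketch of that split. `AsyncAllocateSketch` and its members are hypothetical and far simpler than `FifoScheduler` and `FiCaSchedulerApp`; priorities, locality, and vcores are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of heartbeat-driven, asynchronous allocation.
final class AsyncAllocateSketch {

    /** Per-application bookkeeping. */
    private static final class App {
        int pendingContainers;   // outstanding request count
        int containerMemMb;      // size of each requested container
        final List<String> newlyAllocated = new ArrayList<>();
    }

    // LinkedHashMap iterates in insertion order, which gives FIFO across applications.
    private final Map<String, App> apps = new LinkedHashMap<>();
    private int containerSeq = 0;

    void addApplication(String appId) {
        apps.put(appId, new App());
    }

    /** Records the ask (if any) and returns only containers assigned by earlier heartbeats. */
    List<String> allocate(String appId, int numContainers, int memMbEach) {
        App app = apps.get(appId);
        if (numContainers > 0) {
            app.pendingContainers = numContainers;
            app.containerMemMb = memMbEach;
        }
        List<String> pulled = new ArrayList<>(app.newlyAllocated);
        app.newlyAllocated.clear();
        return pulled;
    }

    /** Called on a node heartbeat: hand the node's free memory to apps in FIFO order. */
    void nodeUpdate(String nodeId, int freeMemMb) {
        for (App app : apps.values()) {
            while (app.pendingContainers > 0 && freeMemMb >= app.containerMemMb) {
                containerSeq++;
                app.newlyAllocated.add("container_" + containerSeq + "@" + nodeId);
                app.pendingContainers--;
                freeMemMb -= app.containerMemMb;
            }
        }
    }
}
```

A first `allocate("app1", 1, 1024)` therefore returns an empty list; only after some `nodeUpdate(...)` has run does a later `allocate("app1", 0, 0)` return the container, which matches the empty-ask `allocate` call made in AMContainerAllocatedTransition.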

2.3 NM launches a container

org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers
    ->ContainerManagerImpl.startContainerInternal
        ->Container container = new ContainerImpl(getConfig(), this.dispatcher,launchContext, credentials, metrics, containerTokenIdentifier,context);
        ->Application application = new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
        ->dispatcher.getEventHandler().handle(new ApplicationInitEvent(applicationID, appAcls,logAggregationContext))//initialize the app if this APP_ID is not yet known on this NM
            ->[ApplicationState.NEW->ApplicationState.INITING]AppInitTransition
                ->dispatcher.getEventHandler().handle(new LogHandlerAppStartedEvent())
                    ->logaggregation.LogAggregationService.handle(APPLICATION_STARTED) [TODO://logaggregation is skipped for now]
        ->this.context.getNMStateStore().storeContainer(containerId, request);
        ->dispatcher.getEventHandler().handle(new ApplicationContainerInitEvent(container))
            ->[ApplicationState.INITING->ApplicationState.INITING]InitContainerTransition
                ->ApplicationImpl.containers.put(container.getContainerId(), container);
                ->dispatcher.getEventHandler().handle(new ContainerInitEvent(container.getContainerId())); //ContainerEventType.INIT_CONTAINER
                    ->[ContainerState.NEW->ContainerState.LOCALIZING]RequestResourcesTransition
                        ->dispatcher.getEventHandler().handle(new AuxServicesEvent(AuxServicesEventType.CONTAINER_INIT, container));[TODO://AuxServices is skipped for now]
                        ->container.pendingResources are split into ContainerImpl.publicRsrcs, ContainerImpl.privateRsrcs, and ContainerImpl.appRsrcs
                        ->dispatcher.getEventHandler().handle(new ContainerLocalizationRequestEvent(container, req)) //wrap the resource download request in a ContainerLocalizationRequestEvent
                            ->nodemanager.containermanager.localizer.ResourceLocalizationService.handle(INIT_CONTAINER_RESOURCES)
                                ->ResourceLocalizationService.handleInitContainerResources
                                    ->LocalizerContext ctxt = new LocalizerContext(c.getUser(), c.getContainerId(), c.getCredentials(), statCache);
                                    ->LocalResourcesTracker tracker = getLocalResourcesTracker(e.getKey(), c.getUser(),c.getContainerId().getApplicationAttemptId().getApplicationId()) //PUBLIC is global; PRIVATE and APPLICATION look up the matching instance in a map
                                    ->tracker.handle(new ResourceRequestEvent(req, e.getKey(), ctxt));
                                        ->[ResourceState.INIT, ResourceState.DOWNLOADING]FetchResourceTransition
                                            ->LocalizedResource.dispatcher.getEventHandler().handle(new LocalizerResourceRequestEvent())
                                                ->containermanager.localizer.ResourceLocalizationService.LocalizerTracker.handle(REQUEST_RESOURCE_LOCALIZATION)
                                                    ->PUBLIC:
                                                        ->PublicLocalizer.addResource(req);
                                                    ->PRIVATE || APPLICATION
                                                        ->if (null == localizer) {localizer = new LocalizerRunner(req.getContext(), locId);localizer.start()};
                                                            ->nodemanager.LinuxContainerExecutor.startLocalizer //invoke container-executor to start the ContainerLocalizer
                                                            -> delService.delete(context.getUser(),null, paths.toArray(new Path[paths.size()])) //hand the downloaded files to delService
                                                            ->localizer.addResource(req);->LocalizerRunner.pending.add(..)
                                                                ->scheduled.put(nextRsrc, evt);//the ContainerLocalizer started above keeps polling and performs the downloads
                                                                    ->getLocalResourcesTracker(req.getVisibility(), user, applicationId).handle(new ResourceLocalizedEvent()) //ResourceEventType.LOCALIZED
                                                                        ->[ResourceState.DOWNLOADING -> ResourceState.LOCALIZED]FetchSuccessTransition
                                                                            ->dispatcher.getEventHandler().handle(new ContainerResourceLocalizedEvent())
                                                                                ->[ContainerState.LOCALIZING -> ContainerState.LOCALIZED] LocalizedTransition
                                                                                    ->ContainerImpl.sendLaunchEvent();
                                                                                        ->dispatcher.getEventHandler().handle(new ContainersLauncherEvent()) //ContainersLauncherEventType.LAUNCH_CONTAINER
                                                                                            ->launcher.ContainersLauncher.handle(LAUNCH_CONTAINER)
                                                                                                ->ContainerLaunch launch = new ContainerLaunch(context, getConfig(), dispatcher, exec, app,event.getContainer(), dirsHandler, containerManager);
                                                                                                ->containerLauncher.submit(launch);//submit to the thread pool
                                                                                                    ->ContainerLaunch.call
                                                                                                        ->
                                                                                                        ->dispatcher.getEventHandler().handle(new ContainerEvent(containerID,ContainerEventType.CONTAINER_LAUNCHED))
                                                                                                        ->LinuxContainerExecutor.activateContainer(containerID, pidFilePath);//start executing the container
                                                                                                        ->int ret = LinuxContainerExecutor.launchContainer(container, nmPrivateContainerScriptPath,nmPrivateTokensPath, user, appIdStr, containerWorkDir,localDirs, logDirs)
                                                                                                        ->CASE(Throwable)
                                                                                                            -> dispatcher.getEventHandler().handle(new ContainerExitEvent(containerID, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, ret,e.getMessage()))//an error triggers CONTAINER_EXITED_WITH_FAILURE
                                                                                                        ->CASE(ExitCode.FORCE_KILLED || ExitCode.TERMINATED)
                                                                                                            ->dispatcher.getEventHandler().handle(new ContainerExitEvent(containerID,ContainerEventType.CONTAINER_KILLED_ON_REQUEST, ret,"Container exited with a non-zero exit code " + ret))
                                                                                                        ->CASE(0)
                                                                                                            ->dispatcher.getEventHandler().handle(new ContainerEvent(containerID,ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS))
                                                                                                                ->[ContainerState.RUNNING->ContainerState.EXITED_WITH_SUCCESS]ExitedWithSuccessTransition
                                                                                                                    ->dispatcher.getEventHandler().handle(new ContainersLauncherEvent(container,ContainersLauncherEventType.CLEANUP_CONTAINER))
                                                                                                                        ->nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer //clean up the Container once execution finishes [TODO://]
                                                                                                ->ContainersLauncher.running.put(containerId, launch);
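
The tail of the trace above (localized, launched, exited) can be condensed into a small table-driven state machine. This is only a sketch in plain Java, not the NodeManager's ContainerImpl: the class `ContainerLifecycleSketch` and its helper methods are hypothetical, only the success-path transitions named in the trace are modeled, and the numeric values 137 and 143 for ExitCode.FORCE_KILLED and ExitCode.TERMINATED are assumptions, as is the mapping of every other non-zero exit code to CONTAINER_EXITED_WITH_FAILURE.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of the container lifecycle traced above.
class ContainerLifecycleSketch {
    // Assumed numeric values of ExitCode.FORCE_KILLED and ExitCode.TERMINATED.
    static final int FORCE_KILLED = 137;
    static final int TERMINATED = 143;

    enum State { LOCALIZING, LOCALIZED, RUNNING, EXITED_WITH_SUCCESS }

    enum Event {
        RESOURCE_LOCALIZED,
        CONTAINER_LAUNCHED,
        CONTAINER_EXITED_WITH_SUCCESS,
        CONTAINER_KILLED_ON_REQUEST,
        CONTAINER_EXITED_WITH_FAILURE
    }

    // Transition table: current state -> (event -> next state).
    static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        // Simplification: a single RESOURCE_LOCALIZED event stands for "all resources localized".
        addTransition(State.LOCALIZING, Event.RESOURCE_LOCALIZED, State.LOCALIZED);
        addTransition(State.LOCALIZED, Event.CONTAINER_LAUNCHED, State.RUNNING);
        addTransition(State.RUNNING, Event.CONTAINER_EXITED_WITH_SUCCESS, State.EXITED_WITH_SUCCESS);
    }

    static void addTransition(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    // Returns the next state, or fails loudly for transitions this sketch does not model.
    static State next(State current, Event event) {
        Map<Event, State> row = TRANSITIONS.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException("No transition from " + current + " on " + event);
        }
        return row.get(event);
    }

    // Maps the executor's return code to the event dispatched after launchContainer returns.
    static Event eventForExitCode(int ret) {
        if (ret == FORCE_KILLED || ret == TERMINATED) {
            return Event.CONTAINER_KILLED_ON_REQUEST;
        }
        if (ret != 0) {
            return Event.CONTAINER_EXITED_WITH_FAILURE;
        }
        return Event.CONTAINER_EXITED_WITH_SUCCESS;
    }

    public static void main(String[] args) {
        State s = State.LOCALIZING;
        s = next(s, Event.RESOURCE_LOCALIZED);
        s = next(s, Event.CONTAINER_LAUNCHED);
        s = next(s, eventForExitCode(0));
        System.out.println("final state: " + s);
    }
}
```

Keeping the transitions in a table mirrors how the trace reads: each `[A -> B] SomeTransition` line is one entry, and any event not listed for a state is rejected instead of being silently ignored.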

2.4 The AM[`distributedshell.ApplicationMaster instead of MRAppMaster`] heartbeats to the RM, contacts the NMs to start containers, and monitors their status

distributedshell.ApplicationMaster.main
    ->distributedshell.ApplicationMaster.init//YARN passes parameters to a container through environment variables, envs=System.getenv()
        ->appAttemptID = containerId.getApplicationAttemptId()
        ->shellCommand = readContent(shellCommandPath);//files such as this are read from HDFS; the client prepares them when it submits the job
    ->distributedshell.ApplicationMaster.run
        ->amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); //AMRMClientAsync.CallbackHandler: both container management on the NM and resource allocation by the RM are asynchronous, so callbacks must be supplied to handle the results
        ->nmClientAsync = new NMClientAsyncImpl(containerListener); //NMCallbackHandler
        ->//Setup local RPC Server to accept status requests directly from clients. In principle the AM should expose an application-level RPC interface so that clients can fetch logs, status, etc.
        ->amRMClient.registerApplicationMaster(appMasterHostname, appMasterRpcPort,appMasterTrackingUrl);
            ->↓[RM]↓ApplicationMasterService.registerApplicationMaster //register the AM with the RM
                ->rmContext.getDispatcher().getEventHandler().handle(new RMAppAttemptRegistrationEvent())
                    ->[RMAppAttemptState.LAUNCHED, RMAppAttemptState.RUNNING] AMRegisteredTransition //the attempt moves to RUNNING; FINISHING, KILLED, and FAILED will be covered later
                        ->eventHandler.handle(new RMAppEvent(appAttempt.getAppAttemptId().getApplicationId(),RMAppEventType.ATTEMPT_REGISTERED))
                            ->[RMAppState.ACCEPTED->RMAppState.RUNNING] //do nothing
        ->distributedshell.ApplicationMaster.RMCallbackHandler.onContainersAllocated //once the AM's resource requests are satisfied, the AM contacts the NMs directly to start containers
            ->ApplicationMaster.LaunchContainerRunnable.run
                ->ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(localResources, myShellEnv, commands, null, allTokens.duplicate())//initialize the ContainerLaunchContext
                ->nmClientAsync.startContainerAsync(container, ctx)//contact the NM to launch the Container
                    ->[NM]org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers
        ->ApplicationMaster.NMCallbackHandler.NMCallbackHandler //receives Container status and events from the NM [TODO://]
    ->distributedshell.ApplicationMaster.finish
        ->unregisterApplicationMaster(appStatus, appMessage, null);
            ->[RM]ApplicationMasterService.finishApplicationMaster //notify the RM that the job has finished
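
The asynchronous pattern in the trace above (a periodic heartbeat to the RM, a callback when containers are allocated, a separate thread per container launch) can be illustrated without any Hadoop dependency. The following is a toy simulation in plain Java under stated assumptions: `AmCallbackSketch`, `AllocationCallback`, and `FakeResourceManager` are hypothetical names that stand in for the AM, AMRMClientAsync.CallbackHandler, and the RM, and the fake RM simply grants one container per heartbeat. It is not the distributedshell implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of the heartbeat + callback flow; no YARN classes are used.
class AmCallbackSketch {
    // Stand-in for the allocation half of AMRMClientAsync.CallbackHandler.
    interface AllocationCallback {
        void onContainersAllocated(List<String> containerIds);
    }

    // Stand-in for the RM: grants at most one container per heartbeat (an assumption).
    static class FakeResourceManager {
        private int nextId = 1;
        private int remaining;

        FakeResourceManager(int total) {
            this.remaining = total;
        }

        synchronized List<String> allocate() {
            List<String> granted = new ArrayList<>();
            if (remaining > 0) {
                granted.add("container_" + nextId++);
                remaining--;
            }
            return granted;
        }
    }

    // Runs the simulated AM until `wanted` containers have been launched.
    static List<String> runAm(int wanted, long heartbeatMillis) {
        FakeResourceManager rm = new FakeResourceManager(wanted);
        List<String> launched = Collections.synchronizedList(new ArrayList<String>());
        CountDownLatch done = new CountDownLatch(wanted);
        ExecutorService launcherPool = Executors.newCachedThreadPool();

        // Each allocated container is launched on its own thread, like LaunchContainerRunnable.
        AllocationCallback callback = ids -> {
            for (String id : ids) {
                launcherPool.submit(() -> {
                    launched.add(id); // the real AM would contact the NM here
                    done.countDown();
                });
            }
        };

        // The heartbeat thread polls the RM and hands any allocation to the callback.
        ScheduledExecutorService heartbeat = Executors.newSingleThreadScheduledExecutor();
        heartbeat.scheduleAtFixedRate(() -> {
            List<String> allocated = rm.allocate();
            if (!allocated.isEmpty()) {
                callback.onContainersAllocated(allocated);
            }
        }, 0, heartbeatMillis, TimeUnit.MILLISECONDS);

        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for containers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            heartbeat.shutdownNow();
            launcherPool.shutdown();
        }
        return new ArrayList<>(launched);
    }

    public static void main(String[] args) {
        List<String> launched = runAm(3, 100);
        System.out.println("launched " + launched.size() + " containers");
    }
}
```

Because launches run on separate threads, the order of the returned list is not guaranteed; only the set of launched containers is.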

3. Scheduler

Implementing a scheduler requires attention to the following points: locality, reserve, preempt, label
TODO:// to be filled in later

4. References

https://hortonworks.com/blog/introducing-apache-hadoop-yarn/
Hadoop YARN权威指南
https://www.cnblogs.com/shenh062326/p/3587108.html/
https://blog.csdn.net/gaopenghigh/article/details/45507765/

5. Next post

APACHE KAFKA 0.10.0 CODE REVIEW