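Introduction
1. What are the communication protocols between the YARN modules, and what are the gateway objects?
2. How is the lifecycle of a YARN app implemented, from client launch to completion?
3. Implementation details of CapacityScheduler and FairScheduler? //TODO:
4. LinuxContainerExecutor vs Docker? //TODO: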
YARN 2.6.0 does not yet support GPUs, and unless DRF's DominantResourceCalculator is enabled, the default DefaultResourceCalculator slices and allocates Containers based on memory only.
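To make the difference concrete, here is a minimal sketch in plain Java. It is not the Hadoop ResourceCalculator API: the class and method names (`ResourceCalculatorSketch`, `fitByMemoryOnly`, `fitByAllResources`) and the node and request sizes are made up for illustration. The only idea carried over is that the default calculator looks at memory alone, while a DRF-style calculator also has to respect vcores.

```java
/** Simplified model of the two calculators; hypothetical names, not Hadoop classes. */
final class ResourceCalculatorSketch {

    /** Memory-only view: vcores are ignored entirely when counting containers. */
    static long fitByMemoryOnly(long nodeMemMb, int nodeVcores, long reqMemMb, int reqVcores) {
        return nodeMemMb / reqMemMb;
    }

    /** Multi-resource view: the scarcest dimension limits how many containers fit. */
    static long fitByAllResources(long nodeMemMb, int nodeVcores, long reqMemMb, int reqVcores) {
        return Math.min(nodeMemMb / reqMemMb, (long) (nodeVcores / reqVcores));
    }

    public static void main(String[] args) {
        // Assumed node: 16384 MB and 4 vcores. Assumed request: 1024 MB and 2 vcores.
        System.out.println(fitByMemoryOnly(16384, 4, 1024, 2));   // 16
        System.out.println(fitByAllResources(16384, 4, 1024, 2)); // 2
    }
}
```

With the memory-only view the node looks like it can hold 16 such containers even though it has CPU for only 2, which is why CPU-heavy workloads can oversubscribe a node under the default calculator.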
I. Communication protocols between components
- Framework-level protocols
ApplicationClientProtocol [ClientRMService] // clients -> RM
ContainerManagementProtocol[ContainerManagerImpl] // AM|RM -> NM
ApplicationMasterProtocol[ApplicationMasterService] // AM -> RM
ResourceTracker[ResourceTrackerService] // NM -> RM
- Application-level protocols
MR: MRClientProtocol [MRClientService] //client -> AM
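II. Lifecycle of an on-YARN app
The Client asks the RM for an AM_ID (the application ID), wraps everything into an ApplicationSubmissionContext and submits it to the RM. In response to NM heartbeats, the RM schedules an NM to launch the AM. The AM registers with the RM and requests resources through its heartbeats; once it has the resources, it asks the NMs to start the worker Containers.
The Client below is the MR client, but the AM is the simpler distributedshell one; the MR version will be filled in later.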
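The first step, obtaining the ID, can be sketched in plain Java. The trace in 2.1 notes that the ID is the RM start time plus an auto-incrementing counter, and that the Job_ID is built from it. The class `AppIdSketch` is hypothetical, and the `application_<timestamp>_<4-digit counter>` and `job_...` string layouts are my assumption about how these IDs are printed, not something taken from the trace.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of ID generation: cluster timestamp + per-RM counter. */
final class AppIdSketch {
    private final long clusterTimestamp;                       // RM start time in ms
    private final AtomicInteger counter = new AtomicInteger(); // reset whenever the RM restarts

    AppIdSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    /** Assumed printed form: application_<clusterTimestamp>_<zero-padded counter>. */
    String newApplicationId() {
        return String.format(Locale.ROOT, "application_%d_%04d",
                clusterTimestamp, counter.incrementAndGet());
    }

    /** The MR job ID reuses the same timestamp and counter with a different prefix. */
    static String toJobId(String applicationId) {
        return "job_" + applicationId.substring("application_".length());
    }

    public static void main(String[] args) {
        AppIdSketch rm = new AppIdSketch(1500000000000L);
        String appId = rm.newApplicationId();
        System.out.println(appId);          // application_1500000000000_0001
        System.out.println(toJobId(appId)); // job_1500000000000_0001
    }
}
```

Because the timestamp changes on every RM restart, the counter can start again from 1 without producing a duplicate ID.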
2.1 Client initializes, submits the Job [MR] to the RM, monitors its status, and prints progress logs
org.apache.hadoop.examples.WordCount.main
->new Job(conf, "word count");
->job.setJarByClass(WordCount.class);job.setMapperClass(TokenizerMapper.class);
->FileInputFormat.addInputPath(job, new Path(otherArgs[i]))->FileOutputFormat.setOutputPath(job,new Path(otherArgs[otherArgs.length - 1]))
->job.waitForCompletion(true)
->org.apache.hadoop.mapreduce.Job.submit();
->org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(Job.this, cluster);
->JobSubmissionFiles.getStagingDir(cluster, conf);//initialize the staging directory /staging/user/.staging
->org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID
->client.createApplication().getApplicationSubmissionContext()//request an AM_ID from the RM
->ApplicationClientProtocol.getNewApplication
->[RM]ClientRMService.getNewApplication
->[RM]org.apache.hadoop.yarn.server.utils.BuilderUtils.newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),applicationCounter.incrementAndGet()) //i.e. the RM start time + an auto-incrementing ID
->new org.apache.hadoop.mapred.JobID(identifier, appID.getId());//build the Job_ID from the AM_ID
->conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, new Path(jobStagingArea, jobId.toString()));//Configuring job with submitJobDir as the submit dir
->get delegation token for the dir
->JobResourceUploader.uploadFiles(job, jobSubmitDir);//Upload and configure files, libjars, jobjars, and archives pertaining to * the passed job
->JobSubmitter.writeSplits(job, submitJobDir);
->org.apache.hadoop.mapreduce.split.JobSplitWriter.writeNewSplits(conf, splits, out);
->org.apache.hadoop.mapreduce.split.JobSplitWriter.writeJobSplitMetaInfo
->conf.set(toFullPropertyName(queue,QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());//write "queue admins of the queue to which job is being submitted"
->writeConf(conf, submitJobFile);//Write job file to submit dir
->org.apache.hadoop.mapred.YARNRunner.submitJob(jobId, submitJobDir.toString(), job.getCredentials())
->YARNRunner.createApplicationSubmissionContext
->capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB));capability.setVirtualCores;
->localResources.put(MRJobConfig.JOB_CONF_FILE,createApplicationResource(defaultFileContext,jobConfPath, LocalResourceType.FILE))
->localResources.put(MRJobConfig.JOB_JAR, createApplicationResource(FileContext.getFileContext(jobJarPath.toUri(), jobConf),jobJarPath,LocalResourceType.PATTERN)) //jar
->localResources.put(MRJobConfig.JOB_SUBMIT_DIR + "/" + (MRJobConfig.JOB_SPLIT||MRJobConfig.JOB_SPLIT_METAINFO),createApplicationResource(defaultFileContext,new Path(jobSubmitDir, s), LocalResourceType.FILE)//split info
->/bin/java org.apache.hadoop.mapreduce.v2.app.MRAppMaster ..>>ApplicationConstants.LOG_DIR_EXPANSION_VAR //Setup the command to run the AM
->MRApps.setClasspath(environment, conf);
->//Setup the environment
->ContainerLaunchContext amContainer =ContainerLaunchContext.newInstance(localResources, environment,vargsFinal, null, securityTokens, acls);
->appContext.setApplicationId(applicationId); appContext.setQueue();appContext.setApplicationName
->appContext.setAMContainerSpec(amContainer); // AM Container
->[RM]ClientRMService.submitApplication(appContext)
->[RM]RMAppManager.submitApplication
->[RM]new RMAppImpl() // Create RMApp
->[RM]rmContext.getRMApps().putIfAbsent(applicationId, application)//add to rmContext
->[RMAppState.NEW->RMAppState.NEW_SAVING]RMAppImpl.RMAppNewlySavingTransition.transition()
->StoreAppTransition.transition
->storeApplicationStateInternal(appId, appState);//RMAppEventType.APP_NEW_SAVED
->[RMAppState.NEW_SAVING->RMAppState.SUBMITTED]AddApplicationToSchedulerTransition
->FifoScheduler.handle(new AppAddedSchedulerEvent())//APP_ADDED
->FifoScheduler.addApplication->AbstractYarnScheduler.applications.put(applicationId, application);
->[RMAppState.SUBMITTED->RMAppState.ACCEPTED] StartAppAttemptTransition
->RMAppImpl.createAndStartNewAttempt
->RMAppImpl.createNewAttempt
->new RMAppAttemptImpl
->RMAppImpl.attempts.put(appAttemptId, attempt);
->[RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED]AttemptStartedTransition
->ApplicationMasterService.registerAppAttempt(appAttempt.applicationAttemptId)
->ApplicationMasterService.responseMap.put(attemptId, new AllocateResponseLock(response));
->FifoScheduler.handle(new AppAttemptAddedSchedulerEvent())//APP_ATTEMPT_ADDED
->FifoScheduler.addApplicationAttempt
->[RMAppAttemptState.SUBMITTED, RMAppAttemptState.SCHEDULED] ScheduleTransition
->appAttempt.amReq.setNumContainers(1);setPriority(AM_CONTAINER_PRIORITY);setResourceName(ResourceRequest.ANY);setRelaxLocality(true);//set up the AM ResourceRequest
->appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,Collections.singletonList(appAttempt.amReq),EMPTY_CONTAINER_RELEASE_LIST,amBlacklist.getAdditions(),amBlacklist.getRemovals())
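->FifoScheduler.allocate //Note: resources are not allocated synchronously here; the request is only recorded, and the returned Allocation holds resources that were assigned asynchronously earlier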
->scheduler.SchedulerApplicationAttempt.pullNewlyAllocatedContainersAndNMTokens
->rmContainer.handle(new RMContainerEvent(rmContainer.getContainerId(),RMContainerEventType.ACQUIRED)) //send an event to each of the newlyAllocatedContainers obtained via NM->scheduler->app
->[RMContainerState.ALLOCATED -> RMContainerState.ACQUIRED] AcquiredTransition
->container.containerAllocationExpirer.register(container.getContainerId());
->[RMAppState.ACCEPTED->RMAppState.ACCEPTED] AppRunningOnNodeTransition
->rmAppImpl.ranNodes.add(nodeAddedEvent.getNodeId());
->return new ContainersAndNMTokensAllocation(SchedulerApplicationAttempt.newlyAllocatedContainers, nmTokens);//return the Containers obtained by the application [TODO: how newlyAllocatedContainers is populated, i.e. how the scheduler fills it in]
->[RM]ClientRMService.getApplicationReport
->[RM]RMAppImpl.createAndGetApplicationReport
->RMAppImpl.currentAttempt //read the state of currentAttempt and return it
->org.apache.hadoop.mapreduce.Job.monitorAndPrintJob();//connect to the AM, read the application state, and print logs
->while (!isComplete() || !reportedAfterCompletion)
->mapreduce.Job.updateStatus //update the MR progress
->YARNRunner.getJobStatus(status.getJobID())
->mapred.ClientServiceDelegate.getJobStatus
->JobReport report = ((GetJobReportResponse) invoke("getJobReport",GetJobReportRequest.class, request)).getJobReport()
->mapred.ClientServiceDelegate.invoke
->mapred.ClientServiceDelegate.getProxy
->MRClientProtocol MRClientProxy=application.getTrackingUrl();
->serviceAddr = NetUtils.createSocketAddrForHost(application.getHost(), application.getRpcPort());//get the AM's host and RPC port
->instantiateAMProxy(serviceAddr)//build the client->AM proxy
->[AM]MRClientService.getJobReport //the AM reads the job state
->org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getReport
->mapreduce.Job.getTaskCompletionEvents
->[AM]org.apache.hadoop.mapreduce.v2.app.client.MRClientService.MRClientProtocolHandler.getTaskAttemptCompletionEvents
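The RM-side part of this trace is essentially an event-driven state machine: each `[StateA->StateB]` marker is a transition fired by an event. Below is a minimal table-driven sketch of the RMApp happy path in plain Java. It is not RMAppImpl's actual StateMachineFactory; `RMAppStateSketch` is a hypothetical class, and the event names `START` and `APP_ACCEPTED` are my assumption (the trace itself only names `APP_NEW_SAVED` and `ATTEMPT_REGISTERED`).

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified model of the RMApp happy path; not the Hadoop implementation. */
final class RMAppStateSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING }
    enum Event { START, APP_NEW_SAVED, APP_ACCEPTED, ATTEMPT_REGISTERED }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);

    static {
        arc(State.NEW, Event.START, State.NEW_SAVING);                // persist the app
        arc(State.NEW_SAVING, Event.APP_NEW_SAVED, State.SUBMITTED);  // hand over to the scheduler
        arc(State.SUBMITTED, Event.APP_ACCEPTED, State.ACCEPTED);     // create the first attempt
        arc(State.ACCEPTED, Event.ATTEMPT_REGISTERED, State.RUNNING); // AM has registered
    }

    private static void arc(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    private State current = State.NEW;

    State current() {
        return current;
    }

    /** Applies one event; an event that has no arc from the current state is rejected. */
    State handle(Event event) {
        Map<Event, State> arcs = TABLE.get(current);
        State next = (arcs == null) ? null : arcs.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        current = next;
        return current;
    }
}
```

Feeding the four events in order walks the app from NEW to RUNNING; any out-of-order event throws instead of silently changing state, which mirrors why the real traces always show the source state next to each transition.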
2.2 NM starts and heartbeats to the RM; the Scheduler [FifoScheduler] allocates resources and launches the AM container
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main
->new NodeManager().initAndStartNodeManager(conf, false) //a CompositeService
->NodeManager.serviceInit;
->initAndStartRecoveryStore//Recovery-related parts are skipped for now
->this.aclsManager = new ApplicationACLsManager(conf);//Token-related parts are skipped as well
->ContainerExecutor exec = ReflectionUtils.newInstance(conf.getClass(YarnConfiguration.NM_CONTAINER_EXECUTOR,DefaultContainerExecutor.class, ContainerExecutor.class), conf);
->ContainerExecutor.init
->nodemanager.LinuxContainerExecutor.init || DockerContainerExecutor
->container-executor --checksetup //run the shell command as a health check
->addService(DeletionService)//Service that cleans up NM local files and logs
->new NodeHealthCheckerService();//health check service
->new LocalDirsHandlerService()//local directory checks
->NodeManager.context=createNMContext(containerTokenSecretManager,nmTokenSecretManager, nmStore)
->NodeManager.nodeStatusUpdater = createNodeStatusUpdater(context, dispatcher, nodeHealthChecker);[TODO://]
->NodeStatusUpdaterImpl.NodeStatusUpdaterImpl()
->NodeStatusUpdaterImpl.serviceInit
->NodeStatusUpdaterImpl.totalResource = Resource.newInstance(memoryMb, virtualCores) //compute this node's resources
->NodeStatusUpdaterImpl.serviceStart
->NodeStatusUpdaterImpl.resourceTracker = getRMClient();
->NodeStatusUpdaterImpl.registerWithRM
->RegisterNodeManagerResponse regNMResponse = resourceTracker.registerNodeManager(request);
->[RM]org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager //everything below happens inside the RM
->[RM]this.nodesListManager.isValidNode(host)//Check if this node is a 'valid' node
->[RM]RMNode rmNode = new RMNodeImpl(nodeId, rmContext, host, cmPort, httpPort,resolve(host), capability, nodeManagerVersion)
->[RM]this.rmContext.getDispatcher().getEventHandler().handle(new RMNodeStartedEvent)
->[NodeState.NEW, NodeState.RUNNING]AddNodeTransition
->[RM]resourcemanager.ResourceTrackerService.handleNMContainerStatus(status, nodeId);
->if (container.getContainerState() == ContainerState.RUNNING) { RMNodeImpl.launchedContainers.add(container.getContainerId()) }//add the containers reported by the NM to launchedContainers
->RMNodeImpl.handleRunningAppOnNode(rmNode, rmNode.context, startEvent.getRunningApplications(), rmNode.nodeId)//handle RunningApplications and dispatch RMAppRunningOnNodeEvent events
->getDispatcher().getEventHandler().handle(new RMAppRunningOnNodeEvent)
->[ ->RMAppState.RUNNING]AppRunningOnNodeTransition
->context.getDispatcher().getEventHandler().handle(new NodeAddedSchedulerEvent())
->FifoScheduler.handle(NODE_ADDED)
->FifoScheduler.addNode
->FiCaSchedulerNode schedulerNode = new FiCaSchedulerNode(rMNode, usePortForNodeName);
->FifoScheduler.nodes.put(nodeManager.getNodeID(), schedulerNode);
->Resources.addTo(FifoScheduler.clusterResource, nodeManager.getTotalCapability());//add this NM's resources to the cluster resources
->AbstractYarnScheduler.updateMaximumAllocation//update the cluster's maximum allocation
->AbstractYarnScheduler.recoverContainersOnNode //recoverContainers on that NM
->context.getDispatcher().getEventHandler().handle(new NodesListManagerEvent(NodesListManagerEventType.NODE_USABLE, rmNode))
->resourcemanager.NodesListManager.handle(NODE_USABLE)
->rmContext.getDispatcher().getEventHandler().handle(new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,RMAppNodeUpdateType.NODE_USABLE)
->[RMAppState.* -> RMAppState.*] RMAppNodeUpdateTransition
->RMAppImpl.processNodeUpdate
->RMAppImpl.updatedNodes.add(node);//tell RMAppImpl about the NM state change
->NodeStatusUpdaterImpl.startStatusUpdater //start the thread that sends heartbeats and triggers the RM Scheduler
->response = resourceTracker.nodeHeartbeat(request);
->↓[RM]↓ ResourceTrackerService.nodeHeartbeat//everything below is the flow inside the RM
->this.nodesListManager.isValidNode()//1. Check if it's a valid (i.e. not excluded) node
->RMNode rmNode = this.rmContext.getRMNodes().get(nodeId);//2. Check if it's a registered node
->ResourceTrackerService.nmLivelinessMonitor.receivedPing(nodeId);//record the heartbeat
->//3. Check if it's a 'fresh' heartbeat i.e. not duplicate heartbeat
->RMNodeImpl.updateNodeHeartbeatResponseForCleanup(nodeHeartBeatResponse);
->response.addAllContainersToCleanup(new ArrayList(this.containersToClean));response.addAllApplicationsToCleanup(this.finishedApplications);response.addContainersToBeRemovedFromNM(new ArrayList(this.containersToBeRemovedFromNM));
->rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent)//send the RMNodeEventType.STATUS_UPDATE event
->[NodeState.RUNNING, NodeState.RUNNING]StatusUpdateWhenHealthyTransition
->RMNodeImpl.handleContainerStatus(statusEvent.getContainers());//update the state inside RMNodeImpl
->RMNodeImpl.nodeUpdateQueue.add(new UpdatedContainerInfo(newlyLaunchedContainers,completedContainers))//add the container changes to nodeUpdateQueue for the Scheduler to use in the next step
->context.getDispatcher().getEventHandler().handle(new NodeUpdateSchedulerEvent(rmNode))//the node update triggers Scheduler resource allocation
->FifoScheduler.handle(NODE_UPDATE)
->FifoScheduler.nodeUpdate
->RMNodeImpl.pullContainerUpdates //the RMNodeImpl.nodeUpdateQueue saved from earlier heartbeats
->AbstractYarnScheduler.containerLaunchedOnNode
->rmContainer.handle(new RMContainerEvent(containerId,RMContainerEventType.LAUNCHED));// Processing the newly launched containers: send the LAUNCHED event to the Container [TODO://]
->FifoScheduler.completedContainer(RMContainerEventType.FINISHED)
->application.containerCompleted(rmContainer, containerStatus, event);// Inform the application
->rmContainer.handle(new RMContainerFinishedEvent(containerId,containerStatus, event));
->[RMContainerState.RUNNING, RMContainerState.COMPLETED] FinishedTransition
->eventHandler.handle(new RMAppAttemptContainerFinishedEvent(container.appAttemptId, finishedEvent.getRemoteContainerStatus(),container.getAllocatedNode()))//update
->[RMAppAttemptState.RUNNING -> RMAppAttemptState.*]RMAppAttemptImpl.ContainerFinishedTransition.transition [TODO: how a Container affects the APP will be traced later]
->FiCaSchedulerNode.releaseContainer(container);
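->FifoScheduler.assignContainers(node);//finally, resources are allocated on the node
->//pick apps in FIFO order, then allocate resources by priority
->FifoScheduler.assignContainersOnNode //satisfy node-local, rack-local, then off-switch (any node) requests in turn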
->FifoScheduler.assignNodeLocalContainers
->FifoScheduler.assignRackLocalContainers
->FifoScheduler.assignOffSwitchContainers
->FifoScheduler.assignContainer(FiCaSchedulerNode node, FiCaSchedulerApp application, Priority priority, int assignableContainers, ResourceRequest request, NodeType type)
->Container container =BuilderUtils.newContainer(containerId, nodeId, node.getRMNode().getHttpAddress(), capability, priority, null);// new capability Container
->FiCaSchedulerApp.allocate(type, node, priority, request, container)
->RMContainer rmContainer = new RMContainerImpl(container, this.getApplicationAttemptId(), node.getNodeID(),appSchedulingInfo.getUser(), this.rmContext)
->//更新SchedulerApplicationAttempt.newlyAllocatedContainers、SchedulerApplicationAttempt.liveContainers
->rmContainer.handle(new RMContainerEvent(container.getId(), RMContainerEventType.START)
->[RMContainerState.NEW->RMContainerState.ALLOCATED] ContainerStartedTransition
->eventHandler.handle(new RMAppAttemptContainerAllocatedEvent()) //RMAppAttemptEventType.CONTAINER_ALLOCATED
->[RMAppAttemptState.SCHEDULED->RMAppAttemptState.ALLOCATED_SAVING] AMContainerAllocatedTransition
->Allocation amContainerAllocation =appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,null);//fetch the newly allocated container
->appAttempt.setMasterContainer() // Set the masterContainer
->appAttempt.storeAttempt();
->RMStateStore.storeNewApplicationAttempt
->dispatcher.getEventHandler().handle(new RMStateStoreAppAttemptEvent(attemptState))
->RMStateStore.StoreAppAttemptTransition
->RMStateStore.notifyApplicationAttempt(new RMAppAttemptEvent(attemptState.getAttemptId(),RMAppAttemptEventType.ATTEMPT_NEW_SAVED))
->[RMAppAttemptState.ALLOCATED_SAVING, RMAppAttemptState.ALLOCATED]AttemptStoredTransition()
->RMAppAttemptImpl.launchAttempt -> eventHandler.handle(new AMLauncherEvent(AMLauncherEventType.LAUNCH, this));//emit the event that launches the AM
->ApplicationMasterLauncher.handle(LAUNCH)->ApplicationMasterLauncher.createRunnableLauncher -> new AMLauncher
->the ApplicationMasterLauncher thread keeps reading masterEvents and starts the AMLauncher threads
->resourcemanager.amlauncher.AMLauncher.run
->resourcemanager.amlauncher.AMLauncher.launch();
->StartContainersResponse response = containerMgrProxy.startContainers(allRequests);
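->[NM]ContainerManagerImpl.startContainers //the NM launches the AM Container
->handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),RMAppAttemptEventType.LAUNCHED)) //at this point the RM has told the NM to start the AM and waits for the AM to register, which moves the AppAttempt to RUNNING [TODO: the rest of the flow is below]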
->FiCaSchedulerNode.allocateContainer(rmContainer)
->response.getNodeAction()->SHUTDOWN | RESYNC //whether the NM should shut down or resync
->removeOrTrackCompletedContainersFromContext(response.getContainersToBeRemovedFromNM());//remove the Containers the RM no longer needs
->dispatcher.getEventHandler().handle(new CMgrCompletedContainersEvent(containersToCleanup,CMgrCompletedContainersEvent.Reason.BY_RESOURCEMANAGER))
->dispatcher.getEventHandler().handle(new ContainerKillEvent(container,ContainerExitStatus.KILLED_BY_RESOURCEMANAGER,"Container Killed by ResourceManager")) //send ContainerKillEvent
->heartbeatMonitor.wait(nextHeartBeatInterval);
->NodeResourceMonitor nodeResourceMonitor = createNodeResourceMonitor();//in later YARN versions resource monitoring runs directly in its own thread
->NodeManager.containerManager = createContainerManager(context, exec, del, nodeStatusUpdater,this.aclsManager, dirsHandler) [TODO://]
->ContainerManagerImpl.ContainerManagerImpl() //constructor
->ContainerManagerImpl.resourceLocalizationService = createResourceLocalizationService(exec, deletionContext, context);
->ResourceLocalizationService.serviceInit
->ResourceLocalizationService.publicRsrc = new LocalResourcesTrackerImpl
->cleanUpLocalDirs;initializeLocalDirs;initializeLogDirs;//initialize the local directories
->localizationServerAddress= "localizer.address"//Address where the localizer IPC is
->localizerTracker = createLocalizerTracker(conf)
->LocalizerTracker.LocalizerTracker()
->this.publicLocalizer = new PublicLocalizer(conf);
->this.privLocalizers = privLocalizers; //a newly created instance
->LocalizerTracker.serviceStart
->publicLocalizer.start()
->ResourceLocalizationService.PublicLocalizer.run()
->//loop: take completed downloads from PublicLocalizer.queue and emit ResourceLocalizedEvent, which triggers the LocalizedResource state transition
->[ResourceState.DOWNLOADING, ResourceState.LOCALIZED] FetchSuccessTransition
->ContainerManagerImpl.containersLauncher = createContainersLauncher(context, exec);
->ContainerManagerImpl.auxiliaryServices = new AuxServices();[TODO: AuxServices and shuffle, to look at later]
->
->ContainerManagerImpl.containersMonitor = new ContainersMonitorImpl(exec, dispatcher, this.context);//it seems only memory is actually monitored [TODO: monitoring details to look at later]
->ContainersMonitorImpl.serviceInit//initialize a number of settings such as CPU and memory, e.g. maxPmemAllottedForContainers, maxVCoresAllottedForContainers
->ContainersMonitorImpl.serviceStart
->MonitoringThread.run
->new ContainerKillEvent //if a container exceeds its resource limit, emit a kill event
->[->ContainerState.KILLING] KillTransition
->ContainerManagerImpl.serviceInit
->new LogAggregationService(this.dispatcher, context,deletionService, dirsHandler)[TODO: LogAggregation to look at later]
->ContainerManagerImpl.serviceStart
->ContainerManagerImpl.server = rpc.getServer(ContainerManagementProtocol.class, this, initialAddress, serverConf, this.context.getNMTokenSecretManager(),conf.getInt(YarnConfiguration.NM_CONTAINER_MGR_THREAD_COUNT, YarnConfiguration.DEFAULT_NM_CONTAINER_MGR_THREAD_COUNT));//create the NM's IPC service
->WebServer webServer = createWebServer(context, containerManager.getContainersMonitor(), this.aclsManager, dirsHandler); //start the NM's web service, default port 8042
->NodeManager.serviceStart;
->super.serviceStart(); //do nothing
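One point from the traces in 2.1 and 2.2 is easy to miss: `allocate` never assigns anything itself. It records the ask and returns whatever earlier NM heartbeats (`nodeUpdate`) already assigned. The following plain-Java sketch models only that hand-off. `AsyncAllocateSketch`, its fields and the string container names are hypothetical, and priorities, locality and FIFO ordering across apps are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-app model of the allocate / nodeUpdate hand-off. */
final class AsyncAllocateSketch {
    private int pendingAsks = 0;                                  // containers requested but not yet assigned
    private final List<String> newlyAllocated = new ArrayList<>(); // assigned but not yet pulled
    private int nextContainerId = 1;

    /** Attempt/AM side: record the new ask and pull what earlier heartbeats assigned. */
    synchronized List<String> allocate(int numContainersAsked) {
        pendingAsks += numContainersAsked;
        List<String> pulled = new ArrayList<>(newlyAllocated);
        newlyAllocated.clear();
        return pulled;
    }

    /** NM heartbeat side: assign containers while the node has memory and asks remain. */
    synchronized void nodeUpdate(String nodeId, int freeMemMb, int containerMemMb) {
        while (pendingAsks > 0 && freeMemMb >= containerMemMb) {
            newlyAllocated.add("container_" + (nextContainerId++) + "@" + nodeId);
            freeMemMb -= containerMemMb;
            pendingAsks--;
        }
    }

    public static void main(String[] args) {
        AsyncAllocateSketch scheduler = new AsyncAllocateSketch();
        System.out.println(scheduler.allocate(1)); // [] : the ask is only recorded
        scheduler.nodeUpdate("nm1", 2048, 1024);   // a heartbeat arrives and assigns it
        System.out.println(scheduler.allocate(0)); // [container_1@nm1]
    }
}
```

This is why the AM attempt calls `allocate` twice in the trace: once with the AM ResourceRequest in ScheduleTransition, and once with empty lists in AMContainerAllocatedTransition to pick up the container.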
2.3 NM starts a container
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers
->ContainerManagerImpl.startContainerInternal
->Container container = new ContainerImpl(getConfig(), this.dispatcher,launchContext, credentials, metrics, containerTokenIdentifier,context);
->Application application = new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
->dispatcher.getEventHandler().handle(new ApplicationInitEvent(applicationID, appAcls,logAggregationContext))//if this NM does not have this APP_ID yet, initialize it
->[ApplicationState.NEW->ApplicationState.INITING]AppInitTransition
->dispatcher.getEventHandler().handle(new LogHandlerAppStartedEvent())
->logaggregation.LogAggregationService.handle(APPLICATION_STARTED) [TODO: logaggregation is skipped for now]
->this.context.getNMStateStore().storeContainer(containerId, request);
->dispatcher.getEventHandler().handle(new ApplicationContainerInitEvent(container))
->[ApplicationState.INITING->ApplicationState.INITING]InitContainerTransition
->ApplicationImpl.containers.put(container.getContainerId(), container);
->dispatcher.getEventHandler().handle(new ContainerInitEvent(container.getContainerId())); //ContainerEventType.INIT_CONTAINER
->[ContainerState.NEW->ContainerState.LOCALIZING]RequestResourcesTransition
->dispatcher.getEventHandler().handle(new AuxServicesEvent(AuxServicesEventType.CONTAINER_INIT, container));[TODO: AuxServices is skipped for now]
->container.pendingResources are placed into ContainerImpl.publicRsrcs, ContainerImpl.privateRsrcs and ContainerImpl.appRsrcs respectively
->dispatcher.getEventHandler().handle(new ContainerLocalizationRequestEvent(container, req)) //the resource download request is wrapped in a ContainerLocalizationRequestEvent
->nodemanager.containermanager.localizer.ResourceLocalizationService.handle(INIT_CONTAINER_RESOURCES)
->ResourceLocalizationService.handleInitContainerResources
->LocalizerContext ctxt = new LocalizerContext(c.getUser(), c.getContainerId(), c.getCredentials(), statCache);
->LocalResourcesTracker tracker = getLocalResourcesTracker(e.getKey(), c.getUser(),c.getContainerId().getApplicationAttemptId().getApplicationId()) //PUBLIC is global; for PRIVATE and APPLICATION the matching instance is looked up in a map
->tracker.handle(new ResourceRequestEvent(req, e.getKey(), ctxt));
->[ResourceState.INIT, ResourceState.DOWNLOADING]FetchResourceTransition
->LocalizedResource.dispatcher.getEventHandler().handle(new LocalizerResourceRequestEvent))
->containermanager.localizer.ResourceLocalizationService.LocalizerTracker.handle(REQUEST_RESOURCE_LOCALIZATION)
->PUBLIC:
->PublicLocalizer.addResource(req);
->PRIVATE || APPLICATION
->if (null == localizer) {localizer = new LocalizerRunner(req.getContext(), locId);localizer.start()};
->nodemanager.LinuxContainerExecutor.startLocalizer //invoke container-executor to start the ContainerLocalizer
-> delService.delete(context.getUser(),null, paths.toArray(new Path[paths.size()])) //register the downloaded files with delService
->localizer.addResource(req);->LocalizerRunner.pending.add(..)
->scheduled.put(nextRsrc, evt);//the ContainerLocalizer initialized above keeps polling and performs the downloads
->getLocalResourcesTracker(req.getVisibility(), user, applicationId).handle(new ResourceLocalizedEvent()) //ResourceEventType.LOCALIZED
->[ResourceState.DOWNLOADING -> ResourceState.LOCALIZED]FetchSuccessTransition
->dispatcher.getEventHandler().handle(new ContainerResourceLocalizedEvent())
->[ContainerState.LOCALIZING -> ContainerState.LOCALIZED] LocalizedTransition
->ContainerImpl.sendLaunchEvent();
->dispatcher.getEventHandler().handle(new ContainersLauncherEvent()) //ContainersLauncherEventType.LAUNCH_CONTAINER
->launcher.ContainersLauncher.handle(LAUNCH_CONTAINER)
->ContainerLaunch launch = new ContainerLaunch(context, getConfig(), dispatcher, exec, app,event.getContainer(), dirsHandler, containerManager);
->containerLauncher.submit(launch);//submit to the thread pool
->ContainerLaunch.call
->
->dispatcher.getEventHandler().handle(new ContainerEvent(containerID,ContainerEventType.CONTAINER_LAUNCHED)))
->LinuxContainerExecutor.activateContainer(containerID, pidFilePath);//start executing the container
->int ret = LinuxContainerExecutor.launchContainer(container, nmPrivateContainerScriptPath,nmPrivateTokensPath, user, appIdStr, containerWorkDir,localDirs, logDirs)
->CASE(Throwable)
-> dispatcher.getEventHandler().handle(new ContainerExitEvent(containerID, ContainerEventType.CONTAINER_EXITED_WITH_FAILURE, ret,e.getMessage()))//an error triggers CONTAINER_EXITED_WITH_FAILURE
->CASE(ExitCode.FORCE_KILLED || ExitCode.TERMINATED)
->dispatcher.getEventHandler().handle(new ContainerExitEvent(containerID,ContainerEventType.CONTAINER_KILLED_ON_REQUEST, ret,"Container exited with a non-zero exit code " + ret)
->CASE(0)
->dispatcher.getEventHandler().handle(new ContainerEvent(containerID,ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS))
->[ContainerState.RUNNING->ContainerState.EXITED_WITH_SUCCESS]ExitedWithSuccessTransition
->dispatcher.getEventHandler().handle(new ContainersLauncherEvent(container,ContainersLauncherEventType.CLEANUP_CONTAINER)
->nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer //clean up the Container after it finishes [TODO://]
->ContainersLauncher.running.put(containerId, launch);
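The three CASE branches at the end of ContainerLaunch.call boil down to mapping the executor's return code to a container event. Here is a small plain-Java sketch of that mapping. `ExitCodeSketch` is hypothetical, the enum only mirrors the event names from the trace, and the numeric values 137 and 143 for FORCE_KILLED and TERMINATED are my assumption (128 plus SIGKILL and SIGTERM respectively); treating every other non-zero code as a failure is also an assumption, since the trace only shows the Throwable case.

```java
/** Hypothetical mapping from a container's exit code to the event raised by the NM. */
final class ExitCodeSketch {
    // Assumed values: 128 + signal number.
    static final int FORCE_KILLED = 137; // SIGKILL (9)
    static final int TERMINATED = 143;   // SIGTERM (15)

    enum ContainerEventType {
        CONTAINER_EXITED_WITH_SUCCESS,
        CONTAINER_KILLED_ON_REQUEST,
        CONTAINER_EXITED_WITH_FAILURE
    }

    static ContainerEventType eventFor(int exitCode) {
        if (exitCode == 0) {
            return ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS;
        }
        if (exitCode == FORCE_KILLED || exitCode == TERMINATED) {
            // The process was stopped by a signal, typically on the NM's or RM's request.
            return ContainerEventType.CONTAINER_KILLED_ON_REQUEST;
        }
        // Assumed: any other non-zero code is reported as a failure.
        return ContainerEventType.CONTAINER_EXITED_WITH_FAILURE;
    }

    public static void main(String[] args) {
        System.out.println(eventFor(0));   // CONTAINER_EXITED_WITH_SUCCESS
        System.out.println(eventFor(143)); // CONTAINER_KILLED_ON_REQUEST
        System.out.println(eventFor(1));   // CONTAINER_EXITED_WITH_FAILURE
    }
}
```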
2.4 AM [distributedshell.ApplicationMaster instead of MRAppMaster] heartbeats to the RM, contacts NMs to start containers, and monitors their status
distributedshell.ApplicationMaster.main
->distributedshell.ApplicationMaster.init//YARN passes arguments to containers through environment variables, envs=System.getenv()
->appAttemptID = containerId.getApplicationAttemptId()
->shellCommand = readContent(shellCommandPath);//files are read from HDFS; the client prepares them when submitting the job
->distributedshell.ApplicationMaster.run
->amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener); //AMRMClientAsync.CallbackHandler: both container management by the NM and resource allocation by the RM are asynchronous, so a Callback is needed to handle the results
->nmClientAsync = new NMClientAsyncImpl(containerListener); //NMCallbackHandler
->//Setup local RPC Server to accept status requests directly from clients; in principle the AM should expose an application-level RPC interface so that clients can fetch logs, status, etc.
->amRMClient.registerApplicationMaster(appMasterHostname, appMasterRpcPort,appMasterTrackingUrl);
->↓[RM]↓ApplicationMasterService.registerApplicationMaster //register the AM in the RM
->rmContext.getDispatcher().getEventHandler().handle(new RMAppAttemptRegistrationEvent())
->[RMAppAttemptState.LAUNCHED, RMAppAttemptState.RUNNING] AMRegisteredTransition //the Attempt moves to RUNNING; FINISHING, KILLED and FAILED will be added later
->eventHandler.handle(new RMAppEvent(appAttempt.getAppAttemptId().getApplicationId(),RMAppEventType.ATTEMPT_REGISTERED))
->[RMAppState.ACCEPTED->RMAppState.RUNNING] //do nothing
->distributedshell.ApplicationMaster.RMCallbackHandler.onContainersAllocated //once the AM's resource request is satisfied, the AM contacts the NMs directly to start containers
->ApplicationMaster.LaunchContainerRunnable.run
->ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(localResources, myShellEnv, commands, null, allTokens.duplicate())//initialize the ContainerLaunchContext
->nmClientAsync.startContainerAsync(container, ctx)//contact the NM to launch the Container
->[NM]org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers
->ApplicationMaster.NMCallbackHandler.NMCallbackHandler//receive Container status and events from the NM [TODO://]
->distributedshell.ApplicationMaster.finish
->unregisterApplicationMaster(appStatus, appMessage, null);
->[RM]ApplicationMasterService.finishApplicationMaster //notify the RM that the job is finished
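The AM side is built around callbacks: the RM client delivers allocations on its own thread, and the handler hands each container to a separate launcher so the heartbeat thread is never blocked. The sketch below shows that pattern in plain Java with no YARN dependency. `AmCallbackSketch`, `AllocationCallback` and `FakeRmClient` are hypothetical stand-ins, not the AMRMClientAsync or NMClientAsync APIs, and containers are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the AM callback pattern; not the YARN client API. */
final class AmCallbackSketch {
    interface AllocationCallback {
        void onContainersAllocated(List<String> containers);
    }

    /** Stand-in for an async RM client: callbacks run on its own heartbeat thread. */
    static final class FakeRmClient {
        private final ExecutorService heartbeat = Executors.newSingleThreadExecutor();
        private final AllocationCallback callback;

        FakeRmClient(AllocationCallback callback) {
            this.callback = callback;
        }

        void deliver(List<String> containers) {
            heartbeat.submit(() -> callback.onContainersAllocated(containers));
        }

        void stop() {
            heartbeat.shutdown();
        }
    }

    /** Delivers one allocation and returns the sorted list of launched containers. */
    static List<String> runOnce(List<String> granted) {
        List<String> launched = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(granted.size());
        ExecutorService launchers = Executors.newCachedThreadPool();
        // The callback only schedules work; each launch runs on a launcher thread.
        FakeRmClient rm = new FakeRmClient(containers -> {
            for (String c : containers) {
                launchers.submit(() -> {
                    launched.add("started " + c); // the real AM would contact the NM here
                    done.countDown();
                });
            }
        });
        try {
            rm.deliver(granted);
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for launches");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            rm.stop();
            launchers.shutdown();
        }
        List<String> sorted = new ArrayList<>(launched);
        Collections.sort(sorted);
        return sorted;
    }
}
```

Calling `AmCallbackSketch.runOnce(Arrays.asList("c1", "c2"))` returns both containers as started; the latch plays the role of the AM's completion counters that decide when `finish` can be called.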
III. Scheduler
Implementing a scheduler requires attention to the following points: locality, reserve, preempt, label
TODO: to be filled in later
IV. References
- https://hortonworks.com/blog/introducing-apache-hadoop-yarn/
- Hadoop YARN权威指南
- https://www.cnblogs.com/shenh062326/p/3587108.html/
- https://blog.csdn.net/gaopenghigh/article/details/45507765/
V. Next post
APACHE KAFKA 0.10.0 CODE REVIEW