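In the previous post, Yarn Source Code Analysis (2) --- spark-submit, we walked through the full flow of a Spark job submitted to YARN via spark-submit, from requesting resources to startup. This post covers how the ApplicationMaster (AM for short below) gets started during that process.
1. In Yarn Source Code Analysis (2), yarnClient ultimately calls submitApplication to submit the job. The argument it passes carries the AM launch context, so the AM launch is driven from this method on the YARN side.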
val containerContext = createContainerLaunchContext(newAppResponse) // build the AM launch context
val appContext = createApplicationSubmissionContext(newApp, containerContext) // the application's submission context
yarnClient.submitApplication(appContext) // submit the application
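2. Launching the AM is extremely complicated and involves a lot of code, so I will only pick out the important parts. Spark wraps up the context the AM runs with here; YARN's event-handling mechanism eventually executes that context and calls back into Spark's AM class. client mode and cluster mode run different classes, as shown below: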
val amClass =
if (isClusterMode) { // cluster mode
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else { // client mode
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
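3. So the obvious place to start is yarnClient.submitApplication(appContext): we look at the Hadoop side to see how YARN uses this context to launch the AM that Spark packaged. This interface is implemented by YarnClientImpl, and inside the method it calls ApplicationClientProtocol.submitApplication. That protocol is the key interface for YARN's RPC communication, which I won't go into here. After submitting, the client enters a while (true) polling loop that waits for the submission to complete.
// request carries the application's overall parameters and launch script; submit it to the RM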
rmClient.submitApplication(request);
int pollCount = 0;
long startTime = System.currentTimeMillis();
EnumSet<YarnApplicationState> waitingStates =
EnumSet.of(YarnApplicationState.NEW,
YarnApplicationState.NEW_SAVING,
YarnApplicationState.SUBMITTED);
EnumSet<YarnApplicationState> failToSubmitStates =
EnumSet.of(YarnApplicationState.FAILED,
YarnApplicationState.KILLED);
while (true) {
try {
ApplicationReport appReport = getApplicationReport(applicationId);
YarnApplicationState state = appReport.getYarnApplicationState();
if (!waitingStates.contains(state)) {
if(failToSubmitStates.contains(state)) {
throw new YarnException("Failed to submit " + applicationId +
" to YARN : " + appReport.getDiagnostics());
}
LOG.info("Submitted application " + applicationId);
break;
}
}
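The polling logic above can be modelled without a cluster. The following is a minimal sketch, not YarnClientImpl itself: the State enum is a hypothetical stand-in for YarnApplicationState, the reports come from an in-memory iterator instead of getApplicationReport, and the sleep and timeout handling of the real client are left out.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;

class SubmitPollSketch {
    // Hypothetical stand-in for YarnApplicationState (only the states used here).
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, FAILED, KILLED }

    static final EnumSet<State> WAITING =
        EnumSet.of(State.NEW, State.NEW_SAVING, State.SUBMITTED);
    static final EnumSet<State> FAILED_TO_SUBMIT =
        EnumSet.of(State.FAILED, State.KILLED);

    // Consumes successive state reports until one leaves the waiting set.
    // Returns the number of reports read; throws if submission failed.
    static int pollUntilSubmitted(Iterator<State> reports) {
        int pollCount = 0;
        while (reports.hasNext()) {
            State state = reports.next();
            pollCount++;
            if (!WAITING.contains(state)) {
                if (FAILED_TO_SUBMIT.contains(state)) {
                    throw new IllegalStateException("Failed to submit, state=" + state);
                }
                return pollCount; // submitted: any non-waiting, non-failed state
            }
            // A real client would sleep here before polling again.
        }
        throw new IllegalStateException("Ran out of reports while still waiting");
    }

    public static void main(String[] args) {
        Iterator<State> reports = Arrays.asList(
            State.NEW, State.NEW_SAVING, State.SUBMITTED, State.ACCEPTED).iterator();
        System.out.println(pollUntilSubmitted(reports));
    }
}
```

The point of the two sets is that "submitted" is defined negatively: the loop exits on any state that is neither a waiting state nor a failure state.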
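4. This submitApplication is implemented by ClientRMService. I pasted the whole method, so the analysis is in the code comments.
// ApplicationSubmissionContext is validated here for safety. Only the fields that are
// independent of the RM's configuration are checked here; those that depend on it are validated in RMAppManager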
String user = null;
try {
user = UserGroupInformation.getCurrentUser().getShortUserName();
} catch (IOException ie) {
LOG.warn("Unable to get the current user.", ie);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
ie.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw RPCUtil.getRemoteException(ie);
}
// Check whether the app has already been put into rmContext; if so, just return the response
if (rmContext.getRMApps().get(applicationId) != null) {
LOG.info("This is an earlier submitted application: " + applicationId);
return SubmitApplicationResponse.newInstance();
}
// Fall back to the default queue if none was set
if (submissionContext.getQueue() == null) {
submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}
// Fall back to the default application name if none was set
if (submissionContext.getApplicationName() == null) {
submissionContext.setApplicationName(
YarnConfiguration.DEFAULT_APPLICATION_NAME);
}
// Fall back to the default application type, or truncate the type if it is too long
if (submissionContext.getApplicationType() == null) {
submissionContext
.setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
} else {
if (submissionContext.getApplicationType().length() >
YarnConfiguration.APPLICATION_TYPE_LENGTH) {
submissionContext.setApplicationType(submissionContext
.getApplicationType().substring(0,
YarnConfiguration.APPLICATION_TYPE_LENGTH));
}
}
try {
// call RMAppManager to submit application directly
// For background on ApplicationManager, see the basic-components chapter of this series
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);
LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId);
} catch (YarnException e) {
LOG.info("Exception in submitting application with id " +
applicationId.getId(), e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw e;
}
SubmitApplicationResponse response = recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
return response;
}
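The defaulting and truncation rules in the method above are easy to check in isolation. This is a small sketch under stated assumptions: the class and method names are hypothetical, and the constant values are placeholders standing in for the YarnConfiguration constants, not values quoted from Hadoop.

```java
class SubmissionDefaultsSketch {
    // Placeholder values (assumptions); the real ones live in YarnConfiguration.
    static final String DEFAULT_QUEUE = "default";
    static final String DEFAULT_TYPE = "YARN";
    static final int TYPE_MAX_LENGTH = 20;

    // A missing queue falls back to the default queue.
    static String queueOrDefault(String queue) {
        return queue == null ? DEFAULT_QUEUE : queue;
    }

    // A missing type falls back to the default; an overlong type is truncated.
    static String normalizeType(String type) {
        if (type == null) {
            return DEFAULT_TYPE;
        }
        return type.length() > TYPE_MAX_LENGTH
            ? type.substring(0, TYPE_MAX_LENGTH)
            : type;
    }

    public static void main(String[] args) {
        System.out.println(queueOrDefault(null));
        System.out.println(normalizeType("SPARK"));
        System.out.println(normalizeType("A-VERY-LONG-APPLICATION-TYPE-NAME").length());
    }
}
```

Note that the checks only fill gaps or shorten values; nothing in this step rejects the submission.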
5. From the code above, the next place to look is rmAppManager.submitApplication(). It calls createAndPopulateNewRMApp(), so let's look at that:
private RMAppImpl createAndPopulateNewRMApp(
ApplicationSubmissionContext submissionContext, long submitTime,
String user, boolean isRecovery) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId();
// Validate the AM's resource request
ResourceRequest amReq =
validateAndCreateResourceRequest(submissionContext, isRecovery);
// Create RMApp
// RMAppImpl wraps a state machine, a core YARN mechanism: each service acts as its state changes
RMAppImpl application =
new RMAppImpl(applicationId, rmContext, this.conf,
submissionContext.getApplicationName(), user,
submissionContext.getQueue(),
submissionContext, this.scheduler, this.masterService,
submitTime, submissionContext.getApplicationType(),
submissionContext.getApplicationTags(), amReq);
return application;
}
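6. Finally a START event is submitted. As we saw above, an RMAppImpl was just created, and this event fires the matching arc of its state machine: handling START moves the application from NEW to NEW_SAVING and runs RMAppNewlySavingTransition.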
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
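To make the addTransition idea concrete, here is a minimal table-driven state machine sketch. It is not YARN's StateMachineFactory: the class, the enums, and the hook strings are hypothetical, and only the three application-level arcs that appear in this post (START, APP_NEW_SAVED, APP_ACCEPTED) are registered.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class AppStateMachineSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED }
    enum Event { START, APP_NEW_SAVED, APP_ACCEPTED }

    // One arc: the state to move to and the hook to run on the way.
    private static final class Arc {
        final State target;
        final Runnable hook;
        Arc(State target, Runnable hook) { this.target = target; this.hook = hook; }
    }

    private final Map<State, Map<Event, Arc>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    // Registers "in state `from`, on event `on`, run `hook` and move to `to`".
    AppStateMachineSketch addTransition(State from, State to, Event on, Runnable hook) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, Arc>(Event.class))
             .put(on, new Arc(to, hook));
        return this;
    }

    // Looks up the arc for (current state, event), runs its hook, then moves.
    State handle(Event event) {
        Map<Event, Arc> arcs = table.get(current);
        Arc arc = (arcs == null) ? null : arcs.get(event);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        arc.hook.run();
        current = arc.target;
        return current;
    }

    // The three application arcs from this post; hooks only log what they stand for.
    static AppStateMachineSketch appLifecycle(List<String> log) {
        return new AppStateMachineSketch()
            .addTransition(State.NEW, State.NEW_SAVING, Event.START,
                () -> log.add("store app"))
            .addTransition(State.NEW_SAVING, State.SUBMITTED, Event.APP_NEW_SAVED,
                () -> log.add("add to scheduler"))
            .addTransition(State.SUBMITTED, State.ACCEPTED, Event.APP_ACCEPTED,
                () -> log.add("start attempt"));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        AppStateMachineSketch sm = appLifecycle(log);
        sm.handle(Event.START);
        sm.handle(Event.APP_NEW_SAVED);
        System.out.println(sm.handle(Event.APP_ACCEPTED) + " " + log);
    }
}
```

Reading the RM code gets easier with this picture in mind: to follow an event, find the addTransition call for the current state and that event type, then read the transition class it names.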
7. So the next thing to read is RMAppNewlySavingTransition(). Its code is short:
private static final class RMAppNewlySavingTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
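// If recovery is enabled, store the application information in a non-blocking call,
// so that the RM has stored what it needs to restart the AM after an RM restart without talking to the client again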
LOG.info("Storing application with id " + app.applicationId);
app.rmContext.getStateStore().storeNewApplication(app);
}
}
8. Now let's see what this method that stores the application info does: it fires a STORE_APP event, which is handled by StoreAppTransition.
public void storeNewApplication(RMApp app) {
ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
assert context instanceof ApplicationSubmissionContextPBImpl;
ApplicationStateData appState =
ApplicationStateData.newInstance(
app.getSubmitTime(), app.getStartTime(), context, app.getUser());
dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
}
9. Inside there is a piece of code that notifies the RM with an APP_NEW_SAVED event; that event is handled by AddApplicationToSchedulerTransition.
try {
store.storeApplicationStateInternal(appId, appState);
store.notifyApplication(new RMAppEvent(appId,
RMAppEventType.APP_NEW_SAVED));
} catch (Exception e) {
LOG.error("Error storing app: " + appId, e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
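10. This transition is simple and does what its name says: it hands the application to the scheduler by submitting an APP_ADDED event. We analyze the default scheduler, the Capacity Scheduler, so next we look at the code in CapacityScheduler: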
case APP_ADDED:
{
AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
String queueName =
resolveReservationQueueName(appAddedEvent.getQueue(),
appAddedEvent.getApplicationId(),
appAddedEvent.getReservationID());
if (queueName != null) {
if (!appAddedEvent.getIsAppRecovering()) {
// Tell the queue that an application was submitted; the queue keeps count of its applications
addApplication(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
} else {
addApplicationOnRecovery(appAddedEvent.getApplicationId(), queueName,
appAddedEvent.getUser());
}
}
// Submit the APP_ACCEPTED event
queue.getMetrics().submitApp(user);
SchedulerApplication application =
new SchedulerApplication(queue, user);
applications.put(applicationId, application);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queueName);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
11. As the code above shows, the Capacity Scheduler submits APP_ACCEPTED; the state moves from SUBMITTED to ACCEPTED and StartAppAttemptTransition() is triggered.
.addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())
12. This class creates a new RMAppAttempt and then submits RMAppAttemptEventType.START, which triggers AttemptStartedTransition(). Since the attempt object was just created, its state machine starts in NEW; the START event now moves it to SUBMITTED.
// Transitions from NEW State
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
RMAppAttemptEventType.START, new AttemptStartedTransition())
13. Now let's look at AttemptStartedTransition(); the part that matters is the registerAppAttempt method.
// Register the attempt with the AM service
appAttempt.masterService
.registerAppAttempt(appAttempt.applicationAttemptId);
// Add the applicationAttempt to the scheduler and inform the scheduler
// whether to transfer the state from previous attempt.
appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));
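14. We can see that the response id is set to -1. This id is incremented on every later exchange with the AM and is used to tell whether a request is a duplicate, a new request, or an old one.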
AllocateResponse response =
recordFactory.newRecordInstance(AllocateResponse.class);
// set response id to -1 before application master for the following
// attemptID get registered
response.setResponseId(-1);
LOG.info("Registering app attempt : " + attemptId);
responseMap.put(attemptId, new AllocateResponseLock(response));
rmContext.getNMTokenSecretManager().registerApplicationAttempt(attemptId);
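How a single counter can separate duplicate, new, and old requests is sketched below. This is a simplified model and not the RM's code: the class name, the Kind enum, and the exact comparison rules are assumptions chosen to illustrate the idea, and the counter is started at 0 on the assumption that the AM has already registered.

```java
class ResponseIdSketch {
    enum Kind { NEW, DUPLICATE, STALE }

    // Id of the last response sent back to the AM (hypothetical model).
    private int lastResponseId;

    ResponseIdSketch(int initialResponseId) {
        this.lastResponseId = initialResponseId;
    }

    // The AM echoes the id of the last response it received.
    Kind classify(int requestResponseId) {
        if (requestResponseId + 1 == lastResponseId) {
            // The AM did not see our last response: it is repeating its previous request.
            return Kind.DUPLICATE;
        }
        if (requestResponseId + 1 < lastResponseId) {
            // The request is more than one response behind: too old to serve.
            return Kind.STALE;
        }
        lastResponseId++; // a fresh request produces the next response id
        return Kind.NEW;
    }

    int lastResponseId() {
        return lastResponseId;
    }

    public static void main(String[] args) {
        ResponseIdSketch ids = new ResponseIdSketch(0);
        System.out.println(ids.classify(0)); // first heartbeat
        System.out.println(ids.classify(1)); // next heartbeat
        System.out.println(ids.classify(1)); // same heartbeat sent again
        System.out.println(ids.classify(0)); // heartbeat from two rounds ago
    }
}
```

In this model a duplicate can be answered by resending the previous response, while a stale request is an error on the AM side.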
15. Back in AttemptStartedTransition(): at the end it submits SchedulerEventType.APP_ATTEMPT_ADDED, which goes back to the Capacity Scheduler for handling.
case APP_ATTEMPT_ADDED:
{
AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
(AppAttemptAddedSchedulerEvent) event;
addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
appAttemptAddedEvent.getTransferStateFromPreviousAttempt(),
appAttemptAddedEvent.getIsAttemptRecovering());
}
16. Naturally we step into addApplicationAttempt. I picked out part of the code: the snippet below creates a FiCaSchedulerApp, which holds the resource information for launching the AM.
FiCaSchedulerApp attempt =
new FiCaSchedulerApp(applicationAttemptId, application.getUser(),
queue, queue.getActiveUsersManager(), rmContext);
17. After that it submits the RMAppAttemptEventType.ATTEMPT_ADDED event
rmContext.getDispatcher().getEventHandler().handle(
new RMAppAttemptEvent(applicationAttemptId,
RMAppAttemptEventType.ATTEMPT_ADDED));
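This means the ATTEMPT_ADDED event moves the state out of SUBMITTED. The result can be either LAUNCHED_UNMANAGED_SAVING or SCHEDULED, and the state machine continues according to whichever state the transition returns.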
.addTransition(RMAppAttemptState.SUBMITTED,
EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.ATTEMPT_ADDED,
new ScheduleTransition())
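18. Next is ScheduleTransition(). The switch at the if, subCtx.getUnmanagedAM(), reports whether the AM is unmanaged. If it is true, the RM neither allocates a container for the AM nor launches it; the default is false. So in our case the returned state is SCHEDULED.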
ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
// getUnmanagedAM() is true when the AM is unmanaged: the RM then neither allocates a container for it nor launches it. The default is false
if (!subCtx.getUnmanagedAM()) {
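// Reset the container count before creating a new attempt: this request is passed to the scheduler, which deducts the number once the AM container is allocated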
appAttempt.amReq.setNumContainers(1);
appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
/* ANY means any node will do */
appAttempt.amReq.setResourceName(ResourceRequest.ANY);
appAttempt.amReq.setRelaxLocality(true);
// Ask the scheduler to allocate resources
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
Collections.singletonList(appAttempt.amReq),
EMPTY_CONTAINER_RELEASE_LIST, null, null);
if (amContainerAllocation != null
&& amContainerAllocation.getContainers() != null) {
assert (amContainerAllocation.getContainers().size() == 0);
}
return RMAppAttemptState.SCHEDULED;
} else {
// save state and then go to LAUNCHED state
appAttempt.storeAttempt();
return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}
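The difference from the earlier single-target arcs is that here the transition itself picks the next state from a declared set. The following is a minimal sketch of that idea, with hypothetical names throughout; it only models the unmanaged-AM branch and does not call any scheduler.

```java
import java.util.EnumSet;
import java.util.function.Function;

class MultiArcSketch {
    enum State { SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING }

    // Runs a transition that chooses its own target, then checks the choice
    // against the set of targets declared when the arc was registered.
    static State fire(EnumSet<State> allowedTargets,
                      Function<Boolean, State> transition,
                      boolean unmanagedAm) {
        State next = transition.apply(unmanagedAm);
        if (!allowedTargets.contains(next)) {
            throw new IllegalStateException("Transition returned undeclared state " + next);
        }
        return next;
    }

    // Stand-in for the branch in ScheduleTransition: a managed AM waits for a
    // container (SCHEDULED); an unmanaged AM skips container allocation.
    static State schedule(boolean unmanagedAm) {
        return unmanagedAm ? State.LAUNCHED_UNMANAGED_SAVING : State.SCHEDULED;
    }

    public static void main(String[] args) {
        EnumSet<State> allowed =
            EnumSet.of(State.LAUNCHED_UNMANAGED_SAVING, State.SCHEDULED);
        System.out.println(fire(allowed, MultiArcSketch::schedule, false));
        System.out.println(fire(allowed, MultiArcSketch::schedule, true));
    }
}
```

Declaring the allowed targets up front keeps the state graph readable even though the final choice is made at run time.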
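19. The code above contains one exciting line, appAttempt.scheduler.allocate(), which performs the resource scheduling. I won't analyze it in detail here; the AM calls the same interface later to request the remaining Containers, and a later post covers it in detail. We now know the transition returned SCHEDULED. As with the earlier arcs, the next transition is looked up from that state: the arc below handles the CONTAINER_ALLOCATED event and triggers AMContainerAllocatedTransition().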
.addTransition(RMAppAttemptState.SCHEDULED,
EnumSet.of(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.CONTAINER_ALLOCATED,
new AMContainerAllocatedTransition())
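20. See the code block for the details. allocate is called here too, but with an empty request list. allocate checks for this: with an empty request it only tries to fetch the containers requested earlier instead of running another round of scheduling.
// Acquire the AM container from the scheduler.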
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,
null);
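// At least one container must have been allocated, because CONTAINER_ALLOCATED is emitted after an RMContainer is constructed
// and put into SchedulerApplication#newlyAllocatedContainers.
// Note that YarnScheduler#allocate is not guaranteed to be able to fetch it,
// because for some reasons (such as DNS being unavailable, so the container token is not generated) the container may not be fetchable.
// So we return to the previous state and keep retrying until the AM container is fetched.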
if (amContainerAllocation.getContainers().size() == 0) {
appAttempt.retryFetchingAMContainer(appAttempt);
return RMAppAttemptState.SCHEDULED;
}
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers()
.get(0));
RMContainerImpl rmMasterContainer = (RMContainerImpl)appAttempt.scheduler
.getRMContainer(appAttempt.getMasterContainer().getId());
rmMasterContainer.setAMContainer(true);
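// The node set in NMTokenSecretManager marks whether an NMToken has already been issued to the AM for a node.
// When the AM container was allocated to the RM itself, the node that allocated it was marked as having had its NMToken sent.
// So clear this node set, so that the following allocate requests from the AM can retrieve the corresponding NMToken.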
appAttempt.rmContext.getNMTokenSecretManager()
.clearNodeSetForAttempt(appAttempt.applicationAttemptId);
appAttempt.getSubmissionContext().setResource(
appAttempt.getMasterContainer().getResource());
appAttempt.storeAttempt();
return RMAppAttemptState.ALLOCATED_SAVING;
21. The transition finally returns ALLOCATED_SAVING. As before, the returned state determines which arc fires next:
.addTransition(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.ALLOCATED,
RMAppAttemptEventType.ATTEMPT_NEW_SAVED, new AttemptStoredTransition())
In this transition we finally see the launch method: launchAttempt() submits a LAUNCH event.
private static final class AttemptStoredTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
appAttempt.registerClientToken();
appAttempt.launchAttempt();
}
}
22. At this point we reach the exciting class ApplicationMasterLauncher. A LAUNCH event was just submitted, so the launch() method runs.
AMLauncherEventType event = appEvent.getType();
RMAppAttempt application = appEvent.getAppAttempt();
switch (event) {
case LAUNCH:
launch(application);
break;
case CLEANUP:
cleanup(application);
break;
default:
break;
}
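23. First we need to look at how ApplicationMasterLauncher is initialized and started. It is a child service of the RM; as mentioned in Yarn Source Code Analysis (1) --- RM and NM Service Startup and Heartbeat Communication, the RM initializes and starts its child services one by one. The most important part here is that it starts a thread to process the related events, so let's look at that thread's run method.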
@Override
protected void serviceInit(Configuration conf) throws Exception {
int threadCount = conf.getInt(
YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
ThreadFactory tf = new ThreadFactoryBuilder()
.setNameFormat("ApplicationMasterLauncher #%d")
.build();
launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
launcherPool.setThreadFactory(tf);
Configuration newConf = new YarnConfiguration(conf);
newConf.setInt(CommonConfigurationKeysPublic.
IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
conf.getInt(YarnConfiguration.RM_NODEMANAGER_CONNECT_RETIRES,
YarnConfiguration.DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES));
setConfig(newConf);
super.serviceInit(newConf);
}
@Override
protected void serviceStart() throws Exception {
launcherHandlingThread.start();
super.serviceStart();
}
The run method takes events off the masterEvents queue one at a time and hands them to the launcher pool.
while (!this.isInterrupted()) {
Runnable toLaunch;
try {
toLaunch = masterEvents.take();
launcherPool.execute(toLaunch);
} catch (InterruptedException e) {
LOG.warn(this.getClass().getName() + " interrupted. Returning.");
return;
}
}
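The queue-plus-pool pattern in that run loop can be tried out standalone. The sketch below is not ApplicationMasterLauncher: the class name, the pool size of 2, and the runDemo helper are assumptions for illustration. It keeps the same shape: one handling thread blocks on the queue and forwards each Runnable to a fixed-size pool.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LauncherLoopSketch {
    private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
    private final ExecutorService launcherPool = Executors.newFixedThreadPool(2);
    private final Thread handlingThread = new Thread(this::drainQueue, "launcher-handling");

    // Blocks on the queue and forwards each launcher to the pool until interrupted.
    private void drainQueue() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Runnable toLaunch = masterEvents.take();
                launcherPool.execute(toLaunch);
            } catch (InterruptedException e) {
                return; // interrupted while waiting: stop handling
            }
        }
    }

    void start() { handlingThread.start(); }

    void enqueue(Runnable launcher) { masterEvents.add(launcher); }

    void stop() throws InterruptedException {
        handlingThread.interrupt();
        handlingThread.join();
        launcherPool.shutdown();
        launcherPool.awaitTermination(5, TimeUnit.SECONDS);
    }

    // Enqueues `tasks` no-op launchers, waits for them, and returns how many ran.
    static int runDemo(int tasks) {
        LauncherLoopSketch sketch = new LauncherLoopSketch();
        AtomicInteger launched = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        sketch.start();
        try {
            for (int i = 0; i < tasks; i++) {
                sketch.enqueue(() -> { launched.incrementAndGet(); done.countDown(); });
            }
            done.await(5, TimeUnit.SECONDS);
            sketch.stop();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return launched.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3));
    }
}
```

Splitting the work this way means the thread that receives events never blocks on a slow launch; the slow part runs on the pool threads.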
24. Now back to the launch() method from before. It calls createRunnableLauncher, which creates an AMLauncher and passes in AMLauncherEventType.LAUNCH; the ApplicationMasterLauncher thread eventually processes it.
private void launch(RMAppAttempt application) {
Runnable launcher = createRunnableLauncher(application,
AMLauncherEventType.LAUNCH);
masterEvents.add(launcher);
}
protected Runnable createRunnableLauncher(RMAppAttempt application,
AMLauncherEventType event) {
Runnable launcher =
new AMLauncher(context, application, event, getConfig());
return launcher;
}
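25. That triggers AMLauncher's run method. It calls launch() and submits an RMAppAttemptEventType.LAUNCHED event. That event exists to start the AM monitoring thread, so I won't analyze it; the focus is the launch() method.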
case LAUNCH:
try {
LOG.info("Launching master" + application.getAppAttemptId());
launch();
handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
RMAppAttemptEventType.LAUNCHED));
}
26. Here the AM launch context passed in with spark-submit is finally taken out and placed in a StartContainerRequest, and then startContainers is called.
ContainerLaunchContext launchContext =
createAMContainerLaunchContext(applicationContext, masterContainerID);
StartContainerRequest scRequest =
StartContainerRequest.newInstance(launchContext,
masterContainer.getContainerToken());
List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
list.add(scRequest);
StartContainersRequest allRequests =
StartContainersRequest.newInstance(list);
StartContainersResponse response =
containerMgrProxy.startContainers(allRequests);
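27. The Container that hosts the AM finally starts. This is implemented by ContainerManagerImpl. It first runs some checks and then executes the key method startContainerInternal(nmTokenIdentifier, containerTokenIdentifier, request);. That method is long, so again I only pick the key parts. It submits an INIT_APPLICATION event. Following the code, we eventually reach RequestResourcesTransition(). I won't analyze resource localization here; look it up if you are interested. container.sendLaunchEvent() submits a ContainersLauncherEventType.LAUNCH_CONTAINER event, which is handled by the ContainersLauncher class.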
container.sendLaunchEvent();
container.metrics.endInitingContainer();
return ContainerState.LOCALIZED;
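28. containerLauncher is a thread pool, so what happens here is very clear: a ContainerLaunch object is created and handed to the pool. As mentioned earlier, this is where the AM launch context packaged by Spark is passed in to start the AM.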
case LAUNCH_CONTAINER:
Application app =
context.getApplications().get(
containerId.getApplicationAttemptId().getApplicationId());
ContainerLaunch launch =
new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
event.getContainer(), dirsHandler, containerManager);
containerLauncher.submit(launch);
running.put(containerId, launch);
break;
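29. At this point the AM launch is basically complete. For what the ContainerLaunch task actually does, read its call() method yourself; I won't go into it here.
This post covered the whole AM launch process. The code is really complicated and touches many other modules. I (蛋挞) did not analyze all of them, since covering everything at once would muddle the thread. Modules I am interested in, such as the state machine and RPC communication, I will study gradually in later YARN posts. The next post covers how the AM registers with the RM, how the AM requests Containers, and how Containers are started.
Author: 蛋挞
Date: 2018.08.28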