Submission flow, described in words:
1. After bin/spark-submit is executed, the Client assembles a command and submits it to the YARN cluster's ResourceManager.
The command is bin/java org.apache.spark.deploy.yarn.ApplicationMaster in cluster deploy mode; in client (non-cluster) mode it is bin/java org.apache.spark.deploy.yarn.ExecutorLauncher. For example, bin/spark-shell --master yarn can only run in client mode (cluster deploy mode is not applicable to Spark shells).
2. When the ResourceManager receives the command, it picks a NodeManager to run it; executing the command creates and starts an ApplicationMaster.
Once the ApplicationMaster is up, its run method starts the Driver.
3. The Driver requests resources from the ResourceManager.
4. The ResourceManager returns a list of available Containers to the Driver.
5. The Driver sends a "start Container JVM" command to the corresponding NodeManagers; each NodeManager executes it, starting a JVM process that hosts a Container.
6. Once a Container is allocated, the allocator submits an ExecutorRunnable to a launcher thread pool, which asks the NodeManager to start the Executor process. After starting, the Executor registers itself back with the Driver so the Driver can monitor its execution; the Driver then sends messages such as RegisteredExecutor and LaunchTask, which are handled in the receive method on each node, and that is what starts the real Executor.
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples.jar 100
Submitting the script above starts a Java process whose main class is org.apache.spark.deploy.SparkSubmit.
The program's call chain is as follows:
1) object SparkSubmit (SparkSubmit.scala)
This is a companion object: every statement in its body other than method definitions forms its initializer and runs when the object is first referenced (see the small sketch after the main() listing below).
—>main()
def main(args: Array[String]): Unit = {
// Parse the command-line arguments and wrap them in a SparkSubmitArguments object
val appArgs = new SparkSubmitArguments(args)
if (appArgs.verbose) {
// scalastyle:off println
printStream.println(appArgs)
// scalastyle:on println
}
// Pattern match on appArgs.action; action = Option(action).getOrElse(SUBMIT), i.e. it defaults to SUBMIT when not set. We are submitting, so submit(appArgs) is executed.
appArgs.action match {
case SparkSubmitAction.SUBMIT => submit(appArgs)
case SparkSubmitAction.KILL => kill(appArgs)
case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
}
}
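A minimal standalone sketch (not Spark code; InitDemo is a made-up name) of the companion-object behaviour noted above: statements in a Scala object body run once, when the object is first referenced, before main() is invoked.
object InitDemo {
  // Runs once, when InitDemo is first referenced (e.g. when the JVM calls its main).
  println("object body runs first")
  def main(args: Array[String]): Unit = {
    println("main runs second")
  }
}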
—>submit()
private def submit(args: SparkSubmitArguments): Unit = {
// Prepare the submit environment: child arguments, child classpath, system properties and the main class to run
val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)
...
runMain()
}
—>runMain()
// Build a class loader on top of the current thread's context class loader and load all jars on the child classpath into it
val loader =
if (sysProps.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
new ChildFirstURLClassLoader(new Array[URL](0),
Thread.currentThread.getContextClassLoader)
} else {
new MutableURLClassLoader(new Array[URL](0),
Thread.currentThread.getContextClassLoader)
}
Thread.currentThread.setContextClassLoader(loader)
for (jar <- childClasspath) {
addJarToClasspath(jar, loader)
}
for ((key, value) <- sysProps) {
System.setProperty(key, value)
}
// Load the main class via reflection and invoke its main method
// In cluster deploy mode the main class is org.apache.spark.deploy.yarn.Client
// In client deploy mode the main class is the --class argument we passed in: org.apache.spark.examples.SparkPi
mainClass = Utils.classForName(childMainClass)
val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
mainMethod.invoke(null, childArgs.toArray)
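The invoke(null, ...) pattern works because main is a static method. A minimal standalone sketch of the same reflection call (the class name and argument simply reuse the SparkPi example from the submit script; this is not Spark's code):
// Load a class by name and invoke its static main(Array[String]) via reflection.
val clazz = Class.forName("org.apache.spark.examples.SparkPi")
val mainMethod = clazz.getMethod("main", classOf[Array[String]])
// The first argument is null because main is static; the Array[String] is matched
// against main's single String[] parameter.
mainMethod.invoke(null, Array[String]("100"))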
2) ——> org.apache.spark.deploy.yarn.Client (Client.scala)
—>main()
def main(argStrings: Array[String]) {
...
// Parse the arguments, then create a Client and call its run method
val args = new ClientArguments(argStrings)
new Client(args, sparkConf).run()
}
—>run()
def run(): Unit = {
this.appId = submitApplication()
}
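In cluster mode run() then typically waits for the application to reach a terminal state (monitorApplication in the real code). A rough standalone sketch of that kind of polling against the public YARN client API (the waitForCompletion helper is made up for illustration):
import org.apache.hadoop.yarn.api.records.{ApplicationId, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient
// Poll the ResourceManager until the application reaches a terminal state.
def waitForCompletion(yarnClient: YarnClient, appId: ApplicationId): YarnApplicationState = {
  var state = yarnClient.getApplicationReport(appId).getYarnApplicationState
  while (state != YarnApplicationState.FINISHED &&
         state != YarnApplicationState.FAILED &&
         state != YarnApplicationState.KILLED) {
    Thread.sleep(1000)
    state = yarnClient.getApplicationReport(appId).getYarnApplicationState
  }
  state
}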
—>submitApplication()
def submitApplication(): ApplicationId = {
var appId: ApplicationId = null
try {
launcherBackend.connect()
// Set up credentials and start the YARN client
setupCredentials()
yarnClient.init(yarnConf)
yarnClient.start()
// The YARN client asks the ResourceManager to create a new application and gets back a global appId
val newApp = yarnClient.createApplication()
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()
reportLauncherState(SparkAppHandle.State.SUBMITTED)
launcherBackend.setAppId(appId.toString)
new CallerContext("CLIENT", Option(appId.toString)).setCurrentContext()
// Check whether the cluster has enough resources for the ApplicationMaster
verifyClusterResources(newAppResponse)
// Assemble the launch command: bin/java org.apache.spark.deploy.yarn.ApplicationMaster in cluster deploy mode,
// or bin/java org.apache.spark.deploy.yarn.ExecutorLauncher otherwise (e.g. bin/spark-shell --master yarn, which cannot use cluster mode); see the simplified sketch after this method.
// The command is submitted to the YARN cluster; the ResourceManager hands it to one of the NodeManagers, and executing it starts the ApplicationMaster.
val containerContext = createContainerLaunchContext(newAppResponse)
val appContext = createApplicationSubmissionContext(newApp, containerContext)
// Submit the application to the YARN cluster's ResourceManager
yarnClient.submitApplication(appContext)
// Return the global appId (it can also be seen in the logs)
appId
} catch {
...
}
}
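The cluster/client distinction described in the comments above is made inside createContainerLaunchContext. A heavily simplified sketch of that choice and of the resulting launch command (paraphrased; the exact flags and layout vary by Spark version):
// Pick the class that the ApplicationMaster container will run.
val isClusterMode = true                    // illustrative flag
val amClass =
  if (isClusterMode) "org.apache.spark.deploy.yarn.ApplicationMaster"
  else "org.apache.spark.deploy.yarn.ExecutorLauncher"
// The command that ends up in the ContainerLaunchContext; the user class is
// only passed along in cluster mode.
val commands = Seq("{{JAVA_HOME}}/bin/java", "-server", amClass,
  "--class", "org.apache.spark.examples.SparkPi")
println(commands.mkString(" "))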
------------------------ Everything above runs on the client side (the YARN client talking to the ResourceManager); everything below runs on the NodeManagers.
3) ——> org.apache.spark.deploy.yarn.ApplicationMaster
—>main()
def main(args: Array[String]): Unit = {
// Wrap the command-line arguments
val amArgs = new ApplicationMasterArguments(args)
// Create an ApplicationMaster and call its run method; the second argument is a client for YARN's ResourceManager, through which this process (running on a NodeManager) talks to the ResourceManager
SparkHadoopUtil.get.runAsSparkUser { () =>
master = new ApplicationMaster(amArgs, new YarnRMClient)
System.exit(master.run())
}
}
—>run(): its goal is to start the Driver. Recall how we got here: the YARN client submitted a command to the ResourceManager, the ResourceManager handed it to one of the NodeManagers, executing it started this ApplicationMaster, and the ApplicationMaster's run method now starts the Driver (in cluster mode).
final def run(): Int = {
...
try {
if (isClusterMode) {
// Start the Driver
runDriver(securityMgr)
} else {
runExecutorLauncher(securityMgr)
}
} catch {
}
}
—>runDriver():
private def runDriver(securityMgr: SecurityManager): Unit = {
addAmIpFilter()
// Start the user application (the --class we passed in) in a separate thread on this NodeManager
userClassThread = startUserApplication()
try {
val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
Duration(totalWaitTime, TimeUnit.MILLISECONDS))
// Register our ApplicationMaster with YARN,
// i.e. establish the link between the ApplicationMaster and the ResourceManager; AM and RM are separate Java processes that communicate over RPC.
// The purpose of registering is to request resources; once resources are granted, Containers are created and then run.
rpcEnv = sc.env.rpcEnv
val driverRef = runAMEndpoint(
sc.getConf.get("spark.driver.host"),
sc.getConf.get("spark.driver.port"),
isClusterMode = true)
registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.appUIAddress).getOrElse(""),
securityMgr)
userClassThread.join()
} catch {
}
}
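awaitResult above blocks until the user thread has created the SparkContext (the SparkContext constructor completes sparkContextPromise). A minimal standalone model of that handshake using plain Scala futures instead of Spark classes:
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
// Stand-in for the Promise[SparkContext] held by the ApplicationMaster.
val sparkContextPromise = Promise[String]()
// Stand-in for the "Driver" thread started by startUserApplication().
val userThread = new Thread(new Runnable {
  override def run(): Unit = {
    val sc = "SparkContext"              // user code would do: new SparkContext(conf)
    sparkContextPromise.success(sc)      // unblocks runDriver
  }
})
userThread.start()
// runDriver: wait (with a timeout) for the user code to create the SparkContext,
// then register the AM and start allocating resources.
val sc = Await.result(sparkContextPromise.future, 100.seconds)
println(s"got $sc, the AM can now register with the ResourceManager")
userThread.join()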
—>startUserApplication(): start a new thread that runs the main method of the user-defined class, i.e. from here on our own driver code is executing.
private def startUserApplication(): Thread = {
// Resolve the user classpath and build a class loader for it
val classpath = Client.getUserClasspath(sparkConf)
val userClassLoader =
if (Client.isUserClassPathFirst(sparkConf, isDriver = true)) {
new ChildFirstURLClassLoader(urls, Utils.getContextOrSparkClassLoader)
} else {
new MutableURLClassLoader(urls, Utils.getContextOrSparkClassLoader)
}
var userArgs = args.userArgs
if (args.primaryPyFile != null && args.primaryPyFile.endsWith(".py")) {
userArgs = Seq(args.primaryPyFile, "") ++ userArgs
}
val mainMethod = userClassLoader.loadClass(args.userClass)
.getMethod("main", classOf[Array[String]])
val userThread = new Thread {
override def run() {
try {
// Invoke the main method of the user-defined class
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
} catch {
} finally {
}
}
}
userThread.setContextClassLoader(userClassLoader)
userThread.setName("Driver")
userThread.start()
userThread
}
4) org.apache.spark.deploy.yarn.YarnAllocator (YarnAllocator.scala): it holds a thread pool (launcherPool) that is used to start ExecutorRunnable instances (a small standalone sketch of this pattern follows the code below).
—>allocateResources():
—>handleAllocatedContainers():
—>runAllocatedContainers():
private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
for (container <- containersToUse) {
def updateInternalState(): Unit = synchronized {
numExecutorsRunning += 1
executorIdToContainer(executorId) = container
containerIdToExecutorId(container.getId) = executorId
val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
new HashSet[ContainerId])
containerSet += containerId
allocatedContainerToHostMap.put(containerId, executorHostname)
}
launcherPool.execute(new Runnable {
override def run(): Unit = {
try {
new ExecutorRunnable(
Some(container),
conf,
sparkConf,
driverUrl,
executorId,
executorHostname,
executorMemory,
executorCores,
appAttemptId.getApplicationId.toString,
securityMgr,
localResources
).run()
updateInternalState()
} catch {
...
}
}
})
}
}
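The launcherPool used above is a cached daemon thread pool, so several containers can be brought up in parallel. A standalone sketch of the pattern (container names are illustrative; in Spark each task runs ExecutorRunnable(...).run()):
import java.util.concurrent.Executors
// One task per allocated container, executed on a shared pool.
val launcherPool = Executors.newCachedThreadPool()
val containersToUse = Seq("container_01", "container_02")
containersToUse.foreach { container =>
  launcherPool.execute(new Runnable {
    override def run(): Unit =
      println(s"asking the NodeManager to start an executor in $container")
  })
}
launcherPool.shutdown()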
5) org.apache.spark.deploy.yarn.ExecutorRunnable: its purpose is to send the NodeManager a Java command that starts the Executor process (CoarseGrainedExecutorBackend); a paraphrased sketch of that command follows startContainer() below.
—>run():
—>startContainer():
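A paraphrased sketch of the launch command that startContainer() hands to the NodeManager inside a ContainerLaunchContext (all values below are illustrative; the real command is built in prepareCommand and also carries memory, logging and classpath settings):
// Illustrative values; in Spark these come from the allocator and the configuration.
val executorMemory = 1024                                                  // MB
val driverUrl = "spark://CoarseGrainedScheduler@driver-host:7077"          // hypothetical URL
val command = Seq(
  "{{JAVA_HOME}}/bin/java", "-server", s"-Xmx${executorMemory}m",
  "org.apache.spark.executor.CoarseGrainedExecutorBackend",
  "--driver-url", driverUrl,
  "--executor-id", "1",
  "--hostname", "nodemanager-host",
  "--cores", "2",
  "--app-id", "application_1500000000000_0001"
).mkString(" ")
println(command)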
6) org.apache.spark.executor.CoarseGrainedExecutorBackend: when this process starts, onStart registers it with the driver so the driver can monitor its execution; the driver then sends messages such as RegisteredExecutor and LaunchTask, which are handled in the receive method on each node, and that is what creates and starts the real Executor (a small standalone model of this registration handshake appears at the end of this section).
—>main():
—>run():
—>receive():
override def receive: PartialFunction[Any, Unit] = {
case RegisteredExecutor =>
logInfo("Successfully registered with driver")
try {
executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
} catch {
case NonFatal(e) =>
exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
}
case RegisterExecutorFailed(message) =>
exitExecutor(1, "Slave registration failed: " + message)
case LaunchTask(data) =>
if (executor == null) {
exitExecutor(1, "Received LaunchTask command but executor was null")
} else {
val taskDesc = ser.deserialize[TaskDescription](data.value)
logInfo("Got assigned task " + taskDesc.taskId)
executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
taskDesc.name, taskDesc.serializedTask)
}
case KillTask(taskId, _, interruptThread) =>
if (executor == null) {
exitExecutor(1, "Received KillTask command but executor was null")
} else {
executor.killTask(taskId, interruptThread)
}
case StopExecutor =>
stopping.set(true)
self.send(Shutdown)
case Shutdown =>
stopping.set(true)
new Thread("CoarseGrainedExecutorBackend-stop-executor") {
override def run(): Unit = {
executor.stop()
}
}.start()
}
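For completeness, the registration mentioned in step 6 happens in onStart, before any of the messages above can arrive. A standalone model of that handshake (the driverAsk helper and all values are made up; in Spark the backend asks the driver endpoint over RPC with a RegisterExecutor message):
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
case class RegisterExecutor(executorId: String, hostname: String, cores: Int)
// Stand-in for the driver endpoint: accept the registration and acknowledge it.
def driverAsk(msg: RegisterExecutor): Future[Boolean] = Future {
  println(s"driver: registering executor ${msg.executorId} on ${msg.hostname}")
  true
}
// Stand-in for CoarseGrainedExecutorBackend.onStart: register, then create the Executor.
val registered = Await.result(driverAsk(RegisterExecutor("1", "nodemanager-host", 2)), 10.seconds)
if (registered) println("backend: got RegisteredExecutor, creating the Executor")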