How Spark Submits an Application to YARN
Most books and articles analyze Spark job submission in standalone mode, but in most production environments Spark runs on YARN, usually with deploy mode cluster. So I spent some time reading the relevant source code; walking through it here should help with troubleshooting and analyzing problems in production. Corrections are welcome if anything is wrong.
Let's start with the submission script, bin/spark-submit:
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
You can see that it ends up running org.apache.spark.deploy.SparkSubmit (via spark-class).
So let's look at org.apache.spark.deploy.SparkSubmit:
override def main(args: Array[String]): Unit = {
val appArgs = new SparkSubmitArguments(args)
if (appArgs.verbose) {
// scalastyle:off println
printStream.println(appArgs)
// scalastyle:on println
}
appArgs.action match {
case SparkSubmitAction.SUBMIT => submit(appArgs)
case SparkSubmitAction.KILL => kill(appArgs)
case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
}
}
private def submit(args: SparkSubmitArguments): Unit = {
val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)
// ... some code omitted ...
doRunMain()
submit first calls prepareSubmitEnvironment, which returns a 4-tuple:
- the arguments for the child process
- the classpath for the child process
- the system properties
- the main class of the child process, which is the key piece here
In yarn-cluster mode childMainClass is org.apache.spark.deploy.yarn.Client. Keep that in mind: at first I assumed this class was the user-defined main class, so later, while reading the ApplicationMaster code, I could not figure out how the ApplicationMaster was ever invoked.
... (much code omitted) ...
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
childMainClass = "org.apache.spark.deploy.yarn.Client"
if (args.isPython) {
childArgs += ("--primary-py-file", args.primaryResource)
childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
} else if (args.isR) {
val mainFile = new Path(args.primaryResource).getName
childArgs += ("--primary-r-file", mainFile)
childArgs += ("--class", "org.apache.spark.deploy.RRunner")
} else {
if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
childArgs += ("--jar", args.primaryResource)
}
childArgs += ("--class", args.mainClass)
}
if (args.childArgs != null) {
args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
}
}
... (much code omitted) ...
Back in main, the parsed action is matched against SUBMIT, KILL and REQUEST_STATUS; for SUBMIT the submit function is executed, which leads to doRunMain -> runMain:
// ... some code omitted ...
mainClass = Utils.classForName(childMainClass)
// ... some code omitted ...
val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
mainMethod.invoke(null, childArgs.toArray)
runMain simply loads the class it was handed (here org.apache.spark.deploy.yarn.Client) and invokes its main method via reflection.
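Stripped of SparkSubmit's surrounding machinery, this is just the standard load-a-class-and-invoke-its-static-main reflection pattern. A minimal self-contained sketch (the hard-coded class name is only for illustration; the real code resolves it from prepareSubmitEnvironment and uses Utils.classForName):

object ReflectiveLaunchSketch {
  def main(args: Array[String]): Unit = {
    val childMainClass = "org.apache.spark.deploy.yarn.Client" // chosen by prepareSubmitEnvironment
    val mainClass  = Class.forName(childMainClass)
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    // main is static, so the receiver is null; the whole String[] becomes its single argument
    mainMethod.invoke(null, args)
  }
}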
Next, let's look at Client's main method:
// ... some code omitted ...
System.setProperty("SPARK_YARN_MODE", "true")
val sparkConf = new SparkConf
// SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
// so remove them from sparkConf here for yarn mode.
sparkConf.remove("spark.jars")
sparkConf.remove("spark.files")
val args = new ClientArguments(argStrings)
new Client(args, sparkConf).run()
main builds a SparkConf and ClientArguments and then executes Client's run method:
this.appId = submitApplication()
if (!launcherBackend.isConnected() && fireAndForget) {
val report = getApplicationReport(appId)
val state = report.getYarnApplicationState
logInfo(s"Application report for $appId (state: $state)")
logInfo(formatReportDetails(report))
if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
throw new SparkException(s"Application $appId finished with status: $state")
}
}
run calls submitApplication to submit the application; if spark.yarn.submit.waitAppCompletion is enabled, the client process keeps reporting the application's state until the application exits (a simplified sketch of that polling loop follows the submitApplication listing below).
def submitApplication(): ApplicationId = {
var appId: ApplicationId = null
try {
launcherBackend.connect() // connect the launcherBackend, used to talk to the launcher server
// Setup the credentials before doing anything else,
// so we have don't have issues at any point.
setupCredentials()
yarnClient.init(yarnConf) // initialize the YARN client
yarnClient.start() // start the YARN client
logInfo("Requesting a new application from cluster with %d NodeManagers"
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
// Get a new application from our RM
val newApp = yarnClient.createApplication() // ask the ResourceManager to create a new YARN application
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()
new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
Option(appId.toString)).setCurrentContext()
// Verify whether the cluster has enough resources for our AM
// check whether the cluster has enough resources (mainly CPU and memory) to launch our ApplicationMaster
verifyClusterResources(newAppResponse)
// Set up the appropriate contexts to launch our AM
// build the container context used to launch the ApplicationMaster:
// environment variables, JVM options and the launch command
val containerContext = createContainerLaunchContext(newAppResponse)
val appContext = createApplicationSubmissionContext(newApp, containerContext)
// Finally, submit and monitor the application
logInfo(s"Submitting application $appId to ResourceManager")
yarnClient.submitApplication(appContext) // hand the appContext over to the ResourceManager
launcherBackend.setAppId(appId.toString)
reportLauncherState(SparkAppHandle.State.SUBMITTED)
appId
} catch {
case e: Throwable =>
if (appId != null) {
cleanupStagingDir(appId)
}
throw e
}
}
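As a side note, when spark.yarn.submit.waitAppCompletion is true the client does not exit right after submitting: Client.monitorApplication keeps polling the ResourceManager for the application report until a terminal state is reached. A simplified sketch of that loop using only the public YarnClient API (the method name and the hard-coded 1-second interval are placeholders; the real code reads spark.yarn.report.interval and formats the report details):

import org.apache.hadoop.yarn.api.records.{ApplicationId, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient

object ReportLoopSketch {
  def waitForCompletion(yarnClient: YarnClient, appId: ApplicationId): YarnApplicationState = {
    var state = yarnClient.getApplicationReport(appId).getYarnApplicationState
    // keep asking the RM for a report until the application reaches a terminal state
    while (state != YarnApplicationState.FINISHED &&
           state != YarnApplicationState.FAILED &&
           state != YarnApplicationState.KILLED) {
      Thread.sleep(1000)
      state = yarnClient.getApplicationReport(appId).getYarnApplicationState
      println(s"Application report for $appId (state: $state)")
    }
    state
  }
}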
Next, let's focus on createContainerLaunchContext. This method sets up the container's environment variables, the JVM options, and the command used to launch the ApplicationMaster. It is fairly long, so only the key lines are shown:
// prepare the local resources that will be distributed with the application
val localResources = prepareLocalResources(appStagingDirPath, pySparkArchives)
val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
amContainer.setLocalResources(localResources.asJava)
amContainer.setEnvironment(launchEnv.asJava)
.....
val amClass =
if (isClusterMode) {
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else {
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
.....
val userArgs = args.userArgs.flatMap { arg =>
Seq("--arg", YarnSparkHadoopUtil.escapeForShell(arg))
}
val amArgs =
Seq(amClass) ++ userClass ++ userJar ++ primaryPyFile ++ primaryRFile ++ userArgs ++
Seq("--properties-file", buildPath(Environment.PWD.$$(), LOCALIZED_CONF_DIR, SPARK_CONF_FILE))
// Command for the ApplicationMaster
val commands = prefixEnv ++
Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
javaOpts ++ amArgs ++
Seq(
"1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout",
"2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")
// TODO: it would be nicer to just make sure there are no null commands here
val printableCommands = commands.map(s => if (s == null) "null" else s).toList
amContainer.setCommands(printableCommands.asJava)
prepareLocalResources stages the needed resources, the Java-related options are set, and a different AM class is passed in depending on client or cluster mode; these pieces are then assembled into the command that launches the ApplicationMaster. After yarnClient.submitApplication is called, YARN allocates a container in which to run the ApplicationMaster.
So at this point the ApplicationMaster is running inside a container on one of the cluster's NodeManagers.
Next, let's look at ApplicationMaster:
def main(args: Array[String]): Unit = {
SignalUtils.registerLogger(log)
val amArgs = new ApplicationMasterArguments(args)
// Load the properties file with the Spark configuration and set entries as system properties,
// so that user code run inside the AM also has access to them.
// Note: we must do this before SparkHadoopUtil instantiated
if (amArgs.propertiesFile != null) {
Utils.getPropertiesFromFile(amArgs.propertiesFile).foreach { case (k, v) =>
sys.props(k) = v
}
}
SparkHadoopUtil.get.runAsSparkUser { () =>
master = new ApplicationMaster(amArgs, new YarnRMClient)
System.exit(master.run())
}
}
It creates an ApplicationMaster and then executes am.run. If you inspect the driver process on that node (ps -ef | grep 000001), you will see something like this:
yarn 30383 30381 0 Jul14 ? 00:00:00 /bin/bash -c /usr/java/default/bin/java -server -Xmx4096m -Djava.io.tmpdir=/data/yarn/local/usercache/hdp-sm/appcache/application_1516175032931_7507/container_1516175032931_7507_01_000001/tmp -Dspark.yarn.app.container.log.dir=/data/yarn/logs/application_1516175032931_7507/container_1516175032931_7507_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.data.statemachine.StreamDivide' --jar file:/data/apps/session-join-v2/state-machine/bin/../lib/state-machine-0.0.1-SNAPSHOT.jar --arg 'v2' --arg 'StreamDivide' --properties-file /data/yarn/local/usercache/hdp-sm/appcache/application_1516175032931_7507/container_1516175032931_7507_01_000001/__spark_conf__/__spark_conf__.properties 1> /data/yarn/logs/application_1516175032931_7507/container_1516175032931_7507_01_000001/stdout 2> /data/yarn/logs/application_1516175032931_7507/container_1516175032931_7507_01_000001/stderr
This is where the driver program gets started. The first call worth looking at inside run (for cluster mode) is startUserApplication, which launches the user's application: a dedicated thread is started that invokes the user's own main class, again through reflection:
val mainMethod = userClassLoader.loadClass(args.userClass)
.getMethod("main", classOf[Array[String]])
val userThread = new Thread {
override def run() {
try {
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
logDebug("Done running users class")
}
...
The second step is to register the AM with the ResourceManager.
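In Spark this registration goes through YarnRMClient, a thin wrapper over Hadoop's AMRMClient, which roughly hands back an allocator that then requests executor containers from the RM. As a hedged sketch of what the registration amounts to at the Hadoop API level (driverHost, driverPort and trackingUrl are placeholders, not Spark's actual plumbing):

import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
import org.apache.hadoop.yarn.conf.YarnConfiguration

object RegisterAmSketch {
  def register(driverHost: String, driverPort: Int, trackingUrl: String): RegisterApplicationMasterResponse = {
    val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
    amClient.init(new YarnConfiguration())
    amClient.start()
    // after this call the RM knows about the AM, which can then ask for executor containers
    amClient.registerApplicationMaster(driverHost, driverPort, trackingUrl)
  }
}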
Now we can finally look at the code in the user's own main method.
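For orientation, in cluster mode that user class is just an ordinary object with a main method; a minimal hypothetical example (MyStreamJob and its workload are made up, and the master URL is supplied by spark-submit rather than hard-coded):

import org.apache.spark.sql.SparkSession

object MyStreamJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("my-yarn-app")
      .getOrCreate() // this is where the SparkContext and the schedulers discussed below get created
    try {
      spark.range(0, 100).count() // placeholder workload
    } finally {
      spark.stop()
    }
  }
}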
We know that to create a Spark job we first need a SparkContext (since Spark 2.x the usual entry point is SparkSession, but it still wraps a SparkContext), so let's look at the SparkContext code:
// ... some code omitted ...
// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)
// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()
// ... some code omitted ...
First the SchedulerBackend and the TaskScheduler are created, then the TaskScheduler is started. The creation happens in org.apache.spark.SparkContext#createTaskScheduler, which dispatches on the master URL:
master match {
case "local" =>//local或者standalone的等省略了
省略部分代码
case masterUrl =>//如果是master的url则执行
val cm = getClusterManager(masterUrl) match {
case Some(clusterMgr) => clusterMgr
case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
}
try {
val scheduler = cm.createTaskScheduler(sc, masterUrl)
val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler)
cm.initialize(scheduler, backend)
(backend, scheduler)
}
// ... some code omitted ...
}
This method instantiates a cluster manager based on the master URL you passed in. Let's look at getClusterManager:
private def getClusterManager(url: String): Option[ExternalClusterManager] = {
val loader = Utils.getContextOrSparkClassLoader
val serviceLoaders =
ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala.filter(_.canCreate(url))
if (serviceLoaders.size > 1) {
throw new SparkException(
s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
}
serviceLoaders.headOption
}
getClusterManager calls ServiceLoader.load to load an ExternalClusterManager; if you have ever written an RPC service you have probably seen this way of loading services. It first obtains the current (or Spark's) class loader, then ServiceLoader.load scans the classpath for the service file named after the ExternalClusterManager interface under META-INF/services, instantiates the implementations listed there, and keeps the one whose canCreate accepts the master URL.
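For anyone who has not used the Java SPI before, the pattern looks roughly like this (Greeter and its provider are invented purely for illustration; Spark's real interface is org.apache.spark.scheduler.ExternalClusterManager and the yarn module ships the corresponding service file):

import java.util.ServiceLoader
import scala.collection.JavaConverters._

// a jar that wants to plug in ships a file META-INF/services/<fully-qualified-interface-name>
// whose content is the fully qualified name of the implementation class
trait Greeter {
  def canHandle(url: String): Boolean
  def greet(): String
}

object SpiDemo {
  def main(args: Array[String]): Unit = {
    val loader   = Thread.currentThread().getContextClassLoader
    val matching = ServiceLoader.load(classOf[Greeter], loader).asScala.filter(_.canHandle("yarn"))
    matching.headOption.foreach(g => println(g.greet()))
  }
}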
In Spark's yarn module that service file registers org.apache.spark.scheduler.cluster.YarnClusterManager, which is the ExternalClusterManager returned for a yarn master URL. Back in org.apache.spark.SparkContext#createTaskScheduler, once the cluster manager has been obtained it creates and initializes the TaskScheduler and the SchedulerBackend.
With that, we can return to taskScheduler.start(). On YARN the scheduler is YarnScheduler, which extends TaskSchedulerImpl, so let's look at TaskSchedulerImpl's start:
override def start() {
backend.start()
if (!isLocal && conf.getBoolean("spark.speculation", false)) {
logInfo("Starting speculative execution thread")
speculationScheduler.scheduleWithFixedDelay(new Runnable {
override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
checkSpeculatableTasks()
}
}, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
}
}
You can see that it first calls the backend's start method. There is also a bit of logic here for Spark's speculative execution, which I will not expand on. Let's continue with the backend's start method. It has several implementations; the YARN-based ones are YarnClientSchedulerBackend and YarnClusterSchedulerBackend, and we will mainly look at the latter:
override def start() {
val attemptId = ApplicationMaster.getAttemptId
bindToYarn(attemptId.getApplicationId(), Some(attemptId))
super.start()
totalExpectedExecutors = YarnSparkHadoopUtil.getInitialTargetExecutorNumber(sc.conf)
}
It first asks ApplicationMaster for the attempt ID (which contains the application ID) and binds the backend to it, then calls super.start() and uses YarnSparkHadoopUtil.getInitialTargetExecutorNumber to determine the initial target number of executors.
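For reference, getInitialTargetExecutorNumber is essentially a configuration lookup: with dynamic allocation enabled the initial target comes from the dynamic-allocation settings, otherwise from spark.executor.instances with a default of 2. A simplified sketch of that logic, under the assumption that the min/max validation the real helper performs is left out:

import org.apache.spark.SparkConf

object InitialExecutorsSketch {
  def initialTargetExecutors(conf: SparkConf, default: Int = 2): Int = {
    if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
      // take the largest of the dynamic-allocation starting points
      Seq(
        conf.getInt("spark.dynamicAllocation.minExecutors", 0),
        conf.getInt("spark.dynamicAllocation.initialExecutors", 0),
        conf.getInt("spark.executor.instances", 0)
      ).max
    } else {
      conf.getInt("spark.executor.instances", default)
    }
  }
}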