In production, Spark jobs are usually submitted to YARN. The overall flow (yarn-cluster mode) is as follows:
1. The client submits the application to the ResourceManager (RM).
2. The RM starts the ApplicationMaster (AM).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of available resources.
5. The AM starts the containers through an NMClient and launches a CoarseGrainedExecutorBackend process in each of them.
6. Each Executor registers itself back with the Driver.
7. The Executors run tasks.
spark-submit.sh ultimately just runs the class org.apache.spark.deploy.SparkSubmit (we won't go through the script here; open it with vim if you are curious).
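For reference, a typical yarn-cluster submission looks roughly like this; the class name, jar path, and resource sizes below are made-up examples:
bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MySparkApp \
  --driver-memory 2g \
  --executor-memory 4g \
  --num-executors 10 \
  /path/to/my-spark-app.jar arg1 arg2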
Open this class in IDEA and locate its main function; the relevant part looks like this:
override def main(args: Array[String]): Unit = {
  // Keep track of whether logging needs to be re-initialized by the application later
  val uninitLog = initializeLogIfNecessary(true, silent = true)
  val appArgs = new SparkSubmitArguments(args)
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    // KILL and REQUEST_STATUS cases omitted
  }
}
appArgs.action is assigned when the arguments are initialized:
// Action should be SUBMIT unless otherwise specified
action = Option(action).getOrElse(SUBMIT)
Clicking into submit(appArgs, uninitLog) takes us to the corresponding method:
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            // scalastyle:off println
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            // scalastyle:on println
            exitFn(1)
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
    }
  }

  // Let the main class re-initialize the logging system once it starts.
  if (uninitLog) {
    Logging.uninitialize()
  }

  // In standalone cluster mode, there are two submission gateways:
  //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
  //   (2) The new REST-based gateway introduced in Spark 1.3
  // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
  // to use the legacy gateway if the master endpoint turns out to be not a REST server.
  if (args.isStandaloneCluster && args.useRest) {
    try {
      // scalastyle:off println
      printStream.println("Running Spark using the REST application submission protocol.")
      // scalastyle:on println
      doRunMain()
    } catch {
      // Fail over to use the legacy submission gateway
      case e: SubmitRestConnectionException =>
        printWarning(s"Master endpoint ${args.master} was not a REST server. " +
          "Falling back to legacy submission gateway instead.")
        args.useRest = false
        submit(args, false)
    }
  // In all other modes, just run the main class as prepared
  } else {
    doRunMain()
  }
}
There are two things we mainly care about here.
First, val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args). Inside this method the deploy mode is pattern-matched; in yarn-cluster mode childMainClass is "org.apache.spark.deploy.yarn.YarnClusterApplication", and submitting the job essentially means launching a JVM process command based on this class (a simplified sketch of this selection appears a few lines below).
Second, runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose), which is where we click next. Inside runMain, the part to pay attention to is:
val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
  mainClass.newInstance().asInstanceOf[SparkApplication]
} else {
  // an ordinary main class is wrapped in a JavaMainApplication (branch abbreviated)
  new JavaMainApplication(mainClass)
}
The mainClass here is exactly the "org.apache.spark.deploy.yarn.YarnClusterApplication" mentioned above. An instance of it is obtained via reflection, and then we call
app.start(childArgs.toArray, sparkConf)
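To make the first point concrete, here is a simplified, hypothetical sketch of how the child main class depends on the master/deploy mode. This is not Spark's actual prepareSubmitEnvironment code, only an illustration of the idea:
// Illustrative sketch only -- not Spark's real prepareSubmitEnvironment logic.
object ChildMainClassSketch {
  def childMainClassFor(master: String, deployMode: String, userMainClass: String): String =
    (master, deployMode) match {
      case ("yarn", "cluster") =>
        // in yarn-cluster mode SparkSubmit launches the YARN client wrapper, not the user class
        "org.apache.spark.deploy.yarn.YarnClusterApplication"
      case _ =>
        // in client mode the user's --class runs directly inside the spark-submit JVM
        userMainClass
    }
}
In other words, in yarn-cluster mode your --class is not run by spark-submit itself; it is only started later on the cluster, inside the ApplicationMaster.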
Now let's step into the "org.apache.spark.deploy.yarn.YarnClusterApplication" class.
private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
    // so remove them from sparkConf here for yarn mode.
    conf.remove("spark.jars")
    conf.remove("spark.files")

    new Client(new ClientArguments(args), conf).run()
  }

}
This class overrides start and simply calls new Client(new ClientArguments(args), conf).run().
Inside run there is one key line:
this.appId = submitApplication()
This method does the actual submission: it builds a yarnClient, assembles the launch context for the AM (essentially a java command), and it is this command that really gets submitted.
In cluster mode the class in that command is "org.apache.spark.deploy.yarn.ApplicationMaster", and the application is finally handed to YARN via yarnClient.submitApplication(appContext).
def submitApplication(): ApplicationId = {
  var appId: ApplicationId = null
  try {
    launcherBackend.connect()
    // Setup the credentials before doing anything else,
    // so we have don't have issues at any point.
    setupCredentials()
    yarnClient.init(hadoopConf)
    yarnClient.start()

    logInfo("Requesting a new application from cluster with %d NodeManagers"
      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

    // Get a new application from our RM
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    appId = newAppResponse.getApplicationId()

    new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
      Option(appId.toString)).setCurrentContext()

    // Verify whether the cluster has enough resources for our AM
    verifyClusterResources(newAppResponse)

    // Set up the appropriate contexts to launch our AM
    val containerContext = createContainerLaunchContext(newAppResponse)
    val appContext = createApplicationSubmissionContext(newApp, containerContext)

    // Finally, submit and monitor the application
    logInfo(s"Submitting application $appId to ResourceManager")
    yarnClient.submitApplication(appContext)
    launcherBackend.setAppId(appId.toString)
    reportLauncherState(SparkAppHandle.State.SUBMITTED)

    appId
  } catch {
    case e: Throwable =>
      if (appId != null) {
        cleanupStagingDir(appId)
      }
      throw e
  }
}
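For a feel of what createContainerLaunchContext produces, the AM launch command has roughly the shape sketched below. This is only an illustration: the user class and jar path are hypothetical, and the real code also adds the classpath, JVM options, memory settings, and the AM arguments.
// Rough, illustrative shape of the AM launch command (not the exact Spark output).
val amMainClass = "org.apache.spark.deploy.yarn.ApplicationMaster" // yarn-cluster mode
val amCommand = Seq(
  "{{JAVA_HOME}}/bin/java", "-server",
  amMainClass,
  "--class", "com.example.MySparkApp",      // hypothetical user --class
  "--jar", "hdfs:///apps/my-spark-app.jar", // hypothetical user jar
  "1>", "<LOG_DIR>/stdout",
  "2>", "<LOG_DIR>/stderr"
).mkString(" ")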
Next we step into the "org.apache.spark.deploy.yarn.ApplicationMaster" class and find its main function:
def main(args: Array[String]): Unit = {
  SignalUtils.registerLogger(log)
  val amArgs = new ApplicationMasterArguments(args)
  master = new ApplicationMaster(amArgs)
  System.exit(master.run())
}
Following master.run() and clicking through, we eventually reach runDriver(). There are a few key points here:
1. userClassThread = startUserApplication(): this starts our Driver thread, invoking the class passed with --class via reflection and naming the thread "Driver" (a minimal sketch follows after the runDriver code below).
2. registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl)): this registers the AM with the RM so that it can obtain resources to run the rest of the application.
3. allocator.allocateResources(): once resources are granted, this applies the locality policy (node-local, rack-local, and so on).
4. runAllocatedContainers(containersToUse): with usable containers in hand, this asks the NodeManagers to launch the corresponding containers.
private def runDriver(): Unit = {
  addAmIpFilter(None)
  userClassThread = startUserApplication()

  // This a bit hacky, but we need to wait until the spark.driver.port property has
  // been set by the Thread executing the user class.
  logInfo("Waiting for spark context initialization...")
  val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
  try {
    val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
      Duration(totalWaitTime, TimeUnit.MILLISECONDS))
    if (sc != null) {
      rpcEnv = sc.env.rpcEnv
      val driverRef = createSchedulerRef(
        sc.getConf.get("spark.driver.host"),
        sc.getConf.get("spark.driver.port"))
      registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl))
      registered = true
    } else {
      // Sanity check; should never happen in normal operation, since sc should only be null
      // if the user app did not create a SparkContext.
      throw new IllegalStateException("User did not initialize spark context!")
    }
    resumeDriver()
    userClassThread.join()
  } catch {
    case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
      logError(
        s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
          "Please check earlier log output for errors. Failing the application.")
      finish(FinalApplicationStatus.FAILED,
        ApplicationMaster.EXIT_SC_NOT_INITED,
        "Timed out waiting for SparkContext.")
  } finally {
    resumeDriver()
  }
}
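As promised above, here is a minimal sketch of what startUserApplication does, using only standard JDK reflection. The helper name is made up; the real method also sets up the class loader, passes the user arguments, and reports exceptions back to the AM:
import java.lang.reflect.Modifier

// Minimal sketch, not Spark's actual code: run the user's --class main() on a
// dedicated thread named "Driver". In yarn-cluster mode the Driver is exactly
// such a thread inside the ApplicationMaster JVM.
def startUserApplicationSketch(userClassName: String, userArgs: Array[String]): Thread = {
  val mainMethod = Class.forName(userClassName).getMethod("main", classOf[Array[String]])
  require(Modifier.isStatic(mainMethod.getModifiers), "main() must be static")
  val userThread = new Thread(new Runnable {
    override def run(): Unit = mainMethod.invoke(null, userArgs)
  })
  userThread.setName("Driver")
  userThread.start()
  userThread
}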
Now let's look at the core of runAllocatedContainers(containersToUse):
// excerpt from inside the loop over containersToUse
if (runningExecutors.size() < targetNumExecutors) {
  numExecutorsStarting.incrementAndGet()
  if (launchContainers) {
    launcherPool.execute(new Runnable {
      override def run(): Unit = {
        try {
          new ExecutorRunnable(
            Some(container),
            conf,
            sparkConf,
            driverUrl,
            executorId,
            executorHostname,
            executorMemory,
            executorCores,
            appAttemptId.getApplicationId.toString,
            securityMgr,
            localResources
          ).run()
          updateInternalState()
        } catch {
          case e: Throwable =>
            numExecutorsStarting.decrementAndGet()
            if (NonFatal(e)) {
              logError(s"Failed to launch executor $executorId on container $containerId", e)
              // Assigned container should be released immediately
              // to avoid unnecessary resource occupation.
              amClient.releaseAssignedContainer(containerId)
            } else {
              throw e
            }
        }
      }
    })
  } else {
    // For test only
    updateInternalState()
  }
}
Next, the implementation of ExecutorRunnable.run():
def run(): Unit = {
  logDebug("Starting Executor Container")
  nmClient = NMClient.createNMClient()
  nmClient.init(conf)
  nmClient.start()
  startContainer()
}
So once a container has been allocated, an NMClient is created and started first, and then the container itself is launched.
startContainer() is fairly long; at its core it wraps the launch of "org.apache.spark.executor.CoarseGrainedExecutorBackend" into a command and starts that process via nmClient.startContainer(container.get, ctx), where ctx is the ContainerLaunchContext holding that command. A simplified sketch follows.
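Here is a simplified sketch of that step, assuming only the standard Hadoop YARN client API. The helper launchExecutorContainer and its parameters are made up for illustration; the real startContainer also prepares the environment, local resources, and security tokens:
import org.apache.hadoop.yarn.api.records.{Container, ContainerLaunchContext}
import org.apache.hadoop.yarn.client.api.NMClient
import org.apache.hadoop.yarn.util.Records
import scala.collection.JavaConverters._

// Simplified sketch, not Spark's actual startContainer implementation.
def launchExecutorContainer(
    nmClient: NMClient,
    container: Container,
    driverUrl: String,
    executorId: String,
    cores: Int): Unit = {
  val ctx = Records.newRecord(classOf[ContainerLaunchContext])
  ctx.setCommands(Seq(
    "{{JAVA_HOME}}/bin/java",
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
    "--driver-url", driverUrl,
    "--executor-id", executorId,
    "--cores", cores.toString,
    "1>", "<LOG_DIR>/stdout",
    "2>", "<LOG_DIR>/stderr"
  ).asJava)
  // the NodeManager on the container's node starts this process
  nmClient.startContainer(container, ctx)
}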
Now we switch to the "org.apache.spark.executor.CoarseGrainedExecutorBackend" class and find its main function.
def main(args: Array[String]) {
  // ... argument parsing omitted: --driver-url, --executor-id, --hostname,
  // --cores, --app-id, --worker-url and --user-class-path are read from args here
  run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  System.exit(0)
}
private def run(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL]) {

  SparkHadoopUtil.get.runAsSparkUser { () =>
    // ... omitted: fetch the driver's Spark config over RPC (the cfg / driverConf used below)

    val env = SparkEnv.createExecutorEnv(
      driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

    env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
      env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
    workerUrl.foreach { url =>
      env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
    }
    env.rpcEnv.awaitTermination()
  }
}
In run, the key step is registering an RPC endpoint named "Executor", which is an instance of CoarseGrainedExecutorBackend.
CoarseGrainedExecutorBackend is an RPC endpoint, and its lifecycle is constructor -> onStart -> receive* -> onStop.
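Conceptually the endpoint contract looks like the sketch below; this is just an illustration, since Spark's real trait, org.apache.spark.rpc.RpcEndpoint, is private[spark]:
// Conceptual sketch of the endpoint lifecycle, not Spark's actual trait.
trait SimpleRpcEndpoint {
  def onStart(): Unit = {}                  // called once when the endpoint is registered
  def receive: PartialFunction[Any, Unit]   // called for each incoming message
  def onStop(): Unit = {}                   // called when the endpoint is unregistered
}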
So let's first look at the onStart method, which mainly registers the executor with the Driver:
override def onStart() {
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
Then the receive method.
Right after registration succeeds, the RegisteredExecutor reply comes back and the Executor is instantiated.
When the message matches LaunchTask(data), that executor starts running the task (see the sketch after the code below).
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    }
    // catch block omitted: a failure here calls exitExecutor(...)

  case LaunchTask(data) =>
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskDesc)
    }
}
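For completeness, executor.launchTask hands the task to the executor's internal thread pool. The class below is a rough conceptual sketch, not Spark's actual Executor (whose TaskRunner also handles deserialization, metrics, and status updates):
import java.util.concurrent.{ExecutorService, Executors}

// Conceptual sketch only: each launched task becomes a Runnable on a thread pool,
// which is why one executor can run several tasks concurrently (up to its core count).
class TinyExecutorSketch(cores: Int) {
  private val threadPool: ExecutorService = Executors.newFixedThreadPool(cores)

  def launchTask(taskId: Long, body: () => Unit): Unit = {
    threadPool.execute(new Runnable {
      override def run(): Unit = body()   // the real TaskRunner deserializes and runs the task here
    })
  }
}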
That, in outline, is the Spark job submission flow.
To wrap up, here is the yarn-cluster spark-submit flow once more:
1. The client submits the application to the ResourceManager (RM).
2. The RM starts the ApplicationMaster (AM).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of available resources.
5. The AM starts the containers through an NMClient and launches a CoarseGrainedExecutorBackend process in each of them.
6. Each Executor registers itself back with the Driver.
7. The Executors run tasks.