In production, Spark jobs are usually submitted to YARN. The overall flow (yarn-cluster mode) is as follows:
1. The client submits the application to the ResourceManager (RM).
2. The RM starts the ApplicationMaster (AM).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of available resources.
5. The AM starts the containers through an NMClient and launches a CoarseGrainedExecutorBackend process in each of them.
6. Each Executor registers itself back with the Driver.
7. The Executors run tasks.
spark-submit.sh ultimately just runs the class org.apache.spark.deploy.SparkSubmit (we won't go through the script here; open it with vim if you are curious).
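For reference, a typical yarn-cluster submission looks roughly like this; the class name, jar path, and resource sizes below are made-up examples:
bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MySparkApp \
  --driver-memory 2g \
  --executor-memory 4g \
  --num-executors 10 \
  /path/to/my-spark-app.jar arg1 arg2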
Open this class in IDEA and locate its main function; the relevant part looks like this:
override def main(args: Array[String]): Unit = {
  // Keep track of whether logging needs to be re-initialized by the application later
  val uninitLog = initializeLogIfNecessary(true, silent = true)
  val appArgs = new SparkSubmitArguments(args)
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    // KILL and REQUEST_STATUS cases omitted
  }
}
appArgs.action is assigned when the arguments are initialized:
// Action should be SUBMIT unless otherwise specified
action = Option(action).getOrElse(SUBMIT)
Clicking into submit(appArgs, uninitLog) takes us to the corresponding method:
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
  val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            // scalastyle:off println
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            // scalastyle:on println
            exitFn(1)
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
    }
  }

  // Let the main class re-initialize the logging system once it starts.
  if (uninitLog) {
    Logging.uninitialize()
  }

  // In standalone cluster mode, there are two submission gateways:
  //   (1) The traditional RPC gateway using o.a.s.deploy.Client as a wrapper
  //   (2) The new REST-based gateway introduced in Spark 1.3
  // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
  // to use the legacy gateway if the master endpoint turns out to be not a REST server.
  if (args.isStandaloneCluster && args.useRest) {
    try {
      // scalastyle:off println
      printStream.println("Running Spark using the REST application submission protocol.")
      // scalastyle:on println
      doRunMain()
    } catch {
      // Fail over to use the legacy submission gateway
      case e: SubmitRestConnectionException =>
        printWarning(s"Master endpoint ${args.master} was not a REST server. " +
          "Falling back to legacy submission gateway instead.")
        args.useRest = false
        submit(args, false)
    }
  // In all other modes, just run the main class as prepared
  } else {
    doRunMain()
  }
}
There are two things we mainly care about here.
First, val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args). Inside this method the deploy mode is pattern-matched; in yarn-cluster mode childMainClass is "org.apache.spark.deploy.yarn.YarnClusterApplication", and submitting the job essentially means launching a JVM process command based on this class (a simplified sketch of this selection appears a few lines below).
Second, runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose), which is where we click next. Inside runMain, the part to pay attention to is:
val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
  mainClass.newInstance().asInstanceOf[SparkApplication]
} else {
  // an ordinary main class is wrapped in a JavaMainApplication (branch abbreviated)
  new JavaMainApplication(mainClass)
}
The mainClass here is exactly the "org.apache.spark.deploy.yarn.YarnClusterApplication" mentioned above. An instance of it is obtained via reflection, and then we call
app.start(childArgs.toArray, sparkConf)
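To make the first point concrete, here is a simplified, hypothetical sketch of how the child main class depends on the master/deploy mode. This is not Spark's actual prepareSubmitEnvironment code, only an illustration of the idea:
// Illustrative sketch only -- not Spark's real prepareSubmitEnvironment logic.
object ChildMainClassSketch {
  def childMainClassFor(master: String, deployMode: String, userMainClass: String): String =
    (master, deployMode) match {
      case ("yarn", "cluster") =>
        // in yarn-cluster mode SparkSubmit launches the YARN client wrapper, not the user class
        "org.apache.spark.deploy.yarn.YarnClusterApplication"
      case _ =>
        // in client mode the user's --class runs directly inside the spark-submit JVM
        userMainClass
    }
}
In other words, in yarn-cluster mode your --class is not run by spark-submit itself; it is only started later on the cluster, inside the ApplicationMaster.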
Now let's step into the "org.apache.spark.deploy.yarn.YarnClusterApplication" class.
private[spark] class YarnClusterApplication extends SparkApplication {

  override def start(args: Array[String], conf: SparkConf): Unit = {
    // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
    // so remove them from sparkConf here for yarn mode.
    conf.remove("spark.jars")
    conf.remove("spark.files")

    new Client(new ClientArguments(args), conf).run()
  }

}
This class overrides start and simply calls new Client(new ClientArguments(args), conf).run().
Inside run there is one key line:
this.appId = submitApplication()
This method does the actual submission: it builds a yarnClient, assembles the launch context for the AM (essentially a java command), and it is this command that really gets submitted.
In cluster mode the class in that command is "org.apache.spark.deploy.yarn.ApplicationMaster", and the application is finally handed to YARN via yarnClient.submitApplication(appContext).
def submitApplication(): ApplicationId = {
  var appId: ApplicationId = null
  try {
    launcherBackend.connect()
    // Setup the credentials before doing anything else,
    // so we have don't have issues at any point.
    setupCredentials()
    yarnClient.init(hadoopConf)
    yarnClient.start()

    logInfo("Requesting a new application from cluster with %d NodeManagers"
      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

    // Get a new application from our RM
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    appId = newAppResponse.getApplicationId()

    new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
      Option(appId.toString)).setCurrentContext()

    // Verify whether the cluster has enough resources for our AM
    verifyClusterResources(newAppResponse)

    // Set up the appropriate contexts to launch our AM
    val containerContext = createContainerLaunchContext(newAppResponse)
    val appContext = createApplicationSubmissionContext(newApp, containerContext)

    // Finally, submit and monitor the application
    logInfo(s"Submitting application $appId to ResourceManager")
    yarnClient.submitApplication(appContext)
    launcherBackend.setAppId(appId.toString)
    reportLauncherState(SparkAppHandle.State.SUBMITTED)

    appId
  } catch {
    case e: Throwable =>
      if (appId != null) {
        cleanupStagingDir(appId)
      }
      throw e
  }
}
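For a feel of what createContainerLaunchContext produces, the AM launch command has roughly the shape sketched below. This is only an illustration: the user class and jar path are hypothetical, and the real code also adds the classpath, JVM options, memory settings, and the AM arguments.
// Rough, illustrative shape of the AM launch command (not the exact Spark output).
val amMainClass = "org.apache.spark.deploy.yarn.ApplicationMaster" // yarn-cluster mode
val amCommand = Seq(
  "{{JAVA_HOME}}/bin/java", "-server",
  amMainClass,
  "--class", "com.example.MySparkApp",      // hypothetical user --class
  "--jar", "hdfs:///apps/my-spark-app.jar", // hypothetical user jar
  "1>", "<LOG_DIR>/stdout",
  "2>", "<LOG_DIR>/stderr"
).mkString(" ")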
Next we step into the "org.apache.spark.deploy.yarn.ApplicationMaster" class and find its main function:
def main(args: Array[String]): Unit = {
  SignalUtils.registerLogger(log)
  val amArgs = new ApplicationMasterArguments(args)
  master = new ApplicationMaster(amArgs)
  System.exit(master.run())
}
Following master.run() and clicking through, we eventually reach runDriver(). There are a few key points here:
1. userClassThread = startUserApplication(): this starts our Driver thread, invoking the class passed with --class via reflection and naming the thread "Driver" (a minimal sketch follows after the runDriver code below).
2. registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl)): this registers the AM with the RM so that it can obtain resources to run the rest of the application.
3. allocator.allocateResources(): once resources are granted, this applies the locality policy (node-local, rack-local, and so on).
4. runAllocatedContainers(containersToUse): with usable containers in hand, this asks the NodeManagers to launch the corresponding containers.
private def runDriver(): Unit = {
  addAmIpFilter(None)
  userClassThread = startUserApplication()

  // This a bit hacky, but we need to wait until the spark.driver.port property has
  // been set by the Thread executing the user class.
  logInfo("Waiting for spark context initialization...")
  val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
  try {
    val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
      Duration(totalWaitTime, TimeUnit.MILLISECONDS))
    if (sc != null) {
      rpcEnv = sc.env.rpcEnv
      val driverRef = createSchedulerRef(
        sc.getConf.get("spark.driver.host"),
        sc.getConf.get("spark.driver.port"))
      registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl))
      registered = true
    } else {
      // Sanity check; should never happen in normal operation, since sc should only be null
      // if the user app did not create a SparkContext.
      throw new IllegalStateException("User did not initialize spark context!")
    }
    resumeDriver()
    userClassThread.join()
  } catch {
    case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
      logError(
        s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
          "Please check earlier log output for errors. Failing the application.")
      finish(FinalApplicationStatus.FAILED,
        ApplicationMaster.EXIT_SC_NOT_INITED,
        "Timed out waiting for SparkContext.")
  } finally {
    resumeDriver()
  }
}
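As promised above, here is a minimal sketch of what startUserApplication does, using only standard JDK reflection. The helper name is made up; the real method also sets up the class loader, passes the user arguments, and reports exceptions back to the AM:
import java.lang.reflect.Modifier

// Minimal sketch, not Spark's actual code: run the user's --class main() on a
// dedicated thread named "Driver". In yarn-cluster mode the Driver is exactly
// such a thread inside the ApplicationMaster JVM.
def startUserApplicationSketch(userClassName: String, userArgs: Array[String]): Thread = {
  val mainMethod = Class.forName(userClassName).getMethod("main", classOf[Array[String]])
  require(Modifier.isStatic(mainMethod.getModifiers), "main() must be static")
  val userThread = new Thread(new Runnable {
    override def run(): Unit = mainMethod.invoke(null, userArgs)
  })
  userThread.setName("Driver")
  userThread.start()
  userThread
}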
Now let's look at the core of runAllocatedContainers(containersToUse):
// excerpt from inside the loop over containersToUse
if (runningExecutors.size() < targetNumExecutors) {
  numExecutorsStarting.incrementAndGet()
  if (launchContainers) {
    launcherPool.execute(new Runnable {
      override def run(): Unit = {
        try {
          new ExecutorRunnable(
            Some(container),
            conf,
            sparkConf,
            driverUrl,
            executorId,
            executorHostname,
            executorMemory,
            executorCores,
            appAttemptId.getApplicationId.toString,
            securityMgr,
            localResources
          ).run()
          updateInternalState()
        } catch {
          case e: Throwable =>
            numExecutorsStarting.decrementAndGet()
            if (NonFatal(e)) {
              logError(s"Failed to launch executor $executorId on container $containerId", e)
              // Assigned container should be released immediately
              // to avoid unnecessary resource occupation.
              amClient.releaseAssignedContainer(containerId)
            } else {
              throw e
            }
        }
      }
    })
  } else {
    // For test only
    updateInternalState()
  }
}
Next, the implementation of ExecutorRunnable.run():
def run(): Unit = {
  logDebug("Starting Executor Container")
  nmClient = NMClient.createNMClient()
  nmClient.init(conf)
  nmClient.start()
  startContainer()
}
So once a container has been allocated, an NMClient is created and started first, and then the container itself is launched.
startContainer() is fairly long; at its core it wraps the launch of "org.apache.spark.executor.CoarseGrainedExecutorBackend" into a command and starts that process via nmClient.startContainer(container.get, ctx), where ctx is the ContainerLaunchContext holding that command. A simplified sketch follows.
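Here is a simplified sketch of that step, assuming only the standard Hadoop YARN client API. The helper launchExecutorContainer and its parameters are made up for illustration; the real startContainer also prepares the environment, local resources, and security tokens:
import org.apache.hadoop.yarn.api.records.{Container, ContainerLaunchContext}
import org.apache.hadoop.yarn.client.api.NMClient
import org.apache.hadoop.yarn.util.Records
import scala.collection.JavaConverters._

// Simplified sketch, not Spark's actual startContainer implementation.
def launchExecutorContainer(
    nmClient: NMClient,
    container: Container,
    driverUrl: String,
    executorId: String,
    cores: Int): Unit = {
  val ctx = Records.newRecord(classOf[ContainerLaunchContext])
  ctx.setCommands(Seq(
    "{{JAVA_HOME}}/bin/java",
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
    "--driver-url", driverUrl,
    "--executor-id", executorId,
    "--cores", cores.toString,
    "1>", "<LOG_DIR>/stdout",
    "2>", "<LOG_DIR>/stderr"
  ).asJava)
  // the NodeManager on the container's node starts this process
  nmClient.startContainer(container, ctx)
}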
Now we switch to the "org.apache.spark.executor.CoarseGrainedExecutorBackend" class and find its main function.
def main(args: Array[String]) {
  // ... argument parsing omitted: --driver-url, --executor-id, --hostname,
  // --cores, --app-id, --worker-url and --user-class-path are read from args here
  run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  System.exit(0)
}
private def run(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL]) {

  SparkHadoopUtil.get.runAsSparkUser { () =>
    // ... omitted: fetch the driver's Spark config over RPC (the cfg / driverConf used below)

    val env = SparkEnv.createExecutorEnv(
      driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

    env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
      env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
    workerUrl.foreach { url =>
      env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
    }
    env.rpcEnv.awaitTermination()
  }
}
In run, the key step is registering an RPC endpoint named "Executor", which is an instance of CoarseGrainedExecutorBackend.
CoarseGrainedExecutorBackend is an RPC endpoint, and its lifecycle is constructor -> onStart -> receive* -> onStop.
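Conceptually the endpoint contract looks like the sketch below; this is just an illustration, since Spark's real trait, org.apache.spark.rpc.RpcEndpoint, is private[spark]:
// Conceptual sketch of the endpoint lifecycle, not Spark's actual trait.
trait SimpleRpcEndpoint {
  def onStart(): Unit = {}                  // called once when the endpoint is registered
  def receive: PartialFunction[Any, Unit]   // called for each incoming message
  def onStop(): Unit = {}                   // called when the endpoint is unregistered
}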
So let's first look at the onStart method, which mainly registers the executor with the Driver:
override def onStart() {
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
Then the receive method.
Right after registration succeeds, the RegisteredExecutor reply comes back and the Executor is instantiated.
When the message matches LaunchTask(data), that executor starts running the task (see the sketch after the code below).
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    }
    // catch block omitted: a failure here calls exitExecutor(...)

  case LaunchTask(data) =>
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskDesc)
    }
}
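For completeness, executor.launchTask hands the task to the executor's internal thread pool. The class below is a rough conceptual sketch, not Spark's actual Executor (whose TaskRunner also handles deserialization, metrics, and status updates):
import java.util.concurrent.{ExecutorService, Executors}

// Conceptual sketch only: each launched task becomes a Runnable on a thread pool,
// which is why one executor can run several tasks concurrently (up to its core count).
class TinyExecutorSketch(cores: Int) {
  private val threadPool: ExecutorService = Executors.newFixedThreadPool(cores)

  def launchTask(taskId: Long, body: () => Unit): Unit = {
    threadPool.execute(new Runnable {
      override def run(): Unit = body()   // the real TaskRunner deserializes and runs the task here
    })
  }
}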
That, in outline, is the Spark job submission flow.
To wrap up, here is the yarn-cluster spark-submit flow once more:
1. The client submits the application to the ResourceManager (RM).
2. The RM starts the ApplicationMaster (AM).
3. The AM starts the Driver thread and requests resources from the RM.
4. The RM returns a list of available resources.
5. The AM starts the containers through an NMClient and launches a CoarseGrainedExecutorBackend process in each of them.
6. Each Executor registers itself back with the Driver.
7. The Executors run tasks.