Spark Source Code Analysis: CoarseGrainedExecutorBackend Execution Flow (Executor)

Following the previous post, Spark Source Code Analysis: Execution Flow on the AM Side (Driver), which covered the Driver's execution flow on the AM side, we saw at the end that the AM submits a request to YARN for Executor containers. The request context parameters are shown in the figure below:

YARN allocates and launches the Executor containers through the same flow it uses for the Driver container (see Spark Source Code Analysis: Task Submission Flow (Client) for that analysis), so we go straight to the launch_container.sh script that starts the Executor.

As the two figures above show, the entry class of the launched container is org.apache.spark.executor.CoarseGrainedExecutorBackend. From --driver-url spark://CoarseGrainedScheduler@node0:43195 we can tell that the endpoint registered on the Driver is named CoarseGrainedScheduler; the corresponding endpoint class is DriverEndpoint, an inner class of org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.
Let us start the analysis from the main function in the companion object of CoarseGrainedExecutorBackend (the source code is again based on Spark 2.4):

  def main(args: Array[String]) {
    var driverUrl: String = null
    var executorId: String = null
    var hostname: String = null
    var cores: Int = 0
    var appId: String = null
    var workerUrl: Option[String] = None
    val userClassPath = new mutable.ListBuffer[URL]()

    var argv = args.toList
    while (!argv.isEmpty) {
      argv match {
        case ("--driver-url") :: value :: tail =>
          driverUrl = value
          argv = tail
        case ("--executor-id") :: value :: tail =>
          executorId = value
          argv = tail
        case ("--hostname") :: value :: tail =>
          hostname = value
          argv = tail
        case ("--cores") :: value :: tail =>
          cores = value.toInt
          argv = tail
        case ("--app-id") :: value :: tail =>
          appId = value
          argv = tail
        case ("--worker-url") :: value :: tail =>
          // Worker url is used in spark standalone mode to enforce fate-sharing with worker
          workerUrl = Some(value)
          argv = tail
        case ("--user-class-path") :: value :: tail =>
          userClassPath += new URL(value)
          argv = tail
        case Nil =>
        case tail =>
          // scalastyle:off println
          System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
          // scalastyle:on println
          printUsageAndExit()
      }
    }

    if (hostname == null) {
      hostname = Utils.localHostName()
      log.info(s"Executor hostname is not provided, will use '$hostname' to advertise itself")
    }

    if (driverUrl == null || executorId == null || cores <= 0 || appId == null) {
      printUsageAndExit()
    }

    run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
    System.exit(0)
  }

The main function does three things: it takes the command-line arguments, parses and validates them, and finally calls the run method of the companion object (walked through in the code comments). A hypothetical example of the argument list is sketched below, followed by the run method itself:
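As a quick illustration, here is a hypothetical argument list of the kind launch_container.sh passes to main (the driver URL matches the one quoted above; all other values are illustrative only):

  // Hypothetical arguments (illustrative values only); main() walks this list
  // with the pattern match shown above and then calls run(...) with the results.
  val args = Array(
    "--driver-url", "spark://CoarseGrainedScheduler@node0:43195",
    "--executor-id", "1",
    "--hostname", "node1",
    "--cores", "2",
    "--app-id", "application_1550000000000_0001",
    "--user-class-path", "file:/path/to/__app__.jar")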


  private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {

    Utils.initDaemon(log)

    SparkHadoopUtil.get.runAsSparkUser { () =>
      // Debug code
      Utils.checkHost(hostname)

      // Bootstrap to fetch the driver's Spark properties.
      // First create an RpcEnv (the local val fetcher) used only to pull the Spark configuration from the driver; it is shut down right after use
      val executorConf = new SparkConf
      val fetcher = RpcEnv.create(
        "driverPropsFetcher",
        hostname,
        -1,
        executorConf,
        new SecurityManager(executorConf),
        clientMode = true)
      // Obtain a reference to the Driver endpoint via the --driver-url argument (e.g. spark://CoarseGrainedScheduler@node0:43195)
      val driver = fetcher.setupEndpointRefByURI(driverUrl)
      // Send a RetrieveSparkAppConfig message to the Driver to pull the Spark configuration
      val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)
      val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
      fetcher.shutdown()

      // Create SparkEnv using properties we fetched from the driver.
      // Build a SparkConf from the properties pulled from the driver
      val driverConf = new SparkConf()
      for ((key, value) <- props) {
        // this is required for SSL in standalone mode
        if (SparkConf.isExecutorStartupConf(key)) {
          driverConf.setIfMissing(key, value)
        } else {
          driverConf.set(key, value)
        }
      }

      cfg.hadoopDelegationCreds.foreach { tokens =>
        SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
      }

      // Create the Executor-side SparkEnv
      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

      // Create the CoarseGrainedExecutorBackend instance and register it as the "Executor" endpoint on this SparkEnv's rpcEnv
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
      // Create a WorkerWatcher that shuts down the CoarseGrainedExecutorBackend when the worker misbehaves (only used in standalone mode)
      workerUrl.foreach { url =>
        env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
      }
      // Block until the rpcEnv terminates
      env.rpcEnv.awaitTermination()
    }
  }

The key calls are val env = SparkEnv.createExecutorEnv(...) and env.rpcEnv.setupEndpoint(...), both of which touch Spark's RPC framework (analysed in detail in another post, Spark Source Code Analysis: The RPC Framework). The former, SparkEnv.createExecutorEnv(), initialises and registers several services, such as MapOutputTracker, BlockManagerMaster and OutputCommitCoordinator; the latter, env.rpcEnv.setupEndpoint(), separately registers the Executor endpoint (why the registrations are split like this I have not figured out yet, but it does not affect the analysis). Once registration is complete, the executor can exchange messages with the driver normally.
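As a side note on those registrations: SparkEnv.create in Spark 2.4 uses a small helper, condensed below (a paraphrase, not the verbatim source; isDriver, conf and rpcEnv are locals of SparkEnv.create), which shows that on the executor side names like MapOutputTracker mostly resolve to references to the driver's endpoints rather than to locally registered ones:

  // Condensed from SparkEnv.create (Spark 2.4): register the endpoint locally on
  // the driver, but on an executor only build a reference to the driver-side
  // endpoint of the same name.
  def registerOrLookupEndpoint(name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef = {
    if (isDriver) {
      logInfo("Registering " + name)
      rpcEnv.setupEndpoint(name, endpointCreator)
    } else {
      RpcUtils.makeDriverRef(name, conf, rpcEnv)
    }
  }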
So how does this message exchange begin?
From the analysis in Spark Source Code Analysis: The RPC Framework, we know that during registration setupEndpoint registers an EndpointData with the Dispatcher. When the EndpointData is instantiated it creates an Inbox to receive messages, and when the Inbox is instantiated it enqueues an OnStart message as its very first entry. When the message loop later drains the Inbox, that first message triggers a call to the endpoint's onStart() method; in other words, every endpoint has its own onStart invoked first as part of registration. Below we use the registration of the CoarseGrainedExecutorBackend endpoint and the resulting call to its onStart() to illustrate the process:
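For reference, the relevant lines of org.apache.spark.rpc.netty.Inbox in Spark 2.4, condensed here (not the verbatim source), show how OnStart is seeded when the endpoint is registered:

  // Condensed from org.apache.spark.rpc.netty.Inbox (Spark 2.4): every registered
  // endpoint gets an Inbox, and the Inbox enqueues OnStart at construction time,
  // so the dispatcher's first delivery to the endpoint invokes endpoint.onStart().
  protected val messages = new java.util.LinkedList[InboxMessage]()

  // OnStart should be the first message to process
  this.synchronized {
    messages.add(OnStart)
  }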


In its onStart() method, CoarseGrainedExecutorBackend sends a RegisterExecutor message to the Driver (i.e. DriverEndpoint, the inner class of CoarseGrainedSchedulerBackend mentioned above), as shown in the figure below and condensed in the following sketch:
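The body of that onStart in Spark 2.4 looks roughly like the following (lightly condensed, not verbatim): it asynchronously resolves the driver's endpoint reference from driverUrl and then asks the driver to register this executor.

  override def onStart() {
    logInfo("Connecting to driver: " + driverUrl)
    rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
      // Cache the driver reference, then ask the driver to register this executor
      driver = Some(ref)
      ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
    }(ThreadUtils.sameThread).onComplete {
      case Success(_) =>
        // The driver always answers true here; the real confirmation is the
        // RegisteredExecutor message handled in receive
      case Failure(e) =>
        exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
    }(ThreadUtils.sameThread)
  }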

When DriverEndpoint receives the RegisterExecutor message, it checks whether the executor is already registered; if it is a duplicate registration it simply replies and rejects it. Otherwise it builds an ExecutorData entry, stores it in memory, and sends a RegisteredExecutor message back to CoarseGrainedExecutorBackend. On receiving that message, CoarseGrainedExecutorBackend creates the Executor instance, which performs the Executor initialisation work such as starting the periodic heartbeat. From this analysis we can see that CoarseGrainedExecutorBackend is a JVM process that acts as the Executor's guardian process, responsible for creating and maintaining the Executor; CoarseGrainedExecutorBackend and Executor are in one-to-one correspondence, and one Worker can start multiple CoarseGrainedExecutorBackend processes.
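The executor-side half of that handshake, condensed from CoarseGrainedExecutorBackend.receive in Spark 2.4 (not verbatim), shows the Executor instance being created when RegisteredExecutor arrives:

  override def receive: PartialFunction[Any, Unit] = {
    case RegisteredExecutor =>
      logInfo("Successfully registered with driver")
      try {
        // The Executor constructor starts the heartbeat reporter, thread pools, etc.
        executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
      } catch {
        case NonFatal(e) =>
          exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
      }

    case RegisterExecutorFailed(message) =>
      exitExecutor(1, "Slave registration failed: " + message)

    // ... LaunchTask, KillTask, StopExecutor and Shutdown cases follow
  }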
This completes the registration of the Executor with the Driver; from this point on, the Executor can receive the various messages the Driver sends down, one of which is sketched below.
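As one example of those messages, the LaunchTask case of the same receive (again condensed from Spark 2.4) decodes the serialized TaskDescription and hands it to the Executor created above:

    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
        // Decode the TaskDescription and submit it to the Executor's thread pool
        val taskDesc = TaskDescription.decode(data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskDesc)
      }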
