Spark 1.3.1 Source Code (1): Spark Submit Job Submission and Executor Registration with the Driver

When we want to run a Spark program we have written on a cluster, we usually submit it with the spark-submit script. So how does invoking spark-submit get our program running on the cluster? How does the Master schedule resources? How does an Executor register with the Driver? The process is complex, and this article walks through it step by step.

1. Job Submission and the Executor Registration Flow

Running bin/spark-submit internally uses exec to run ${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.SparkSubmit.
Via the spark-class script, the command that is finally executed has org.apache.spark.deploy.SparkSubmit as its entry point; inside that class, doRunMain invokes our application's own main method via reflection.
Our main method constructs a SparkContext, which runs its primary constructor; a minimal sketch of such an application follows.
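
For orientation, here is a minimal sketch of such a user application; the object name and job logic are made up, and only the SparkContext construction matters here:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical user application: spark-submit invokes this main method via
// reflection, and constructing the SparkContext triggers everything below
// (SparkEnv, TaskScheduler, DAGScheduler, ...)
object MyApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyApp") // the master URL comes from spark-submit
    val sc = new SparkContext(conf)                // the primary constructor runs here
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}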

1.1 Call createSparkEnv to create an ActorSystem, the class Akka uses for communication

//TODO This method creates an ActorSystem
private[spark] def createSparkEnv(
    conf: SparkConf,
    isLocal: Boolean,
    listenerBus: LiveListenerBus): SparkEnv = {
  SparkEnv.createDriverEnv(conf, isLocal, listenerBus)
}
 
private[spark] val env = createSparkEnv(conf, isLocal, listenerBus)
SparkEnv.set(env)

------------------------

// Create the ActorSystem for Akka and get the port it binds to.
val (actorSystem, boundPort) = {
  val actorSystemName = if (isDriver) driverActorSystemName else executorActorSystemName
  //TODO Use the AkkaUtils helper class to create the ActorSystem
  AkkaUtils.createActorSystem(actorSystemName, hostname, port, conf, securityManager)
}

1.2 Create the TaskScheduler: match on the submitted master URL (here, Standalone) -> create a TaskSchedulerImpl -> create a SparkDeploySchedulerBackend (which involves two actors: the ClientActor and the DriverActor)

val SPARK_REGEX = """spark://(.*)""".r
------------
//TODO Create a TaskScheduler
private[spark] var (schedulerBackend, taskScheduler) =
  SparkContext.createTaskScheduler(this, master)
  
------------
 
//TODO Spark standalone mode: the master URL matched spark://
case SPARK_REGEX(sparkUrl) =>
  //TODO Create a TaskSchedulerImpl
  val scheduler = new TaskSchedulerImpl(sc)
  val masterUrls = sparkUrl.split(",").map("spark://" + _)
  //TODO Create a SparkDeploySchedulerBackend
  val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
  //TODO Call initialize to wire the backend into the scheduler
  scheduler.initialize(backend)
  (backend, scheduler)

1.3 Create an actor via the ActorSystem: the HeartbeatReceiver, which receives the heartbeats the Executors send to the Driver

//TODO Create an actor via the ActorSystem; it receives the heartbeats the Executors send to the DriverActor
private val heartbeatReceiver = env.actorSystem.actorOf(
  Props(new HeartbeatReceiver(taskScheduler)), "HeartbeatReceiver")

1.4 Create the DAGScheduler, which will later split the DAG into Stages

Here this refers to the SparkContext:

//TODO Create a DAGScheduler, which will later split the DAG into Stages
dagScheduler = new DAGScheduler(this)

When the DAGScheduler is initialized, it starts its eventProcessLoop (FIFO). DAGSchedulerEventProcessLoop extends EventLoop; once started, a background thread keeps taking events off a BlockingQueue and dispatches each event to the matching handler. A simplified sketch of this pattern follows.
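
The sketch below is a stripped-down illustration of this pattern, not the actual Spark class; the names SimpleEventLoop and post are simplifications:

import java.util.concurrent.LinkedBlockingDeque

// Simplified illustration of the EventLoop pattern behind
// DAGSchedulerEventProcessLoop (not the real Spark class)
abstract class SimpleEventLoop[E](name: String) {
  private val eventQueue = new LinkedBlockingDeque[E]() // FIFO via put/take
  @volatile private var stopped = false

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      while (!stopped) {
        try {
          val event = eventQueue.take() // blocks until an event arrives
          onReceive(event)              // dispatch to the concrete handler
        } catch {
          case _: InterruptedException => // stop() interrupts the blocking take
        }
      }
    }
  }

  def start(): Unit = eventThread.start()
  def stop(): Unit = { stopped = true; eventThread.interrupt() }
  def post(event: E): Unit = eventQueue.put(event)

  // The real loop pattern-matches on DAGSchedulerEvent subtypes
  // (JobSubmitted, CompletionEvent, ...) inside its handler
  protected def onReceive(event: E): Unit
}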

1.5 Start the taskScheduler

SparkContext

//TODO Start the taskScheduler
taskScheduler.start()

TaskSchedulerImpl

//TODO First call SparkDeploySchedulerBackend's start method
backend.start()

SparkDeploySchedulerBackend

//TODO First call the parent class's start method, which creates the DriverActor
super.start()

CoarseGrainedSchedulerBackend

Create the DriverActor, used to communicate with the Executors:

// TODO Create the DriverActor via the ActorSystem
driverActor = actorSystem.actorOf(
  Props(new DriverActor(properties)), name = CoarseGrainedSchedulerBackend.ACTOR_NAME)

SparkDeploySchedulerBackend

Back in SparkDeploySchedulerBackend, after the parent's start() method returns:

It creates an AppClient and calls its start method, which creates a ClientActor used to communicate with the Master. The application's details are wrapped in an ApplicationDescription and sent to the Master.

//TODO Prepare some parameters; they will be packed into an object and sent to the Master
val driverUrl = AkkaUtils.address(
  AkkaUtils.protocol(actorSystem),
  SparkEnv.driverActorSystemName,
  conf.get("spark.driver.host"),
  conf.get("spark.driver.port"),
  CoarseGrainedSchedulerBackend.ACTOR_NAME)
  
//TODO This parameter names the class that will implement the Executor
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
//TODO Wrap the parameters in an ApplicationDescription
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
  appUIAddress, sc.eventLogDir, sc.eventLogCodec)
//TODO Create an AppClient, passing the ApplicationDescription in through its primary constructor
client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
//TODO Then call AppClient's start method, which creates a ClientActor for talking to the Master
client.start()
 
waitForRegistration()

AppClient

Create the ClientActor, used to communicate with the Master:

def start() {
  // Just launch an actor; it will call back into the listener.
  //TODO Create the ClientActor: primary constructor -> preStart -> receive
  actor = actorSystem.actorOf(Props(new ClientActor))
}

In the lifecycle method preStart(), the ClientActor registers with the Master:

//TODO ClientActor's lifecycle method
override def preStart() {
  context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
  try {
    //TODO The ClientActor registers with the Master
    registerWithMaster()
  } catch {
    case e: Exception =>
      logWarning("Failed to connect to master", e)
      markDisconnected()
      context.stop(self)
  }
}

After successful registration, the Master replies with the case class RegisteredApplication:

//TODO The Master's message to the ClientActor confirming registration
case RegisteredApplication(appId_, masterUrl) =>
  appId = appId_
  registered = true
  changeMaster(masterUrl)
  listener.connected(appId)

Master

On receiving the registration message from the ClientActor, the Master stores the application's details, replies that registration succeeded, and calls schedule() to start scheduling resources, i.e. deciding which Workers will launch Executors for this application.

//TODO The application-registration message sent by the ClientActor
case RegisterApplication(description) => {
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    //TODO First store the application's details in memory
    val app = createApplication(description, sender)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    //TODO Save it with the persistence engine
    persistenceEngine.addApplication(app)
    //TODO The Master sends the ClientActor a registration-success message
    sender ! RegisteredApplication(app.id, masterUrl)
    //TODO Important: the Master starts scheduling resources, i.e. deciding which Workers to launch Executors on
    schedule()
  }
}

schedule()

The Master calls this method to start scheduling. In 1.3, a Worker can host at most one Executor per application; 1.6 removed this restriction.

There are two strategies: spread out as much as possible, or consolidate as much as possible.

Spread out
  1. Filter out all Workers with enough memory and sort them by free cores in descending order
  2. Iterate over the Workers, assigning one core at a time (round-robin) until all requested cores have been handed out; the per-Worker counts are recorded in an assigned array (see the sketch after this list)
  3. Launch the Executors one by one
Consolidate
  1. Filter out all Workers with enough memory
  2. Iterate over the Workers; on each one, take the minimum of the Worker's free cores and the application's remaining cores (squeezing the Worker dry) and launch an Executor there. If the application still needs cores, continue with the next Worker.
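
The following is a lightly adapted excerpt of the spread-out assignment from Master.schedule() in 1.3 (the ALIVE and memory checks are condensed into a canUse predicate here):

// usableWorkers: alive Workers that can still host this app, most free cores first
val usableWorkers = workers.toArray.filter(canUse(app, _)).sortBy(_.coresFree).reverse
val numUsable = usableWorkers.length
val assigned = new Array[Int](numUsable) // cores assigned to each usable Worker
var toAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
var pos = 0
while (toAssign > 0) {
  // hand out one core at a time, cycling through the Workers round-robin
  if (usableWorkers(pos).coresFree - assigned(pos) > 0) {
    toAssign -= 1
    assigned(pos) += 1
  }
  pos = (pos + 1) % numUsable
}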

Master

The Master sends a message to launch the Executor and tells the ClientActor that the Executor has been launched:

//TODO The Master sends a message telling the Worker to launch an Executor
launchExecutor(usableWorkers(pos), exec)
def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc) {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  //TODO Record the resources this Worker is now using
  worker.addExecutor(exec)
  //TODO The Master sends the Worker a message, passing the parameters in a case class, telling it to launch the Executor
  worker.actor ! LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)
  //TODO The Master sends the ClientActor a message saying the Executor has been launched
  exec.application.driver ! ExecutorAdded(
    exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)
}

Worker

The Worker receives the message from the Master, which carries the details of the Executor it should launch:

//TODO The message the Master sends the Worker, telling it to launch an Executor.
//TODO LaunchExecutor is a case class carrying the details of the Executor to launch
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) => ...

It creates an ExecutorRunner, puts all the parameters into it, and then uses it to launch the Executor:

//TODO Create an ExecutorRunner, put all the parameters into it, then use it to launch the Executor
  val manager = new ExecutorRunner(
    appId,
    execId,
    appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
    cores_,
    memory_,
    self,
    workerId,
    host,
    webUi.boundPort,
    publicAddress,
    sparkHome,
    executorDir,
    akkaUrl,
    conf,
    appLocalDirs, ExecutorState.LOADING)

It records the Executor and calls ExecutorRunner's start method to launch the Executor as a Java child process:

  //TODO Put the ExecutorID -> ExecutorRunner mapping into a Map
  executors(appId + "/" + execId) = manager
  //TODO Call ExecutorRunner's start method to launch the Executor as a Java child process
  manager.start()
  coresUsed += cores_
  memoryUsed += memory_
  master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
}

ExecutorRunner

It creates a thread and uses the thread to launch a Java child process:

def start() {
  //TODO First create a thread object, then use the thread to launch a Java child process
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  //TODO Calling the thread's start method invokes its run method
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = new Thread() {
    override def run() {
      killProcess(Some("Worker shutting down"))
    }
  }
  Runtime.getRuntime.addShutdownHook(shutdownHook)
}

The thread calls fetchAndRunExecutor() to launch the child process:

//TODO The thread calls this method to launch the Java child process
  def fetchAndRunExecutor() {
    try {
      // Launch the process
      //TODO Build the command to launch the child process
      val builder = CommandUtils.buildProcessBuilder(appDesc.command, memory,
        sparkHome.getAbsolutePath, substituteVariables)
      val command = builder.command()
      logInfo("Launch command: " + command.mkString("\"", "\" \"", "\""))

      builder.directory(executorDir)
      builder.environment.put("SPARK_LOCAL_DIRS", appLocalDirs.mkString(","))
      // In case we are running this from within the Spark Shell, avoid creating a "scala"
      // parent process for the executor command
      builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

      // Add webUI log urls
      val baseUrl =
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
      builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

      //TODO Actually start the Java child process -> CoarseGrainedExecutorBackend's main method
      process = builder.start()
      val header = "Spark Executor Command: %s\n%s\n\n".format(
        command.mkString("\"", "\" \"", "\""), "=" * 40)

CoarseGrainedExecutorBackend

Once an Executor process has started, its lifecycle method runs, and in it the Executor registers with the Driver:

//TODO CoarseGrainedExecutorBackend's lifecycle method
override def preStart() {
  logInfo("Connecting to driver: " + driverUrl)
  //TODO Establish a connection to the Driver
  driver = context.actorSelection(driverUrl)
  //TODO The Executor sends the DriverActor a message to register itself
  driver ! RegisterExecutor(executorId, hostPort, cores, extractLogUrls)
  context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
}

DriverActor

The DriverActor receives the registration message (RegisterExecutor) sent by the Executor and replies with a registration-success message (RegisteredExecutor):

def receiveWithLogging = {
  //TODO The message the Executor sends to the DriverActor
  case RegisterExecutor(executorId, hostPort, cores, logUrls) =>
    Utils.checkHostPort(hostPort, "Host port expected " + hostPort)
    if (executorDataMap.contains(executorId)) {
      sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId)
    } else {
      logInfo("Registered executor: " + sender + " with ID " + executorId)
      //TODO The DriverActor replies to the Executor that registration succeeded
      sender ! RegisteredExecutor    

CoarseGrainedExecutorBackend

It receives the registration-success message from the Driver and creates an Executor instance:

//TODO The DriverActor's message telling the Executor it registered successfully
case RegisteredExecutor =>
  logInfo("Successfully registered with driver")
  val (hostname, _) = Utils.parseHostPort(hostPort)
  //TODO Create an Executor instance to run the application's logic
  executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

It then checks whether any tasks are waiting to be launched:

//TODO Check whether any tasks are waiting to be launched (DriverActor -> Executor)
makeOffers()

Executor

After the registration-success message from the DriverActor arrives, the Executor instance creates a thread pool to run tasks:

//TODO Initialize the thread pool
val threadPool = Utils.newDaemonCachedThreadPool("Executor task launch worker")

It also sends heartbeats to the DriverActor:

//TODO The Executor sends heartbeats to the DriverActor
startDriverHeartbeater()
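
As a rough illustration only: a heartbeat loop of this kind boils down to a daemon thread that periodically sends a message to the driver-side actor. In the sketch below, the Heartbeat payload and the 10-second interval are placeholder assumptions, not the actual 1.3 internals (the real startDriverHeartbeater also attaches task metrics to each beat):

import java.util.concurrent.{Executors, TimeUnit}
import akka.actor.ActorSelection

// Placeholder message; the real Heartbeat also carries task metrics
case class Heartbeat(executorId: String)

def startDriverHeartbeater(driver: ActorSelection, executorId: String): Unit = {
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  val beat = new Runnable {
    // fire-and-forget: tell the driver this Executor is still alive
    override def run(): Unit = driver ! Heartbeat(executorId)
  }
  // send a heartbeat every 10 seconds (interval is an assumption)
  scheduler.scheduleAtFixedRate(beat, 10, 10, TimeUnit.SECONDS)
}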

That completes Executor registration. Next comes running the jobs the Driver submits on these Executors.

That's it for this post. If you liked it, remember to follow; the WeChat public account has the same name.
