Mastering Spark 2.x: Source Code Analysis of Task Submission in the TaskScheduler

Continuing from the previous article, Mastering Spark 2.x: In-Depth Analysis of the Job Trigger Flow (Part 2), this article looks at how the TaskScheduler submits tasks.
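
For context, every RDD action ends up driving this path: the DAGScheduler splits the job into stages, wraps each stage's tasks into a TaskSet, and hands it to taskScheduler.submitTasks(). A minimal job that exercises the whole flow analyzed below (local mode, arbitrary app name):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("submit-tasks-demo"))

// count() is an action: it triggers a job, the DAGScheduler builds a TaskSet for each stage,
// and TaskSchedulerImpl.submitTasks() (analyzed below) is called with that TaskSet.
sc.parallelize(1 to 100, numSlices = 4).count()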

1. The previous article ended with a call to taskScheduler.submitTasks(), which hands the TaskSet to the TaskScheduler for scheduling. From the SparkContext initialization we know that submitTasks() is implemented in TaskSchedulerImpl, the implementation class of TaskScheduler. The code is as follows:

override def submitTasks(taskSet: TaskSet) {
    val tasks = taskSet.tasks
    logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
    this.synchronized {
      // First create a TaskSetManager, which is essentially a wrapper around the TaskSet.
      // The TaskSetManager manages a single TaskSet within TaskSchedulerImpl: it tracks each task,
      // retries failed tasks up to the maximum number of allowed failures, and handles
      // locality-aware placement of tasks via delay scheduling.
      val manager = createTaskSetManager(taskSet, maxTaskFailures)
      val stage = taskSet.stageId
      val stageTaskSets =
        taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
      stageTaskSets(taskSet.stageAttemptId) = manager
      // isZombie marks a TaskSetManager whose tasks no longer need to run
      // (they all finished successfully, or the stage was removed).
      // If another TaskSet for the same stage already exists and has not fully finished,
      // throw an exception so that a stage never has two TaskSets running at the same time.
      val conflictingTaskSet = stageTaskSets.exists { case (_, ts) =>
        ts.taskSet != taskSet && !ts.isZombie
      }
      if (conflictingTaskSet) {
        throw new IllegalStateException(s"more than one active taskSet for stage $stage:" +
          s" ${stageTaskSets.toSeq.map{_._2.taskSet.id}.mkString(",")}")
      }
      // Add the TaskSetManager to the scheduling pool. The pool is built during SparkContext
      // initialization according to the configured scheduling policy, of which there are two:
      // FIFOSchedulableBuilder and FairSchedulableBuilder.
      schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
      if (!isLocal && !hasReceivedTask) {
        starvationTimer.scheduleAtFixedRate(new TimerTask() {
          override def run() {
            if (!hasLaunchedTask) {
              logWarning("Initial job has not accepted any resources; " +
                "check your cluster UI to ensure that workers are registered " +
                "and have sufficient resources")
            } else {
              this.cancel()
            }
          }
        }, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
      }
      hasReceivedTask = true
    }
    // Finally, reviveOffers() sends a ReviveOffers message to the backend's driver endpoint.
    // On receiving ReviveOffers, the driver endpoint calls the makeOffers() function
    // in CoarseGrainedSchedulerBackend.
    backend.reviveOffers()
  }
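
As a side note on the two scheduling policies mentioned in the comments above: the pool that schedulableBuilder adds the TaskSetManager to is controlled by spark.scheduler.mode. A minimal sketch of switching from the default FIFO policy to FAIR (the pool name "myPool" is just an illustrative value):

import org.apache.spark.{SparkConf, SparkContext}

// Use the FAIR scheduler instead of the default FIFO policy; pools can additionally be
// described in an XML file referenced by spark.scheduler.allocation.file.
val conf = new SparkConf()
  .setAppName("fair-scheduling-sketch")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// Jobs submitted from this thread go into the pool named "myPool" (illustrative name);
// the TaskSetManagers created in submitTasks() then land in that pool.
sc.setLocalProperty("spark.scheduler.pool", "myPool")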

2. Now look at the makeOffers() function in CoarseGrainedSchedulerBackend.scala:

// Make fake resource offers on all executors
    private def makeOffers() {
      // Make sure no executor is killed while some task is launching on it
      val taskDescs = CoarseGrainedSchedulerBackend.this.synchronized {
        // Filter out executors under killing; keep only the ones that are still alive
        val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
        // Wrap each live executor into a WorkerOffer carrying its executorId, host address,
        // and number of free CPU cores
        val workOffers = activeExecutors.map { case (id, executorData) =>
          new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
        }.toIndexedSeq
        // TaskSchedulerImpl.resourceOffers() turns the prepared offers into the set of
        // TaskDescriptions to execute
        scheduler.resourceOffers(workOffers)
      }
      if (!taskDescs.isEmpty) {
        // Launch the assigned tasks
        launchTasks(taskDescs)
      }
    }
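
For reference, the WorkerOffer built above is only a small value class. In Spark 2.x it looks roughly like the following simplified sketch:

// Simplified sketch of the WorkerOffer wrapper created in makeOffers():
// one offer per live executor, describing the resources it can currently provide.
case class WorkerOffer(
    executorId: String, // which executor the free resources belong to
    host: String,       // host the executor runs on, used for locality decisions
    cores: Int)         // number of free CPU cores on that executor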

3. Let's first look at the resourceOffers() function:

// Given the prepared resource offers, return the set of TaskDescriptions to execute.
// In plain terms: this is where tasks are assigned to executors.
def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
    // Mark each slave as alive and remember its hostname
    // Also track if new executor is added
    var newExecAvail = false
    // Loop over the WorkerOffers and update executor, host and rack information
    for (o <- offers) {
      if (!hostToExecutors.contains(o.host)) {
        hostToExecutors(o.host) = new HashSet[String]()
      }
      if (!executorIdToRunningTaskIds.contains(o.executorId)) {
        hostToExecutors(o.host) += o.executorId
        executorAdded(o.executorId, o.host)
        executorIdToHost(o.executorId) = o.host
        executorIdToRunningTaskIds(o.executorId) = HashSet[Long]()
        newExecAvail = true
      }
      for (rack <- getRackForHost(o.host)) {
        hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += o.host
      }
    }

    // Before making any offers, remove any nodes from the blacklist whose blacklist has expired. Do
    // this here to avoid a separate thread and added synchronization overhead, and also because
    // updating the blacklist is only relevant when task offers are being made.
    blacklistTrackerOpt.foreach(_.applyBlacklistTimeout())
    val filteredOffers = blacklistTrackerOpt.map { blacklistTracker =>
      offers.filter { offer =>
        !blacklistTracker.isNodeBlacklisted(offer.host) &&
          !blacklistTracker.isExecutorBlacklisted(offer.executorId)
      }
    }.getOrElse(offers)

    // Randomly shuffle the incoming WorkerOffers
    val shuffledOffers = shuffleOffers(filteredOffers)
    // tasks is a two-dimensional structure: one ArrayBuffer of TaskDescriptions per executor,
    // holding the tasks assigned to that executor
    val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores))
    // Available CPU cores on each executor
    val availableCpus = shuffledOffers.map(o => o.cores).toArray
    // Return the TaskSets sorted by the scheduling policy chosen earlier:
    // FIFO or FAIR, with FIFO (first in, first out) being the default
    val sortedTaskSets = rootPool.getSortedTaskSetQueue

    for (taskSet <- sortedTaskSets) {
      logDebug("parentName: %s, name: %s, runningTasks: %s".format(
        taskSet.parent.name, taskSet.name, taskSet.runningTasks))
      // If a new executor became available, recompute the TaskSet's locality levels
      if (newExecAvail) {
        taskSet.executorAdded()
      }
    }

    // This is a nested loop: iterate over the sorted TaskSets, and for each TaskSet
    // try to launch tasks level by level in order of data locality.
    // Spark has five data locality levels; performance degrades from left to right:
    // PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
    for (taskSet <- sortedTaskSets) {
      var launchedAnyTask = false
      var launchedTaskAtCurrentMaxLocality = false
      for (currentMaxLocality <- taskSet.myLocalityLevels) {
        do {
          // resourceOfferSingleTaskSet launches tasks, starting at the best locality level.
          // The do-while keeps offering at the current level as long as launches succeed;
          // once nothing more can be launched, the outer loop moves on to the next level.
          // Note: every task that gets assigned here is appended to the tasks variable.
          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
        } while (launchedTaskAtCurrentMaxLocality)
      }
      if (!launchedAnyTask) {
        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
      }
    }

    // If any task was assigned successfully, set hasLaunchedTask to true
    if (tasks.size > 0) {
      hasLaunchedTask = true
    }
    return tasks
  }
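
How long the scheduler keeps offering at a given locality level before the TaskSetManager falls back to a worse one is governed by Spark's delay-scheduling settings, the spark.locality.wait family. A minimal sketch (3s is the documented default for spark.locality.wait; the per-level overrides below are illustrative):

import org.apache.spark.SparkConf

// Delay scheduling: how long to wait for a better locality level before falling back.
// spark.locality.wait is the global default; the per-level keys override it.
val conf = new SparkConf()
  .set("spark.locality.wait", "3s")          // global default wait per locality level
  .set("spark.locality.wait.process", "3s")  // wait at PROCESS_LOCAL before trying NODE_LOCAL
  .set("spark.locality.wait.node", "3s")     // wait at NODE_LOCAL before trying RACK_LOCAL
  .set("spark.locality.wait.rack", "3s")     // wait at RACK_LOCAL before falling back to ANY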

4. Next, resourceOfferSingleTaskSet() shows how tasks are assigned to executors at the given data locality level:

private def resourceOfferSingleTaskSet(
      taskSet: TaskSetManager,
      maxLocality: TaskLocality,
      shuffledOffers: Seq[WorkerOffer],
      availableCpus: Array[Int],
      tasks: IndexedSeq[ArrayBuffer[TaskDescription]]) : Boolean = {
    var launchedTask = false
    // nodes and executors that are blacklisted for the entire application have already been
    // filtered out by this point
    for (i <- 0 until shuffledOffers.size) {
      // The executorId and host address wrapped in the WorkerOffer
      val execId = shuffledOffers(i).executorId
      val host = shuffledOffers(i).host
      // Only offer this executor if its number of available CPUs is at least the number of
      // CPUs required per task (1 by default, controlled by spark.task.cpus)
      if (availableCpus(i) >= CPUS_PER_TASK) {
        try {
          // Ask the TaskSetManager which tasks can be launched on this executor.
          // resourceOffer() returns a TaskDescription assigned to the given execId.
          // There is a lot going on inside it; a separate article will cover the details.
          for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
            tasks(i) += task
            val tid = task.taskId
            taskIdToTaskSetManager(tid) = taskSet
            taskIdToExecutorId(tid) = execId
            executorIdToRunningTaskIds(execId).add(tid)
            availableCpus(i) -= CPUS_PER_TASK
            assert(availableCpus(i) >= 0)
            // As soon as any task is launched successfully, this flag is set to true
            launchedTask = true
          }
        } catch {
          case e: TaskNotSerializableException =>
            logError(s"Resource offer failed, task set ${taskSet.name} was not serializable")
            // Do not offer resources for this task, but don't throw an error to allow other
            // task sets to be submitted.
            return launchedTask
        }
      }
    }
    return launchedTask
  }
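
CPUS_PER_TASK in the check above comes from spark.task.cpus, which defaults to 1. A quick sketch of the bookkeeping with illustrative values: an executor offering 5 free cores can accept at most two tasks in a single round of offers, because each assignment subtracts 2 from availableCpus(i):

import org.apache.spark.SparkConf

// Each task claims 2 CPU cores, so an executor advertising 5 cores can host at most
// floor(5 / 2) = 2 tasks from one round of resourceOfferSingleTaskSet().
val conf = new SparkConf()
  .set("spark.executor.cores", "5") // cores advertised per executor
  .set("spark.task.cpus", "2")      // becomes CPUS_PER_TASK in TaskSchedulerImpl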

5. Back to the makeOffers() function from step 2: if any tasks were assigned successfully, resourceOffers() returns the set of TaskDescriptions. Each TaskDescription carries two key pieces of information, the taskId and the executorId it was assigned to, which is how launchTasks() below knows which task should run on which executor. The launchTasks() code is as follows:

// Launch tasks returned by a set of resource offers
    private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
      // Iterate over the flattened sequence of TaskDescriptions
      for (task <- tasks.flatten) {
        val serializedTask = TaskDescription.encode(task)
        if (serializedTask.limit >= maxRpcMessageSize) {
          scheduler.taskIdToTaskSetManager.get(task.taskId).foreach { taskSetMgr =>
            try {
              var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
                "spark.rpc.message.maxSize (%d bytes). Consider increasing " +
                "spark.rpc.message.maxSize or using broadcast variables for large values."
              msg = msg.format(task.taskId, task.index, serializedTask.limit, maxRpcMessageSize)
              taskSetMgr.abort(msg)
            } catch {
              case e: Exception => logError("Exception in error callback", e)
            }
          }
        }
        else {
          val executorData = executorDataMap(task.executorId)
          executorData.freeCores -= scheduler.CPUS_PER_TASK
          logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
            s"${executorData.executorHost}.")
          // Send the LaunchTask RPC message to the executor.
          // The message is handled by the receive function in CoarseGrainedExecutorBackend.scala.
          executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
        }
      }
    }
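
The size check at the top of the loop is why jobs with very large closures abort with a "Serialized task ... exceeds max allowed" error. As the message itself suggests, the two usual remedies are raising spark.rpc.message.maxSize or moving the large data into a broadcast variable so it is no longer captured in the task closure. A minimal sketch of the broadcast approach (lookupTable is just an illustrative name):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("broadcast-sketch")
  .set("spark.rpc.message.maxSize", "256") // in MiB; the default is 128

val sc = new SparkContext(conf)

// Broadcasting the map once keeps it out of every task closure, so the serialized
// TaskDescription stays far below maxRpcMessageSize.
val lookupTable = sc.broadcast(Map("a" -> 1, "b" -> 2))
val result = sc.parallelize(Seq("a", "b", "c"))
  .map(k => lookupTable.value.getOrElse(k, 0))
  .collect()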

6. When the receive handler of CoarseGrainedExecutorBackend gets the LaunchTask message, it calls executor.launchTask() to handle it. The code is as follows:

def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
    // Create a TaskRunner instance; TaskRunner implements Java's Runnable interface
    val tr = new TaskRunner(context, taskDescription)
    // Record the task in the executor's map of currently running tasks
    runningTasks.put(taskDescription.taskId, tr)
    // Hand it to the thread pool, which will invoke its run() method
    threadPool.execute(tr)
  }
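
To make the execution model concrete, here is a tiny standalone sketch (not Spark code) of the same pattern the Executor uses: a Runnable wrapping one unit of work, registered in a map of running tasks, then handed to a cached thread pool that eventually calls its run() method. DemoTaskRunner and the map are purely illustrative:

import java.util.concurrent.{ConcurrentHashMap, Executors}

object ThreadPoolSketch {
  // Stand-in for Spark's TaskRunner: a Runnable wrapping one unit of work.
  class DemoTaskRunner(val taskId: Long) extends Runnable {
    override def run(): Unit =
      println(s"running task $taskId on ${Thread.currentThread().getName}")
  }

  def main(args: Array[String]): Unit = {
    val runningTasks = new ConcurrentHashMap[Long, DemoTaskRunner]()
    val threadPool = Executors.newCachedThreadPool()

    for (taskId <- 1L to 3L) {
      val tr = new DemoTaskRunner(taskId)
      runningTasks.put(taskId, tr) // track it, much like Executor.runningTasks
      threadPool.execute(tr)       // the pool eventually invokes tr.run()
    }
    threadPool.shutdown()
  }
}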

This concludes the source code analysis of task submission. A follow-up article will continue from here with an analysis of the Executor. Thanks for reading!

Reading suggestion: the Spark source code analysis articles are best read in the following order; read the article first, then the source code, then come back to the article:

1. Mastering Spark 2.x: Standalone Mode Master Startup Source Code Analysis
2. Mastering Spark 2.x: Master Message-Loop Handling Source Code Analysis (Part 1)
3. Mastering Spark 2.x: Master Message-Loop Handling Source Code Analysis (Part 2)
4. Mastering Spark 2.x: Worker Startup Source Code Analysis
5. Mastering Spark 2.x: Worker Message-Loop Handling Source Code Analysis (Part 1)
6. Mastering Spark 2.2.0: The Master's schedule() Task Scheduling Function Explained
7. Mastering Spark 2.x: From spark-submit to Driver Startup
8. Mastering Spark 2.x: In-Depth Analysis of the Job Trigger Flow (Part 1)
9. Mastering Spark 2.x: In-Depth Analysis of the Job Trigger Flow (Part 2)

