姜上清风

Spark: Source Code Analysis of TaskSchedulerImpl, TaskSetManager, and Pool

class Pool
TaskSchedulerImpl

class TaskSchedulerImpl
object TaskSchedulerImpl

TaskSetManager

class TaskSetManager
object TaskSetManager

TaskResultGetter

class Pool

Pool manages multiple TaskSetManagers. It is generally used by TaskSchedulerImpl, so most of its methods are called from TaskSchedulerImpl.
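To make the role of Pool concrete, here is a minimal sketch of how running-task counts propagate up a tree of pools, mirroring the increaseRunningTasks and decreaseRunningTasks methods in the source that follows. SimplePool, PoolCountDemo, and the pool names are hypothetical stand-ins for illustration, not Spark classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for Spark's Pool; not the real class.
class SimplePool(val name: String) {
  var parent: SimplePool = null // set when this pool is added to another pool
  var runningTasks: Int = 0     // running tasks in this pool and everything below it
  private val children = ArrayBuffer.empty[SimplePool]

  def addChild(child: SimplePool): Unit = {
    require(child != null)
    children += child
    child.parent = this
  }

  // Adjust the local counter, then walk up so every ancestor stays in sync.
  def increaseRunningTasks(n: Int): Unit = {
    runningTasks += n
    if (parent != null) parent.increaseRunningTasks(n)
  }

  def decreaseRunningTasks(n: Int): Unit = {
    runningTasks -= n
    if (parent != null) parent.decreaseRunningTasks(n)
  }
}

object PoolCountDemo {
  // Returns the running-task counts of (root, etl, adhoc) after a few updates.
  def run(): (Int, Int, Int) = {
    val root = new SimplePool("root")
    val etl = new SimplePool("etl")
    val adhoc = new SimplePool("adhoc")
    root.addChild(etl)
    root.addChild(adhoc)
    etl.increaseRunningTasks(3)
    adhoc.increaseRunningTasks(2)
    etl.decreaseRunningTasks(1)
    (root.runningTasks, etl.runningTasks, adhoc.runningTasks)
  }
}
```

Because every update walks up through parent, the root always holds the total for the whole tree, which is what lets a scheduler compare pools by their current load.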
Let's look at the source code:

private[spark] class Pool(
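    val poolName: String, // pool name
    val schedulingMode: SchedulingMode, // scheduling mode: FAIR or FIFO
    initMinShare: Int, // minimum share (minimum number of CPU cores) for this pool
    initWeight: Int) // weight of this pool, used by fair scheduling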
  extends Schedulable with Logging {

  val schedulableQueue = new ConcurrentLinkedQueue[Schedulable] // queue of schedulables (child Pools or TaskSetManagers)
  val schedulableNameToSchedulable = new ConcurrentHashMap[String, Schedulable] // map from schedulable name to schedulable
  val weight = initWeight
  val minShare = initMinShare
  var runningTasks = 0 // this counter only tracks the number of currently running tasks
  val priority = 0

  // A pool's stage id is used to break the tie in scheduling.
  var stageId = -1
  val name = poolName
  var parent: Pool = null
  // Pick the scheduling algorithm according to the scheduling mode
  private val taskSetSchedulingAlgorithm: SchedulingAlgorithm = {
    schedulingMode match {
      case SchedulingMode.FAIR =>
        new FairSchedulingAlgorithm()
      case SchedulingMode.FIFO => // FIFO is the default
        new FIFOSchedulingAlgorithm()
      case _ =>
        val msg = s"Unsupported scheduling mode: $schedulingMode. Use FAIR or FIFO instead."
        throw new IllegalArgumentException(msg)
    }
  }
  // Add a Schedulable; TaskSetManager is one subclass of Schedulable.
  // Called from FIFOSchedulableBuilder, which is ultimately triggered by TaskSchedulerImpl.submitTasks.
  override def addSchedulable(schedulable: Schedulable) {
    require(schedulable != null)
    schedulableQueue.add(schedulable) // append to the queue
    schedulableNameToSchedulable.put(schedulable.name, schedulable) // register in the name map
    schedulable.parent = this // point the schedulable back at this pool
  }
  // Remove a Schedulable (for example a TaskSetManager).
  // Called from TaskSchedulerImpl.taskSetFinished via manager.parent.removeSchedulable(manager); parent was set by schedulable.parent = this in addSchedulable.
  override def removeSchedulable(schedulable: Schedulable) {
    schedulableQueue.remove(schedulable)
    schedulableNameToSchedulable.remove(schedulable.name)
  }
  // Look up a Schedulable by name: check this pool's map first, otherwise search recursively through the schedulables in the queue
  override def getSchedulableByName(schedulableName: String): Schedulable = {
    if (schedulableNameToSchedulable.containsKey(schedulableName)) {
      return schedulableNameToSchedulable.get(schedulableName)
    }
    for (schedulable <- schedulableQueue.asScala) {
      val sched = schedulable.getSchedulableByName(schedulableName)
      if (sched != null) {
        return sched
      }
    }
    null
  }
  // Handle the loss of an executor.
  // Called from TaskSchedulerImpl.removeExecutor.
  override def executorLost(executorId: String, host: String, reason: ExecutorLossReason) {
    schedulableQueue.asScala.foreach(_.executorLost(executorId, host, reason)) // let each TaskSetManager handle its tasks on this executorId in turn
  }
  // Check for speculatable tasks; returns true as soon as any schedulable in the queue has one.
  // Called from TaskSchedulerImpl.checkSpeculatableTasks, which runs as a periodic task.
  override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
    var shouldRevive = false
    for (schedulable <- schedulableQueue.asScala) {
      shouldRevive |= schedulable.checkSpeculatableTasks(minTimeToSpeculation) // for a TaskSetManager entry this is TaskSetManager's own method
    }
    shouldRevive
  }
  // Return the TaskSetManagers in scheduling order.
  // Called from TaskSchedulerImpl.resourceOffers.
  override def getSortedTaskSetQueue: ArrayBuffer[TaskSetManager] = {
    val sortedTaskSetQueue = new ArrayBuffer[TaskSetManager]
    val sortedSchedulableQueue =
      schedulableQueue.asScala.toSeq.sortWith(taskSetSchedulingAlgorithm.comparator) // sort the schedulables with the configured scheduling algorithm
    for (schedulable <- sortedSchedulableQueue) {
      sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue
    }
    sortedTaskSetQueue
  }
  // Increase the number of running tasks, propagating the change to the parent pool.
  // Called from TaskSetManager.
  def increaseRunningTasks(taskNum: Int) {
    runningTasks += taskNum
    if (parent != null) {
      parent.increaseRunningTasks(taskNum)
    }
  }
  // Decrease the number of running tasks, propagating the change to the parent pool.
  // Called from TaskSetManager.
  def decreaseRunningTasks(taskNum: Int) {
    runningTasks -= taskNum
    if (parent != null) {
      parent.decreaseRunningTasks(taskNum)
    }
  }
}
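getSortedTaskSetQueue sorts the queue with the comparator of the configured scheduling algorithm. The sketch below illustrates one plausible FIFO-style ordering: lower priority value first, with ties broken by lower stage id, in line with the comment that a pool's stage id is used to break ties. Entry and FifoOrderDemo are hypothetical names, and the exact rule used by FIFOSchedulingAlgorithm is an assumption here, since its source is not shown in this post.

```scala
// Hypothetical, simplified stand-in for a Schedulable; only the fields the ordering needs.
final case class Entry(name: String, priority: Int, stageId: Int)

object FifoOrderDemo {
  // Returns true when s1 should be scheduled before s2:
  // lower priority value first, ties broken by lower stage id.
  def fifoLessThan(s1: Entry, s2: Entry): Boolean = {
    val byPriority = Integer.compare(s1.priority, s2.priority)
    val res = if (byPriority != 0) byPriority else Integer.compare(s1.stageId, s2.stageId)
    res < 0
  }

  // Same shape as Pool.getSortedTaskSetQueue: sortWith over a boolean comparator.
  def sortedNames(queue: Seq[Entry]): Seq[String] =
    queue.sortWith(fifoLessThan).map(_.name)
}
```

For example, FifoOrderDemo.sortedNames(Seq(Entry("a", 1, 3), Entry("b", 0, 1), Entry("c", 1, 2))) puts "b" first because it has the lowest priority value, then "c" before "a" because of its lower stage id.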

TaskSchedulerImpl

class TaskSchedulerImpl

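TaskSchedulerImpl mainly manages task scheduling, so it is used by DAGScheduler. While tasks run it has to interact with the executors, so a SchedulerBackend is passed into this class's initialize method for that purpose.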
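The core of that scheduling is handing tasks to executors according to their free cores. As a rough sketch of the round-robin idea used by resourceOffers and resourceOfferSingleTaskSet (each pass gives every executor at most one task, and passes repeat until nothing more can be launched), consider the following. Offer, RoundRobinDemo, and assign are hypothetical names; locality levels, blacklisting, and multiple task sets are deliberately left out.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for WorkerOffer: an executor and its free cores.
final case class Offer(executorId: String, cores: Int)

object RoundRobinDemo {
  // Hands out pending task ids to executors, at most one task per executor per pass,
  // repeating passes until a full pass launches nothing.
  def assign(offers: IndexedSeq[Offer],
             pendingTasks: Seq[Long],
             cpusPerTask: Int = 1): Map[String, List[Long]] = {
    val availableCpus = offers.map(_.cores).toArray
    val assigned = offers.map(_ => mutable.ArrayBuffer.empty[Long])
    val pending = mutable.Queue(pendingTasks: _*)

    var launchedInPass = true
    while (launchedInPass) {
      launchedInPass = false
      for (i <- offers.indices) {
        if (availableCpus(i) >= cpusPerTask && pending.nonEmpty) {
          assigned(i) += pending.dequeue()
          availableCpus(i) -= cpusPerTask // this executor now has fewer free cores
          launchedInPass = true
        }
      }
    }
    offers.map(_.executorId).zip(assigned.map(_.toList)).toMap
  }
}
```

With offers Vector(Offer("e1", 2), Offer("e2", 1)) and five pending tasks 0 to 4, the first pass gives task 0 to e1 and task 1 to e2, the second pass gives task 2 to e1, and tasks 3 and 4 stay pending because no cores are left.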
The annotated source follows:

// Manages task scheduling on behalf of DAGScheduler; talks to the executors through the SchedulerBackend passed to initialize
private[spark] class TaskSchedulerImpl(
    val sc: SparkContext,
    val maxTaskFailures: Int, // defaults to 4
    isLocal: Boolean = false)
  extends TaskScheduler with Logging {

  import TaskSchedulerImpl._

  def this(sc: SparkContext) = {
    this(sc, sc.conf.get(config.MAX_TASK_FAILURES)) // defaults to 4
  }

  // Lazily initializing blacklistTrackerOpt to avoid getting empty ExecutorAllocationClient,
  // because ExecutorAllocationClient is created after this TaskSchedulerImpl.
  private[scheduler] lazy val blacklistTrackerOpt = maybeCreateBlacklistTracker(sc) // None when blacklisting is disabled, which is the default

  val conf = sc.conf

  // How often to check for speculative tasks
  // Interval between speculation checks, 100ms by default
  val SPECULATION_INTERVAL_MS = conf.getTimeAsMs("spark.speculation.interval", "100ms")

  // Duplicate copies of a task will only be launched if the original copy has been running for
  // at least this amount of time. This is to avoid the overhead of launching speculative copies
  // of tasks that are very short.
  val MIN_TIME_TO_SPECULATION = 100

  private val speculationScheduler = // background scheduler thread for speculation checks
    ThreadUtils.newDaemonSingleThreadScheduledExecutor("task-scheduler-speculation")

  // Threshold above which we warn user initial TaskSet may be starved
  val STARVATION_TIMEOUT_MS = conf.getTimeAsMs("spark.starvation.timeout", "15s")

  // CPUs to request per task
  val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1)

  // TaskSetManagers are not thread safe, so any access to one should be synchronized
  // on this class.
  // A stage may be attempted several times; each attempt has its own TaskSetManager
  private val taskSetsByStageIdAndAttempt = new HashMap[Int, HashMap[Int, TaskSetManager]] // stage id -> (stage attempt id -> TaskSetManager)

  // Protected by `this`
  private[scheduler] val taskIdToTaskSetManager = new ConcurrentHashMap[Long, TaskSetManager] // task id -> TaskSetManager
  val taskIdToExecutorId = new HashMap[Long, String] // task id -> executor id

  @volatile private var hasReceivedTask = false
  @volatile private var hasLaunchedTask = false
  private val starvationTimer = new Timer(true)

  // Incrementing task IDs
  val nextTaskId = new AtomicLong(0) // generates task ids

  // IDs of the tasks running on each executor
  private val executorIdToRunningTaskIds = new HashMap[String, HashSet[Long]] // executor id -> ids of its running tasks
  // Returns a map from executor id to the number of tasks running on it
  def runningTasksByExecutors: Map[String, Int] = synchronized {
    executorIdToRunningTaskIds.toMap.mapValues(_.size)
  }

  // The set of executors we have on each host; this is used to compute hostsAlive, which
  // in turn is used to decide when we can attain data locality on a given host
  protected val hostToExecutors = new HashMap[String, HashSet[String]] // host -> executors; one host may run several executors

  protected val hostsByRack = new HashMap[String, HashSet[String]]

  protected val executorIdToHost = new HashMap[String, String] // executor id -> host
  // Listener object to pass upcalls into
  var dagScheduler: DAGScheduler = null

  var backend: SchedulerBackend = null // used by the driver to communicate with the executors

  val mapOutputTracker = SparkEnv.get.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster] // tracks the locations of each stage's map outputs

  private var schedulableBuilder: SchedulableBuilder = null
  // default scheduler is FIFO
  private val schedulingModeConf = conf.get(SCHEDULER_MODE_PROPERTY, SchedulingMode.FIFO.toString) // scheduling mode, FIFO by default
  val schedulingMode: SchedulingMode =
    try {
      SchedulingMode.withName(schedulingModeConf.toUpperCase(Locale.ROOT))
    } catch {
      case e: java.util.NoSuchElementException =>
        throw new SparkException(s"Unrecognized $SCHEDULER_MODE_PROPERTY: $schedulingModeConf")
    }

  // The root Pool manages the TaskSetManagers; most of its methods are called from TaskSchedulerImpl
  val rootPool: Pool = new Pool("", schedulingMode, 0, 0)

  // This is a var so that we can reset it for testing purposes.
  // Fetcher for task results
  private[spark] var taskResultGetter = new TaskResultGetter(sc.env, this) // used to fetch the results of tasks
  // Set the DAGScheduler; called from DAGScheduler
  override def setDAGScheduler(dagScheduler: DAGScheduler) {
    this.dagScheduler = dagScheduler
  }
  // Initialization; called from SparkContext, before start() is called
  def initialize(backend: SchedulerBackend) {
    this.backend = backend
    schedulableBuilder = {
      schedulingMode match {
        case SchedulingMode.FIFO =>
          new FIFOSchedulableBuilder(rootPool)
        case SchedulingMode.FAIR =>
          new FairSchedulableBuilder(rootPool, conf)
        case _ =>
          throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
          s"$schedulingMode")
      }
    }
    schedulableBuilder.buildPools()
  }

  def newTaskId(): Long = nextTaskId.getAndIncrement() // monotonically increasing task id

  // start() is called from SparkContext (line 508) after initialize; it mainly starts the periodic background speculation thread
  override def start() {
    backend.start() // start the backend

    if (!isLocal && conf.getBoolean("spark.speculation", false)) { // only when speculative execution is enabled
      logInfo("Starting speculative execution thread")
      speculationScheduler.scheduleWithFixedDelay(new Runnable { // periodic background speculation check
        override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
          checkSpeculatableTasks() // check for speculatable tasks
        }
      }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
    }
  }
  // Hook invoked after start
  override def postStartHook() {
    waitBackendReady() // wait until the backend is ready
  }
  // Submit the tasks of a stage and update taskSetsByStageIdAndAttempt.
  // Called from DAGScheduler.submitMissingTasks.
  override def submitTasks(taskSet: TaskSet) { // the TaskSet holds the tasks submitted for one stage
    val tasks: Array[Task[_]] = taskSet.tasks
    logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
    this.synchronized {
      val manager: TaskSetManager = createTaskSetManager(taskSet, maxTaskFailures) // create a TaskSetManager for this stage's TaskSet
      val stage: Int = taskSet.stageId // id of this stage
      val stageTaskSets: mutable.Map[Int, TaskSetManager] =
        taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager]) // a stage may have several attempts, each with its own TaskSetManager

      // Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
      // This is necessary to handle a corner case. Let's say a stage has 10 partitions and has 2
      // TaskSetManagers: TSM1(zombie) and TSM2(active). TSM1 has a running task for partition 10
      // and it completes. TSM2 finishes tasks for partition 1-9, and thinks he is still active
      // because partition 10 is not completed yet. However, DAGScheduler gets task completion
      // events for all the 10 partitions and thinks the stage is finished. If it's a shuffle stage
      // and somehow it has missing map outputs, then DAGScheduler will resubmit it and create a
      // TSM3 for it. As a stage can't have more than one active task set managers, we must mark
      // TSM2 as zombie (it actually is).
      // Mark the TaskSetManagers of earlier attempts of this same stage as zombies, so that only
      // the newly added attempt stays active (see the corner case described above).
      stageTaskSets.foreach { case (_, ts) =>
        ts.isZombie = true
      }
      stageTaskSets(taskSet.stageAttemptId) = manager // register the TaskSetManager for this stageAttemptId
      schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties) // the builder (FIFOSchedulableBuilder in FIFO mode) adds the TaskSetManager to the Pool

      if (!isLocal && !hasReceivedTask) { // hasReceivedTask is false the first time; isLocal is false in yarn-cluster mode
        starvationTimer.scheduleAtFixedRate(new TimerTask() { // periodic starvation check
          override def run() {
            if (!hasLaunchedTask) { // hasLaunchedTask is false until some task has been launched
              logWarning("Initial job has not accepted any resources; " +
                "check your cluster UI to ensure that workers are registered " +
                "and have sufficient resources")
            } else { // once a task has been launched this branch runs, and the call below cancels this TimerTask
              this.cancel()
            }
          }
        }, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
      }
      hasReceivedTask = true // mark that a TaskSet has been received
    }
    backend.reviveOffers() // ask the SchedulerBackend to make resource offers; it obtains the TaskDescriptions and tells the executors to run the tasks
  }

  // Label as private[scheduler] to allow tests to swap in different task set managers if necessary
  // Create a new TaskSetManager; only called inside this class
  private[scheduler] def createTaskSetManager(
      taskSet: TaskSet,
      maxTaskFailures: Int): TaskSetManager = {
    new TaskSetManager(this, taskSet, maxTaskFailures, blacklistTrackerOpt) // blacklistTrackerOpt is None when blacklisting is disabled, which is the default
  }

  // Cancel the tasks of a stage.
  // Called from DAGScheduler.failJobAndIndependentStages.
  override def cancelTasks(stageId: Int, interruptThread: Boolean): Unit = synchronized {
    logInfo("Cancelling stage " + stageId)
    taskSetsByStageIdAndAttempt.get(stageId).foreach { attempts => // all attempts of this stage
      attempts.foreach { case (_, tsm) => // tsm is a TaskSetManager instance
        // There are two possible cases here:
        // 1. The task set manager has been created and some tasks have been scheduled.
        //    In this case, send a kill signal to the executors to kill the task and then abort
        //    the stage.
        // 2. The task set manager has been created but no tasks have been scheduled. In this case,
        //    simply abort the stage.
        tsm.runningTasksSet.foreach { tid => // every running task of this attempt
            taskIdToExecutorId.get(tid).foreach(execId => // executor on which the task is running
              backend.killTask(tid, execId, interruptThread, reason = "Stage cancelled")) // kill the task on that executor
        }
        tsm.abort("Stage %s cancelled".format(stageId)) // then abort the task set itself
        logInfo("Stage %d was cancelled".format(stageId))
      }
    }
  }

  // Kill a single task attempt.
  // Called from DAGScheduler.killTaskAttempt.
  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Boolean = {
    logInfo(s"Killing task $taskId: $reason")
    val execId = taskIdToExecutorId.get(taskId) // executor running this task
    if (execId.isDefined) {
      backend.killTask(taskId, execId.get, interruptThread, reason) // have the backend kill the task on that executor
      true
    } else {
      logWarning(s"Could not kill task $taskId because no task with that ID was found.")
      false
    }
  }

  /**
   * Called to indicate that all task attempts (including speculated tasks) associated with the
   * given TaskSetManager have completed, so state associated with the TaskSetManager should be
   * cleaned up.
   */
  // All task attempts of this TaskSetManager have completed.
  // Called from TaskSetManager.maybeFinishTaskSet, because the TaskSetManager is what tracks whether all tasks of a stage have finished.
  def taskSetFinished(manager: TaskSetManager): Unit = synchronized {
    taskSetsByStageIdAndAttempt.get(manager.taskSet.stageId).foreach { taskSetsForStage => // taskSetsForStage is the HashMap[Int, TaskSetManager] of this stage
      taskSetsForStage -= manager.taskSet.stageAttemptId // drop the finished stageAttemptId from the map
      if (taskSetsForStage.isEmpty) { // no attempts left, so drop the stage from taskSetsByStageIdAndAttempt as well
        taskSetsByStageIdAndAttempt -= manager.taskSet.stageId
      }
    }
    manager.parent.removeSchedulable(manager) // manager.parent is the Pool; tell it to remove this TaskSetManager
    logInfo(s"Removed TaskSet ${manager.taskSet.id}, whose tasks have all completed, from pool" +
      s" ${manager.parent.name}")
  }

  // Used only inside this class: resourceOffers below calls it to fill the tasks ArrayBuffers with the TaskDescriptions returned by this TaskSetManager's resourceOffer
  private def resourceOfferSingleTaskSet(
      taskSet: TaskSetManager,
      maxLocality: TaskLocality, // ranges from PROCESS_LOCAL to ANY
      shuffledOffers: Seq[WorkerOffer], // one offer per executor
      availableCpus: Array[Int], // free cores of each executor
      tasks: IndexedSeq[ArrayBuffer[TaskDescription]]) : Boolean = { // one initially empty ArrayBuffer[TaskDescription] per offer
    var launchedTask = false
    // nodes and executors that are blacklisted for the entire application have already been
    // filtered out by this point
    for (i <- 0 until shuffledOffers.size) { // shuffledOffers has one entry per executor
      val execId = shuffledOffers(i).executorId
      val host = shuffledOffers(i).host
      if (availableCpus(i) >= CPUS_PER_TASK) { // enough free cores for one task (CPUS_PER_TASK is 1 by default)
        try { // resourceOffer returns an Option[TaskDescription], so at most one task per call
          for (task <- taskSet.resourceOffer(execId, host, maxLocality)) { // try to schedule a task on this executor and host within maxLocality
            tasks(i) += task // append to the ArrayBuffer of this offer, which started out empty
            val tid = task.taskId
            taskIdToTaskSetManager.put(tid, taskSet)
            taskIdToExecutorId(tid) = execId
            executorIdToRunningTaskIds(execId).add(tid)
            availableCpus(i) -= CPUS_PER_TASK // update the free cores of this executor
            assert(availableCpus(i) >= 0)
            launchedTask = true
          }
        } catch {
          case e: TaskNotSerializableException =>
            logError(s"Resource offer failed, task set ${taskSet.name} was not serializable")
            // Do not offer resources for this task, but don't throw an error to allow other
            // task sets to be submitted.
            return launchedTask
        }
      }
    }
    return launchedTask
  }

  /**
   * Called by cluster manager to offer resources on slaves. We respond by asking our active task
   * sets for tasks in order of priority. We fill each node with tasks in a round-robin manner so
   * that tasks are balanced across the cluster.
   */
  // A WorkerOffer describes the free cores of one executor.
  // offers contains one WorkerOffer for every live executor.
  // Called from makeOffers in CoarseGrainedSchedulerBackend, which holds a reference to this TaskScheduler.
  def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
    // Mark each slave as alive and remember its hostname
    // Also track if new executor is added
    var newExecAvail = false
    for (o <- offers) { // one WorkerOffer, with its free cores, per live executor
      if (!hostToExecutors.contains(o.host)) {
        hostToExecutors(o.host) = new HashSet[String]()
      }
      if (!executorIdToRunningTaskIds.contains(o.executorId)) {
        hostToExecutors(o.host) += o.executorId
        executorAdded(o.executorId, o.host)
        executorIdToHost(o.executorId) = o.host
        executorIdToRunningTaskIds(o.executorId) = HashSet[Long]()
        newExecAvail = true
      }
      for (rack <- getRackForHost(o.host)) {
        hostsByRack.getOrElseUpdate(rack, new HashSet[String]()) += o.host
      }
    }

    // Before making any offers, remove any nodes from the blacklist whose blacklist has expired. Do
    // this here to avoid a separate thread and added synchronization overhead, and also because
    // updating the blacklist is only relevant when task offers are being made.
    blacklistTrackerOpt.foreach(_.applyBlacklistTimeout())

    // When blacklisting is disabled (the default), blacklistTrackerOpt is None and filteredOffers is simply offers
    val filteredOffers = blacklistTrackerOpt.map { blacklistTracker =>
      offers.filter { offer =>
        !blacklistTracker.isNodeBlacklisted(offer.host) &&
          !blacklistTracker.isExecutorBlacklisted(offer.executorId)
      }
    }.getOrElse(offers)
    // Shuffle the offers
    val shuffledOffers: IndexedSeq[WorkerOffer] = shuffleOffers(filteredOffers)
    // Build a list of tasks to assign to each worker.
    val tasks: IndexedSeq[ArrayBuffer[TaskDescription]] = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK)) // CPUS_PER_TASK is the number of cores each task needs (1 by default), so o.cores / CPUS_PER_TASK is how many tasks this offer can hold
    val availableCpus: Array[Int] = shuffledOffers.map(o => o.cores).toArray // free cores of each offer
    val sortedTaskSets: mutable.Seq[TaskSetManager] = rootPool.getSortedTaskSetQueue // TaskSetManagers in scheduling order; often the queue holds the TaskSetManager of a single stage
    for (taskSet <- sortedTaskSets) {
      logDebug("parentName: %s, name: %s, runningTasks: %s".format(
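        taskSet.parent.name, taskSet.name, taskSet.runningTasks)) // in FIFO mode taskSet.parent.name is the name of the root Pool, i.e. the "" passed to new Pool in this class
      // taskSet.name is "TaskSet_" followed by the id of the TaskSet passed in by DAGScheduler (a TaskSet id, not a task id)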
      if (newExecAvail) {
        taskSet.executorAdded()
      }
    }

    // Take each TaskSet in our scheduling order, and then offer it each node in increasing order
    // of locality levels so that it gets a chance to launch local tasks on all of them.
    // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY
    for (taskSet <- sortedTaskSets) {
      var launchedAnyTask = false
      var launchedTaskAtCurrentMaxLocality = false
      for (currentMaxLocality <- taskSet.myLocalityLevels) {
        do {
          launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet( // fills tasks; resourceOfferSingleTaskSet is the method defined just above
            taskSet, currentMaxLocality, shuffledOffers, availableCpus, tasks)
          launchedAnyTask |= launchedTaskAtCurrentMaxLocality
        } while (launchedTaskAtCurrentMaxLocality)
      }
      if (!launchedAnyTask) {
        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
      }
    }

    if (tasks.size > 0) {
      hasLaunchedTask = true
    }
    return tasks // the filled IndexedSeq[ArrayBuffer[TaskDescription]]
  }

  /**
   * Shuffle offers around to avoid always placing tasks on the same workers.  Exposed to allow
   * overriding in tests, so it can be deterministic.
   */
  // Randomly shuffle the WorkerOffers
  protected def shuffleOffers(offers: IndexedSeq[WorkerOffer]): IndexedSeq[WorkerOffer] = {
    Random.shuffle(offers)
  }

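  // Update the execution state of a task.
  // When a task finishes on an executor, the executor notifies the backend with a StatusUpdate, and CoarseGrainedSchedulerBackend then calls this method.
  // When the task's state is FINISHED, its result is handed to taskResultGetter.enqueueSuccessfulTask.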
  def statusUpdate(tid: Long, state: TaskState, serializedData: ByteBuffer) {
    var failedExecutor: Option[String] = None
    var reason: Option[ExecutorLossReason] = None
    synchronized {
      try {
        Option(taskIdToTaskSetManager.get(tid)) match { // look up the TaskSetManager of this task
          case Some(taskSet) =>
            if (state == TaskState.LOST) { // the task was lost
              // TaskState.LOST is only used by the deprecated Mesos fine-grained scheduling mode,
              // where each executor corresponds to a single task, so mark the executor as failed.
              val execId = taskIdToExecutorId.getOrElse(tid, throw new IllegalStateException( // executor that ran this task
                "taskIdToTaskSetManager.contains(tid) <=> taskIdToExecutorId.contains(tid)"))
              if (executorIdToRunningTaskIds.contains(execId)) { // the executor is still tracked by the scheduler
                reason = Some(
                  SlaveLost(s"Task $tid was lost, so marking the executor as lost as well."))
                removeExecutor(execId, reason.get) // remove the executor: cleans up all of its tasks and updates hostToExecutors, hostsByRack and executorIdToHost
                failedExecutor = Some(execId)
              }
            }
            if (TaskState.isFinished(state)) { // the task reached a terminal state
              cleanupTaskState(tid) // clean up taskIdToTaskSetManager, taskIdToExecutorId and executorIdToRunningTaskIds
              taskSet.removeRunningTask(tid)  // remove this task id from runningTasksSet and update the Pool's running-task count
              if (state == TaskState.FINISHED) {
                taskResultGetter.enqueueSuccessfulTask(taskSet, tid, serializedData) // enqueueSuccessfulTask calls this class's handleSuccessfulTask,
                // which calls TaskSetManager.handleSuccessfulTask, which in turn calls TaskSetManager.maybeFinishTaskSet,
                // since all tasks of the stage may now be complete.
                // The FAILED, KILLED and LOST cases below follow a similar call chain.
              } else if (Set(TaskState.FAILED, TaskState.KILLED, TaskState.LOST).contains(state)) {
                taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
              }
            }
          case None =>
            logError(
              ("Ignoring update with state %s for TID %s because its task set is gone (this is " +
                "likely the result of receiving duplicate task finished status updates) or its " +
                "executor has been marked as failed.")
                .format(state, tid))
        }
      } catch {
        case e: Exception => logError("Exception in statusUpdate", e)
      }
    }
    // Update the DAGScheduler without holding a lock on this, since that can deadlock
    if (failedExecutor.isDefined) { // an executor failed: notify dagScheduler.executorLost and the backend
      assert(reason.isDefined)
      dagScheduler.executorLost(failedExecutor.get, reason.get)
      backend.reviveOffers()
    }
  }

  /**
   * Update metrics for in-progress tasks and let the master know that the BlockManager is still
   * alive. Return true if the driver knows about the given block manager. Otherwise, return false,
   * indicating that the block manager should re-register.
   */
  // Returns whether this blockManagerId is already registered with the driver's BlockManagerMasterEndpoint.
  // Called from HeartbeatReceiver.receiveAndReply, in the branch case heartbeat @ Heartbeat(executorId, accumUpdates, blockManagerId) =>.
  // Mainly used to update accumulators.
  override def executorHeartbeatReceived(
      execId: String,
      accumUpdates: Array[(Long, Seq[AccumulatorV2[_, _]])],
      blockManagerId: BlockManagerId): Boolean = {
    // (taskId, stageId, stageAttemptId, accumUpdates)
    val accumUpdatesWithTaskIds: Array[(Long, Int, Int, Seq[AccumulableInfo])] = {
      accumUpdates.flatMap { case (id, updates) =>
        val accInfos = updates.map(acc => acc.toInfo(Some(acc.value), None))
        Option(taskIdToTaskSetManager.get(id)).map { taskSetMgr =>
          (id, taskSetMgr.stageId, taskSetMgr.taskSet.stageAttemptId, accInfos)
        }
      }
    }
    // Returns whether this blockManagerId is already registered with the driver's BlockManagerMasterEndpoint
    dagScheduler.executorHeartbeatReceived(execId, accumUpdatesWithTaskIds, blockManagerId)
  }

  // Used by TaskResultGetter.enqueueSuccessfulTask
  def handleTaskGettingResult(taskSetManager: TaskSetManager, tid: Long): Unit = synchronized {
    taskSetManager.handleTaskGettingResult(tid)
  }

  // Used by TaskResultGetter.enqueueSuccessfulTask
  def handleSuccessfulTask(
      taskSetManager: TaskSetManager,
      tid: Long,
      taskResult: DirectTaskResult[_]): Unit = synchronized {
    taskSetManager.handleSuccessfulTask(tid, taskResult) // delegate to TaskSetManager.handleSuccessfulTask
  }

  // Used by TaskResultGetter.enqueueFailedTask
  def handleFailedTask(
      taskSetManager: TaskSetManager,
      tid: Long,
      taskState: TaskState,
      reason: TaskFailedReason): Unit = synchronized {
    taskSetManager.handleFailedTask(tid, taskState, reason)
    if (!taskSetManager.isZombie && !taskSetManager.someAttemptSucceeded(tid)) {
      // Need to revive offers again now that the task set manager state has been updated to
      // reflect failed tasks that need to be re-run.
      backend.reviveOffers()
    }
  }

  def error(message: String) {
    synchronized {
      if (taskSetsByStageIdAndAttempt.nonEmpty) {
        // Have each task set throw a SparkException with the error
        for {
          attempts <- taskSetsByStageIdAndAttempt.values
          manager <- attempts.values
        } {
          try {
            manager.abort(message) // abort the TaskSetManager of every stage attempt, marking its task set as failed
          } catch {
            case e: Exception => logError("Exception in error callback", e)
          }
        }
      } else {
        // No task sets are active but we still got an error. Just exit since this
        // must mean the error is during registration.
        // It might be good to do something smarter here in the future.
        throw new SparkException(s"Exiting due to error from cluster scheduler: $message")
      }
    }
  }

  // Called from DAGScheduler.stop
  override def stop() {
    speculationScheduler.shutdown()
    if (backend != null) {
      backend.stop()
    }
    if (taskResultGetter != null) {
      taskResultGetter.stop()
    }
    starvationTimer.cancel()
  }

  // Used by DAGScheduler
  override def defaultParallelism(): Int = backend.defaultParallelism()

  // Check for speculatable tasks in all our active jobs.
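  // Check for speculatable tasks; this is the periodic task scheduled by this class
  def checkSpeculatableTasks() {
    var shouldRevive = false
    synchronized {
      shouldRevive = rootPool.checkSpeculatableTasks(MIN_TIME_TO_SPECULATION) // delegate to Pool.checkSpeculatableTasks
    }
    if (shouldRevive) { // there are tasks that can be speculated
      backend.reviveOffers() // ask for resource offers so they can be launched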
    }
  }

  // After HeartbeatReceiver detects that an executor has timed out, the TaskScheduler performs the corresponding executor-lost handling.
  // Used by removeExecutor in the backend.
  override def executorLost(executorId: String, reason: ExecutorLossReason): Unit = {
    var failedExecutor: Option[String] = None

    synchronized {
      if (executorIdToRunningTaskIds.contains(executorId)) {
        val hostPort = executorIdToHost(executorId)
        logExecutorLoss(executorId, hostPort, reason)
        removeExecutor(executorId, reason)
        failedExecutor = Some(executorId)
      } else {
        executorIdToHost.get(executorId) match {
          case Some(hostPort) =>
            // If the host mapping still exists, it means we don't know the loss reason for the
            // executor. So call removeExecutor() to update tasks running on that executor when
            // the real loss reason is finally known.
            logExecutorLoss(executorId, hostPort, reason)
            removeExecutor(executorId, reason)

          case None =>
            // We may get multiple executorLost() calls with different loss reasons. For example,
            // one may be triggered by a dropped connection from the slave while another may be a
            // report of executor termination from Mesos. We produce log messages for both so we
            // eventually report the termination reason.
            logError(s"Lost an executor $executorId (already removed): $reason")
        }
      }
    }
    // Call dagScheduler.executorLost without holding the lock on this to prevent deadlock
    if (failedExecutor.isDefined) {
      dagScheduler.executorLost(failedExecutor.get, reason)
      backend.reviveOffers()
    }
  }
  // Used by removeWorker in the backend
  override def workerRemoved(workerId: String, host: String, message: String): Unit = {
    logInfo(s"Handle removed worker $workerId: $message")
    dagScheduler.workerRemoved(workerId, host, message)
  }

  // Only used inside this class
  private def logExecutorLoss(
      executorId: String,
      hostPort: String,
      reason: ExecutorLossReason): Unit = reason match {
    case LossReasonPending =>
      logDebug(s"Executor $executorId on $hostPort lost, but reason not yet known.")
    case ExecutorKilled =>
      logInfo(s"Executor $executorId on $hostPort killed by driver.")
    case _ =>
      logError(s"Lost executor $executorId on $hostPort: $reason")
  }

  /**
   * Cleans up the TaskScheduler's state for tracking the given task.
   */
  // Clean up the TaskScheduler's state for a task; only used inside this class
  private def cleanupTaskState(tid: Long): Unit = {
    taskIdToTaskSetManager.remove(tid) // remove from taskIdToTaskSetManager
    taskIdToExecutorId.remove(tid).foreach { executorId => // remove from taskIdToExecutorId, then from that executor's running-task set
      executorIdToRunningTaskIds.get(executorId).foreach { _.remove(tid) }
    }
  }

  /**
   * Remove an executor from all our data structures and mark it as lost. If the executor's loss
   * reason is not yet known, do not yet remove its association with its host nor update the status
   * of any running tasks, since the loss reason defines whether we'll fail those tasks.
   */
  // Removes an executor: cleans up every task that was running on it and updates
  // hostToExecutors, hostsByRack and executorIdToHost.
  // Called from statusUpdate and executorLost in this class.
  private def removeExecutor(executorId: String, reason: ExecutorLossReason) {
    // The tasks on the lost executor may not send any more status updates (because the executor
    // has been lost), so they should be cleaned up here.
    executorIdToRunningTaskIds.remove(executorId).foreach { taskIds => // run cleanupTaskState for each task on this executor
      logDebug("Cleaning up TaskScheduler state for tasks " +
        s"${taskIds.mkString("[", ",", "]")} on failed executor $executorId")
      // We do not notify the TaskSetManager of the task failures because that will
      // happen below in the rootPool.executorLost() call.
      taskIds.foreach(cleanupTaskState) // clean up the scheduler state of each task
    }

    val host = executorIdToHost(executorId) // host of this executor
    val execs: mutable.Set[String] = hostToExecutors.getOrElse(host, new HashSet) // update hostToExecutors first, then hostsByRack
    execs -= executorId
    if (execs.isEmpty) {
      hostToExecutors -= host
      for (rack <- getRackForHost(host); hosts <- hostsByRack.get(rack)) {
        hosts -= host
        if (hosts.isEmpty) {
          hostsByRack -= rack
        }
      }
    }

    if (reason != LossReasonPending) {
      executorIdToHost -= executorId
      rootPool.executorLost(executorId, host, reason) // this is where the TaskSetManagers learn about the lost executor
    }
    blacklistTrackerOpt.foreach(_.handleRemovedExecutor(executorId))
  }
  // Notifies the DAGScheduler that an executor has been added.
  def executorAdded(execId: String, host: String) {
    dagScheduler.executorAdded(execId, host)
  }
  // Returns the set of executors alive on this host, if there are any.
  def getExecutorsAliveOnHost(host: String): Option[Set[String]] = synchronized {
    hostToExecutors.get(host).map(_.toSet) // immutable copy of this host's executor set
  }
  // Whether this host has any executor.
  def hasExecutorsAliveOnHost(host: String): Boolean = synchronized {
    hostToExecutors.contains(host) // is the host a key in the host -> executors map
  }
  // Whether this rack has any host.
  def hasHostAliveOnRack(rack: String): Boolean = synchronized {
    hostsByRack.contains(rack)
  }
  // Whether this executor ID is known to the scheduler.
  def isExecutorAlive(execId: String): Boolean = synchronized {
    executorIdToRunningTaskIds.contains(execId) // map from executor to its running task IDs
  }
  // Whether this executor has any task running.
  def isExecutorBusy(execId: String): Boolean = synchronized {
    executorIdToRunningTaskIds.get(execId).exists(_.nonEmpty)
  }

  /**
   * Get a snapshot of the currently blacklisted nodes for the entire application.  This is
   * thread-safe -- it can be called without a lock on the TaskScheduler.
   */
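  // Returns an empty set when blacklisting is disabled (the default), because blacklistTrackerOpt is then None.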
  def nodeBlacklist(): scala.collection.immutable.Set[String] = {
    blacklistTrackerOpt.map(_.nodeBlacklist()).getOrElse(scala.collection.immutable.Set())
  }

  // By default, rack is unknown
  def getRackForHost(value: String): Option[String] = None
  // Blocks until the backend is ready.
  private def waitBackendReady(): Unit = {
    if (backend.isReady) { // return immediately if the backend is already ready; otherwise wait for it
      return
    }
    while (!backend.isReady) {
      // Might take a while for backend to be ready if it is waiting on resources.
      if (sc.stopped.get) {
        // For example: the master removes the application for some reason
        throw new IllegalStateException("Spark context stopped while waiting for backend")
      }
      synchronized {
        this.wait(100)
      }
    }
  }
  // Returns the application ID.
  override def applicationId(): String = backend.applicationId()
  // Returns the application attempt ID.
  override def applicationAttemptId(): Option[String] = backend.applicationAttemptId()
  // Returns the TaskSetManager for the given stageId and stageAttemptId, if there is one.
  private[scheduler] def taskSetManagerForAttempt(
      stageId: Int,
      stageAttemptId: Int): Option[TaskSetManager] = {
    for {
      attempts <- taskSetsByStageIdAndAttempt.get(stageId)
      manager <- attempts.get(stageAttemptId)
    } yield {
      manager
    }
  }

  /**
   * Marks the task has completed in all TaskSetManagers for the given stage.
   *
   * After stage failure and retry, there may be multiple TaskSetManagers for the stage.
   * If an earlier attempt of a stage completes a task, we should ensure that the later attempts
   * do not also submit those same tasks.  That also means that a task completion from an earlier
   * attempt can lead to the entire stage getting marked as successful.
   */
  // Marks the task for partitionId of stageId as completed.
  // TaskSetManager.handleSuccessfulTask calls this markPartitionCompletedInAllTaskSets method.
  private[scheduler] def markPartitionCompletedInAllTaskSets(
      stageId: Int,
      partitionId: Int,
      taskInfo: TaskInfo) = {
    taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm => // .values yields every TaskSetManager of the stage
      tsm.markPartitionCompleted(partitionId, taskInfo) // delegate to TaskSetManager.markPartitionCompleted
    }
  }

}
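
The host and rack bookkeeping in removeExecutor is easier to follow in isolation. The sketch below is a simplified, hypothetical ExecutorRegistry (the class name, the add method and the rackOf function are assumptions made for illustration, not Spark APIs). It mirrors the cascade: removing the last executor of a host removes the host, and removing the last host of a rack removes the rack. Unlike the real method, it always forgets the executor's host and does not model LossReasonPending.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the scheduler's bookkeeping maps; not Spark's actual class.
class ExecutorRegistry(rackOf: String => Option[String]) {
  val executorIdToHost = mutable.HashMap[String, String]()
  val hostToExecutors = mutable.HashMap[String, mutable.HashSet[String]]()
  val hostsByRack = mutable.HashMap[String, mutable.HashSet[String]]()

  // Registers an executor under its host, and the host under its rack (if known).
  def add(execId: String, host: String): Unit = {
    executorIdToHost(execId) = host
    hostToExecutors.getOrElseUpdate(host, mutable.HashSet[String]()) += execId
    for (rack <- rackOf(host)) {
      hostsByRack.getOrElseUpdate(rack, mutable.HashSet[String]()) += host
    }
  }

  // Same cascade as removeExecutor: executor -> host -> rack.
  def remove(execId: String): Unit = {
    for (host <- executorIdToHost.remove(execId)) {
      val execs = hostToExecutors.getOrElse(host, mutable.HashSet[String]())
      execs -= execId
      if (execs.isEmpty) {
        hostToExecutors -= host
        for (rack <- rackOf(host); hosts <- hostsByRack.get(rack)) {
          hosts -= host
          if (hosts.isEmpty) hostsByRack -= rack
        }
      }
    }
  }
}
```

With two executors on one host, removing the first leaves the host registered; only removing the second drops the host, and the rack disappears once its last host is gone.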

object TaskSchedulerImpl

private[spark] object TaskSchedulerImpl {

  val SCHEDULER_MODE_PROPERTY = "spark.scheduler.mode"

  /**
   * Used to balance containers across hosts.
   *
   * Accepts a map of hosts to resource offers for that host, and returns a prioritized list of
   * resource offers representing the order in which the offers should be used. The resource
   * offers are ordered such that we'll allocate one container on each host before allocating a
   * second container on any host, and so on, in order to reduce the damage if a host fails.
   *
   * For example, given {@literal h1 -> [o1, o2, o3]}, {@literal h2 -> [o4]} and
   * {@literal h3 -> [o5, o6]}, returns {@literal [o1, o5, o4, o2, o6, o3]}.
   */
  def prioritizeContainers[K, T] (map: HashMap[K, ArrayBuffer[T]]): List[T] = {
    val _keyList = new ArrayBuffer[K](map.size)
    _keyList ++= map.keys

    // order keyList based on population of value in map
    val keyList = _keyList.sortWith(
      (left, right) => map(left).size > map(right).size
    )

    val retval = new ArrayBuffer[T](keyList.size * 2)
    var index = 0
    var found = true

    while (found) {
      found = false
      for (key <- keyList) {
        val containerList: ArrayBuffer[T] = map.getOrElse(key, null)
        assert(containerList != null)
        // Get the index'th entry for this host - if present
        if (index < containerList.size) {
          retval += containerList.apply(index)
          found = true
        }
      }
      index += 1
    }

    retval.toList
  }

  private def maybeCreateBlacklistTracker(sc: SparkContext): Option[BlacklistTracker] = {
    if (BlacklistTracker.isBlacklistEnabled(sc.conf)) { // disabled by default
      val executorAllocClient: Option[ExecutorAllocationClient] = sc.schedulerBackend match {
        case b: ExecutorAllocationClient => Some(b)
        case _ => None
      }
      Some(new BlacklistTracker(sc, executorAllocClient))
    } else {
      None
    }
  }

}
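
To see the ordering that prioritizeContainers produces, here is a compact re-implementation. It is a sketch rather than Spark's code: the object name PrioritizeSketch and the method name prioritize are made up for this example. It visits hosts from the largest offer list to the smallest and takes one offer per host per round.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

// Hypothetical helper illustrating the round-robin idea; not part of Spark.
object PrioritizeSketch {
  def prioritize[K, T](offers: HashMap[K, ArrayBuffer[T]]): List[T] = {
    // Hosts with more offers come first (sortBy is stable, so ties keep their order).
    val keys = offers.keys.toSeq.sortBy(k => -offers(k).size)
    // Number of rounds equals the length of the longest offer list.
    val rounds = if (offers.isEmpty) 0 else offers.values.map(_.size).max
    val result = new ArrayBuffer[T]
    // Round i takes the i-th offer of every host that still has one.
    for (i <- 0 until rounds; k <- keys if i < offers(k).size) {
      result += offers(k)(i)
    }
    result.toList
  }
}
```

Calling prioritize with h1 -> [o1, o2, o3], h2 -> [o4] and h3 -> [o5, o6] yields List(o1, o5, o4, o2, o6, o3), the same result that the Javadoc above lists.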

TaskSetManager

class TaskSetManager

TaskSchedulerImpl creates this object in its createTaskSetManager method,
so the methods of this class are mostly called from TaskSchedulerImpl.
A TaskSetManager manages the tasks of a single Stage.
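
The class keeps its pending tasks in ArrayBuffers that are used as stacks and cleaned up lazily, as the comments in the source explain. A minimal sketch of that idea follows. It assumes a standalone helper (PendingStackSketch and dequeue are names invented here) and leaves out the blacklist check that the real dequeueTaskFromList performs.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified version of the lazy stack scan; not Spark's code.
object PendingStackSketch {
  // Scans from the tail, drops every entry it visits, and returns the first
  // task index that has no running copy and has not succeeded yet.
  def dequeue(
      list: ArrayBuffer[Int],
      copiesRunning: Array[Int],
      successful: Array[Boolean]): Option[Int] = {
    var offset = list.size
    while (offset > 0) {
      offset -= 1
      val index = list(offset)
      list.remove(offset) // stale entries are removed lazily, only when visited
      if (copiesRunning(index) == 0 && !successful(index)) {
        return Some(index)
      }
    }
    None
  }
}
```

Because tasks are pushed in reverse index order, the tail of the buffer holds the lowest index, so low-numbered tasks are handed out first.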

// Created in TaskSchedulerImpl.createTaskSetManager,
// so its methods are mostly called from TaskSchedulerImpl.
// Manages the tasks of a single Stage.
private[spark] class TaskSetManager(
    sched: TaskSchedulerImpl,
    val taskSet: TaskSet, // the tasks of this stage, built by the DAGScheduler and passed in
    val maxTaskFailures: Int,
    blacklistTracker: Option[BlacklistTracker] = None, // None when blacklisting is disabled (the default), as blacklistTrackerOpt is then None
    clock: Clock = new SystemClock()) extends Schedulable with Logging {

  private val conf = sched.sc.conf

  // SPARK-21563 make a copy of the jars/files so they are consistent across the TaskSet
  private val addedJars = HashMap[String, Long](sched.sc.addedJars.toSeq: _*)
  private val addedFiles = HashMap[String, Long](sched.sc.addedFiles.toSeq: _*)

  // Quantile of tasks at which to start speculation
  val SPECULATION_QUANTILE = conf.getDouble("spark.speculation.quantile", 0.75)
  val SPECULATION_MULTIPLIER = conf.getDouble("spark.speculation.multiplier", 1.5)

  // Limit of bytes for total size of results (default is 1GB)
  val maxResultSize = Utils.getMaxResultSize(conf) // limit set by spark.driver.maxResultSize

  val speculationEnabled = conf.getBoolean("spark.speculation", false) // whether speculative execution is enabled

  // Serializer for closures and tasks.
  val env = SparkEnv.get
  val ser = env.closureSerializer.newInstance() // closure serializer; Java serialization by default

  val tasks: Array[Task[_]] = taskSet.tasks
  private[scheduler] val partitionToIndex: Map[Int, Int] = tasks.zipWithIndex
    .map { case (t, idx) => t.partitionId -> idx }.toMap // map from a task's partitionId to its index
  val numTasks = tasks.length // number of tasks
  val copiesRunning = new Array[Int](numTasks) // number of running copies of each task

  // For each task, tracks whether a copy of the task has succeeded. A task will also be
  // marked as "succeeded" if it failed with a fetch failure, in which case it should not
  // be re-run because the missing map data needs to be regenerated first.
  val successful = new Array[Boolean](numTasks) // per-task flag: has some attempt succeeded
  private val numFailures = new Array[Int](numTasks) // per-task failure count

  // Add the tid of task into this HashSet when the task is killed by other attempt tasks.
  // This happened while we set the `spark.speculation` to true. The task killed by others
  // should not resubmit while executor lost.
  private val killedByOtherAttempt = new HashSet[Long] // task IDs killed by another attempt
  // TaskInfo describes one task attempt: task ID, index, attempt number, executor ID, host, locality, whether it is speculative, and so on.
  val taskAttempts: Array[List[TaskInfo]] = Array.fill[List[TaskInfo]](numTasks)(Nil)
  private[scheduler] var tasksSuccessful = 0

  val weight = 1
  val minShare = 0
  var priority = taskSet.priority
  var stageId = taskSet.stageId
  val name = "TaskSet_" + taskSet.id
  var parent: Pool = null
  private var totalResultSize = 0L
  private var calculatedTasks = 0

  private[scheduler] val taskSetBlacklistHelperOpt: Option[TaskSetBlacklist] = { // None when blacklisting is disabled (the default), because blacklistTracker is then None
    blacklistTracker.map { _ => // only built when a BlacklistTracker exists
      new TaskSetBlacklist(conf, stageId, clock)
    }
  }

  private[scheduler] val runningTasksSet = new HashSet[Long] // IDs of the running tasks
  // Used by TaskSchedulerImpl.
  override def runningTasks: Int = runningTasksSet.size // number of running tasks
  // Whether some attempt of the task that tid belongs to has succeeded; used by TaskSchedulerImpl.
  def someAttemptSucceeded(tid: Long): Boolean = {
    successful(taskInfos(tid).index) // taskInfos maps task ID -> TaskInfo; successful holds the per-task success flags
  }

  // True once no more tasks should be launched for this task set manager. TaskSetManagers enter
  // the zombie state once at least one attempt of each task has completed successfully, or if the
  // task set is aborted (for example, because it was killed).  TaskSetManagers remain in the zombie
  // state until all tasks have finished running; we keep TaskSetManagers that are in the zombie
  // state in order to continue to track and account for the running tasks.
  // TODO: We should kill any running task attempts when the task set manager becomes a zombie.
  private[scheduler] var isZombie = false

  // Set of pending tasks for each executor. These collections are actually
  // treated as stacks, in which new tasks are added to the end of the
  // ArrayBuffer and removed from the end. This makes it faster to detect
  // tasks that repeatedly fail because whenever a task failed, it is put
  // back at the head of the stack. These collections may contain duplicates
  // for two reasons:
  // (1): Tasks are only removed lazily; when a task is launched, it remains
  // in all the pending lists except the one that it was launched from.
  // (2): Tasks may be re-added to these lists multiple times as a result
  // of failures.
  // Duplicates are handled in dequeueTaskFromList, which ensures that a
  // task hasn't already started running before launching it.
  private val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]] // executor ID -> pending task indices

  // Set of pending tasks for each host. Similar to pendingTasksForExecutor,
  // but at host level.
  private val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]] // host -> pending task indices

  // Set of pending tasks for each rack -- similar to the above.
  private val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]] // rack -> pending task indices

  // Set containing pending tasks with no locality preferences.
  private[scheduler] var pendingTasksWithNoPrefs = new ArrayBuffer[Int] // pending task indices with no locality preference

  // Set containing all pending tasks (also used as a stack, as above).
  private val allPendingTasks = new ArrayBuffer[Int] // indices of all pending tasks

  // Tasks that can be speculated. Since these will be a small fraction of total
  // tasks, we'll just hold them in a HashSet.
  private[scheduler] val speculatableTasks = new HashSet[Int] // indices of tasks eligible for speculation

  // Task index, start and finish time for each task attempt (indexed by task ID)
  private[scheduler] val taskInfos = new HashMap[Long, TaskInfo] // task ID -> TaskInfo

  // Use a MedianHeap to record durations of successful tasks so we know when to launch
  // speculative tasks. This is only used when speculation is enabled, to avoid the overhead
  // of inserting into the heap when the heap won't be used.
  val successfulTaskDurations = new MedianHeap() // durations of successful tasks, used to decide when to speculate

  // How frequently to reprint duplicate exceptions in full, in milliseconds
  val EXCEPTION_PRINT_INTERVAL =
    conf.getLong("spark.logging.exceptionPrintInterval", 10000)

  // Map of recent exceptions (identified by string representation and top stack frame) to
  // duplicate count (how many times the same exception has appeared) and time the full exception
  // was printed. This should ideally be an LRU map that can drop old exceptions automatically.
  private val recentExceptions = HashMap[String, (Int, Long)]()

  // Figure out the current map output tracker epoch and set it on all tasks
  val epoch = sched.mapOutputTracker.getEpoch
  logDebug("Epoch for " + taskSet + ": " + epoch)
  for (t <- tasks) {
    t.epoch = epoch
  }

  // Add all our tasks to the pending lists. We do this in reverse order
  // of task index so that tasks with low indices get launched first.
  for (i <- (0 until numTasks).reverse) {
    addPendingTask(i)
  }

  /**
   * Track the set of locality levels which are valid given the tasks locality preferences and
   * the set of currently available executors.  This is updated as executors are added and removed.
   * This allows a performance optimization, of skipping levels that aren't relevant (eg., skip
   * PROCESS_LOCAL if no tasks could be run PROCESS_LOCAL for the current set of executors).
   */
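  // Computes the valid locality levels. RACK_LOCAL is usually absent because getRackForHost returns None by default;
  // the other levels depend on the tasks' preferences and on the executors currently available.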
  private[scheduler] var myLocalityLevels = computeValidLocalityLevels()

  // Time to wait at each level
  // Wait time for each locality level; spark.locality.wait defaults to 3s.
  private[scheduler] var localityWaits: Array[Long] = myLocalityLevels.map(getLocalityWait)

  // Delay scheduling variables: we keep track of our current locality level and the time we
  // last launched a task at that level, and move up a level when localityWaits[curLevel] expires.
  // We then move down if we manage to launch a "more local" task.
  private var currentLocalityIndex = 0 // Index of our current locality level in validLocalityLevels
  private var lastLaunchTime = clock.getTimeMillis()  // Time we last launched a task at this level

  override def schedulableQueue: ConcurrentLinkedQueue[Schedulable] = null

  override def schedulingMode: SchedulingMode = SchedulingMode.NONE

  private[scheduler] var emittedTaskSizeWarning = false

  /** Add a task to all the pending-task lists that it should be on. */
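  // Adds a pending task to the executor, host, rack and no-preference lists that match its locality preferences.
  // This method has already been called for every task while this object was initialized.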
  private[spark] def addPendingTask(index: Int) {
    for (loc <- tasks(index).preferredLocations) { // the task's preferred locations
      loc match {
        case e: ExecutorCacheTaskLocation => // data cached on a specific executor
          pendingTasksForExecutor.getOrElseUpdate(e.executorId, new ArrayBuffer) += index // add the task to that executor's pending list
        case e: HDFSCacheTaskLocation => // data cached by HDFS on this host
          val exe = sched.getExecutorsAliveOnHost(loc.host) // executors alive on this host
          exe match {
            case Some(set) =>
              for (e <- set) { // for each of these executors
                pendingTasksForExecutor.getOrElseUpdate(e, new ArrayBuffer) += index // add the task to the executor's pending list
              }
              logInfo(s"Pending task $index has a cached location at ${e.host} " +
                ", where there are executors " + set.mkString(","))
            case None => logDebug(s"Pending task $index has a cached location at ${e.host} " +
                ", but there are no executors alive there.")
          }
        case _ =>
      }
      pendingTasksForHost.getOrElseUpdate(loc.host, new ArrayBuffer) += index // add to the host's pending list
      for (rack <- sched.getRackForHost(loc.host)) { // rack of the host, if known
        pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer) += index // add to the rack's pending list
      }
    }

    if (tasks(index).preferredLocations == Nil) { // no locality preference
      pendingTasksWithNoPrefs += index // add to the no-preference list
    }
    // allPendingTasks holds every pending task.
    allPendingTasks += index  // No point scanning this whole list to find the old task there
  }

  /**
   * Return the pending tasks list for a given executor ID, or an empty list if
   * there is no map entry for that host
   */
  // Returns the pending tasks for this executor.
  private def getPendingTasksForExecutor(executorId: String): ArrayBuffer[Int] = {
    pendingTasksForExecutor.getOrElse(executorId, ArrayBuffer())
  }

  /**
   * Return the pending tasks list for a given host, or an empty list if
   * there is no map entry for that host
   */
  // Returns the pending tasks for this host.
  private def getPendingTasksForHost(host: String): ArrayBuffer[Int] = {
    pendingTasksForHost.getOrElse(host, ArrayBuffer())
  }

  /**
   * Return the pending rack-local task list for a given rack, or an empty list if
   * there is no map entry for that rack
   */
  // Returns the pending tasks for this rack.
  private def getPendingTasksForRack(rack: String): ArrayBuffer[Int] = {
    pendingTasksForRack.getOrElse(rack, ArrayBuffer())
  }

  /**
   * Dequeue a pending task from the given list and return its index.
   * Return None if the list is empty.
   * This method also cleans up any tasks in the list that have already
   * been launched, since we want that to happen lazily.
   */
  // Scans the given pending list from the tail and returns the first task index that has no running copy and has not succeeded.
  private def dequeueTaskFromList(
      execId: String,
      host: String,
      list: ArrayBuffer[Int]): Option[Int] = {
    var indexOffset = list.size
    while (indexOffset > 0) {
      indexOffset -= 1
      val index = list(indexOffset) // element at the current tail position
      if (!isTaskBlacklistedOnExecOrNode(index, execId, host)) { // always passes when blacklisting is disabled, since isTaskBlacklistedOnExecOrNode then returns false
        // This should almost always be list.trimEnd(1) to remove tail
        list.remove(indexOffset) // remove that element
        if (copiesRunning(index) == 0 && !successful(index)) { // no copy is running and no attempt has succeeded,
          // so this check skips tasks that are already running or already finished
          return Some(index) // hand out this task
        }
      }
    }
    None
  }

  /** Check whether a task is currently running an attempt on a given host */
  // Whether any attempt of this task (its list of TaskInfo in taskAttempts) is on the given host.
  private def hasAttemptOnHost(taskIndex: Int, host: String): Boolean = {
    val x: Seq[TaskInfo] = taskAttempts(taskIndex)
    taskAttempts(taskIndex).exists(_.host == host) // some TaskInfo in the list has this host
  }
  // When blacklisting is disabled, taskSetBlacklistHelperOpt is None, so this returns false.
  private def isTaskBlacklistedOnExecOrNode(index: Int, execId: String, host: String): Boolean = {
    taskSetBlacklistHelperOpt.exists { blacklist =>
      blacklist.isNodeBlacklistedForTask(host, index) ||
        blacklist.isExecutorBlacklistedForTask(execId, index)
    }
  }

  /**
   * Return a speculative task for a given executor if any are available. The task should not have
   * an attempt running on this host, in case the host is slow. In addition, the task should meet
   * the given locality constraint.
   */
  // Labeled as protected to allow tests to override providing speculative tasks if necessary
  // Goes through the unfinished speculatable tasks and, following the locality levels, picks one to launch on a host where it has no attempt yet.
  protected def dequeueSpeculativeTask(execId: String, host: String, locality: TaskLocality.Value)
    : Option[(Int, TaskLocality.Value)] =
  { // drop the tasks that have already succeeded from the speculatable set
    speculatableTasks.retain(index => !successful(index)) // Remove finished tasks from set

    def canRunOnHost(index: Int): Boolean = {
      !hasAttemptOnHost(index, host) &&
        !isTaskBlacklistedOnExecOrNode(index, execId, host) // with blacklisting off this term is true, so the result is true whenever the task has no attempt on this host
    }

    if (!speculatableTasks.isEmpty) { // there are still speculatable tasks
      // Check for process-local tasks; note that tasks can be process-local
      // on multiple nodes when we replicate cached blocks, as in Spark Streaming
      for (index <- speculatableTasks if canRunOnHost(index)) { // only tasks with no attempt on this host
        val prefs: Seq[TaskLocation] = tasks(index).preferredLocations // the task's preferred locations
        val executors: Seq[String] = prefs.flatMap(_ match { // executor IDs named by the task's executor-level preferences
          case e: ExecutorCacheTaskLocation => Some(e.executorId)
          case _ => None
        });
        if (executors.contains(execId)) { // the offered executor is one of the preferred executors
          speculatableTasks -= index
          return Some((index, TaskLocality.PROCESS_LOCAL))
        }
      }

      // Check for node-local tasks
      if (TaskLocality.isAllowed(locality, TaskLocality.NODE_LOCAL)) {
        for (index <- speculatableTasks if canRunOnHost(index)) {
          val locations: Seq[String] = tasks(index).preferredLocations.map(_.host)
          if (locations.contains(host)) {
            speculatableTasks -= index
            return Some((index, TaskLocality.NODE_LOCAL)) // the offered host is one of the task's preferred hosts
          }
        }
      }

      // Check for no-preference tasks
      if (TaskLocality.isAllowed(locality, TaskLocality.NO_PREF)) {
        for (index <- speculatableTasks if canRunOnHost(index)) {
          val locations = tasks(index).preferredLocations
          if (locations.size == 0) {
            speculatableTasks -= index
            return Some((index, TaskLocality.PROCESS_LOCAL)) // tasks without a locality preference are reported as PROCESS_LOCAL
          }
        }
      }

      // Check for rack-local tasks
      if (TaskLocality.isAllowed(locality, TaskLocality.RACK_LOCAL)) {
        for (rack <- sched.getRackForHost(host)) {
          for (index <- speculatableTasks if canRunOnHost(index)) {
            val racks = tasks(index).preferredLocations.map(_.host).flatMap(sched.getRackForHost)
            if (racks.contains(rack)) {
              speculatableTasks -= index
              return Some((index, TaskLocality.RACK_LOCAL)) // same rack
            }
          }
        }
      }

      // Check for non-local tasks
      if (TaskLocality.isAllowed(locality, TaskLocality.ANY)) {
        for (index <- speculatableTasks if canRunOnHost(index)) {
          speculatableTasks -= index
          return Some((index, TaskLocality.ANY)) // any location
        }
      }
    }

    None
  }

  /**
   * Dequeue a pending task for a given node and return its index and locality level.
   * Only search for tasks matching the given locality constraint.
   *
   * @return An option containing (task index within the task set, locality, is speculative?)
   */
  // Tries the pending tasks first and the speculative tasks last.
  private def dequeueTask(execId: String, host: String, maxLocality: TaskLocality.Value)
    : Option[(Int, TaskLocality.Value, Boolean)] =
  {
    for (index <- dequeueTaskFromList(execId, host, getPendingTasksForExecutor(execId))) {
      // getPendingTasksForExecutor returns the pending tasks for this executor;
      // dequeueTaskFromList scans them from the tail for a task with no running copy and no success.
      // Such a task runs at PROCESS_LOCAL level.
      return Some((index, TaskLocality.PROCESS_LOCAL, false))
    }

    if (TaskLocality.isAllowed(maxLocality, TaskLocality.NODE_LOCAL)) { // if NODE_LOCAL is allowed
      for (index <- dequeueTaskFromList(execId, host, getPendingTasksForHost(host))) {
        // getPendingTasksForHost returns the pending tasks for this host;
        // dequeueTaskFromList scans them from the tail for a task with no running copy and no success.
        // Such a task runs at NODE_LOCAL level.
        return Some((index, TaskLocality.NODE_LOCAL, false))
      }
    }

    if (TaskLocality.isAllowed(maxLocality, TaskLocality.NO_PREF)) { // if NO_PREF is allowed
      // Look for noPref tasks after NODE_LOCAL for minimize cross-rack traffic
      for (index <- dequeueTaskFromList(execId, host, pendingTasksWithNoPrefs)) { // pendingTasksWithNoPrefs: pending tasks with no locality preference
        // dequeueTaskFromList scans them from the tail for a task with no running copy and no success.
        // Such a task is reported as PROCESS_LOCAL.
        return Some((index, TaskLocality.PROCESS_LOCAL, false))
      }
    }

    if (TaskLocality.isAllowed(maxLocality, TaskLocality.RACK_LOCAL)) { // if RACK_LOCAL is allowed
      for {
        rack <- sched.getRackForHost(host) // None by default, so this loop body is normally skipped
        index <- dequeueTaskFromList(execId, host, getPendingTasksForRack(rack))
      } {
        return Some((index, TaskLocality.RACK_LOCAL, false))
      }
    }

    if (TaskLocality.isAllowed(maxLocality, TaskLocality.ANY)) { // if ANY is allowed
      for (index <- dequeueTaskFromList(execId, host, allPendingTasks)) {
        // allPendingTasks holds every pending task;
        // dequeueTaskFromList scans it from the tail for a task with no running copy and no success.
        // Such a task runs at ANY level.
        return Some((index, TaskLocality.ANY, false))
      }
    }

    // find a speculative task if all others tasks have been scheduled
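    // Reaching this point means no pending task was found for this offer at the allowed locality, so a speculative task is tried:
    // dequeueSpeculativeTask looks through the unfinished speculatable tasks for one that fits this host and locality.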
    dequeueSpeculativeTask(execId, host, maxLocality).map {
      case (taskIndex, allowedLocality) => (taskIndex, allowedLocality, true)}
  }

  /**
   * Respond to an offer of a single executor from the scheduler by finding a task
   *
   * NOTE: this function is either called with a maxLocality which
   * would be adjusted by delay scheduling algorithm or it will be with a special
   * NO_PREF locality which will be not modified
   *
   * @param execId the executor Id of the offered resource
   * @param host  the host Id of the offered resource
   * @param maxLocality the maximum locality we want to schedule the tasks at
   */
  @throws[TaskNotSerializableException]
  // Schedules one task for the offered execId on host within maxLocality and returns its TaskDescription.
  // Called from TaskSchedulerImpl.resourceOfferSingleTaskSet.
  def resourceOffer(
      execId: String,
      host: String,
      maxLocality: TaskLocality.TaskLocality)
    : Option[TaskDescription] =
  {
    val offerBlacklisted = taskSetBlacklistHelperOpt.exists { blacklist => // false when blacklisting is disabled
      blacklist.isNodeBlacklistedForTaskSet(host) ||
        blacklist.isExecutorBlacklistedForTaskSet(execId)
    }
    if (!isZombie && !offerBlacklisted) { // isZombie is false in the normal state, so this condition usually holds
      val curTime = clock.getTimeMillis()

      var allowedLocality = maxLocality

      if (maxLocality != TaskLocality.NO_PREF) {
        allowedLocality = getAllowedLocalityLevel(curTime) // level currently allowed by delay scheduling, relaxing in the order PROCESS_LOCAL -> NODE_LOCAL -> NO_PREF -> RACK_LOCAL -> ANY
        if (allowedLocality > maxLocality) { // allowedLocality is looser than maxLocality
          // We're not allowed to search for farther-away tasks
          allowedLocality = maxLocality // clamp to maxLocality, the stricter bound (closer to the data) requested by the caller
        }
      }
      // dequeueTask tries the pending tasks first and the speculative tasks last.
      dequeueTask(execId, host, allowedLocality).map { case ((index, taskLocality, speculative)) =>
        // Found a task; do some bookkeeping and return a task description
        val task: Task[_] = tasks(index) // the task at this index in the stage's TaskSet
        val taskId = sched.newTaskId() // generate a new task ID
        // Do various bookkeeping
        copiesRunning(index) += 1 // one more running copy of this task
        val attemptNum = taskAttempts(index).size // number of earlier attempts, so 0 for the first attempt
        val info = new TaskInfo(taskId, index, attemptNum, curTime,
          execId, host, taskLocality, speculative)
        taskInfos(taskId) = info // record the TaskInfo so that handleSuccessfulTask and handleFailedTask can look it up
        taskAttempts(index) = info :: taskAttempts(index)
        // Update our locality level for delay scheduling
        // NO_PREF will not affect the variables related to delay scheduling
        if (maxLocality != TaskLocality.NO_PREF) {
          currentLocalityIndex = getLocalityIndex(taskLocality)
          lastLaunchTime = curTime
        }
        // Serialize and return the task
        val serializedTask: ByteBuffer = try {
          ser.serialize(task)
        } catch {
          // If the task cannot be serialized, then there's no point to re-attempt the task,
          // as it will always fail. So just abort the whole task-set.
          case NonFatal(e) =>
            val msg = s"Failed to serialize task $taskId, not attempting to retry it."
            logError(msg, e)
            abort(s"$msg Exception during serialization: $e")
            throw new TaskNotSerializableException(e)
        }
        if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KB * 1024 &&
          !emittedTaskSizeWarning) {
          emittedTaskSizeWarning = true
          logWarning(s"Stage ${task.stageId} contains a task of very large size " +
            s"(${serializedTask.limit() / 1024} KB). The maximum recommended task size is " +
            s"${TaskSetManager.TASK_SIZE_TO_WARN_KB} KB.") // warning threshold for the serialized task size: 100 KB
        }
        addRunningTask(taskId)  // updates runningTasksSet and the parent Pool's runningTasks

        // We used to log the time it takes to serialize the task, but task size is already
        // a good proxy to task serialization time.
        // val timeTaken = clock.getTime() - startTime
        val taskName = s"task ${info.id} in stage ${taskSet.id}" // a given task within a given stage
        logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
          s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")

        sched.dagScheduler.taskStarted(task, info)
        new TaskDescription( // return the description of this task
          taskId,
          attemptNum,
          execId,
          taskName,
          index,
          addedFiles,
          addedJars,
          task.localProperties,
          serializedTask)
      }
    } else {
      None
    }
  }

  // Finishes the task set if all of its tasks are done; called only inside this class.
  private def maybeFinishTaskSet() {
    if (isZombie && runningTasks == 0) { // zombie with no running tasks: every task of this stage has finished
      sched.taskSetFinished(this) // calls TaskSchedulerImpl.taskSetFinished
      if (tasksSuccessful == numTasks) {
        blacklistTracker.foreach(_.updateBlacklistForSuccessfulTaskSet(
          taskSet.stageId,
          taskSet.stageAttemptId,
          taskSetBlacklistHelperOpt.get.execToFailures))
      }
    }
  }

  /**
   * Get the level we can launch tasks according to delay scheduling, based on current wait time.
   */
  // Returns the locality level allowed right now (among the levels that still have pending tasks), relaxing in the order PROCESS_LOCAL -> NODE_LOCAL -> NO_PREF -> RACK_LOCAL -> ANY.
  private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
    // Remove the scheduled or finished tasks lazily
    // Removes running and finished tasks from the tail of pendingTaskIds; returns true at the first task that is neither running nor successful.
    def tasksNeedToBeScheduledFrom(pendingTaskIds: ArrayBuffer[Int]): Boolean = {
      var indexOffset = pendingTaskIds.size
      while (indexOffset > 0) {
        indexOffset -= 1
        val index = pendingTaskIds(indexOffset)
        if (copiesRunning(index) == 0 && !successful(index)) { // neither running nor successful yet: this task still needs scheduling
          return true
        } else {
          pendingTaskIds.remove(indexOffset) // drop a task that is already running or finished
        }
      }
      false
    }
    // Walk through the list of tasks that can be scheduled at each location and returns true
    // if there are any tasks that still need to be scheduled. Lazily cleans up tasks that have
    // already been scheduled.
    // Cleans tasks that are no longer pending out of pendingTasks and returns whether any pending task remains
    def moreTasksToRunIn(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
      val emptyKeys = new ArrayBuffer[String]
      val hasTasks: Boolean = pendingTasks.exists {
        case (id: String, tasks: ArrayBuffer[Int]) =>
          if (tasksNeedToBeScheduledFrom(tasks)) { // prunes running and finished tasks; true means a task here still needs scheduling
            true
          } else {
            emptyKeys += id // no task under this key is pending any more, so record the key in emptyKeys
            false
          }
      }
      // The key could be executorId, host or rackId
      emptyKeys.foreach(id => pendingTasks.remove(id)) // remove the entries of pendingTasks that have no pending task left
      hasTasks // true means pending tasks remain
    }

    while (currentLocalityIndex < myLocalityLevels.length - 1) { // currentLocalityIndex starts at 0
      val moreTasks: Boolean = myLocalityLevels(currentLocalityIndex) match {
        case TaskLocality.PROCESS_LOCAL => moreTasksToRunIn(pendingTasksForExecutor) // pending queues keyed by executor; true means pending tasks remain
        case TaskLocality.NODE_LOCAL => moreTasksToRunIn(pendingTasksForHost) // pending queues keyed by host
        case TaskLocality.NO_PREF => pendingTasksWithNoPrefs.nonEmpty // tasks with no locality preference
        case TaskLocality.RACK_LOCAL => moreTasksToRunIn(pendingTasksForRack)
      }
      if (!moreTasks) { // no pending task is left at this level
        // This is a performance optimization: if there are no more tasks that can
        // be scheduled at a particular locality level, there is no point in waiting
        // for the locality wait timeout (SPARK-4939).
        lastLaunchTime = curTime // reset lastLaunchTime
        logDebug(s"No tasks for locality level ${myLocalityLevels(currentLocalityIndex)}, " +
          s"so moving to locality level ${myLocalityLevels(currentLocalityIndex + 1)}")
        currentLocalityIndex += 1
      } else if (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) { // waited at least the locality wait configured for this level (spark.locality.wait)
        // Jump to the next locality level, and reset lastLaunchTime so that the next locality
        // wait timer doesn't immediately expire
        lastLaunchTime += localityWaits(currentLocalityIndex) // advance lastLaunchTime by the wait just used up
        logDebug(s"Moving to ${myLocalityLevels(currentLocalityIndex + 1)} after waiting for " +
          s"${localityWaits(currentLocalityIndex)}ms")
        currentLocalityIndex += 1
      } else {
        return myLocalityLevels(currentLocalityIndex) // the TaskLocality at the current level
      }
    }
    myLocalityLevels(currentLocalityIndex) // the TaskLocality at the current level
  }

  /**
   * Find the index in myLocalityLevels for a given locality. This is also designed to work with
   * localities that are not in myLocalityLevels (in case we somehow get those) by returning the
   * next-biggest level we have. Uses the fact that the last value in myLocalityLevels is ANY.
   */
  // Returns the index of locality in myLocalityLevels
  def getLocalityIndex(locality: TaskLocality.TaskLocality): Int = {
    var index = 0
    while (locality > myLocalityLevels(index)) {
      index += 1
    }
    index
  }

  /**
   * Check whether the given task set has been blacklisted to the point that it can't run anywhere.
   *
   * It is possible that this taskset has become impossible to schedule *anywhere* due to the
   * blacklist.  The most common scenario would be if there are fewer executors than
   * spark.task.maxFailures. We need to detect this so we can fail the task set, otherwise the job
   * will hang.
   *
   * There's a tradeoff here: we could make sure all tasks in the task set are schedulable, but that
   * would add extra time to each iteration of the scheduling loop. Here, we take the approach of
   * making sure at least one of the unscheduled tasks is schedulable. This means we may not detect
   * the hang as quickly as we could have, but we'll always detect the hang eventually, and the
   * method is faster in the typical case. In the worst case, this method can take
   * O(maxTaskFailures + numTasks) time, but it will be faster when there haven't been any task
   * failures (this is because the method picks one unscheduled task, and then iterates through each
   * executor until it finds one that the task isn't blacklisted on).
   */
  private[scheduler] def abortIfCompletelyBlacklisted( // with blacklisting disabled (Spark's default) blacklistTrackerOpt is None, and so is taskSetBlacklistHelperOpt
      hostToExecutors: HashMap[String, HashSet[String]]): Unit = {
    taskSetBlacklistHelperOpt.foreach { taskSetBlacklist =>
      val appBlacklist = blacklistTracker.get
      // Only look for unschedulable tasks when at least one executor has registered. Otherwise,
      // task sets will be (unnecessarily) aborted in cases when no executors have registered yet.
      if (hostToExecutors.nonEmpty) {
        // find any task that needs to be scheduled
        val pendingTask: Option[Int] = {
          // usually this will just take the last pending task, but because of the lazy removal
          // from each list, we may need to go deeper in the list.  We poll from the end because
          // failed tasks are put back at the end of allPendingTasks, so we're more likely to find
          // an unschedulable task this way.
          val indexOffset = allPendingTasks.lastIndexWhere { indexInTaskSet =>
            copiesRunning(indexInTaskSet) == 0 && !successful(indexInTaskSet)
          }
          if (indexOffset == -1) {
            None
          } else {
            Some(allPendingTasks(indexOffset))
          }
        }

        pendingTask.foreach { indexInTaskSet =>
          // try to find some executor this task can run on.  Its possible that some *other*
          // task isn't schedulable anywhere, but we will discover that in some later call,
          // when that unschedulable task is the last task remaining.
          val blacklistedEverywhere = hostToExecutors.forall { case (host, execsOnHost) =>
            // Check if the task can run on the node
            val nodeBlacklisted =
              appBlacklist.isNodeBlacklisted(host) ||
                taskSetBlacklist.isNodeBlacklistedForTaskSet(host) ||
                taskSetBlacklist.isNodeBlacklistedForTask(host, indexInTaskSet)
            if (nodeBlacklisted) {
              true
            } else {
              // Check if the task can run on any of the executors
              execsOnHost.forall { exec =>
                appBlacklist.isExecutorBlacklisted(exec) ||
                  taskSetBlacklist.isExecutorBlacklistedForTaskSet(exec) ||
                  taskSetBlacklist.isExecutorBlacklistedForTask(exec, indexInTaskSet)
              }
            }
          }
          if (blacklistedEverywhere) {
            val partition = tasks(indexInTaskSet).partitionId
            abort(s"""
              |Aborting $taskSet because task $indexInTaskSet (partition $partition)
              |cannot run anywhere due to node and executor blacklist.
              |Most recent failure:
              |${taskSetBlacklist.getLatestFailureReason}
              |
              |Blacklisting behavior can be configured via spark.blacklist.*.
              |""".stripMargin)
          }
        }
      }
    }
  }

  /**
   * Marks the task as getting result and notifies the DAG Scheduler
   */
  def handleTaskGettingResult(tid: Long): Unit = {
    val info = taskInfos(tid)
    info.markGettingResult(clock.getTimeMillis())
    sched.dagScheduler.taskGettingResult(info)
  }

  /**
   * Check whether has enough quota to fetch the result with `size` bytes
   */
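  // Checks whether the results fetched by the driver exceed spark.driver.maxResultSize in total; if they do, aborts the TaskSet and returns false, meaning no more results can be fetched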
  def canFetchMoreResults(size: Long): Boolean = sched.synchronized {
    totalResultSize += size
    calculatedTasks += 1
    if (maxResultSize > 0 && totalResultSize > maxResultSize) {
      val msg = s"Total size of serialized results of ${calculatedTasks} tasks " +
        s"(${Utils.bytesToString(totalResultSize)}) is bigger than spark.driver.maxResultSize " +
        s"(${Utils.bytesToString(maxResultSize)})"
      logError(msg)
      abort(msg)
      false
    } else {
      true
    }
  }

  /**
   * Marks a task as successful and notifies the DAGScheduler that the task has ended.
   */
  // Handles a successful task; called from TaskSchedulerImpl.handleSuccessfulTask.
  // The chain starts when an executor finishes a task and sends a StatusUpdate message to the driver's CoarseGrainedSchedulerBackend, which calls
  // TaskSchedulerImpl.statusUpdate. Depending on the task's final state, statusUpdate goes through TaskResultGetter and TaskSchedulerImpl to reach this class's handleSuccessfulTask or handleFailedTask.
  def handleSuccessfulTask(tid: Long, result: DirectTaskResult[_]): Unit = {
    val info = taskInfos(tid) // look up the TaskInfo
    val index = info.index
    info.markFinished(TaskState.FINISHED, clock.getTimeMillis()) // mark as finished successfully
    if (speculationEnabled) { // with speculation enabled, record the duration of the successful task
      successfulTaskDurations.insert(info.duration)
    }
    removeRunningTask(tid) // remove this task id from runningTasksSet and decrement the Pool's count

    // Kill any other attempts for the same task (since those are unnecessary now that one
    // attempt completed successfully).
    for (attemptInfo <- taskAttempts(index) if attemptInfo.running) {
      logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
        s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
        s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
      killedByOtherAttempt += attemptInfo.taskId
      sched.backend.killTask(
        attemptInfo.taskId,
        attemptInfo.executorId,
        interruptThread = true,
        reason = "another attempt succeeded")
    }
    if (!successful(index)) {
      tasksSuccessful += 1
      logInfo(s"Finished task ${info.id} in stage ${taskSet.id} (TID ${info.taskId}) in" +
        s" ${info.duration} ms on ${info.host} (executor ${info.executorId})" +
        s" ($tasksSuccessful/$numTasks)")
      // Mark successful and stop if all the tasks have succeeded.
      successful(index) = true
      if (tasksSuccessful == numTasks) {
        isZombie = true
      }
    } else {
      logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id +
        " because task " + index + " has already completed successfully")
    }
    // There may be multiple tasksets for this stage -- we let all of them know that the partition
    // was completed.  This may result in some of the tasksets getting completed.
    sched.markPartitionCompletedInAllTaskSets(stageId, tasks(index).partitionId, info)
    // This method is called by "TaskSchedulerImpl.handleSuccessfulTask" which holds the
    // "TaskSchedulerImpl" lock until exiting. To avoid the SPARK-7655 issue, we should not
    // "deserialize" the value when holding a lock to avoid blocking other threads. So we call
    // "result.value()" in "TaskResultGetter.enqueueSuccessfulTask" before reaching here.
    // Note: "result.value()" only deserializes the value when it's called at the first time, so
    // here "result.value()" just returns the value and won't block other threads.
    sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    maybeFinishTaskSet() // the task that just finished may be the last one of this stage, so check whether all of its tasks are done
  }

  // partitionId is looked up in partitionToIndex, which comes from tasks.zipWithIndex
  // Marks the task for this partitionId as completed; it may be the last task of this stage, hence the call to maybeFinishTaskSet
  // Used by markPartitionCompletedInAllTaskSets in TaskSchedulerImpl
  private[scheduler] def markPartitionCompleted(partitionId: Int, taskInfo: TaskInfo): Unit = {
    partitionToIndex.get(partitionId).foreach { index =>
      if (!successful(index)) { // only when the task is not marked successful yet
        if (speculationEnabled && !isZombie) {
          successfulTaskDurations.insert(taskInfo.duration)
        }
        tasksSuccessful += 1 // mark as successful
        successful(index) = true
        if (tasksSuccessful == numTasks) { // once every task has succeeded, mark this TaskSet as a zombie
          isZombie = true
        }
        maybeFinishTaskSet() // the task that just finished may be the last one of this stage, so check whether all of its tasks are done
      }
    }
  }

  /**
   * Marks the task as failed, re-adds it to the list of pending tasks, and notifies the
   * DAG Scheduler.
   */
  // Handles a failed task; in some cases the task is put back into the pending queues.
  // Called from TaskSchedulerImpl.handleFailedTask.
  // The chain starts when an executor finishes a task and sends a StatusUpdate message to the driver's CoarseGrainedSchedulerBackend, which calls
  // TaskSchedulerImpl.statusUpdate. Depending on the task's final state, statusUpdate goes through TaskResultGetter and TaskSchedulerImpl to reach this class's handleSuccessfulTask or handleFailedTask.
  def handleFailedTask(tid: Long, state: TaskState, reason: TaskFailedReason) {
    val info = taskInfos(tid) // look up the TaskInfo
    if (info.failed || info.killed) { // already failed or killed: return immediately
      return
    }
    removeRunningTask(tid) // remove this task id from runningTasksSet and decrement the Pool's count
    info.markFinished(state, clock.getTimeMillis()) // mark this task as finished
    val index = info.index
    copiesRunning(index) -= 1 // one fewer running copy of this task
    var accumUpdates: Seq[AccumulatorV2[_, _]] = Seq.empty
    val failureReason = s"Lost task ${info.id} in stage ${taskSet.id} (TID $tid, ${info.host}," +
      s" executor ${info.executorId}): ${reason.toErrorString}"
    val failureException: Option[Throwable] = reason match {
      case fetchFailed: FetchFailed => // fetching shuffle data failed
        logWarning(failureReason)
        if (!successful(index)) { // mark as successful so that this TaskSet does not re-run the task
          successful(index) = true
          tasksSuccessful += 1
        }
        isZombie = true

        if (fetchFailed.bmAddress != null) {
          blacklistTracker.foreach(_.updateBlacklistForFetchFailure(
            fetchFailed.bmAddress.host, fetchFailed.bmAddress.executorId))
        }

        None

      case ef: ExceptionFailure =>
        // ExceptionFailure's might have accumulator updates
        accumUpdates = ef.accums
        if (ef.className == classOf[NotSerializableException].getName) {
          // If the task result wasn't serializable, there's no point in trying to re-execute it.
          logError("Task %s in stage %s (TID %d) had a not serializable result: %s; not retrying"
            .format(info.id, taskSet.id, tid, ef.description))
          abort("Task %s in stage %s (TID %d) had a not serializable result: %s".format(
            info.id, taskSet.id, tid, ef.description))
          return
        }
        val key = ef.description
        val now = clock.getTimeMillis()
        val (printFull, dupCount) = {
          if (recentExceptions.contains(key)) {
            val (dupCount, printTime) = recentExceptions(key)
            if (now - printTime > EXCEPTION_PRINT_INTERVAL) {
              recentExceptions(key) = (0, now)
              (true, 0)
            } else {
              recentExceptions(key) = (dupCount + 1, printTime)
              (false, dupCount + 1)
            }
          } else {
            recentExceptions(key) = (0, now)
            (true, 0)
          }
        }
        if (printFull) {
          logWarning(failureReason)
        } else {
          logInfo(
            s"Lost task ${info.id} in stage ${taskSet.id} (TID $tid) on ${info.host}, executor" +
              s" ${info.executorId}: ${ef.className} (${ef.description}) [duplicate $dupCount]")
        }
        ef.exception

      case e: ExecutorLostFailure if !e.exitCausedByApp =>
        logInfo(s"Task $tid failed because while it was being computed, its executor " +
          "exited for a reason unrelated to the task. Not counting this failure towards the " +
          "maximum number of failures for the task.")
        None

      case e: TaskFailedReason =>  // TaskResultLost, TaskKilled, and others
        logWarning(failureReason)
        None
    }

    sched.dagScheduler.taskEnded(tasks(index), reason, null, accumUpdates, info)

    if (!isZombie && reason.countTowardsTaskFailures) {
      assert (null != failureReason)
      taskSetBlacklistHelperOpt.foreach(_.updateBlacklistForFailedTask(
        info.host, info.executorId, index, failureReason))
      numFailures(index) += 1
      if (numFailures(index) >= maxTaskFailures) {
        logError("Task %d in stage %s failed %d times; aborting job".format(
          index, taskSet.id, maxTaskFailures))
        abort("Task %d in stage %s failed %d times, most recent failure: %s\nDriver stacktrace:"
          .format(index, taskSet.id, maxTaskFailures, failureReason), failureException)
        return
      }
    }

    if (successful(index)) { // marked successful, so it is not put back into the pending queues
      logInfo(s"Task ${info.id} in stage ${taskSet.id} (TID $tid) failed, but the task will not" +
        s" be re-executed (either because the task failed with a shuffle data fetch failure," +
        s" so the previous stage needs to be re-run, or because a different copy of the task" +
        s" has already succeeded).")
    } else {
      addPendingTask(index) // put it back into the pending queues to be run again
    }

    maybeFinishTaskSet() // the failed task may have been the last one running in this stage, so check whether the TaskSet is finished
  }

  // Aborts the TaskSet explicitly (taskSetFailed)
  def abort(message: String, exception: Option[Throwable] = None): Unit = sched.synchronized {
    // TODO: Kill running tasks if we were not terminated due to a Mesos error
    sched.dagScheduler.taskSetFailed(taskSet, message, exception) // tell the DAGScheduler that this TaskSet failed; the DAGScheduler then uses TaskSchedulerImpl to cancel all tasks of this stage
    isZombie = true // mark as zombie so that no more tasks are launched
    maybeFinishTaskSet() // finish the TaskSet now if no task is still running
  }

  /** If the given task ID is not in the set of running tasks, adds it.
   *
   * Used to keep track of the number of running tasks, for enforcing scheduling policies.
   */
  // Adds tid to runningTasksSet and bumps the Pool's counter through increaseRunningTasks
  // Used in this class's resourceOffer method
  def addRunningTask(tid: Long) {
    if (runningTasksSet.add(tid) && parent != null) { // runningTasksSet is the set of ids of running tasks
      parent.increaseRunningTasks(1) // update the Pool's runningTasks count
    }
  }

  /** If the given task ID is in the set of running tasks, removes it. */
  // Removes this task id from runningTasksSet and decrements the Pool's count
  // Used in this class's handleSuccessfulTask and handleFailedTask methods
  def removeRunningTask(tid: Long) {
    if (runningTasksSet.remove(tid) && parent != null) { // runningTasksSet is the set of ids of running tasks
      parent.decreaseRunningTasks(1)
    }
  }

  override def getSchedulableByName(name: String): Schedulable = {
    null
  }

  override def addSchedulable(schedulable: Schedulable) {}

  override def removeSchedulable(schedulable: Schedulable) {}

  override def getSortedTaskSetQueue(): ArrayBuffer[TaskSetManager] = {
    val sortedTaskSetQueue = new ArrayBuffer[TaskSetManager]()
    sortedTaskSetQueue += this
    sortedTaskSetQueue
  }

  /** Called by TaskScheduler when an executor is lost so we can re-enqueue our tasks */
  // TaskSchedulerImpl.removeExecutor calls Pool.executorLost, which in turn calls this method
  override def executorLost(execId: String, host: String, reason: ExecutorLossReason) {
    // Re-enqueue any tasks that ran on the failed executor if this is a shuffle map stage,
    // and we are not using an external shuffle server which could serve the shuffle outputs.
    // The reason is the next stage wouldn't be able to fetch the data from this dead executor
    // so we would need to rerun these tasks on other executors.
    if (tasks(0).isInstanceOf[ShuffleMapTask] && !env.blockManager.externalShuffleServiceEnabled
        && !isZombie) {
      for ((tid, info) <- taskInfos if info.executorId == execId) {
        val index = taskInfos(tid).index
        if (successful(index) && !killedByOtherAttempt.contains(tid)) {
          successful(index) = false
          copiesRunning(index) -= 1
          tasksSuccessful -= 1
          addPendingTask(index)
          // Tell the DAGScheduler that this task was resubmitted so that it doesn't think our
          // stage finishes when a total of tasks.size tasks finish.
          sched.dagScheduler.taskEnded(
            tasks(index), Resubmitted, null, Seq.empty, info)
        }
      }
    }
    for ((tid, info) <- taskInfos if info.running && info.executorId == execId) {
      val exitCausedByApp: Boolean = reason match {
        case exited: ExecutorExited => exited.exitCausedByApp
        case ExecutorKilled => false
        case _ => true
      }
      handleFailedTask(tid, TaskState.FAILED, ExecutorLostFailure(info.executorId, exitCausedByApp,
        Some(reason.toString)))
    }
    // recalculate valid locality levels and waits when executor is lost
    recomputeLocality()
  }

  /**
   * Check for tasks to be speculated and return true if there are any. This is called periodically
   * by the TaskScheduler.
   *
   */
  // Called from Pool.checkSpeculatableTasks
  override def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
    // Can't speculate if we only have one task, and no need to speculate if the task set is a
    // zombie.
    if (isZombie || numTasks == 1) { // a zombie TaskSet or a single task: no speculative execution
      return false
    }
    var foundTasks = false
    val minFinishedForSpeculation = (SPECULATION_QUANTILE * numTasks).floor.toInt // SPECULATION_QUANTILE = 0.75 by default
    logDebug("Checking for speculative tasks: minFinished = " + minFinishedForSpeculation)

    if (tasksSuccessful >= minFinishedForSpeculation && tasksSuccessful > 0) { // speculation starts only after the number of successful tasks reaches the threshold
      val time = clock.getTimeMillis()
      val medianDuration: Double = successfulTaskDurations.median
      val threshold = max(SPECULATION_MULTIPLIER * medianDuration, minTimeToSpeculation)
      // TODO: Threshold should also look at standard deviation of task durations and have a lower
      // bound based on that.
      logDebug("Task length threshold for speculation: " + threshold)
      for (tid <- runningTasksSet) {
        val info = taskInfos(tid)
        val index = info.index
        if (!successful(index) && copiesRunning(index) == 1 && info.timeRunning(time) > threshold && // this task has been running longer than the threshold
          !speculatableTasks.contains(index)) { // meets the conditions for speculation
          logInfo(
            "Marking task %d in stage %s (on %s) as speculatable because it ran more than %.0f ms"
              .format(index, taskSet.id, info.host, threshold))
          speculatableTasks += index // add to the set of speculatable tasks
          sched.dagScheduler.speculativeTaskSubmitted(tasks(index)) // tell the DAGScheduler that a speculative task was submitted
          foundTasks = true
        }
      }
    }
    foundTasks // true if any task qualified for speculation
  }

  private def getLocalityWait(level: TaskLocality.TaskLocality): Long = {
    val defaultWait = conf.get(config.LOCALITY_WAIT) // spark.locality.wait, 3s by default
    val localityWaitKey = level match {
      case TaskLocality.PROCESS_LOCAL => "spark.locality.wait.process"
      case TaskLocality.NODE_LOCAL => "spark.locality.wait.node"
      case TaskLocality.RACK_LOCAL => "spark.locality.wait.rack"
      case _ => null
    }

    if (localityWaitKey != null) {
      conf.getTimeAsMs(localityWaitKey, defaultWait.toString)
    } else {
      0L
    }
  }

  /**
   * Compute the locality levels used in this TaskSet. Assumes that all tasks have already been
   * added to queues using addPendingTask.
   *
   */
  // Computes the valid task locality levels; usually every level except RACK_LOCAL is present
  private def computeValidLocalityLevels(): Array[TaskLocality.TaskLocality] = {
    import TaskLocality.{PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY}
    val levels = new ArrayBuffer[TaskLocality.TaskLocality]
    if (!pendingTasksForExecutor.isEmpty &&
        pendingTasksForExecutor.keySet.exists(sched.isExecutorAlive(_))) { // the executor-to-pending-tasks map is non-empty and at least one of its executors is alive
      levels += PROCESS_LOCAL // add PROCESS_LOCAL to the levels
    }
    if (!pendingTasksForHost.isEmpty &&
        pendingTasksForHost.keySet.exists(sched.hasExecutorsAliveOnHost(_))) {
      levels += NODE_LOCAL // add NODE_LOCAL to the levels
    }
    if (!pendingTasksWithNoPrefs.isEmpty) {
      levels += NO_PREF // add NO_PREF to the levels
    }
    if (!pendingTasksForRack.isEmpty &&
        pendingTasksForRack.keySet.exists(sched.hasHostAliveOnRack(_))) {
      levels += RACK_LOCAL
    }
    levels += ANY // add ANY to the levels
    logDebug("Valid locality levels for " + taskSet + ": " + levels.mkString(", "))
    levels.toArray
  }

  def recomputeLocality() {
    val previousLocalityLevel = myLocalityLevels(currentLocalityIndex)
    myLocalityLevels = computeValidLocalityLevels()
    localityWaits = myLocalityLevels.map(getLocalityWait)
    currentLocalityIndex = getLocalityIndex(previousLocalityLevel)
  }

  def executorAdded() {
    recomputeLocality()
  }
}
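
The delay-scheduling loop in getAllowedLocalityLevel above can be condensed into a small standalone model. This is only a sketch under stated assumptions: `DelaySchedulingSketch`, `State`, `allowedLevel`, `hasPending`, the three level names and the wait values are all hypothetical stand-ins chosen for illustration, not Spark's classes or defaults.

```scala
// A simplified, standalone model of delay scheduling (not Spark's actual classes).
object DelaySchedulingSketch {
  // Hypothetical stand-ins for TaskLocality values, ordered from most to least local.
  val levels: Array[String] = Array("PROCESS_LOCAL", "NODE_LOCAL", "ANY")
  // Hypothetical wait per level in milliseconds; the last level never waits.
  val waits: Array[Long] = Array(3000L, 3000L, 0L)

  // Mutable scheduling state, mirroring currentLocalityIndex and lastLaunchTime.
  final class State(var index: Int, var lastLaunchTime: Long)

  // hasPending(i) says whether level i still has tasks waiting to be scheduled.
  def allowedLevel(state: State, curTime: Long, hasPending: Int => Boolean): String = {
    while (state.index < levels.length - 1) {
      if (!hasPending(state.index)) {
        // Nothing to run at this level: skip it without waiting.
        state.lastLaunchTime = curTime
        state.index += 1
      } else if (curTime - state.lastLaunchTime >= waits(state.index)) {
        // Waited long enough: fall back to the next, less local level.
        state.lastLaunchTime += waits(state.index)
        state.index += 1
      } else {
        // Still within the wait for this level: stay here.
        return levels(state.index)
      }
    }
    levels(state.index)
  }
}
```

In this model, as in the listing, the level only moves toward less local values inside the loop; moving back to a more local level is not part of this method.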

object TaskSetManager

private[spark] object TaskSetManager {
  // The user will be warned if any stages contain a task that has a serialized size greater than
  // this.
  val TASK_SIZE_TO_WARN_KB = 100
}
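
How a threshold like TASK_SIZE_TO_WARN_KB can be applied is sketched below. This is an illustration, not the code from TaskSetManager: `TaskSizeWarningSketch` and `sizeWarning` are hypothetical names, and comparing the byte count against the threshold times 1024 is an assumption about how such a check is usually written.

```scala
object TaskSizeWarningSketch {
  // Same value as TASK_SIZE_TO_WARN_KB in the listing above.
  val TaskSizeToWarnKb = 100

  // Returns a warning message when the serialized task is larger than the threshold, else None.
  def sizeWarning(serializedBytes: Int): Option[String] = {
    if (serializedBytes > TaskSizeToWarnKb * 1024) {
      val sizeKb = serializedBytes / 1024
      Some(s"Task is very large ($sizeKb KB). The maximum recommended task size is $TaskSizeToWarnKb KB.")
    } else {
      None
    }
  }
}
```

The check only produces a message; in this sketch a large task is neither rejected nor delayed.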

TaskResultGetter

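After an executor finishes a task, TaskResultGetter handles the outcome according to the TaskState: for TaskState.FINISHED, taskResultGetter.enqueueSuccessfulTask is used; for TaskState.FAILED | TaskState.KILLED | TaskState.LOST, enqueueFailedTask is used.
enqueueSuccessfulTask mainly deserializes the data to obtain the result of the task that ran on the executor;
enqueueFailedTask mainly deserializes the data to obtain the reason for the task failure.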
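
The routing by task state described here can be sketched as a pattern match. The `State` hierarchy below is a hypothetical, reduced stand-in for TaskState, and `route` only returns the name of the TaskResultGetter method that would be used instead of calling it.

```scala
object StatusDispatchSketch {
  // Hypothetical, reduced copy of the task states named in the text.
  sealed trait State
  case object Finished extends State
  case object Failed extends State
  case object Killed extends State
  case object Lost extends State
  case object Running extends State

  // Returns the name of the TaskResultGetter method that would handle the state, if any.
  def route(state: State): Option[String] = state match {
    case Finished => Some("enqueueSuccessfulTask")
    case Failed | Killed | Lost => Some("enqueueFailedTask")
    case Running => None // not a terminal state, nothing to enqueue
  }
}
```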

// Instantiated at line 142 of TaskSchedulerImpl
private[spark] class TaskResultGetter(sparkEnv: SparkEnv, scheduler: TaskSchedulerImpl)
  extends Logging {

  private val THREADS = sparkEnv.conf.getInt("spark.resultGetter.threads", 4)

  // Exposed for testing.
  protected val getTaskResultExecutor: ExecutorService =
    ThreadUtils.newDaemonFixedThreadPool(THREADS, "task-result-getter")

  // Exposed for testing.
  protected val serializer = new ThreadLocal[SerializerInstance] {
    override def initialValue(): SerializerInstance = {
      sparkEnv.closureSerializer.newInstance()
    }
  }

  protected val taskResultSerializer = new ThreadLocal[SerializerInstance] {
    override def initialValue(): SerializerInstance = {
      sparkEnv.serializer.newInstance()
    }
  }
  // Enqueues a successful task
  // Called from statusUpdate in TaskSchedulerImpl when the task's state is TaskState.FINISHED
  // Mainly deserializes the data to obtain the result of the task that ran on the executor
  def enqueueSuccessfulTask(
      taskSetManager: TaskSetManager,
      tid: Long,
      serializedData: ByteBuffer): Unit = {
    getTaskResultExecutor.execute(new Runnable {
      override def run(): Unit = Utils.logUncaughtExceptions {
        try {
          val (result, size) = serializer.get().deserialize[TaskResult[_]](serializedData) match {
            case directResult: DirectTaskResult[_] =>
              if (!taskSetManager.canFetchMoreResults(serializedData.limit())) { // over the result size limit
                return
              }
              // deserialize "value" without holding any lock so that it won't block other threads.
              // We should call it here, so that when it's called again in
              // "TaskSetManager.handleSuccessfulTask", it does not need to deserialize the value.
              directResult.value(taskResultSerializer.get()) // value() does the deserialization here and yields the result of the task from the executor
              (directResult, serializedData.limit())
            case IndirectTaskResult(blockId, size) =>
              if (!taskSetManager.canFetchMoreResults(size)) {
                // dropped by executor if size is larger than maxResultSize
                sparkEnv.blockManager.master.removeBlock(blockId)
                return
              }
              logDebug("Fetching indirect task result for TID %s".format(tid))
              scheduler.handleTaskGettingResult(taskSetManager, tid)
              val serializedTaskResult = sparkEnv.blockManager.getRemoteBytes(blockId)
              if (!serializedTaskResult.isDefined) {
                /* We won't be able to get the task result if the machine that ran the task failed
                 * between when the task ended and when we tried to fetch the result, or if the
                 * block manager had to flush the result. */
                scheduler.handleFailedTask(
                  taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
                return
              }
              val deserializedResult = serializer.get().deserialize[DirectTaskResult[_]](
                serializedTaskResult.get.toByteBuffer)
              // force deserialization of referenced value
              deserializedResult.value(taskResultSerializer.get())
              sparkEnv.blockManager.master.removeBlock(blockId)
              (deserializedResult, size)
          }

          // Set the task result size in the accumulator updates received from the executors.
          // We need to do this here on the driver because if we did this on the executors then
          // we would have to serialize the result again after updating the size.
          result.accumUpdates = result.accumUpdates.map { a =>
            if (a.name == Some(InternalAccumulator.RESULT_SIZE)) {
              val acc = a.asInstanceOf[LongAccumulator]
              assert(acc.sum == 0L, "task result size should not have been set on the executors")
              acc.setValue(size.toLong)
              acc
            } else {
              a
            }
          }

          scheduler.handleSuccessfulTask(taskSetManager, tid, result)
        } catch {
          case cnf: ClassNotFoundException =>
            val loader = Thread.currentThread.getContextClassLoader
            taskSetManager.abort("ClassNotFound with classloader: " + loader)
          // Matching NonFatal so we don't catch the ControlThrowable from the "return" above.
          case NonFatal(ex) =>
            logError("Exception while getting task result", ex)
            taskSetManager.abort("Exception while getting task result: %s".format(ex))
        }
      }
    })
  }

  // Enqueues a failed task
  // Called from statusUpdate in TaskSchedulerImpl when the task's state is TaskState.FAILED | TaskState.KILLED | TaskState.LOST
  // Mainly obtains the reason for the task failure
  def enqueueFailedTask(taskSetManager: TaskSetManager, tid: Long, taskState: TaskState,
    serializedData: ByteBuffer) {
    var reason : TaskFailedReason = UnknownReason
    try {
      getTaskResultExecutor.execute(new Runnable {
        override def run(): Unit = Utils.logUncaughtExceptions {
          val loader = Utils.getContextOrSparkClassLoader
          try {
            if (serializedData != null && serializedData.limit() > 0) {
              reason = serializer.get().deserialize[TaskFailedReason](
                serializedData, loader)
            }
          } catch {
            case cnd: ClassNotFoundException =>
              // Log an error but keep going here -- the task failed, so not catastrophic
              // if we can't deserialize the reason.
              logError(
                "Could not deserialize TaskEndReason: ClassNotFound with classloader " + loader)
            case ex: Exception => // No-op
          } finally {
            // If there's an error while deserializing the TaskEndReason, this Runnable
            // will die. Still tell the scheduler about the task failure, to avoid a hang
            // where the scheduler thinks the task is still running.
            scheduler.handleFailedTask(taskSetManager, tid, taskState, reason)
          }
        }
      })
    } catch {
      case e: RejectedExecutionException if sparkEnv.isStopped =>
        // ignore it
    }
  }

  def stop() {
    getTaskResultExecutor.shutdownNow()
  }
}
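
Both branches of enqueueSuccessfulTask first ask canFetchMoreResults whether the driver may accept another result. The accounting behind that check can be sketched on its own. `ResultQuotaSketch` and `ResultQuota` are hypothetical names; unlike the method in TaskSetManager, this sketch only returns the verdict and does not log or abort anything.

```scala
object ResultQuotaSketch {
  // Tracks the total size of results accepted so far against a fixed limit in bytes.
  final class ResultQuota(maxResultSize: Long) {
    private var totalResultSize = 0L

    // Adds `size` to the running total and reports whether the total is still within the limit.
    // A non-positive maxResultSize means "no limit", as in the listing above.
    def canFetchMoreResults(size: Long): Boolean = synchronized {
      totalResultSize += size
      !(maxResultSize > 0 && totalResultSize > maxResultSize)
    }
  }
}
```

Note that the size is added before the comparison, so the result that crosses the limit is itself rejected, and every later call is rejected too.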

笋丁网页自动回复机器人V3.0.0免授权版源码希希分享软希网58soho_cn 源码资源笋丁网页自动回复机器人
笋丁网页机器人一款可设置自动回复，默认消息，调用自定义api接口的网页机器人。此程序后端语言使用Golang，内存占用最高不超过30MB，1H1G服务器流畅运行。仅支持Linux服务器部署，不支持虚拟主机，请悉知！使用自定义api功能需要有一定的建站基础。源码下载：https://download.csdn.net/download/m0_66047725/89754250更多资源下载：关注我。安
ESP32-C3入门教程网络篇⑩——基于esp_https_ota和MQTT实现开机主动升级和被动触发升级的OTA功能小康师兄 ESP32-C3入门教程 https 服务器 esp32 OTA MQTT
文章目录一、前言二、软件流程三、部分源码四、运行演示一、前言本文基于VSCodeIDE进行编程、编译、下载、运行等操作基础入门章节请查阅：ESP32-C3入门教程基础篇①——基于VSCode构建HelloWorld教程目录大纲请查阅：ESP32-C3入门教程——导读ESP32-C3入门教程网络篇⑨——基于esp_https_ota实现史上最简单的ESP32OTA远程固件升级功能二、软件流程
【Python搞定车载自动化测试】——Python实现车载以太网DoIP刷写（含Python源码）疯狂的机器人 Python搞定车载自动化 python DoIP UDS ISO 14229 1SO 13400 Bootloader tcp/ip
系列文章目录【Python搞定车载自动化测试】系列文章目录汇总文章目录系列文章目录前言一、环境搭建1.软件环境2.硬件环境二、目录结构三、源码展示1.DoIP诊断基础函数方法2.DoIP诊断业务函数方法3.27服务安全解锁4.DoIP自动化刷写四、测试日志1.测试日志五、完整源码链接前言随着智能电动汽车行业的发展，汽车=智能终端+四个轮子，各家车企都推出了各自的OTA升级方案，本章节主要介绍如何使
进销存小程序源码 PHP网络版ERP进销存管理系统全开源可二开摸鱼小号 php
可直接源码搭建部署发布后使用：一、功能模块介绍该系统模板主要有进，销，存三个主要模板功能组成，下面将介绍各模块所对应的功能；进：需要将产品采购入库，自动生成采购明细台账同时关联财务生成付款账单；销：是指对客户的销售订单记录，汇总生成产品销售明细及回款计划；存：库存的日常盘点与统计，库存下限预警、出入库台账、库存位置等。1.进购管理采购订单：采购下单审批→由上级审批通过采购入库；采购入库：货品到货>
计算机毕业设计PHP仓储综合管理系统（源码+程序+VUE+lw+部署） java毕设程序源码王哥 php 课程设计 vue.js
该项目含有源码、文档、程序、数据库、配套开发软件、软件安装教程。欢迎交流项目运行环境配置：phpStudy+Vscode+Mysql5.7+HBuilderX+Navicat11+Vue+Express。项目技术：原生PHP++Vue等等组成，B/S模式+Vscode管理+前后端分离等等。环境需要1.运行环境：最好是小皮phpstudy最新版，我们在这个版本上开发的。其他版本理论上也可以。2.开发
JVM源码分析之堆外内存完全解读 HeapDump性能社区
概述广义的堆外内存说到堆外内存，那大家肯定想到堆内内存，这也是我们大家接触最多的，我们在jvm参数里通常设置-Xmx来指定我们的堆的最大值，不过这还不是我们理解的Java堆，-Xmx的值是新生代和老生代的和的最大值，我们在jvm参数里通常还会加一个参数-XX:MaxPermSize来指定持久代的最大值，那么我们认识的Java堆的最大值其实是-Xmx和-XX:MaxPermSize的总和，在分代算法
html+css网页设计旅游网站首页1个页面 html+css+js网页设计 html css 旅游
html+css网页设计旅游网站首页1个页面网页作品代码简单，可使用任意HTML辑软件（如：Dreamweaver、HBuilder、Vscode、Sublime、Webstorm、Text、Notepad++等任意html编辑软件进行运行及修改编辑等操作）。获取源码1，访问该网站https://download.csdn.net/download/qq_42431718/897527112，点击
Istio pilot-discovery服务发现源码解析（1.13版本） xidianjiapei001 #Istio istio 云原生服务发现
Istiopilot-discovery服务发现介绍工作机制初始化初始化Config控制器初始化Service控制器controller初始化NamespaceServiceNodePodPilotDiscovery各组件启动流程DiscoveryServer接收Envoy的gRPC连接请求流程Config变化后向Envoy推送更新的流程总结参考介绍IstioPilot的代码分为Pilot-Dis
python中文版软件下载-Python中文版编程大乐趣
python中文版是一种面向对象的解释型计算机程序设计语言。python中文版官网面向对象编程，拥有高效的高级数据结构和简单而有效的方法，其优雅的语法、动态类型、以及天然的解释能力，让它成为理想的语言。软件功能强大，简单易学，可以帮助用户快速编写代码，而且代码运行速度非常快，几乎可以支持所有的操作系统，实用性真的超高的。python中文版软件介绍：python中文版的解释器及其扩展标准库的源码和编
Scanpy源码浅析之pp.normalize_total 何物昂
版本导入Scanpy,其版本为'1.9.1'，如果你看到的源码和下文有差异，其可能是由于版本差异。importscanpyasscsc.__version__#'1.9.1'例子函数pp.normalize_total用于Normalizecountspercell，其源代码在scanpy/preprocessing/_normalization.py我们通过一个简单例子来了解该函数主要功能:将一
基于JavaWeb开发的Java+SpringMvc+vue+element实现上海汽车博物馆平台网顺技术团队成品程序项目 java vue.js 汽车课程设计 spring boot
基于JavaWeb开发的Java+SpringMvc+vue+element实现上海汽车博物馆平台作者主页网顺技术团队欢迎点赞收藏⭐留言文末获取源码联系方式查看下方微信号获取联系方式承接各种定制系统精彩系列推荐精彩专栏推荐订阅不然下次找不到哟Java毕设项目精品实战案例《1000套》感兴趣的可以先收藏起来，还有大家在毕设选题，项目以及论文编写等相关问题都可以给我留言咨询，希望帮助更多的人文章目录基
【K8s】专题十一：Kubernetes 集群证书过期处理方法行者Sun1989 Kubernetes kubernetes 云原生容器
本文内容均来自个人笔记并重新梳理，如有错误欢迎指正！如果对您有帮助，烦请点赞、关注、转发、订阅专栏！专栏订阅入口Linux专栏|Docker专栏|Kubernetes专栏往期精彩文章【Docker】（全网首发）KylinV10下MySQL容器内存占用异常的解决方法【Docker】（全网首发）KylinV10下MySQL容器内存占用异常的解决方法（续）【Docker】MySQL源码构建Docker镜
SAP自动化-ME12批量更新最后一行的价格小九不懂SAP 自动化 SAP python
Python源码#-Begin-----------------------------------------------------------------#-Includes--------------------------------------------------------------importsys,win32com.clientimportosimporttime#-Sub
linux gcc 格式,Linux下gcc与gdb简介神奇的战士 linux gcc 格式
gcc编译器可以将C、C++等语言源程序、汇编程序编译、链接成可执行程序。gdb是GNU开发的一个Unix/Linux下强大的程序调试工具。linux下没有后缀名的概念。但gcc根据文件的后缀来区别输入文件的类别：.cC语言源代码文件.a由目标文件构成的库文件.C、.cc、.cppC++源码文件.h头文件.i经过预处理之后的C语言文件.ii经过预处理之后的C++文件.o编译后的目标文件.s汇编源码
浅谈openresty 爱编码的钓鱼佬 nginx openresty 运维
熟悉了nginx后再来看openresty，不得不说openresty是比较优秀的。对nginx和openresty的历史等在这此就不介绍了。首先对标nginx，自然有优劣一、开发难度nginx：毫无疑问nginx的开发难度比较高，需要扎实的c/c++基础，而且还需要对nginx源码比较熟悉，开发效率慢，比如实现一个类似echo的功能，至少要上百行代码。而openresty只需要一句ngx.say
Golang Channel PandaSkr golang
Channel解析1.Channel源码分析1.1Channel数据结构typehchanstruct{qcountuint//channel的元素数量dataqsizuint//channel循环队列长度bufunsafe.Pointer//指向循环队列的指针elemsizeuint16//元素大小closeduint32//channel是否关闭0-未关闭elemtype*_type//元素类
分享一个基于python的电子书数据采集与可视化分析 hadoop电子书数据分析与推荐系统 spark大数据毕设项目（源码、调试、LW、开题、PPT) 计算机源码社 Python项目大数据大数据 python hadoop 计算机毕业设计选题计算机毕业设计源码数据分析 spark毕设
作者：计算机源码社个人简介：本人八年开发经验，擅长Java、Python、PHP、.NET、Node.js、Android、微信小程序、爬虫、大数据、机器学习等，大家有这一块的问题可以一起交流！学习资料、程序开发、技术解答、文档报告如需要源码，可以扫取文章下方二维码联系咨询Java项目微信小程序项目Android项目Python项目PHP项目ASP.NET项目Node.js项目选题推荐项目实战|p
使用FPGA接收MIPI CSI RX信号并进行去抖动、RGB转YUV处理：FX3014 USB3.0 UVC传输与帧率控制源代码，FPGA实现MIPI CSI RX接收，去Debayer， RGB转 kVfINoSzdrt fpga开发程序人生
fpgamipicsirx接收去debayer,rgb转yuv,fx3014usb3.0uvc传输与帧率控制源代码，具体架构看图，除dphy物理层外，mipi均为源码sensorimx219mipi源码mipi4lanecsirxraw10fpgamachXO3lf-690usb3.0fx301432bityuvdatawithframesync测试模式3280*246415fps1920*108
移动订货小程序哪个好批发订货系统源码哪个好多用户商城系统订货系统源码移动订货小程序批发订货系统订货系统源码
订货小程序就是依托微信小程序的订货系统，微信小程序订货系统相较于其他终端的订货方式，能够更快进入商城，对经销商而言更为方便。今天，我们一起盘点三个主流的移动订货小程序，看看哪个移动订货小程序好。第一、核货宝订货小程序核货宝是商淘科技旗下的订货系统，可为批发企业提供不同客户不同商品、不同客户不同价格快速订货和商家账期管理。功能介绍：客户批发订货的专属数字化订货系统，可以移动端订货。与传统手写开单相比
MacOS Catalina 从源码构建Qt6.2开发库之01: 编译Qt6.2源代码捕鲸叉 QT macos c++QT
安装xcode，cmake，ninjabrewinstallnodemac下安装OpenGL库并使之对各项目可见在macOS上安装OpenGL通常涉及到安装一些依赖库，如MGL、GLUT或者是GLEW等，同时确保LLVM的OpenGL框架和相关工具链的兼容性。以下是一个基本的安装步骤，你可以在终端中执行：安装Homebrew（如果还没有安装的话）：/bin/bash-c"$(curl-fsSLht
基于Python执行lua脚本 xu-jssy Python自动化脚本 python lua 自动化 rpa
一、依赖安装pipinstalllupa二、源码将lua文件存放在base_path路径，将lua文件名称（不包含后缀名）传递给lua_runner函数即可importmultiprocessingimportlupa#lua文件存放位置base_path='D:\\test\\lua'classLuaFuncion:#创建Lua运行时环境lua=lupa.LuaRuntime(unpack_re
Python实现mysql命令行 xu-jssy python mysql adb
一、源码importosimportpymysqldefsql_shell():password=input("EnterPassword:")#访问密码ifpassword.strip()!="yyds":print("Bye")return#清空控制台输出os.system("cls"ifos.name=="nt"else"clear")try:#连接到MySQL数据库conn=pymysql
html页面js获取参数值 0624chenhong html
1.js获取参数值js function GetQueryString(name) { var reg = new RegExp("(^|&)"+ name +"=([^&]*)(&|$)"); var r = windo
MongoDB 在多线程高并发下的问题 BigCat2013 mongodb DB 高并发重复数据
最近项目用到 MongoDB , 主要是一些读取数据及改状态位的操作. 因为是结合了最近流行的 Storm进行大数据的分析处理，并将分析结果插入Vertica数据库，所以在多线程高并发的情境下, 会发现 Vertica 数据库中有部分重复的数据. 这到底是什么原因导致的呢？笔者开始也是一筹莫展，重复去看 MongoDB 的 API , 终于有了新发现： com.mongodb.DB 这个类有
c++ 用类模版实现链表(c++语言程序设计第四版示例代码) CrazyMizzz 数据结构 C++
#include<iostream> #include<cassert> using namespace std; template<class T> class Node { private: Node<T> * next; public: T data;
最近情况麦田的设计者感慨考试生活
在五月黄梅天的岁月里，一年两次的软考又要开始了。到目前为止，我已经考了多达三次的软考，最后的结果就是通过了初级考试（程序员）。人啊，就是不满足，考了初级就希望考中级，于是，这学期我就报考了中级，明天就要考试。感觉机会不大，期待奇迹发生吧。这个学期忙于练车，写项目，反正最后是一团糟。后天还要考试科目二。这个星期真的是很艰难的一周，希望能快点度过。
linux系统中用pkill踢出在线登录用户被触发 linux
由于linux服务器允许多用户登录，公司很多人知道密码，工作造成一定的障碍所以需要有时踢出指定的用户 1/#who 查出当前有那些终端登录（用 w 命令更详细） # who root pts/0 2010-10-28 09:36 (192
仿QQ聊天第二版肆无忌惮_ qq
在第一版之上的改进内容: 第一版链接: http://479001499.iteye.com/admin/blogs/2100893 用map存起来号码对应的聊天窗口对象,解决私聊的时候所有消息发到一个窗口的问题. 增加ViewInfo类,这个是信息预览的窗口,如果是自己的信息,则可以进行编辑. 信息修改后上传至服务器再告诉所有用户,自己的窗口
java读取配置文件知了ing
1，java读取.properties配置文件 InputStream in; try { in = test.class.getClassLoader().getResourceAsStream("config/ipnetOracle.properties");//配置文件的路径 Properties p = new Properties()
__attribute__ 你知多少？矮蛋蛋 C++gcc
原文地址: http://www.cnblogs.com/astwish/p/3460618.html GNU C 的一大特色就是__attribute__ 机制。__attribute__ 可以设置函数属性（Function Attribute ）、变量属性（Variable Attribute ）和类型属性（Type Attribute ）。 __attribute__ 书写特征是：
jsoup使用笔记 alleni123 java 爬虫 JSoup
<dependency> <groupId>org.jsoup</groupId> <artifactId>jsoup</artifactId> <version>1.7.3</version> </dependency> 2014/08/28 今天遇到这种形式，
JAVA中的集合 Collectio 和Map的简单使用及方法百合不是茶 list map set
List ,set ,map的使用方法和区别 java容器类类库的用途是保存对象，并将其分为两个概念： Collection集合：一个独立的序列，这些序列都服从一条或多条规则;List必须按顺序保存元素，set不能重复元素；Queue按照排队规则来确定对象产生的顺序（通常与他们被插入的
杀LINUX的JOB进程 bijian1013 linux unix
今天发现数据库一个JOB一直在执行，都执行了好几个小时还在执行，所以想办法给删除掉系统环境： ORACLE 10G Linux操作系统操作步骤如下：第一步.查询出来那个job在运行，找个对应的SID字段 select * from dba_jobs_running--找到job对应的sid &n
Spring AOP详解 bijian1013 java spring AOP
最近项目中遇到了以下几点需求，仔细思考之后，觉得采用AOP来解决。一方面是为了以更加灵活的方式来解决问题，另一方面是借此机会深入学习Spring AOP相关的内容。例如，以下需求不用AOP肯定也能解决，至于是否牵强附会，仁者见仁智者见智。 1.对部分函数的调用进行日志记录，用于观察特定问题在运行过程中的函数调用
[Gson六]Gson类型适配器(TypeAdapter) bit1129 Adapter
TypeAdapter的使用动机 Gson在序列化和反序列化时，默认情况下，是按照POJO类的字段属性名和JSON串键进行一一映射匹配，然后把JSON串的键对应的值转换成POJO相同字段对应的值，反之亦然，在这个过程中有一个JSON串Key对应的Value和对象之间如何转换(序列化/反序列化)的问题。以Date为例，在序列化和反序列化时，Gson默认使用java.
【spark八十七】给定Driver Program，如何判断哪些代码在Driver运行，哪些代码在Worker上执行 bit1129 driver
Driver Program是用户编写的提交给Spark集群执行的application，它包含两部分作为驱动： Driver与Master、Worker协作完成application进程的启动、DAG划分、计算任务封装、计算任务分发到各个计算节点(Worker)、计算资源的分配等。计算逻辑本身，当计算任务在Worker执行时，执行计算逻辑完成application的计算任务
nginx 经验总结 ronin47 nginx 总结
　　　深感nginx的强大，只学了皮毛，把学下的记录。　　　获取Header 信息，一般是以$http_XX（ＸＸ是小写）获取body,通过接口，再展开，根据Ｋ取Ｖ　　　获取uri,以$arg_XX &n
轩辕互动-1.求三个整数中第二大的数2.整型数组的平衡点 bylijinnan 数组
import java.util.ArrayList; import java.util.Arrays; import java.util.List; public class ExoWeb { public static void main(String[] args) { ExoWeb ew=new ExoWeb(); System.out.pri
Netty源码学习-Java-NIO-Reactor bylijinnan java 多线程 netty
Netty里面采用了NIO-based Reactor Pattern 了解这个模式对学习Netty非常有帮助参考以下两篇文章： http://jeewanthad.blogspot.com/2013/02/reactor-pattern-explained-part-1.html http://gee.cs.oswego.edu/dl/cpjslides/nio.pdf
AOP通俗理解 cngolon spring AOP
1.我所知道的aop 初看aop,上来就是一大堆术语，而且还有个拉风的名字，面向切面编程，都说是OOP的一种有益补充等等。一下子让你不知所措，心想着：怪不得很多人都和我说aop多难多难。当我看进去以后，我才发现：它就是一些java基础上的朴实无华的应用，包括ioc，包括许许多多这样的名词，都是万变不离其宗而已。 2.为什么用aop&nb
cursor variable 实例 ctrain variable
create or replace procedure proc_test01 as type emp_row is record( empno emp.empno%type, ename emp.ename%type, job emp.job%type, mgr emp.mgr%type, hiberdate emp.hiredate%type, sal emp.sal%t
shell报bash: service: command not found解决方法 daizj linux shell service jps
今天在执行一个脚本时，本来是想在脚本中启动hdfs和hive等程序，可以在执行到service hive-server start等启动服务的命令时会报错，最终解决方法记录一下：脚本报错如下： ./olap_quick_intall.sh: line 57: service: command not found ./olap_quick_intall.sh: line 59
40个迹象表明你还是PHP菜鸟 dcj3sjt126com 设计模式 PHP 正则表达式 oop
你是PHP菜鸟，如果你：1. 不会利用如phpDoc 这样的工具来恰当地注释你的代码2. 对优秀的集成开发环境如Zend Studio 或Eclipse PDT 视而不见3. 从未用过任何形式的版本控制系统，如Subclipse4. 不采用某种编码与命名标准，以及通用约定，不能在项目开发周期里贯彻落实5. 不使用统一开发方式6. 不转换（或）也不验证某些输入或SQL查询串（译注：参考PHP相关函
Android逐帧动画的实现 dcj3sjt126com android
一、代码实现： private ImageView iv; private AnimationDrawable ad; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout
java远程调用linux的命令或者脚本 eksliang linux ganymed-ssh2
转载请出自出处： http://eksliang.iteye.com/blog/2105862 Java通过SSH2协议执行远程Shell脚本(ganymed-ssh2-build210.jar) 使用步骤如下： 1.导包官网下载: http://www.ganymed.ethz.ch/ssh2/ ma
adb端口被占用问题 gqdy365 adb
最近重新安装的电脑，配置了新环境，老是出现： adb server is out of date. killing... ADB server didn't ACK * failed to start daemon * 百度了一下，说是端口被占用，我开个eclipse，然后打开cmd，就提示这个，很烦人。一个比较彻底的解决办法就是修改
ASP.NET使用FileUpload上传文件 hvt .net C#hovertree asp.net webform
前台代码： <asp:FileUpload ID="fuKeleyi" runat="server" /> <asp:Button ID="BtnUp" runat="server" onclick="BtnUp_Click" Text="上传" />
代码之谜（四）- 浮点数（从惊讶到思考） justjavac 浮点数精度代码之谜 IEEE
在『代码之谜』系列的前几篇文章中，很多次出现了浮点数。浮点数在很多编程语言中被称为简单数据类型，其实，浮点数比起那些复杂数据类型（比如字符串）来说，一点都不简单。单单是说明 IEEE浮点数就可以写一本书了，我将用几篇博文来简单的说说我所理解的浮点数，算是抛砖引玉吧。一次面试记得多年前我招聘 Java 程序员时的一次关于浮点数、二分法、编码的面试，多年以后，他已经称为了一名很出色的
数据结构随记_1 lx.asymmetric 数据结构笔记
第一章 1.数据结构包括数据的逻辑结构、数据的物理/存储结构和数据的逻辑关系这三个方面的内容。 2.数据的存储结构可用四种基本的存储方法表示，它们分别是顺序存储、链式存储、索引存储和散列存储。 3.数据运算最常用的有五种，分别是查找/检索、排序、插入、删除、修改。 4.算法主要有以下五个特性：输入、输出、可行性、确定性和有穷性。 5.算法分析的
linux的会话和进程组网络接口 linux
会话：一个或多个进程组。起于用户登录，终止于用户退出。此期间所有进程都属于这个会话期。会话首进程：调用setsid创建会话的进程1.规定组长进程不能调用setsid，因为调用setsid后，调用进程会成为新的进程组的组长进程.如何保证？先调用fork，然后终止父进程，此时由于子进程的进程组ID为父进程的进程组ID，而子进程的ID是重新分配的，所以保证子进程不会是进程组长，从而子进程可以调用se
二维数组元素的连续求解 1140566087 二维数组 ACM
import java.util.HashMap; public class Title { public static void main(String[] args){ f(); } // 二位数组的应用 //12、二维数组中，哪一行或哪一列的连续存放的0的个数最多，是几个0。注意，是“连续”。 public static void f(){
也谈什么时候Java比C++快 windshome java C++
刚打开iteye就看到这个标题“Java什么时候比C++快”，觉得很好笑。你要比，就比同等水平的基础上的相比，笨蛋写得C代码和C++代码，去和高手写的Java代码比效率，有什么意义呢？我是写密码算法的，深刻知道算法C和C++实现和Java实现之间的效率差，甚至也比对过C代码和汇编代码的效率差，计算机是个死的东西，再怎么优化，Java也就是和C