Spark runtime core concepts (translated summary)
Term | Meaning |
---|---|
Application | A user program built on Spark, consisting of a driver program and executors on the cluster |
Driver Program | The program that runs the main function and creates the SparkContext |
Cluster Manager | An external service for acquiring resources on the cluster (e.g. standalone, Mesos, YARN) |
Worker Node | Any node in the cluster that can run application code |
Executor | A process launched for an application on a worker node; it runs tasks and keeps data in memory or on disk. Each application has its own executors |
Task | A unit of work sent to a particular executor |
Job | A parallel computation made up of many tasks; corresponds to a Spark action |
Stage | Each job is split into groups of tasks called stages (similar to the map and reduce tasks in MapReduce) |
Original English definitions
http://spark.apache.org/docs/latest/cluster-overview.html
The following table summarizes terms you’ll see used to refer to cluster concepts:
Term | Meaning |
---|---|
Application | User program built on Spark. Consists of a driver program and executors on the cluster. |
Application jar | A jar containing the user's Spark application. In some cases users will want to create an "uber jar" containing their application along with its dependencies. The user's jar should never include Hadoop or Spark libraries, however, these will be added at runtime. |
Driver program | The process running the main() function of the application and creating the SparkContext |
Cluster manager | An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN) |
Deploy mode | Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster. |
Worker node | Any node that can run application code in the cluster |
Executor | A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors. |
Task | A unit of work that will be sent to one executor |
Job | A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs. |
Stage | Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs. |
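To make these terms concrete, below is a minimal sketch of a Spark application; the object name, master URL, and sample data are made up for illustration. The program is the driver program: it creates the SparkContext, `reduceByKey` introduces a shuffle boundary, and the final `collect` is an action that triggers a job, which is split into two stages whose tasks run on the executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

// Hypothetical example application: object name, master URL, and data are illustrative only.
object ClusterConceptsDemo {
  def main(args: Array[String]): Unit = {
    // The process running main() and creating the SparkContext is the driver program.
    val conf = new SparkConf().setAppName("ClusterConceptsDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val words = sc.parallelize(Seq("spark", "driver", "executor", "spark", "task"), numSlices = 2)

    // Transformations only build the RDD lineage; nothing runs yet.
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

    // collect() is an action: it submits a job. Because reduceByKey needs a shuffle,
    // the job is split into two stages, and each stage runs one task per partition
    // on the executors.
    counts.collect().foreach(println)

    sc.stop()
  }
}
```

Running the same program against a cluster master URL instead of local[2] changes where the executors run, not the job/stage/task breakdown.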
The diagram below, from the official Spark documentation, shows the runtime architecture.
Spark core components
Task scheduling
TaskScheduler.scala
Reading the comments below makes the relationship between SparkContext, DAGScheduler, and TaskScheduler clear.
/* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apache License, Version 2.0 * (the "License"); you may not use this file except in compliance with * the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.spark.scheduler import org.apache.spark.scheduler.SchedulingMode.SchedulingMode import org.apache.spark.executor.TaskMetrics import org.apache.spark.storage.BlockManagerId /** * Low-level task scheduler interface, currently implemented exclusively by TaskSchedulerImpl. * This interface allows plugging in different task schedulers. Each TaskScheduler schedulers tasks * for a single SparkContext. These schedulers get sets of tasks submitted to them from the * DAGScheduler for each stage, and are responsible for sending the tasks to the cluster, running * them, retrying if there are failures, and mitigating stragglers. They return events to the * DAGScheduler. */ private[spark] trait TaskScheduler { def rootPool: Pool def schedulingMode: SchedulingMode def start(): Unit // Invoked after system has successfully initialized (typically in spark context). // Yarn uses this to bootstrap allocation of resources based on preferred locations, // wait for slave registerations, etc. def postStartHook() { } // Disconnect from the cluster. def stop(): Unit // Submit a sequence of tasks to run. def submitTasks(taskSet: TaskSet): Unit // Cancel a stage. def cancelTasks(stageId: Int, interruptThread: Boolean) // Set the DAG scheduler for upcalls. This is guaranteed to be set before submitTasks is called. def setDAGScheduler(dagScheduler: DAGScheduler): Unit // Get the default level of parallelism to use in the cluster, as a hint for sizing jobs. def defaultParallelism(): Int /** * Update metrics for in-progress tasks and let the master know that the BlockManager is still * alive. Return true if the driver knows about the given block manager. Otherwise, return false, * indicating that the block manager should re-register. */ def executorHeartbeatReceived(execId: String, taskMetrics: Array[(Long, TaskMetrics)], blockManagerId: BlockManagerId): Boolean /** * The application ID associated with the job, if any. * * @return The application ID, or None if the backend does not provide an ID. */ def applicationId(): Option[String] = None }
/** * The high-level scheduling layer that implements stage-oriented scheduling. It computes a DAG of * stages for each job, keeps track of which RDDs and stage outputs are materialized, and finds a * minimal schedule to run the job. It then submits stages as TaskSets to an underlying * TaskScheduler implementation that runs them on the cluster. * * In addition to coming up with a DAG of stages, this class also determines the preferred * locations to run each task on, based on the current cache status, and passes these to the * low-level TaskScheduler. Furthermore, it handles failures due to shuffle output files being * lost, in which case old stages may need to be resubmitted. Failures *within* a stage that are * not caused by shuffle file loss are handled by the TaskScheduler, which will retry each task * a small number of times before cancelling the whole stage. * */ private[spark] class DAGScheduler
TaskSet: clearly, a TaskSet corresponds to a stage.
package org.apache.spark.scheduler

import java.util.Properties

/**
 * A set of tasks submitted together to the low-level TaskScheduler, usually representing
 * missing partitions of a particular stage.
 */
private[spark] class TaskSet(
    val tasks: Array[Task[_]],
    val stageId: Int,
    val attempt: Int,
    val priority: Int,
    val properties: Properties) {
  val id: String = stageId + "." + attempt

  override def toString: String = "TaskSet " + id
}
SparkContext creates the DAGScheduler and the TaskScheduler:
// Create and start the scheduler private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master) private val heartbeatReceiver = env.actorSystem.actorOf( Props(new HeartbeatReceiver(taskScheduler)), "HeartbeatReceiver") @volatile private[spark] var dagScheduler: DAGScheduler = _ try { dagScheduler = new DAGScheduler(this) } catch { case e: Exception => throw new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage)) } // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's // constructor taskScheduler.start()
/** Creates a task scheduler based on a given master URL. Extracted for testing. */ private def createTaskScheduler(sc: SparkContext, master: String): TaskScheduler = { // Regular expression used for local[N] and local[*] master formats val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r // Regular expression for local[N, maxRetries], used in tests with failing tasks val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r // Regular expression for simulating a Spark cluster of [N, cores, memory] locally val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r // Regular expression for connecting to Spark deploy clusters val SPARK_REGEX = """spark://(.*)""".r // Regular expression for connection to Mesos cluster by mesos:// or zk:// url val MESOS_REGEX = """(mesos|zk)://.*""".r // Regular expression for connection to Simr cluster val SIMR_REGEX = """simr://(.*)""".r // When running locally, don't try to re-execute tasks on failure. val MAX_LOCAL_TASK_FAILURES = 1 master match { case "local" => val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true) val backend = new LocalBackend(scheduler, 1) scheduler.initialize(backend) scheduler case LOCAL_N_REGEX(threads) => def localCpuCount = Runtime.getRuntime.availableProcessors() // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads. val threadCount = if (threads == "*") localCpuCount else threads.toInt val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true) val backend = new LocalBackend(scheduler, threadCount) scheduler.initialize(backend) scheduler case LOCAL_N_FAILURES_REGEX(threads, maxFailures) => def localCpuCount = Runtime.getRuntime.availableProcessors() // local[*, M] means the number of cores on the computer with M failures // local[N, M] means exactly N threads with M failures val threadCount = if (threads == "*") localCpuCount else threads.toInt val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true) val backend = new LocalBackend(scheduler, threadCount) scheduler.initialize(backend) scheduler case SPARK_REGEX(sparkUrl) => val scheduler = new TaskSchedulerImpl(sc) val masterUrls = sparkUrl.split(",").map("spark://" + _) val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls) scheduler.initialize(backend) scheduler
package org.apache.spark.scheduler.cluster import org.apache.spark.{Logging, SparkConf, SparkContext, SparkEnv} import org.apache.spark.deploy.{ApplicationDescription, Command} import org.apache.spark.deploy.client.{AppClient, AppClientListener} import org.apache.spark.scheduler.{ExecutorExited, ExecutorLossReason, SlaveLost, TaskSchedulerImpl} import org.apache.spark.util.Utils private[spark] class SparkDeploySchedulerBackend( scheduler: TaskSchedulerImpl, sc: SparkContext, masters: Array[String]) extends CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem) with AppClientListener with Logging { var client: AppClient = null var stopping = false var shutdownCallback : (SparkDeploySchedulerBackend) => Unit = _ var appId: String = _ val registrationLock = new Object() var registrationDone = false val maxCores = conf.getOption("spark.cores.max").map(_.toInt) val totalExpectedCores = maxCores.getOrElse(0) override def start() { super.start() // The endpoint for executors to talk to us val driverUrl = "akka.tcp://%s@%s:%s/user/%s".format( SparkEnv.driverActorSystemName, conf.get("spark.driver.host"), conf.get("spark.driver.port"), CoarseGrainedSchedulerBackend.ACTOR_NAME) val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}") val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions") .map(Utils.splitCommandString).getOrElse(Seq.empty) val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp => cp.split(java.io.File.pathSeparator) } val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp => cp.split(java.io.File.pathSeparator) } // Start executors with a few necessary configs for registering with the scheduler val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf) val javaOpts = sparkJavaOpts ++ extraJavaOpts val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend", args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts) val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command, sc.ui.appUIAddress, sc.eventLogger.map(_.logDir)) client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf) client.start() waitForRegistration() }
package org.apache.spark.executor import java.nio.ByteBuffer import scala.concurrent.Await import akka.actor.{Actor, ActorSelection, Props} import akka.pattern.Patterns import akka.remote.{RemotingLifecycleEvent, DisassociatedEvent} import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkEnv} import org.apache.spark.TaskState.TaskState import org.apache.spark.deploy.SparkHadoopUtil import org.apache.spark.deploy.worker.WorkerWatcher import org.apache.spark.scheduler.TaskDescription import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._ import org.apache.spark.util.{ActorLogReceive, AkkaUtils, SignalLogger, Utils} private[spark] class CoarseGrainedExecutorBackend( driverUrl: String, executorId: String, hostPort: String, cores: Int, sparkProperties: Seq[(String, String)]) extends Actor with ActorLogReceive with ExecutorBackend with Logging { Utils.checkHostPort(hostPort, "Expected hostport") var executor: Executor = null var driver: ActorSelection = null override def preStart() { logInfo("Connecting to driver: " + driverUrl) driver = context.actorSelection(driverUrl) driver ! RegisterExecutor(executorId, hostPort, cores) context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent]) } override def receiveWithLogging = { case RegisteredExecutor => logInfo("Successfully registered with driver") // Make this host instead of hostPort ? executor = new Executor(executorId, Utils.parseHostPort(hostPort)._1, sparkProperties, false) case RegisterExecutorFailed(message) => logError("Slave registration failed: " + message) System.exit(1) case LaunchTask(data) => if (executor == null) { logError("Received LaunchTask command but executor was null") System.exit(1) } else { val ser = SparkEnv.get.closureSerializer.newInstance() val taskDesc = ser.deserialize[TaskDescription](data.value) logInfo("Got assigned task " + taskDesc.taskId) executor.launchTask(this, taskDesc.taskId, taskDesc.name, taskDesc.serializedTask) } case KillTask(taskId, _, interruptThread) => if (executor == null) { logError("Received KillTask command but executor was null") System.exit(1) } else { executor.killTask(taskId, interruptThread) } case x: DisassociatedEvent => logError(s"Driver $x disassociated! Shutting down.") System.exit(1) case StopExecutor => logInfo("Driver commanded a shutdown") executor.stop() context.stop(self) context.system.shutdown() } override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) { driver ! StatusUpdate(executorId, taskId, state, data) } } private[spark] object CoarseGrainedExecutorBackend extends Logging { private def run( driverUrl: String, executorId: String, hostname: String, cores: Int, workerUrl: Option[String]) { SignalLogger.register(log) SparkHadoopUtil.get.runAsSparkUser { () => // Debug code Utils.checkHost(hostname) // Bootstrap to fetch the driver's Spark properties. val executorConf = new SparkConf val port = executorConf.getInt("spark.executor.port", 0) val (fetcher, _) = AkkaUtils.createActorSystem( "driverPropsFetcher", hostname, port, executorConf, new SecurityManager(executorConf)) val driver = fetcher.actorSelection(driverUrl) val timeout = AkkaUtils.askTimeout(executorConf) val fut = Patterns.ask(driver, RetrieveSparkProps, timeout) val props = Await.result(fut, timeout).asInstanceOf[Seq[(String, String)]] fetcher.shutdown() // Create a new ActorSystem using driver's Spark properties to run the backend. 
val driverConf = new SparkConf().setAll(props) val (actorSystem, boundPort) = AkkaUtils.createActorSystem( "sparkExecutor", hostname, port, driverConf, new SecurityManager(driverConf)) // set it val sparkHostPort = hostname + ":" + boundPort actorSystem.actorOf( Props(classOf[CoarseGrainedExecutorBackend], driverUrl, executorId, sparkHostPort, cores, props), name = "Executor") workerUrl.foreach { url => actorSystem.actorOf(Props(classOf[WorkerWatcher], url), name = "WorkerWatcher") } actorSystem.awaitTermination() } } def main(args: Array[String]) { args.length match { case x if x < 4 => System.err.println( // Worker url is used in spark standalone mode to enforce fate-sharing with worker "Usage: CoarseGrainedExecutorBackend <driverUrl> <executorId> <hostname> " + "<cores> [<workerUrl>]") System.exit(1) case 4 => run(args(0), args(1), args(2), args(3).toInt, None) case x if x > 4 => run(args(0), args(1), args(2), args(3).toInt, Some(args(4))) } } }
OK, let's continue with the rest of the code in SparkContext.
      case SPARK_REGEX(sparkUrl) =>
        val scheduler = new TaskSchedulerImpl(sc)
        val masterUrls = sparkUrl.split(",").map("spark://" + _)
        val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
        scheduler.initialize(backend)
        scheduler
package org.apache.spark.scheduler import java.nio.ByteBuffer import java.util.{TimerTask, Timer} import java.util.concurrent.atomic.AtomicLong import scala.concurrent.duration._ import scala.collection.mutable.ArrayBuffer import scala.collection.mutable.HashMap import scala.collection.mutable.HashSet import scala.language.postfixOps import scala.util.Random import org.apache.spark._ import org.apache.spark.TaskState.TaskState import org.apache.spark.scheduler.SchedulingMode.SchedulingMode import org.apache.spark.util.Utils import org.apache.spark.executor.TaskMetrics import org.apache.spark.storage.BlockManagerId import akka.actor.Props /** * Schedules tasks for multiple types of clusters by acting through a SchedulerBackend. * It can also work with a local setup by using a LocalBackend and setting isLocal to true. * It handles common logic, like determining a scheduling order across jobs, waking up to launch * speculative tasks, etc. * * Clients should first call initialize() and start(), then submit task sets through the * runTasks method. * * THREADING: SchedulerBackends and task-submitting clients can call this class from multiple * threads, so it needs locks in public API methods to maintain its state. In addition, some * SchedulerBackends synchronize on themselves when they want to send events here, and then * acquire a lock on us, so we need to make sure that we don't try to lock the backend while * we are holding a lock on ourselves. */ private[spark] class TaskSchedulerImpl( val sc: SparkContext, val maxTaskFailures: Int, isLocal: Boolean = false) extends TaskScheduler with Logging { def this(sc: SparkContext) = this(sc, sc.conf.getInt("spark.task.maxFailures", 4)) val conf = sc.conf // How often to check for speculative tasks val SPECULATION_INTERVAL = conf.getLong("spark.speculation.interval", 100) // Threshold above which we warn user initial TaskSet may be starved val STARVATION_TIMEOUT = conf.getLong("spark.starvation.timeout", 15000) // CPUs to request per task val CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1) // TaskSetManagers are not thread safe, so any access to one should be synchronized // on this class. 
val activeTaskSets = new HashMap[String, TaskSetManager] val taskIdToTaskSetId = new HashMap[Long, String] val taskIdToExecutorId = new HashMap[Long, String] @volatile private var hasReceivedTask = false @volatile private var hasLaunchedTask = false private val starvationTimer = new Timer(true) // Incrementing task IDs val nextTaskId = new AtomicLong(0) // Which executor IDs we have executors on val activeExecutorIds = new HashSet[String] // The set of executors we have on each host; this is used to compute hostsAlive, which // in turn is used to decide when we can attain data locality on a given host protected val executorsByHost = new HashMap[String, HashSet[String]] protected val hostsByRack = new HashMap[String, HashSet[String]] protected val executorIdToHost = new HashMap[String, String] // Listener object to pass upcalls into var dagScheduler: DAGScheduler = null var backend: SchedulerBackend = null val mapOutputTracker = SparkEnv.get.mapOutputTracker var schedulableBuilder: SchedulableBuilder = null var rootPool: Pool = null // default scheduler is FIFO private val schedulingModeConf = conf.get("spark.scheduler.mode", "FIFO") val schedulingMode: SchedulingMode = try { SchedulingMode.withName(schedulingModeConf.toUpperCase) } catch { case e: java.util.NoSuchElementException => throw new SparkException(s"Unrecognized spark.scheduler.mode: $schedulingModeConf") } // This is a var so that we can reset it for testing purposes. private[spark] var taskResultGetter = new TaskResultGetter(sc.env, this) override def setDAGScheduler(dagScheduler: DAGScheduler) { this.dagScheduler = dagScheduler } def initialize(backend: SchedulerBackend) { this.backend = backend // temporarily set rootPool name to empty rootPool = new Pool("", schedulingMode, 0, 0) schedulableBuilder = { schedulingMode match { case SchedulingMode.FIFO => new FIFOSchedulableBuilder(rootPool) case SchedulingMode.FAIR => new FairSchedulableBuilder(rootPool, conf) } } schedulableBuilder.buildPools() }
Next, let's look at how the TaskScheduler is started.
Still in SparkContext, the taskScheduler's start method is called:
// Create and start the scheduler private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master) private val heartbeatReceiver = env.actorSystem.actorOf( Props(new HeartbeatReceiver(taskScheduler)), "HeartbeatReceiver") @volatile private[spark] var dagScheduler: DAGScheduler = _ try { dagScheduler = new DAGScheduler(this) } catch { case e: Exception => throw new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage)) } // start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's // constructor taskScheduler.start()
  override def start() {
    backend.start()

    if (!isLocal && conf.getBoolean("spark.speculation", false)) {
      logInfo("Starting speculative execution thread")
      import sc.env.actorSystem.dispatcher
      sc.env.actorSystem.scheduler.schedule(SPECULATION_INTERVAL milliseconds,
          SPECULATION_INTERVAL milliseconds) {
        Utils.tryOrExit { checkSpeculatableTasks() }
      }
    }
  }
override def start() { super.start() // The endpoint for executors to talk to us val driverUrl = "akka.tcp://%s@%s:%s/user/%s".format( SparkEnv.driverActorSystemName, conf.get("spark.driver.host"), conf.get("spark.driver.port"), CoarseGrainedSchedulerBackend.ACTOR_NAME) val args = Seq(driverUrl, "{{EXECUTOR_ID}}", "{{HOSTNAME}}", "{{CORES}}", "{{WORKER_URL}}") val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions") .map(Utils.splitCommandString).getOrElse(Seq.empty) val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath").toSeq.flatMap { cp => cp.split(java.io.File.pathSeparator) } val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath").toSeq.flatMap { cp => cp.split(java.io.File.pathSeparator) } // Start executors with a few necessary configs for registering with the scheduler val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf) val javaOpts = sparkJavaOpts ++ extraJavaOpts val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend", args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts) val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command, sc.ui.appUIAddress, sc.eventLogger.map(_.logDir)) client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf) client.start() waitForRegistration() }
var driverActor: ActorRef = null
import java.util.concurrent.atomic.AtomicInteger import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet} import scala.concurrent.Await import scala.concurrent.duration._ import akka.actor._ import akka.pattern.ask import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent} import org.apache.spark.{SparkEnv, Logging, SparkException, TaskState} import org.apache.spark.scheduler.{SchedulerBackend, SlaveLost, TaskDescription, TaskSchedulerImpl, WorkerOffer} import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._ import org.apache.spark.util.{ActorLogReceive, SerializableBuffer, AkkaUtils, Utils} import org.apache.spark.ui.JettyUtils /** * A scheduler backend that waits for coarse grained executors to connect to it through Akka. * This backend holds onto each executor for the duration of the Spark job rather than relinquishing * executors whenever a task is done and asking the scheduler to launch a new executor for * each new task. Executors may be launched in a variety of ways, such as Mesos tasks for the * coarse-grained Mesos mode or standalone processes for Spark's standalone deploy mode * (spark.deploy.*). */ private[spark] class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: ActorSystem) extends SchedulerBackend with Logging
  override def start() {
    val properties = new ArrayBuffer[(String, String)]
    for ((key, value) <- scheduler.sc.conf.getAll) {
      if (key.startsWith("spark.")) {
        properties += ((key, value))
      }
    }
    // TODO (prashant) send conf instead of properties
    driverActor = actorSystem.actorOf(
      Props(new DriverActor(properties)), name = CoarseGrainedSchedulerBackend.ACTOR_NAME)
  }
class DriverActor(sparkProperties: Seq[(String, String)]) extends Actor with ActorLogReceive { override protected def log = CoarseGrainedSchedulerBackend.this.log private val executorActor = new HashMap[String, ActorRef] private val executorAddress = new HashMap[String, Address] private val executorHost = new HashMap[String, String] private val freeCores = new HashMap[String, Int] private val totalCores = new HashMap[String, Int] private val addressToExecutorId = new HashMap[Address, String] override def preStart() { // Listen for remote client disconnection events, since they don't go through Akka's watch() context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent]) // Periodically revive offers to allow delay scheduling to work val reviveInterval = conf.getLong("spark.scheduler.revive.interval", 1000) import context.dispatcher context.system.scheduler.schedule(0.millis, reviveInterval.millis, self, ReviveOffers) } def receiveWithLogging = { case RegisterExecutor(executorId, hostPort, cores) => Utils.checkHostPort(hostPort, "Host port expected " + hostPort) if (executorActor.contains(executorId)) { sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId) } else { logInfo("Registered executor: " + sender + " with ID " + executorId) sender ! RegisteredExecutor executorActor(executorId) = sender executorHost(executorId) = Utils.parseHostPort(hostPort)._1 totalCores(executorId) = cores freeCores(executorId) = cores executorAddress(executorId) = sender.path.address addressToExecutorId(sender.path.address) = executorId totalCoreCount.addAndGet(cores) totalRegisteredExecutors.addAndGet(1) makeOffers() } case StatusUpdate(executorId, taskId, state, data) => scheduler.statusUpdate(taskId, state, data.value) if (TaskState.isFinished(state)) { if (executorActor.contains(executorId)) { freeCores(executorId) += scheduler.CPUS_PER_TASK makeOffers(executorId) } else { // Ignoring the update since we don't know about the executor. val msg = "Ignored task status update (%d state %s) from unknown executor %s with ID %s" logWarning(msg.format(taskId, state, sender, executorId)) } } case ReviveOffers => makeOffers() case KillTask(taskId, executorId, interruptThread) => executorActor(executorId) ! KillTask(taskId, executorId, interruptThread) case StopDriver => sender ! true context.stop(self) case StopExecutors => logInfo("Asking each executor to shut down") for (executor <- executorActor.values) { executor ! StopExecutor } sender ! true case RemoveExecutor(executorId, reason) => removeExecutor(executorId, reason) sender ! true case AddWebUIFilter(filterName, filterParams, proxyBase) => addWebUIFilter(filterName, filterParams, proxyBase) sender ! true case DisassociatedEvent(_, address, _) => addressToExecutorId.get(address).foreach(removeExecutor(_, "remote Akka client disassociated")) case RetrieveSparkProps => sender ! 
sparkProperties } // Make fake resource offers on all executors def makeOffers() { launchTasks(scheduler.resourceOffers( executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))})) } // Make fake resource offers on just one executor def makeOffers(executorId: String) { launchTasks(scheduler.resourceOffers( Seq(new WorkerOffer(executorId, executorHost(executorId), freeCores(executorId))))) } // Launch tasks returned by a set of resource offers def launchTasks(tasks: Seq[Seq[TaskDescription]]) { for (task <- tasks.flatten) { val ser = SparkEnv.get.closureSerializer.newInstance() val serializedTask = ser.serialize(task) if (serializedTask.limit >= akkaFrameSize - AkkaUtils.reservedSizeBytes) { val taskSetId = scheduler.taskIdToTaskSetId(task.taskId) scheduler.activeTaskSets.get(taskSetId).foreach { taskSet => try { var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " + "spark.akka.frameSize (%d bytes) - reserved (%d bytes). Consider increasing " + "spark.akka.frameSize or using broadcast variables for large values." msg = msg.format(task.taskId, task.index, serializedTask.limit, akkaFrameSize, AkkaUtils.reservedSizeBytes) taskSet.abort(msg) } catch { case e: Exception => logError("Exception in error callback", e) } } } else { freeCores(task.executorId) -= scheduler.CPUS_PER_TASK executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask)) } } } // Remove a disconnected slave from the cluster def removeExecutor(executorId: String, reason: String) { if (executorActor.contains(executorId)) { logInfo("Executor " + executorId + " disconnected, so removing it") val numCores = totalCores(executorId) executorActor -= executorId executorHost -= executorId addressToExecutorId -= executorAddress(executorId) executorAddress -= executorId totalCores -= executorId freeCores -= executorId totalCoreCount.addAndGet(-numCores) scheduler.executorLost(executorId, SlaveLost(reason)) } } }
The next part traces the execution path of a Spark action.
Take the count method as an example.
In RDD.scala, count calls SparkContext's runJob method:
  /**
   * Return the number of elements in the RDD.
   */
  def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum
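As the doc comment on runJob below notes, runJob is the main entry point for all actions, and count() is just one instance of that pattern. The following sketch (a hypothetical helper, not part of Spark's API) builds another small "action" on the same entry point, assuming the public runJob overload that takes an Iterator[T] => U function:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RunJobSketch {
  // Hypothetical helper (not part of Spark): counts elements per partition,
  // built on sc.runJob just like count() is.
  def countPerPartition[T](sc: SparkContext, rdd: RDD[T]): Array[Long] =
    sc.runJob(rdd, (iter: Iterator[T]) => iter.size.toLong)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("runJob-sketch").setMaster("local[2]"))
    val data = sc.parallelize(1 to 100, numSlices = 4)
    // One job is submitted to the DAGScheduler; each partition becomes one task.
    println(countPerPartition(sc, data).mkString(", "))  // expect "25, 25, 25, 25"
    sc.stop()
  }
}
```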
/** * Run a function on a given set of partitions in an RDD and pass the results to the given * handler function. This is the main entry point for all actions in Spark. The allowLocal * flag specifies whether the scheduler can run the computation on the driver rather than * shipping it out to the cluster, for short actions like first(). */ def runJob[T, U: ClassTag]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], allowLocal: Boolean, resultHandler: (Int, U) => Unit) { if (dagScheduler == null) { throw new SparkException("SparkContext has been shutdown") } val callSite = getCallSite val cleanedFunc = clean(func) logInfo("Starting job: " + callSite.shortForm) val start = System.nanoTime dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, allowLocal, resultHandler, localProperties.get) logInfo( "Job finished: " + callSite.shortForm + ", took " + (System.nanoTime - start) / 1e9 + " s") rdd.doCheckpoint() }
def runJob[T, U: ClassTag]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], callSite: CallSite, allowLocal: Boolean, resultHandler: (Int, U) => Unit, properties: Properties = null) { val waiter = submitJob(rdd, func, partitions, callSite, allowLocal, resultHandler, properties) waiter.awaitResult() match { case JobSucceeded => {} case JobFailed(exception: Exception) => logInfo("Failed to run " + callSite.shortForm) throw exception } }
/** * Submit a job to the job scheduler and get a JobWaiter object back. The JobWaiter object * can be used to block until the the job finishes executing or can be used to cancel the job. */ def submitJob[T, U]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], callSite: CallSite, allowLocal: Boolean, resultHandler: (Int, U) => Unit, properties: Properties = null): JobWaiter[U] = { // Check to make sure we are not launching a task on a partition that does not exist. val maxPartitions = rdd.partitions.length partitions.find(p => p >= maxPartitions || p < 0).foreach { p => throw new IllegalArgumentException( "Attempting to access a non-existent partition: " + p + ". " + "Total number of partitions: " + maxPartitions) } val jobId = nextJobId.getAndIncrement() if (partitions.size == 0) { return new JobWaiter[U](this, jobId, 0, resultHandler) } assert(partitions.size > 0) val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _] val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler) eventProcessActor ! JobSubmitted( jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter, properties) waiter }
private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGScheduler) extends Actor with Logging { override def preStart() { // set DAGScheduler for taskScheduler to ensure eventProcessActor is always // valid when the messages arrive dagScheduler.taskScheduler.setDAGScheduler(dagScheduler) } /** * The main event loop of the DAG scheduler. */ def receive = { case JobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties) => dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, allowLocal, callSite, listener, properties)
private[scheduler] def handleJobSubmitted(jobId: Int, finalRDD: RDD[_], func: (TaskContext, Iterator[_]) => _, partitions: Array[Int], allowLocal: Boolean, callSite: CallSite, listener: JobListener, properties: Properties = null) { var finalStage: Stage = null try { // New stage creation may throw an exception if, for example, jobs are run on a // HadoopRDD whose underlying HDFS files have been deleted. finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite) } catch { case e: Exception => logWarning("Creating new stage failed due to exception - job: " + jobId, e) listener.jobFailed(e) return } if (finalStage != null) { val job = new ActiveJob(jobId, finalStage, func, partitions, callSite, listener, properties) clearCacheLocs() logInfo("Got job %s (%s) with %d output partitions (allowLocal=%s)".format( job.jobId, callSite.shortForm, partitions.length, allowLocal)) logInfo("Final stage: " + finalStage + "(" + finalStage.name + ")") logInfo("Parents of final stage: " + finalStage.parents) logInfo("Missing parents: " + getMissingParentStages(finalStage)) val shouldRunLocally = localExecutionEnabled && allowLocal && finalStage.parents.isEmpty && partitions.length == 1 if (shouldRunLocally) { // Compute very short actions like first() or take() with no parent stages locally. listenerBus.post(SparkListenerJobStart(job.jobId, Array[Int](), properties)) runLocally(job) } else { jobIdToActiveJob(jobId) = job activeJobs += job finalStage.resultOfJob = Some(job) listenerBus.post(SparkListenerJobStart(job.jobId, jobIdToStageIds(jobId).toArray, properties)) submitStage(finalStage) } } submitWaitingStages() }
/** Submits stage, but first recursively submits any missing parents. */ private def submitStage(stage: Stage) { val jobId = activeJobForStage(stage) if (jobId.isDefined) { logDebug("submitStage(" + stage + ")") if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) { val missing = getMissingParentStages(stage).sortBy(_.id) logDebug("missing: " + missing) if (missing == Nil) { logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents") submitMissingTasks(stage, jobId.get) } else { for (parent <- missing) { submitStage(parent) } waitingStages += stage } } } else { abortStage(stage, "No active job for stage " + stage.id) } }
/** Called when stage's parents are available and we can now do its task. */ private def submitMissingTasks(stage: Stage, jobId: Int) { logDebug("submitMissingTasks(" + stage + ")") // Get our pending tasks and remember them in our pendingTasks entry stage.pendingTasks.clear() // First figure out the indexes of partition ids to compute. val partitionsToCompute: Seq[Int] = { if (stage.isShuffleMap) { (0 until stage.numPartitions).filter(id => stage.outputLocs(id) == Nil) } else { val job = stage.resultOfJob.get (0 until job.numPartitions).filter(id => !job.finished(id)) } } val properties = if (jobIdToActiveJob.contains(jobId)) { jobIdToActiveJob(stage.jobId).properties } else { // this stage will be assigned to "default" pool null } runningStages += stage // SparkListenerStageSubmitted should be posted before testing whether tasks are // serializable. If tasks are not serializable, a SparkListenerStageCompleted event // will be posted, which should always come after a corresponding SparkListenerStageSubmitted // event. stage.latestInfo = StageInfo.fromStage(stage, Some(partitionsToCompute.size)) listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties)) // TODO: Maybe we can keep the taskBinary in Stage to avoid serializing it multiple times. // Broadcasted binary for the task, used to dispatch tasks to executors. Note that we broadcast // the serialized copy of the RDD and for each task we will deserialize it, which means each // task gets a different copy of the RDD. This provides stronger isolation between tasks that // might modify state of objects referenced in their closures. This is necessary in Hadoop // where the JobConf/Configuration object is not thread-safe. var taskBinary: Broadcast[Array[Byte]] = null try { // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep). // For ResultTask, serialize and broadcast (rdd, func). val taskBinaryBytes: Array[Byte] = if (stage.isShuffleMap) { closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array() } else { closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array() } taskBinary = sc.broadcast(taskBinaryBytes) } catch { // In the case of a failure during serialization, abort the stage. case e: NotSerializableException => abortStage(stage, "Task not serializable: " + e.toString) runningStages -= stage return case NonFatal(e) => abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}") runningStages -= stage return } val tasks: Seq[Task[_]] = if (stage.isShuffleMap) { partitionsToCompute.map { id => val locs = getPreferredLocs(stage.rdd, id) val part = stage.rdd.partitions(id) new ShuffleMapTask(stage.id, taskBinary, part, locs) } } else { val job = stage.resultOfJob.get partitionsToCompute.map { id => val p: Int = job.partitions(id) val part = stage.rdd.partitions(p) val locs = getPreferredLocs(stage.rdd, p) new ResultTask(stage.id, taskBinary, part, locs, id) } } if (tasks.size > 0) { // Preemptively serialize a task to make sure it can be serialized. We are catching this // exception here because it would be fairly hard to catch the non-serializable exception // down the road, where we have several different implementations for local scheduler and // cluster schedulers. // // We've already serialized RDDs and closures in taskBinary, but here we check for all other // objects such as Partition. 
try { closureSerializer.serialize(tasks.head) } catch { case e: NotSerializableException => abortStage(stage, "Task not serializable: " + e.toString) runningStages -= stage return case NonFatal(e) => // Other exceptions, such as IllegalArgumentException from Kryo. abortStage(stage, s"Task serialization failed: $e\n${e.getStackTraceString}") runningStages -= stage return } logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")") stage.pendingTasks ++= tasks logDebug("New pending tasks: " + stage.pendingTasks) taskScheduler.submitTasks( new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties)) stage.latestInfo.submissionTime = Some(clock.getTime()) } else { // Because we posted SparkListenerStageSubmitted earlier, we should post // SparkListenerStageCompleted here in case there are no tasks to run. listenerBus.post(SparkListenerStageCompleted(stage.latestInfo)) logDebug("Stage " + stage + " is actually done; %b %d %d".format( stage.isAvailable, stage.numAvailableOutputs, stage.numPartitions)) runningStages -= stage } }
TaskSchedulerImpl.scala
  override def submitTasks(taskSet: TaskSet) {
    val tasks = taskSet.tasks
    logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
    this.synchronized {
      val manager = new TaskSetManager(this, taskSet, maxTaskFailures)
      activeTaskSets(taskSet.id) = manager
      schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)

      if (!isLocal && !hasReceivedTask) {
        starvationTimer.scheduleAtFixedRate(new TimerTask() {
          override def run() {
            if (!hasLaunchedTask) {
              logWarning("Initial job has not accepted any resources; " +
                "check your cluster UI to ensure that workers are registered " +
                "and have sufficient memory")
            } else {
              this.cancel()
            }
          }
        }, STARVATION_TIMEOUT, STARVATION_TIMEOUT)
      }
      hasReceivedTask = true
    }
    backend.reviveOffers()
  }
From the comments and the class hierarchy, we can see that the backend here is a pluggable implementation: different deployment scenarios use different backends, and in standalone mode it is a CoarseGrainedSchedulerBackend (concretely, the SparkDeploySchedulerBackend shown earlier).
package org.apache.spark.scheduler

/**
 * A backend interface for scheduling systems that allows plugging in different ones under
 * TaskSchedulerImpl. We assume a Mesos-like model where the application gets resource offers as
 * machines become available and can launch tasks on them.
 */
private[spark] trait SchedulerBackend {
  def start(): Unit
  def stop(): Unit
  def reviveOffers(): Unit
  def defaultParallelism(): Int

  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    throw new UnsupportedOperationException

  def isReady(): Boolean = true

  /**
   * The application ID associated with the job, if any.
   *
   * @return The application ID, or None if the backend does not provide an ID.
   */
  def applicationId(): Option[String] = None
}
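To illustrate the plug-in pattern on its own, here is a toy sketch that is deliberately independent of Spark's real (private[spark]) classes; every name in it is invented. A scheduler holds a reference to an interchangeable backend and asks it for resources, which mirrors how TaskSchedulerImpl delegates to a SchedulerBackend:

```scala
// Illustrative only: a simplified stand-in for the TaskSchedulerImpl / SchedulerBackend split.
trait ToyBackend {
  def start(): Unit
  def reviveOffers(): Unit
}

class ToyLocalBackend(onRevive: () => Unit) extends ToyBackend {
  def start(): Unit = println("local backend started")
  def reviveOffers(): Unit = onRevive() // "offer" resources by running tasks in-process
}

class ToyScheduler {
  private var backend: ToyBackend = _
  private val pending = scala.collection.mutable.Queue[String]()

  def initialize(b: ToyBackend): Unit = { backend = b; backend.start() }

  def submit(task: String): Unit = {
    pending.enqueue(task)
    backend.reviveOffers() // ask the backend for resources, as submitTasks does above
  }

  def runPending(): Unit = while (pending.nonEmpty) println("running " + pending.dequeue())
}

object ToyDemo extends App {
  val scheduler = new ToyScheduler
  scheduler.initialize(new ToyLocalBackend(() => scheduler.runPending()))
  scheduler.submit("task-0")
}
```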
backend.reviveOffers() signals that the driver needs resources to run the pending tasks.
package org.apache.spark.scheduler.cluster import java.util.concurrent.atomic.AtomicInteger import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet} import scala.concurrent.Await import scala.concurrent.duration._ import akka.actor._ import akka.pattern.ask import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent} import org.apache.spark.{SparkEnv, Logging, SparkException, TaskState} import org.apache.spark.scheduler.{SchedulerBackend, SlaveLost, TaskDescription, TaskSchedulerImpl, WorkerOffer} import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._ import org.apache.spark.util.{ActorLogReceive, SerializableBuffer, AkkaUtils, Utils} import org.apache.spark.ui.JettyUtils /** * A scheduler backend that waits for coarse grained executors to connect to it through Akka. * This backend holds onto each executor for the duration of the Spark job rather than relinquishing * executors whenever a task is done and asking the scheduler to launch a new executor for * each new task. Executors may be launched in a variety of ways, such as Mesos tasks for the * coarse-grained Mesos mode or standalone processes for Spark's standalone deploy mode * (spark.deploy.*). */ private[spark] class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: ActorSystem) extends SchedulerBackend with Logging { // Use an atomic variable to track total number of cores in the cluster for simplicity and speed var totalCoreCount = new AtomicInteger(0) var totalRegisteredExecutors = new AtomicInteger(0) val conf = scheduler.sc.conf private val timeout = AkkaUtils.askTimeout(conf) private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf) // Submit tasks only after (registered resources / total expected resources) // is equal to at least this value, that is double between 0 and 1. var minRegisteredRatio = math.min(1, conf.getDouble("spark.scheduler.minRegisteredResourcesRatio", 0)) // Submit tasks after maxRegisteredWaitingTime milliseconds // if minRegisteredRatio has not yet been reached val maxRegisteredWaitingTime = conf.getInt("spark.scheduler.maxRegisteredResourcesWaitingTime", 30000) val createTime = System.currentTimeMillis() class DriverActor(sparkProperties: Seq[(String, String)]) extends Actor with ActorLogReceive { override protected def log = CoarseGrainedSchedulerBackend.this.log private val executorActor = new HashMap[String, ActorRef] private val executorAddress = new HashMap[String, Address] private val executorHost = new HashMap[String, String] private val freeCores = new HashMap[String, Int] private val totalCores = new HashMap[String, Int] private val addressToExecutorId = new HashMap[Address, String] override def preStart() { // Listen for remote client disconnection events, since they don't go through Akka's watch() context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent]) // Periodically revive offers to allow delay scheduling to work val reviveInterval = conf.getLong("spark.scheduler.revive.interval", 1000) import context.dispatcher context.system.scheduler.schedule(0.millis, reviveInterval.millis, self, ReviveOffers) } def receiveWithLogging = { case RegisterExecutor(executorId, hostPort, cores) => Utils.checkHostPort(hostPort, "Host port expected " + hostPort) if (executorActor.contains(executorId)) { sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId) } else { logInfo("Registered executor: " + sender + " with ID " + executorId) sender ! 
RegisteredExecutor executorActor(executorId) = sender executorHost(executorId) = Utils.parseHostPort(hostPort)._1 totalCores(executorId) = cores freeCores(executorId) = cores executorAddress(executorId) = sender.path.address addressToExecutorId(sender.path.address) = executorId totalCoreCount.addAndGet(cores) totalRegisteredExecutors.addAndGet(1) makeOffers() } case StatusUpdate(executorId, taskId, state, data) => scheduler.statusUpdate(taskId, state, data.value) if (TaskState.isFinished(state)) { if (executorActor.contains(executorId)) { freeCores(executorId) += scheduler.CPUS_PER_TASK makeOffers(executorId) } else { // Ignoring the update since we don't know about the executor. val msg = "Ignored task status update (%d state %s) from unknown executor %s with ID %s" logWarning(msg.format(taskId, state, sender, executorId)) } } case ReviveOffers => makeOffers() case KillTask(taskId, executorId, interruptThread) => executorActor(executorId) ! KillTask(taskId, executorId, interruptThread) case StopDriver => sender ! true context.stop(self) case StopExecutors => logInfo("Asking each executor to shut down") for (executor <- executorActor.values) { executor ! StopExecutor } sender ! true case RemoveExecutor(executorId, reason) => removeExecutor(executorId, reason) sender ! true case AddWebUIFilter(filterName, filterParams, proxyBase) => addWebUIFilter(filterName, filterParams, proxyBase) sender ! true case DisassociatedEvent(_, address, _) => addressToExecutorId.get(address).foreach(removeExecutor(_, "remote Akka client disassociated")) case RetrieveSparkProps => sender ! sparkProperties } // Make fake resource offers on all executors def makeOffers() { launchTasks(scheduler.resourceOffers( executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))})) } // Make fake resource offers on just one executor def makeOffers(executorId: String) { launchTasks(scheduler.resourceOffers(
        Seq(new WorkerOffer(executorId, executorHost(executorId), freeCores(executorId)))))
    } // WorkerOffer describes the resources available on one executor; freeCores is the number of idle cores on it

    // Launch tasks returned by a set of resource offers
    def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
      for (task <- tasks.flatten) {
        val ser = SparkEnv.get.closureSerializer.newInstance()
        val serializedTask = ser.serialize(task)
        if (serializedTask.limit >= akkaFrameSize - AkkaUtils.reservedSizeBytes) {
          val taskSetId = scheduler.taskIdToTaskSetId(task.taskId)
          scheduler.activeTaskSets.get(taskSetId).foreach { taskSet =>
            try {
              var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
                "spark.akka.frameSize (%d bytes) - reserved (%d bytes). Consider increasing " +
                "spark.akka.frameSize or using broadcast variables for large values."
              msg = msg.format(task.taskId, task.index, serializedTask.limit, akkaFrameSize,
                AkkaUtils.reservedSizeBytes)
              taskSet.abort(msg)
            } catch {
              case e: Exception => logError("Exception in error callback", e)
            }
          }
        } else {
          freeCores(task.executorId) -= scheduler.CPUS_PER_TASK
          executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask))
        }
      }
    }

    ......

    // Send a ReviveOffers request to driverActor
    override def reviveOffers() {
      driverActor ! ReviveOffers
    }
// Launch tasks returned by a set of resource offers def launchTasks(tasks: Seq[Seq[TaskDescription]]) { for (task <- tasks.flatten) { val ser = SparkEnv.get.closureSerializer.newInstance() val serializedTask = ser.serialize(task) if (serializedTask.limit >= akkaFrameSize - AkkaUtils.reservedSizeBytes) { val taskSetId = scheduler.taskIdToTaskSetId(task.taskId) scheduler.activeTaskSets.get(taskSetId).foreach { taskSet => try { var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " + "spark.akka.frameSize (%d bytes) - reserved (%d bytes). Consider increasing " + "spark.akka.frameSize or using broadcast variables for large values." msg = msg.format(task.taskId, task.index, serializedTask.limit, akkaFrameSize, AkkaUtils.reservedSizeBytes) taskSet.abort(msg) } catch { case e: Exception => logError("Exception in error callback", e) } } } else { freeCores(task.executorId) -= scheduler.CPUS_PER_TASK executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask)) } } }
The code above sends a LaunchTask message to executorActor. executorActor is a HashMap whose keys are executor IDs and whose values are the corresponding CoarseGrainedExecutorBackend actor references.
Note: at this point the tasks have already been dispatched to the executor machines, so the execution that follows happens on an executor, which may be remote or local.
CoarseGrainedExecutorBackend.scala
package org.apache.spark.executor import java.nio.ByteBuffer import scala.concurrent.Await import akka.actor.{Actor, ActorSelection, Props} import akka.pattern.Patterns import akka.remote.{RemotingLifecycleEvent, DisassociatedEvent} import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkEnv} import org.apache.spark.TaskState.TaskState import org.apache.spark.deploy.SparkHadoopUtil import org.apache.spark.deploy.worker.WorkerWatcher import org.apache.spark.scheduler.TaskDescription import org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages._ import org.apache.spark.util.{ActorLogReceive, AkkaUtils, SignalLogger, Utils} private[spark] class CoarseGrainedExecutorBackend( driverUrl: String, executorId: String, hostPort: String, cores: Int, sparkProperties: Seq[(String, String)]) extends Actor with ActorLogReceive with ExecutorBackend with Logging { Utils.checkHostPort(hostPort, "Expected hostport") var executor: Executor = null var driver: ActorSelection = null override def preStart() { logInfo("Connecting to driver: " + driverUrl) driver = context.actorSelection(driverUrl) driver ! RegisterExecutor(executorId, hostPort, cores) context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent]) } override def receiveWithLogging = { case RegisteredExecutor => logInfo("Successfully registered with driver") // Make this host instead of hostPort ? executor = new Executor(executorId, Utils.parseHostPort(hostPort)._1, sparkProperties, false) case RegisterExecutorFailed(message) => logError("Slave registration failed: " + message) System.exit(1) case LaunchTask(data) => if (executor == null) { logError("Received LaunchTask command but executor was null") System.exit(1) } else { val ser = SparkEnv.get.closureSerializer.newInstance() val taskDesc = ser.deserialize[TaskDescription](data.value) logInfo("Got assigned task " + taskDesc.taskId) executor.launchTask(this, taskDesc.taskId, taskDesc.name, taskDesc.serializedTask) }
package org.apache.spark.executor import java.io.File import java.lang.management.ManagementFactory import java.nio.ByteBuffer import java.util.concurrent._ import scala.collection.JavaConversions._ import scala.collection.mutable.{ArrayBuffer, HashMap} import org.apache.spark._ import org.apache.spark.deploy.SparkHadoopUtil import org.apache.spark.scheduler._ import org.apache.spark.shuffle.FetchFailedException import org.apache.spark.storage.{StorageLevel, TaskResultBlockId} import org.apache.spark.util.{AkkaUtils, Utils} /** * Spark executor used with Mesos, YARN, and the standalone scheduler. */ private[spark] class Executor( executorId: String, slaveHostname: String, properties: Seq[(String, String)], isLocal: Boolean = false) extends Logging { // Application dependencies (added through SparkContext) that we've fetched so far on this node. // Each map holds the master's timestamp for the version of that file or JAR we got. private val currentFiles: HashMap[String, Long] = new HashMap[String, Long]() private val currentJars: HashMap[String, Long] = new HashMap[String, Long]() private val EMPTY_BYTE_BUFFER = ByteBuffer.wrap(new Array[Byte](0)) @volatile private var isStopped = false // No ip or host:port - just hostname Utils.checkHost(slaveHostname, "Expected executed slave to be a hostname") // must not have port specified. assert (0 == Utils.parseHostPort(slaveHostname)._2) // Make sure the local hostname we report matches the cluster scheduler's name for this host Utils.setCustomHostname(slaveHostname) // Set spark.* properties from executor arg val conf = new SparkConf(true) conf.setAll(properties) if (!isLocal) { // Setup an uncaught exception handler for non-local mode. // Make any thread terminations due to uncaught exceptions kill the entire // executor process to avoid surprising stalls. Thread.setDefaultUncaughtExceptionHandler(ExecutorUncaughtExceptionHandler) } val executorSource = new ExecutorSource(this, executorId) // Initialize Spark environment (using system properties read above) private val env = { if (!isLocal) { val _env = SparkEnv.create(conf, executorId, slaveHostname, 0, isDriver = false, isLocal = false) SparkEnv.set(_env) _env.metricsSystem.registerSource(executorSource) _env } else { SparkEnv.get } } // Create our ClassLoader // do this after SparkEnv creation so can access the SecurityManager private val urlClassLoader = createClassLoader() private val replClassLoader = addReplClassLoaderIfNeeded(urlClassLoader) // Set the classloader for serializer env.serializer.setDefaultClassLoader(urlClassLoader) // Akka's message frame size. If task result is bigger than this, we use the block manager // to send the result back. private val akkaFrameSize = AkkaUtils.maxFrameSizeBytes(conf) // Start worker thread pool val threadPool = Utils.newDaemonCachedThreadPool("Executor task launch worker") // Maintains the list of running tasks. private val runningTasks = new ConcurrentHashMap[Long, TaskRunner] startDriverHeartbeater() def launchTask( context: ExecutorBackend, taskId: Long, taskName: String, serializedTask: ByteBuffer) { val tr = new TaskRunner(context, taskId, taskName, serializedTask) runningTasks.put(taskId, tr) threadPool.execute(tr) }
So how exactly are tasks assigned here? How does the scheduler know which executor each task should go to?
When CoarseGrainedSchedulerBackend executes makeOffers, it calls scheduler.resourceOffers:
    // Make fake resource offers on all executors
    def makeOffers() {
      launchTasks(scheduler.resourceOffers(
        executorHost.toArray.map {case (id, host) => new WorkerOffer(id, host, freeCores(id))}))
    }

The scheduler here is actually TaskSchedulerImpl: resourceOffers hands tasks out to the offers in a round-robin fashion, so at this point it has already been decided which executor each task goes to.
  /**
   * Called by cluster manager to offer resources on slaves. We respond by asking our active task
   * sets for tasks in order of priority. We fill each node with tasks in a round-robin manner so
   * that tasks are balanced across the cluster.
   */
  def resourceOffers(offers: Seq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
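The real resourceOffers implementation is long, but the round-robin idea it describes can be sketched in isolation. The following standalone sketch (not Spark code; SimpleOffer and all other names are invented) hands each pending task one core while cycling through the executors, so tasks end up balanced across them:

```scala
// Simplified illustration of round-robin task placement over resource offers.
case class SimpleOffer(executorId: String, host: String, var freeCores: Int)

object RoundRobinSketch {
  /** Assign each task one "core", cycling through the offers in order. */
  def assign(tasks: Seq[String], offers: Seq[SimpleOffer], cpusPerTask: Int = 1): Seq[(String, String)] = {
    val placements = scala.collection.mutable.ArrayBuffer[(String, String)]() // (task, executorId)
    var i = 0
    for (task <- tasks) {
      var tried = 0
      // Find the next offer (round robin) that still has enough free cores.
      while (tried < offers.size && offers(i % offers.size).freeCores < cpusPerTask) {
        i += 1; tried += 1
      }
      if (tried < offers.size) {
        val offer = offers(i % offers.size)
        offer.freeCores -= cpusPerTask
        placements += ((task, offer.executorId))
        i += 1 // move on to the next executor for the next task
      }
    }
    placements
  }

  def main(args: Array[String]): Unit = {
    val offers = Seq(SimpleOffer("exec-1", "host-a", 2), SimpleOffer("exec-2", "host-b", 2))
    // Four tasks over two executors with two free cores each: two tasks land on each executor.
    println(assign(Seq("t0", "t1", "t2", "t3"), offers))
  }
}
```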
The official documentation is quoted below; it explains this part very clearly.
================================================================================================================================
This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved. Read through the application submission guide to submit applications to a cluster.
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program). Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks for the executors to run.
There are several useful things to note about this architecture:
The system currently supports three cluster managers: Spark's own standalone cluster manager, Apache Mesos, and Hadoop YARN.
In addition, Spark’s EC2 launch scripts make it easy to launch a standalone cluster on Amazon EC2.
Applications can be submitted to a cluster of any type using the spark-submit script. The application submission guide describes how to do this.
Each driver program has a web UI, typically on port 4040, that displays information about running tasks, executors, and storage usage. Simply go to http://<driver-node>:4040 in a web browser to access this UI. The monitoring guide also describes other monitoring options.
Spark gives control over resource allocation both across applications (at the level of the cluster manager) and within applications (if multiple computations are happening on the same SparkContext). The job scheduling overview describes this in more detail.
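As a concrete illustration of these two levels of control, the sketch below sets the two configuration properties that appeared in the code quoted earlier: spark.cores.max (across applications, enforced by the cluster manager) and spark.scheduler.mode (within one SparkContext). The application name, master URL, and values are arbitrary examples.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ResourceAllocationDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("resource-allocation-demo")
      .setMaster("local[2]") // placeholder master so the example runs anywhere
      // Across applications: spark.cores.max (seen in SparkDeploySchedulerBackend above)
      // caps the total cores this application may claim from a standalone cluster.
      .set("spark.cores.max", "8")
      // Within one application: spark.scheduler.mode (seen in TaskSchedulerImpl above)
      // switches job scheduling on this SparkContext from FIFO to FAIR.
      .set("spark.scheduler.mode", "FAIR")

    val sc = new SparkContext(conf)
    // ... submit jobs from multiple threads here; FAIR mode shares executors between them.
    sc.stop()
  }
}
```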