Spark Local Mode Explained

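Spark's local mode starts multiple threads on the local machine to simulate distributed execution; each thread acts as one worker. According to the official Spark documentation, local mode accepts the master URL settings listed below. The local variants differ in the number of worker threads and in how many times a failed task is retried.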

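| Master URL | Meaning | Task retries |
|---|---|---|
| local | 1 worker thread (no parallelism) | Failed tasks are not retried |
| local[N] | N worker threads, ideally set to the number of cores | Failed tasks are not retried |
| local[*] | As many worker threads as there are logical cores on the machine | Failed tasks are not retried |
| local[N,F] | N worker threads | Failed tasks are retried at most F-1 times (F is the maximum number of failures per task) |
| local[*,F] | As many worker threads as there are logical cores on the machine | Failed tasks are retried at most F-1 times |
| local-cluster[numSlaves,coresPerSlave,memoryPerSlave] | Pseudo-distributed mode: the master and workers run locally, and the master URL specifies the number of workers, the CPU cores per worker, and the memory each worker may use | Otherwise the same as standalone mode |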

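spark.task.maxFailures sets how many times a single task may fail before Spark gives up on the job, so the number of allowed retries is this value minus 1.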

| Property Name | Default | Meaning |
|---|---|---|
| spark.task.maxFailures | 4 | Number of failures of any particular task before giving up on the job. The total number of failures spread across different tasks will not cause the job to fail; a particular task has to fail this number of attempts. Should be greater than or equal to 1. Number of allowed retries = this value - 1. |
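The relationship "allowed retries = maxFailures - 1" can be illustrated with a small retry loop. This is only a sketch of the counting rule, not Spark's scheduler code: `TaskRetrySketch` and `runWithMaxFailures` are hypothetical names, and the "task" is an ordinary function that receives its attempt number.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object TaskRetrySketch {
  /** Runs `task` until it succeeds or has failed `maxFailures` times.
    * Returns Right(result) on success, or Left(number of failures) on giving up.
    * Hypothetical helper: it only mirrors the counting rule of spark.task.maxFailures.
    */
  def runWithMaxFailures[A](maxFailures: Int)(task: Int => A): Either[Int, A] = {
    require(maxFailures >= 1, "maxFailures should be greater than or equal to 1")

    @tailrec
    def loop(attempt: Int): Either[Int, A] =
      Try(task(attempt)) match {
        case Success(value)                      => Right(value)
        // Still below the failure limit: try again (this is a retry).
        case Failure(_) if attempt < maxFailures => loop(attempt + 1)
        // The task has now failed maxFailures times: give up.
        case Failure(_)                          => Left(attempt)
      }

    loop(1)
  }
}
```

With `maxFailures = 1`, which is the value used by plain `local`, `local[N]` and `local[*]`, the first failure is final; with the default of 4, a task that fails three times and then succeeds still completes.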

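The deployment modes differ in which TaskScheduler and SchedulerBackend implementations they use. While the SparkContext is being created, Spark uses the master URL passed in to determine the run mode and creates the matching TaskScheduler and SchedulerBackend; the implementation is in org.apache.spark.SparkContext#createTaskScheduler.
The overall architecture of local mode is shown in the figure below. LocalBackend (LocalSchedulerBackend in the branch-2.3 code shown later) holds a LocalActor, and its communication with the Executor goes through this LocalActor.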

[Figure: overall architecture of local mode]

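The overall architecture of local-cluster[numSlaves, coresPerSlave, memoryPerSlave] is shown below. Note that org.apache.spark.executor.CoarseGrainedExecutorBackend and org.apache.spark.deploy.worker.Worker are two separate processes; neither contains the other. When a Worker receives the message from the Master, it creates an ExecutorRunner instance, and the ExecutorRunner then launches the new CoarseGrainedExecutorBackend process.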

[Figure: overall architecture of local-cluster mode]

SparkContext.scala defines the regular expressions that match the master URLs for local mode and Spark standalone mode.

    /**
     * Regular expressions for master URLs
     */
    private object SparkMasterRegex {
      // Regular expression used for local[N] and local[*] master formats
      val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
      // Regular expression for local[N, maxRetries], used in tests with failing tasks
      val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
      // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
      val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
      // Regular expression for connecting to Spark deploy clusters
      // Regular expression for standalone mode
      val SPARK_REGEX = """spark://(.*)""".r
    }
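To see how these patterns behave as Scala extractors, here is a small self-contained sketch. The four regular expressions are copied from the excerpt above; the `MasterUrlMatchSketch` object and its `describe` method are hypothetical and exist only for illustration. A Scala `Regex` used in a `case` must match the whole string, which is why `local[2, 3]` falls through `LOCAL_N_REGEX` and is picked up by `LOCAL_N_FAILURES_REGEX`.

```scala
object MasterUrlMatchSketch {
  // Patterns copied from SparkMasterRegex so that the sketch is self-contained.
  val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
  val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
  val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
  val SPARK_REGEX = """spark://(.*)""".r

  /** Hypothetical helper: reports which pattern matched and what it captured. */
  def describe(master: String): String = master match {
    case "local" =>
      "local: 1 thread"
    case LOCAL_N_REGEX(threads) =>
      s"local: threads=$threads"
    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
      s"local: threads=$threads, maxFailures=$maxFailures"
    case LOCAL_CLUSTER_REGEX(workers, cores, memory) =>
      s"local-cluster: workers=$workers, coresPerWorker=$cores, memoryPerWorkerMB=$memory"
    case SPARK_REGEX(hostPorts) =>
      s"standalone: $hostPorts"
    case other =>
      // Anything else is left to an external cluster manager.
      s"external: $other"
  }
}
```

For example, `MasterUrlMatchSketch.describe("local-cluster[2,1,1024]")` reports two workers with one core and 1024 MB each, while a string such as `yarn` matches none of the patterns and ends up in the last case.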

For the other deployment modes, the getClusterManager method looks up a matching ExternalClusterManager directly:

    // Excerpt from SparkContext.scala; it relies on java.util.ServiceLoader and
    // scala.collection.JavaConverters._ (for .asScala), imported at the top of that file.
    private def getClusterManager(url: String): Option[ExternalClusterManager] = {
      val loader = Utils.getContextOrSparkClassLoader
      val serviceLoaders =
        ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala.filter(_.canCreate(url))
      if (serviceLoaders.size > 1) {
        throw new SparkException(
          s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
      }
      serviceLoaders.headOption
    }
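The selection rule in getClusterManager (keep the managers whose canCreate accepts the URL, fail if more than one does, otherwise return the first if any) can be tried out without a ServiceLoader. The sketch below is an assumption-laden stand-in: `Manager`, `PrefixManager` and `select` are hypothetical, the candidates come from a plain `Seq` rather than from service registration, and `IllegalStateException` replaces SparkException.

```scala
object ClusterManagerLookupSketch {
  // Hypothetical stand-in for ExternalClusterManager; only canCreate is modeled.
  trait Manager {
    def name: String
    def canCreate(masterUrl: String): Boolean
  }

  // Hypothetical manager that accepts every URL starting with a given prefix.
  final case class PrefixManager(name: String, prefix: String) extends Manager {
    def canCreate(masterUrl: String): Boolean = masterUrl.startsWith(prefix)
  }

  /** Same decision as getClusterManager, applied to an explicit list of managers. */
  def select(url: String, registered: Seq[Manager]): Option[Manager] = {
    val candidates = registered.filter(_.canCreate(url))
    if (candidates.size > 1) {
      // The original throws SparkException; a standard exception keeps the sketch standalone.
      throw new IllegalStateException(
        s"Multiple external cluster managers registered for the url $url: " +
          candidates.map(_.name).mkString(", "))
    }
    candidates.headOption
  }
}
```

An empty result corresponds to the "Could not parse Master URL" error raised by the caller, and an ambiguous registration fails early instead of picking one manager arbitrarily.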

Spark creates the TaskScheduler and SchedulerBackend instances as follows:

      /**
       * Create a scheduler backend and a task scheduler based on the given master URL.
       */
      private def createTaskScheduler(
          sc: SparkContext,
          master: String,
          deployMode: String): (SchedulerBackend, TaskScheduler) = {
        import SparkMasterRegex._
        // In local mode, failed tasks are not retried (a task may fail only once)
        val MAX_LOCAL_TASK_FAILURES = 1
    
        master match {
          case "local" =>
            val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
            val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
            scheduler.initialize(backend)
            (backend, scheduler)

          case LOCAL_N_REGEX(threads) =>
            // Number of cores currently available to the JVM
            def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
            // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
            val threadCount = if (threads == "*") localCpuCount else threads.toInt
            if (threadCount <= 0) {
              throw new SparkException(s"Asked to run locally with $threadCount threads")
            }
            val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
            val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
            scheduler.initialize(backend)
            (backend, scheduler)
    
          case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
            def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
            // maxFailures from the master URL sets how many times each task may fail (retries = maxFailures - 1)
            val threadCount = if (threads == "*") localCpuCount else threads.toInt
            val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
            val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
            scheduler.initialize(backend)
            (backend, scheduler)
          // Spark standalone deployment mode
          case SPARK_REGEX(sparkUrl) =>
            val scheduler = new TaskSchedulerImpl(sc)
            val masterUrls = sparkUrl.split(",").map("spark://" + _)
            val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
            scheduler.initialize(backend)
            (backend, scheduler)
          // local-cluster mode
          case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
            // Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
            val memoryPerSlaveInt = memoryPerSlave.toInt
            if (sc.executorMemory > memoryPerSlaveInt) {
              throw new SparkException(
                "Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
                  memoryPerSlaveInt, sc.executorMemory))
            }
    
            val scheduler = new TaskSchedulerImpl(sc)
            val localCluster = new LocalSparkCluster(
              numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
            val masterUrls = localCluster.start()
            val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
            scheduler.initialize(backend)
            backend.shutdownCallback = (backend: StandaloneSchedulerBackend) => {
              localCluster.stop()
            }
            (backend, scheduler)
          // Other deployment modes (e.g. Mesos, YARN, Kubernetes) go through getClusterManager
          case masterUrl =>
            val cm = getClusterManager(masterUrl) match {
              case Some(clusterMgr) => clusterMgr
              case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
            }
            try {
              val scheduler = cm.createTaskScheduler(sc, masterUrl)
              val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler)
              cm.initialize(scheduler, backend)
              (backend, scheduler)
            } catch {
              case se: SparkException => throw se
              case NonFatal(e) =>
                throw new SparkException("External scheduler cannot be instantiated", e)
            }
        }
      }
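Restricted to the three local cases, the branching above boils down to choosing a thread count and a per-task failure limit. The following sketch reproduces only that decision, under stated assumptions: `LocalSettings` and `resolve` are hypothetical names, the core count is passed in as a parameter instead of being read from `Runtime`, and `IllegalArgumentException` stands in for SparkException.

```scala
object LocalModeSettingsSketch {
  private val LocalN = """local\[([0-9]+|\*)\]""".r
  private val LocalNFailures = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r

  // Hypothetical summary of what createTaskScheduler passes to the scheduler and backend.
  final case class LocalSettings(threads: Int, maxTaskFailures: Int) {
    def allowedRetries: Int = maxTaskFailures - 1
  }

  /** Returns the local-mode settings for `master`, or None for non-local master URLs. */
  def resolve(master: String, availableCores: Int): Option[LocalSettings] = {
    // "*" means "use every available core"; otherwise the number is taken literally.
    def threadCount(spec: String): Int = if (spec == "*") availableCores else spec.toInt

    master match {
      case "local" =>
        Some(LocalSettings(threads = 1, maxTaskFailures = 1))
      case LocalN(spec) =>
        val n = threadCount(spec)
        if (n <= 0) {
          throw new IllegalArgumentException(s"Asked to run locally with $n threads")
        }
        Some(LocalSettings(threads = n, maxTaskFailures = 1))
      case LocalNFailures(spec, failures) =>
        Some(LocalSettings(threadCount(spec), failures.toInt))
      case _ =>
        None
    }
  }
}
```

For instance, `resolve("local[4,3]", availableCores = 8)` yields four threads with two allowed retries, whereas `local[4]` yields four threads and no retries, matching the table at the top of the article.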

References:

[1] 张安站, Spark技术内幕:深入解析Spark内核架构设计与原理实现
[2] https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/SparkContext.scala
