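Spark Local Mode Explained
Spark's local mode simulates distributed execution by starting multiple threads on the local machine, with each thread acting as a worker. According to the official Spark documentation, local mode supports the following master URL forms; they differ in the number of worker threads and in how many times a failed task is retried.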
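Master URL | Meaning | Max retries |
---|---|---|
local | 1 worker thread (no parallelism) | Failed tasks are not retried |
local[N] | N worker threads, ideally set to the number of cores | Failed tasks are not retried |
local[*] | As many worker threads as there are cores on your machine | Failed tasks are not retried |
local[N,F] | N worker threads | A failed task is retried at most F-1 times (F attempts in total) |
local[*,F] | As many worker threads as there are cores on your machine | A failed task is retried at most F-1 times (F attempts in total) |
local-cluster[numSlaves,coresPerSlave,memoryPerSlave] | Pseudo-distributed mode: the master and workers run locally, and the master URL specifies the number of workers, the CPU cores per worker, and the memory (in MB) each worker may use | Otherwise the same as standalone mode |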
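spark.task.maxFailures is the number of times a single task may fail before the job is abandoned; the number of retries is this value minus 1.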
Property Name | Default | Meaning |
---|---|---|
spark.task.maxFailures | 4 | Number of failures of any particular task before giving up on the job. The total number of failures spread across different tasks will not cause the job to fail; a particular task has to fail this number of attempts. Should be greater than or equal to 1. Number of allowed retries = this value - 1. |
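
The relationship "allowed retries = maxFailures - 1" can be checked with a small simulation. This is a minimal sketch, not Spark's scheduler code: `MaxFailuresSketch` and `runWithMaxFailures` are hypothetical names introduced here, and the task is just a function that receives its attempt number.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Spark: simulates the attempt counting only.
object MaxFailuresSketch {
  /** Runs `task` until it succeeds or has failed `maxFailures` times.
    * Returns the number of attempts made and the last result. */
  def runWithMaxFailures[A](maxFailures: Int)(task: Int => A): (Int, Try[A]) = {
    require(maxFailures >= 1, "maxFailures should be greater than or equal to 1")

    @tailrec
    def loop(attempt: Int): (Int, Try[A]) = Try(task(attempt)) match {
      case s @ Success(_)                           => (attempt, s)
      case f @ Failure(_) if attempt >= maxFailures => (attempt, f) // give up
      case Failure(_)                               => loop(attempt + 1) // retry
    }

    loop(1)
  }

  def main(args: Array[String]): Unit = {
    // A task that always fails, run with the default value of 4.
    val (attempts, result) =
      runWithMaxFailures[Int](maxFailures = 4)(_ => throw new RuntimeException("boom"))
    println(s"attempts = $attempts, retries = ${attempts - 1}, succeeded = ${result.isSuccess}")
  }
}
```

With `maxFailures = 1`, the value that plain `local` and `local[N]` use, the loop gives up after the first failure, which is why those modes never retry.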
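Spark's deployment modes differ in which TaskScheduler and SchedulerBackend implementations they use. While the SparkContext is being created, Spark determines the run mode from the Master URL that was passed in and creates the corresponding TaskScheduler and SchedulerBackend. The implementation is in org.apache.spark.SparkContext#createTaskScheduler.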
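The overall architecture of local mode is as follows: LocalBackend holds a LocalActor, and communication between LocalBackend and the Executor goes through this LocalActor. (In the branch-2.3 code shown later, the backend class is named LocalSchedulerBackend.)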
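The overall architecture of local-cluster[N, cores, memory] is as follows. Note that org.apache.spark.executor.CoarseGrainedExecutorBackend and org.apache.spark.deploy.worker.Worker are two separate processes; neither contains the other. When a Worker receives a message from the Master, it creates an ExecutorRunner instance, and the ExecutorRunner then launches CoarseGrainedExecutorBackend as a new process.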
SparkContext.scala contains the regular expressions that match Master URLs for local and Spark standalone modes.
/**
 * Regular expressions for Master URLs
 */
private object SparkMasterRegex {
  // Regular expression used for local[N] and local[*] master formats
  val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
  // Regular expression for local[N, maxRetries], used in tests with failing tasks
  val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r
  // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
  val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
  // Regular expression for connecting to Spark deploy clusters (standalone mode)
  val SPARK_REGEX = """spark://(.*)""".r
}
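To see what these patterns extract, the following sketch applies the two local-mode regexes outside Spark. It is an illustration under assumptions: `MasterUrlSketch`, `LocalSettings`, and `parseLocal` are hypothetical names, and the core count is passed in as a parameter instead of being read from `Runtime`.

```scala
// Hypothetical helper, not part of Spark: reuses two patterns from SparkMasterRegex.
object MasterUrlSketch {
  private val LocalN = """local\[([0-9]+|\*)\]""".r
  private val LocalNFailures = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r

  /** Worker thread count and maximum attempts per task for a local master URL. */
  final case class LocalSettings(threads: Int, maxFailures: Int)

  def parseLocal(master: String, cpuCount: Int): Option[LocalSettings] = {
    // "*" means "use every available core"; otherwise the number is taken literally.
    def resolve(threads: String): Int = if (threads == "*") cpuCount else threads.toInt

    master match {
      case "local"                              => Some(LocalSettings(1, 1))
      case LocalN(threads)                      => Some(LocalSettings(resolve(threads), 1))
      case LocalNFailures(threads, maxFailures) => Some(LocalSettings(resolve(threads), maxFailures.toInt))
      case _                                    => None // not a local[...] master URL
    }
  }

  def main(args: Array[String]): Unit = {
    val cores = Runtime.getRuntime.availableProcessors()
    Seq("local", "local[4]", "local[*]", "local[2, 3]", "spark://host:7077")
      .foreach(url => println(s"$url -> ${parseLocal(url, cores)}"))
  }
}
```

A Scala `Regex` used as a pattern must match the whole string, so `local[2, 3]` is only picked up by the second pattern, and a URL such as `spark://host:7077` falls through to `None`.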
For other deployment modes, an ExternalClusterManager is obtained directly through the getClusterManager method:
private def getClusterManager(url: String): Option[ExternalClusterManager] = {
  val loader = Utils.getContextOrSparkClassLoader
  // Discover all registered ExternalClusterManager implementations and keep those that accept this URL
  val serviceLoaders =
    ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala.filter(_.canCreate(url))
  if (serviceLoaders.size > 1) {
    throw new SparkException(
      s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
  }
  serviceLoaders.headOption
}
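The selection rule in getClusterManager (keep the managers whose canCreate accepts the URL, fail if more than one does, otherwise take the first if any) can be sketched without ServiceLoader. Everything below is a hypothetical stand-in: `ManagerLike`, `SimpleManager`, and the two example managers are not Spark classes, and the registered managers are passed in as a plain `Seq` rather than discovered from the classpath.

```scala
// Hypothetical stand-ins, not Spark classes: only the canCreate-based selection is modelled.
object ClusterManagerSketch {
  trait ManagerLike {
    def name: String
    def canCreate(masterUrl: String): Boolean
  }

  final case class SimpleManager(name: String, accepts: String => Boolean) extends ManagerLike {
    def canCreate(masterUrl: String): Boolean = accepts(masterUrl)
  }

  // Example managers with assumed acceptance rules, for illustration only.
  val yarnLike: ManagerLike = SimpleManager("yarn-like", _ == "yarn")
  val k8sLike: ManagerLike = SimpleManager("k8s-like", _.startsWith("k8s://"))

  /** Mirrors the filter / size check / headOption steps of getClusterManager. */
  def select(url: String, registered: Seq[ManagerLike]): Option[ManagerLike] = {
    val candidates = registered.filter(_.canCreate(url))
    if (candidates.size > 1) {
      throw new IllegalStateException(
        s"Multiple external cluster managers registered for the url $url: ${candidates.map(_.name)}")
    }
    candidates.headOption
  }

  def main(args: Array[String]): Unit = {
    val registered = Seq(yarnLike, k8sLike)
    Seq("yarn", "k8s://host:443", "unknown://x")
      .foreach(url => println(s"$url -> ${select(url, registered).map(_.name)}"))
  }
}
```

When `select` returns `None`, the caller is left to report the URL as unparseable, which is what createTaskScheduler does with its "Could not parse Master URL" exception.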
The process by which Spark creates the TaskScheduler and SchedulerBackend instances is as follows:
/**
 * Create a scheduler backend and a task scheduler based on the given Master URL
 */
private def createTaskScheduler(
    sc: SparkContext,
    master: String,
    deployMode: String): (SchedulerBackend, TaskScheduler) = {
  import SparkMasterRegex._
  // In local mode, failed tasks are not retried
  val MAX_LOCAL_TASK_FAILURES = 1
  master match {
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
      scheduler.initialize(backend)
      (backend, scheduler)
    case LOCAL_N_REGEX(threads) =>
      // Number of cores currently available to the JVM
      def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
      // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
      val threadCount = if (threads == "*") localCpuCount else threads.toInt
      if (threadCount <= 0) {
        throw new SparkException(s"Asked to run locally with $threadCount threads")
      }
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
      scheduler.initialize(backend)
      (backend, scheduler)
    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
      def localCpuCount: Int = Runtime.getRuntime.availableProcessors()
      val threadCount = if (threads == "*") localCpuCount else threads.toInt
      // maxFailures sets the number of attempts allowed for each task
      val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
      val backend = new LocalSchedulerBackend(sc.getConf, scheduler, threadCount)
      scheduler.initialize(backend)
      (backend, scheduler)
    // Spark standalone deploy mode
    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)
    // local-cluster mode
    case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
      // Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
      val memoryPerSlaveInt = memoryPerSlave.toInt
      if (sc.executorMemory > memoryPerSlaveInt) {
        throw new SparkException(
          "Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
            memoryPerSlaveInt, sc.executorMemory))
      }
      val scheduler = new TaskSchedulerImpl(sc)
      val localCluster = new LocalSparkCluster(
        numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt, sc.conf)
      val masterUrls = localCluster.start()
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      backend.shutdownCallback = (backend: StandaloneSchedulerBackend) => {
        localCluster.stop()
      }
      (backend, scheduler)
    // Other deploy modes (Mesos, YARN, Kubernetes) go through getClusterManager
    case masterUrl =>
      val cm = getClusterManager(masterUrl) match {
        case Some(clusterMgr) => clusterMgr
        case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
      }
      try {
        val scheduler = cm.createTaskScheduler(sc, masterUrl)
        val backend = cm.createSchedulerBackend(sc, masterUrl, scheduler)
        cm.initialize(scheduler, backend)
        (backend, scheduler)
      } catch {
        case se: SparkException => throw se
        case NonFatal(e) =>
          throw new SparkException("External scheduler cannot be instantiated", e)
      }
  }
}
References:
[1] 张安站, Spark技术内幕:深入解析Spark内核架构设计与原理实现
[2] https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/SparkContext.scala