When SparkContext is initialized, it creates and starts the TaskScheduler;
when the TaskScheduler starts, it registers the Application with the Master;
when the Master starts the Application, it calls startExecutorsOnWorkers to launch Executors.
This post analyzes the Executor startup process.
1 Scheduling Executors on Workers
- Enter
org.apache.spark.deploy.master.Master.scala
/**
 * Schedule and launch executors on workers
 */
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps) {
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    // If the cores left is less than the coresPerExecutor,the cores left will not be allocated
    if (app.coresLeft >= coresPerExecutor) {
      // Filter out workers that don't have enough resources to launch an executor
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor)
        .sortBy(_.coresFree).reverse
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
      }
    }
  }
}
- Iterate over Master.waitingApps, taking each waiting app;
- Check whether app.coresLeft is at least coresPerExecutor; if it is smaller, the app needs fewer cores than a single Executor would get, so launching one more Executor would only waste resources;
- From all workers, select those whose state is ALIVE, with worker.memoryFree >= app.desc.memoryPerExecutorMB and worker.coresFree >= coresPerExecutor, then sort them by worker.coresFree in descending order to obtain usableWorkers (see the sketch after this list);
- Call scheduleExecutorsOnWorkers to plan Executors across the usableWorkers;
- Iterate over usableWorkers, pick the Workers that were assigned Executors, and call allocateWorkerResourceToExecutors to hand each such Worker's resources to its Executors.
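For illustration, a minimal, self-contained sketch of the filter-and-sort step; the Worker case class and all resource numbers here are made up, not Spark's real WorkerInfo:
case class Worker(id: String, alive: Boolean, memoryFree: Int, coresFree: Int)

val memoryPerExecutorMB = 2048
val coresPerExecutor = 2
val workers = Seq(
  Worker("w1", alive = true, memoryFree = 8192, coresFree = 6),
  Worker("w2", alive = true, memoryFree = 1024, coresFree = 8),  // too little memory
  Worker("w3", alive = false, memoryFree = 8192, coresFree = 8), // not ALIVE
  Worker("w4", alive = true, memoryFree = 4096, coresFree = 2))

val usableWorkers = workers
  .filter(_.alive)
  .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor)
  .sortBy(_.coresFree).reverse
// usableWorkers = List(w1 with 6 free cores, w4 with 2 free cores)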
1.1 scheduleExecutorsOnWorkers
- Enter
org.apache.spark.deploy.master.Master.scala
private def scheduleExecutorsOnWorkers(
    app: ApplicationInfo,
    usableWorkers: Array[WorkerInfo],
    spreadOutApps: Boolean): Array[Int] = {
  val coresPerExecutor = app.desc.coresPerExecutor
  val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
  val oneExecutorPerWorker = coresPerExecutor.isEmpty
  val memoryPerExecutor = app.desc.memoryPerExecutorMB
  val numUsable = usableWorkers.length
  val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker
  val assignedExecutors = new Array[Int](numUsable) // Number of new executors on each worker
  var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

  /** Return whether the specified worker can launch an executor for this app. */
  def canLaunchExecutor(pos: Int): Boolean = {
    val keepScheduling = coresToAssign >= minCoresPerExecutor
    val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

    // If we allow multiple executors per worker, then we can always launch new executors.
    // Otherwise, if there is already an executor on this worker, just give it more cores.
    val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
    if (launchingNewExecutor) {
      val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
      val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
      val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
      keepScheduling && enoughCores && enoughMemory && underLimit
    } else {
      // We're adding cores to an existing executor, so no need
      // to check memory and executor limits
      keepScheduling && enoughCores
    }
  }

  // Keep launching executors until no more workers can accommodate any
  // more executors, or if we have reached this application's limits
  var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
  while (freeWorkers.nonEmpty) {
    freeWorkers.foreach { pos =>
      var keepScheduling = true
      while (keepScheduling && canLaunchExecutor(pos)) {
        coresToAssign -= minCoresPerExecutor
        assignedCores(pos) += minCoresPerExecutor

        // If we are launching one executor per worker, then every iteration assigns 1 core
        // to the executor. Otherwise, every iteration assigns cores to a new executor.
        if (oneExecutorPerWorker) {
          assignedExecutors(pos) = 1
        } else {
          assignedExecutors(pos) += 1
        }

        // Spreading out an application means spreading out its executors across as
        // many workers as possible. If we are not spreading out, then we should keep
        // scheduling executors on this worker until we use all of its resources.
        // Otherwise, just move on to the next worker.
        if (spreadOutApps) {
          keepScheduling = false
        }
      }
    }
    freeWorkers = freeWorkers.filter(canLaunchExecutor)
  }
  assignedCores
}
- Iterate over all usable workers, calling the index pos;
- While keepScheduling is true and canLaunchExecutor(pos) is true, subtract minCoresPerExecutor from coresToAssign and add it to assignedCores(pos);
- If the app is configured to run one Executor per Worker (coresPerExecutor is unset), set assignedExecutors(pos) = 1; otherwise assignedExecutors(pos) += 1;
- Check spreadOutApps: if true, the app wants its Executors spread across as many Workers as possible, so keepScheduling is set to false and scheduling moves on to the next Worker;
- The loop finally yields an assignedCores array whose positions correspond one-to-one to usableWorkers (a worked example follows).
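A minimal, self-contained simulation of the round-robin loop above (a simplified re-implementation with assumed numbers; workers are reduced to free core counts, and memory/executor limits are ignored) shows how spreadOutApps changes the result:
def assign(coresFree: Array[Int], coresLeft: Int, minCores: Int, spreadOut: Boolean): Array[Int] = {
  val assigned = new Array[Int](coresFree.length)
  var toAssign = math.min(coresLeft, coresFree.sum)
  def canLaunch(pos: Int) = toAssign >= minCores && coresFree(pos) - assigned(pos) >= minCores
  var free = coresFree.indices.filter(canLaunch)
  while (free.nonEmpty) {
    free.foreach { pos =>
      var keep = true
      while (keep && canLaunch(pos)) {
        toAssign -= minCores
        assigned(pos) += minCores
        if (spreadOut) keep = false // move on to the next worker
      }
    }
    free = free.filter(canLaunch)
  }
  assigned
}

// Three workers with 8 free cores each; the app still needs 12 cores, 2 per executor:
assign(Array(8, 8, 8), 12, 2, spreadOut = true)  // Array(4, 4, 4) -> spread across workers
assign(Array(8, 8, 8), 12, 2, spreadOut = false) // Array(8, 4, 0) -> packed onto few workers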
1.2 allocateWorkerResourceToExecutors
This method allocates a single Worker's resources to all the Executors planned on it; it is called once for each such Worker.
- Enter
org.apache.spark.deploy.master.Master.scala
/**
 * Allocate a worker's resources to one or more executors.
 * @param app the info of the application which the executors belong to
 * @param assignedCores number of cores on this worker for this application
 * @param coresPerExecutor number of cores per executor
 * @param worker the worker info
 */
private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}
- assignedCores / coresPerExecutor gives the number of Executors to launch on this Worker (see the sketch after this list);
- Call app.addExecutor once per Executor, adding each of this Worker's ExecutorDesc instances to the app;
- Call launchExecutor once per Executor to launch all Executors on this Worker.
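A quick sketch of that arithmetic with assumed numbers:
// Hypothetical: 8 cores were assigned to this app on this worker.
val assignedCores = 8
// spark.executor.cores = 2 -> four 2-core executors on this worker:
val numExecutors = Some(2).map { assignedCores / _ }.getOrElse(1) // 4
val coresToAssign = Some(2).getOrElse(assignedCores)              // 2
// spark.executor.cores unset -> one executor grabs all 8 cores:
val numExecutors2 = (None: Option[Int]).map { assignedCores / _ }.getOrElse(1) // 1
val coresToAssign2 = (None: Option[Int]).getOrElse(assignedCores)              // 8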
- Enter
org.apache.spark.deploy.master.Master.scala
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
- Use the Worker's RpcEndpointRef to send a LaunchExecutor message to the Worker, asking it to start the Executor;
- Use the Driver's RpcEndpointRef to send an ExecutorAdded message to the Driver, telling it the Executor has been planned (both message shapes are sketched below).
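For reference, both messages are plain case classes in org.apache.spark.deploy.DeployMessages; reconstructed here from the call sites above (check DeployMessages.scala for the authoritative definitions), they look roughly like:
case class LaunchExecutor(
    masterUrl: String,
    appId: String,
    execId: Int,
    appDesc: ApplicationDescription,
    cores: Int,
    memory: Int)

case class ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int)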
2 Launching the Executor
- Enter
org.apache.spark.deploy.worker.Worker.scala
override def receive: PartialFunction[Any, Unit] = synchronized {
  case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
    if (masterUrl != activeMasterUrl) {
      logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
    } else {
      try {
        logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

        // Create the executor's working directory
        val executorDir = new File(workDir, appId + "/" + execId)
        if (!executorDir.mkdirs()) {
          throw new IOException("Failed to create directory " + executorDir)
        }

        // Create local dirs for the executor. These are passed to the executor via the
        // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
        // application finishes.
        val appLocalDirs = appDirectories.getOrElse(appId, {
          val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
          val dirs = localRootDirs.flatMap { dir =>
            try {
              val appDir = Utils.createDirectory(dir, namePrefix = "executor")
              Utils.chmod700(appDir)
              Some(appDir.getAbsolutePath())
            } catch {
              case e: IOException =>
                logWarning(s"${e.getMessage}. Ignoring this directory.")
                None
            }
          }.toSeq
          if (dirs.isEmpty) {
            throw new IOException("No subfolder can be created in " +
              s"${localRootDirs.mkString(",")}.")
          }
          dirs
        })
        appDirectories(appId) = appLocalDirs
        val manager = new ExecutorRunner(
          appId,
          execId,
          appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
          cores_,
          memory_,
          self,
          workerId,
          host,
          webUi.boundPort,
          publicAddress,
          sparkHome,
          executorDir,
          workerUri,
          conf,
          appLocalDirs, ExecutorState.RUNNING)
        executors(appId + "/" + execId) = manager
        manager.start()
        coresUsed += cores_
        memoryUsed += memory_
        sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
      } catch {
        case e: Exception =>
          logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
          if (executors.contains(appId + "/" + execId)) {
            executors(appId + "/" + execId).kill()
            executors -= appId + "/" + execId
          }
          sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
            Some(e.toString), None))
      }
    }
}
- Create the Executor's working directory (see the path sketch after this list);
- Create and start an ExecutorRunner;
- Update the Worker's coresUsed and memoryUsed;
- Send an ExecutorStateChanged message to the Master.
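As a sketch, with a hypothetical work dir and made-up app/executor IDs, the path created by the handler above looks like:
// All values below are illustrative; real ones depend on the deployment.
val workDir = new java.io.File("/opt/spark/work")
val appId = "app-20180101120000-0000"
val execId = 0
val executorDir = new java.io.File(workDir, appId + "/" + execId)
// => /opt/spark/work/app-20180101120000-0000/0 (later holds the executor's stdout/stderr)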
- Enter
org.apache.spark.deploy.worker.ExecutorRunner.scala
private[worker] def start() {
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down")) }
}
Start a new thread that calls fetchAndRunExecutor.
private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val subsOpts = appDesc.command.javaOpts.map {
      Utils.substituteAppNExecIds(_, appId, execId.toString)
    }
    val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")

    builder.directory(executorDir)
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
When the SchedulerBackend was started earlier, it created a Command("org.apache.spark.executor.CoarseGrainedExecutorBackend", ......) and placed it into the ApplicationDescription.
Here that Command is taken out of the ApplicationDescription and executed (a sketch of how it was built follows).
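For context, StandaloneSchedulerBackend.start builds that Command roughly as follows (abbreviated from the Spark 2.x source, not the complete code; the {{...}} placeholders are filled in per executor via ExecutorRunner's substituteVariables):
// Abbreviated; see StandaloneSchedulerBackend.start for the full version.
val args = Seq(
  "--driver-url", driverUrl,
  "--executor-id", "{{EXECUTOR_ID}}",
  "--hostname", "{{HOSTNAME}}",
  "--cores", "{{CORES}}",
  "--app-id", "{{APP_ID}}",
  "--worker-url", "{{WORKER_URL}}")
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries, libraryPathEntries, javaOpts)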
3 Launching CoarseGrainedExecutorBackend
- Enter
org.apache.spark.executor.CoarseGrainedExecutorBackend.scala
def main(args: Array[String]) {
  var driverUrl: String = null
  var executorId: String = null
  var hostname: String = null
  var cores: Int = 0
  var appId: String = null
  var workerUrl: Option[String] = None
  val userClassPath = new mutable.ListBuffer[URL]()

  var argv = args.toList
  while (!argv.isEmpty) {
    argv match {
      case ("--driver-url") :: value :: tail =>
        driverUrl = value
        argv = tail
      case ("--executor-id") :: value :: tail =>
        executorId = value
        argv = tail
      case ("--hostname") :: value :: tail =>
        hostname = value
        argv = tail
      case ("--cores") :: value :: tail =>
        cores = value.toInt
        argv = tail
      case ("--app-id") :: value :: tail =>
        appId = value
        argv = tail
      case ("--worker-url") :: value :: tail =>
        // Worker url is used in spark standalone mode to enforce fate-sharing with worker
        workerUrl = Some(value)
        argv = tail
      case ("--user-class-path") :: value :: tail =>
        userClassPath += new URL(value)
        argv = tail
      case Nil =>
      case tail =>
        // scalastyle:off println
        System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
        // scalastyle:on println
        printUsageAndExit()
    }
  }

  if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
    appId == null) {
    printUsageAndExit()
  }

  run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  System.exit(0)
}
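The Command built earlier therefore produces a java invocation of this class; a hypothetical argument vector that main would parse (all values made up for illustration) is:
// Illustrative only; real values come from the substituted Command.
val exampleArgs = Array(
  "--driver-url", "spark://CoarseGrainedScheduler@10.0.0.1:40000",
  "--executor-id", "0",
  "--hostname", "worker-1",
  "--cores", "4",
  "--app-id", "app-20180101120000-0000",
  "--worker-url", "spark://Worker@worker-1:35000")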
With these arguments parsed, CoarseGrainedExecutorBackend's main calls the run method.
- Enter
org.apache.spark.executor.CoarseGrainedExecutorBackend.scala
private def run(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL]) {

  Utils.initDaemon(log)

  SparkHadoopUtil.get.runAsSparkUser { () =>
    // Debug code
    Utils.checkHost(hostname)

    // Bootstrap to fetch the driver's Spark properties.
    val executorConf = new SparkConf
    val fetcher = RpcEnv.create(
      "driverPropsFetcher",
      hostname,
      -1,
      executorConf,
      new SecurityManager(executorConf),
      clientMode = true)
    val driver = fetcher.setupEndpointRefByURI(driverUrl)
    val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)
    val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
    fetcher.shutdown()

    // Create SparkEnv using properties we fetched from the driver.
    val driverConf = new SparkConf()
    for ((key, value) <- props) {
      // this is required for SSL in standalone mode
      if (SparkConf.isExecutorStartupConf(key)) {
        driverConf.setIfMissing(key, value)
      } else {
        driverConf.set(key, value)
      }
    }

    cfg.hadoopDelegationCreds.foreach { tokens =>
      SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
    }

    val env = SparkEnv.createExecutorEnv(
      driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

    env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
      env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
    workerUrl.foreach { url =>
      env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
    }
    env.rpcEnv.awaitTermination()
  }
}
- Build an RpcEnv, obtain the Driver's RpcEndpointRef, and send it a RetrieveSparkAppConfig message to fetch the SparkAppConfig from the Driver;
- Call SparkEnv.createExecutorEnv to create the Executor-side SparkEnv, analogous to the Driver-side SparkEnv created in "Spark源码:初始化SparkContext", so it is not repeated here;
- Create a CoarseGrainedExecutorBackend (an RpcEndpoint) and register it under the name "Executor" on the Executor-side SparkEnv's RpcEnv; registration triggers CoarseGrainedExecutorBackend.onStart;
- For each workerUrl, create a WorkerWatcher (an RpcEndpoint) and register it under the name "WorkerWatcher" on the same RpcEnv, to monitor the Worker.
3.1 CoarseGrainedExecutorBackend.onStart
- Enter
org.apache.spark.executor.CoarseGrainedExecutorBackend.scala
override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
- Obtain the Driver's RpcEndpointRef from driverUrl;
- Send a RegisterExecutor message to the Driver, i.e., register this Executor with the Driver, so the Driver knows which Executors have been launched for the Application.
4 Registering the Executor with the Driver
- Enter, on the Driver side,
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverEndpoint.scala
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) {
      executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
      context.reply(true)
    } else if (scheduler.nodeBlacklist.contains(hostname)) {
      // If the cluster manager gives us an executor on a blacklisted node (because it
      // already started allocating those resources before we informed it of our blacklist,
      // or if it ignored our blacklist), then we reject that executor immediately.
      logInfo(s"Rejecting $executorId as it has been blacklisted.")
      executorRef.send(RegisterExecutorFailed(s"Executor is blacklisted: $executorId"))
      context.reply(true)
    } else {
      // If the executor's rpc env is not listening for incoming connections, `hostPort`
      // will be null, and the client connection should be used to contact the executor.
      val executorAddress = if (executorRef.address != null) {
        executorRef.address
      } else {
        context.senderAddress
      }
      logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
      addressToExecutorId(executorAddress) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      val data = new ExecutorData(executorRef, executorAddress, hostname,
        cores, cores, logUrls)
      // This must be synchronized because variables mutated
      // in this block are read when requesting executors
      CoarseGrainedSchedulerBackend.this.synchronized {
        executorDataMap.put(executorId, data)
        if (currentExecutorIdCounter < executorId.toInt) {
          currentExecutorIdCounter = executorId.toInt
        }
        if (numPendingExecutors > 0) {
          numPendingExecutors -= 1
          logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
        }
      }
      executorRef.send(RegisteredExecutor)
      // Note: some tests expect the reply to come after we put the executor in the map
      context.reply(true)
      listenerBus.post(
        SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
      makeOffers()
    }
}
- Create an ExecutorData;
- Put the new ExecutorData into CoarseGrainedSchedulerBackend.executorDataMap, i.e., register the Executor with the Driver; CoarseGrainedSchedulerBackend runs on the Driver and serves as the TaskScheduler's backend;
- Use the executor's RpcEndpointRef to send a RegisteredExecutor message back to the CoarseGrainedExecutorBackend, telling the Executor it is now registered with the Driver;
- Call makeOffers to offer Executor resources to TaskSets (analyzed in a later post).
Notes:
when the TaskScheduler starts, it starts a StandaloneSchedulerBackend, which in turn starts a CoarseGrainedSchedulerBackend; the latter is the TaskScheduler's backend for receiving and handling messages and runs on the Driver side;
when an Executor starts, it starts a CoarseGrainedExecutorBackend, which is the Executor's backend for receiving and handling messages and runs on the Executor side.
5 Summary
- When SparkContext is initialized, it creates and starts the TaskScheduler; the TaskScheduler registers the Application with the Master; when the Master starts the Application, it calls startExecutorsOnWorkers to launch Executors;
- First, each waiting app is taken from Master.waitingApps (apps are added there when the TaskScheduler registers the Application with the Master), and the Workers that can host Executors for that app are found;
- scheduleExecutorsOnWorkers is called to plan Executors on those Workers;
- For each Worker that was assigned Executors, allocateWorkerResourceToExecutors is called to allocate that Worker's resources to its Executors;
- Once the resources are allocated, the Executors are launched;
- Each launch starts a CoarseGrainedExecutorBackend;
- It sends a RegisterExecutor message to the Driver to register the Executor, which is wrapped in an ExecutorData and added to CoarseGrainedSchedulerBackend.executorDataMap.