Environment:
spark 2.3.3
scala 2.11.8
Java 1.8.0_141
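The call flow between _taskScheduler and the executor is shown in the figure below:
The process above can be summarized in the following simplified diagram:
The details are described below:
I. How the Spark Executor works:
1. Create and start the TaskScheduler
SparkContext calls its internal method createTaskScheduler to create the TaskScheduler and then starts it;
1.1 In Standalone mode, createTaskScheduler matches the following code: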
SparkContext:
case SPARK_REGEX(sparkUrl) =>
  val scheduler = new TaskSchedulerImpl(sc)
  val masterUrls = sparkUrl.split(",").map("spark://" + _)
  val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
  scheduler.initialize(backend)
  (backend, scheduler)
So in Standalone mode, _taskScheduler is a TaskSchedulerImpl and _schedulerBackend is a StandaloneSchedulerBackend;
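The URL handling in the case branch above can be tried in isolation. The following is a minimal sketch, assuming the pattern behind SPARK_REGEX is `spark://(.*)`; the object and method names are hypothetical and not part of Spark.

```scala
object MasterUrlSketch {
  // Assumption: mirrors the pattern used to recognise standalone master URLs.
  val SparkRegex = """spark://(.*)""".r

  // Returns one "spark://host:port" URL per master, or None when the
  // master string is not a standalone URL (e.g. "local[2]").
  def standaloneMasters(master: String): Option[Seq[String]] = master match {
    case SparkRegex(sparkUrl) =>
      // A comma-separated list names several (HA) masters.
      Some(sparkUrl.split(",").map("spark://" + _).toSeq)
    case _ => None
  }
}
```

With a master string such as `spark://h1:7077,h2:7077`, the sketch yields two separate URLs, which corresponds to the masterUrls array handed to StandaloneSchedulerBackend.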
1.2 SparkContext creates and starts the scheduler
SparkContext:
// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)
// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()
2. Create and register the DriverEndpoint ('CoarseGrainedScheduler')
While starting, TaskSchedulerImpl calls the start method of StandaloneSchedulerBackend, which in turn calls the start method of its parent class CoarseGrainedSchedulerBackend;
CoarseGrainedSchedulerBackend:
override def start() {
  val properties = new ArrayBuffer[(String, String)]
  for ((key, value) <- scheduler.sc.conf.getAll) {
    if (key.startsWith("spark.")) {
      properties += ((key, value))
    }
  }
  // TODO (prashant) send conf instead of properties
  driverEndpoint = createDriverEndpointRef(properties)
}
protected def createDriverEndpointRef(
    properties: ArrayBuffer[(String, String)]): RpcEndpointRef = {
  rpcEnv.setupEndpoint(ENDPOINT_NAME, createDriverEndpoint(properties))
}
protected def createDriverEndpoint(properties: Seq[(String, String)]): DriverEndpoint = {
  new DriverEndpoint(rpcEnv, properties)
}
The start method of CoarseGrainedSchedulerBackend creates and registers the DriverEndpoint ('CoarseGrainedScheduler'), which triggers the DriverEndpoint's onStart method:
override def onStart() {
  // Periodically revive offers to allow delay scheduling to work
  val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")
  reviveThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      Option(self).foreach(_.send(ReviveOffers))
    }
  }, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)
}
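The periodic revive in onStart relies on a plain scheduled executor. As a rough, standalone illustration of that pattern (not Spark code), the sketch below fires a callback immediately and then at a fixed interval; `reviveTimes`, `onRevive` and the latch are hypothetical names introduced so the example can terminate.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object ReviveSketch {
  // Runs `onRevive` with zero initial delay and then every `intervalMs`
  // milliseconds, returning true once it has fired at least `times` times.
  def reviveTimes(times: Int, intervalMs: Long)(onRevive: () => Unit): Boolean = {
    val latch = new CountDownLatch(times)
    val reviveThread = Executors.newSingleThreadScheduledExecutor()
    reviveThread.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = {
        onRevive()        // stand-in for sending ReviveOffers to self
        latch.countDown()
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS)
    // Wait for the requested number of ticks, then stop the scheduler.
    try latch.await(5, TimeUnit.SECONDS) finally reviveThread.shutdownNow()
  }
}
```

In the real endpoint the scheduler keeps running for the lifetime of the driver; the latch and shutdown exist only to make the sketch finite.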
3. Register the Application
3.1 In the start method of StandaloneSchedulerBackend, after the start method of the parent class CoarseGrainedSchedulerBackend has been called, a StandaloneAppClient is created and started
// The endpoint for executors to talk to us
val driverUrl = RpcEndpointAddress(
  sc.conf.get("spark.driver.host"),
  sc.conf.get("spark.driver.port").toInt,
  CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
val args = Seq(
  "--driver-url", driverUrl,
  ...
// Start executors with a few necessary configs for registering with the scheduler
val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
val javaOpts = sparkJavaOpts ++ extraJavaOpts
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val webUrl = sc.ui.map(_.webUrl).getOrElse("")
val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
...
val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
  webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()
3.2 Create and register the ClientEndpoint ('AppClient')
The start method of StandaloneAppClient registers the ClientEndpoint ('AppClient')
def start() {
  // Just launch an rpcEndpoint; it will call back into the listener.
  endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
}
This triggers the ClientEndpoint's onStart method, which calls registerWithMaster to register the Application with the Master
override def onStart(): Unit = {
  try {
    registerWithMaster(1)
  } ...
}
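registerWithMaster internally calls tryRegisterAllMasters to register with all Masters; as soon as one Master registers the Application successfully, the remaining registration attempts are cancelled;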
//Once we connect to a master successfully, all scheduling work and Futures will be cancelled.
Inside tryRegisterAllMasters, a RegisterApplication message is sent to each Master:
logInfo("Connecting to master " + masterAddress.toSparkURL + "...")
val masterRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
masterRef.send(RegisterApplication(appDescription, self))
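The "register with every master, stop once one succeeds" idea can be approximated outside Spark. The sketch below is not how ClientEndpoint implements it; it swaps in `ExecutorService.invokeAny`, which returns the first successful result and cancels the unfinished attempts. `registerWithAny` and `tryRegister` are hypothetical names, with `tryRegister` standing in for sending RegisterApplication and getting an acknowledgement.

```scala
import java.util.concurrent.{Callable, ExecutionException, Executors}
import scala.collection.JavaConverters._

object RegisterSketch {
  // Tries all masters concurrently and returns the first one that accepts,
  // or None if every attempt fails.
  def registerWithAny(masters: Seq[String])(tryRegister: String => Boolean): Option[String] = {
    if (masters.isEmpty) return None
    val pool = Executors.newFixedThreadPool(masters.size)
    val attempts = masters.map { m =>
      new Callable[String] {
        // A failed attempt is signalled by throwing, so invokeAny skips it.
        override def call(): String =
          if (tryRegister(m)) m else throw new IllegalStateException(s"$m did not accept")
      }
    }
    try Some(pool.invokeAny(attempts.asJava)) // remaining attempts are cancelled
    catch { case _: ExecutionException => None }
    finally pool.shutdownNow()
  }
}
```

A standby master that ignores the request would correspond here to an attempt that never succeeds.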
3.3 The Master registers the Application
After receiving the RegisterApplication message, the Master calls registerApplication(app) to register it,
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
Registration really just records the app's information:
private def registerApplication(app: ApplicationInfo): Unit = {
  val appAddress = app.driver.address
  if (addressToApp.contains(appAddress)) {
    logInfo("Attempted to re-register application at same address: " + appAddress)
    return
  }
  applicationMetricsSystem.registerSource(app.appSource)
  apps += app
  idToApp(app.id) = app
  endpointToApp(app.driver) = app
  addressToApp(appAddress) = app
  waitingApps += app
}
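The bookkeeping in registerApplication boils down to a few indexes plus a guard against registering the same driver address twice. A minimal sketch of that shape, using simplified hypothetical types (`App`, `Registry`) rather than Spark's ApplicationInfo and Master:

```scala
import scala.collection.mutable

object MasterBookkeepingSketch {
  // Hypothetical, simplified stand-in for ApplicationInfo.
  final case class App(id: String, driverAddress: String)

  class Registry {
    val idToApp = mutable.HashMap.empty[String, App]
    val addressToApp = mutable.HashMap.empty[String, App]
    val waitingApps = mutable.ArrayBuffer.empty[App]

    // Returns false (and changes nothing) when an app from the same
    // driver address is already registered.
    def register(app: App): Boolean = {
      if (addressToApp.contains(app.driverAddress)) return false
      idToApp(app.id) = app
      addressToApp(app.driverAddress) = app
      waitingApps += app // picked up later when resources are scheduled
      true
    }
  }
}
```

The waitingApps buffer is what makes the later schedule() call meaningful: an app only receives executors after it has been queued here.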
After recording the Application's information, the Master sends a registration message back to the ClientEndpoint
driver.send(RegisteredApplication(app.id, self))
3.4 The ClientEndpoint handles the RegisteredApplication message returned by the Master
After receiving the RegisteredApplication message from the Master, the ClientEndpoint records it and sets its registered state to true;
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // FIXME How to handle the following cases?
    // 1. A master receives multiple registrations and sends back multiple
    // RegisteredApplications due to an unstable network.
    // 2. Receive multiple RegisteredApplication from different masters because the master is
    // changing.
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
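3.5 The Master allocates resources to the Application
After the Master receives the RegisterApplication message, records the app, and replies to the ClientEndpoint, it goes on to call schedule to allocate resources to the Application:
The call chain is schedule() -> startExecutorsOnWorkers() -> allocateWorkerResourceToExecutors() -> launchExecutor()
launchExecutor() sends two messages: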
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
The second one, exec.application.driver.send(ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)), sends an ExecutorAdded message to the ClientEndpoint, which on receipt only logs it and notifies its listener
ClientEndpoint:
case ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int) =>
  val fullId = appId + "/" + id
  logInfo("Executor added: %s on %s (%s) with %d core(s)".format(fullId, workerId, hostPort,
    cores))
  listener.executorAdded(fullId, workerId, hostPort, cores, memory)
The first one, worker.endpoint.send(LaunchExecutor(masterUrl, exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)), sends a LaunchExecutor message to the Worker and is our focus.
4. The Worker starts the Executor
4.1 The Worker creates and starts an ExecutorRunner
After receiving the LaunchExecutor message from the Master, the Worker creates an ExecutorRunner and starts it
Worker:
val manager = new ExecutorRunner(
  appId,
  execId,
  appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
  cores_,
  memory_,
  self,
  workerId,
  host,
  webUi.boundPort,
  publicAddress,
  sparkHome,
  executorDir,
  workerUri,
  conf,
  appLocalDirs, ExecutorState.RUNNING)
executors(appId + "/" + execId) = manager
manager.start()
4.2 Start CoarseGrainedExecutorBackend
The start method of ExecutorRunner calls fetchAndRunExecutor in a new thread:
ExecutorRunner:
private[worker] def start() {
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }...
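fetchAndRunExecutor takes the launch command from the ApplicationDescription, i.e. the one passed to the StandaloneAppClient constructor, which consists of the org.apache.spark.executor.CoarseGrainedExecutorBackend class and its arguments, and runs that command: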
process = builder.start()
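To make the "command becomes a process" step concrete, here is a small sketch of turning a Command-like description into a `java.lang.ProcessBuilder`. The `Command` case class, `buildProcess`, and the argument layout are simplified assumptions for illustration; Spark's own command building handles more (environment, library paths, memory settings).

```scala
import scala.collection.JavaConverters._

object LaunchSketch {
  // Hypothetical, trimmed-down stand-in for the deploy Command.
  final case class Command(mainClass: String, arguments: Seq[String], javaOpts: Seq[String])

  // Assembles "java -cp <classPath> <javaOpts> <mainClass> <arguments>".
  // The process is only described here; calling start() would launch it.
  def buildProcess(cmd: Command, classPath: String): ProcessBuilder = {
    val argv = Seq("java", "-cp", classPath) ++ cmd.javaOpts ++ Seq(cmd.mainClass) ++ cmd.arguments
    new ProcessBuilder(argv.asJava)
  }
}
```

Calling `start()` on the returned builder is the counterpart of `builder.start()` above: a separate JVM whose entry point is the given main class.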
The launched process runs the main method of CoarseGrainedExecutorBackend; main calls run, and run registers the CoarseGrainedExecutorBackend endpoint:
CoarseGrainedExecutorBackend:
env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
  env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
This triggers the onStart method of CoarseGrainedExecutorBackend:
CoarseGrainedExecutorBackend:
override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
Here driverUrl is the address of the DriverEndpoint, so a RegisterExecutor message is sent to the DriverEndpoint;
After receiving the RegisterExecutor message, the DriverEndpoint wraps the Executor's information in an ExecutorData and stores it in the executorDataMap of CoarseGrainedSchedulerBackend (private val executorDataMap = new HashMap[String, ExecutorData])
DriverEndpoint:
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    ...
    // If the executor's rpc env is not listening for incoming connections, `hostPort`
    // will be null, and the client connection should be used to contact the executor.
    val executorAddress = if (executorRef.address != null) {
      executorRef.address
    } else {
      context.senderAddress
    }
    logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
    addressToExecutorId(executorAddress) = executorId
    totalCoreCount.addAndGet(cores)
    totalRegisteredExecutors.addAndGet(1)
    val data = new ExecutorData(executorRef, executorAddress, hostname,
      cores, cores, logUrls)
    // This must be synchronized because variables mutated
    // in this block are read when requesting executors
    CoarseGrainedSchedulerBackend.this.synchronized {
      executorDataMap.put(executorId, data)
      if (currentExecutorIdCounter < executorId.toInt) {
        currentExecutorIdCounter = executorId.toInt
      }
      if (numPendingExecutors > 0) {
        numPendingExecutors -= 1
        logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
      }
    }
    executorRef.send(RegisteredExecutor)
    // Note: some tests expect the reply to come after we put the executor in the map
    context.reply(true)
    listenerBus.post(
      SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
    makeOffers()
}
After recording the Executor's information, the DriverEndpoint sends a RegisteredExecutor message to CoarseGrainedExecutorBackend
4.3 Start the Executor
After receiving the RegisteredExecutor message from the DriverEndpoint, CoarseGrainedExecutorBackend creates the Executor;
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
At this point the Executor has been created successfully!
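The registration handshake of section 4 (RegisterExecutor to the driver, RegisteredExecutor back, then the Executor is created) can be condensed into a toy in-memory model. Everything below is a simplified assumption for illustration: the message types carry fewer fields than Spark's, and `Driver` and `Backend` are hypothetical classes that call each other directly instead of going through an RPC environment.

```scala
import scala.collection.mutable

object HandshakeSketch {
  sealed trait Message
  final case class RegisterExecutor(executorId: String, cores: Int) extends Message
  case object RegisteredExecutor extends Message

  // Hypothetical stand-in for the executor-side endpoint.
  class Backend(val executorId: String, val cores: Int) {
    var executorCreated = false
    def receive(msg: Message): Unit = msg match {
      case RegisteredExecutor => executorCreated = true // real code builds an Executor here
      case _ => ()
    }
  }

  // Hypothetical stand-in for the driver-side endpoint.
  class Driver {
    val executorDataMap = mutable.HashMap.empty[String, Int] // executorId -> cores
    def receive(msg: Message, sender: Backend): Unit = msg match {
      case RegisterExecutor(id, cores) =>
        synchronized { executorDataMap.put(id, cores) } // record first
        sender.receive(RegisteredExecutor)              // then acknowledge
      case _ => ()
    }
  }
}
```

The ordering in `Driver.receive` mirrors the excerpt above: the executor is recorded before the acknowledgement goes out, so the driver never acknowledges an executor it does not know about.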
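5. The Executor processes Tasks
Here is a brief look at how the Executor processes Tasks; the details will be analyzed later:
Internally the Executor creates a thread pool to process Tasks: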
Executors.newCachedThreadPool(threadFactory).asInstanceOf[ThreadPoolExecutor]
The DriverEndpoint sends LaunchTask to CoarseGrainedExecutorBackend; on receipt, CoarseGrainedExecutorBackend calls Executor.launchTask, which hands the task to the Executor's thread pool for execution.
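As a rough picture of what "hand the task to the thread pool" means, the sketch below wraps each task in a Runnable, tracks it while it runs, and executes it on a cached thread pool. It is only an approximation under stated assumptions: `TaskPoolSketch`, `launchTask`, `results` and `shutdownAndWait` are hypothetical names, and a task here is just a function returning an Int rather than a serialized Spark task.

```scala
import java.util.concurrent.{Executors, ThreadPoolExecutor, TimeUnit}
import scala.collection.concurrent.TrieMap

object TaskPoolSketch {
  // A cached pool grows on demand, one thread per concurrently running task.
  private val threadPool =
    Executors.newCachedThreadPool().asInstanceOf[ThreadPoolExecutor]

  val runningTasks = TrieMap.empty[Long, Runnable]
  val results = TrieMap.empty[Long, Int]

  // Wraps the task body in a runner, registers it, and submits it to the pool.
  def launchTask(taskId: Long, body: () => Int): Unit = {
    val runner = new Runnable {
      override def run(): Unit =
        try { results.put(taskId, body()) }
        finally { runningTasks.remove(taskId) } // always deregister when done
    }
    runningTasks.put(taskId, runner)
    threadPool.execute(runner)
  }

  // Stops accepting tasks and waits for the running ones to finish.
  def shutdownAndWait(): Boolean = {
    threadPool.shutdown()
    threadPool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```

Registering the runner before submitting it means a lookup by task id can find the task from the moment it is launched, which is what a kill request would rely on.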