Continuing from Spark源码分析之AM端运行流程(Driver), where we finished walking through the Driver's run flow on the AM side, we ended with the AM submitting a request to YARN for Executor containers; the request context parameters are shown in the figure below.
YARN allocates and launches the Executor containers in exactly the same way it handles the Driver container (see Spark源码分析之任务提交流程(Client) for that flow), so let's move straight on to the launch_container.sh that starts the Executor:
As the two figures above show, the entry class of the launched container is org.apache.spark.executor.CoarseGrainedExecutorBackend. From --driver-url spark://CoarseGrainedScheduler@node0:43195 we can see that the Driver registers its service under the name CoarseGrainedScheduler, and the class behind that service is DriverEndpoint, an inner class of org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.
Let's start the analysis from the main function of the CoarseGrainedExecutorBackend companion object (the source is again based on Spark 2.4):
def main(args: Array[String]) {
  var driverUrl: String = null
  var executorId: String = null
  var hostname: String = null
  var cores: Int = 0
  var appId: String = null
  var workerUrl: Option[String] = None
  val userClassPath = new mutable.ListBuffer[URL]()

  var argv = args.toList
  while (!argv.isEmpty) {
    argv match {
      case ("--driver-url") :: value :: tail =>
        driverUrl = value
        argv = tail
      case ("--executor-id") :: value :: tail =>
        executorId = value
        argv = tail
      case ("--hostname") :: value :: tail =>
        hostname = value
        argv = tail
      case ("--cores") :: value :: tail =>
        cores = value.toInt
        argv = tail
      case ("--app-id") :: value :: tail =>
        appId = value
        argv = tail
      case ("--worker-url") :: value :: tail =>
        // Worker url is used in spark standalone mode to enforce fate-sharing with worker
        workerUrl = Some(value)
        argv = tail
      case ("--user-class-path") :: value :: tail =>
        userClassPath += new URL(value)
        argv = tail
      case Nil =>
      case tail =>
        // scalastyle:off println
        System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
        // scalastyle:on println
        printUsageAndExit()
    }
  }

  if (hostname == null) {
    hostname = Utils.localHostName()
    log.info(s"Executor hostname is not provided, will use '$hostname' to advertise itself")
  }

  if (driverUrl == null || executorId == null || cores <= 0 || appId == null) {
    printUsageAndExit()
  }

  run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  System.exit(0)
}
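To make the List pattern-matching style of that parsing loop concrete, here is a small, self-contained sketch that replays it with just two of the options. The option names are the real ones; the values, and the object name ArgParseSketch, are purely illustrative.

object ArgParseSketch {
  def main(args: Array[String]): Unit = {
    var driverUrl: String = null
    var cores: Int = 0

    // Hypothetical argument list, shaped like the one launch_container.sh passes in.
    var argv = List(
      "--driver-url", "spark://CoarseGrainedScheduler@node0:43195",
      "--cores", "2")

    // Same technique as main above: peel (option, value) pairs off the head of the list.
    while (argv.nonEmpty) {
      argv match {
        case "--driver-url" :: value :: tail =>
          driverUrl = value
          argv = tail
        case "--cores" :: value :: tail =>
          cores = value.toInt
          argv = tail
        case unknown =>
          sys.error(s"Unrecognized options: ${unknown.mkString(" ")}")
      }
    }
    println(s"driverUrl=$driverUrl, cores=$cores")  // driverUrl=spark://..., cores=2
  }
}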
So main reads the command-line arguments, parses and validates them, and finally calls the companion object's run method (see the inline comments for details):
private def run(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL]) {

  Utils.initDaemon(log)

  SparkHadoopUtil.get.runAsSparkUser { () =>
    // Debug code
    Utils.checkHost(hostname)

    // Bootstrap to fetch the driver's Spark properties.
    // First create an RpcEnv named "driverPropsFetcher", used only to pull the Spark
    // configuration from the driver; it is shut down as soon as that is done.
    val executorConf = new SparkConf
    val fetcher = RpcEnv.create(
      "driverPropsFetcher",
      hostname,
      -1,
      executorConf,
      new SecurityManager(executorConf),
      clientMode = true)
    // Obtain a reference to the Driver endpoint from the --driver-url argument
    // (driverUrl looks like spark://CoarseGrainedScheduler@node0:43195)
    val driver = fetcher.setupEndpointRefByURI(driverUrl)
    // Send a RetrieveSparkAppConfig message to the Driver to fetch the Spark configuration
    val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)
    val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
    fetcher.shutdown()

    // Create SparkEnv using properties we fetched from the driver.
    // Build a SparkConf from the properties pulled from the driver
    val driverConf = new SparkConf()
    for ((key, value) <- props) {
      // this is required for SSL in standalone mode
      if (SparkConf.isExecutorStartupConf(key)) {
        driverConf.setIfMissing(key, value)
      } else {
        driverConf.set(key, value)
      }
    }
    cfg.hadoopDelegationCreds.foreach { tokens =>
      SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
    }

    // Create the Executor-side SparkEnv
    val env = SparkEnv.createExecutorEnv(
      driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

    // Create the CoarseGrainedExecutorBackend instance and register it as the "Executor"
    // endpoint in its own SparkEnv's rpcEnv
    env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
      env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
    // Create a WorkerWatcher that shuts down the CoarseGrainedExecutorBackend when the
    // worker misbehaves (only effective in standalone mode)
    workerUrl.foreach { url =>
      env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
    }
    // Block until the rpcEnv terminates
    env.rpcEnv.awaitTermination()
  }
}
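For reference, the RetrieveSparkAppConfig ask above is answered on the driver side by DriverEndpoint's receiveAndReply handler. It is roughly shaped like the following paraphrased sketch of the Spark 2.4 code; the value names passed to SparkAppConfig are simplified placeholders, not the exact fields used in the real method.

// Sketch of the handling inside CoarseGrainedSchedulerBackend.DriverEndpoint (not verbatim):
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RetrieveSparkAppConfig =>
    // Reply with the driver's Spark properties, the optional I/O encryption key and the
    // Hadoop delegation tokens -- the same fields the executor reads above as
    // cfg.sparkProperties, cfg.ioEncryptionKey and cfg.hadoopDelegationCreds.
    context.reply(SparkAppConfig(sparkProperties, ioEncryptionKey, hadoopDelegationCreds))

  // ... other cases such as RegisterExecutor and StopExecutors are omitted here ...
}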
The two key calls here are val env = SparkEnv.createExecutorEnv(...) and env.rpcEnv.setupEndpoint(...), both of which rest on Spark's RPC framework (analyzed in detail in the separate post Spark源码分析之Rpc框架). The former, SparkEnv.createExecutorEnv(), initializes and registers a number of services, such as MapOutputTracker, BlockManagerMaster and OutputCommitCoordinator, while the latter, env.rpcEnv.setupEndpoint(), separately registers the Executor endpoint (why the registrations are split like this I have not figured out yet, but it does not affect the code analysis). Once registration is done, the backend can exchange messages with the driver normally.
So how does that interaction get started? As the analysis in Spark源码分析之Rpc框架 shows, when setupEndpoint registers an endpoint it registers an EndpointData with the Dispatcher, and constructing the EndpointData creates an Inbox for receiving that endpoint's messages. The Inbox constructor immediately puts an OnStart message at the head of the queue, so when the message loop later starts consuming the inbox, the very first message it dispatches invokes the endpoint's onStart() method. In other words, every endpoint gets its onStart() called as part of registration; a tiny model of the idea is sketched below.
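The essence of that mechanism can be shown with a small, self-contained model in plain Scala. These are not Spark's actual Inbox/Dispatcher classes, only a toy stand-in: the inbox is created with OnStart already queued, so the first message the processing loop dispatches always triggers onStart().

object InboxModel {
  // Simplified stand-ins for Spark's internal message types.
  sealed trait InboxMessage
  case object OnStart extends InboxMessage
  final case class Rpc(body: Any) extends InboxMessage

  trait Endpoint {
    def onStart(): Unit
    def receive(msg: Any): Unit
  }

  // Modeled Inbox: OnStart is enqueued at construction time, which is why
  // onStart() always runs before any ordinary message.
  final class Inbox(endpoint: Endpoint) {
    private val messages = scala.collection.mutable.Queue[InboxMessage](OnStart)
    def post(msg: InboxMessage): Unit = messages.enqueue(msg)
    def process(): Unit = while (messages.nonEmpty) {
      messages.dequeue() match {
        case OnStart   => endpoint.onStart()
        case Rpc(body) => endpoint.receive(body)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val inbox = new Inbox(new Endpoint {
      def onStart(): Unit = println("onStart() fired first")
      def receive(msg: Any): Unit = println(s"received: $msg")
    })
    inbox.post(Rpc("RegisterExecutor"))   // arrives after the seeded OnStart
    inbox.process()                       // prints onStart() first, then the RPC message
  }
}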
To see this in the real code, take the registration of the CoarseGrainedExecutorBackend endpoint and the call to its onStart() method: CoarseGrainedExecutorBackend.onStart() sends a RegisterExecutor message to the Driver (that is, to DriverEndpoint, the inner class of CoarseGrainedSchedulerBackend), as shown in the figure below:
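In outline, the method looks like this; it is a simplified sketch of the Spark 2.4 code with error handling trimmed, not a verbatim quote.

// Sketch of CoarseGrainedExecutorBackend.onStart() (simplified, not verbatim):
override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  // Look up the Driver's endpoint (the CoarseGrainedScheduler service backed by DriverEndpoint)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    driver = Some(ref)
    // Ask the driver to register this executor; the driver answers `true` here and
    // separately sends a RegisteredExecutor message back to this endpoint.
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    case Success(_) =>
      // The real confirmation arrives later as a separate RegisteredExecutor message.
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e)
  }(ThreadUtils.sameThread)
}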
When DriverEndpoint receives the RegisterExecutor message, it checks whether the executor is already registered. If it is a duplicate registration it simply replies to the message; otherwise it builds an ExecutorData entry, adds it to its in-memory bookkeeping, and sends a RegisteredExecutor message back to the CoarseGrainedExecutorBackend. On receiving that message, the CoarseGrainedExecutorBackend creates an Executor instance, which performs the executor's initialization work, such as starting the periodic heartbeat. From the analysis above we can see that CoarseGrainedExecutorBackend is a JVM process acting as the guardian process of an Executor: it creates and maintains the Executor, a CoarseGrainedExecutorBackend and an Executor correspond one-to-one, and a single Worker can start multiple CoarseGrainedExecutorBackend processes.
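The executor-side half of that exchange lives in CoarseGrainedExecutorBackend.receive; in outline it is shaped as follows, again a simplified sketch of the Spark 2.4 code with the try/catch and the remaining cases trimmed, not a verbatim quote.

// Sketch of CoarseGrainedExecutorBackend.receive (simplified, not verbatim):
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    // Only now is the actual Executor created; its initialization sets up the task
    // thread pool and starts the periodic heartbeat to the driver.
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    exitExecutor(1, "Slave registration failed: " + message)

  // ... LaunchTask, KillTask, StopExecutor, Shutdown, etc. are handled here as well ...
}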
At this point the Executor's registration with the Driver is complete, and from here on the Executor can receive and act on the various messages the Driver sends down.