When we want to run a Spark program on a cluster, we usually submit it with the spark-submit script. But how does that script actually get our program running on the cluster? How does the Master schedule resources? How do Executors register with the Driver? The process has many moving parts, and this article walks through it step by step.
Running bin/spark-submit simply execs ${SPARK_HOME}/bin/spark-class org.apache.spark.deploy.SparkSubmit.
The spark-class script builds the final java command, whose entry point is org.apache.spark.deploy.SparkSubmit. Inside that class, doRunMain invokes our application's main method via reflection.
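That reflection step looks roughly like the sketch below. This is not the verbatim SparkSubmit source; childMainClass and childArgs stand in for the values SparkSubmit resolves from the command line.

// Simplified sketch of how the user's main method is invoked via reflection.
val childMainClass = "com.example.WordCount"          // hypothetical user class (--class)
val childArgs = Seq("hdfs://input", "hdfs://output")  // hypothetical application arguments

val mainClass = Class.forName(childMainClass, true, Thread.currentThread.getContextClassLoader)
val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
mainMethod.invoke(null, childArgs.toArray)            // static main, so the receiver is null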
Our own main method constructs a SparkContext, which runs its primary constructor.
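For reference, a minimal user program might look like the following (class name, master URL and paths are made up for illustration). The moment new SparkContext(conf) runs, all of the driver-side initialization described below happens inside the primary constructor.

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("spark://master:7077")   // standalone master URL, matched by SPARK_REGEX below
    val sc = new SparkContext(conf)       // the primary constructor runs right here

    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))

    sc.stop()
  }
}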
//TODO This method creates the SparkEnv (which in turn creates an ActorSystem)
private[spark] def createSparkEnv(
    conf: SparkConf,
    isLocal: Boolean,
    listenerBus: LiveListenerBus): SparkEnv = {
  SparkEnv.createDriverEnv(conf, isLocal, listenerBus)
}
private[spark] val env = createSparkEnv(conf, isLocal, listenerBus)
SparkEnv.set(env)
------------------------
// Create the ActorSystem for Akka and get the port it binds to.
val (actorSystem, boundPort) = {
  val actorSystemName = if (isDriver) driverActorSystemName else executorActorSystemName
  //TODO Use the AkkaUtils helper class to create the ActorSystem
  AkkaUtils.createActorSystem(actorSystemName, hostname, port, conf, securityManager)
}
val SPARK_REGEX = """spark://(.*)""".r
------------
//TODO Create a TaskScheduler
private[spark] var (schedulerBackend, taskScheduler) =
  SparkContext.createTaskScheduler(this, master)
------------
//TODO Spark standalone mode: the master URL matches spark://
case SPARK_REGEX(sparkUrl) =>
  //TODO Create a TaskSchedulerImpl
  val scheduler = new TaskSchedulerImpl(sc)
  val masterUrls = sparkUrl.split(",").map("spark://" + _)
  //TODO Create a SparkDeploySchedulerBackend
  val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
  //TODO Call initialize to wire the backend into the scheduler
  scheduler.initialize(backend)
  (backend, scheduler)
//TODO Create an actor through the ActorSystem; it receives the heartbeats the Executors send to the DriverActor
private val heartbeatReceiver = env.actorSystem.actorOf(
  Props(new HeartbeatReceiver(taskScheduler)), "HeartbeatReceiver")
Here, this refers to the SparkContext itself:
//TODO Create the DAGScheduler, which will later split the DAG into Stages
dagScheduler = new DAGScheduler(this)
When the DAGScheduler is initialized it starts its eventProcessLoop (FIFO). DAGSchedulerEventProcessLoop extends EventLoop; once started, it runs a background thread that keeps taking events out of a BlockingQueue and dispatches each event to the matching handler.
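The pattern is easy to see in a stripped-down sketch. This is not the Spark source, just the shape of EventLoop: a daemon thread blocks on the queue and hands every event to onReceive.

import java.util.concurrent.LinkedBlockingDeque

// Stripped-down sketch of the EventLoop pattern that DAGSchedulerEventProcessLoop is built on.
abstract class SimpleEventLoop[E](name: String) {
  private val eventQueue = new LinkedBlockingDeque[E]()
  @volatile private var stopped = false

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped) {
          val event = eventQueue.take()   // blocks until an event is posted (FIFO)
          onReceive(event)                // dispatch to the concrete handler
        }
      } catch {
        case _: InterruptedException =>   // interrupted by stop(); exit the loop
      }
    }
  }

  def start(): Unit = eventThread.start()
  def stop(): Unit = { stopped = true; eventThread.interrupt() }
  def post(event: E): Unit = eventQueue.put(event)

  // A concrete subclass (like the DAGScheduler's loop) pattern-matches on the event here.
  protected def onReceive(event: E): Unit
}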
//TODO Start the taskScheduler
taskScheduler.start()
//TODO Which first calls SparkDeploySchedulerBackend's start method
backend.start()
//TODO Which in turn first calls the parent class's start method to create the DriverActor
super.start()
The parent class creates the DriverActor, which is used to communicate with the Executors:
//TODO Create the DriverActor through the ActorSystem
driverActor = actorSystem.actorOf(
  Props(new DriverActor(properties)), name = CoarseGrainedSchedulerBackend.ACTOR_NAME)
Back in SparkDeploySchedulerBackend, after the call to the parent's start() method, it creates an AppClient and calls its start method; start creates a ClientActor used to communicate with the Master. The application's information is wrapped in an ApplicationDescription and sent to the Master.
//TODO Prepare the parameters that will later be wrapped into an object and sent to the Master
val driverUrl = AkkaUtils.address(
  AkkaUtils.protocol(actorSystem),
  SparkEnv.driverActorSystemName,
  conf.get("spark.driver.host"),
  conf.get("spark.driver.port"),
  CoarseGrainedSchedulerBackend.ACTOR_NAME)
//TODO This is the class that will later run as the Executor backend
val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
  args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)
val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
//TODO Wrap the parameters into an ApplicationDescription
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
  appUIAddress, sc.eventLogDir, sc.eventLogCodec)
//TODO Create an AppClient and pass the ApplicationDescription in through its primary constructor
client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
//TODO Then call AppClient's start method, which creates a ClientActor used to communicate with the Master
client.start()
waitForRegistration()
AppClient.start() creates the ClientActor, which is used to communicate with the Master:
def start() {
  // Just launch an actor; it will call back into the listener.
  //TODO Create the ClientActor: primary constructor -> preStart -> receive
  actor = actorSystem.actorOf(Props(new ClientActor))
}
In its lifecycle method preStart(), the ClientActor registers with the Master:
//TODO The ClientActor's lifecycle method
override def preStart() {
  context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
  try {
    //TODO The ClientActor registers with the Master
    registerWithMaster()
  } catch {
    case e: Exception =>
      logWarning("Failed to connect to master", e)
      markDisconnected()
      context.stop(self)
  }
}
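registerWithMaster ultimately just sends a RegisterApplication message, carrying the ApplicationDescription, to every configured master URL. A simplified, self-contained sketch of that step (the message case class here is a hypothetical stand-in for the real, package-private one):

import akka.actor.Actor

// Hypothetical stand-in for the real RegisterApplication deploy message.
case class RegisterApplication(appDescription: AnyRef)

// Simplified sketch of the ClientActor's registration step: one RegisterApplication per master URL.
class ClientActorSketch(masterAkkaUrls: Seq[String], appDescription: AnyRef) extends Actor {
  override def preStart(): Unit = {
    for (masterAkkaUrl <- masterAkkaUrls) {
      val master = context.actorSelection(masterAkkaUrl)
      master ! RegisterApplication(appDescription)   // registration request to this master
    }
  }
  def receive = {
    case _ => // replies such as RegisteredApplication are handled in the real receive
  }
}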
After the registration succeeds, the Master replies with the case class RegisteredApplication:
//TODO The Master sends the ClientActor a message saying registration succeeded
case RegisteredApplication(appId_, masterUrl) =>
  appId = appId_
  registered = true
  changeMaster(masterUrl)
  listener.connected(appId)
On the Master side: it receives the registration message from the ClientActor, stores the application's information, replies that the registration succeeded, and then calls schedule() to start allocating resources, i.e. deciding which Workers should launch Executors for this application.
//TODO The registration message sent by the ClientActor
case RegisterApplication(description) => {
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    //TODO First store the application's information in memory
    val app = createApplication(description, sender)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    //TODO Save it with the persistence engine
    persistenceEngine.addApplication(app)
    //TODO The Master tells the ClientActor that registration succeeded
    sender ! RegisteredApplication(app.id, masterUrl)
    //TODO Important: the Master starts scheduling resources, i.e. deciding which Workers to launch Executors on
    schedule()
  }
}
The Master calls schedule() to start allocating resources. In Spark 1.3 a Worker can hold at most one Executor per application; that restriction was removed in 1.6.
There are two allocation strategies: spread the application's cores across as many Workers as possible (spread out), or pack them onto as few Workers as possible (consolidate), as sketched below.
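A simplified, self-contained sketch of the spread-out strategy (the function name and parameters are invented for illustration; the real schedule() also filters workers by free memory and turns each non-zero assignment into one Executor): cores are handed out one at a time, round-robin across the usable Workers.

// Simplified sketch of the "spread out" allocation: given the free cores of each usable
// worker and the cores the app still needs, hand cores out one at a time, round-robin.
def spreadOut(coresFreePerWorker: Array[Int], coresLeftForApp: Int): Array[Int] = {
  val numUsable = coresFreePerWorker.length
  val assigned = new Array[Int](numUsable)                  // cores assigned per worker
  var toAssign = math.min(coresLeftForApp, coresFreePerWorker.sum)
  var pos = 0
  while (toAssign > 0) {
    if (coresFreePerWorker(pos) - assigned(pos) > 0) {      // worker still has a free core
      toAssign -= 1
      assigned(pos) += 1
    }
    pos = (pos + 1) % numUsable                             // round-robin to the next worker
  }
  assigned
}

// Example: three workers with 4, 2 and 3 free cores, and an app that needs 6 cores:
// spreadOut(Array(4, 2, 3), 6) == Array(2, 2, 2)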
The Master then sends the Worker a message to launch the Executor, and tells the ClientActor that the Executor has been added:
//TODO The Master sends a message telling the Worker to launch an Executor
launchExecutor(usableWorkers(pos), exec)

def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc) {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  //TODO Record the resources this Worker is now using
  worker.addExecutor(exec)
  //TODO The Master sends the Worker a message; the parameters are carried in a case class telling it to launch an Executor
  worker.actor ! LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)
  //TODO The Master tells the ClientActor that the Executor has been launched
  exec.application.driver ! ExecutorAdded(
    exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)
}
On the Worker side: it receives the message from the Master, which carries the information of the Executor to be launched:
//TODO The message the Master sends to the Worker, telling it to launch an Executor.
//TODO LaunchExecutor is a case class that carries the information of the Executor to be launched
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) => ...........
The Worker creates an ExecutorRunner, hands it all the parameters, and then launches the Executor through it:
//TODO Create an ExecutorRunner, hand it all the parameters, and launch the Executor through it
val manager = new ExecutorRunner(
  appId,
  execId,
  appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
  cores_,
  memory_,
  self,
  workerId,
  host,
  webUi.boundPort,
  publicAddress,
  sparkHome,
  executorDir,
  akkaUrl,
  conf,
  appLocalDirs, ExecutorState.LOADING)
The Worker stores the ExecutorRunner and calls its start method to launch the Executor as a Java child process:
//TODO Put ExecutorId -> ExecutorRunner into a Map
executors(appId + "/" + execId) = manager
//TODO Call ExecutorRunner's start method to launch the Executor Java child process
manager.start()
coresUsed += cores_
memoryUsed += memory_
master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
ExecutorRunner.start() creates a thread, and that thread launches the Java child process:
def start() {
  //TODO First create a thread object, then use that thread to launch a Java child process
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  //TODO Calling the thread's start method eventually invokes its run method
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = new Thread() {
    override def run() {
      killProcess(Some("Worker shutting down"))
    }
  }
  Runtime.getRuntime.addShutdownHook(shutdownHook)
}
The thread calls fetchAndRunExecutor() to start the child process:
//TODO The thread calls this method to launch the Java child process
def fetchAndRunExecutor() {
  try {
    // Launch the process
    //TODO Build the command for the child process
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, memory,
      sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    logInfo("Launch command: " + command.mkString("\"", "\" \"", "\""))
    builder.directory(executorDir)
    builder.environment.put("SPARK_LOCAL_DIRS", appLocalDirs.mkString(","))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")
    // Add webUI log urls
    val baseUrl =
      s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
    //TODO Actually start the Java child process -> CoarseGrainedExecutorBackend's main method
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      command.mkString("\"", "\" \"", "\""), "=" * 40)
Once the Executor process (CoarseGrainedExecutorBackend) is up, its lifecycle method is invoked, and in that lifecycle method it registers with the Driver:
//TODO CoarseGrainedExecutorBackend's lifecycle method
override def preStart() {
  logInfo("Connecting to driver: " + driverUrl)
  //TODO Establish a connection to the Driver
  driver = context.actorSelection(driverUrl)
  //TODO The Executor sends the DriverActor a message to register itself
  driver ! RegisterExecutor(executorId, hostPort, cores, extractLogUrls)
  context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
}
The DriverActor receives the registration message (RegisterExecutor) sent by the Executor and replies with a registration-success message (RegisteredExecutor):
def receiveWithLogging = {
  //TODO The message the Executor sent to the DriverActor
  case RegisterExecutor(executorId, hostPort, cores, logUrls) =>
    Utils.checkHostPort(hostPort, "Host port expected " + hostPort)
    if (executorDataMap.contains(executorId)) {
      sender ! RegisterExecutorFailed("Duplicate executor ID: " + executorId)
    } else {
      logInfo("Registered executor: " + sender + " with ID " + executorId)
      //TODO The DriverActor tells the Executor that registration succeeded
      sender ! RegisteredExecutor
Back on the Executor side: it receives the registration-success message from the Driver and creates an Executor instance:
//TODO The message the DriverActor sends back to the Executor, telling it registration succeeded
case RegisteredExecutor =>
  logInfo("Successfully registered with driver")
  val (hostname, _) = Utils.parseHostPort(hostPort)
  //TODO Create an Executor instance, which will run the actual business logic
  executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
Meanwhile, after registering the Executor, the DriverActor calls makeOffers() to check whether any tasks are waiting to be submitted:
//TODO Check whether any tasks are waiting to be submitted (DriverActor -> Executor)
makeOffers()
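Conceptually, makeOffers turns every registered executor into a resource offer, asks the task scheduler which pending tasks fit those offers, and launches whatever comes back. A simplified sketch with stand-in types (not the real Spark classes or signatures):

// Simplified sketch of the resource-offer step performed by the DriverActor.
case class WorkerOffer(executorId: String, host: String, cores: Int)
case class TaskDescription(taskId: Long, executorId: String, serializedTask: Array[Byte])
case class ExecutorData(host: String, freeCores: Int)

def makeOffers(executorDataMap: Map[String, ExecutorData],
               resourceOffers: Seq[WorkerOffer] => Seq[TaskDescription],
               launchTask: TaskDescription => Unit): Unit = {
  val offers = executorDataMap.map { case (id, data) =>
    WorkerOffer(id, data.host, data.freeCores)       // one offer per registered executor
  }.toSeq
  resourceOffers(offers).foreach(launchTask)         // hand the matched tasks to executors
}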
After the registration-success message is received, the newly created Executor instance also initializes a thread pool that will be used to run tasks:
//TODO Initialize the thread pool that runs the tasks
val threadPool = Utils.newDaemonCachedThreadPool("Executor task launch worker")
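In spirit, Utils.newDaemonCachedThreadPool is a cached thread pool whose worker threads are daemons, and each task received from the driver becomes a Runnable handed to that pool. A hedged sketch with hypothetical names, not the real Executor code:

import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}

// Sketch of a cached thread pool with daemon worker threads.
def newDaemonCachedPool(prefix: String): ExecutorService = {
  val factory = new ThreadFactory {
    private var count = 0
    override def newThread(r: Runnable): Thread = synchronized {
      count += 1
      val t = new Thread(r, s"$prefix-$count")
      t.setDaemon(true)        // daemon threads never block JVM shutdown
      t
    }
  }
  Executors.newCachedThreadPool(factory)
}

val pool = newDaemonCachedPool("Executor task launch worker")

// Each task from the driver is wrapped in a Runnable and executed on the pool.
def launchTask(taskId: Long, serializedTask: Array[Byte]): Unit = {
  pool.execute(new Runnable {
    override def run(): Unit = {
      // deserialize serializedTask, run it, and report the result back to the driver
    }
  })
}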
The Executor also starts sending heartbeats to the DriverActor:
//TODO The Executor sends heartbeats to the DriverActor
startDriverHeartbeater()
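startDriverHeartbeater boils down to a daemon thread that periodically sends a heartbeat message to the driver, so the driver knows the executor and its running tasks are still alive. A minimal sketch; the interval, message type and send function are illustrative, not the exact ones the real Executor uses:

// Minimal sketch of the executor-side heartbeat loop.
case class Heartbeat(executorId: String)       // stand-in for the real heartbeat message

def startDriverHeartbeater(executorId: String, sendToDriver: Any => Unit): Unit = {
  val intervalMs = 10000L                      // assumed 10s interval, for illustration only
  val heartbeater = new Thread("driver-heartbeater") {
    setDaemon(true)
    override def run(): Unit = {
      while (true) {
        sendToDriver(Heartbeat(executorId))    // e.g. an Akka send to the HeartbeatReceiver in 1.3
        Thread.sleep(intervalMs)
      }
    }
  }
  heartbeater.start()
}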
At this point the Executor registration is complete; next up is the Driver submitting jobs to the Executors to run.
That is all for this article. If you enjoyed it, remember to follow; the WeChat official account has the same name.