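Related Spark source code walkthroughs:
spark Standalone master startup flow https://blog.csdn.net/Mr_kidBK/article/details/105131444
Standalone worker startup flow https://blog.csdn.net/Mr_kidBK/article/details/105356632
This article walks through, from the source code's point of view, the main steps of starting Spark in standalone mode, from running the shell scripts to the worker coming up. The Spark version is 2.1.x, and the information comes from the official Spark site and the source code. If it differs substantially from other articles online, I suggest trusting this one!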
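Running start-all.sh calls start-slaves.sh -> slaves.sh in turn.
On each machine that should run a worker, start-slave.sh is executed, which calls spark-daemon.sh -> spark-class.
The command that finally runs on the target machine is: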
/opt/module/jdk1.8.0/bin/java -cp /opt/module/spark-standalone/conf/:/opt/module/spark-standalone/jars/* -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=dai102:2181,dai103:2181,dai104:2181 -Dspark.deploy.zookeeper.dir=/spark_dai -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://dai102:7077
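This is best read together with the master startup flow:
spark Standalone master startup flow
https://blog.csdn.net/Mr_kidBK/article/details/105131444
- Build the argument-parsing instance (same as in the master flow, so not repeated here)
- Start the RPC environment and the RPC endpoint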
def main(argStrings: Array[String]) {
  Utils.initDaemon(log)
  val conf = new SparkConf
  // Build the argument-parsing instance
  val args = new WorkerArguments(argStrings, conf)
  // Start the RPC environment and the RPC endpoint
  val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
    args.memory, args.masters, args.workDir, conf = conf)
  rpcEnv.awaitTermination()
}
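To make the argument-parsing step a little more concrete, here is a minimal sketch of how the tail of the command line shown earlier (`--webui-port 8081 spark://dai102:7077`) could be turned into a settings object. `MiniWorkerArgs` and `WorkerArgsParser` are hypothetical names made up for this illustration, not Spark's `WorkerArguments`. Splitting a comma-separated master list is my assumption about how multiple masters would be passed.

```scala
// Hypothetical, simplified stand-in for WorkerArguments; not Spark's actual class.
final case class MiniWorkerArgs(webUiPort: Int = 8081, masters: Seq[String] = Nil)

object WorkerArgsParser {
  // Recursively consume the argument list, one option at a time.
  @annotation.tailrec
  def parse(args: List[String], acc: MiniWorkerArgs = MiniWorkerArgs()): MiniWorkerArgs =
    args match {
      case "--webui-port" :: value :: tail =>
        parse(tail, acc.copy(webUiPort = value.toInt))
      case url :: tail if url.startsWith("spark://") =>
        // Assumption: several masters are given as one comma-separated URL.
        val masters = url.stripPrefix("spark://").split(",").map("spark://" + _).toSeq
        parse(tail, acc.copy(masters = masters))
      case Nil =>
        acc
      case other :: _ =>
        throw new IllegalArgumentException(s"Unrecognized argument: $other")
    }
}
```

Calling `WorkerArgsParser.parse(List("--webui-port", "8081", "spark://dai102:7077"))` gives a value with the web UI port and a single master URL filled in.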
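As we know, during the lifecycle of a Netty RPC endpoint the constructor is called first, followed by the onStart method.
There is not much to see in the constructor; it is all configuration.
The interesting part is the onStart method:
- Check that the Worker is not yet registered
- Create the work directory
- Start the shuffle service
- Create the Worker UI
- Register the Worker with the Master (core logic)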
override def onStart() {
  // On first start the Worker must not be registered yet
  assert(!registered)
  logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
    host, port, cores, Utils.megabytesToString(memory)))
  logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
  logInfo("Spark home: " + sparkHome)
  // Create the work directory
  createWorkDir()
  // Start the shuffle service if it is enabled
  shuffleService.startIfEnabled()
  // The Worker's web UI (master 8080, worker 8081, job 4040)
  webUi = new WorkerWebUI(this, workDir, webUiPort)
  webUi.bind()
  workerWebUiUrl = s"http://$publicAddress:${webUi.boundPort}"
  // Register the Worker with the Master (core logic) ->
  registerWithMaster()
  metricsSystem.registerSource(workerSource)
  metricsSystem.start()
  // Attach the worker metrics servlet handler to the web ui after the metrics system is started.
  metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
}
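The "constructor first, then onStart" ordering can be sketched with a toy endpoint that records each step. Everything here (`MiniEndpoint`, `MiniWorker`, `MiniRpcEnv`) is a hypothetical miniature written for this article, and the event names only mirror the steps listed above. It is not how Spark's RPC layer is implemented.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical miniature of an RPC endpoint; the trait name is made up for illustration.
trait MiniEndpoint {
  def onStart(): Unit
}

// Records the order in which the lifecycle steps run.
class MiniWorker(events: ArrayBuffer[String]) extends MiniEndpoint {
  private var registered = false
  events += "constructor" // runs when the instance is created

  override def onStart(): Unit = {
    assert(!registered) // a freshly started worker must not be registered yet
    events += "createWorkDir"
    events += "startShuffleService"
    events += "bindWebUi"
    events += "registerWithMaster"
  }
}

// Hypothetical environment: builds the endpoint first, then calls onStart once.
object MiniRpcEnv {
  def setupEndpoint(make: () => MiniEndpoint): MiniEndpoint = {
    val endpoint = make()
    endpoint.onStart()
    endpoint
  }
}
```

Passing `() => new MiniWorker(events)` to `MiniRpcEnv.setupEndpoint` leaves `events` starting with `"constructor"` and ending with `"registerWithMaster"`.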
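Registration with the master goes through the registerWithMaster method.
registrationRetryTimer defaults to None:
private var registrationRetryTimer: Option[JScheduledFuture[_]] = None
- Try to register with all masters
- In case registration fails, retry registering with all masters at a fixed rate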
/**
 * Register the Worker with the Master
 */
private def registerWithMaster() {
  // onDisconnected may be triggered multiple times, so don't attempt registration
  // if there are outstanding registration attempts scheduled.
  registrationRetryTimer match {
    case None =>
      registered = false
      // Register with all Masters
      registerMasterFutures = tryRegisterAllMasters()
      connectionAttemptCount = 0
      // The attempt above may fail, so keep re-registering with the master at a fixed rate
      registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
        new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            Option(self).foreach(_.send(ReregisterWithMaster))
          }
        },
        // [0.5, 1.5) * 10
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        TimeUnit.SECONDS))
    case Some(_) =>
      logInfo("Not spawning another attempt to register with the master, since there is an" +
        " attempt scheduled already.")
  }
}
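The `Option`-based guard above is what keeps a second call from stacking up another retry timer. Below is a small self-contained sketch of the same pattern, using only `java.util.concurrent`. `RetryScheduler` and its `attempts` counter are hypothetical names for this illustration, and the `stop()` method is my assumption about what should happen once registration succeeds.

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Simplified sketch of the "schedule at most one retry timer" guard.
class RetryScheduler(intervalMillis: Long) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  private var retryTimer: Option[ScheduledFuture[_]] = None
  val attempts = new AtomicInteger(0)

  /** Returns true if a new timer was created, false if one was already pending. */
  def registerWithMaster(): Boolean = synchronized {
    retryTimer match {
      case None =>
        attempts.incrementAndGet() // the first attempt happens right away
        val retry = new Runnable {
          override def run(): Unit = attempts.incrementAndGet()
        }
        retryTimer = Some(scheduler.scheduleAtFixedRate(
          retry, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS))
        true
      case Some(_) =>
        false // an attempt is already scheduled, so do nothing
    }
  }

  /** Cancels the pending timer and releases the scheduler thread. */
  def stop(): Unit = synchronized {
    retryTimer.foreach(_.cancel(true))
    retryTimer = None
    scheduler.shutdownNow()
  }
}
```

With a long interval, calling `registerWithMaster()` twice in a row returns `true` and then `false`, and only one attempt has been counted.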
- Start threads from the thread pool to register the Worker with the Master
- Send an RPC ask request to the master
// Try to register with all masters
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  masterRpcAddresses.map { masterAddress =>
    // Start a thread from the thread pool to register the Worker with the Master
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = {
        try {
          logInfo("Connecting to master " + masterAddress + "...")
          // Look up the Master's RpcEndpointRef from its address; after that we can communicate with the Master.
          val masterEndpoint: RpcEndpointRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
          // Register with the Master
          registerWithMaster(masterEndpoint)
        } catch {
          case ie: InterruptedException => // Cancelled
          case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
        }
      }
    })
  }
}
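The "one task per master, handed to a thread pool" idea can be sketched on its own as follows. `RegisterAll` and the `connect` callback are hypothetical names for this illustration. Unlike the Worker, which reuses a long-lived pool, this sketch creates a throwaway pool per call to stay self-contained.

```scala
import java.util.concurrent.{Executors, Future => JFuture}

object RegisterAll {
  // Submits one connection task per master address and returns the pending futures.
  def tryRegisterAll(masters: Seq[String])(connect: String => Unit): Seq[JFuture[_]] = {
    val pool = Executors.newFixedThreadPool(math.max(1, masters.size))
    try {
      masters.map { address =>
        pool.submit(new Runnable {
          override def run(): Unit =
            try connect(address)
            catch {
              // A failure for one master must not stop attempts against the others.
              case e: Exception => println(s"Failed to connect to master $address: $e")
            }
        })
      }
    } finally {
      pool.shutdown() // tasks that were already submitted still run to completion
    }
  }
}
```

Each returned future can be waited on with `get`, or cancelled with `cancel(true)` if a newer round of attempts replaces it.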
private def registerWithMaster(masterEndpoint: RpcEndpointRef): Unit = {
  // Send a message to the Master's receiveAndReply method
  // The message type is RegisterWorker and carries information about the Worker:
  // id, host, port, cores, memory, web UI URL
  // ask: the sender expects a reply to the message
  masterEndpoint.ask[RegisterWorkerResponse](RegisterWorker(
    workerId, host, port, self, cores, memory, workerWebUiUrl))
    .onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      case Success(msg) =>
        Utils.tryLogNonFatalError {
          handleRegisterResponse(msg)
        }
      case Failure(e) =>
        logError(s"Cannot register with master: ${masterEndpoint.address}", e)
        System.exit(1)
    }(ThreadUtils.sameThread)
}
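The "ask" style (send a message and react to the reply when it arrives) maps naturally onto Scala's `Future`. Here is a minimal sketch using only the standard library. `AskSketch` and the message types `Registered` and `RegisterFailed` are hypothetical and only loosely modeled on the names in the source above. The reply logic is invented for the example.

```scala
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

// Hypothetical message types, loosely modeled on the ones named above.
sealed trait RegisterResponse
final case class Registered(masterUrl: String) extends RegisterResponse
final case class RegisterFailed(reason: String) extends RegisterResponse

object AskSketch {
  // Stand-in for the master side: replies asynchronously to a registration request.
  def ask(workerId: String): Future[RegisterResponse] = Future {
    if (workerId.nonEmpty) Registered("spark://dai102:7077")
    else RegisterFailed("empty worker id")
  }

  // Worker side: send the request, then handle whichever reply comes back.
  def register(workerId: String): Future[String] = {
    val outcome = Promise[String]()
    ask(workerId).onComplete {
      case Success(Registered(url))     => outcome.success(s"registered with $url")
      case Success(RegisterFailed(why)) => outcome.success(s"rejected: $why")
      case Failure(e)                   => outcome.failure(e)
    }
    outcome.future
  }
}
```

The caller can block on the result with `scala.concurrent.Await.result` in a test, or chain further callbacks on the returned future.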
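When the Master receives the request, it first checks whether this worker is already registered. If it is registered but in the DEAD state, the Master deletes the old worker information and registers it again.
The Master stores worker information in three places: one HashSet and two HashMaps (one keyed by id, one keyed by address).
The Master then returns its own ref and web UI URL to the worker.
When the worker receives the reply, it first celebrates its successful registration by writing a log line, and while it is at it sets its registered flag to true.
It then schedules itself to send a heartbeat to the Master every 15 seconds.
When the Master receives a normal heartbeat it updates the worker's last-heartbeat time (if more than 4 heartbeats are missed, the worker is marked as timed out).
When the Master receives an abnormal heartbeat, it first grumbles about it in a log line, then messages the sender back to check that it is still alive.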
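The Master-side bookkeeping described above (one set plus two lookup maps, replacing a dead entry, refreshing heartbeats) can be sketched like this. `MiniMaster` and `WorkerRecord` are hypothetical, heavily trimmed stand-ins written for this article. The exact rules in Spark's Master are more involved, so treat this as an illustration of the data layout only.

```scala
import scala.collection.mutable

// Hypothetical, trimmed-down worker record; the real one holds much more state.
final class WorkerRecord(val id: String, val address: String) {
  var alive: Boolean = true
  var lastHeartbeat: Long = 0L
}

class MiniMaster(timeoutMillis: Long) {
  // The three stores: one set plus two lookup maps (by id and by address).
  val workers = mutable.HashSet.empty[WorkerRecord]
  val idToWorker = mutable.HashMap.empty[String, WorkerRecord]
  val addressToWorker = mutable.HashMap.empty[String, WorkerRecord]

  /** Returns false when a live worker already occupies the address. */
  def register(id: String, address: String, now: Long): Boolean =
    addressToWorker.get(address) match {
      case Some(old) if old.alive => false
      case stale =>
        stale.foreach(remove) // a dead entry is dropped before re-registering
        val w = new WorkerRecord(id, address)
        w.lastHeartbeat = now
        workers += w
        idToWorker(id) = w
        addressToWorker(address) = w
        true
    }

  /** Returns true for a known worker; false means the heartbeat was unexpected. */
  def heartbeat(id: String, now: Long): Boolean =
    idToWorker.get(id) match {
      case Some(w) =>
        w.lastHeartbeat = now
        true
      case None =>
        false // this is where a warning would be logged
    }

  /** Marks workers dead when no heartbeat arrived within the timeout. */
  def checkTimeouts(now: Long): Unit =
    for (w <- workers if now - w.lastHeartbeat > timeoutMillis) w.alive = false

  private def remove(w: WorkerRecord): Unit = {
    workers -= w
    idToWorker -= w.id
    addressToWorker -= w.address
  }
}
```

Keeping the two maps next to the set makes lookups by id (for heartbeats) and by address (for duplicate checks) cheap, at the cost of having to update all three together.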
private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
  msg match {
    case RegisteredWorker(masterRef, masterWebUiUrl) =>
      logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
      // Mark the worker as registered
      registered = true
      // Update the Master
      changeMaster(masterRef, masterWebUiUrl)
      // Schedule itself to send heartbeats to the Master, by default 4 times per minute
      forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
          self.send(SendHeartbeat)
        }
      }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
      if (CLEANUP_ENABLED) {
        logInfo(
          s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            self.send(WorkDirCleanup)
          }
        }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
      }
      val execs = executors.values.map { e =>
        new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
      }
      masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))
    case RegisterWorkerFailed(message) =>
      if (!registered) {
        logError("Worker registration failed: " + message)
        System.exit(1)
      }
    case MasterInStandby =>
      // Ignore. Master not yet ready.
  }
}
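The two timing figures that appear in this article, a heartbeat every 15 seconds and a retry interval of "[0.5, 1.5) * 10" seconds, come down to simple arithmetic. The sketch below works through both. `WorkerTimings` is a hypothetical name, and the 60-second timeout is an assumption derived from "4 heartbeats per minute" rather than a value read from configuration.

```scala
import scala.util.Random

object WorkerTimings {
  // Assumed worker timeout of 60 seconds, i.e. the window that holds 4 heartbeats.
  val workerTimeoutSeconds: Long = 60L

  // One quarter of the timeout, in milliseconds: 60 * 1000 / 4 = 15000.
  val heartbeatMillis: Long = workerTimeoutSeconds * 1000 / 4

  // A base of 10 seconds scaled by a random factor in [0.5, 1.5), then rounded.
  def fuzzedRetrySeconds(rng: Random): Long =
    math.round(10 * (rng.nextDouble() + 0.5))
}
```

The random factor spreads retries out so that many workers restarting at the same moment do not all hit the master in the same second.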
This blog is published only on CSDN, by Mr_kidBK (a guy who could not be any more handsome). Please credit the source when reposting.
https://blog.csdn.net/Mr_kidBK