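Spark Source Code Analysis: The deploy Module

Overall Architecture of the Deploy Module

The deploy module consists of three submodules: master, worker, and client. Each of them extends Actor, and they communicate with one another through actor messages.

  • Master: receives worker registrations and manages all workers, accepts applications submitted by clients, schedules waiting applications in FIFO order, and hands them to workers.
  • Worker: registers itself with the master, sets up the process environment according to the application description sent by the master, and starts StandaloneExecutorBackend.
  • Client: registers an application with the master and monitors it. When a user creates a SparkContext, a SparkDeploySchedulerBackend is instantiated, and instantiating SparkDeploySchedulerBackend in turn starts the client. The client is given the launch parameters and application information, sends a request to the master to register the application, and has StandaloneExecutorBackend started on the slave nodes.

The class diagram of the deploy module is shown below:

[Figure 1: class diagram of the deploy module]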

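Deploy Module Messages

The deploy module is not complicated and contains little code. Most of that code deals with passing and handling messages between the submodules, so the main messages are listed here:

  • client to master

    1. RegisterApplication (registers an application with the master)
  • master to client

    1. RegisteredApplication (reply to the client's application registration)
    2. ExecutorAdded (tells the client that a worker has started an executor environment; sent right after LaunchExecutor is sent to the worker)
    3. ExecutorUpdated (tells the client that an executor's state has changed, for example it finished or exited abnormally; sent after the worker sends ExecutorStateChanged to the master)
  • master to worker

    1. LaunchExecutor (asks the worker to start an executor environment)
    2. RegisteredWorker (reply to a successful worker registration)
    3. RegisterWorkerFailed (reply to a failed worker registration)
    4. KillExecutor (asks the worker to stop an executor environment)
  • worker to master

    1. RegisterWorker (registers the worker with the master)
    2. Heartbeat (periodic heartbeat sent to the master)
    3. ExecutorStateChanged (reports an executor state change to the master)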
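
To make the direction of each message easier to see, the list above can be sketched as a small sealed hierarchy. This is only an illustration: the object name DeployMessagesSketch, the route helper, and every field list are assumptions made for this sketch and do not reproduce the real Spark definitions.

```scala
// Illustrative sketch only: field lists are simplified assumptions,
// not the actual Spark message definitions.
object DeployMessagesSketch {
  sealed trait DeployMessage

  // client -> master
  case class RegisterApplication(appName: String, cores: Int) extends DeployMessage

  // master -> client
  case class RegisteredApplication(appId: String) extends DeployMessage
  case class ExecutorAdded(execId: Int, workerId: String, cores: Int) extends DeployMessage
  case class ExecutorUpdated(execId: Int, state: String) extends DeployMessage

  // master -> worker
  case class LaunchExecutor(appId: String, execId: Int, cores: Int) extends DeployMessage
  case class RegisteredWorker(masterUrl: String) extends DeployMessage
  case class RegisterWorkerFailed(reason: String) extends DeployMessage
  case class KillExecutor(appId: String, execId: Int) extends DeployMessage

  // worker -> master
  case class RegisterWorker(workerId: String, cores: Int) extends DeployMessage
  case class Heartbeat(workerId: String) extends DeployMessage
  case class ExecutorStateChanged(appId: String, execId: Int, state: String) extends DeployMessage

  // Hypothetical helper: returns the direction in which a message travels.
  def route(msg: DeployMessage): String = msg match {
    case _: RegisterApplication => "client -> master"
    case _: RegisteredApplication | _: ExecutorAdded | _: ExecutorUpdated =>
      "master -> client"
    case _: LaunchExecutor | _: RegisteredWorker | _: RegisterWorkerFailed | _: KillExecutor =>
      "master -> worker"
    case _: RegisterWorker | _: Heartbeat | _: ExecutorStateChanged =>
      "worker -> master"
  }
}
```

Because the trait is sealed, the compiler can warn when a match over DeployMessage forgets a case, which is one reason this style suits a message protocol.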

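Deploy Module Code Walkthrough

The deploy module is simpler than the scheduler module, so its code is not analyzed in full detail here. Only the submission and termination of an application are examined.

Client Submits an Application

The client is created and started by SparkDeploySchedulerBackend, so a client is embedded in every application and serves only that application. When the client starts, it first registers the application with the master: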

    def start() {
      // Just launch an actor; it will call back into the listener.
      actor = actorSystem.actorOf(Props(new ClientActor))
    }

    override def preStart() {
      logInfo("Connecting to master " + masterUrl)
      try {
        master = context.actorFor(Master.toAkkaUrl(masterUrl))
        masterAddress = master.path.address
        master ! RegisterApplication(appDescription) // register the application with the master
        context.system.eventStream.subscribe(self, classOf[RemoteClientLifeCycleEvent])
        context.watch(master) // Doesn't work with remote actors, but useful for testing
      } catch {
        case e: Exception =>
          logError("Failed to connect to master", e)
          markDisconnected()
          context.stop(self)
      }
    }

After receiving a RegisterApplication request, the master adds the application to the waiting queue, where it waits to be scheduled:

    case RegisterApplication(description) => {
      logInfo("Registering app " + description.name)
      val app = addApplication(description, sender)
      logInfo("Registered app " + description.name + " with ID " + app.id)
      waitingApps += app
      context.watch(sender) // This doesn't work with remote actors but helps for testing
      sender ! RegisteredApplication(app.id)
      schedule()
    }

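The master calls schedule() after every operation so that waiting applications are scheduled as soon as possible.

The deploy module is Spark's resource management module, so what resource does it manage, and in what unit is that resource scheduled? In the current version of Spark, the number of CPU cores in the cluster is the unit of resource management, and every submitted application declares how many cores it needs. The master handles all application requests in FIFO order. An application is scheduled once enough cores are available for it and otherwise keeps waiting, although the master also launches an application when it can grant only part of the requested cores. The schedule() function implements this behavior: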

    def schedule() {
      if (spreadOutApps) {
        // Spread each app across as many nodes as possible
        for (app <- waitingApps if app.coresLeft > 0) {
          val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
            .filter(canUse(app, _)).sortBy(_.coresFree).reverse
          val numUsable = usableWorkers.length
          val assigned = new Array[Int](numUsable) // Number of cores to give on each node
          var toAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
          var pos = 0
          while (toAssign > 0) {
            if (usableWorkers(pos).coresFree - assigned(pos) > 0) {
              toAssign -= 1
              assigned(pos) += 1
            }
            pos = (pos + 1) % numUsable
          }
          // Now that we've decided how many cores to give on each node, let's actually give them
          for (pos <- 0 until numUsable) {
            if (assigned(pos) > 0) {
              val exec = app.addExecutor(usableWorkers(pos), assigned(pos))
              launchExecutor(usableWorkers(pos), exec, app.desc.sparkHome)
              app.state = ApplicationState.RUNNING
            }
          }
        }
      } else {
        // Pack each app into as few nodes as possible until we've assigned all its cores
        for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
          for (app <- waitingApps if app.coresLeft > 0) {
            if (canUse(app, worker)) {
              val coresToUse = math.min(worker.coresFree, app.coresLeft)
              if (coresToUse > 0) {
                val exec = app.addExecutor(worker, coresToUse)
                launchExecutor(worker, exec, app.desc.sparkHome)
                app.state = ApplicationState.RUNNING
              }
            }
          }
        }
      }
    }
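
The two branches of schedule() are easier to compare with concrete numbers. The following is a minimal standalone sketch, not Spark code: ScheduleSketch, spreadOut, and pack are hypothetical names, workers are reduced to an array of free core counts assumed to be already filtered and sorted, and pack looks at a single application only, whereas the real loop iterates over workers first.

```scala
// Standalone sketch of the two core-assignment strategies for ONE application.
// coresFree(i) is the number of free cores on usable worker i (an assumption:
// the caller has already filtered out dead or unusable workers).
object ScheduleSketch {
  // Spread-out mode: hand out one core at a time, round-robin over workers.
  def spreadOut(coresLeft: Int, coresFree: Array[Int]): Array[Int] = {
    val assigned = new Array[Int](coresFree.length)
    // Never try to assign more cores than the cluster can offer.
    var toAssign = math.min(coresLeft, coresFree.sum)
    var pos = 0
    while (toAssign > 0) {
      if (coresFree(pos) - assigned(pos) > 0) {
        toAssign -= 1
        assigned(pos) += 1
      }
      pos = (pos + 1) % coresFree.length
    }
    assigned
  }

  // Packed mode: fill each worker in order before moving to the next one.
  def pack(coresLeft: Int, coresFree: Array[Int]): Array[Int] = {
    var remaining = coresLeft
    coresFree.map { free =>
      val use = math.min(free, remaining)
      remaining -= use
      use
    }
  }
}
```

For an application that needs 4 cores on workers with 4, 2, and 2 free cores, spreadOut gives 2, 1, 1 while pack gives 4, 0, 0. Asking for 10 cores on the same workers yields 4, 2, 2 in both modes, which corresponds to the partial allocation case described above.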

Once an application has been scheduled, launchExecutor() is called to send a request to the worker and, at the same time, report the status to the client:

    def launchExecutor(worker: WorkerInfo, exec: ExecutorInfo, sparkHome: String) {
      worker.addExecutor(exec)
      worker.actor ! LaunchExecutor(exec.application.id, exec.id, exec.application.desc,
        exec.cores, exec.memory, sparkHome)
      exec.application.driver ! ExecutorAdded(exec.id, worker.id, worker.host,
        exec.cores, exec.memory)
    }

At this point the interaction moves from client and master to master and worker. The worker now has to set up the environment in which the application will run:

    case LaunchExecutor(appId, execId, appDesc, cores_, memory_, execSparkHome_) =>
      val manager = new ExecutorRunner(
        appId, execId, appDesc, cores_, memory_, self, workerId, ip,
        new File(execSparkHome_), workDir)
      executors(appId + "/" + execId) = manager
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      master ! ExecutorStateChanged(appId, execId, ExecutorState.RUNNING, None, None)

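After receiving the LaunchExecutor message, the worker creates an ExecutorRunner instance and reports to the master that the executor environment has started.

While starting, ExecutorRunner creates a thread, configures the environment, and launches a new process: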

    def start() {
      workerThread = new Thread("ExecutorRunner for " + fullId) {
        override def run() { fetchAndRunExecutor() }
      }
      workerThread.start()
      // Shutdown hook that kills actors on shutdown.
      ...
    }

    def fetchAndRunExecutor() {
      try {
        // Create the executor's working directory
        val executorDir = new File(workDir, appId + "/" + execId)
        if (!executorDir.mkdirs()) {
          throw new IOException("Failed to create directory " + executorDir)
        }
        // Launch the process
        val command = buildCommandSeq()
        val builder = new ProcessBuilder(command: _*).directory(executorDir)
        val env = builder.environment()
        for ((key, value) <- appDesc.command.environment) {
          env.put(key, value)
        }
        env.put("SPARK_MEM", memory.toString + "m")
        // In case we are running this from within the Spark Shell, avoid creating a "scala"
        // parent process for the executor command
        env.put("SPARK_LAUNCH_WITH_SCALA", "0")
        process = builder.start()
        // Redirect its stdout and stderr to files
        redirectStream(process.getInputStream, new File(executorDir, "stdout"))
        redirectStream(process.getErrorStream, new File(executorDir, "stderr"))
        // Wait for it to exit; this is actually a bad thing if it happens, because we expect to run
        // long-lived processes only. However, in the future, we might restart the executor a few
        // times on the same machine.
        val exitCode = process.waitFor()
        val message = "Command exited with code " + exitCode
        worker ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED, Some(message),
          Some(exitCode))
      } catch {
        case interrupted: InterruptedException =>
          logInfo("Runner thread for executor " + fullId + " interrupted")
        case e: Exception => {
          logError("Error running executor", e)
          if (process != null) {
            process.destroy()
          }
          val message = e.getClass + ": " + e.getMessage
          worker ! ExecutorStateChanged(appId, execId, ExecutorState.FAILED, Some(message), None)
        }
      }
    }
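
The environment setup in fetchAndRunExecutor() can be tried in isolation. The sketch below is a hedged, self-contained illustration: ExecutorEnvSketch and buildProcess are hypothetical names, the parameters stand in for fields of ExecutorRunner, and the process is only configured, never started.

```scala
import java.io.File

// Hypothetical helper mirroring the environment setup shown above.
// It only configures a ProcessBuilder; calling start() is left to the caller.
object ExecutorEnvSketch {
  def buildProcess(
      command: Seq[String],        // stands in for buildCommandSeq()
      executorDir: File,           // working directory of the child process
      appEnv: Map[String, String], // stands in for appDesc.command.environment
      memoryMb: Int): ProcessBuilder = {
    val builder = new ProcessBuilder(command: _*).directory(executorDir)
    val env = builder.environment() // mutable java.util.Map inherited from the parent
    for ((key, value) <- appEnv) {
      env.put(key, value)
    }
    env.put("SPARK_MEM", memoryMb.toString + "m")
    env.put("SPARK_LAUNCH_WITH_SCALA", "0")
    builder
  }
}
```

Calling ExecutorEnvSketch.buildProcess(Seq("echo", "hi"), new File("."), Map("FOO" -> "bar"), 512) returns a builder whose environment contains SPARK_MEM set to 512m in addition to the variables inherited from the parent process.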

After ExecutorRunner has started, the worker reports ExecutorStateChanged to the master, and the master repackages that message as ExecutorUpdated and sends it to the client.
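
The repackaging step amounts to a small mapping between two message types. The following sketch assumes simplified field lists and uses hypothetical names (StateForwardingSketch, toClientMessage); it is meant to show the idea, not the master's actual handler.

```scala
// Sketch of the master forwarding an executor state change to the client.
// Both case classes are simplified stand-ins with assumed fields.
object StateForwardingSketch {
  case class ExecutorStateChanged(
      appId: String, execId: Int, state: String,
      message: Option[String], exitStatus: Option[Int])

  case class ExecutorUpdated(
      execId: Int, state: String,
      message: Option[String], exitStatus: Option[Int])

  // The client serves exactly one application, so the appId can be dropped.
  def toClientMessage(msg: ExecutorStateChanged): ExecutorUpdated =
    ExecutorUpdated(msg.execId, msg.state, msg.message, msg.exitStatus)
}
```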

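This essentially completes the application submission process. The process is not complicated and consists mainly of message passing.

Application Termination

An application can end for various reasons, including normal completion and abnormal exit. This section walks through the whole termination flow.

When an application ends, its client usually ends as well. The master detects the client's termination through the actor system and then calls removeApplication() to clean up: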

    def removeApplication(app: ApplicationInfo) {
      if (apps.contains(app)) {
        logInfo("Removing app " + app.id)
        apps -= app
        idToApp -= app.id
        actorToApp -= app.driver
        addressToWorker -= app.driver.path.address
        completedApps += app // Remember it in our history
        waitingApps -= app
        for (exec <- app.executors.values) {
          exec.worker.removeExecutor(exec)
          exec.worker.actor ! KillExecutor(exec.application.id, exec.id)
        }
        app.markFinished(ApplicationState.FINISHED) // TODO: Mark it as FAILED if it failed
        schedule()
      }
    }

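removeApplication() first removes the application from the data structures the master maintains, and then sends KillExecutor to every worker that runs one of its executors. On receiving KillExecutor, the worker calls the kill() function of the corresponding ExecutorRunner: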

    case KillExecutor(appId, execId) =>
      val fullId = appId + "/" + execId
      executors.get(fullId) match {
        case Some(executor) =>
          logInfo("Asked to kill executor " + fullId)
          executor.kill()
        case None =>
          logInfo("Asked to kill unknown executor " + fullId)
      }

Inside ExecutorRunner, kill() interrupts the monitoring thread, destroys the process that thread launched, and reports ExecutorStateChanged to the worker:

    def kill() {
      if (workerThread != null) {
        workerThread.interrupt()
        workerThread = null
        if (process != null) {
          logInfo("Killing process!")
          process.destroy()
          process.waitFor()
        }
        worker ! ExecutorStateChanged(appId, execId, ExecutorState.KILLED, None, None)
        Runtime.getRuntime.removeShutdownHook(shutdownHook)
      }
    }

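When an application ends, all information about it on the master and the workers is cleaned up. That concludes the termination flow. Many exception-handling branches were not examined here, but this does not affect the understanding of the main path.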
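
The master-side bookkeeping of this flow can be sketched without actors by returning the messages instead of sending them. Everything in this sketch is an assumption made for illustration: RemoveAppSketch, MasterState, App, and Exec are hypothetical simplified types, and only three of the master's collections are modeled.

```scala
import scala.collection.mutable

// Actor-free sketch of application removal on the master.
// All types are simplified, hypothetical stand-ins for the real ones.
object RemoveAppSketch {
  case class KillExecutor(appId: String, execId: Int)
  case class Exec(id: Int, workerId: String)
  case class App(id: String, executors: Seq[Exec])

  class MasterState {
    val apps = mutable.HashSet[App]()
    val waitingApps = mutable.ArrayBuffer[App]()
    val completedApps = mutable.ArrayBuffer[App]()

    // Returns the (workerId, message) pairs the master would send out.
    // Removing an unknown or already removed application is a no-op.
    def removeApplication(app: App): Seq[(String, KillExecutor)] = {
      if (apps.contains(app)) {
        apps -= app
        waitingApps -= app
        completedApps += app // keep it in the history
        app.executors.map(e => (e.workerId, KillExecutor(app.id, e.id)))
      } else {
        Seq.empty
      }
    }
  }
}
```

Guarding on apps.contains(app), as the original does, makes removal idempotent, so a second termination notification for the same application produces no further KillExecutor messages.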

http://jerryshao.me/architecture/2013/04/30/Spark%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90%E4%B9%8B-deploy%E6%A8%A1%E5%9D%97/
