Spark source code walkthrough

  • SparkContext initialization

  1. private val creationSite: CallSite = Utils.getCallSite()
  2. private[spark] val listenerBus = new LiveListenerBus(this)
  3. addedFiles/addedJars: store the URL of each static file/jar together with the file's local timestamp
  4. val sparkUser = Utils.getCurrentUserName()
  5. try {
    1. _conf
    2. _jars/_files/_eventLogDir/_eventLogCodec
    3. _jobProgressListener = new JobProgressListener(_conf)
    4. _env = createSparkEnv(_conf, isLocal, listenerBus) -> SparkEnv.set(_env)
    5. _statusTracker = new SparkStatusTracker(this)
    6. _progressBar/_ui/_hadoopConfiguration/_executorMemory/executorEnvs
    7. _heartbeatReceiver = env.rpcEnv.setupEndpoint(HeartbeatReceiver.ENDPOINT_NAME, new HeartbeatReceiver(this))
    8. val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    9. _schedulerBackend/_taskScheduler/_dagScheduler
    10. _heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)
    11. _taskScheduler.start()
    12. _env.blockManager.initialize(_applicationId)
    13. _env.metricsSystem.start()
    14. _executorAllocationManager.foreach(_.start())
    15. setupAndStartListenerBus()
    16. postEnvironmentUpdate()
    17. postApplicationStart()
    18. _taskScheduler.postStartHook()
    19. _env.metricsSystem.registerSource
    20. _shutdownHookRef = ShutdownHookManager.addShutdownHook()

 

    } catch {
       ...
    }
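
The order of the steps above matters because later steps depend on earlier ones: the heartbeat receiver (step 7) is registered before the task scheduler is created (step 8), and the task scheduler is started (step 11) before the block manager is initialized with the application id (step 12). Below is a minimal sketch of this "ordered steps inside a guarded block" pattern in plain Scala. It is not Spark's code: `MiniContext`, `Step`, `MiniContextDemo` and the `failAt` parameter are hypothetical names used only for illustration, and the cleanup-then-rethrow behavior in the catch branch is an assumption, since the notes above elide what the catch block does.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical, simplified model of an ordered, guarded initialization.
// None of these names come from Spark.
final case class Step(name: String, run: () => Unit)

class MiniContext(steps: Seq[Step]) {
  // Names of the steps that completed, in execution order.
  val completed: ArrayBuffer[String] = ArrayBuffer.empty[String]
  // Set to true once stop() has run.
  var stopped: Boolean = false

  def start(): Unit = {
    try {
      steps.foreach { step =>
        step.run()
        completed += step.name
      }
    } catch {
      case NonFatal(e) =>
        // Assumption: release whatever was already started, then rethrow.
        stop()
        throw e
    }
  }

  def stop(): Unit = { stopped = true }
}

object MiniContextDemo {
  // Builds a step list mirroring a few entries of the list above.
  // `failAt` optionally names one step that should throw.
  def build(failAt: Option[String] = None): MiniContext = {
    val names = Seq(
      "createSparkEnv",
      "heartbeatReceiver",
      "createTaskScheduler",
      "taskScheduler.start",
      "blockManager.initialize")
    new MiniContext(names.map { n =>
      Step(n, () => if (failAt.contains(n)) throw new IllegalStateException(s"$n failed"))
    })
  }

  def main(args: Array[String]): Unit = {
    val ctx = build()
    ctx.start()
    println(ctx.completed.mkString(" -> "))
  }
}
```

Calling `MiniContextDemo.build(Some("taskScheduler.start")).start()` throws after the first three steps have completed, and the context is marked as stopped, which is the property the guarded block is meant to provide.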

 

 

  • RDDOperationScope walkthrough

0. withScope overview: https://www.jianshu.com/p/8a3958337aea

1. Source code walkthrough: https://blog.csdn.net/qq_21383435/article/details/79666170

2. Additional notes: http://www.mamicode.com/info-detail-1066067.html
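
As a rough illustration of the idea behind withScope, here is a simplified sketch in plain Scala: run a block of code under a named scope, then restore the previous state. It is not Spark's implementation. `ScopeSketch` and `currentScope` are hypothetical names, state is held in plain variables (so it is single-threaded only), and the sketch assumes the default behavior is that an operation called inside another operation's scope keeps the outer scope unless nesting is explicitly allowed.

```scala
// Hypothetical, simplified model of a "with scope" helper; not Spark's code.
object ScopeSketch {
  private var current: Option[String] = None // name of the active scope, if any
  private var noOverride: Boolean = false    // true while inner scopes must not replace the outer one

  def currentScope: Option[String] = current

  def withScope[T](name: String, allowNesting: Boolean = false)(body: => T): T = {
    // Remember the previous state so it can be restored afterwards.
    val oldScope = current
    val oldNoOverride = noOverride
    try {
      // Only install the new scope if an enclosing scope has not forbidden it.
      if (!noOverride) current = Some(name)
      // Assumption: by default, inner calls must not override this scope.
      if (!allowNesting) noOverride = true
      body
    } finally {
      current = oldScope
      noOverride = oldNoOverride
    }
  }

  def main(args: Array[String]): Unit = {
    val seen = withScope("map") {
      withScope("filter") { currentScope }
    }
    println(seen)         // the inner call keeps the outer scope
    println(currentScope) // state is restored after the block
  }
}
```

The try/finally is the important part: the previous scope is restored even when the body throws.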

 

  • The coalesce() and repartition() methods

https://blog.csdn.net/lzq20115395/article/details/80602071
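
A short sketch of how the two methods differ in practice. `repartition(n)` is equivalent to `coalesce(n, shuffle = true)`, so it can both increase and decrease the partition count, while a plain `coalesce(n)` avoids a shuffle and can only reduce it. The example assumes spark-core is on the classpath and runs against a local master; the object name `CoalesceVsRepartition`, the app name, and the partition counts are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceVsRepartition {
  // Returns the partition counts of: the original RDD, coalesce(2),
  // coalesce(16) without shuffle, and repartition(16).
  def partitionCounts(): (Int, Int, Int, Int) = {
    val conf = new SparkConf()
      .setAppName("coalesce-vs-repartition")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 100, numSlices = 8)

      // Narrow dependency: merges existing partitions, no shuffle.
      val narrowed = rdd.coalesce(2)

      // Asking for more partitions without a shuffle leaves the count unchanged.
      val notWidened = rdd.coalesce(16)

      // Shuffle: data is redistributed, so the count can grow.
      val widened = rdd.repartition(16)

      (rdd.getNumPartitions,
        narrowed.getNumPartitions,
        notWidened.getNumPartitions,
        widened.getNumPartitions)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    println(partitionCounts()) // (8,2,8,16)
  }
}
```

As a rule of thumb, prefer `coalesce` when reducing the number of partitions (for example after a selective filter), and use `repartition` when the partition count has to grow or the data needs rebalancing, accepting the cost of the shuffle.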

 
