What is a DAG
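A DAG is a directed acyclic graph. When Spark runs an application, it first builds a DAG in which each node is an operation. Spark operations fall into two categories: transformations and actions. While the application executes, a job is submitted only when an action is encountered, so a single application can contain multiple jobs. After a job is submitted, Spark first uses the DAG to work out which stages the job contains, and then breaks each stage down into tasks.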
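The behaviour described above (transformations only record work, an action triggers a job, and the job is cut into stages) can be sketched with a small toy model in plain Scala. This is not Spark's implementation: ToyDag, ToyRdd, Op, jobsSubmitted and the rule "a wide operation starts a new stage" are all names and simplifications assumed here purely for illustration.

```scala
// Toy model only: none of these types belong to Spark.
object ToyDag {
  /** One recorded operation; wide = true stands in for a shuffle boundary. */
  final case class Op(name: String, wide: Boolean, f: Seq[Int] => Seq[Int])

  /** Number of jobs submitted so far (incremented by actions only). */
  var jobsSubmitted = 0

  /** A lazily evaluated dataset: transformations only extend the lineage. */
  final class ToyRdd(source: Seq[Int], val lineage: Vector[Op]) {
    // Transformations: record the operation, compute nothing.
    def map(g: Int => Int): ToyRdd =
      new ToyRdd(source, lineage :+ Op("map", false, _.map(g)))

    def filter(p: Int => Boolean): ToyRdd =
      new ToyRdd(source, lineage :+ Op("filter", false, _.filter(p)))

    def sorted: ToyRdd =
      new ToyRdd(source, lineage :+ Op("sorted", true, _.sorted))

    /** Split the lineage into stages: each wide operation opens a new stage. */
    def stages: Vector[Vector[String]] =
      lineage.foldLeft(Vector(Vector.empty[String])) { (acc, op) =>
        if (op.wide) acc :+ Vector(op.name)
        else acc.init :+ (acc.last :+ op.name)
      }

    /** Action: submits a job and only now runs the recorded operations. */
    def collect(): Seq[Int] = {
      jobsSubmitted += 1
      lineage.foldLeft(source)((data, op) => op.f(data))
    }
  }

  def parallelize(data: Seq[Int]): ToyRdd = new ToyRdd(data, Vector.empty)

  def main(args: Array[String]): Unit = {
    val rdd = parallelize(Seq(3, 1, 2)).map(_ * 2).filter(_ > 2).sorted.map(_ + 1)
    println(s"jobs after transformations: $jobsSubmitted")
    println(s"stages: ${rdd.stages}")
    println(s"result: ${rdd.collect()}")
    println(s"jobs after the action: $jobsSubmitted")
  }
}
```

In this sketch, chaining map, filter and sorted submits nothing. Only collect() increments the job counter, and the lineage splits into two stages, (map, filter) and (sorted, map). Real Spark derives stages from shuffle dependencies between RDDs, which is more involved than this linear toy.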
SparkContext, SparkConf, and SparkEnv
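Instantiating SparkContext also instantiates SparkEnv. To build the SparkEnv, Spark starts up a number of components, which is already hinted at by the arguments passed to the SparkEnv constructor: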
new SparkEnv(
  executorId,
  actorSystem,
  serializer,
  closureSerializer,
  cacheManager,
  mapOutputTracker,
  shuffleManager,
  broadcastManager,
  blockTransferService,
  blockManager,
  securityManager,
  httpFileServer,
  sparkFilesDir,
  metricsSystem,
  shuffleMemoryManager,
  conf)
Each of the arguments above corresponds to one aspect of Spark. Their types are as follows:
class SparkEnv (
    val executorId: String,
    val actorSystem: ActorSystem,
    val serializer: Serializer,
    val closureSerializer: Serializer,
    val cacheManager: CacheManager,
    val mapOutputTracker: MapOutputTracker,
    val shuffleManager: ShuffleManager,
    val broadcastManager: BroadcastManager,
    val blockTransferService: BlockTransferService,
    val blockManager: BlockManager,
    val securityManager: SecurityManager,
    val httpFileServer: HttpFileServer,
    val sparkFilesDir: String,
    val metricsSystem: MetricsSystem,
    val shuffleMemoryManager: ShuffleMemoryManager,
    val conf: SparkConf) extends Logging {
  // method body
}