https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sparkenv.html#BlockManagerMaster
SparkEnv — Spark Runtime Environment
Spark Runtime Environment (SparkEnv) is the runtime environment with Spark’s public services that interact with each other to establish a distributed computing platform for a Spark application.
Spark Runtime Environment is represented by a SparkEnv object that holds all the required runtime services for a running Spark application, with separate environments for the driver and executors.
The idiomatic way in Spark to access the current SparkEnv when on the driver or executors is to use the get method.
import org.apache.spark._
scala> SparkEnv.get
res0: org.apache.spark.SparkEnv = org.apache.spark.SparkEnv@49322d04
Table 1. SparkEnv Services
| Property | Service |
|---|---|
| rpcEnv | RpcEnv |
| serializer | Serializer |
| closureSerializer | Serializer |
| serializerManager | SerializerManager |
| mapOutputTracker | MapOutputTracker |
| shuffleManager | ShuffleManager |
| broadcastManager | BroadcastManager |
| blockManager | BlockManager |
| securityManager | SecurityManager |
| metricsSystem | MetricsSystem |
| memoryManager | MemoryManager |
| outputCommitCoordinator | OutputCommitCoordinator |
Table 2. SparkEnv’s Internal Properties
| Name | Initial Value | Description |
|---|---|---|
| isStopped | Disabled, i.e. false | Used to mark SparkEnv stopped. FIXME |
| driverTmpDir | | |
Tip
Enable INFO or DEBUG logging level for the org.apache.spark.SparkEnv logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.SparkEnv=DEBUG
Refer to Logging.
SparkEnv Factory Object
Creating "Base" SparkEnv — create Method
create(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  port: Int,
  isDriver: Boolean,
  isLocal: Boolean,
  numUsableCores: Int,
  listenerBus: LiveListenerBus = null,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv
create is an internal helper method that creates a "base" SparkEnv regardless of the target environment, i.e. a driver or an executor.
Table 3. create's Input Arguments and Their Usage
| Input Argument | Usage |
|---|---|
| bindAddress | Used to create RpcEnv and NettyBlockTransferService. |
| advertiseAddress | Used to create RpcEnv and NettyBlockTransferService. |
| numUsableCores | Used to create MemoryManager, NettyBlockTransferService and BlockManager. |
When executed, create creates a Serializer (based on the spark.serializer setting). You should see the following DEBUG message in the logs:
DEBUG SparkEnv: Using serializer: [serializer]
It creates another Serializer (based on spark.closure.serializer).
It creates a ShuffleManager based on the spark.shuffle.manager Spark property.
It creates a MemoryManager based on the spark.memory.useLegacyMode setting (with UnifiedMemoryManager being the default and numCores the input numUsableCores).
create creates a NettyBlockTransferService. It uses spark.driver.blockManager.port for the port on the driver and spark.blockManager.port for the port on executors.
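The configuration-driven choices above can be sketched in a small, Spark-free model. This is only an illustration, not Spark's code: CreateChoices, Choices and choose are hypothetical names, the configuration is a plain Map, and the fallback port 0 for an unset property is an assumption.

```scala
// Simplified, Spark-free model of the configuration-driven choices made by create.
// CreateChoices, Choices and choose are hypothetical names used for illustration only.
object CreateChoices {
  final case class Choices(serializerClass: String, memoryManager: String, blockTransferPort: Int)

  def choose(conf: Map[String, String], isDriver: Boolean): Choices = {
    // spark.serializer defaults to JavaSerializer (see Settings).
    val serializerClass =
      conf.getOrElse("spark.serializer", "org.apache.spark.serializer.JavaSerializer")

    // spark.memory.useLegacyMode defaults to false, which selects UnifiedMemoryManager.
    val useLegacy = conf.getOrElse("spark.memory.useLegacyMode", "false").toBoolean
    val memoryManager = if (useLegacy) "StaticMemoryManager" else "UnifiedMemoryManager"

    // The driver and executors read different port properties.
    // Assumption: 0 stands in for an unset property.
    val portKey = if (isDriver) "spark.driver.blockManager.port" else "spark.blockManager.port"
    val port = conf.getOrElse(portKey, "0").toInt

    Choices(serializerClass, memoryManager, port)
  }
}
```

For example, CreateChoices.choose(Map("spark.blockManager.port" -> "7079"), isDriver = false) picks the executor-side port property, while the same map with isDriver = true ignores it.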
Caution
FIXME A picture with SparkEnv, NettyBlockTransferService and the ports "armed".
create creates a BlockManagerMaster object with the BlockManagerMaster RPC endpoint reference (by registering or looking it up by name and BlockManagerMasterEndpoint), the input SparkConf, and the input isDriver flag.
Figure 1. Creating BlockManager for the Driver
Note
create registers the BlockManagerMaster RPC endpoint for the driver and looks it up for executors.
Figure 2. Creating BlockManager for Executor
It creates a BlockManager (using the above BlockManagerMaster, NettyBlockTransferService and other services).
create creates a BroadcastManager.
create creates a MapOutputTrackerMaster or MapOutputTrackerWorker for the driver and executors, respectively.
Note
The choice of the real implementation of MapOutputTracker is based on whether the input executorId is driver or not.
create registers or looks up RpcEndpoint as MapOutputTracker. It registers MapOutputTrackerMasterEndpoint on the driver and creates an RPC endpoint reference on executors. The RPC endpoint reference gets assigned as the MapOutputTracker RPC endpoint.
Caution
FIXME
It creates a CacheManager.
It creates a MetricsSystem for a driver and a worker separately.
It initializes the userFiles temporary directory, which is used for downloading dependencies on the driver. On an executor, it is the executor’s current working directory instead.
An OutputCommitCoordinator is created.
Note
create is called by createDriverEnv and createExecutorEnv.
Registering or Looking up RPC Endpoint by Name — registerOrLookupEndpoint Method
registerOrLookupEndpoint(name: String, endpointCreator: => RpcEndpoint)
registerOrLookupEndpoint registers or looks up an RPC endpoint by name.
If called from the driver, you should see the following INFO message in the logs:
INFO SparkEnv: Registering [name]
And the RPC endpoint is registered in the RPC environment.
Otherwise, it obtains an RPC endpoint reference by name.
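The register-or-look-up behaviour described above can be modelled without Spark. The sketch below is a simplification under stated assumptions: Endpoint, EndpointRef and Registry are hypothetical stand-ins for RpcEndpoint, an RPC endpoint reference and the RPC environment, and a single in-memory registry replaces the remote lookup an executor would really perform.

```scala
import scala.collection.mutable

// Simplified, Spark-free model of the register-or-look-up pattern.
// Endpoint, EndpointRef and Registry are hypothetical stand-ins, not Spark classes.
object RegisterOrLookup {
  trait Endpoint { def name: String }
  final case class EndpointRef(name: String)

  final class Registry {
    private val endpoints = mutable.Map.empty[String, Endpoint]

    def register(name: String, endpoint: Endpoint): EndpointRef = {
      endpoints(name) = endpoint
      EndpointRef(name)
    }

    def lookup(name: String): EndpointRef = {
      require(endpoints.contains(name), s"No endpoint registered as $name")
      EndpointRef(name)
    }
  }

  // endpointCreator is a by-name parameter, so the endpoint is only built on the driver.
  def registerOrLookupEndpoint(registry: Registry, isDriver: Boolean)(
      name: String, endpointCreator: => Endpoint): EndpointRef =
    if (isDriver) {
      println(s"INFO SparkEnv: Registering $name")
      registry.register(name, endpointCreator)
    } else {
      registry.lookup(name)
    }
}
```

The by-name endpointCreator mirrors the signature shown above: on executors the creator expression is never evaluated, so no endpoint object is constructed there.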
Creating SparkEnv for Driver — createDriverEnv Method
createDriverEnv(
  conf: SparkConf,
  isLocal: Boolean,
  listenerBus: LiveListenerBus,
  numCores: Int,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv
createDriverEnv creates a SparkEnv execution environment for the driver.
Figure 3. Spark Environment for driver
createDriverEnv accepts an instance of SparkConf, a flag for whether it runs in local mode or not, a LiveListenerBus, the number of cores to use for execution in local mode (or 0 otherwise), and an OutputCommitCoordinator (default: none).
createDriverEnv ensures that the spark.driver.host and spark.driver.port settings are defined.
It then passes the call straight on to the create helper method (with the driver executor id, isDriver enabled, and the input parameters).
Note
createDriverEnv is exclusively used by SparkContext to create a SparkEnv (while a SparkContext is being created for the driver).
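The two steps of createDriverEnv described in this section (check the required settings, then delegate with the driver executor id) can be sketched as follows. This is a Spark-free model under assumptions: DriverEnvSketch and BaseEnvArgs are hypothetical names, the configuration is a plain Map, and the error messages are illustrative rather than Spark's own.

```scala
// Simplified, Spark-free model of createDriverEnv: validate settings, then delegate.
// DriverEnvSketch and BaseEnvArgs are hypothetical; error messages are illustrative.
object DriverEnvSketch {
  // Stand-in for the arguments that would be passed on to the create helper.
  final case class BaseEnvArgs(
      executorId: String,
      isDriver: Boolean,
      isLocal: Boolean,
      numUsableCores: Int)

  val DriverExecutorId = "driver"

  def createDriverEnv(conf: Map[String, String], isLocal: Boolean, numCores: Int): BaseEnvArgs = {
    // Both settings must be defined before the driver environment is created.
    require(conf.contains("spark.driver.host"), "spark.driver.host is not set")
    require(conf.contains("spark.driver.port"), "spark.driver.port is not set")
    // Delegate with the driver executor id and isDriver enabled.
    BaseEnvArgs(DriverExecutorId, isDriver = true, isLocal = isLocal, numUsableCores = numCores)
  }
}
```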
Creating SparkEnv for Executor — createExecutorEnv Method
createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  port: Int,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv
createExecutorEnv creates the Spark execution environment for an executor.
Figure 4. Spark Environment for executor
Note
createExecutorEnv is a private[spark] method.
createExecutorEnv simply creates the base SparkEnv (passing in all the input parameters) and sets it as the current SparkEnv.
Note
The number of cores numCores is configured using the --cores command-line option of CoarseGrainedExecutorBackend and is specific to a cluster manager.
Note
createExecutorEnv is used when CoarseGrainedExecutorBackend runs and when MesosExecutorBackend registers a Spark executor.
Getting Current SparkEnv — get Method
get: SparkEnv
get returns the current SparkEnv.
import org.apache.spark._
scala> SparkEnv.get
res0: org.apache.spark.SparkEnv = org.apache.spark.SparkEnv@49322d04
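The relationship between setting the current environment (as createExecutorEnv does) and reading it back with get can be illustrated with a minimal holder object. This is a hedged, Spark-free sketch: CurrentEnvSketch and Env are hypothetical names, and wrapping the value in an Option is a modelling choice made here, whereas get in this section returns a SparkEnv directly.

```scala
// Simplified, Spark-free model of a "current environment" holder.
// CurrentEnvSketch and Env are hypothetical; Option is a modelling choice of this sketch.
object CurrentEnvSketch {
  final case class Env(executorId: String)

  // Volatile so that a value set by one thread is visible to others.
  @volatile private var current: Option[Env] = None

  def set(env: Env): Unit = {
    current = Some(env)
  }

  def get: Option[Env] = current

  // Mirrors the two steps described for createExecutorEnv:
  // create the base environment, then set it as the current one.
  def createExecutorEnv(executorId: String): Env = {
    val env = Env(executorId)
    set(env)
    env
  }
}
```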
Stopping SparkEnv — stop Method
stop(): Unit
stop checks the isStopped internal flag and does nothing when it is enabled.
Note
stop is a private[spark] method.
Otherwise, stop turns the isStopped flag on, stops all pythonWorkers and requests the following services to stop:
MapOutputTracker
ShuffleManager
BroadcastManager
BlockManager
BlockManagerMaster
MetricsSystem
OutputCommitCoordinator
stop requests RpcEnv to shut down and waits until it terminates.
Only on the driver, stop deletes the temporary directory. You can see the following WARN message in the logs if the deletion fails:
WARN Exception while deleting Spark temp dir: [path]
Note
stop is used when SparkContext stops (on the driver) and when Executor stops.
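The guard on isStopped makes stop safe to call more than once. A minimal, Spark-free sketch of that pattern follows. It is an assumption-laden model: StopSketch, Service and Env are hypothetical names, and the Python workers and the driver-only temporary directory cleanup are left out.

```scala
// Simplified, Spark-free model of an idempotent stop guarded by an isStopped flag.
// StopSketch, Service and Env are hypothetical names, not Spark classes.
object StopSketch {
  trait Service {
    def name: String
    def stop(): Unit
  }

  final class Env(services: Seq[Service], shutdownRpcEnv: () => Unit) {
    private var isStopped = false // initial value: false

    def stopped: Boolean = isStopped

    def stop(): Unit = {
      if (!isStopped) {
        isStopped = true
        // Request every service to stop, in the given order.
        services.foreach(_.stop())
        // The RPC environment is shut down after the services.
        shutdownRpcEnv()
      }
    }
  }
}
```

Calling stop a second time on the same Env is a no-op, so each service sees exactly one stop request.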
Settings
Table 4. Spark Properties
| Spark Property | Default Value | Description |
|---|---|---|
| spark.serializer | org.apache.spark.serializer.JavaSerializer | Serializer. Tip: Enable DEBUG logging level for the org.apache.spark.SparkEnv logger to see the current value: DEBUG SparkEnv: Using serializer: [serializer] |
| spark.closure.serializer | org.apache.spark.serializer.JavaSerializer | Serializer |
| spark.memory.useLegacyMode | false | Controls what type of MemoryManager to use. When enabled (i.e. true), it is the legacy StaticMemoryManager; otherwise (i.e. false), it is UnifiedMemoryManager. |