Spark BlockManagerMaster

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sparkenv.html#BlockManagerMaster

SparkEnv — Spark Runtime Environment

Spark Runtime Environment (SparkEnv) is the runtime environment with Spark’s public services that interact with each other to establish a distributed computing platform for a Spark application.

Spark Runtime Environment is represented by a SparkEnv object that holds all the required runtime services for a running Spark application, with separate environments for the driver and executors.

The idiomatic way in Spark to access the current SparkEnv on the driver or executors is to use the get method.

scala> import org.apache.spark._
import org.apache.spark._

scala> SparkEnv.get
res0: org.apache.spark.SparkEnv = org.apache.spark.SparkEnv@49322d04

Table 1. SparkEnv Services

Property                  Service
rpcEnv                    RpcEnv
serializer                Serializer
closureSerializer         Serializer
serializerManager         SerializerManager
mapOutputTracker          MapOutputTracker
shuffleManager            ShuffleManager
broadcastManager          BroadcastManager
blockManager              BlockManager
securityManager           SecurityManager
metricsSystem             MetricsSystem
memoryManager             MemoryManager
outputCommitCoordinator   OutputCommitCoordinator
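
Each service in Table 1 is exposed as a field of the SparkEnv instance, so once you hold the current SparkEnv you can reach any of them directly. A minimal spark-shell sketch (field names as in Table 1; the comments only describe what each field holds):

scala> val env = org.apache.spark.SparkEnv.get

scala> env.blockManager     // the BlockManager of this JVM (the driver in spark-shell)

scala> env.serializer       // the Serializer configured via spark.serializer

scala> env.memoryManager    // UnifiedMemoryManager unless spark.memory.useLegacyMode is enabled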

Table 2. SparkEnv’s Internal Properties

Name           Initial Value           Description
isStopped      Disabled, i.e. false    Used to mark SparkEnv stopped. FIXME
driverTmpDir

Tip

Enable INFO or DEBUG logging level for the org.apache.spark.SparkEnv logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.SparkEnv=DEBUG

Refer to Logging.
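
For reference, a slightly fuller conf/log4j.properties sketch modelled on Spark's bundled log4j.properties.template; only the last line is specific to SparkEnv:

# Print INFO and above from Spark to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# See what SparkEnv does while it wires up the runtime services
log4j.logger.org.apache.spark.SparkEnv=DEBUG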

SparkEnv Factory Object

Creating "Base"SparkEnv—createMethod

create(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  port: Int,
  isDriver: Boolean,
  isLocal: Boolean,
  numUsableCores: Int,
  listenerBus: LiveListenerBus = null,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv

create is an internal helper method that creates a "base" SparkEnv regardless of the target environment, i.e. a driver or an executor.

Table 3. create's Input Arguments and Their Usage

Input Argument     Usage
bindAddress        Used to create RpcEnv and NettyBlockTransferService.
advertiseAddress   Used to create RpcEnv and NettyBlockTransferService.
numUsableCores     Used to create MemoryManager, NettyBlockTransferService and BlockManager.

When executed, create creates a Serializer (based on the spark.serializer setting). You should see the following DEBUG message in the logs:

DEBUG SparkEnv: Using serializer: [serializer]

It creates another Serializer (based on spark.closure.serializer).

It creates a ShuffleManager based on the spark.shuffle.manager Spark property.

It creates a MemoryManager based on the spark.memory.useLegacyMode setting (with UnifiedMemoryManager being the default and numCores the input numUsableCores).

create creates a NettyBlockTransferService. It uses spark.driver.blockManager.port for the port on the driver and spark.blockManager.port for the port on executors.
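
For illustration, both port settings can be given explicitly on the SparkConf used to create the SparkContext (the application name and port numbers below are arbitrary examples):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("block-manager-ports-demo")          // hypothetical application name
  .setMaster("local[*]")
  .set("spark.driver.blockManager.port", "40001")  // port of the driver's BlockManager
  .set("spark.blockManager.port", "40002")         // port of the executors' BlockManagers
val sc = new SparkContext(conf)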

Caution

FIXME A picture with SparkEnv, NettyBlockTransferService and the ports "armed".

create creates a BlockManagerMaster object with the BlockManagerMaster RPC endpoint reference (by registering or looking it up by name and BlockManagerMasterEndpoint), the input SparkConf, and the input isDriver flag.


Figure 1. Creating BlockManager for the Driver

Note

create registers the BlockManagerMaster RPC endpoint for the driver and looks it up for executors.


Figure 2. Creating BlockManager for Executor

It creates a BlockManager (using the above BlockManagerMaster, NettyBlockTransferService and other services).
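
Put together, the BlockManagerMaster and BlockManager wiring looks roughly like the sketch below (paraphrased from the Spark 2.x sources; the exact BlockManager constructor arguments vary between Spark versions):

val blockManagerMaster = new BlockManagerMaster(
  registerOrLookupEndpoint(
    BlockManagerMaster.DRIVER_ENDPOINT_NAME,   // "BlockManagerMaster"
    new BlockManagerMasterEndpoint(rpcEnv, isLocal, conf, listenerBus)),
  conf,
  isDriver)

// The BlockManager is only constructed here; it is initialized later,
// by SparkContext on the driver and by Executor on executors.
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
  serializerManager, conf, memoryManager, mapOutputTracker, shuffleManager,
  blockTransferService, securityManager, numUsableCores)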

create creates a BroadcastManager.

create creates a MapOutputTrackerMaster or MapOutputTrackerWorker for the driver and executors, respectively.

Note

The choice of the real implementation of MapOutputTracker is based on whether the input executorId is driver or not.

create registers or looks up the RpcEndpoint as MapOutputTracker. It registers MapOutputTrackerMasterEndpoint on the driver and creates an RPC endpoint reference on executors. The RPC endpoint reference gets assigned as the MapOutputTracker RPC endpoint.
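
A condensed sketch of that choice and registration (paraphrased from the Spark sources; the by-name endpointCreator argument means MapOutputTrackerMasterEndpoint is only instantiated on the driver):

val mapOutputTracker = if (isDriver) {
  new MapOutputTrackerMaster(conf, broadcastManager, isLocal)
} else {
  new MapOutputTrackerWorker(conf)
}

// Registered on the driver, looked up on executors; either way the resulting
// RpcEndpointRef becomes the tracker's RPC endpoint.
mapOutputTracker.trackerEndpoint = registerOrLookupEndpoint(MapOutputTracker.ENDPOINT_NAME,
  new MapOutputTrackerMasterEndpoint(
    rpcEnv, mapOutputTracker.asInstanceOf[MapOutputTrackerMaster], conf))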

Caution

FIXME

It creates a CacheManager.

It creates a MetricsSystem for a driver and a worker separately.

It initializes the userFiles temporary directory that is used for downloading dependencies on the driver; for an executor, this is the executor's current working directory.
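
As a user-level illustration (not part of SparkEnv itself), files added with SparkContext.addFile end up in that directory on the driver and in the executor's working directory on executors, and are resolved with SparkFiles.get (the file URL below is hypothetical):

import org.apache.spark.SparkFiles

sc.addFile("hdfs:///data/lookup.csv")         // downloaded into the userFiles directory on the driver
val localPath = SparkFiles.get("lookup.csv")  // absolute path to the downloaded copy on this JVM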

An OutputCommitCoordinator is created.

Note

create is called by createDriverEnv and createExecutorEnv.

Registering or Looking up RPC Endpoint by Name — registerOrLookupEndpoint Method

registerOrLookupEndpoint(name: String, endpointCreator: => RpcEndpoint)

registerOrLookupEndpoint registers or looks up an RPC endpoint by name.

If called from the driver, you should see the following INFO message in the logs:

INFO SparkEnv: Registering [name]

And the RPC endpoint is registered in the RPC environment.

Otherwise, it obtains an RPC endpoint reference by name.
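
In code, the register-or-lookup pattern looks roughly like this (paraphrased from the Spark sources; RpcUtils.makeDriverRef is the helper used to look up an endpoint registered on the driver):

def registerOrLookupEndpoint(name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef = {
  if (isDriver) {
    logInfo("Registering " + name)
    rpcEnv.setupEndpoint(name, endpointCreator)  // register the endpoint in this RpcEnv
  } else {
    RpcUtils.makeDriverRef(name, conf, rpcEnv)   // obtain a reference to the driver-side endpoint
  }
}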

Creating SparkEnv for Driver — createDriverEnv Method

createDriverEnv(
  conf: SparkConf,
  isLocal: Boolean,
  listenerBus: LiveListenerBus,
  numCores: Int,
  mockOutputCommitCoordinator: Option[OutputCommitCoordinator] = None): SparkEnv

createDriverEnv creates a SparkEnv execution environment for the driver.


Figure 3. Spark Environment for driver

createDriverEnv accepts an instance of SparkConf, whether it runs in local mode or not, a LiveListenerBus, the number of cores to use for execution in local mode or 0 otherwise, and an OutputCommitCoordinator (default: none).

createDriverEnv ensures that the spark.driver.host and spark.driver.port settings are defined.
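
Both settings normally receive defaults while SparkContext is being created; setting them explicitly on the SparkConf looks like this (the host and port values are illustrative only):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.host", "10.0.0.5")  // address the driver binds to and advertises
  .set("spark.driver.port", "35000")     // port of the driver's RpcEnv (0 picks a random port)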

It then passes the call straight on to the create helper method (with the driver executor id, isDriver enabled, and the input parameters).

Note

createDriverEnv is exclusively used by SparkContext to create a SparkEnv (while a SparkContext is being created for the driver).

Creating SparkEnv for Executor — createExecutorEnv Method

createExecutorEnv(
  conf: SparkConf,
  executorId: String,
  hostname: String,
  port: Int,
  numCores: Int,
  ioEncryptionKey: Option[Array[Byte]],
  isLocal: Boolean): SparkEnv

createExecutorEnv creates an executor's (execution) environment, i.e. the Spark execution environment for an executor.


Figure 4. Spark Environment for executor

Note

createExecutorEnv is a private[spark] method.

createExecutorEnv simply creates the base SparkEnv (passing in all the input parameters) and sets it as the current SparkEnv.

Note

The number of cores numCores is configured using the --cores command-line option of CoarseGrainedExecutorBackend and is specific to a cluster manager.

Note

createExecutorEnv is used when CoarseGrainedExecutorBackend runs and when MesosExecutorBackend registers a Spark executor.

Getting Current SparkEnv — get Method

get: SparkEnv

get returns the current SparkEnv.

scala> import org.apache.spark._
import org.apache.spark._

scala> SparkEnv.get
res0: org.apache.spark.SparkEnv = org.apache.spark.SparkEnv@49322d04

Stopping SparkEnv — stop Method

stop(): Unit

stop checks the isStopped internal flag and does nothing when it is enabled.

Note

stop is a private[spark] method.

Otherwise, stop turns the isStopped flag on, stops all pythonWorkers and requests the following services to stop:

MapOutputTracker

ShuffleManager

BroadcastManager

BlockManager

BlockManagerMaster

MetricsSystem

OutputCommitCoordinator

stop requests RpcEnv to shut down and waits till it terminates.
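
A condensed sketch of the whole shutdown sequence (paraphrased from the Spark sources; the real method also guards the calls with exception handling and cleans up the driver temporary directory described below):

private[spark] def stop(): Unit = {
  if (!isStopped) {
    isStopped = true
    pythonWorkers.values.foreach(_.stop())
    mapOutputTracker.stop()
    shuffleManager.stop()
    broadcastManager.stop()
    blockManager.stop()
    blockManager.master.stop()
    metricsSystem.stop()
    outputCommitCoordinator.stop()
    rpcEnv.shutdown()
    rpcEnv.awaitTermination()
  }
}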

Only on the driver, stop deletes the temporary directory. You can see the following WARN message in the logs if the deletion fails.

WARN Exception while deleting Spark temp dir: [path]

Note

stop is used when SparkContext stops (on the driver) and when Executor stops.

Settings

Table 4. Spark Properties

spark.serializer
Default: org.apache.spark.serializer.JavaSerializer
The Serializer to use.
TIP: Enable DEBUG logging level for the org.apache.spark.SparkEnv logger to see the current value:
DEBUG SparkEnv: Using serializer: [serializer]

spark.closure.serializer
Default: org.apache.spark.serializer.JavaSerializer
The Serializer to use for closures.

spark.memory.useLegacyMode
Default: false
Controls which type of MemoryManager to use. When enabled (i.e. true) it is the legacy StaticMemoryManager, while UnifiedMemoryManager otherwise (i.e. false).
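
For illustration, these properties are normally set on the SparkConf before the SparkContext is created, e.g. switching to Kryo serialization (the values other than the property names are examples):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.memory.useLegacyMode", "false")  // keep the default UnifiedMemoryManager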
