Storage模块主要分为两层:
1.通信层:storage模块采用的是master-slave结构来实现通信层,master和slave之间传输控制信息、状态信息,这些都是通过通信层来实现的。
2.存储层:storage模块需要把数据存储到disk或是memory上面,有可能还需replicate到远端,这都是由存储层来实现和提供相应接口。而其他模块若要和storage模块进行交互,storage模块提供了统一的操作类BlockManager,外部类与storage模块打交道都需要通过调用BlockManager相应接口来实现。
Storage模块通信层
在应用程序启动时,在SparkEnv阶段,会创建BlockManger实例,在Executor上创建ExecutorEnv并创建Executor上的BlockManger,在driver上driverEnv并同时创建driver上的BlockManger,其中Driver上的BlockManger(master), 其余executor上的为slave。BlockManagerMaster管理slave上的所有BlockManger,master和slave之间的通信通过BlockManagerMasterEndpoint来完成。每个executor上都会有BlockManger,BlockManger管理该节点上的block,总之BlockMnage跟Executor是一一对应关系,由Driver上的BlockManagerMaster来管理所有的BlockManager
BlockManagerMasterEndpoint
executor to client driver
/** Register the BlockManager's id with the driver. */ def registerBlockManager( blockManagerId: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef): Unit = { logInfo("Trying to register BlockManager") tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint)) logInfo("Registered BlockManager") } |
RegisterBlockManager (executor创建BlockManager以后向client driver发送请求注册自身) HeartBeat UpdateBlockInfo (更新block信息) GetPeers (请求获得其他BlockManager的id) GetLocations (获取block所在的BlockManager的id) GetLocationsMultipleBlockIds (获取一组block所在的BlockManager id)
client driver to client driver
/** Get locations of the blockId from the driver */ def getLocations(blockId: BlockId): Seq[BlockManagerId] = { driverEndpoint.askWithRetry[Seq[BlockManagerId]](GetLocations(blockId)) } /** Get locations of multiple blockIds from the driver */ def getLocations(blockIds: Array[BlockId]): IndexedSeq[Seq[BlockManagerId]] = { driverEndpoint.askWithRetry[IndexedSeq[Seq[BlockManagerId]]]( GetLocationsMultipleBlockIds(blockIds)) } |
GetLocations (获取block所在的BlockManager的id) GetLocationsMultipleBlockIds (获取一组block所在的BlockManager id) RemoveExecutor (删除所保存的已经死亡的executor上的BlockManager) StopBlockManagerMaster (停止client driver上的BlockManagerMasterActor)
BlockManagerSlaveEndpoint
client driver to executor
def removeBlock(blockId: BlockId) { driverEndpoint.askWithRetry[Boolean](RemoveBlock(blockId)) } /** Remove all blocks belonging to the given RDD. */ def removeRdd(rddId: Int, blocking: Boolean) { val future = driverEndpoint.askWithRetry[Future[Seq[Int]]](RemoveRdd(rddId)) future.onFailure { case e: Exception => logWarning(s"Failed to remove RDD $rddId - ${e.getMessage}", e) }(ThreadUtils.sameThread) if (blocking) { timeout.awaitResult(future) } } |
RemoveBlock (删除block) RemoveRdd (删除RDD)
通信层中涉及许多控制消息和状态消息的传递以及处理,这些细节可以直接查看源码,这里就不在一一罗列。下面就只简单介绍一下exeuctor端的BlockManager是如何启动以及向client driver发送注册请求完成注册。
Storage模块存储层
存储层整体架构图如下所示:
CacheManager:RDD在进行计算的时候,通过CacheManager来获取数据,并通过CacheManager来存取计算结果。
BlockManager:CacheManager在进行数据读取和存取的时候主要依赖BlockManager接口来操作,BlockManager决定数据是从内存(MemoryStore)还是从磁盘(diskstore)中获取。
MemoryStore:负责将数据保存在内存或从内存读取
DiskStore:负责将数据写入磁盘或从磁盘读入。
BlockManagerMaster:该模块只运行在Drvier Application所在的Executor上,功能是负责管理所有slave上的BlockManager。
ConnectionManager:负责与其他计算节点建立连接,并负责数据的发送和接收。
BlockManager对象被创建的时候会创建出MemoryStore和DiskStore对象用以存取block
DiskStore如何存取block
在DiskStore里面,每一个block都被存储为一个file,通过计算block id的hash值将block映射到文件中,block id与文件路径的映射关系如下所示:
private def createLocalDirs(conf: SparkConf): Array[File] = { Utils.getConfiguredLocalDirs(conf).flatMap { rootDir => try { val localDir = Utils.createDirectory(rootDir, "blockmgr") logInfo(s"Created local directory at $localDir") Some(localDir) } catch { case e: IOException => logError(s"Failed to create local dir in $rootDir. Ignoring this directory.", e) None } } } def createDirectory(root: String, namePrefix: String = "spark"): File = { var attempts = 0 val maxAttempts = MAX_DIR_CREATION_ATTEMPTS var dir: File = null while (dir == null) { attempts += 1 if (attempts > maxAttempts) { throw new IOException("Failed to create a temp directory (under " + root + ") after " + maxAttempts + " attempts!") } try { dir = new File(root, namePrefix + "-" + UUID.randomUUID.toString) if (dir.exists() || !dir.mkdirs()) { dir = null } } catch { case e: SecurityException => dir = null; } } dir.getCanonicalFile } def getFile(filename: String): File = { // Figure out which local directory it hashes to, and which subdirectory in that val hash = Utils.nonNegativeHash(filename) val dirId = hash % localDirs.length val subDirId = (hash / localDirs.length) % subDirsPerLocalDir // Create the subdirectory if it doesn't already exist val subDir = subDirs(dirId).synchronized { val old = subDirs(dirId)(subDirId) if (old != null) { old } else { val newDir = new File(localDirs(dirId), "%02x".format(subDirId)) if (!newDir.exists() && !newDir.mkdir()) { throw new IOException(s"Failed to create local dir in $newDir.") } subDirs(dirId)(subDirId) = newDir newDir } } new File(subDir, filename) } |
MemoryStore存取block
相对于DiskStore需要根据block id hash计算出文件路径并将block存放到对应的文件里面,MemoryStore管理block就显得非常简单:MemoryStore内部维护了一个hash map来管理所有的block,以block id为key将block存放到hash map中。
override def putBytes(blockId: BlockId, _bytes: ByteBuffer, level: StorageLevel): PutResult = { // Work on a duplicate - since the original input might be used elsewhere. val bytes = _bytes.duplicate() bytes.rewind() if (level.deserialized) { val values = blockManager.dataDeserialize(blockId, bytes) putIterator(blockId, values, level, returnValues = true) } else { val putAttempt = tryToPut(blockId, bytes, bytes.limit, deserialized = false) PutResult(bytes.limit(), Right(bytes.duplicate()), putAttempt.droppedBlocks) } } /** * Use `size` to test if there is enough space in MemoryStore. If so, create the ByteBuffer and * put it into MemoryStore. Otherwise, the ByteBuffer won't be created. * * The caller should guarantee that `size` is correct. */ def putBytes(blockId: BlockId, size: Long, _bytes: () => ByteBuffer): PutResult = { // Work on a duplicate - since the original input might be used elsewhere. lazy val bytes = _bytes().duplicate().rewind().asInstanceOf[ByteBuffer] val putAttempt = tryToPut(blockId, () => bytes, size, deserialized = false) val data = if (putAttempt.success) { assert(bytes.limit == size) Right(bytes.duplicate()) } else { null } PutResult(size, data, putAttempt.droppedBlocks) } |