This article gives an overview of the classes in the storage package.
sealed abstract class BlockId {
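Represents the identifier of a data block. Concrete subclasses include RDDBlockId, ShuffleBlockId, BroadcastBlockId, TaskResultBlockId, StreamBlockId, TempLocalBlockId, TempShuffleBlockId, and others.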
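As a rough illustration of how a sealed identifier hierarchy like this can be organized, the sketch below models three of the subclasses. It is a simplified stand-in rather than the Spark source: every name prefixed with `Sketch` and the `describe` helper are assumptions made for this example, and the name formats should be read as indicative only.

```scala
// Simplified stand-in for the BlockId hierarchy; not the actual Spark source.
sealed abstract class SketchBlockId {
  /** Globally unique name of the block. */
  def name: String
  override def toString: String = name
}

case class SketchRDDBlockId(rddId: Int, splitIndex: Int) extends SketchBlockId {
  override def name: String = s"rdd_${rddId}_$splitIndex"
}

case class SketchShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int) extends SketchBlockId {
  override def name: String = s"shuffle_${shuffleId}_${mapId}_$reduceId"
}

case class SketchBroadcastBlockId(broadcastId: Long) extends SketchBlockId {
  override def name: String = s"broadcast_$broadcastId"
}

object SketchBlockId {
  // Because the base class is sealed, the compiler can check this match for exhaustiveness.
  def describe(id: SketchBlockId): String = id match {
    case SketchRDDBlockId(rdd, split)  => s"partition $split of RDD $rdd"
    case SketchShuffleBlockId(s, m, r) => s"shuffle $s, map $m, reduce $r"
    case SketchBroadcastBlockId(b)     => s"broadcast variable $b"
  }
}
```

Sealing the base class keeps all subclasses in one file, so code that dispatches on the kind of block can rely on exhaustive pattern matching.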
private[storage] class BlockInfo( val level: StorageLevel, val classTag: ClassTag[_], val tellMaster: Boolean) {
Maintains and tracks the metadata of a single data block.
private[storage] class BlockInfoManager extends Logging {
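This manager maintains the synchronization state for tasks (identified by taskAttemptId) that read and write blocks (identified by BlockId). Each BlockInfo encapsulates the read/write lock state of its block.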
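The bookkeeping described above can be sketched as a small readers-writer lock table keyed by block. This is a hypothetical, non-blocking simplification: the names `SketchLockState` and `SketchLockManager`, the use of plain strings as block ids, and the `try...` methods that return immediately instead of waiting are all assumptions made for this example.

```scala
import scala.collection.mutable

// Hypothetical, non-blocking simplification of per-block read/write lock bookkeeping.
final class SketchLockState {
  var readerCount: Int = 0            // number of outstanding read locks
  var writerTask: Option[Long] = None // task attempt currently holding the write lock
}

final class SketchLockManager {
  private val infos = mutable.HashMap.empty[String, SketchLockState]
  // Which blocks each task has read-locked, so a task only releases locks it holds.
  private val readLocksByTask = mutable.HashMap.empty[Long, mutable.ListBuffer[String]]

  def register(blockId: String): Unit = synchronized {
    infos.getOrElseUpdate(blockId, new SketchLockState)
  }

  /** Succeeds unless a writer currently holds the block. */
  def tryLockForReading(task: Long, blockId: String): Boolean = synchronized {
    infos.get(blockId) match {
      case Some(info) if info.writerTask.isEmpty =>
        info.readerCount += 1
        readLocksByTask.getOrElseUpdate(task, mutable.ListBuffer.empty[String]) += blockId
        true
      case _ => false
    }
  }

  /** Succeeds only when the block has no readers and no writer. */
  def tryLockForWriting(task: Long, blockId: String): Boolean = synchronized {
    infos.get(blockId) match {
      case Some(info) if info.writerTask.isEmpty && info.readerCount == 0 =>
        info.writerTask = Some(task)
        true
      case _ => false
    }
  }

  /** Releases the write lock if `task` holds it, otherwise one of its read locks. */
  def unlock(task: Long, blockId: String): Unit = synchronized {
    infos.get(blockId).foreach { info =>
      if (info.writerTask.contains(task)) {
        info.writerTask = None
      } else {
        for (held <- readLocksByTask.get(task)) {
          val idx = held.indexOf(blockId)
          if (idx >= 0) {
            held.remove(idx)
            info.readerCount -= 1
          }
        }
      }
    }
  }
}
```

Tracking locks per task attempt is what lets a manager of this kind release everything a task still holds when that task finishes or fails.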
private[spark] trait BlockData {
An abstraction over a block's data: it hides how the data is stored and provides methods for accessing the underlying data.
private[spark] class ByteBufferBlockData( val buffer: ChunkedByteBuffer, val shouldDispose: Boolean) extends BlockData {
An implementation of BlockData backed by a byte buffer.
private[spark] class BlockManager( executorId: String, rpcEnv: RpcEnv, val master: BlockManagerMaster, val serializerManager: SerializerManager, val conf: SparkConf, memoryManager: MemoryManager, mapOutputTracker: MapOutputTracker, shuffleManager: ShuffleManager, val blockTransferService: BlockTransferService, securityManager: SecurityManager, numUsableCores: Int) extends BlockDataManager with BlockEvictionHandler with Logging {
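Provides interfaces for storing and retrieving blocks, both locally and remotely. Blocks can be stored in three ways: in memory, on disk, and off-heap. The data itself is stored through memoryStore and diskStore.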
class BlockManagerId private ( private var executorId_ : String, private var host_ : String, private var port_ : Int, private var topologyInfo_ : Option[String]) extends Externalizable {
The unique identifier of a BlockManager.
private[storage] class BlockManagerManagedBuffer( blockInfoManager: BlockInfoManager, blockId: BlockId, data: BlockData, dispose: Boolean) extends ManagedBuffer {
Under the hood it still calls the methods of BlockData; it is essentially a thin wrapper around BlockData.
class BlockManagerMaster( var driverEndpoint: RpcEndpointRef, conf: SparkConf, isDriver: Boolean) extends Logging {
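All methods in this class are implemented through an RpcEndpointRef and are used to communicate with the driver. They cover the driver-side state of block managers, blocks, RDDs, broadcast variables, and so on.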
private[spark] class BlockManagerMasterEndpoint( override val rpcEnv: RpcEnv, val isLocal: Boolean, conf: SparkConf, listenerBus: LiveListenerBus) extends ThreadSafeRpcEndpoint with Logging {
This class exists only on the driver and manages the BlockManagers running on the slaves.
sealed trait ToBlockManagerSlave
Messages sent from the master to the slave nodes.
sealed trait ToBlockManagerMaster
Messages sent from the slaves to the master node.
private[storage] class BlockManagerSlaveEndpoint( override val rpcEnv: RpcEnv, blockManager: BlockManager, mapOutputTracker: MapOutputTracker) extends ThreadSafeRpcEndpoint with Logging {
Mainly receives messages sent by the master, typically requests to remove blocks, replicate blocks, fetch block status, and so on.
private[spark] class BlockManagerSource(val blockManager: BlockManager) extends Source {
A Source whose reported data comes from the BlockManager.
trait BlockReplicationPolicy {
The replication policy. It has a single method, prioritize, which returns the BlockManagers to replicate to, in priority order.
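To make the idea of a prioritizing policy concrete, here is a minimal sketch of a random policy. The trait shape is deliberately reduced and is an assumption: `PeerId` stands in for BlockManagerId, the real `prioritize` takes more parameters than shown, and `RandomSketchPolicy` with its `seed` argument is a hypothetical name introduced only for this example.

```scala
import scala.util.Random

// Stand-in for BlockManagerId, reduced to the fields needed here (assumption).
case class PeerId(executorId: String, host: String, port: Int)

// Simplified shape of a replication policy; the real trait takes more parameters.
trait SketchReplicationPolicy {
  def prioritize(self: PeerId, peers: Seq[PeerId], numReplicas: Int): List[PeerId]
}

// Picks replication targets uniformly at random, never including the local block manager.
class RandomSketchPolicy(seed: Long) extends SketchReplicationPolicy {
  override def prioritize(self: PeerId, peers: Seq[PeerId], numReplicas: Int): List[PeerId] = {
    val candidates = peers.filterNot(_ == self).distinct
    // The order of the returned list is the order in which peers would be tried.
    new Random(seed).shuffle(candidates).take(numReplicas).toList
  }
}
```

A topology-aware policy could follow the same shape and simply order `candidates` by rack or host instead of shuffling them.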
case class BlockUpdatedInfo(
blockManagerId: BlockManagerId,
blockId: BlockId,
storageLevel: StorageLevel,
memSize: Long,
diskSize: Long)
Information about a block update.
private[spark] class DiskBlockManager(conf: SparkConf, deleteFilesOnStop: Boolean)
extends Logging {
Maintains the mapping between logical blocks and their physical data on disk.
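One common way to implement such a mapping is to hash the block's file name into a local directory and a subdirectory. The sketch below shows that idea under stated assumptions: `SketchDiskLayout` and its parameters are hypothetical names, and unlike a real disk manager it only computes the path and never creates directories or files.

```scala
import java.io.File

// Sketch of a hash-based mapping from a block's file name to a location on disk.
// It only computes paths; nothing is created on the file system.
class SketchDiskLayout(localDirs: IndexedSeq[File], subDirsPerLocalDir: Int) {
  require(localDirs.nonEmpty && subDirsPerLocalDir > 0)

  // hashCode can be negative; Int.MinValue has no positive counterpart, so map it to 0.
  private def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h == Int.MinValue) 0 else math.abs(h)
  }

  def getFile(filename: String): File = {
    val hash = nonNegativeHash(filename)
    val dirId = hash % localDirs.length                           // which local directory
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir // which subdirectory
    val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
    new File(subDir, filename)
  }
}
```

Because the location is a pure function of the file name, the same block always resolves to the same path, and files spread across all configured directories without any lookup table.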
private[spark] class DiskBlockObjectWriter(
Writes JVM objects to disk, and supports appending.
private[spark] class DiskStore(
Stores the BlockManager's blocks on disk.
class RDDInfo( val id: Int, var name: String, val numPartitions: Int, var storageLevel: StorageLevel, val parentIds: Seq[Int], val callSite: String = "", val scope: Option[RDDOperationScope] = None) extends Ordered[RDDInfo] {
It has only three methods: isCached, toString, and compare.
private[spark] final class ShuffleBlockFetcherIterator( context: TaskContext, shuffleClient: ShuffleClient, blockManager: BlockManager, blocksByAddress: Iterator[(BlockManagerId, Seq[(BlockId, Long)])], streamWrapper: (BlockId, InputStream) => InputStream, maxBytesInFlight: Long, maxReqsInFlight: Int, maxBlocksInFlightPerAddress: Int, maxReqSizeShuffleToMem: Long, detectCorrupt: Boolean) extends Iterator[(BlockId, InputStream)] with TempFileManager with Logging {
Fetches remote blocks and places the results in a blocking queue held inside this class; iterating over the class then takes the results out of that queue one by one.
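The queue-and-iterate pattern described here can be sketched in miniature as follows. This is a hypothetical reduction, not the Spark implementation: `QueueBackedIterator`, `deliver`, and `numExpected` are names invented for this example, and the in-flight limits on bytes and requests are left out entirely.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical miniature of a queue-backed fetch iterator: producers (for example
// network callbacks) enqueue results, and the consumer blocks in next() until one arrives.
class QueueBackedIterator[T](numExpected: Int) extends Iterator[T] {
  private val results = new LinkedBlockingQueue[T]()
  private var numTaken = 0

  /** Called by a fetcher thread or callback when a result has arrived. */
  def deliver(result: T): Unit = results.put(result)

  // The iterator knows up front how many results to expect, so hasNext never blocks.
  override def hasNext: Boolean = numTaken < numExpected

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("all results consumed")
    numTaken += 1
    results.take() // blocks until a result is available
  }
}
```

Decoupling producers from the consumer this way lets a task start processing the first blocks that arrive while the remaining fetches are still in progress.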
case class FetchRequest(address: BlockManagerId, blocks: Seq[(BlockId, Long)]) { val size = blocks.map(_._2).sum }
A fetch request: a remote BlockManagerId together with the blocks to fetch from it.
private[storage] sealed trait FetchResult {
  val blockId: BlockId
  val address: BlockManagerId
}
The result of a fetch. There are two implementations.
private[storage] case class SuccessFetchResult( blockId: BlockId, address: BlockManagerId, size: Long, buf: ManagedBuffer, isNetworkReqDone: Boolean) extends FetchResult {
  require(buf != null)
  require(size >= 0)
}
The fetch succeeded; carries the fetched data.
private[storage] case class FailureFetchResult( blockId: BlockId, address: BlockManagerId, e: Throwable) extends FetchResult
The fetch failed; carries the exception.
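A consumer of such a two-case result type would typically pattern match on it. The following is a minimal sketch under stated assumptions: the `Sketch...` types replace BlockId, BlockManagerId, and ManagedBuffer with plain strings and byte arrays, and the choice to rethrow as an `IOException` in the hypothetical `bytesOrThrow` helper is made up for this example.

```scala
// Simplified stand-ins: plain strings and byte arrays replace BlockId,
// BlockManagerId and ManagedBuffer (assumptions to keep the sketch self-contained).
sealed trait SketchFetchResult { def blockId: String }
case class SketchSuccess(blockId: String, data: Array[Byte]) extends SketchFetchResult
case class SketchFailure(blockId: String, e: Throwable) extends SketchFetchResult

object SketchFetchResult {
  /** Returns the fetched bytes, or rethrows the failure with the block id attached. */
  def bytesOrThrow(result: SketchFetchResult): Array[Byte] = result match {
    case SketchSuccess(_, data) => data
    case SketchFailure(id, e)   => throw new java.io.IOException(s"Failed to fetch $id", e)
  }
}
```

Since the trait is sealed, the compiler warns if a new result case is added and a match like this one is not updated.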
private[spark] class StorageStatus( val blockManagerId: BlockManagerId, val maxMemory: Long, val maxOnHeapMem: Option[Long], val maxOffHeapMem: Option[Long]) {
Holds the block status information for the corresponding BlockManager, including its memory and disk usage.