Spark Source Code Study (5): Storage

The question this article addresses:
Study Spark's storage module by reading its source code, to gain a deeper understanding of distributed storage.

BlockManager

The Storage module is split into two layers:

1) Communication layer: responsible for reporting block information to the BlockManagerMaster; messages between the master and the slaves are exchanged in a master-slave pattern.

2) Data (storage) layer: responsible for storing and reading block data, mainly on disk, in memory, and on Tachyon.

An RDD's data logically lives in partitions, while cached data is kept as blocks, so the BlockManager ends up managing all of these blocks.
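
For reference, a cached RDD partition is tracked under an RDDBlockId, roughly as defined in BlockId.scala of the 2.x line (abridged):

// Partition splitIndex of RDD rddId is cached as the block named "rdd_<rddId>_<splitIndex>".
case class RDDBlockId(rddId: Int, splitIndex: Int) extends BlockId {
  override def name: String = "rdd_" + rddId + "_" + splitIndex
}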

The constructor parameters of BlockManager:

private[spark] class BlockManager(
    executorId: String,
    rpcEnv: RpcEnv,
    val master: BlockManagerMaster,
    serializerManager: SerializerManager,
    val conf: SparkConf,
    memoryManager: MemoryManager,
    mapOutputTracker: MapOutputTracker,
    shuffleManager: ShuffleManager,
    val blockTransferService: BlockTransferService,
    securityManager: SecurityManager,
    numUsableCores: Int)
  extends BlockDataManager with BlockEvictionHandler with Logging

The initialize method does two things: 1) it registers with the BlockManagerMaster, and 2) it starts the worker side (the block transfer service and the shuffle client):

 def initialize(appId: String): Unit = {
    blockTransferService.init(this)
    shuffleClient.init(appId)

    blockManagerId = BlockManagerId(
      executorId, blockTransferService.hostName, blockTransferService.port)

    shuffleServerId = if (externalShuffleServiceEnabled) {
      logInfo(s"external shuffle service port = $externalShuffleServicePort")
      BlockManagerId(executorId, blockTransferService.hostName, externalShuffleServicePort)
    } else {
      blockManagerId
    }

    master.registerBlockManager(blockManagerId, maxMemory, slaveEndpoint)

    // Register the executor's configuration with the local external shuffle service, if enabled.
    if (externalShuffleServiceEnabled && !blockManagerId.isDriver) {
      registerWithExternalShuffleServer()
    }
  }

Register BlockManager

Registering the BlockManager amounts to sending a one-way message to the master endpoint. The message is a RegisterBlockManager object, built from a few fields: the blockManagerId, maxMemSize, and the slave's RPC endpoint reference (slaveEndpoint).

/** Register the BlockManager's id with the driver. */
  def registerBlockManager(
      blockManagerId: BlockManagerId, maxMemSize: Long, slaveEndpoint: RpcEndpointRef): Unit = {
    logInfo("Trying to register BlockManager")
    tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveEndpoint))
    logInfo("Registered BlockManager")
  }
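
For context, RegisterBlockManager is a plain case class carrying exactly those fields, roughly as defined in BlockManagerMessages.scala of the 2.x line:

  case class RegisterBlockManager(
      blockManagerId: BlockManagerId,
      maxMemSize: Long,
      sender: RpcEndpointRef)
    extends ToBlockManagerMaster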

This requires a look at the tell method. Its source is:

 /** Send a one-way message to the master endpoint, to which we expect it to reply with true. */
  private def tell(message: Any) {
    if (!driverEndpoint.askWithRetry[Boolean](message)) {
      throw new SparkException("BlockManagerMasterEndpoint returned false, expected true.")
    }
  }

The askWithRetry method's source is as follows:

def askWithRetry[T: ClassTag](message: Any, timeout: RpcTimeout): T = {
    // TODO: Consider removing multiple attempts
    var attempts = 0
    var lastException: Exception = null
    while (attempts < maxRetries) {
      attempts += 1
      try {
        val future = ask[T](message, timeout)
        val result = timeout.awaitResult(future)
        if (result == null) {
          throw new SparkException("RpcEndpoint returned null")
        }
        return result
      } catch {
        case ie: InterruptedException => throw ie
        case e: Exception =>
          lastException = e
          logWarning(s"Error sending message [message = $message] in $attempts attempts", e)
      }

      if (attempts < maxRetries) {
        Thread.sleep(retryWaitMs)
      }
    }

    throw new SparkException(
      s"Error sending message [message = $message]", lastException)
  }


Storage

For storage, Spark mainly provides two options: MemoryStore and DiskStore.

 private[spark] val memoryStore =
    new MemoryStore(conf, blockInfoManager, serializerManager, memoryManager, this)
  private[spark] val diskStore = new DiskStore(conf, diskBlockManager)
  memoryManager.setMemoryStore(memoryStore)

MemoryStore

The memory store keeps blocks in one of two forms: deserialized arrays of Java objects, or serialized ByteBuffers. MemoryStore maintains a LinkedHashMap keyed by BlockId with MemoryEntry values:

private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)
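
The two forms correspond to two MemoryEntry implementations; the sketch below follows MemoryStore.scala of the 2.x line (abridged):

private sealed trait MemoryEntry[T] {
  def size: Long
  def memoryMode: MemoryMode
  def classTag: ClassTag[T]
}

// Deserialized blocks are kept on-heap as arrays of Java objects.
private case class DeserializedMemoryEntry[T](
    value: Array[T],
    size: Long,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  val memoryMode: MemoryMode = MemoryMode.ON_HEAP
}

// Serialized blocks are kept as chunked byte buffers, on- or off-heap.
private case class SerializedMemoryEntry[T](
    buffer: ChunkedByteBuffer,
    memoryMode: MemoryMode,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  def size: Long = buffer.size
}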

The main write paths are putBytes and the putIterator variants (putIteratorAsValues and putIteratorAsBytes). Depending on the deserialized flag of the StorageLevel, the data is handled differently, and when there is not enough memory the block is written to disk, provided the storage level allows disk. Here is putBytes:

def putBytes[T: ClassTag](
      blockId: BlockId,
      size: Long,
      memoryMode: MemoryMode,
      _bytes: () => ChunkedByteBuffer): Boolean = {
    require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
    if (memoryManager.acquireStorageMemory(blockId, size, memoryMode)) {
      // We acquired enough memory for the block, so go ahead and put it
      val bytes = _bytes()
      assert(bytes.size == size)
      val entry = new SerializedMemoryEntry[T](bytes, memoryMode, implicitly[ClassTag[T]])
      entries.synchronized {
        entries.put(blockId, entry)
      }
      logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format(
        blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
      true
    } else {
      false
    }
  }

Now look at the doPutIterator method in BlockManager.
It stores the block in memory first and, if there is not enough memory, falls back to disk when the storage level allows it.

private def doPutIterator[T](
      blockId: BlockId,
      iterator: () => Iterator[T],
      level: StorageLevel,
      classTag: ClassTag[T],
      tellMaster: Boolean = true,
      keepReadLock: Boolean = false): Option[PartiallyUnrolledIterator[T]] = {
    doPut(blockId, level, classTag, tellMaster = tellMaster, keepReadLock = keepReadLock) { info =>
      val startTimeMs = System.currentTimeMillis
      var iteratorFromFailedMemoryStorePut: Option[PartiallyUnrolledIterator[T]] = None
      // Size of the block in bytes
      var size = 0L
      if (level.useMemory) {
        // Put it in memory first, even if it also has useDisk set to true;
        // We will drop it to disk later if the memory store can't hold it.
        if (level.deserialized) {
          memoryStore.putIteratorAsValues(blockId, iterator(), classTag) match {
            case Right(s) =>
              size = s
            case Left(iter) =>
              // Not enough space to unroll this block; drop to disk if applicable
              if (level.useDisk) {
                logWarning(s"Persisting block $blockId to disk instead.")
                diskStore.put(blockId) { fileOutputStream =>
                  serializerManager.dataSerializeStream(blockId, fileOutputStream, iter)(classTag)
                }
                size = diskStore.getSize(blockId)
              } else {
                iteratorFromFailedMemoryStorePut = Some(iter)
              }
          }
        } else { // !level.deserialized
          memoryStore.putIteratorAsBytes(blockId, iterator(), classTag, level.memoryMode) match {
            case Right(s) =>
              size = s
            case Left(partiallySerializedValues) =>
              // Not enough space to unroll this block; drop to disk if applicable
              if (level.useDisk) {
                logWarning(s"Persisting block $blockId to disk instead.")
                diskStore.put(blockId) { fileOutputStream =>
                  partiallySerializedValues.finishWritingToStream(fileOutputStream)
                }
                size = diskStore.getSize(blockId)
              } else {
                iteratorFromFailedMemoryStorePut = Some(partiallySerializedValues.valuesIterator)
              }
          }
        }

      } else if (level.useDisk) {
        diskStore.put(blockId) { fileOutputStream =>
          serializerManager.dataSerializeStream(blockId, fileOutputStream, iterator())(classTag)
        }
        size = diskStore.getSize(blockId)
      }

      val putBlockStatus = getCurrentBlockStatus(blockId, info)
      val blockWasSuccessfullyStored = putBlockStatus.storageLevel.isValid
      if (blockWasSuccessfullyStored) {
        // Now that the block is in either the memory, externalBlockStore, or disk store,
        // tell the master about it.
        info.size = size
        if (tellMaster) {
          reportBlockStatus(blockId, info, putBlockStatus)
        }
        Option(TaskContext.get()).foreach { c =>
          c.taskMetrics().incUpdatedBlockStatuses(blockId -> putBlockStatus)
        }
        logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
        if (level.replication > 1) {
          val remoteStartTime = System.currentTimeMillis
          val bytesToReplicate = doGetLocalBytes(blockId, info)
          try {
            replicate(blockId, bytesToReplicate, level, classTag)
          } finally {
            bytesToReplicate.dispose()
          }
          logDebug("Put block %s remotely took %s"
            .format(blockId, Utils.getUsedTimeMs(remoteStartTime)))
        }
      }
      assert(blockWasSuccessfullyStored == iteratorFromFailedMemoryStorePut.isEmpty)
      iteratorFromFailedMemoryStorePut
    }
  }

We will not walk through the remaining methods one by one here.

RDD API

To keep data fault tolerant, or to reuse intermediate results (for example across shuffles), RDD provides cache and persist for storing data. From the source it is clear that cache is simply a call to persist, as shown below.
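
The relevant lines, roughly as they appear in RDD.scala of the 2.x line: cache delegates to persist, and the default storage level is MEMORY_ONLY.

  /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
  def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)

  /** Persist this RDD with the default storage level (`MEMORY_ONLY`). */
  def cache(): this.type = persist()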

StorageLevel

StorageLevel describes the storage level used by Spark Storage. The storage media include disk, memory, and off-heap; in addition, a deserialized flag marks whether the data is kept deserialized, and replication gives the number of replicas, which defaults to 1 in the source. Below is StorageLevel as it reads in the source.
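
An abridged excerpt, following StorageLevel.scala of the 2.x line (the full object defines further replicated `_2` variants):

class StorageLevel private(
    private var _useDisk: Boolean,
    private var _useMemory: Boolean,
    private var _useOffHeap: Boolean,
    private var _deserialized: Boolean,
    private var _replication: Int = 1)
  extends Externalizable {
  // ... accessors, validation, and (de)serialization omitted
}

object StorageLevel {
  val NONE = new StorageLevel(false, false, false, false)
  val DISK_ONLY = new StorageLevel(true, false, false, false)
  val MEMORY_ONLY = new StorageLevel(false, true, false, true)
  val MEMORY_ONLY_SER = new StorageLevel(false, true, false, false)
  val MEMORY_AND_DISK = new StorageLevel(true, true, false, true)
  val MEMORY_AND_DISK_SER = new StorageLevel(true, true, false, false)
  val OFF_HEAP = new StorageLevel(true, true, true, false, 1)
  // ... replicated variants with replication = 2 omitted
}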

CheckPoint

Checkpointing is generally used when a program runs for a long time or the computation is expensive, so that a failure part-way through does not incur an excessive recomputation cost. checkpoint writes the RDD's data out to disk (the configured checkpoint directory). It is advisable to cache an RDD before checkpointing it: checkpointing discards the RDD's lineage (all of its parent information), and with the data already cached the checkpoint job does not have to recompute the RDD, so the second time it is loaded it can be read back directly.
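
A minimal usage sketch, assuming a hypothetical checkpoint directory and input path:

import org.apache.spark.{SparkConf, SparkContext}

// Names and paths below are illustrative only.
val conf = new SparkConf().setAppName("checkpoint-demo")
val sc = new SparkContext(conf)
sc.setCheckpointDir("/tmp/spark-checkpoints")         // where checkpoint files are written

val rdd = sc.textFile("/tmp/input.txt").map(_.length)
rdd.cache()        // cache first so the checkpoint job does not recompute the RDD
rdd.checkpoint()   // marks the RDD; the data is written when the next action runs
rdd.count()        // triggers the computation and the checkpoint write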

There is still a lot missing on the storage side; further, more careful study will follow.
