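MemoryStore is responsible for storing Blocks in memory. By keeping broadcast data, RDDs, and shuffle data in memory, Spark reduces its dependence on disk I/O and improves read and write efficiency.
In what form does a Block exist in memory? Is a file simply cached in memory as-is? Spark abstracts an in-memory Block as the trait MemoryEntry, defined as follows: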
//org.apache.spark.storage.memory.MemoryStore
private sealed trait MemoryEntry[T] {
  def size: Long
  def memoryMode: MemoryMode
  def classTag: ClassTag[T]
}
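As the code above shows, MemoryEntry declares three interface methods: size (the size of the Block in memory), memoryMode (the memory mode of the Block, ON_HEAP or OFF_HEAP), and classTag (the class tag of the Block's element type).
MemoryEntry has two implementing classes, shown below: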
//org.apache.spark.storage.memory.MemoryStore
private case class DeserializedMemoryEntry[T](
    value: Array[T],
    size: Long,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  val memoryMode: MemoryMode = MemoryMode.ON_HEAP
}
private case class SerializedMemoryEntry[T](
    buffer: ChunkedByteBuffer,
    memoryMode: MemoryMode,
    classTag: ClassTag[T]) extends MemoryEntry[T] {
  def size: Long = buffer.size
}
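DeserializedMemoryEntry represents a deserialized MemoryEntry, which is always stored on the heap, while SerializedMemoryEntry represents a serialized MemoryEntry, whose memory mode is supplied by the caller.
Next, let's look at the attributes of MemoryStore: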
//org.apache.spark.storage.memory.MemoryStore
private[spark] class MemoryStore(
    conf: SparkConf,
    blockInfoManager: BlockInfoManager,
    serializerManager: SerializerManager,
    memoryManager: MemoryManager,
    blockEvictionHandler: BlockEvictionHandler)
  extends Logging {
  private val entries = new LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, true)
  private val onHeapUnrollMemoryMap = mutable.HashMap[Long, Long]()
  private val offHeapUnrollMemoryMap = mutable.HashMap[Long, Long]()
  private val unrollMemoryThreshold: Long = conf.getLong("spark.storage.unrollMemoryThreshold", 1024 * 1024)
.
.
//org.apache.spark.storage.memory.MemoryStore
private[storage] trait BlockEvictionHandler {
  private[storage] def dropFromMemory[T: ClassTag](
      blockId: BlockId,
      data: () => Either[Array[T], ChunkedByteBuffer]): StorageLevel
}
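BlockManager implements the BlockEvictionHandler trait and overrides its dropFromMemory method. When BlockManager constructs the MemoryStore, it passes a reference to itself as the blockEvictionHandler argument of the MemoryStore constructor, so the BlockEvictionHandler seen by MemoryStore is the BlockManager itself.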
//org.apache.spark.storage.BlockManager
private[spark] val memoryStore =
  new MemoryStore(conf, blockInfoManager, serializerManager, memoryManager, this)
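Besides the attributes above, MemoryStore has some methods that describe its memory model conceptually, such as maxMemory, memoryUsed, blocksMemoryUsed, and currentUnrollMemory, all of which appear in the discussion and code that follow.
With these members in mind, let's examine the memory model of MemoryStore. Compared with MemoryManager, MemoryStore offers a more macroscopic memory model: the distinction between on-heap and off-heap memory in MemoryManager is transparent to MemoryStore's model, and so is the "soft" boundary between storage memory and execution memory in UnifiedMemoryManager.
As the figure shows, MemoryStore's memory is divided into three parts. The first is the memory occupied by the many MemoryEntry instances held by the entries attribute, blocksMemoryUsed. The second is the memory reserved for unrolling and tracked in onHeapUnrollMemoryMap or offHeapUnrollMemoryMap, currentUnrollMemory. Unrolling a Block resembles "saving a seat" in everyday life: in a classroom some seats are occupied and some are empty, and placing a book on an empty seat signals that it is taken, so nobody else will sit there. Reserving memory this way prevents running out of memory when the data is actually written. The sum of blocksMemoryUsed and currentUnrollMemory is the memory already in use, denoted memoryUsed. The third part carries no mark at all and is unused.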
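The accounting in this model can be sketched in a few lines. This is only an illustration of the relationships described above, not Spark's code: the class name MemoryModelSketch and its freeMemory helper are hypothetical, and the three quantities are passed in as plain numbers.

```scala
// Hypothetical sketch of the MemoryStore accounting described above; not Spark's code.
// maxMemory: total memory available to the store
// blocksMemoryUsed: memory held by stored MemoryEntry instances
// currentUnrollMemory: memory reserved ("seats saved") for unrolling
final case class MemoryModelSketch(
    maxMemory: Long,
    blocksMemoryUsed: Long,
    currentUnrollMemory: Long) {

  // Memory in use = stored blocks + unroll reservations.
  def memoryUsed: Long = blocksMemoryUsed + currentUnrollMemory

  // Whatever is neither stored nor reserved is unused.
  def freeMemory: Long = maxMemory - memoryUsed
}
```

For example, with maxMemory = 100, blocksMemoryUsed = 60, and currentUnrollMemory = 15, memoryUsed is 75 and 25 remains unused.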
MemoryStore provides many methods for storing and reading Block data. They are described below:
2.1 getSize
Returns the size occupied by the MemoryEntry (the in-memory form of a Block) that corresponds to the given BlockId.
//org.apache.spark.storage.memory.MemoryStore
def getSize(blockId: BlockId): Long = {
  entries.synchronized {
    entries.get(blockId).size
  }
}
2.2 putBytes
Writes the Block corresponding to the BlockId (already wrapped as a ChunkedByteBuffer) into memory.
def putBytes[T: ClassTag](
    blockId: BlockId,
    size: Long,
    memoryMode: MemoryMode,
    _bytes: () => ChunkedByteBuffer): Boolean = {
  require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
  if (memoryManager.acquireStorageMemory(blockId, size, memoryMode)) {
    // We acquired enough memory for the block, so go ahead and put it
    val bytes = _bytes()
    assert(bytes.size == size)
    val entry = new SerializedMemoryEntry[T](bytes, memoryMode, implicitly[ClassTag[T]])
    entries.synchronized {
      entries.put(blockId, entry)
    }
    logInfo("Block %s stored as bytes in memory (estimated size %s, free %s)".format(
      blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
    true
  } else {
    false
  }
}
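The execution steps are as follows:
1) Ask MemoryManager for size bytes of storage memory in the given memory mode. If the request fails, return false.
2) Invoke _bytes to obtain the Block data as a ChunkedByteBuffer.
3) Wrap the data in a SerializedMemoryEntry.
4) Put the mapping from the BlockId to the SerializedMemoryEntry into entries.
5) Return true.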
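The point of passing the data as a function is that the bytes are only produced once memory has been secured. The sketch below illustrates that ordering under simplifying assumptions: PutBytesSketch is a hypothetical class, a plain counter stands in for MemoryManager, String stands in for BlockId, and Array[Byte] stands in for ChunkedByteBuffer.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the store; a simple counter replaces MemoryManager.
final class PutBytesSketch(maxMemory: Long) {
  private var used = 0L
  private val entries = mutable.LinkedHashMap.empty[String, Array[Byte]]

  // Assumed stand-in for acquireStorageMemory: succeeds only if the bytes fit.
  private def acquire(size: Long): Boolean =
    if (used + size <= maxMemory) { used += size; true } else false

  // `bytes` is a thunk, so the data is only produced after memory is reserved.
  def putBytes(blockId: String, size: Long, bytes: () => Array[Byte]): Boolean = {
    require(!entries.contains(blockId), s"Block $blockId is already present")
    if (acquire(size)) {
      val data = bytes()
      assert(data.length.toLong == size)
      entries.put(blockId, data)
      true
    } else {
      false
    }
  }
}
```

When the reservation fails, the thunk is never invoked, so a block that does not fit costs no allocation at all.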
2.3 reserveUnrollMemoryForThisTask
Reserves the specified amount of memory, in the specified memory mode, for the current task attempt to unroll the given Block.
def reserveUnrollMemoryForThisTask(
    blockId: BlockId,
    memory: Long,
    memoryMode: MemoryMode): Boolean = {
  memoryManager.synchronized {
    val success = memoryManager.acquireUnrollMemory(blockId, memory, memoryMode)
    if (success) {
      val taskAttemptId = currentTaskAttemptId()
      val unrollMemoryMap = memoryMode match {
        case MemoryMode.ON_HEAP => onHeapUnrollMemoryMap
        case MemoryMode.OFF_HEAP => offHeapUnrollMemoryMap
      }
      unrollMemoryMap(taskAttemptId) = unrollMemoryMap.getOrElse(taskAttemptId, 0L) + memory
    }
    success
  }
}
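Its steps are as follows:
1) Call MemoryManager's acquireUnrollMemory method to acquire unroll memory.
2) If the memory was acquired, add memory to the amount recorded for the current task attempt id in onHeapUnrollMemoryMap or offHeapUnrollMemoryMap, depending on the memory mode.
3) Return whether the memory was acquired.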
2.4 releaseUnrollMemoryForThisTask
Releases the unroll memory held by the current task attempt.
def releaseUnrollMemoryForThisTask(memoryMode: MemoryMode, memory: Long = Long.MaxValue): Unit = {
  val taskAttemptId = currentTaskAttemptId()
  memoryManager.synchronized {
    val unrollMemoryMap = memoryMode match {
      case MemoryMode.ON_HEAP => onHeapUnrollMemoryMap
      case MemoryMode.OFF_HEAP => offHeapUnrollMemoryMap
    }
    if (unrollMemoryMap.contains(taskAttemptId)) {
      val memoryToRelease = math.min(memory, unrollMemoryMap(taskAttemptId)) // compute the amount of memory to release
      if (memoryToRelease > 0) { // release the unroll memory
        unrollMemoryMap(taskAttemptId) -= memoryToRelease
        memoryManager.releaseUnrollMemory(memoryToRelease, memoryMode)
      }
      if (unrollMemoryMap(taskAttemptId) == 0) {
        unrollMemoryMap.remove(taskAttemptId) // remove the mapping from taskAttemptId to its unroll memory size
      }
    }
  }
}
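The reserve and release methods together maintain a per-task ledger of unroll memory. The following sketch shows that bookkeeping in isolation. It is a simplification under stated assumptions: UnrollLedgerSketch and its method names are hypothetical, a single counter replaces MemoryManager, and the task attempt id is passed explicitly instead of being read from the running task.

```scala
import scala.collection.mutable

// Hypothetical sketch of per-task unroll bookkeeping; not Spark's actual code.
final class UnrollLedgerSketch(private var freeMemory: Long) {
  // taskAttemptId -> unroll memory currently reserved by that task attempt
  private val unrollMemoryMap = mutable.HashMap.empty[Long, Long]

  def reserve(taskAttemptId: Long, memory: Long): Boolean = synchronized {
    val success = memory <= freeMemory
    if (success) {
      freeMemory -= memory
      unrollMemoryMap(taskAttemptId) = unrollMemoryMap.getOrElse(taskAttemptId, 0L) + memory
    }
    success
  }

  // Releases at most what the task holds; the default releases everything.
  def release(taskAttemptId: Long, memory: Long = Long.MaxValue): Unit = synchronized {
    unrollMemoryMap.get(taskAttemptId).foreach { held =>
      val toRelease = math.min(memory, held)
      freeMemory += toRelease
      if (held == toRelease) {
        unrollMemoryMap.remove(taskAttemptId) // nothing left, drop the mapping
      } else {
        unrollMemoryMap(taskAttemptId) = held - toRelease
      }
    }
  }

  def reservedBy(taskAttemptId: Long): Long =
    synchronized(unrollMemoryMap.getOrElse(taskAttemptId, 0L))

  def free: Long = synchronized(freeMemory)
}
```

Using Long.MaxValue as the default amount mirrors the signature shown above: math.min clamps it to whatever the task actually holds, so "release everything" needs no separate code path.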
2.4 putIteratorAsValues
This method writes the Block corresponding to the BlockId (already converted to an Iterator) into memory. A Block may be very large, so materializing it in memory all at once could cause an OOM error. To avoid this, the Block is first converted to an Iterator, which is then unrolled incrementally while periodically checking whether enough unroll memory is available. The method involves many variables, so their meanings are explained first and the implementation is analyzed afterwards. elementsUnrolled is the number of elements unrolled so far; keepUnrolling indicates whether there is still enough memory to continue unrolling; initialMemoryThreshold is the unroll memory requested up front, taken from unrollMemoryThreshold; memoryCheckPeriod is the number of elements between memory checks (16); memoryThreshold is the amount of unroll memory requested so far; memoryGrowthFactor is the factor used when requesting more memory (1.5); unrollMemoryUsedByThisBlock is the unroll memory actually reserved for this Block; vector tracks the unrolled elements and estimates their size.
private[storage] def putIteratorAsValues[T](
    blockId: BlockId,
    values: Iterator[T],
    classTag: ClassTag[T]): Either[PartiallyUnrolledIterator[T], Long] = {
  require(!contains(blockId), s"Block $blockId is already present in the MemoryStore")
  var elementsUnrolled = 0
  var keepUnrolling = true
  val initialMemoryThreshold = unrollMemoryThreshold
  val memoryCheckPeriod = 16
  var memoryThreshold = initialMemoryThreshold
  val memoryGrowthFactor = 1.5
  var unrollMemoryUsedByThisBlock = 0L
  var vector = new SizeTrackingVector[T]()(classTag)
  keepUnrolling =
    reserveUnrollMemoryForThisTask(blockId, initialMemoryThreshold, MemoryMode.ON_HEAP)
  if (!keepUnrolling) {
    logWarning(s"Failed to reserve initial memory threshold of " +
      s"${Utils.bytesToString(initialMemoryThreshold)} for computing block $blockId in memory.")
  } else {
    unrollMemoryUsedByThisBlock += initialMemoryThreshold
  }
  // Keep reading from the Iterator and append each element to the size-tracking vector
  while (values.hasNext && keepUnrolling) {
    vector += values.next()
    if (elementsUnrolled % memoryCheckPeriod == 0) { // periodic check
      val currentSize = vector.estimateSize()
      if (currentSize >= memoryThreshold) {
        val amountToRequest = (currentSize * memoryGrowthFactor - memoryThreshold).toLong
        keepUnrolling =
          reserveUnrollMemoryForThisTask(blockId, amountToRequest, MemoryMode.ON_HEAP)
        if (keepUnrolling) {
          unrollMemoryUsedByThisBlock += amountToRequest
        }
        memoryThreshold += amountToRequest
      }
    }
    elementsUnrolled += 1
  }
  if (keepUnrolling) { // enough unroll memory was obtained, so write the data to memory
    val arrayValues = vector.toArray
    vector = null
    val entry =
      new DeserializedMemoryEntry[T](arrayValues, SizeEstimator.estimate(arrayValues), classTag)
    val size = entry.size
    def transferUnrollToStorage(amount: Long): Unit = { // convert unroll memory into storage memory
      memoryManager.synchronized {
        releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, amount)
        val success = memoryManager.acquireStorageMemory(blockId, amount, MemoryMode.ON_HEAP)
        assert(success, "transferring unroll memory to storage memory failed")
      }
    }
    val enoughStorageMemory = {
      if (unrollMemoryUsedByThisBlock <= size) {
        val acquiredExtra =
          memoryManager.acquireStorageMemory(
            blockId, size - unrollMemoryUsedByThisBlock, MemoryMode.ON_HEAP)
        if (acquiredExtra) {
          transferUnrollToStorage(unrollMemoryUsedByThisBlock)
        }
        acquiredExtra
      } else { // unrollMemoryUsedByThisBlock > size: return the excess unroll memory
        val excessUnrollMemory = unrollMemoryUsedByThisBlock - size
        releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP, excessUnrollMemory)
        transferUnrollToStorage(size)
        true
      }
    }
    if (enoughStorageMemory) {
      entries.synchronized {
        entries.put(blockId, entry)
      }
      logInfo("Block %s stored as values in memory (estimated size %s, free %s)".format(
        blockId, Utils.bytesToString(size), Utils.bytesToString(maxMemory - blocksMemoryUsed)))
      Right(size)
    } else {
      assert(currentUnrollMemoryForThisTask >= unrollMemoryUsedByThisBlock,
        "released too much unroll memory")
      Left(new PartiallyUnrolledIterator(
        this,
        MemoryMode.ON_HEAP,
        unrollMemoryUsedByThisBlock,
        unrolled = arrayValues.toIterator,
        rest = Iterator.empty))
    }
  } else {
    logUnrollFailureMessage(blockId, vector.estimateSize())
    Left(new PartiallyUnrolledIterator(
      this,
      MemoryMode.ON_HEAP,
      unrollMemoryUsedByThisBlock,
      unrolled = vector.iterator,
      rest = values))
  }
}
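The execution steps of putIteratorAsValues are as follows:
1) Reserve initialMemoryThreshold bytes of unroll memory, then read the Iterator element by element into vector. Every memoryCheckPeriod elements, estimate the size of vector and, if it has reached memoryThreshold, request more unroll memory.
2) If keepUnrolling is still true once the Iterator has been fully unrolled, enough unroll memory was obtained and the data is written to memory:
① Wrap the data in vector as a DeserializedMemoryEntry and re-estimate its size as size.
② If unrollMemoryUsedByThisBlock is less than or equal to size, request the missing size - unrollMemoryUsedByThisBlock bytes of storage memory from MemoryManager. If unrollMemoryUsedByThisBlock is greater than size, too much memory was reserved for unrolling, so the excess, unrollMemoryUsedByThisBlock - size, is returned to MemoryManager. In both cases transferUnrollToStorage then converts the unroll memory held for the Block into storage memory, and this conversion is atomic.
③ If there is enough memory to store the Block, put the mapping from the BlockId to the DeserializedMemoryEntry into entries and return Right(size).
④ If there is not enough memory to store the Block, create a PartiallyUnrolledIterator and return Left.
3) If keepUnrolling is false when unrolling stops, not enough unroll memory could be reserved for the Block, so a PartiallyUnrolledIterator is created and Left is returned.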
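The incremental unrolling loop analyzed above can be sketched on its own. This is a simplified illustration, not Spark's implementation: UnrollSketch and its parameters are hypothetical, every element is assumed to occupy a fixed bytesPerElement (Spark estimates sizes with SizeTrackingVector), the reserve function stands in for reserveUnrollMemoryForThisTask, and the later transfer from unroll memory to storage memory is left out.

```scala
import scala.collection.mutable.ArrayBuffer

object UnrollSketch {
  // Returns Right(all elements) when everything was unrolled, or
  // Left((elements unrolled so far, remaining iterator)) when memory ran out.
  def unroll[T](
      values: Iterator[T],
      bytesPerElement: Long,       // assumed fixed size per element
      initialThreshold: Long,
      reserve: Long => Boolean): Either[(Vector[T], Iterator[T]), Vector[T]] = {
    val checkPeriod = 16           // check memory every 16 elements
    val growthFactor = 1.5         // request 1.5x the current size when growing
    val buffer = ArrayBuffer.empty[T]
    var threshold = initialThreshold
    var keepUnrolling = reserve(initialThreshold)
    var elementsUnrolled = 0L

    while (values.hasNext && keepUnrolling) {
      buffer += values.next()
      if (elementsUnrolled % checkPeriod == 0) {
        val currentSize = buffer.size * bytesPerElement
        if (currentSize >= threshold) {
          val amountToRequest = (currentSize * growthFactor - threshold).toLong
          keepUnrolling = reserve(amountToRequest)
          threshold += amountToRequest
        }
      }
      elementsUnrolled += 1
    }

    if (keepUnrolling) Right(buffer.toVector)
    else Left((buffer.toVector, values))
  }
}
```

Checking only every 16 elements keeps the cost of size estimation low, and requesting 1.5 times the current size means the number of reservation requests grows roughly logarithmically with the size of the Block rather than linearly.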
2.5 getBytes
Reads the Block corresponding to the BlockId (already wrapped as a ChunkedByteBuffer) from memory.
def getBytes(blockId: BlockId): Option[ChunkedByteBuffer] = {
  val entry = entries.synchronized { entries.get(blockId) }
  entry match {
    case null => None
    case e: DeserializedMemoryEntry[_] =>
      throw new IllegalArgumentException("should only call getBytes on serialized blocks")
    case SerializedMemoryEntry(bytes, _, _) => Some(bytes)
  }
}
getBytes can only retrieve serialized Blocks.
2.6 getValues
Reads the Block corresponding to the BlockId (already wrapped as an Iterator) from memory.
def getValues(blockId: BlockId): Option[Iterator[_]] = {
  val entry = entries.synchronized { entries.get(blockId) }
  entry match {
    case null => None
    case e: SerializedMemoryEntry[_] =>
      throw new IllegalArgumentException("should only call getValues on deserialized blocks")
    case DeserializedMemoryEntry(values, _, _) =>
      val x = Some(values)
      x.map(_.iterator)
  }
}
getValues can only retrieve Blocks that are not serialized.
2.7 remove
Removes the Block corresponding to the BlockId from memory.
def remove(blockId: BlockId): Boolean = memoryManager.synchronized {
  val entry = entries.synchronized {
    entries.remove(blockId)
  }
  if (entry != null) {
    entry match {
      case SerializedMemoryEntry(buffer, _, _) => buffer.dispose()
      case _ =>
    }
    memoryManager.releaseStorageMemory(entry.size, entry.memoryMode)
    logDebug(s"Block $blockId of size ${entry.size} dropped " +
      s"from memory (free ${maxMemory - blocksMemoryUsed})")
    true
  } else {
    false
  }
}
2.8 evictBlocksToFreeSpace
Evicts Blocks in order to free space for storing a new Block.
private[spark] def evictBlocksToFreeSpace(
    blockId: Option[BlockId],
    space: Long,
    memoryMode: MemoryMode): Long = {
  assert(space > 0)
  memoryManager.synchronized {
    var freedMemory = 0L
    val rddToAdd = blockId.flatMap(getRddId)
    val selectedBlocks = new ArrayBuffer[BlockId]
    def blockIsEvictable(blockId: BlockId, entry: MemoryEntry[_]): Boolean = {
      entry.memoryMode == memoryMode && (rddToAdd.isEmpty || rddToAdd != getRddId(blockId))
    }
    entries.synchronized {
      val iterator = entries.entrySet().iterator()
      while (freedMemory < space && iterator.hasNext) { // select Blocks that meet the eviction conditions
        val pair = iterator.next()
        val blockId = pair.getKey
        val entry = pair.getValue
        if (blockIsEvictable(blockId, entry)) {
          if (blockInfoManager.lockForWriting(blockId, blocking = false).isDefined) {
            selectedBlocks += blockId
            freedMemory += pair.getValue.size
          }
        }
      }
    }
    def dropBlock[T](blockId: BlockId, entry: MemoryEntry[T]): Unit = {
      val data = entry match {
        case DeserializedMemoryEntry(values, _, _) => Left(values)
        case SerializedMemoryEntry(buffer, _, _) => Right(buffer)
      }
      val newEffectiveStorageLevel =
        blockEvictionHandler.dropFromMemory(blockId, () => data)(entry.classTag)
      if (newEffectiveStorageLevel.isValid) {
        blockInfoManager.unlock(blockId)
      } else {
        blockInfoManager.removeBlock(blockId)
      }
    }
    if (freedMemory >= space) { // eviction can free enough space for the new Block, so evict
      logInfo(s"${selectedBlocks.size} blocks selected for dropping " +
        s"(${Utils.bytesToString(freedMemory)} bytes)")
      for (blockId <- selectedBlocks) {
        val entry = entries.synchronized { entries.get(blockId) }
        if (entry != null) {
          dropBlock(blockId, entry)
        }
      }
      logInfo(s"After dropping ${selectedBlocks.size} blocks, " +
        s"free memory is ${Utils.bytesToString(maxMemory - blocksMemoryUsed)}")
      freedMemory
    } else { // eviction cannot free enough space, so release the write locks of the Blocks selected for eviction
      blockId.foreach { id =>
        logInfo(s"Will not store $id")
      }
      selectedBlocks.foreach { id =>
        blockInfoManager.unlock(id) // release the write lock
      }
      0L
    }
  }
}
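evictBlocksToFreeSpace defines some local variables: freedMemory is the amount of memory that evicting the selected Blocks would free; selectedBlocks holds the BlockIds selected for eviction; rddToAdd is the id of the RDD that the Block to be stored belongs to, obtained through the getRddId method:
private def getRddId(blockId: BlockId): Option[Int] = {
  blockId.asRDDId.map(_.rddId)
}
The code above first calls BlockId's asRDDId method to convert the BlockId to an RDDBlockId (an Option that is empty when the Block does not belong to an RDD) and then reads the rddId attribute of the RDDBlockId.
With the variables understood, let's look at the execution steps of evictBlocksToFreeSpace:
1) While freedMemory is less than space, iterate over entries and select every Block that meets the eviction conditions and whose write lock can be acquired without blocking. A Block meets the eviction conditions when both of the following hold:
① The memory mode of its MemoryEntry matches the requested memory mode.
② The Block to be stored is not an RDD Block, or the candidate Block does not belong to the same RDD as the Block to be stored.
2) If freedMemory is greater than or equal to space, drop each selected Block through the dropFromMemory method of blockEvictionHandler and return freedMemory.
3) Otherwise, release the write locks of the selected Blocks and return 0.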
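The selection phase depends on entries being a LinkedHashMap created with accessOrder = true (the third constructor argument shown earlier), so iteration runs from the least recently used entry to the most recently used one. The sketch below shows only that selection logic. It is an illustration under stated assumptions: EvictionSketch, Entry, and selectVictims are hypothetical names, String stands in for BlockId, and memory modes and write locks are left out.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.collection.mutable.ArrayBuffer

object EvictionSketch {
  // Hypothetical entry: only the size and the owning RDD (if any) matter here.
  final case class Entry(size: Long, rddId: Option[Int])

  // Walks the map from least to most recently used and picks victims until
  // `space` bytes could be freed. Blocks of the RDD being stored are skipped.
  def selectVictims(
      entries: JLinkedHashMap[String, Entry],
      space: Long,
      rddToAdd: Option[Int]): List[String] = {
    val selected = ArrayBuffer.empty[String]
    var freed = 0L
    val it = entries.entrySet().iterator()
    while (freed < space && it.hasNext) {
      val pair = it.next()
      val evictable = rddToAdd.isEmpty || rddToAdd != pair.getValue.rddId
      if (evictable) {
        selected += pair.getKey
        freed += pair.getValue.size
      }
    }
    // All or nothing: if not enough could be freed, evict nothing.
    if (freed >= space) selected.toList else Nil
  }
}
```

Skipping Blocks of the same RDD avoids evicting one partition of an RDD only to make room for another partition of that same RDD.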
2.9 contains
Determines whether the local MemoryStore contains the Block corresponding to the given BlockId.
def contains(blockId: BlockId): Boolean = {
  entries.synchronized { entries.containsKey(blockId) }
}