Spark is a popular in-memory distributed computing framework, and since it is memory-based, memory management naturally sits at the heart of Spark's storage management. So what memory management model does Spark actually use? This article lifts the veil on Spark's memory management model.
As mentioned in 《Spark源码分析之七：Task运行（一）》 (Spark Source Code Analysis, Part 7: Task Execution (1)), when a Task is shipped to an Executor for execution, inside the run() method of the TaskRunner thread allocated for it, and before the Task actually runs, we must construct a task memory manager, TaskMemoryManager; then, after the Task object is deserialized from its binary data, this TaskMemoryManager is set as a member of the Task. So how exactly is the TaskMemoryManager created? Let's first look at the relevant code in TaskRunner's run() method:
// Create the task memory manager
val taskMemoryManager = new TaskMemoryManager(env.memoryManager, taskId)
taskId is straightforward: it is the Task's unique identifier. What about env.memoryManager? Let's look at the relevant code in SparkEnv, shown below:
// The parameter spark.memory.useLegacyMode decides which memory management model to use
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
if (useLegacyMemoryManager) {// If legacy mode is enabled, use StaticMemoryManager, the static memory management model
new StaticMemoryManager(conf, numUsableCores)
} else {// Otherwise, use the newer UnifiedMemoryManager, the unified memory management model
UnifiedMemoryManager(conf, numUsableCores)
}
While SparkEnv is being constructed, the parameter spark.memory.useLegacyMode determines whether to use the legacy memory management model; by default it does not. If legacy mode is enabled, memoryManager is instantiated as a StaticMemoryManager object; otherwise the new model is used and memoryManager is instantiated as a UnifiedMemoryManager object.
The word static means fixed and unchanging, while unified suggests sharing and flexibility. From the names alone, StaticMemoryManager is a static memory manager: once memory is partitioned by some formula, the overall layout does not change. UnifiedMemoryManager, by contrast, hints at shared, adjustable memory. So we can reasonably guess that StaticMemoryManager fixes the size of each memory region at allocation time and keeps it constant while Tasks run, whereas UnifiedMemoryManager dynamically adjusts each region according to its memory demand during Task execution. Is that really the case? Only the source code can tell. First, let's look at StaticMemoryManager. From its initialization in SparkEnv, we know the constructor taking a SparkConf conf and an Int numCores is invoked; the code is as follows:
def this(conf: SparkConf, numCores: Int) {
// Delegate to the primary constructor
this(
conf,
// Total usable memory allocated to the execution region (used by shuffle)
StaticMemoryManager.getMaxExecutionMemory(conf),
// Total usable memory allocated to the storage region
StaticMemoryManager.getMaxStorageMemory(conf),
numCores)
}
Before delegating to the primary constructor, two methods on the companion object StaticMemoryManager are called: getMaxExecutionMemory(), which returns the total usable memory allocated to the execution region (used by shuffle), and getMaxStorageMemory(), which returns the total usable memory allocated to the storage region.
What are the storage and execution regions? Briefly: storage means exactly that, a region that stores data such as Task results, provided those results are small enough to fit in memory. As for execution, judging by the shuffle references in the parameters, I would guess it holds the in-memory data needed during shuffle (shuffle has not been analyzed yet, so this is only a guess; corrections from readers are welcome).
Let's look at these two methods; the code is as follows:
/**
* Return the total amount of memory available for the storage region, in bytes.
*/
private def getMaxStorageMemory(conf: SparkConf): Long = {
// Maximum available system memory: the parameter spark.testing.memory, falling back to the runtime's maximum memory
val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
// Fraction of total memory given to the storage region, from the parameter spark.storage.memoryFraction, default 0.6
val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
// Safety factor applied to the storage region's maximum allocatable memory, mainly to guard against OOM; from the parameter spark.storage.safetyFraction, default 0.9
val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
// Total usable storage memory = maximum available system memory * its fraction of that memory * safety factor
(systemMaxMemory * memoryFraction * safetyFraction).toLong
}
/**
* Return the total amount of memory available for the execution region, in bytes.
*/
private def getMaxExecutionMemory(conf: SparkConf): Long = {
// Maximum available system memory: the parameter spark.testing.memory, falling back to the runtime's maximum memory
val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
// Fraction of total memory given to the execution region (used by shuffle), from the parameter spark.shuffle.memoryFraction, default 0.2
val memoryFraction = conf.getDouble("spark.shuffle.memoryFraction", 0.2)
// Safety factor for the execution region's maximum allocatable memory, guarding against OOM; from the parameter spark.shuffle.safetyFraction, default 0.8
val safetyFraction = conf.getDouble("spark.shuffle.safetyFraction", 0.8)
// Total usable execution memory = maximum available system memory * its fraction of that memory * safety factor
(systemMaxMemory * memoryFraction * safetyFraction).toLong
}
As the code shows, the two methods are nearly identical, so let's walk through getMaxStorageMemory() in detail.
First, obtain the maximum available system memory systemMaxMemory: the parameter spark.testing.memory, falling back to the runtime's maximum memory;
Next, obtain the storage region's share of total memory, memoryFraction, from the parameter spark.storage.memoryFraction, default 0.6;
Then, obtain the storage region's safety factor safetyFraction, which guards against OOM, from the parameter spark.storage.safetyFraction, default 0.9;
Finally, compute the storage region's total usable memory with the formula systemMaxMemory * memoryFraction * safetyFraction.
The first steps are straightforward: by default the storage region is allotted 60% of the available system memory. So why the additional safety factor safetyFraction? Consider it from the system's point of view: if it grants you 60% of its maximum memory up front and you use it all immediately, what happens on the next memory request? An OOM becomes very likely. The safety factor exists precisely for this reason: by default the storage region effectively starts with 54% of the available system memory, not 60%.
getMaxExecutionMemory() follows the same logic as getMaxStorageMemory() with different parameters: by default it takes 20% of the maximum available system memory with a safety factor of 0.8, so the execution region (used by shuffle) effectively starts with 16% of the available system memory.
Wait, something seems to be missing: 60% + 20% = 80%, not 100%! Why? Simple: the program and the system itself also need memory to run.
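The arithmetic above can be condensed into a small sketch. This is a Python illustration of the formula, not Spark code; the heap size is hypothetical:

```python
def static_region_sizes(system_max_memory,
                        storage_fraction=0.6, storage_safety=0.9,
                        shuffle_fraction=0.2, shuffle_safety=0.8):
    # Each region gets: system max * its fraction * its safety factor,
    # mirroring getMaxStorageMemory() / getMaxExecutionMemory()
    storage = int(system_max_memory * storage_fraction * storage_safety)
    execution = int(system_max_memory * shuffle_fraction * shuffle_safety)
    return storage, execution

# With a hypothetical 1000 MB heap: storage starts at ~54%, execution at ~16%,
# and the remaining ~30% is left for the system and Spark internals.
storage, execution = static_region_sizes(1000 * 1024 * 1024)
```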
The primary constructor of StaticMemoryManager, i.e. the class definition in Scala syntax, is:
private[spark] class StaticMemoryManager(
conf: SparkConf,
maxOnHeapExecutionMemory: Long,
override val maxStorageMemory: Long,
numCores: Int)
extends MemoryManager(
conf,
numCores,
maxStorageMemory,
maxOnHeapExecutionMemory) {
As we can see, the static memory manager StaticMemoryManager holds the SparkConf conf, the execution memory size maxOnHeapExecutionMemory, the storage memory size maxStorageMemory, and the CPU core count numCores as members.
With that, the StaticMemoryManager object is fully initialized. Let's summarize the characteristics of the static model. Its biggest drawback is that neither region can exceed the maximum configured for it: even if one region is under heavy memory pressure while the other sits idle, it cannot go past its cap to claim more, even when total usage stays below the overall threshold. The solution that follows from this is UnifiedMemoryManager, the unified memory management model.
Next, let's look at UnifiedMemoryManager, the unified memory manager. In SparkEnv it is initialized as follows:
UnifiedMemoryManager(conf, numUsableCores)
A reader might wonder: why is there no new keyword? This is a Scala idiom: initialization actually happens through the apply() method of the UnifiedMemoryManager companion object. The code is as follows:
def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
// Obtain the maximum memory shared between the execution and storage regions
val maxMemory = getMaxMemory(conf)
// Construct the UnifiedMemoryManager object
new UnifiedMemoryManager(
conf,
maxMemory = maxMemory,
// The storage region initially gets spark.memory.storageFraction (default 0.5, i.e. half) of the shared maximum memory
storageRegionSize =
(maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
numCores = numCores)
}
First, obtain the maximum memory shared between the execution and storage regions, maxMemory;
Then, construct the UnifiedMemoryManager object, with the storage region size storageRegionSize initialized to spark.memory.storageFraction (default 0.5, i.e. half) of the shared maximum memory maxMemory.
Now let's focus on getMaxMemory(), the method that computes the maximum memory shared between execution and storage. The code is as follows:
/**
* Return the total amount of memory shared between execution and storage, in bytes.
*/
private def getMaxMemory(conf: SparkConf): Long = {
// Maximum available system memory systemMemory: the parameter spark.testing.memory, falling back to the runtime's maximum memory
val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
// Reserved memory reservedMemory: the parameter spark.testing.reservedMemory;
// if unset, it defaults to 0 when spark.testing is present, otherwise to RESERVED_SYSTEM_MEMORY_BYTES (300 MB)
val reservedMemory = conf.getLong("spark.testing.reservedMemory",
if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
// Minimum system memory minSystemMemory: 1.5 times the reserved memory reservedMemory
val minSystemMemory = reservedMemory * 1.5
// If the maximum available system memory systemMemory is below minSystemMemory, i.e. 1.5 times
// the reserved memory, throw an exception telling the user to use a larger JVM heap
if (systemMemory < minSystemMemory) {
throw new IllegalArgumentException(s"System memory $systemMemory must " +
s"be at least $minSystemMemory. Please use a larger heap size.")
}
// Usable memory usableMemory: maximum available system memory systemMemory minus reserved memory reservedMemory
val usableMemory = systemMemory - reservedMemory
// Fraction of usable memory to take, from the parameter spark.memory.fraction, default 0.75
val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)
// The returned execution/storage shared maximum is usableMemory * memoryFraction
(usableMemory * memoryFraction).toLong
}
The overall flow is roughly:
1. Obtain the maximum available system memory systemMemory: the parameter spark.testing.memory, falling back to the runtime's maximum memory;
2. Obtain the reserved memory reservedMemory from the parameter spark.testing.reservedMemory; if unset, it defaults to 0 when spark.testing is present, otherwise to 300 MB;
3. Compute the minimum system memory minSystemMemory as 1.5 times reservedMemory;
4. If systemMemory is below minSystemMemory, i.e. 1.5 times the reserved memory, throw an exception telling the user to use a larger JVM heap;
5. Compute the usable memory usableMemory as systemMemory minus reservedMemory;
6. Obtain the usable-memory fraction from the parameter spark.memory.fraction, default 0.75;
7. Return usableMemory * memoryFraction as the maximum memory shared between execution and storage.
In other words, under the UnifiedMemoryManager strategy, the storage and execution regions each start with half of their shared region by default, and that shared maximum is 75% of the maximum available system memory minus the reserved memory. Where the dynamic adjustment actually happens will only show up when memory is requested.
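Steps 2-7 above can be sketched as follows. This is a Python illustration of the formula, not Spark code, and the 1 GB heap is a hypothetical example:

```python
RESERVED_SYSTEM_MEMORY_BYTES = 300 * 1024 * 1024  # reserved memory, 300 MB

def unified_max_memory(system_memory, memory_fraction=0.75):
    # Reserve 300 MB, fail fast on undersized heaps, then share
    # (system - reserved) * spark.memory.fraction between storage and execution
    min_system_memory = RESERVED_SYSTEM_MEMORY_BYTES * 1.5
    if system_memory < min_system_memory:
        raise ValueError("System memory %d must be at least %d. "
                         "Please use a larger heap size."
                         % (system_memory, min_system_memory))
    usable_memory = system_memory - RESERVED_SYSTEM_MEMORY_BYTES
    return int(usable_memory * memory_fraction)

max_memory = unified_max_memory(1024 * 1024 * 1024)  # a hypothetical 1 GB heap
storage_region_size = int(max_memory * 0.5)          # spark.memory.storageFraction default
```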
Good, the initialization of the unified memory manager UnifiedMemoryManager is covered, too. The next question is: when and how is memory requested and allocated? Let's look at storage and execution one at a time.
First, storage. As the name implies, this is the region where a Task's results are stored after it finishes. Recall from the post 《Spark源码分析之七：Task运行（一）》 how Task results are handled after a Task completes: if the result exceeds the maximum size Akka can carry minus the bytes that must be reserved, it is written to the BlockManager. How is it written? The code is:
env.blockManager.putBytes(
blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
It calls BlockManager's putBytes() method; clearly, binary Bytes data is written, with the storage level MEMORY_AND_DISK_SER. Let's look at this method first:
/**
* Put a new block of serialized bytes to the block manager.
* Return a list of blocks updated as a result of this put.
*/
def putBytes(
blockId: BlockId,
bytes: ByteBuffer,
level: StorageLevel,
tellMaster: Boolean = true,
effectiveStorageLevel: Option[StorageLevel] = None): Seq[(BlockId, BlockStatus)] = {
require(bytes != null, "Bytes is null")
doPut(blockId, ByteBufferValues(bytes), level, tellMaster, effectiveStorageLevel)
}
It calls the doPut() method, passing data of type ByteBufferValues. Inside doPut(), the key code is:
// Actually put the values
val result = data match {
case IteratorValues(iterator) =>
blockStore.putIterator(blockId, iterator, putLevel, returnValues)
case ArrayValues(array) =>
blockStore.putArray(blockId, array, putLevel, returnValues)
case ByteBufferValues(bytes) =>
bytes.rewind()
blockStore.putBytes(blockId, bytes, putLevel)
}
As mentioned above, the data passed in is of type ByteBufferValues, so the call here goes to BlockStore's putBytes() method. BlockStore is an abstract class with three implementations: DiskStore (disk), ExternalBlockStore (external blocks), and MemoryStore (memory). Since we are discussing the memory management model, we naturally look at its memory implementation, MemoryStore. In its putBytes(), regardless of whether level.deserialized is true or false, the call eventually reaches tryToPut(), where memory is handled as follows:
val enoughMemory = memoryManager.acquireStorageMemory(blockId, size, droppedBlocks)
if (enoughMemory) {
// We acquired enough memory for the block, so go ahead and put it
val entry = new MemoryEntry(value(), size, deserialized)
entries.synchronized {
entries.put(blockId, entry)
}
val valuesOrBytes = if (deserialized) "values" else "bytes"
logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(
blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(blocksMemoryUsed)))
} else {
// Tell the block manager that we couldn't put it in memory so that it can drop it to
// disk if the block allows disk storage.
lazy val data = if (deserialized) {
Left(value().asInstanceOf[Array[Any]])
} else {
Right(value().asInstanceOf[ByteBuffer].duplicate())
}
val droppedBlockStatus = blockManager.dropFromMemory(blockId, () => data)
droppedBlockStatus.foreach { status => droppedBlocks += ((blockId, status)) }
}
From the above we learn that memoryManager's acquireStorageMemory() method is what checks whether enough memory exists. Let's first look at StaticMemoryManager's acquireStorageMemory(), defined as follows:
override def acquireStorageMemory(
blockId: BlockId,
numBytes: Long,
evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {
if (numBytes > maxStorageMemory) {// If the requested size numBytes exceeds the storage region's cap, return false directly: not enough memory
// Fail fast if the block simply won't fit
logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
s"memory limit ($maxStorageMemory bytes)")
false
} else {// Otherwise, request the memory via storageMemoryPool's acquireMemory() method
storageMemoryPool.acquireMemory(blockId, numBytes, evictedBlocks)
}
}
That's all there is to it. If the requested memory exceeds the storage region's maximum, there is not enough memory to store the block; otherwise storageMemoryPool's acquireMemory() method allocates the memory. This is exactly where the word static shows itself. As for storageMemoryPool, which does the actual allocation, we will cover it at the end together with the execution region's onHeapExecutionMemoryPool and offHeapExecutionMemoryPool. For now, the concept is enough: it is the memory pool for a given region, a dedicated bookkeeping object tracking the region's total, free, and used memory.
Now let's look at UnifiedMemoryManager; its acquireStorageMemory() method is as follows:
override def acquireStorageMemory(
blockId: BlockId,
numBytes: Long,
evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {
assert(onHeapExecutionMemoryPool.poolSize + storageMemoryPool.poolSize == maxMemory)
assert(numBytes >= 0)
// If the requested size exceeds maxStorageMemory, i.e. the shared execution/storage maximum minus execution's used memory, fail fast;
// execution and storage are considered together here
if (numBytes > maxStorageMemory) {
// Fail fast if the block simply won't fit
logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
s"memory limit ($maxStorageMemory bytes)")
return false
}
// If the requested size exceeds the storage pool's free memory memoryFree
if (numBytes > storageMemoryPool.memoryFree) {
// There is not enough free memory in the storage pool, so try to borrow free memory from
// the execution pool.
// The amount borrowed from execution: the smaller of the requested size and execution's free memory memoryFree
val memoryBorrowedFromExecution = Math.min(onHeapExecutionMemoryPool.memoryFree, numBytes)
// Shrink the execution region accordingly
onHeapExecutionMemoryPool.decrementPoolSize(memoryBorrowedFromExecution)
// Grow the storage region accordingly
storageMemoryPool.incrementPoolSize(memoryBorrowedFromExecution)
}
// Complete the allocation through storageMemoryPool
storageMemoryPool.acquireMemory(blockId, numBytes, evictedBlocks)
}
First we need to understand maxStorageMemory. Unlike in StaticMemoryManager, where it is a fixed size pre-allocated by fraction and safety factor, here it is defined as follows:
// maxStorageMemory is the shared execution/storage maximum minus execution's used memory
override def maxStorageMemory: Long = synchronized {
maxMemory - onHeapExecutionMemoryPool.memoryUsed
}
So maxStorageMemory is the maximum memory shared between execution and storage minus the memory execution has already used. Good, let's continue the analysis!
First, if the requested size exceeds maxStorageMemory, i.e. the shared execution/storage maximum minus execution's used memory, return false immediately, meaning memory is insufficient; execution and storage are considered together here;
Then, if the requested size exceeds the storage pool's free memory memoryFree, compute the amount that can be borrowed from execution, namely the smaller of the requested size and execution's free memory memoryFree; the execution region is then shrunk and the storage region grown by that amount, completing the dynamic adjustment;
Finally, storageMemoryPool completes the allocation.
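The borrowing logic above can be sketched as follows. This is Python rather than Scala, with plain dicts ('size', 'used') standing in for the real MemoryPool objects, and block eviction elided:

```python
def acquire_storage_memory(num_bytes, storage_pool, execution_pool):
    # maxStorageMemory is recomputed from execution's *current* usage,
    # which is what makes the unified model dynamic
    max_storage = storage_pool['size'] + execution_pool['size'] - execution_pool['used']
    if num_bytes > max_storage:
        return False  # fail fast: the block cannot fit even after borrowing
    if num_bytes > storage_pool['size'] - storage_pool['used']:
        # borrow the smaller of execution's free memory and the full request
        borrowed = min(execution_pool['size'] - execution_pool['used'], num_bytes)
        execution_pool['size'] -= borrowed  # decrementPoolSize on execution
        storage_pool['size'] += borrowed    # incrementPoolSize on storage
    if num_bytes <= storage_pool['size'] - storage_pool['used']:
        storage_pool['used'] += num_bytes
        return True
    return False  # the real storageMemoryPool would evict cached blocks here

storage = {'size': 100, 'used': 90}
execution = {'size': 100, 'used': 20}
ok = acquire_storage_memory(50, storage, execution)  # borrows from execution
```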
At this point we have covered when and how storage-region memory is requested and allocated in both StaticMemoryManager and UnifiedMemoryManager.
Next, the execution region. When is its memory request triggered? We mentioned earlier that it is used by shuffle; readers need not dig into the details of shuffle here, as they will be covered in a dedicated shuffle module. Tracing the code, we find it happens roughly in ShuffleExternalSorter's insertRecord() method. ShuffleExternalSorter is a specialized external sorter that appends incoming records to data pages; once all records are inserted, or the current thread's shuffle memory limit is reached, the in-memory records are sorted by their partition IDs.
Let's look at its insertRecord() method, which writes a record into a data page. The code is as follows:
/**
* Write a record to the shuffle sorter.
*/
public void insertRecord(Object recordBase, long recordOffset, int length, int partitionId)
throws IOException {
// for tests
assert(inMemSorter != null);
if (inMemSorter.numRecords() > numElementsForSpillThreshold) {
spill();
}
growPointerArrayIfNecessary();
// Need 4 bytes to store the record length.
final int required = length + 4;
acquireNewPageIfNecessary(required);
assert(currentPage != null);
final Object base = currentPage.getBaseObject();
final long recordAddress = taskMemoryManager.encodePageNumberAndOffset(currentPage, pageCursor);
Platform.putInt(base, pageCursor, length);
pageCursor += 4;
Platform.copyMemory(recordBase, recordOffset, base, pageCursor, length);
pageCursor += length;
inMemSorter.insertRecord(recordAddress, partitionId);
}
As we can see, the method calls acquireNewPageIfNecessary(), whose job is to request more memory, if needed, in order to insert one more record. Its implementation is:
private void acquireNewPageIfNecessary(int required) {
if (currentPage == null ||
pageCursor + required > currentPage.getBaseOffset() + currentPage.size() ) {
// TODO: try to find space in previous pages
currentPage = allocatePage(required);
pageCursor = currentPage.getBaseOffset();
allocatedPages.add(currentPage);
}
}
It calls the allocatePage() method. Tracing further, in its parent class MemoryConsumer, the code is:
/**
* Allocate a memory block with at least `required` bytes.
*
* Throws IOException if there is not enough memory.
*
* @throws OutOfMemoryError
*/
protected MemoryBlock allocatePage(long required) {
MemoryBlock page = taskMemoryManager.allocatePage(Math.max(pageSize, required), this);
if (page == null || page.size() < required) {
long got = 0;
if (page != null) {
got = page.size();
taskMemoryManager.freePage(page, this);
}
taskMemoryManager.showMemoryUsage();
throw new OutOfMemoryError("Unable to acquire " + required + " bytes of memory, got " + got);
}
used += page.size();
return page;
}
This in turn calls TaskMemoryManager's allocatePage(). Looking at TaskMemoryManager's allocatePage(), we find it calls acquireExecutionMemory(), which itself calls the MemoryManager method of the same name. So execution-region memory allocation ultimately lands in MemoryManager's acquireExecutionMemory() method.
Following the storage-region analysis above, we again cover the Static and Unified variants. First Static; its acquireExecutionMemory() method is implemented as:
private[memory]
override def acquireExecutionMemory(
numBytes: Long,
taskAttemptId: Long,
memoryMode: MemoryMode): Long = synchronized {
// Decide how to allocate memory based on the MemoryMode
memoryMode match {
// If on-heap (ON_HEAP), allocate execution-region memory for the Task through onHeapExecutionMemoryPool's acquireMemory()
case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)
// If off-heap (OFF_HEAP), allocate execution-region memory for the Task through offHeapExecutionMemoryPool's acquireMemory()
case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)
}
}
}
The logic of this method is simple: the kind of MemoryMode decides how execution-region memory is allocated. If on-heap (ON_HEAP), onHeapExecutionMemoryPool's acquireMemory() allocates execution memory for the Task; if off-heap (OFF_HEAP), offHeapExecutionMemoryPool's acquireMemory() does.
Now let's look at UnifiedMemoryManager's acquireExecutionMemory() method; the code is as follows:
/**
* Try to acquire up to `numBytes` of execution memory for the current task and return the
* number of bytes obtained, or 0 if none can be allocated.
*
* This call may block until there is enough free memory in some situations, to make sure each
* task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
* active tasks) before it is forced to spill. This can happen if the number of tasks increase
* but an older task had a lot of memory already.
*/
override private[memory] def acquireExecutionMemory(
numBytes: Long,
taskAttemptId: Long,
memoryMode: MemoryMode): Long = synchronized {
// Ensure the sizes of onHeapExecutionMemoryPool and storageMemoryPool sum to maxMemory, the size of their shared region
assert(onHeapExecutionMemoryPool.poolSize + storageMemoryPool.poolSize == maxMemory)
assert(numBytes >= 0)
memoryMode match {
// If on-heap (ON_HEAP), allocate execution-region memory for the Task through onHeapExecutionMemoryPool's acquireMemory()
case MemoryMode.ON_HEAP =>
/**
* Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
*
* When acquiring memory for a task, the execution pool may need to make multiple
* attempts. Each attempt must be able to evict storage in case another task jumps in
* and caches a large block between the attempts. This is called once per attempt.
*/
def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
// If extra memory is needed, i.e. execution's pre-allocated memory is no longer enough
if (extraMemoryNeeded > 0) {
// There is not enough free memory in the execution pool, so try to reclaim memory from
// storage. We can reclaim any free memory from the storage pool. If the storage pool
// has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
// the memory that storage has borrowed from execution.
// memoryReclaimableFromStorage is the larger of the storage pool's free memory and the
// storage pool's size minus storageRegionSize, its initial size. In other words, all of the
// storage pool's free memory can be lent to execution, and if the pool has grown past its
// initial size by more than its current free memory, blocks are evicted to reclaim what
// storage borrowed from execution.
val memoryReclaimableFromStorage =
math.max(storageMemoryPool.memoryFree, storageMemoryPool.poolSize - storageRegionSize)
if (memoryReclaimableFromStorage > 0) {
// Only reclaim as much space as is necessary and available:
// storageMemoryPool reclaims and releases spaceReclaimed bytes via shrinkPoolToFreeSpace()
val spaceReclaimed = storageMemoryPool.shrinkPoolToFreeSpace(
math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
// onHeapExecutionMemoryPool grows by the corresponding spaceReclaimed
onHeapExecutionMemoryPool.incrementPoolSize(spaceReclaimed)
}
}
}
/**
* The size the execution pool would have after evicting storage memory.
*
* The execution memory pool divides this quantity among the active tasks evenly to cap
* the execution memory allocation for each task. It is important to keep this greater
* than the execution pool size, which doesn't take into account potential memory that
* could be freed by evicting storage. Otherwise we may hit SPARK-12155.
*
* Additionally, this quantity should be kept below `maxMemory` to arbitrate fairness
* in execution memory allocation across tasks, Otherwise, a task may occupy more than
* its fair share of execution memory, mistakenly thinking that other tasks can acquire
* the portion of storage memory that cannot be evicted.
*/
// Compute the maximum size of the execution pool
def computeMaxExecutionPoolSize(): Long = {
// The shared storage/execution maximum minus the smaller of storage's used memory and storage's initial region size
maxMemory - math.min(storageMemoryUsed, storageRegionSize)
}
onHeapExecutionMemoryPool.acquireMemory(
numBytes, taskAttemptId, maybeGrowExecutionPool, computeMaxExecutionPoolSize)
// If off-heap (OFF_HEAP), allocate execution-region memory for the Task through offHeapExecutionMemoryPool's acquireMemory()
case MemoryMode.OFF_HEAP =>
// For now, we only support on-heap caching of data, so we do not need to interact with
// the storage pool when allocating off-heap memory. This will change in the future, though.
offHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)
}
}
UnifiedMemoryManager likewise distinguishes ON_HEAP from OFF_HEAP when allocating execution-region memory. The OFF_HEAP path matches StaticMemoryManager: offHeapExecutionMemoryPool's acquireMemory() allocates execution memory for the Task. ON_HEAP is slightly more involved. It also goes through onHeapExecutionMemoryPool's acquireMemory(), just like StaticMemoryManager, but with a special case: if the execution pool does not have enough free memory, i.e. extra memory is needed, it tries to reclaim some from the storage region. At that point, all of the storage pool's free memory can be reclaimed, and if the storage pool has grown beyond storageRegionSize, the storage region's initial maximum, blocks can be evicted to reclaim the memory storage borrowed from execution.
Let's look at the logic of maybeGrowExecutionPool(): if extra memory is needed, i.e. execution's pre-allocated memory is no longer enough, first take memoryReclaimableFromStorage, the larger of the storage pool's free memory and the storage pool's size minus its initial size. In other words, all of the storage pool's free memory is lent to execution, and if the pool has grown past its initial size by more than its current free memory, some additional memory is reclaimed by evicting blocks. Then storageMemoryPool's shrinkPoolToFreeSpace() reclaims and releases spaceReclaimed bytes, and onHeapExecutionMemoryPool grows by the same spaceReclaimed, achieving a dynamic, see-saw adjustment.
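The reclaim arithmetic can be sketched as below. Again a Python illustration with hypothetical pool dicts; eviction is elided, so the sketch assumes the reclaimed space is actually free:

```python
def maybe_grow_execution_pool(extra_needed, storage_pool, execution_pool,
                              storage_region_size):
    # memoryReclaimableFromStorage = max(storage free, storage size - initial size)
    if extra_needed <= 0:
        return
    storage_free = storage_pool['size'] - storage_pool['used']
    reclaimable = max(storage_free, storage_pool['size'] - storage_region_size)
    if reclaimable > 0:
        reclaimed = min(extra_needed, reclaimable)  # only as much as necessary
        storage_pool['size'] -= reclaimed    # shrinkPoolToFreeSpace (eviction omitted)
        execution_pool['size'] += reclaimed  # incrementPoolSize on execution

storage = {'size': 150, 'used': 60}   # grew 50 past an initial region of 100
execution = {'size': 50, 'used': 50}
maybe_grow_execution_pool(80, storage, execution, storage_region_size=100)
```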
Finally, let's look at storageMemoryPool, onHeapExecutionMemoryPool, and offHeapExecutionMemoryPool, which perform the actual allocation. They are defined in MemoryManager as follows:
@GuardedBy("this")
protected val storageMemoryPool = new StorageMemoryPool(this)
@GuardedBy("this")
protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")
@GuardedBy("this")
protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")
Their types are StorageMemoryPool and ExecutionMemoryPool; the latter two are simply two differently named instances serving memory, named on-heap execution and off-heap execution respectively. Both classes extend the abstract class MemoryPool, so let's look at MemoryPool first:
/**
* Manages bookkeeping for an adjustable-sized region of memory. This class is internal to
* the [[MemoryManager]]. See subclasses for more details.
*
* @param lock a [[MemoryManager]] instance, used for synchronization. We purposely erase the type
* to `Object` to avoid programming errors, since this object should only be used for
* synchronization purposes.
*/
private[memory] abstract class MemoryPool(lock: Object) {
@GuardedBy("lock")
// Size of the memory pool
private[this] var _poolSize: Long = 0
/**
* Returns the current size of the pool, in bytes.
* Synchronized on the lock object to handle concurrency; the lock is the
* StaticMemoryManager or UnifiedMemoryManager instance itself.
*/
final def poolSize: Long = lock.synchronized {
_poolSize
}
/**
* Returns the amount of free memory in the pool, in bytes.
* Free memory = pool size minus used memory; likewise synchronized on lock to handle concurrency.
*/
final def memoryFree: Long = lock.synchronized {
_poolSize - memoryUsed
}
/**
* Expands the pool by `delta` bytes.
* Grows the pool by delta bytes, i.e. _poolSize += delta; delta must be >= 0.
*/
final def incrementPoolSize(delta: Long): Unit = lock.synchronized {
require(delta >= 0)
_poolSize += delta
}
/**
* Shrinks the pool by `delta` bytes.
* Shrinks the pool by delta bytes, i.e. _poolSize -= delta; delta must be >= 0, at most the current pool size, and at most the pool's current free memory.
*/
final def decrementPoolSize(delta: Long): Unit = lock.synchronized {
require(delta >= 0)
require(delta <= _poolSize)
require(_poolSize - delta >= memoryUsed)
_poolSize -= delta
}
/**
* Returns the amount of used memory in this pool (in bytes).
* Returns the pool's current used memory; implemented by subclasses.
*/
def memoryUsed: Long
}
Very simple: the comments in the code say it all, and readers can fill in the rest themselves.
Next, let's look at the implementation of StorageMemoryPool; the code is as follows:
/**
* Performs bookkeeping for managing an adjustable-size pool of memory that is used for storage
* (caching).
*
* @param lock a [[MemoryManager]] instance to synchronize on
*/
private[memory] class StorageMemoryPool(lock: Object) extends MemoryPool(lock) with Logging {
@GuardedBy("lock")
// Used memory size
private[this] var _memoryUsed: Long = 0L
// Returns the used memory memoryUsed, synchronized on the lock object to handle concurrency,
// where lock is the StaticMemoryManager or UnifiedMemoryManager instance
override def memoryUsed: Long = lock.synchronized {
_memoryUsed
}
// The backing MemoryStore
private var _memoryStore: MemoryStore = _
def memoryStore: MemoryStore = {
if (_memoryStore == null) {
throw new IllegalStateException("memory store not initialized yet")
}
_memoryStore
}
/**
* Set the [[MemoryStore]] used by this manager to evict cached blocks.
* This must be set after construction due to initialization ordering constraints.
*/
final def setMemoryStore(store: MemoryStore): Unit = {
_memoryStore = store
}
/**
* Acquire N bytes of memory to cache the given block, evicting existing ones if necessary.
* Blocks evicted in the process, if any, are added to `evictedBlocks`.
* @return whether all N bytes were successfully granted.
*/
def acquireMemory(
blockId: BlockId,
numBytes: Long,
evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean =
// synchronized on lock to handle concurrency
lock.synchronized {
// Bytes to free: the requested size minus the pool's current free memory
val numBytesToFree = math.max(0, numBytes - memoryFree)
// Delegate to the overloaded acquireMemory() method
acquireMemory(blockId, numBytes, numBytesToFree, evictedBlocks)
}
/**
* Acquire N bytes of storage memory for the given block, evicting existing ones if necessary.
*
* @param blockId the ID of the block we are acquiring storage memory for
* @param numBytesToAcquire the size of this block
* @param numBytesToFree the amount of space to be freed through evicting blocks
* @return whether all N bytes were successfully granted.
*/
def acquireMemory(
blockId: BlockId,
numBytesToAcquire: Long,
numBytesToFree: Long,
evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = lock.synchronized {
// The requested memory must be >= 0
assert(numBytesToAcquire >= 0)
// The memory to free must be >= 0
assert(numBytesToFree >= 0)
// Used memory must not exceed the pool size
assert(memoryUsed <= poolSize)
if (numBytesToFree > 0) {
// Call MemoryStore's evictBlocksToFreeSpace() method to free numBytesToFree bytes
memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, evictedBlocks)
// Register evicted blocks, if any, with the active task metrics
Option(TaskContext.get()).foreach { tc =>
val metrics = tc.taskMetrics()
val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
metrics.updatedBlocks = Some(lastUpdatedBlocks ++ evictedBlocks.toSeq)
}
}
// NOTE: If the memory store evicts blocks, then those evictions will synchronously call
// back into this StorageMemoryPool in order to free memory. Therefore, these variables
// should have been updated.
// Check whether enough memory exists, i.e. the requested size must not exceed the free memory
val enoughMemory = numBytesToAcquire <= memoryFree
if (enoughMemory) {
// If there is enough memory, grow the used memory _memoryUsed by numBytesToAcquire
_memoryUsed += numBytesToAcquire
}
// Return enoughMemory, indicating whether the allocation succeeded: it succeeds if enough free memory exists, otherwise it fails
enoughMemory
}
// Release size bytes of memory, again synchronized on the lock object to handle concurrency
def releaseMemory(size: Long): Unit = lock.synchronized {
// If size exceeds the current used memory _memoryUsed, log a warning and set _memoryUsed to 0
if (size > _memoryUsed) {
logWarning(s"Attempted to release $size bytes of storage " +
s"memory when we only have ${_memoryUsed} bytes")
_memoryUsed = 0
} else {
// Otherwise, decrease the used memory _memoryUsed by size
_memoryUsed -= size
}
}
// Release all memory, again synchronized on the lock object, setting the current used memory _memoryUsed to 0
def releaseAllMemory(): Unit = lock.synchronized {
_memoryUsed = 0
}
/**
* Try to shrink the size of this storage memory pool by `spaceToFree` bytes. Return the number
* of bytes removed from the pool's capacity.
*/
def shrinkPoolToFreeSpace(spaceToFree: Long): Long = lock.synchronized {
// First, shrink the pool by reclaiming free memory:
val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)
decrementPoolSize(spaceFreedByReleasingUnusedMemory)
val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory
if (remainingSpaceToFree > 0) {
// If reclaiming free memory did not adequately shrink the pool, begin evicting blocks:
val evictedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
memoryStore.evictBlocksToFreeSpace(None, remainingSpaceToFree, evictedBlocks)
val spaceFreedByEviction = evictedBlocks.map(_._2.memSize).sum
// When a block is released, BlockManager.dropFromMemory() calls releaseMemory(), so we do
// not need to decrement _memoryUsed here. However, we do need to decrement the pool size.
decrementPoolSize(spaceFreedByEviction)
spaceFreedByReleasingUnusedMemory + spaceFreedByEviction
} else {
spaceFreedByReleasingUnusedMemory
}
}
}
Let's focus on the two acquireMemory() methods. The first takes three parameters: blockId, numBytes, and evictedBlocks. Its logic is simple: synchronized on lock to handle concurrency, it computes the memory to free, numBytesToFree, as the requested size minus the pool's current free memory, i.e. it checks whether the pool's free memory can satisfy the request, and then delegates to the overload that takes the extra numBytesToFree parameter.
The four-parameter acquireMemory() is also simple. First it validates the sizes to ensure the memory request is reasonable, checking the following three things:
1. The requested memory numBytesToAcquire must be >= 0;
2. The memory to free numBytesToFree must be >= 0;
3. The used memory memoryUsed must not exceed the pool size poolSize.
Then, if some memory needs to be freed, i.e. numBytesToFree is greater than 0, MemoryStore's evictBlocksToFreeSpace() method frees numBytesToFree bytes. MemoryStore itself will be covered in detail in the later storage-management module; for now, the concept is enough.
Finally, it checks whether enough memory exists, i.e. the requested size must not exceed the free memory. If so, the used memory _memoryUsed grows by numBytesToAcquire and true is returned; otherwise false.
Next, let's look at the shrinkPoolToFreeSpace() method. Its job is to try to shrink the storage memory pool by spaceToFree bytes and return the number of bytes actually removed. The logic is:
1. Take the smaller of the shrink target spaceToFree and the free memory memoryFree; if the target exceeds the free memory, at most memoryFree can be reclaimed without eviction;
2. Shrink the pool by that amount, spaceFreedByReleasingUnusedMemory;
3. Compute the unmet remainder remainingSpaceToFree as spaceToFree - spaceFreedByReleasingUnusedMemory;
4. If the remainder is greater than 0:
4.1. Call MemoryStore's evictBlocksToFreeSpace() to drop some blocks and free up space;
4.2. Obtain the space freed by dropping blocks, spaceFreedByEviction;
4.3. Shrink the pool by spaceFreedByEviction;
4.4. Return the actual shrink amount, spaceFreedByReleasingUnusedMemory + spaceFreedByEviction;
5. Otherwise, return spaceFreedByReleasingUnusedMemory.
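These steps can be sketched as follows. This is a Python illustration with a hypothetical pool dict; `evict(n)` stands in for MemoryStore.evictBlocksToFreeSpace and is assumed to return the bytes actually freed by dropping blocks:

```python
def shrink_pool_to_free_space(pool, space_to_free, evict):
    # Step 1: first reclaim free (unused) memory
    freed_unused = min(space_to_free, pool['size'] - pool['used'])
    pool['size'] -= freed_unused                # step 2: shrink the pool
    remaining = space_to_free - freed_unused    # step 3: unmet remainder
    if remaining > 0:                           # step 4: evict blocks
        freed_by_eviction = evict(remaining)
        pool['used'] -= freed_by_eviction  # dropped blocks release their memory
        pool['size'] -= freed_by_eviction
        return freed_unused + freed_by_eviction
    return freed_unused                         # step 5

pool = {'size': 100, 'used': 70}
freed = shrink_pool_to_free_space(pool, 50, evict=lambda n: n)
```

(In the real code, eviction calls back into releaseMemory() to decrement _memoryUsed; the sketch does it inline.)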
Correspondingly, let's now look at the implementation of ExecutionMemoryPool; the code is as follows:
/**
* Implements policies and bookkeeping for sharing a adjustable-sized pool of memory between tasks.
*
* Tries to ensure that each task gets a reasonable share of memory, instead of some task ramping up
* to a large amount first and then causing others to spill to disk repeatedly.
*
* If there are N tasks, it ensures that each task can acquire at least 1 / 2N of the memory
* before it has to spill, and at most 1 / N. Because N varies dynamically, we keep track of the
* set of active tasks and redo the calculations of 1 / 2N and 1 / N in waiting tasks whenever this
* set changes. This is all done by synchronizing access to mutable state and using wait() and
* notifyAll() to signal changes to callers. Prior to Spark 1.6, this arbitration of memory across
* tasks was performed by the ShuffleMemoryManager.
*
* @param lock a [[MemoryManager]] instance to synchronize on
* @param poolName a human-readable name for this pool, for use in log messages
*/
private[memory] class ExecutionMemoryPool(
lock: Object,
poolName: String
) extends MemoryPool(lock) with Logging {
/**
* Map from taskAttemptId -> memory consumption in bytes
*/
@GuardedBy("lock")
private val memoryForTask = new mutable.HashMap[Long, Long]()
// Returns the used memory, synchronized on the lock object to handle concurrency
override def memoryUsed: Long = lock.synchronized {
memoryForTask.values.sum
}
/**
* Returns the memory consumption, in bytes, for the given task.
*/
def getMemoryUsageForTask(taskAttemptId: Long): Long = lock.synchronized {
memoryForTask.getOrElse(taskAttemptId, 0L)
}
/**
* Try to acquire up to `numBytes` of memory for the given task and return the number of bytes
* obtained, or 0 if none can be allocated.
*
* This call may block until there is enough free memory in some situations, to make sure each
* task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
* active tasks) before it is forced to spill. This can happen if the number of tasks increase
* but an older task had a lot of memory already.
*
* @param numBytes number of bytes to acquire
* @param taskAttemptId the task attempt acquiring memory
* @param maybeGrowPool a callback that potentially grows the size of this pool. It takes in
* one parameter (Long) that represents the desired amount of memory by
* which this pool should be expanded.
* @param computeMaxPoolSize a callback that returns the maximum allowable size of this pool
* at this given moment. This is not a field because the max pool
* size is variable in certain cases. For instance, in unified
* memory management, the execution pool can be expanded by evicting
* cached blocks, thereby shrinking the storage pool.
*
* @return the number of bytes granted to the task.
*/
private[memory] def acquireMemory(
numBytes: Long,
taskAttemptId: Long,
maybeGrowPool: Long => Unit = (additionalSpaceNeeded: Long) => Unit,
computeMaxPoolSize: () => Long = () => poolSize): Long = lock.synchronized {
// The requested numBytes must be greater than 0
assert(numBytes > 0, s"invalid number of bytes requested: $numBytes")
// TODO: clean up this clunky method signature
// Add this task to the taskMemory map just so we can keep an accurate count of the number
// of active tasks, to let other tasks ramp down their memory in calls to `acquireMemory`
// If memoryForTask does not yet contain this task, register it with an initial consumption of 0 and notify any waiting tasks
if (!memoryForTask.contains(taskAttemptId)) {
memoryForTask(taskAttemptId) = 0L
// This will later cause waiting tasks to wake up and check numTasks again
lock.notifyAll()
}
// Keep looping until we're either sure that we don't want to grant this request (because this
// task would have more than 1 / numActiveTasks of the memory) or we have enough free
// memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).
// TODO: simplify this to limit each task to its own slot
while (true) {
// Get the current number of active tasks
val numActiveTasks = memoryForTask.keys.size
// Get the memory currently consumed by this task
val curMem = memoryForTask(taskAttemptId)
// In every iteration of this loop, we should first try to reclaim any borrowed execution
// space from storage. This is necessary because of the potential race condition where new
// storage blocks may steal the free execution memory that this task was waiting for.
// maybeGrowPool is UnifiedMemoryManager's maybeGrowExecutionPool() method, passed in as a
// callback: it can expand the execution pool by evicting cached blocks, shrinking the storage pool
maybeGrowPool(numBytes - memoryFree)
// Maximum size the pool would have after potentially growing the pool.
// This is used to compute the upper bound of how much memory each task can occupy. This
// must take into account potential free memory as well as the amount this pool currently
// occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
// we did not take into account space that could have been freed by evicting cached blocks.
// Compute the maximum size the pool could have after potentially growing, maxPoolSize
val maxPoolSize = computeMaxPoolSize()
// Maximum memory each task may occupy: maxPoolSize / numActiveTasks, i.e. a 1/N share
val maxMemoryPerTask = maxPoolSize / numActiveTasks
// Minimum memory guaranteed to each task: 1 / (2 * numActiveTasks) of the current poolSize
// (note this is based on poolSize, not on maxPoolSize)
val minMemoryPerTask = poolSize / (2 * numActiveTasks)
// How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
// The most we can grant, maxToGrant: the smaller of numBytes and max(0, maxMemoryPerTask - curMem).
// If the task already consumes more than maxMemoryPerTask it gets nothing more; otherwise it
// gets the smaller of what it may still take and what it asked for
val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
// Only give it as much memory as is free, which might be none if it reached 1 / numTasks
// What we can actually grant, toGrant: the smaller of maxToGrant and memoryFree
val toGrant = math.min(maxToGrant, memoryFree)
// We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;
// if we can't give it this much now, wait for other tasks to free up memory
// (this happens if older tasks allocated lots of memory before N grew)
if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
// If what we can actually grant (toGrant) is less than the requested numBytes, and the task's
// total after this grant would still be below its minimum share minMemoryPerTask, log a message
logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName pool to be free")
// and wait on lock, i.e. on the MemoryManager
lock.wait()
} else {
// Increase this task's recorded memory consumption by toGrant
memoryForTask(taskAttemptId) += toGrant
// Return the granted amount, toGrant
return toGrant
}
}
0L // Never reached
}
/**
* Release `numBytes` of memory acquired by the given task.
*/
def releaseMemory(numBytes: Long, taskAttemptId: Long): Unit = lock.synchronized {
// Look up the task's current memory consumption
val curMem = memoryForTask.getOrElse(taskAttemptId, 0L)
val memoryToFree = if (curMem < numBytes) { // The task holds less than we were asked to release
// Log a warning about the inconsistency
logWarning(
s"Internal error: release called on $numBytes bytes but task only has $curMem bytes " +
s"of memory from the $poolName pool")
// and free only what the task actually holds, curMem
curMem
} else {
// Otherwise free the requested numBytes
numBytes
}
if (memoryForTask.contains(taskAttemptId)) {
// Decrease this task's recorded consumption by memoryToFree
memoryForTask(taskAttemptId) -= memoryToFree
// If its consumption drops to 0 or below, remove the task from the map
if (memoryForTask(taskAttemptId) <= 0) {
memoryForTask.remove(taskAttemptId)
}
}
lock.notifyAll() // Notify waiters in acquireMemory() that memory has been freed
}
/**
* Release all memory for the given task and mark it as inactive (e.g. when a task ends).
* @return the number of bytes freed.
*/
def releaseAllMemoryForTask(taskAttemptId: Long): Long = lock.synchronized {
// Get the amount of memory currently used by the given task
val numBytesToFree = getMemoryUsageForTask(taskAttemptId)
// Release all of it
releaseMemory(numBytesToFree, taskAttemptId)
// Return the number of bytes freed, numBytesToFree
numBytesToFree
}
}
Let us focus on the acquireMemory() method. Its main logic is as follows:
1. Sanity check: the requested numBytes must be greater than 0;
2. If memoryForTask does not yet contain this task, register it with an initial consumption of 0 and notify any waiting tasks;
3. Then, in a loop:
3.1. Get the current number of active tasks, numActiveTasks;
3.2. Get the memory currently consumed by this task, curMem;
3.3. Invoke maybeGrowPool, which is UnifiedMemoryManager's maybeGrowExecutionPool() method passed in as a callback; it can expand the execution pool by evicting cached blocks, thereby shrinking the storage pool;
3.4. Compute the maximum size the pool could grow to, maxPoolSize;
3.5. Compute the maximum memory each task may occupy, maxMemoryPerTask = maxPoolSize / numActiveTasks;
3.6. Compute the minimum memory guaranteed to each task, minMemoryPerTask = poolSize / (2 * numActiveTasks), i.e. a 1/(2N) share of the current pool size (note it is based on poolSize, not maxPoolSize);
3.7. Compute the most we can grant this task, maxToGrant: the smaller of numBytes and max(0, maxMemoryPerTask - curMem). In other words, if the task already consumes more than maxMemoryPerTask it gets nothing more; otherwise it gets the smaller of what it may still take and what it asked for;
3.8. Compute what we can actually grant, toGrant: the smaller of maxToGrant and memoryFree;
3.9. If toGrant is less than the requested numBytes, and curMem + toGrant is still below the task's minimum share minMemoryPerTask, log a message and wait on lock (i.e. on the MemoryManager); otherwise increase the task's recorded consumption by toGrant and return toGrant, exiting the loop.
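The arbitration arithmetic of steps 3.4 through 3.9 can be condensed into a few lines of Python (the function name is mine, and the wait/notify blocking machinery is deliberately omitted):

```python
def grant_for_task(num_bytes, cur_mem, num_active_tasks,
                   pool_size, max_pool_size, memory_free):
    """One loop iteration of ExecutionMemoryPool.acquireMemory(), as arithmetic.
    Returns (to_grant, must_wait): the bytes granted this round, and whether the
    task should block because it is still under its 1/(2N) floor."""
    # 3.5: cap each task at a 1/N share of the (potentially grown) pool
    max_memory_per_task = max_pool_size // num_active_tasks
    # 3.6: guarantee each task 1/(2N) of the current pool before it must spill
    min_memory_per_task = pool_size // (2 * num_active_tasks)
    # 3.7: never push this task's share above 1/N
    max_to_grant = min(num_bytes, max(0, max_memory_per_task - cur_mem))
    # 3.8: and never grant more than is actually free right now
    to_grant = min(max_to_grant, memory_free)
    # 3.9: if the grant falls short and the task is still below its floor, wait
    must_wait = to_grant < num_bytes and cur_mem + to_grant < min_memory_per_task
    return to_grant, must_wait
```

With a 1000-byte pool and 4 active tasks, a task holding nothing that asks for 500 bytes is capped at 250 (its 1/N share); if only 100 bytes are free it receives those 100 but must wait, since 100 is below its 125-byte 1/(2N) floor.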
That wraps it up: we have now analyzed the principles and initialization of the static memory management model StaticMemoryManager and the unified memory management model UnifiedMemoryManager, as well as when and how memory is requested for the storage and execution regions, and by which methods. Given my limited expertise, if anything here is unclear or wrong, or if you have suggestions or deeper insights, please do not hesitate to share! The storage, shuffle, and related classes touched on here will be analyzed in detail later in the storage and shuffle modules.
Reading without leaving a comment? That's not what a hero does! Haha!