Spark is a popular in-memory distributed computing framework, and since it is memory-based, memory management is naturally at the heart of Spark's storage management. So what memory management model does Spark actually use? This article lifts the veil on it.
As mentioned in 《Spark源码分析之七:Task运行(一)》, when a Task is shipped to an Executor for execution, the run() method of the TaskRunner thread assigned to it constructs a TaskMemoryManager before the Task actually runs; after the Task object is deserialized from its binary form, this TaskMemoryManager is set as a member of the Task. So how exactly is the TaskMemoryManager created? Let's first look at the relevant code in TaskRunner's run() method:
```scala
val taskMemoryManager = new TaskMemoryManager(env.memoryManager, taskId)
```
taskId is easy: it is simply the Task's unique ID. But what about env.memoryManager? Let's look at the relevant code in SparkEnv:
```scala
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)

val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)
  } else {
    UnifiedMemoryManager(conf, numUsableCores)
  }
```
While SparkEnv is being constructed, it uses the parameter spark.memory.useLegacyMode to decide whether to use the legacy memory management model; by default it does not. If legacy mode is enabled, memoryManager is instantiated as a StaticMemoryManager; otherwise the new model is used and memoryManager is instantiated as a UnifiedMemoryManager.
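Switching back to the legacy model is just an ordinary configuration entry. A minimal sketch, assuming a local test application (the app name and master below are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Setting spark.memory.useLegacyMode to true selects StaticMemoryManager;
// leaving it at the default false selects UnifiedMemoryManager.
val conf = new SparkConf()
  .setAppName("memory-model-demo")   // placeholder application name
  .setMaster("local[2]")             // placeholder master
  .set("spark.memory.useLegacyMode", "true")
val sc = new SparkContext(conf)
```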
The English word static means fixed and unchanging, while unified means shared or combined. Judging by the names alone, StaticMemoryManager is a static memory manager: once memory has been divided up according to some rule, the overall layout does not change. UnifiedMemoryManager, a unified memory manager, hints at sharing and adjustment. So we might venture a guess: under StaticMemoryManager the size of each region is fixed when memory is first divided up and stays constant while Tasks run, whereas UnifiedMemoryManager dynamically adjusts the regions according to how much memory each one needs during Task execution. Is that really the case? Only the source code can tell. Let's start with StaticMemoryManager. From its initialization in SparkEnv we know the constructor used is the one taking a SparkConf conf and an Int numCores:
```scala
def this(conf: SparkConf, numCores: Int) {
  this(
    conf,
    StaticMemoryManager.getMaxExecutionMemory(conf),
    StaticMemoryManager.getMaxStorageMemory(conf),
    numCores)
}
```
Before the primary constructor is reached, two methods of the companion object StaticMemoryManager are called: getMaxExecutionMemory(), which returns the total memory available to the execution region (the region used by shuffle), and getMaxStorageMemory(), which returns the total memory available to the storage region.
What are the storage and execution regions? Briefly, storage means exactly that: it holds data such as Task results, provided the results are small enough to fit in memory. As for execution, judging from the shuffle-related parameters I suspect it holds the data shuffle needs while running (shuffle has not been analyzed yet, so this is only a guess; corrections from readers are welcome).
Let's now look at the two methods:
```scala
private def getMaxStorageMemory(conf: SparkConf): Long = {
  val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
  val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)
  val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)
  (systemMaxMemory * memoryFraction * safetyFraction).toLong
}

private def getMaxExecutionMemory(conf: SparkConf): Long = {
  val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
  val memoryFraction = conf.getDouble("spark.shuffle.memoryFraction", 0.2)
  val safetyFraction = conf.getDouble("spark.shuffle.safetyFraction", 0.8)
  (systemMaxMemory * memoryFraction * safetyFraction).toLong
}
```
As the code shows, the two methods are almost identical, so let's walk through getMaxStorageMemory() in detail.
First, obtain the system's maximum available memory, systemMaxMemory, from the parameter spark.testing.memory, or, if it is not set, from the runtime's maximum memory;
Then, obtain the fraction of total memory given to the storage region, memoryFraction, from the parameter spark.storage.memoryFraction, which defaults to 0.6;
Next, obtain the safety factor safetyFraction for the storage region, intended mainly to guard against OOM, from the parameter spark.storage.safetyFraction, which defaults to 0.9;
Finally, compute the total memory available to the storage region as systemMaxMemory * memoryFraction * safetyFraction.
The first steps are straightforward: by default the storage region gets 60% of the system's available memory. Why, then, the extra safety factor safetyFraction? Put yourself in the system's position: if it hands you 60% of its maximum available memory up front and you use it all immediately, what happens when more memory is needed? An OOM becomes very likely. The safety factor exists precisely for that reason, so by default the storage region initially gets 54% of the system's available memory, not 60%.
getMaxExecutionMemory() follows exactly the same logic with different parameters: by default it takes 20% of the system's maximum available memory with a safety factor of 0.8, so the execution region (used by shuffle) initially gets 16% of the system's available memory.
Wait, something seems to be missing: 60% + 20% = 80%, not 100%! Why? Simple: the program, and the system itself, also need memory to run.
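To make the default split concrete, here is a small sketch that reproduces the two formulas for a hypothetical 1 GB executor heap (the heap size is an assumed figure, purely for illustration):

```scala
// Assumed heap size, for illustration only.
val systemMaxMemory = 1024L * 1024 * 1024   // 1 GB

// Static model defaults: storage = 0.6 * 0.9, execution (shuffle) = 0.2 * 0.8.
val maxStorageMemory   = (systemMaxMemory * 0.6 * 0.9).toLong  // ≈ 553 MB, i.e. 54%
val maxExecutionMemory = (systemMaxMemory * 0.2 * 0.8).toLong  // ≈ 164 MB, i.e. 16%
// The remaining ~30% of the heap is left for Spark's own objects and user code.
```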
The primary constructor of StaticMemoryManager, i.e. the class definition in Scala syntax, is:
```scala
private[spark] class StaticMemoryManager(
    conf: SparkConf,
    maxOnHeapExecutionMemory: Long,
    override val maxStorageMemory: Long,
    numCores: Int)
  extends MemoryManager(
    conf,
    numCores,
    maxStorageMemory,
    maxOnHeapExecutionMemory) {
```
As we can see, StaticMemoryManager holds the SparkConf conf, the execution memory size maxOnHeapExecutionMemory, the storage memory size maxStorageMemory, and the CPU core count numCores as its members.
With that, the StaticMemoryManager is fully initialized. Summing up the static model: its biggest drawback is that neither region may exceed the maximum configured for it. Even when one region is under heavy memory pressure while the other sits idle, it cannot go over its cap to use more, even though the combined usage would stay within the overall limit. The answer to this is UnifiedMemoryManager, the unified memory management model.
Next, let's look at UnifiedMemoryManager, the unified memory manager. In SparkEnv it is initialized as follows:
```scala
UnifiedMemoryManager(conf, numUsableCores)
```
Readers may wonder why there is no new keyword here. This is a Scala feature: the initialization actually goes through the apply() method of the UnifiedMemoryManager companion object. The code is as follows:
```scala
def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {
  val maxMemory = getMaxMemory(conf)
  new UnifiedMemoryManager(
    conf,
    maxMemory = maxMemory,
    storageRegionSize =
      (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,
    numCores = numCores)
}
```
First, it obtains maxMemory, the maximum memory shared by the execution and storage regions;
then it constructs the UnifiedMemoryManager object, initializing the storage region size storageRegionSize to maxMemory multiplied by spark.memory.storageFraction, which defaults to 0.5, i.e. half of the shared memory.
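As an aside, the new-less construction above is just Scala's companion-object apply() pattern. A minimal, generic sketch (the class and its field are invented purely for illustration, not part of Spark):

```scala
// Invented example class.
class Pool private (val size: Long)

object Pool {
  // Pool(1024L) is syntactic sugar for Pool.apply(1024L).
  def apply(size: Long): Pool = new Pool(size)
}

val p = Pool(1024L)   // no `new`, exactly like UnifiedMemoryManager(conf, numUsableCores)
```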
Now let's focus on getMaxMemory(), which computes the maximum memory shared by the execution and storage regions:
```scala
private def getMaxMemory(conf: SparkConf): Long = {
  val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)
  val reservedMemory = conf.getLong("spark.testing.reservedMemory",
    if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)
  val minSystemMemory = reservedMemory * 1.5
  if (systemMemory < minSystemMemory) {
    throw new IllegalArgumentException(s"System memory $systemMemory must " +
      s"be at least $minSystemMemory. Please use a larger heap size.")
  }
  val usableMemory = systemMemory - reservedMemory
  val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)
  (usableMemory * memoryFraction).toLong
}
```
The overall flow is roughly as follows:
1. Obtain the system's maximum available memory, systemMemory, from the parameter spark.testing.memory, or from the runtime's maximum memory if it is not set;
2. Obtain the reserved memory, reservedMemory, from the parameter spark.testing.reservedMemory; if it is not set, the default depends on spark.testing: 0 when spark.testing is present, otherwise 300 MB;
3. Compute the minimum required system memory, minSystemMemory, as 1.5 times reservedMemory;
4. If systemMemory is smaller than minSystemMemory, i.e. smaller than 1.5 times the reserved memory, throw an exception telling the user to enlarge the JVM heap;
5. Compute the usable memory, usableMemory, as systemMemory minus reservedMemory;
6. Obtain the fraction of usable memory to hand out from the parameter spark.memory.fraction, defaulting to 0.75;
7. Return usableMemory * memoryFraction as the maximum memory shared by the execution and storage regions.
In other words, under the UnifiedMemoryManager strategy the storage and execution regions each default to half of their shared region, and the shared maximum itself is 75% of the system's maximum available memory systemMemory minus the reserved memory reservedMemory. Where the dynamic adjustment comes in will only become visible when memory is actually requested.
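A quick back-of-the-envelope calculation, assuming a 4 GB executor heap purely for illustration:

```scala
// Assumed 4 GB heap; 300 MB is the default reserved memory (RESERVED_SYSTEM_MEMORY_BYTES).
val systemMemory   = 4096L * 1024 * 1024
val reservedMemory = 300L * 1024 * 1024

val usableMemory      = systemMemory - reservedMemory   // 3796 MB
val maxMemory         = (usableMemory * 0.75).toLong    // ≈ 2847 MB shared by both regions
val storageRegionSize = (maxMemory * 0.5).toLong        // ≈ 1423 MB initially set aside for storage
```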
Good, that completes the initialization of the UnifiedMemoryManager. The next questions are: when is memory requested, and how is it allocated? Let's look at storage and execution one at a time.
Start with storage. As the name implies, it is the region where a Task's result is stored after the Task finishes. Recall from 《Spark源码分析之七：Task运行(一)》 how a Task's result is handled after the Task completes: if the result is larger than the maximum size Akka can carry (after subtracting the bytes it must reserve), the result is written into the BlockManager. How is it written? The code is:
```scala
env.blockManager.putBytes(
  blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
```
It calls BlockManager's putBytes() method; clearly what is written is binary byte data, and the storage level used is MEMORY_AND_DISK_SER. Let's look at that method first:
```scala
def putBytes(
    blockId: BlockId,
    bytes: ByteBuffer,
    level: StorageLevel,
    tellMaster: Boolean = true,
    effectiveStorageLevel: Option[StorageLevel] = None): Seq[(BlockId, BlockStatus)] = {
  require(bytes != null, "Bytes is null")
  doPut(blockId, ByteBufferValues(bytes), level, tellMaster, effectiveStorageLevel)
}
```
It delegates to doPut(), passing the data wrapped as ByteBufferValues; inside doPut() the key code is:
```scala
val result = data match {
  case IteratorValues(iterator) =>
    blockStore.putIterator(blockId, iterator, putLevel, returnValues)
  case ArrayValues(array) =>
    blockStore.putArray(blockId, array, putLevel, returnValues)
  case ByteBufferValues(bytes) =>
    bytes.rewind()
    blockStore.putBytes(blockId, bytes, putLevel)
}
```
As mentioned above, the data passed in is of type ByteBufferValues, so the call made here is BlockStore's putBytes(). BlockStore is an abstract class with three implementations: DiskStore (disk), ExternalBlockStore (external block storage), and MemoryStore (memory). Since we are discussing the memory management model, MemoryStore is of course the one to look at. In its putBytes(), regardless of whether level.deserialized is true or false, the call eventually ends up in tryToPut(), whose handling of memory is:
```scala
val enoughMemory = memoryManager.acquireStorageMemory(blockId, size, droppedBlocks)
if (enoughMemory) {
  // Enough storage memory was acquired, so record the block in the entries map.
  val entry = new MemoryEntry(value(), size, deserialized)
  entries.synchronized {
    entries.put(blockId, entry)
  }
  val valuesOrBytes = if (deserialized) "values" else "bytes"
  logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(
    blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(blocksMemoryUsed)))
} else {
  // Not enough memory: hand the block back to the BlockManager so it can be dropped
  // to disk if the storage level allows it.
  lazy val data = if (deserialized) {
    Left(value().asInstanceOf[Array[Any]])
  } else {
    Right(value().asInstanceOf[ByteBuffer].duplicate())
  }
  val droppedBlockStatus = blockManager.dropFromMemory(blockId, () => data)
  droppedBlockStatus.foreach { status => droppedBlocks += ((blockId, status)) }
}
```
From the above we can see that the check for sufficient memory goes through memoryManager's acquireStorageMemory() method. Let's look at StaticMemoryManager's acquireStorageMemory() first; it is defined as:
```scala
override def acquireStorageMemory(
    blockId: BlockId,
    numBytes: Long,
    evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {
  if (numBytes > maxStorageMemory) {
    logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
      s"memory limit ($maxStorageMemory bytes)")
    false
  } else {
    storageMemoryPool.acquireMemory(blockId, numBytes, evictedBlocks)
  }
}
```
Yes, it really is that simple. If the requested memory exceeds the upper bound of the storage region, there is not enough memory to store the block; otherwise storageMemoryPool's acquireMemory() performs the allocation. This is exactly where the word static shows itself. As for storageMemoryPool, which does the actual allocation, we will cover it at the end together with the execution-side onHeapExecutionMemoryPool and offHeapExecutionMemoryPool; for now it is enough to know the concept: a memory pool is a bookkeeping object dedicated to one region that tracks its total size, free memory, and used memory.
Now let's look at UnifiedMemoryManager; its acquireStorageMemory() is:
```scala
override def acquireStorageMemory(
    blockId: BlockId,
    numBytes: Long,
    evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {
  assert(onHeapExecutionMemoryPool.poolSize + storageMemoryPool.poolSize == maxMemory)
  assert(numBytes >= 0)
  if (numBytes > maxStorageMemory) {
    // Fail fast: the block cannot fit even if all free execution memory were borrowed.
    logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +
      s"memory limit ($maxStorageMemory bytes)")
    return false
  }
  if (numBytes > storageMemoryPool.memoryFree) {
    // Not enough free storage memory, so borrow free memory from the execution pool:
    // shrink the execution pool and grow the storage pool by the same amount.
    val memoryBorrowedFromExecution = Math.min(onHeapExecutionMemoryPool.memoryFree, numBytes)
    onHeapExecutionMemoryPool.decrementPoolSize(memoryBorrowedFromExecution)
    storageMemoryPool.incrementPoolSize(memoryBorrowedFromExecution)
  }
  storageMemoryPool.acquireMemory(blockId, numBytes, evictedBlocks)
}
```
First we need to understand maxStorageMemory, which differs from the one in StaticMemoryManager: there it is a fixed size pre-computed from a fraction and a safety factor, whereas here it is defined as:
```scala
override def maxStorageMemory: Long = synchronized {
  maxMemory - onHeapExecutionMemoryPool.memoryUsed
}
```
So maxStorageMemory is the maximum memory shared by execution and storage minus the memory execution has already used. Good, on with the analysis!
First, if the requested size exceeds maxStorageMemory, i.e. the shared maximum minus execution's used memory, return false immediately: memory is insufficient; note that execution and storage are considered together here.
Then, if the requested size exceeds the free memory (memoryFree) of the pre-allocated storage region, compute how much can be borrowed from the execution region, namely the smaller of the requested size and execution's free memory; shrink the execution pool by that amount and grow the storage pool by the same amount, which is the dynamic adjustment (see the sketch below).
Finally, storageMemoryPool performs the actual allocation.
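A minimal sketch of the borrowing step, with pool sizes invented purely for illustration:

```scala
// Invented figures: the storage pool has 100 MB free, the on-heap execution pool has
// 400 MB free, and a 250 MB block needs to be cached.
val numBytes      = 250L * 1024 * 1024
val storageFree   = 100L * 1024 * 1024
val executionFree = 400L * 1024 * 1024

// numBytes > storageFree, so storage borrows min(executionFree, numBytes) = 250 MB:
val memoryBorrowedFromExecution = math.min(executionFree, numBytes)
// The execution pool shrinks by 250 MB, the storage pool grows by 250 MB,
// and the subsequent storageMemoryPool.acquireMemory(...) call succeeds.
```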
With that we have covered when storage memory is requested and how it is allocated in both StaticMemoryManager and UnifiedMemoryManager.
Now for the execution region. When is its memory request triggered? We mentioned earlier that it is used by shuffle; readers need not dig into the details of shuffle here, as it will be covered in a dedicated shuffle module. Tracing through the code, we find the request in the insertRecord() method of ShuffleExternalSorter, a specialized external sorter that appends incoming records to data pages; once all records have been inserted, or the current thread has reached its shuffle memory limit, the in-memory records are sorted by their partition IDs.
Let's look at its insertRecord() method, which writes a record into a data page:
```java
public void insertRecord(Object recordBase, long recordOffset, int length, int partitionId)
    throws IOException {

  // Spill to disk first if the in-memory sorter already holds too many records.
  assert(inMemSorter != null);
  if (inMemSorter.numRecords() > numElementsForSpillThreshold) {
    spill();
  }

  growPointerArrayIfNecessary();
  // 4 extra bytes are needed to store the record length in front of the record itself.
  final int required = length + 4;
  acquireNewPageIfNecessary(required);

  assert(currentPage != null);
  final Object base = currentPage.getBaseObject();
  final long recordAddress = taskMemoryManager.encodePageNumberAndOffset(currentPage, pageCursor);
  Platform.putInt(base, pageCursor, length);
  pageCursor += 4;
  Platform.copyMemory(recordBase, recordOffset, base, pageCursor, length);
  pageCursor += length;
  inMemSorter.insertRecord(recordAddress, partitionId);
}
```
As the method shows, it calls acquireNewPageIfNecessary(), whose job is to acquire more memory, if necessary, so that one more record can be inserted. Its implementation is:
```java
private void acquireNewPageIfNecessary(int required) {
  if (currentPage == null ||
    pageCursor + required > currentPage.getBaseOffset() + currentPage.size() ) {
    currentPage = allocatePage(required);
    pageCursor = currentPage.getBaseOffset();
    allocatedPages.add(currentPage);
  }
}
```
That in turn calls allocatePage(); following the trail into the parent class MemoryConsumer, we find:
```java
protected MemoryBlock allocatePage(long required) {
  MemoryBlock page = taskMemoryManager.allocatePage(Math.max(pageSize, required), this);
  if (page == null || page.size() < required) {
    long got = 0;
    if (page != null) {
      got = page.size();
      taskMemoryManager.freePage(page, this);
    }
    taskMemoryManager.showMemoryUsage();
    throw new OutOfMemoryError("Unable to acquire " + required + " bytes of memory, got " + got);
  }
  used += page.size();
  return page;
}
```
This calls TaskMemoryManager's allocatePage(), which we find calls acquireExecutionMemory(), which in turn calls the method of the same name on MemoryManager. So allocation of execution-region memory ultimately lands in MemoryManager's acquireExecutionMemory().
Following the storage analysis above, we again look at the Static and Unified variants. First Static; its acquireExecutionMemory() is implemented as:
```scala
private[memory]
override def acquireExecutionMemory(
    numBytes: Long,
    taskAttemptId: Long,
    memoryMode: MemoryMode): Long = synchronized {
  memoryMode match {
    // On-heap requests go to the on-heap execution pool ...
    case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)
    // ... and off-heap requests go to the off-heap execution pool.
    case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)
  }
}
```
The logic is simple: the MemoryMode decides how execution memory is allocated. If it is on-heap (ON_HEAP), the Task's execution memory is allocated through onHeapExecutionMemoryPool's acquireMemory(); if it is off-heap (OFF_HEAP), it is allocated through offHeapExecutionMemoryPool's acquireMemory().
Now let's look at UnifiedMemoryManager's acquireExecutionMemory():
```scala
override private[memory] def acquireExecutionMemory(
    numBytes: Long,
    taskAttemptId: Long,
    memoryMode: MemoryMode): Long = synchronized {
  assert(onHeapExecutionMemoryPool.poolSize + storageMemoryPool.poolSize == maxMemory)
  assert(numBytes >= 0)
  memoryMode match {
    case MemoryMode.ON_HEAP =>

      // Grow the execution pool, if extra space is needed, by reclaiming memory from
      // the storage pool: its free memory, plus whatever it borrowed beyond its
      // initial region size (reclaimed by evicting cached blocks).
      def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {
        if (extraMemoryNeeded > 0) {
          val memoryReclaimableFromStorage =
            math.max(storageMemoryPool.memoryFree, storageMemoryPool.poolSize - storageRegionSize)
          if (memoryReclaimableFromStorage > 0) {
            // Only reclaim as much space as is needed and actually reclaimable.
            val spaceReclaimed = storageMemoryPool.shrinkPoolToFreeSpace(
              math.min(extraMemoryNeeded, memoryReclaimableFromStorage))
            onHeapExecutionMemoryPool.incrementPoolSize(spaceReclaimed)
          }
        }
      }

      // The execution pool may grow to everything except what storage is actually
      // using within its own region.
      def computeMaxExecutionPoolSize(): Long = {
        maxMemory - math.min(storageMemoryUsed, storageRegionSize)
      }

      onHeapExecutionMemoryPool.acquireMemory(
        numBytes, taskAttemptId, maybeGrowExecutionPool, computeMaxExecutionPoolSize)

    case MemoryMode.OFF_HEAP =>
      // Off-heap execution memory is not shared with storage, so no dynamic adjustment.
      offHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)
  }
}
```
UnifiedMemoryManager likewise distinguishes ON_HEAP from OFF_HEAP when allocating execution memory. The OFF_HEAP case is the same as in StaticMemoryManager: allocation goes through offHeapExecutionMemoryPool's acquireMemory(). The ON_HEAP case is slightly more involved. It also allocates through onHeapExecutionMemoryPool's acquireMemory(), but with a twist: if the execution pool does not have enough free memory, i.e. extra memory is needed, it tries to reclaim memory from the storage region. All of the storage pool's free memory can be taken back, and if the storage pool has grown beyond storageRegionSize, the storage region's size at initialization, some cached blocks can be evicted to reclaim the memory that storage borrowed from execution.
Looking at the logic of maybeGrowExecutionPool(): when extra memory is needed, i.e. execution's pre-allocated memory is no longer enough, it first takes memoryReclaimableFromStorage as the larger of the storage pool's free memory and the storage pool's current size minus its initial size. In other words, all of the storage pool's free memory can be lent to the execution region, and when the storage pool has grown beyond its initial size by more than its current free memory, the excess can additionally be reclaimed by evicting blocks. If anything is reclaimable, storageMemoryPool's shrinkPoolToFreeSpace() reclaims and gives up spaceReclaimed bytes, and onHeapExecutionMemoryPool grows by the same spaceReclaimed, achieving a dynamic, give-and-take adjustment.
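A small worked example of memoryReclaimableFromStorage, with figures invented purely for illustration:

```scala
// Invented figures (in MB): the storage region started at 1000 MB (storageRegionSize),
// later borrowed from execution and grew to 1200 MB, of which 150 MB is currently free.
val storageRegionSize = 1000L
val storagePoolSize   = 1200L
val storageFree       = 150L

// Reclaimable = max(free, poolSize - storageRegionSize) = max(150, 200) = 200 MB:
// freeing the unused 150 MB plus evicting 50 MB of cached blocks brings the storage
// pool back to its original 1000 MB, and the execution pool grows by what was reclaimed.
val memoryReclaimableFromStorage =
  math.max(storageFree, storagePoolSize - storageRegionSize)
```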
Finally, let's look at the pools that actually hand out memory: storageMemoryPool, onHeapExecutionMemoryPool, and offHeapExecutionMemoryPool. They are defined in MemoryManager as follows:
- @GuardedBy("this")
- protected val storageMemoryPool = new StorageMemoryPool(this)
- @GuardedBy("this")
- protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")
- @GuardedBy("this")
- protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")
Their types are StorageMemoryPool and ExecutionMemoryPool; the latter two are simply two differently named instances providing memory for on-heap and off-heap execution, named "on-heap execution" and "off-heap execution" respectively. Both classes extend the abstract class MemoryPool, so let's look at MemoryPool first:
```scala
/**
 * Manages bookkeeping for an adjustable-size region of memory. The lock object
 * passed in is used to synchronize all state changes.
 */
private[memory] abstract class MemoryPool(lock: Object) {

  @GuardedBy("lock")
  private[this] var _poolSize: Long = 0

  /** Total size of the pool, in bytes. */
  final def poolSize: Long = lock.synchronized {
    _poolSize
  }

  /** Free memory in the pool, in bytes: total size minus what is in use. */
  final def memoryFree: Long = lock.synchronized {
    _poolSize - memoryUsed
  }

  /** Grow the pool by `delta` bytes. */
  final def incrementPoolSize(delta: Long): Unit = lock.synchronized {
    require(delta >= 0)
    _poolSize += delta
  }

  /** Shrink the pool by `delta` bytes; the pool may never shrink below its used memory. */
  final def decrementPoolSize(delta: Long): Unit = lock.synchronized {
    require(delta >= 0)
    require(delta <= _poolSize)
    require(_poolSize - delta >= memoryUsed)
    _poolSize -= delta
  }

  /** Memory currently in use, in bytes; implemented by the concrete pools. */
  def memoryUsed: Long
}
```
Very simple; the comments above each member say it all, and readers can work through it on their own.
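As a quick usage sketch of the bookkeeping invariants, here is a toy subclass (invented purely for illustration, and ignoring the private[memory] access modifier):

```scala
// Toy pool: memoryUsed is just a counter bumped by use().
class ToyPool(lock: Object) extends MemoryPool(lock) {
  private var _used: Long = 0L
  override def memoryUsed: Long = lock.synchronized { _used }
  def use(bytes: Long): Unit = lock.synchronized { _used += bytes }
}

val pool = new ToyPool(new Object)
pool.incrementPoolSize(100L)      // capacity is now 100
pool.use(40L)                     // 40 in use
assert(pool.memoryFree == 60L)    // free = poolSize - memoryUsed
// pool.decrementPoolSize(70L) would fail its require():
// the pool may never shrink below the memory already in use.
```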
Next, the implementation of StorageMemoryPool:
```scala
/**
 * Performs bookkeeping for memory that is used for storage (caching).
 */
private[memory] class StorageMemoryPool(lock: Object) extends MemoryPool(lock) with Logging {

  @GuardedBy("lock")
  private[this] var _memoryUsed: Long = 0L

  override def memoryUsed: Long = lock.synchronized {
    _memoryUsed
  }

  // The MemoryStore that this pool evicts blocks from; set after construction.
  private var _memoryStore: MemoryStore = _
  def memoryStore: MemoryStore = {
    if (_memoryStore == null) {
      throw new IllegalStateException("memory store not initialized yet")
    }
    _memoryStore
  }

  /** Set the MemoryStore used by this manager to evict cached blocks. */
  final def setMemoryStore(store: MemoryStore): Unit = {
    _memoryStore = store
  }

  /**
   * Acquire N bytes of memory to cache the given block, evicting existing blocks if necessary.
   * Returns whether the request was granted.
   */
  def acquireMemory(
      blockId: BlockId,
      numBytes: Long,
      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean =
    lock.synchronized {
      // How much must be freed so that the request fits into the pool's free memory.
      val numBytesToFree = math.max(0, numBytes - memoryFree)
      acquireMemory(blockId, numBytes, numBytesToFree, evictedBlocks)
    }

  /**
   * Acquire N bytes of storage memory for the given block, evicting existing blocks if necessary.
   *
   * @param numBytesToAcquire the size of this block
   * @param numBytesToFree the amount of space to be freed by evicting blocks
   * @return whether all of the requested bytes were granted
   */
  def acquireMemory(
      blockId: BlockId,
      numBytesToAcquire: Long,
      numBytesToFree: Long,
      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = lock.synchronized {
    assert(numBytesToAcquire >= 0)
    assert(numBytesToFree >= 0)
    assert(memoryUsed <= poolSize)
    if (numBytesToFree > 0) {
      // Evict blocks to make room, recording the evicted blocks in the task metrics.
      memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, evictedBlocks)
      Option(TaskContext.get()).foreach { tc =>
        val metrics = tc.taskMetrics()
        val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
        metrics.updatedBlocks = Some(lastUpdatedBlocks ++ evictedBlocks.toSeq)
      }
    }
    // Evictions above may have changed the amount of free memory, so re-check
    // whether the request now fits before claiming it.
    val enoughMemory = numBytesToAcquire <= memoryFree
    if (enoughMemory) {
      _memoryUsed += numBytesToAcquire
    }
    enoughMemory
  }

  def releaseMemory(size: Long): Unit = lock.synchronized {
    if (size > _memoryUsed) {
      logWarning(s"Attempted to release $size bytes of storage " +
        s"memory when we only have ${_memoryUsed} bytes")
      _memoryUsed = 0
    } else {
      _memoryUsed -= size
    }
  }

  def releaseAllMemory(): Unit = lock.synchronized {
    _memoryUsed = 0
  }

  /**
   * Try to shrink this storage pool by `spaceToFree` bytes and return the number of
   * bytes actually removed from the pool's capacity.
   */
  def shrinkPoolToFreeSpace(spaceToFree: Long): Long = lock.synchronized {
    // First, shrink the pool by giving up its unused (free) memory.
    val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)
    decrementPoolSize(spaceFreedByReleasingUnusedMemory)
    val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory
    if (remainingSpaceToFree > 0) {
      // If that is not enough, evict blocks and shrink the pool by the evicted amount.
      val evictedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
      memoryStore.evictBlocksToFreeSpace(None, remainingSpaceToFree, evictedBlocks)
      val spaceFreedByEviction = evictedBlocks.map(_._2.memSize).sum
      decrementPoolSize(spaceFreedByEviction)
      spaceFreedByReleasingUnusedMemory + spaceFreedByEviction
    } else {
      spaceFreedByReleasingUnusedMemory
    }
  }
}
```
Let's focus on the two acquireMemory() methods. The first takes three parameters: blockId, numBytes, and evictedBlocks. Its logic is simple: synchronize on lock to deal with concurrency, compute the amount of memory that needs to be freed, numBytesToFree, as the requested size minus the pool's currently free memory (in other words, check whether the pool's free memory can already satisfy the request), and then call the overload that takes the additional numBytesToFree parameter.
The four-parameter acquireMemory() is just as simple. It first validates the sizes to make sure the allocation request is reasonable; the checks cover three things:
1. the requested memory, numBytesToAcquire, must be greater than or equal to 0;
2. the memory to be freed, numBytesToFree, must be greater than or equal to 0;
3. the used memory, memoryUsed, must be less than or equal to the pool size, poolSize.
Then, if some memory needs to be freed, i.e. numBytesToFree is greater than 0, MemoryStore's evictBlocksToFreeSpace() is called to free numBytesToFree bytes. MemoryStore will be described in detail in the later storage-management posts; for now the concept is enough.
Finally, it checks whether there is enough memory, i.e. whether the requested size is no larger than the free memory; if so, the used memory _memoryUsed grows by numBytesToAcquire and true is returned, otherwise false. A small sketch follows.
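As a sketch of the three-parameter entry point, with invented numbers:

```scala
// Invented figures: the storage pool currently has 80 MB free and a 120 MB block arrives.
val memoryFree = 80L * 1024 * 1024
val numBytes   = 120L * 1024 * 1024

// 40 MB worth of cached blocks must be evicted before the request can fit.
val numBytesToFree = math.max(0L, numBytes - memoryFree)   // 40 MB
// acquireMemory(blockId, numBytes, numBytesToFree, evictedBlocks) then evicts,
// re-checks numBytes <= memoryFree, and only then bumps _memoryUsed.
```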
Next, shrinkPoolToFreeSpace(): its job is to try to shrink the storage memory pool by spaceToFree bytes and to return the number of bytes actually removed. The processing goes like this:
1. Take the smaller of the requested shrink size spaceToFree and the free memory memoryFree; if spaceToFree is larger than the free memory, then the most that can be released without eviction is memoryFree;
2. Shrink the pool by that amount, spaceFreedByReleasingUnusedMemory;
3. Compute the still-unsatisfied part of the shrink request, remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory;
4. If the remaining part is greater than 0:
4.1 call MemoryStore's evictBlocksToFreeSpace() to drop some blocks and gain more free space;
4.2 take spaceFreedByEviction, the amount of memory freed by dropping those blocks;
4.3 shrink the pool by spaceFreedByEviction as well;
4.4 return the actual shrink size, spaceFreedByReleasingUnusedMemory + spaceFreedByEviction;
5. Otherwise return spaceFreedByReleasingUnusedMemory. (A worked example follows.)
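A worked example of the shrink path, again with invented numbers:

```scala
// Invented figures (in MB): shrink the storage pool by 300 MB when only 120 MB is free.
val spaceToFree = 300L
val memoryFree  = 120L

val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)   // 120, no eviction needed
val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory  // 180 still to free
// evictBlocksToFreeSpace(None, remainingSpaceToFree, ...) then drops cached blocks; if the
// dropped blocks add up to 180 MB, the method returns 120 + 180 = 300 MB of actual shrinkage.
```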
Correspondingly, here is the implementation of ExecutionMemoryPool:
```scala
/**
 * Performs bookkeeping for the execution memory shared by the tasks of one executor.
 * It tries to let each of the N active tasks obtain between 1/2N and 1/N of the pool
 * before it is forced to spill.
 */
private[memory] class ExecutionMemoryPool(
    lock: Object,
    poolName: String
  ) extends MemoryPool(lock) with Logging {

  // Map from taskAttemptId to the amount of execution memory that task holds.
  @GuardedBy("lock")
  private val memoryForTask = new mutable.HashMap[Long, Long]()

  override def memoryUsed: Long = lock.synchronized {
    memoryForTask.values.sum
  }

  /** Returns the memory consumption, in bytes, of the given task. */
  def getMemoryUsageForTask(taskAttemptId: Long): Long = lock.synchronized {
    memoryForTask.getOrElse(taskAttemptId, 0L)
  }

  /**
   * Try to acquire up to `numBytes` of memory for the given task and return the number of
   * bytes obtained, or 0 if none can be granted. The call may block until enough memory is
   * free, so that each task can ramp up to at least 1 / (2 * numActiveTasks) of the pool
   * before it is forced to spill. `maybeGrowPool` may grow the pool (UnifiedMemoryManager
   * passes maybeGrowExecutionPool here), and `computeMaxPoolSize` returns the maximum
   * allowed size of the pool at this moment.
   */
  private[memory] def acquireMemory(
      numBytes: Long,
      taskAttemptId: Long,
      maybeGrowPool: Long => Unit = (additionalSpaceNeeded: Long) => Unit,
      computeMaxPoolSize: () => Long = () => poolSize): Long = lock.synchronized {
    assert(numBytes > 0, s"invalid number of bytes requested: $numBytes")

    // Register the task so that other waiting tasks can recompute their fair share,
    // then wake them up.
    if (!memoryForTask.contains(taskAttemptId)) {
      memoryForTask(taskAttemptId) = 0L
      lock.notifyAll()
    }

    // Keep looping until the request can be granted or we are told to wait.
    while (true) {
      val numActiveTasks = memoryForTask.keys.size
      val curMem = memoryForTask(taskAttemptId)

      // Give the pool a chance to grow (e.g. by reclaiming storage memory) before
      // computing the per-task bounds.
      maybeGrowPool(numBytes - memoryFree)

      val maxPoolSize = computeMaxPoolSize()
      // Each task may take at most 1/N of the (maximum possible) pool ...
      val maxMemoryPerTask = maxPoolSize / numActiveTasks
      // ... and is guaranteed at least 1/2N of the (current) pool before it must spill.
      val minMemoryPerTask = poolSize / (2 * numActiveTasks)

      // How much can this task be granted, capped by its 1/N share and the request itself?
      val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
      // Only grant what is actually free right now.
      val toGrant = math.min(maxToGrant, memoryFree)

      // If we cannot give the task what it asked for and it is still below its 1/2N
      // guarantee, wait for other tasks to release memory; otherwise grant and return.
      if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {
        logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName pool to be free")
        lock.wait()
      } else {
        memoryForTask(taskAttemptId) += toGrant
        return toGrant
      }
    }
    0L  // never reached
  }

  /** Release `numBytes` of memory held by the given task. */
  def releaseMemory(numBytes: Long, taskAttemptId: Long): Unit = lock.synchronized {
    val curMem = memoryForTask.getOrElse(taskAttemptId, 0L)
    var memoryToFree = if (curMem < numBytes) {
      logWarning(
        s"Internal error: release called on $numBytes bytes but task only has $curMem bytes " +
        s"of memory from the $poolName pool")
      curMem
    } else {
      numBytes
    }
    if (memoryForTask.contains(taskAttemptId)) {
      memoryForTask(taskAttemptId) -= memoryToFree
      // Forget tasks that no longer hold any memory.
      if (memoryForTask(taskAttemptId) <= 0) {
        memoryForTask.remove(taskAttemptId)
      }
    }
    // Wake up tasks that are waiting for their fair share.
    lock.notifyAll()
  }

  /**
   * Release all memory held by the given task and mark it as inactive.
   * Returns the number of bytes freed.
   */
  def releaseAllMemoryForTask(taskAttemptId: Long): Long = lock.synchronized {
    val numBytesToFree = getMemoryUsageForTask(taskAttemptId)
    releaseMemory(numBytesToFree, taskAttemptId)
    numBytesToFree
  }

}
```
The key data structure here is memoryForTask, a map from each taskAttemptId to the amount of memory that task has consumed.
Once again we focus on acquireMemory(); its main logic is:
1. Validate that the requested size numBytes is greater than 0;
2. If memoryForTask does not yet contain this Task, add it with an initial value of 0 and wake up the other waiters on the lock;
3. Inside a loop:
3.1 get the number of currently active Tasks, numActiveTasks;
3.2 get this Task's current memory consumption, curMem;
3.3 maybeGrowPool is the maybeGrowExecutionPool() method passed in by UnifiedMemoryManager, which grows the execution pool by reclaiming cached blocks, shrinking the storage pool accordingly;
3.4 compute the maximum pool size, maxPoolSize;
3.5 compute the maximum memory each Task may hold, maxMemoryPerTask = maxPoolSize / numActiveTasks;
3.6 compute the minimum memory each Task is entitled to, minMemoryPerTask = poolSize / (2 * numActiveTasks), i.e. half of the current pool's per-task share;
3.7 compute the most that can be granted to this Task, maxToGrant, as the smaller of numBytes and max(0, maxMemoryPerTask - curMem): if the Task already holds maxMemoryPerTask or more it gets nothing further, otherwise it gets at most the smaller of what it may still take and what it asked for;
3.8 compute the amount actually grantable, toGrant, as the smaller of maxToGrant and memoryFree;
3.9 if toGrant is less than the requested numBytes, and the Task's current consumption plus the would-be grant is still below its minimum share minMemoryPerTask, log the fact and wait on the lock (i.e. the MemoryManager waits); otherwise add toGrant to this Task's entry in memoryForTask, return toGrant, and leave the loop. (A numeric illustration follows.)
3.9、如果实际分配的内存大小toGrant小于申请分配的内存大小numBytes,且当前已耗费内存加上马上就要分配的内存,小于Task需要的最小内存,记录日志信息,lock等待,即MemoryManager等待;否则memoryForTask中对应Task的已耗费内存增加toGrant,返回申请的内存大小toGrant,跳出循环。