Anonymous_cx

Spark Memory Management Model

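Spark is a very popular in-memory distributed computing framework. Since it is memory-based, memory management naturally sits at the heart of Spark's storage management. So what memory management model does Spark actually use? This article lifts the veil on it.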

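As we mentioned in "Spark Source Code Analysis, Part 7: Task Execution (1)", when a Task is handed to an Executor, the run() method of the TaskRunner thread assigned to it constructs a task memory manager, TaskMemoryManager, before the Task actually runs. Once the Task's binary data has been deserialized into a Task object, this TaskMemoryManager is set as a member variable of the Task. So how is the TaskMemoryManager created? Let's first look at the relevant code in TaskRunner's run() method: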

 // Obtain the task memory manager
       val taskMemoryManager = new TaskMemoryManager(env.memoryManager, taskId)  

taskId is straightforward: it is the unique ID of the Task. What about env.memoryManager? Let's look at the relevant code in SparkEnv:

 // Choose the memory management model based on the parameter spark.memory.useLegacyMode
     val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)  
       
     val memoryManager: MemoryManager =  
       if (useLegacyMemoryManager) { // legacy mode: use StaticMemoryManager, the static memory management model
         new StaticMemoryManager(conf, numUsableCores)
       } else { // otherwise use the newer UnifiedMemoryManager, the unified memory management model
         UnifiedMemoryManager(conf, numUsableCores)  
       }  

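While it is being constructed, SparkEnv reads the parameter spark.memory.useLegacyMode to decide whether to use the legacy memory management model. The default is false, so the legacy model is not used. If legacy mode is enabled, memoryManager is instantiated as a StaticMemoryManager; otherwise the new model is used and memoryManager is instantiated as a UnifiedMemoryManager.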
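As a rough illustration of this switch, the sketch below models the configuration as a plain `Map` instead of a real `SparkConf`. The names `ManagerSelection`, `ManagerKind` and `chooseManager` are hypothetical and exist only for this example; they are not part of Spark.

```scala
// Hypothetical model of the selection logic; these are not Spark's classes.
object ManagerSelection {
  sealed trait ManagerKind
  case object Static extends ManagerKind   // stands in for StaticMemoryManager
  case object Unified extends ManagerKind  // stands in for UnifiedMemoryManager

  // Mimics conf.getBoolean("spark.memory.useLegacyMode", false) on a plain Map:
  // a missing key behaves like the default value false.
  def chooseManager(conf: Map[String, String]): ManagerKind = {
    val useLegacy = conf.get("spark.memory.useLegacyMode").exists(_.toBoolean)
    if (useLegacy) Static else Unified
  }
}
```

With an empty map the sketch picks the unified model, which matches the default described above.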

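The word "static" means fixed or unchanging, and "unified" means combined into one. Taken literally, StaticMemoryManager is a static memory manager: once the memory split has been determined by some rule, the overall layout does not change. UnifiedMemoryManager is a unified memory manager, and "unified" hints at sharing and adjustment. So we can make a bold guess: StaticMemoryManager fixes the size of each memory region up front and keeps it unchanged while Tasks run, whereas UnifiedMemoryManager adjusts the regions dynamically according to how much memory each region's data needs at run time. Is that right? Only the source code can tell. Let's start with StaticMemoryManager. From the initialization statement in SparkEnv, we know it is created through the auxiliary constructor that takes conf of type SparkConf and numCores of type Int: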

 def this(conf: SparkConf, numCores: Int) {  
     // Delegate to the primary constructor
     this(
       conf,
       // Total memory available to the execution region (used by shuffle)
       StaticMemoryManager.getMaxExecutionMemory(conf),

       // Total memory available to the storage region
       StaticMemoryManager.getMaxStorageMemory(conf),  
       numCores)  
   }  

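Before delegating to the primary constructor, it calls two methods on the StaticMemoryManager companion object: getMaxExecutionMemory(), which returns the total memory available to the execution region (used by shuffle), and getMaxStorageMemory(), which returns the total memory available to the storage region.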

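What is the storage region, and what is the execution region? Briefly: storage holds stored data such as Task results, provided they are small enough to fit in memory. As for execution, judging by the "shuffle" in the parameter names, my guess is that it is the memory used for data during shuffle (we have not analyzed shuffle yet, so this is only a guess; corrections from readers are welcome).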

Let's look at these two methods:

 /** 
    * Return the total amount of memory available for the storage region, in bytes.
    */  
   private def getMaxStorageMemory(conf: SparkConf): Long = {  
       
     // Maximum memory available to the system: spark.testing.memory if set, otherwise the JVM's maximum memory
     val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)

     // Fraction of total memory given to the storage region, from spark.storage.memoryFraction (default 0.6)
     val memoryFraction = conf.getDouble("spark.storage.memoryFraction", 0.6)

     // Safety fraction applied to the storage region, mainly to guard against OOM, from spark.storage.safetyFraction (default 0.9)
     val safetyFraction = conf.getDouble("spark.storage.safetyFraction", 0.9)

     // Total storage memory = system max memory * memory fraction * safety fraction
     (systemMaxMemory * memoryFraction * safetyFraction).toLong  
   }  
   
   /** 
    * Return the total amount of memory available for the execution region, in bytes.
    */  
   private def getMaxExecutionMemory(conf: SparkConf): Long = {  
     
     // Maximum memory available to the system: spark.testing.memory if set, otherwise the JVM's maximum memory
     val systemMaxMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)

     // Fraction of total memory given to the execution region (used by shuffle), from spark.shuffle.memoryFraction (default 0.2)
     val memoryFraction = conf.getDouble("spark.shuffle.memoryFraction", 0.2)

     // Safety fraction applied to the execution region, mainly to guard against OOM, from spark.shuffle.safetyFraction (default 0.8)
     val safetyFraction = conf.getDouble("spark.shuffle.safetyFraction", 0.8)

     // Total execution memory = system max memory * memory fraction * safety fraction
     (systemMaxMemory * memoryFraction * safetyFraction).toLong  
   }  

As the code shows, the two methods that compute the region sizes are extremely similar, so we will walk through getMaxStorageMemory() as the example.

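First, obtain the maximum memory available to the system, systemMaxMemory, from the parameter spark.testing.memory; if it is not set, use the maximum memory of the runtime environment.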

Next, obtain memoryFraction, the storage region's share of total memory, from the parameter spark.storage.memoryFraction (default 0.6).

Then obtain safetyFraction, the safety factor applied to the storage region's maximum allocation, mainly to guard against OOM, from the parameter spark.storage.safetyFraction (default 0.9).

Finally, compute the total memory available to the storage region as systemMaxMemory * memoryFraction * safetyFraction.

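The first steps are simple: by default the storage region is given 60% of the system's available memory. So why is a safety factor needed at the end? Put yourself in its place: the system hands you 60% of its maximum available memory and you use it all up at once. What happens when more memory is needed? An OOM becomes very likely. The safety factor exists for exactly this reason. In other words, by default the storage region initially gets 0.6 * 0.9 = 54% of the system's available memory, not 60%.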

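getMaxExecutionMemory() follows the same logic as getMaxStorageMemory() and only reads different parameters. By default the fraction is 20% of the system's maximum available memory and the safety factor is 80%, so the execution region (used by shuffle) initially gets 0.2 * 0.8 = 16% of the system's available memory.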
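To make the arithmetic concrete, here is a minimal sketch of the sizing formula with the default fractions quoted above. The object `StaticSizing` and its method names are hypothetical, and the heap size is passed in directly rather than read from a `SparkConf` or the JVM.

```scala
// Sketch of the static sizing formula; names are hypothetical, not Spark's.
object StaticSizing {
  // Default fractions quoted in the code above.
  val StorageFraction = 0.6
  val StorageSafety = 0.9
  val ShuffleFraction = 0.2
  val ShuffleSafety = 0.8

  // systemMaxMemory * memoryFraction * safetyFraction, truncated to a Long.
  def regionSize(systemMaxMemory: Long, fraction: Double, safety: Double): Long =
    (systemMaxMemory * fraction * safety).toLong

  // Storage region size under the default settings.
  def maxStorage(systemMaxMemory: Long): Long =
    regionSize(systemMaxMemory, StorageFraction, StorageSafety)

  // Execution region size under the default settings.
  def maxExecution(systemMaxMemory: Long): Long =
    regionSize(systemMaxMemory, ShuffleFraction, ShuffleSafety)
}
```

For a hypothetical heap of 1,000 MB, `maxStorage` comes out at roughly 540 MB and `maxExecution` at roughly 160 MB, so about 30% of the heap lies outside both regions once the safety factors are applied.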

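Wait, isn't something missing? 60% + 20% = 80%, not 100%! Why? Simple: the program and the system themselves also need memory to run.
The primary constructor of StaticMemoryManager, which in Scala is part of the class definition, is: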

 private[spark] class StaticMemoryManager(  
     conf: SparkConf,  
     maxOnHeapExecutionMemory: Long,  
     override val maxStorageMemory: Long,  
     numCores: Int)  
   extends MemoryManager(  
     conf,  
     numCores,  
     maxStorageMemory,  
     maxOnHeapExecutionMemory) {  

As we can see, StaticMemoryManager holds the following members: conf of type SparkConf, the execution memory size maxOnHeapExecutionMemory, the storage memory size maxStorageMemory, and the number of CPU cores numCores.

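At this point the StaticMemoryManager object is fully initialized. To sum up the static model: its biggest drawback is that no region can exceed its configured maximum. Even when one region is under heavy pressure and the other is idle, the busy region cannot take more memory than its cap, even though the total in use is still below the overall limit. The solution that followed is UnifiedMemoryManager, the unified memory management model.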

Next, let's look at UnifiedMemoryManager, the unified memory manager. In SparkEnv it is initialized as follows:

 UnifiedMemoryManager(conf, numUsableCores)  

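Readers may wonder why there is no new keyword. This is a feature of Scala: the object is actually created through the apply() method of the UnifiedMemoryManager companion object. The code is as follows: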

 def apply(conf: SparkConf, numCores: Int): UnifiedMemoryManager = {  
       
     // Maximum memory shared by the execution and storage regions
     val maxMemory = getMaxMemory(conf)

     // Construct the UnifiedMemoryManager object
     new UnifiedMemoryManager(
       conf,
       maxMemory = maxMemory,
       // The storage region initially gets spark.memory.storageFraction of the shared maximum memory (default 0.5, i.e. half)
       storageRegionSize =  
         (maxMemory * conf.getDouble("spark.memory.storageFraction", 0.5)).toLong,  
       numCores = numCores)  
   }  

First, obtain maxMemory, the maximum memory shared by the execution and storage regions.

Then construct the UnifiedMemoryManager object, with the storage region size storageRegionSize initialized to maxMemory multiplied by spark.memory.storageFraction, which defaults to 0.5, i.e. half.
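The construction without `new` can be sketched with a small stand-in class. `RegionPlan` is a hypothetical name used only here; the real UnifiedMemoryManager takes more arguments, as shown in the code above.

```scala
// Hypothetical stand-in used to illustrate companion-object construction.
class RegionPlan private (val maxMemory: Long, val storageRegionSize: Long)

object RegionPlan {
  // Writing RegionPlan(...) without `new` is sugar for RegionPlan.apply(...).
  // The storage region starts as storageFraction of the shared maximum.
  def apply(maxMemory: Long, storageFraction: Double = 0.5): RegionPlan =
    new RegionPlan(maxMemory, (maxMemory * storageFraction).toLong)
}
```

Calling `RegionPlan(1000L)` goes through `apply` and yields a storage region of half the shared memory, mirroring the default of 0.5.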

Now let's focus on getMaxMemory(), which returns the maximum memory shared by the execution and storage regions:

 /** 
    * Return the total amount of memory shared between execution and storage, in bytes.
    */  
   private def getMaxMemory(conf: SparkConf): Long = {  
     
     // Maximum memory available to the system, systemMemory: spark.testing.memory if set, otherwise the JVM's maximum memory
     val systemMemory = conf.getLong("spark.testing.memory", Runtime.getRuntime.maxMemory)

     // Reserved memory, reservedMemory, from spark.testing.reservedMemory;
     // if not set, the default is 0 when spark.testing is present, otherwise 300 MB
     val reservedMemory = conf.getLong("spark.testing.reservedMemory",
       if (conf.contains("spark.testing")) 0 else RESERVED_SYSTEM_MEMORY_BYTES)

     // Minimum system memory, minSystemMemory: 1.5 times the reserved memory
     val minSystemMemory = reservedMemory * 1.5

     // If systemMemory is below minSystemMemory (1.5 times the reserved memory), throw an exception
     // telling the user to increase the JVM heap size
     if (systemMemory < minSystemMemory) {  
       throw new IllegalArgumentException(s"System memory $systemMemory must " +  
         s"be at least $minSystemMemory. Please use a larger heap size.")  
     }  
       
     // Usable memory, usableMemory: systemMemory minus reservedMemory
     val usableMemory = systemMemory - reservedMemory

     // Fraction of usable memory, from spark.memory.fraction (default 0.75)
     val memoryFraction = conf.getDouble("spark.memory.fraction", 0.75)

     // The maximum shared by execution and storage is usableMemory * memoryFraction
     (usableMemory * memoryFraction).toLong  
   }  

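The overall flow is as follows:

1. Obtain the maximum memory available to the system, systemMemory, from the parameter spark.testing.memory; if it is not set, use the maximum memory of the runtime environment.

2. Obtain the reserved memory, reservedMemory, from the parameter spark.testing.reservedMemory. If it is not set, the default depends on spark.testing: 0 if spark.testing is present, otherwise 300 MB.

3. Compute the minimum system memory, minSystemMemory, as 1.5 times reservedMemory.

4. If systemMemory is smaller than minSystemMemory, throw an exception telling the user to increase the JVM heap size.

5. Compute the usable memory, usableMemory, as systemMemory minus reservedMemory.

6. Obtain the fraction of usable memory from the parameter spark.memory.fraction (default 0.75).

7. Return usableMemory * memoryFraction as the maximum memory shared by the execution and storage regions.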

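In other words, under the UnifiedMemoryManager strategy the storage and execution regions each start with half of the shared region by default, and that shared region is 75% of what remains after subtracting reservedMemory from systemMemory. Where the dynamic adjustment shows up only becomes clear when memory is actually requested.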
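As a worked example of these steps, the sketch below reimplements the formula with the defaults stated above (300 MB reserved, fraction 0.75). `UnifiedSizing` and its parameter names are hypothetical, and the system memory is passed in explicitly instead of being read from the JVM.

```scala
// Sketch of the unified sizing steps; names are hypothetical, not Spark's.
object UnifiedSizing {
  val ReservedBytes: Long = 300L * 1024 * 1024 // 300 MB default quoted above

  def maxMemory(
      systemMemory: Long,
      reserved: Long = ReservedBytes,
      fraction: Double = 0.75): Long = {
    // The heap must be at least 1.5 times the reserved memory.
    val minSystemMemory = reserved * 1.5
    require(systemMemory >= minSystemMemory,
      s"System memory $systemMemory must be at least $minSystemMemory.")
    // Usable memory is what remains after the reservation.
    val usableMemory = systemMemory - reserved
    (usableMemory * fraction).toLong
  }
}
```

For a hypothetical 1,024 MB heap this gives (1024 - 300) * 0.75 = 543 MB of shared memory, of which half, 271.5 MB, is the initial storage region. A 400 MB heap is rejected because it is below the 450 MB minimum.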

That covers the initialization of UnifiedMemoryManager. The next question is when and how memory is requested and allocated. We will take storage and execution one at a time.

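Let's start with storage. As the name suggests, storage is the region where results are stored after a Task has finished. Recall how Task results are handled, as described in "Spark Source Code Analysis, Part 7: Task Execution (1)": if the result is larger than the maximum Akka message size minus the reserved bytes, it is written to the BlockManager. How is it written? The code is: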

 env.blockManager.putBytes(  
               blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)  

This calls BlockManager's putBytes() method. Clearly the data written is raw bytes, and the storage level is MEMORY_AND_DISK_SER. Let's look at this method first:

 /** 
    * Put a new block of serialized bytes to the block manager. 
    * Return a list of blocks updated as a result of this put. 
    */  
   def putBytes(  
       blockId: BlockId,  
       bytes: ByteBuffer,  
       level: StorageLevel,  
       tellMaster: Boolean = true,  
       effectiveStorageLevel: Option[StorageLevel] = None): Seq[(BlockId, BlockStatus)] = {  
     require(bytes != null, "Bytes is null")  
     doPut(blockId, ByteBufferValues(bytes), level, tellMaster, effectiveStorageLevel)  
   }  

It calls doPut(), passing data of type ByteBufferValues. doPut() contains the following key code:

 // Actually put the values  
         val result = data match {  
           case IteratorValues(iterator) =>  
             blockStore.putIterator(blockId, iterator, putLevel, returnValues)  
           case ArrayValues(array) =>  
             blockStore.putArray(blockId, array, putLevel, returnValues)  
           case ByteBufferValues(bytes) =>  
             bytes.rewind()  
             blockStore.putBytes(blockId, bytes, putLevel)  
         }  

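As noted above, the data passed in is a ByteBufferValues, so the branch taken here calls BlockStore's putBytes(). BlockStore is an abstract class with three implementations: DiskStore for disk, ExternalBlockStore for external block storage, and MemoryStore for memory. Since this article is about the memory management model, we look at MemoryStore. In its putBytes() method, whether level.deserialized is true or false, the call eventually reaches tryToPut(), which handles memory as follows: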

 val enoughMemory = memoryManager.acquireStorageMemory(blockId, size, droppedBlocks)  
       if (enoughMemory) {  
         // We acquired enough memory for the block, so go ahead and put it  
         val entry = new MemoryEntry(value(), size, deserialized)  
         entries.synchronized {  
           entries.put(blockId, entry)  
         }  
         val valuesOrBytes = if (deserialized) "values" else "bytes"  
         logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(  
           blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(blocksMemoryUsed)))  
       } else {  
         // Tell the block manager that we couldn't put it in memory so that it can drop it to  
         // disk if the block allows disk storage.  
         lazy val data = if (deserialized) {  
           Left(value().asInstanceOf[Array[Any]])  
         } else {  
           Right(value().asInstanceOf[ByteBuffer].duplicate())  
         }  
         val droppedBlockStatus = blockManager.dropFromMemory(blockId, () => data)  
         droppedBlockStatus.foreach { status => droppedBlocks += ((blockId, status)) }  
       }  

From this we can see that memoryManager's acquireStorageMemory() method decides whether enough memory is available. Let's first look at StaticMemoryManager's acquireStorageMemory():

 override def acquireStorageMemory(  
       blockId: BlockId,  
       numBytes: Long,  
       evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {  
       
     if (numBytes > maxStorageMemory) { // the request exceeds the storage region's cap: return false, there is not enough memory
       // Fail fast if the block simply won't fit  
       logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +  
         s"memory limit ($maxStorageMemory bytes)")  
       false  
     } else { // otherwise, request the memory through storageMemoryPool.acquireMemory()
       storageMemoryPool.acquireMemory(blockId, numBytes, evictedBlocks)  
     }  
   }  

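Yes, it is that simple. If the requested size exceeds the storage region's maximum, there is not enough memory to store the block; otherwise storageMemoryPool's acquireMemory() is called to allocate it. This fixed cap is where the word "static" shows up. As for storageMemoryPool, which does the actual allocation, we will cover it at the end together with onHeapExecutionMemoryPool and offHeapExecutionMemoryPool from the execution region. For now it is enough to know the concept: it is the memory pool for a given region, a dedicated bookkeeping object that tracks total size, free memory, used memory and so on.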
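The static decision can be sketched as a pure function. This is a simplification under stated assumptions: the storage pool's size is taken to equal maxStorageMemory, and the eviction amount uses the formula math.max(0, numBytes - memoryFree) from StorageMemoryPool. The names `StaticStorageCheck` and `bytesToEvict` are hypothetical.

```scala
// Hypothetical sketch of the static fail-fast check plus the eviction amount.
object StaticStorageCheck {
  // Returns None when the block can never fit under the fixed cap, otherwise
  // Some(bytes that would have to be evicted before the block fits).
  // Assumes the storage pool's size equals maxStorageMemory.
  def bytesToEvict(numBytes: Long, maxStorageMemory: Long, memoryUsed: Long): Option[Long] =
    if (numBytes > maxStorageMemory) {
      None // fail fast: larger than the whole region
    } else {
      val memoryFree = maxStorageMemory - memoryUsed
      Some(math.max(0L, numBytes - memoryFree))
    }
}
```

The cap never moves in this model, which is exactly the limitation the unified manager addresses.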

Now let's look at UnifiedMemoryManager, whose acquireStorageMemory() method is as follows:

 override def acquireStorageMemory(  
       blockId: BlockId,  
       numBytes: Long,  
       evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = synchronized {  
      
     assert(onHeapExecutionMemoryPool.poolSize + storageMemoryPool.poolSize == maxMemory)  
     assert(numBytes >= 0)  
       
     // If the request exceeds maxStorageMemory (the shared maximum minus the memory execution is using), fail fast.
     // Execution and storage are considered together here.
     if (numBytes > maxStorageMemory) {  
       // Fail fast if the block simply won't fit  
       logInfo(s"Will not store $blockId as the required space ($numBytes bytes) exceeds our " +  
         s"memory limit ($maxStorageMemory bytes)")  
       return false  
     }  
       
     // If the request exceeds the free memory currently in the storage pool
     if (numBytes > storageMemoryPool.memoryFree) {  
       // There is not enough free memory in the storage pool, so try to borrow free memory from  
       // the execution pool.  
         
       // Amount borrowed from execution: the smaller of the request and the execution pool's free memory
       val memoryBorrowedFromExecution = Math.min(onHeapExecutionMemoryPool.memoryFree, numBytes)

       // Shrink the execution pool by that amount
       onHeapExecutionMemoryPool.decrementPoolSize(memoryBorrowedFromExecution)

       // Grow the storage pool by the same amount
       storageMemoryPool.incrementPoolSize(memoryBorrowedFromExecution)  
     }  
       
     // Allocate through storageMemoryPool
     storageMemoryPool.acquireMemory(blockId, numBytes, evictedBlocks)  
   }  

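First we need to understand maxStorageMemory. It differs from the one in StaticMemoryManager, where it is a fixed size pre-computed from a fraction and a safety factor. Here it is defined as follows: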

 // maxStorageMemory is the maximum shared by execution and storage minus the memory execution is using
   override def maxStorageMemory: Long = synchronized {  
     maxMemory - onHeapExecutionMemoryPool.memoryUsed  
   }  

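So this maxStorageMemory is the maximum memory shared by the execution and storage regions minus the memory execution has already used. Now let's continue with the analysis.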

First, if the requested size exceeds maxStorageMemory, the method returns false immediately to signal that there is not enough memory. Execution and storage are considered together here.

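Next, if the requested size exceeds memoryFree, the free memory in the storage pool, it computes how much can be borrowed from the execution region: the smaller of the requested size and the execution pool's memoryFree. The execution pool is then shrunk by that amount and the storage pool grown by the same amount, which completes the dynamic adjustment.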

Finally, the allocation itself is done through storageMemoryPool.
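The borrowing steps above can be modelled with plain numbers. The sketch below covers only the pool resizing, not the final acquireMemory() call or any eviction; `UnifiedStorageBorrow`, `Pools` and `resizeForStorage` are hypothetical names introduced for illustration.

```scala
// Hypothetical model of how a storage request resizes the two on-heap pools.
object UnifiedStorageBorrow {
  final case class Pools(execSize: Long, execUsed: Long, storageSize: Long, storageUsed: Long) {
    def execFree: Long = execSize - execUsed
    def storageFree: Long = storageSize - storageUsed
  }

  // Returns None on the fail-fast path, otherwise the pools after any borrowing.
  def resizeForStorage(p: Pools, numBytes: Long): Option[Pools] = {
    val maxMemory = p.execSize + p.storageSize
    val maxStorageMemory = maxMemory - p.execUsed
    if (numBytes > maxStorageMemory) {
      None
    } else if (numBytes > p.storageFree) {
      // Borrow the smaller of execution's free memory and the request.
      val borrowed = math.min(p.execFree, numBytes)
      Some(p.copy(execSize = p.execSize - borrowed, storageSize = p.storageSize + borrowed))
    } else {
      Some(p) // the storage pool already has room
    }
  }
}
```

Note that the sum of the two pool sizes stays constant, which is the invariant the assert at the top of acquireStorageMemory() checks.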

That completes our look at when storage memory is requested and how it is allocated in StaticMemoryManager and UnifiedMemoryManager.

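Now for the execution region. When is its memory requested? We said earlier that it is used by shuffle. Readers need not dig into the details of shuffle here; we will cover them in a dedicated shuffle article. Tracing the code, we find that the request originates in the insertRecord() method of ShuffleExternalSorter. ShuffleExternalSorter is a specialized external sorter that appends incoming records to data pages. When all records have been inserted, or when the current thread's shuffle memory limit is reached, the in-memory records are sorted by partition ID.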

Let's look at its insertRecord() method, which inserts a record into a data page:

 /** 
    * Write a record to the shuffle sorter. 
    */  
   public void insertRecord(Object recordBase, long recordOffset, int length, int partitionId)  
     throws IOException {  
   
     // for tests  
     assert(inMemSorter != null);  
     if (inMemSorter.numRecords() > numElementsForSpillThreshold) {  
       spill();  
     }  
   
     growPointerArrayIfNecessary();  
     // Need 4 bytes to store the record length.  
     final int required = length + 4;  
     acquireNewPageIfNecessary(required);  
   
     assert(currentPage != null);  
     final Object base = currentPage.getBaseObject();  
     final long recordAddress = taskMemoryManager.encodePageNumberAndOffset(currentPage, pageCursor);  
     Platform.putInt(base, pageCursor, length);  
     pageCursor += 4;  
     Platform.copyMemory(recordBase, recordOffset, base, pageCursor, length);  
     pageCursor += length;  
     inMemSorter.insertRecord(recordAddress, partitionId);  
   }  

As we can see, the method calls acquireNewPageIfNecessary(), whose job is to acquire more memory, if necessary, so that one additional record can be inserted. Its implementation is:

 private void acquireNewPageIfNecessary(int required) {  
     if (currentPage == null ||  
       pageCursor + required > currentPage.getBaseOffset() + currentPage.size() ) {  
       // TODO: try to find space in previous pages  
       currentPage = allocatePage(required);  
       pageCursor = currentPage.getBaseOffset();  
       allocatedPages.add(currentPage);  
     }  
   }  

It calls allocatePage(). Tracing further, we find this method in the parent class MemoryConsumer:

 /** 
    * Allocate a memory block with at least `required` bytes. 
    * 
    * Throws IOException if there is not enough memory. 
    * 
    * @throws OutOfMemoryError 
    */  
   protected MemoryBlock allocatePage(long required) {  
     MemoryBlock page = taskMemoryManager.allocatePage(Math.max(pageSize, required), this);  
     if (page == null || page.size() < required) {  
       long got = 0;  
       if (page != null) {  
         got = page.size();  
         taskMemoryManager.freePage(page, this);  
       }  
       taskMemoryManager.showMemoryUsage();  
       throw new OutOfMemoryError("Unable to acquire " + required + " bytes of memory, got " + got);  
     }  
     used += page.size();  
     return page;  
   }  

This in turn calls TaskMemoryManager's allocatePage(), which calls acquireExecutionMemory(), which calls the method of the same name on MemoryManager. So execution memory allocation ultimately lands in MemoryManager's acquireExecutionMemory().
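The page-by-page growth that triggers these requests can be illustrated with a simplified writer. This is only a sketch under assumptions: pages are heap `ByteBuffer`s rather than Spark's MemoryBlock, allocation never fails, and the class `PagedWriter` is hypothetical. It keeps the two ideas from insertRecord(): a 4-byte length prefix per record, and a new page whenever the current one cannot hold the next record.

```scala
import java.nio.ByteBuffer
import scala.collection.mutable.ArrayBuffer

// Hypothetical simplification: pages are heap ByteBuffers, not MemoryBlocks.
class PagedWriter(pageSize: Int) {
  val pages: ArrayBuffer[ByteBuffer] = ArrayBuffer.empty[ByteBuffer]
  private var current: ByteBuffer = null

  // Allocate a new page when there is none or the record does not fit.
  private def acquireNewPageIfNecessary(required: Int): Unit = {
    if (current == null || current.remaining() < required) {
      // Like allocatePage(Math.max(pageSize, required)) in the code above.
      current = ByteBuffer.allocate(math.max(pageSize, required))
      pages += current
    }
  }

  def insertRecord(record: Array[Byte]): Unit = {
    val required = record.length + 4 // 4 bytes store the record length
    acquireNewPageIfNecessary(required)
    current.putInt(record.length)
    current.put(record)
  }
}
```

In Spark each new page is where execution memory is requested from the memory manager; in this sketch it is just a fresh buffer.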

Following the storage analysis above, we again cover the Static and Unified variants in turn. First Static, whose acquireExecutionMemory() is implemented as follows:

 private[memory]  
   override def acquireExecutionMemory(  
       numBytes: Long,  
       taskAttemptId: Long,  
       memoryMode: MemoryMode): Long = synchronized {  
     // Decide how to allocate memory based on the MemoryMode
     memoryMode match {

       // ON_HEAP: allocate the Task's execution memory through onHeapExecutionMemoryPool.acquireMemory
       case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)

       // OFF_HEAP: allocate the Task's execution memory through offHeapExecutionMemoryPool.acquireMemory
       case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)  
     }  
   }  
 }  

The logic of this method is simple: the MemoryMode determines how execution memory is allocated. For on-heap memory (ON_HEAP), the Task's execution memory comes from onHeapExecutionMemoryPool.acquireMemory; for off-heap memory (OFF_HEAP), it comes from offHeapExecutionMemoryPool.acquireMemory.

Now let's look at UnifiedMemoryManager's acquireExecutionMemory() method:

 /** 
    * Try to acquire up to `numBytes` of execution memory for the current task and return the 
    * number of bytes obtained, or 0 if none can be allocated. 
    * 
    * This call may block until there is enough free memory in some situations, to make sure each 
    * task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of 
    * active tasks) before it is forced to spill. This can happen if the number of tasks increase 
    * but an older task had a lot of memory already. 
    */  
   override private[memory] def acquireExecutionMemory(  
       numBytes: Long,  
       taskAttemptId: Long,  
       memoryMode: MemoryMode): Long = synchronized {  
       
     // Ensure the sizes of onHeapExecutionMemoryPool and storageMemoryPool add up to the shared maxMemory
     assert(onHeapExecutionMemoryPool.poolSize + storageMemoryPool.poolSize == maxMemory)  
     assert(numBytes >= 0)  
     memoryMode match {  
       
       // ON_HEAP: allocate the Task's execution memory through onHeapExecutionMemoryPool.acquireMemory
       case MemoryMode.ON_HEAP =>  
   
         /** 
          * Grow the execution pool by evicting cached blocks, thereby shrinking the storage pool.
          * 
          * When acquiring memory for a task, the execution pool may need to make multiple 
          * attempts. Each attempt must be able to evict storage in case another task jumps in 
          * and caches a large block between the attempts. This is called once per attempt. 
          */  
         def maybeGrowExecutionPool(extraMemoryNeeded: Long): Unit = {  
             
           // Extra memory is needed, i.e. the execution pool's current allocation is not enough
           if (extraMemoryNeeded > 0) {  
             // There is not enough free memory in the execution pool, so try to reclaim memory from
             // storage. We can reclaim any free memory from the storage pool. If the storage pool
             // has grown to become larger than `storageRegionSize`, we can evict blocks and reclaim
             // the memory that storage has borrowed from execution.
               
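             // Take the larger of the storage pool's free memory and (storage pool size - storageRegionSize) as memoryReclaimableFromStorage.
             // That is, all free storage memory can always be handed to execution, and if the pool has grown past its initial size by more than its free memory, blocks are evicted to reclaim the difference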
             val memoryReclaimableFromStorage =  
               math.max(storageMemoryPool.memoryFree, storageMemoryPool.poolSize - storageRegionSize)  
               
             // Reclaim only if storage has memory that can be given back
             if (memoryReclaimableFromStorage > 0) {  
               // Only reclaim as much space as is necessary and available:
                 
               // storageMemoryPool.shrinkPoolToFreeSpace shrinks the storage pool and returns the amount reclaimed, spaceReclaimed
               val spaceReclaimed = storageMemoryPool.shrinkPoolToFreeSpace(  
                 math.min(extraMemoryNeeded, memoryReclaimableFromStorage))  
                 
               // Grow onHeapExecutionMemoryPool by spaceReclaimed
               onHeapExecutionMemoryPool.incrementPoolSize(spaceReclaimed)  
             }  
           }  
         }  
   
         /** 
          * The size the execution pool would have after evicting storage memory. 
          * 
          * The execution memory pool divides this quantity among the active tasks evenly to cap 
          * the execution memory allocation for each task. It is important to keep this greater 
          * than the execution pool size, which doesn't take into account potential memory that 
          * could be freed by evicting storage. Otherwise we may hit SPARK-12155. 
          * 
          * Additionally, this quantity should be kept below `maxMemory` to arbitrate fairness 
          * in execution memory allocation across tasks, Otherwise, a task may occupy more than 
          * its fair share of execution memory, mistakenly thinking that other tasks can acquire 
          * the portion of storage memory that cannot be evicted. 
          */  
            
         // Compute the maximum size of the execution pool
         def computeMaxExecutionPoolSize(): Long = {  
           // Shared memory minus the smaller of storage's used memory and storage's initial region size
           maxMemory - math.min(storageMemoryUsed, storageRegionSize)  
         }  
   
         onHeapExecutionMemoryPool.acquireMemory(  
           numBytes, taskAttemptId, maybeGrowExecutionPool, computeMaxExecutionPoolSize)  
   
       // OFF_HEAP: allocate the Task's execution memory through offHeapExecutionMemoryPool.acquireMemory
       case MemoryMode.OFF_HEAP =>  
         // For now, we only support on-heap caching of data, so we do not need to interact with  
         // the storage pool when allocating off-heap memory. This will change in the future, though.  
         offHeapExecutionMemoryPool.acquireMemory(numBytes, taskAttemptId)  
     }  
   }  

UnifiedMemoryManager likewise distinguishes ON_HEAP from OFF_HEAP when allocating execution memory. The OFF_HEAP path is the same as in StaticMemoryManager: the Task's execution memory comes from offHeapExecutionMemoryPool.acquireMemory. ON_HEAP is slightly more involved. It also goes through onHeapExecutionMemoryPool.acquireMemory, as in StaticMemoryManager, but with one special case: if the execution pool does not have enough free memory, it tries to reclaim memory from the storage region. All free memory in the storage pool can be reclaimed, and if the storage pool has grown beyond storageRegionSize, its initial size, blocks can be evicted to take back the memory that storage borrowed from execution.

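Here is the logic of maybeGrowExecutionPool(). If extra memory is needed, i.e. the execution pool's current allocation is not enough, it first computes memoryReclaimableFromStorage as the larger of the storage pool's free memory and the storage pool's size minus its initial size. In other words, all of the storage pool's free memory can always be handed to execution, and if the storage pool has grown past its initial size by more than its free memory, blocks are evicted to reclaim the difference. storageMemoryPool then calls shrinkPoolToFreeSpace to give up spaceReclaimed bytes, and onHeapExecutionMemoryPool grows by the same spaceReclaimed, so one pool grows as the other shrinks.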
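The two formulas involved can be written as small pure functions. This sketch assumes eviction always frees the full amount requested, whereas the real shrinkPoolToFreeSpace returns however much was actually reclaimed. The object `ExecutionGrowth` and its method names are hypothetical.

```scala
// Hypothetical restatement of the reclaim arithmetic; not Spark's code.
object ExecutionGrowth {
  // Upper bound on what execution may take back from storage.
  def reclaimable(storageFree: Long, storagePoolSize: Long, storageRegionSize: Long): Long =
    math.max(storageFree, storagePoolSize - storageRegionSize)

  // Bytes moved from the storage pool to the execution pool for one request,
  // assuming eviction succeeds in full.
  def spaceToReclaim(
      extraNeeded: Long,
      storageFree: Long,
      storagePoolSize: Long,
      storageRegionSize: Long): Long = {
    val available = reclaimable(storageFree, storagePoolSize, storageRegionSize)
    if (extraNeeded > 0 && available > 0) math.min(extraNeeded, available) else 0L
  }

  // Same formula as computeMaxExecutionPoolSize() above.
  def maxExecutionPoolSize(maxMemory: Long, storageUsed: Long, storageRegionSize: Long): Long =
    maxMemory - math.min(storageUsed, storageRegionSize)
}
```

For example, with a storage pool of 700 that started at 500 and has 50 free, up to 200 can be reclaimed, so a request for 120 extra bytes moves 120 and a request for 300 moves only 200.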

Finally, let's look at the pools that actually allocate memory: storageMemoryPool, onHeapExecutionMemoryPool and offHeapExecutionMemoryPool. They are defined in MemoryManager:

 @GuardedBy("this")  
   protected val storageMemoryPool = new StorageMemoryPool(this)  
   @GuardedBy("this")  
   protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")  
   @GuardedBy("this")  
   protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")  

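Their types are StorageMemoryPool and ExecutionMemoryPool respectively. The latter two are separate ExecutionMemoryPool objects with different names, "on-heap execution" and "off-heap execution". All of them extend the abstract class MemoryPool, so let's look at MemoryPool first: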

 /** 
  * Manages bookkeeping for an adjustable-sized region of memory. This class is internal to
  * the [[MemoryManager]]. See subclasses for more details.
  * 
  * @param lock a [[MemoryManager]] instance, used for synchronization. We purposely erase the type 
  *             to `Object` to avoid programming errors, since this object should only be used for 
  *             synchronization purposes. 
  */  
 private[memory] abstract class MemoryPool(lock: Object) {  
   
   @GuardedBy("lock")  
   // Size of the memory pool
   private[this] var _poolSize: Long = 0  
   
   /** 
    * Returns the current size of the pool, in bytes. 
    * Access is synchronized on lock to handle concurrency;
    * the lock is the StaticMemoryManager or UnifiedMemoryManager instance.
    */  
   final def poolSize: Long = lock.synchronized {  
     _poolSize  
   }  
   
   /** 
    * Returns the amount of free memory in the pool, in bytes. 
    * The value is the pool size minus the used memory; access is also synchronized on lock.
    */  
   final def memoryFree: Long = lock.synchronized {  
     _poolSize - memoryUsed  
   }  
   
   /** 
    * Expands the pool by `delta` bytes. 
    * That is, _poolSize += delta; delta must be >= 0.
    */  
   final def incrementPoolSize(delta: Long): Unit = lock.synchronized {  
     require(delta >= 0)  
     _poolSize += delta  
   }  
   
   /** 
    * Shrinks the pool by `delta` bytes. 
    * That is, _poolSize -= delta; delta must be >= 0, at most the current pool size, and at most the pool's free memory.
    */  
   final def decrementPoolSize(delta: Long): Unit = lock.synchronized {  
     require(delta >= 0)  
     require(delta <= _poolSize)  
     require(_poolSize - delta >= memoryUsed)  
     _poolSize -= delta  
   }  
   
   /** 
    * Returns the amount of used memory in this pool (in bytes). 
    * Implemented by subclasses.
    */  
   def memoryUsed: Long  
 }  

It is all quite simple and the code comments say everything, so readers can work through it on their own.
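For readers who want to experiment with the bookkeeping rules, here is a toy, single-threaded version of such a pool. It is a sketch only: the real MemoryPool synchronizes on a shared lock and leaves memoryUsed to subclasses, and the `ToyPool` class with its `use` method is hypothetical.

```scala
// Toy single-threaded pool; the real class synchronizes every access on a lock.
class ToyPool {
  private var size = 0L
  private var used = 0L

  def poolSize: Long = size
  def memoryUsed: Long = used
  def memoryFree: Long = size - used

  // Grow the pool by delta bytes.
  def incrementPoolSize(delta: Long): Unit = {
    require(delta >= 0)
    size += delta
  }

  // Shrink the pool by delta bytes; only free memory can be given up.
  def decrementPoolSize(delta: Long): Unit = {
    require(delta >= 0)
    require(delta <= size)
    require(size - delta >= used)
    size -= delta
  }

  // Hypothetical helper: mark bytes as used if they fit, and report success.
  def use(bytes: Long): Boolean =
    if (bytes >= 0 && bytes <= memoryFree) { used += bytes; true } else false
}
```

Trying to shrink the pool below its used memory fails the third `require`, which is why the unified manager only ever moves free memory between pools, or evicts blocks first.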

Next, let's look at the implementation of StorageMemoryPool:

 /** 
  * Performs bookkeeping for managing an adjustable-size pool of memory that is used for storage
  * (caching).
  * 
  * @param lock a [[MemoryManager]] instance to synchronize on 
  */  
 private[memory] class StorageMemoryPool(lock: Object) extends MemoryPool(lock) with Logging {  
   
   @GuardedBy("lock")  
   // Amount of memory currently in use
   private[this] var _memoryUsed: Long = 0L  
   
   // Returns the used memory, memoryUsed; access is synchronized on lock to handle concurrency.
   // The lock is the StaticMemoryManager or UnifiedMemoryManager instance.
   override def memoryUsed: Long = lock.synchronized {  
     _memoryUsed  
   }  
   
   // The MemoryStore used for in-memory storage
   private var _memoryStore: MemoryStore = _  
   def memoryStore: MemoryStore = {  
     if (_memoryStore == null) {  
       throw new IllegalStateException("memory store not initialized yet")  
     }  
     _memoryStore  
   }  
   
   /** 
    * Set the [[MemoryStore]] used by this manager to evict cached blocks. 
    * This must be set after construction due to initialization ordering constraints. 
    */  
   final def setMemoryStore(store: MemoryStore): Unit = {  
     _memoryStore = store  
   }  
   
   /** 
    * Acquire N bytes of memory to cache the given block, evicting existing ones if necessary. 
    * Blocks evicted in the process, if any, are added to `evictedBlocks`. 
    * @return whether all N bytes were successfully granted.
    */  
   def acquireMemory(  
       blockId: BlockId,  
       numBytes: Long,  
       evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean =   
     // Synchronize on lock to handle concurrency
     lock.synchronized {
     // Bytes that must be freed: the requested size minus the pool's current free memory
     val numBytesToFree = math.max(0, numBytes - memoryFree)
     // Delegate to the overloaded acquireMemory()
     acquireMemory(blockId, numBytes, numBytesToFree, evictedBlocks)  
   }  
   
   /** 
    * Acquire N bytes of storage memory for the given block, evicting existing ones if necessary. 
    * 
    * @param blockId the ID of the block we are acquiring storage memory for 
    * @param numBytesToAcquire the size of this block 
    * @param numBytesToFree the amount of space to be freed through evicting blocks 
    * @return whether all N bytes were successfully granted. 
    */  
   def acquireMemory(  
       blockId: BlockId,  
       numBytesToAcquire: Long,  
       numBytesToFree: Long,  
       evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean = lock.synchronized {  
       
     // The amount to acquire must be >= 0
     assert(numBytesToAcquire >= 0)

     // The amount to free must be >= 0
     assert(numBytesToFree >= 0)

     // Used memory must not exceed the pool size
     assert(memoryUsed <= poolSize)
     if (numBytesToFree > 0) {

       // Call MemoryStore.evictBlocksToFreeSpace() to free numBytesToFree bytes
       memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, evictedBlocks)  
       // Register evicted blocks, if any, with the active task metrics  
       Option(TaskContext.get()).foreach { tc =>  
         val metrics = tc.taskMetrics()  
         val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())  
         metrics.updatedBlocks = Some(lastUpdatedBlocks ++ evictedBlocks.toSeq)  
       }  
     }  
     // NOTE: If the memory store evicts blocks, then those evictions will synchronously call  
     // back into this StorageMemoryPool in order to free memory. Therefore, these variables  
     // should have been updated.  
       
     // Check whether there is enough memory: the requested amount must be <= free memory  
     val enoughMemory = numBytesToAcquire <= memoryFree  
     if (enoughMemory) {  
       // If there is enough memory, increase _memoryUsed by numBytesToAcquire  
       _memoryUsed += numBytesToAcquire  
     }  

     // Return enoughMemory: true if the memory was granted, false otherwise  
     enoughMemory  
   }  
   
   // Release size bytes of memory; synchronized on lock to guard against concurrent access  
   def releaseMemory(size: Long): Unit = lock.synchronized {  
       
     // If size exceeds _memoryUsed, log a warning and reset _memoryUsed to 0  
     if (size > _memoryUsed) {  
       logWarning(s"Attempted to release $size bytes of storage " +  
         s"memory when we only have ${_memoryUsed} bytes")  
       _memoryUsed = 0  
     } else {  
       // Otherwise, subtract size from _memoryUsed  
       _memoryUsed -= size  
     }  
   }  
     
   // Release all memory by resetting _memoryUsed to 0; synchronized on lock to guard against concurrent access  
   def releaseAllMemory(): Unit = lock.synchronized {  
     _memoryUsed = 0  
   }  
   
   /** 
    * Try to shrink the size of this storage memory pool by `spaceToFree` bytes. Return the number 
    * of bytes removed from the pool's capacity. 
    */  
   def shrinkPoolToFreeSpace(spaceToFree: Long): Long = lock.synchronized {  
     // First, shrink the pool by reclaiming free memory:  
     val spaceFreedByReleasingUnusedMemory = math.min(spaceToFree, memoryFree)  
     decrementPoolSize(spaceFreedByReleasingUnusedMemory)  
     val remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory  
     if (remainingSpaceToFree > 0) {  
       // If reclaiming free memory did not adequately shrink the pool, begin evicting blocks:  
       val evictedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]  
       memoryStore.evictBlocksToFreeSpace(None, remainingSpaceToFree, evictedBlocks)  
       val spaceFreedByEviction = evictedBlocks.map(_._2.memSize).sum  
       // When a block is released, BlockManager.dropFromMemory() calls releaseMemory(), so we do  
       // not need to decrement _memoryUsed here. However, we do need to decrement the pool size.  
       decrementPoolSize(spaceFreedByEviction)  
       spaceFreedByReleasingUnusedMemory + spaceFreedByEviction  
     } else {  
       spaceFreedByReleasingUnusedMemory  
     }  
   }  
 }  

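Let's focus on the two acquireMemory() overloads. The first takes three parameters: blockId, numBytes and evictedBlocks. Its logic is simple. Inside lock.synchronized, which guards against concurrent access, it computes numBytesToFree, the amount of memory that has to be freed, as the requested size minus the pool's currently free memory, floored at 0. In other words, it is the shortfall when the free memory cannot satisfy the request. It then calls the overload that takes the extra numBytesToFree parameter.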

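The four-parameter acquireMemory() is also simple. It first runs a few sanity checks on the sizes involved to make sure the request is reasonable. There are three checks:

1. The requested amount numBytesToAcquire must be greater than or equal to 0;

2. The amount to free, numBytesToFree, must be greater than or equal to 0;

3. The used memory memoryUsed must be less than or equal to the pool size poolSize.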

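Then, if some memory has to be freed, i.e. numBytesToFree is greater than 0, it calls MemoryStore's evictBlocksToFreeSpace() to free numBytesToFree bytes, and records any evicted blocks in the metrics of the active task. We will cover MemoryStore in detail later, in the storage management module; a rough idea is enough for now.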

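Finally, it checks whether there is enough memory, i.e. whether the requested amount is less than or equal to the free memory. If so, _memoryUsed is increased by numBytesToAcquire and the method returns true; otherwise it returns false.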
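
The two-step acquire described above can be sketched as a small standalone model. This is only an illustration under stated assumptions: `ToyStoragePool` and its `evict` helper are hypothetical stand-ins for StorageMemoryPool and MemoryStore, blocks are evicted eagerly in insertion order, and Spark's real eviction policy is not modeled.

```scala
import scala.collection.mutable

// Toy model of a storage pool. All names here are illustrative, not Spark classes.
final class ToyStoragePool(val poolSize: Long) {
  private var used = 0L
  // Insertion-ordered map of cached block id -> size, standing in for MemoryStore.
  private val blocks = mutable.LinkedHashMap[String, Long]()

  def memoryUsed: Long = used
  def memoryFree: Long = poolSize - used

  // Assumed policy: drop the oldest blocks until `bytes` are freed or nothing is left.
  private def evict(bytes: Long): Unit = {
    var freed = 0L
    while (freed < bytes && blocks.nonEmpty) {
      val (id, size) = blocks.head
      blocks.remove(id)
      used -= size
      freed += size
    }
  }

  // Same shape as the two overloads: compute the shortfall, evict, then re-check.
  def acquire(blockId: String, numBytes: Long): Boolean = synchronized {
    require(numBytes >= 0)
    val numBytesToFree = math.max(0L, numBytes - memoryFree)
    if (numBytesToFree > 0) evict(numBytesToFree)
    val enoughMemory = numBytes <= memoryFree
    if (enoughMemory) {
      used += numBytes
      blocks(blockId) = numBytes
    }
    enoughMemory
  }
}
```

With a 100-byte pool, storing blocks of 60 and 30 bytes leaves 10 bytes free, so a third request for 50 bytes has a shortfall of 40 bytes and succeeds only after the oldest block is evicted. A request larger than the whole pool fails even after everything has been evicted.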

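Next, let's look at shrinkPoolToFreeSpace(). Its job is to try to shrink the storage pool by spaceToFree bytes and to return the number of bytes actually removed from the pool. The logic is as follows:

1. Take the smaller of spaceToFree and the free memory memoryFree. If the requested spaceToFree exceeds the free memory, at most memoryFree can be reclaimed in this step;

2. Decrease the pool size by that amount, spaceFreedByReleasingUnusedMemory;

3. Compute the part of the request that is still outstanding, remainingSpaceToFree = spaceToFree - spaceFreedByReleasingUnusedMemory;

4. If the outstanding part is greater than 0:

4.1. Call evictBlocksToFreeSpace on the MemoryStore to evict some blocks and free more memory;

4.2. Compute spaceFreedByEviction, the total memSize of the evicted blocks;

4.3. Decrease the pool size by spaceFreedByEviction;

4.4. Return the total amount removed, spaceFreedByReleasingUnusedMemory + spaceFreedByEviction;

5. Otherwise, return spaceFreedByReleasingUnusedMemory.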
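
The arithmetic of these steps can be sketched in isolation. This is a minimal sketch: `ShrinkSketch`, `plan` and `totalShrunk` are hypothetical names, and the amount freed by eviction is passed in as a plain number instead of being produced by a real MemoryStore.

```scala
object ShrinkSketch {
  // Steps 1 and 3: split the request into the part served from free space
  // and the part that would have to come from evicting blocks.
  def plan(spaceToFree: Long, memoryFree: Long): (Long, Long) = {
    val fromFree = math.min(spaceToFree, memoryFree)
    val remaining = spaceToFree - fromFree
    (fromFree, remaining)
  }

  // Steps 4 and 5: total bytes removed from the pool, given how many bytes
  // eviction actually freed (an input here, an assumption of this sketch).
  def totalShrunk(spaceToFree: Long, memoryFree: Long, freedByEviction: Long): Long = {
    val (fromFree, remaining) = plan(spaceToFree, memoryFree)
    if (remaining > 0) fromFree + freedByEviction else fromFree
  }
}
```

For example, with 30 bytes free and a request to shrink by 100 bytes, 30 bytes come from free space and 70 bytes are outstanding. Because whole blocks are evicted, the eviction step may free more or less than 70 bytes, so in the code shown above the returned total is not necessarily equal to spaceToFree.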

Correspondingly, let's now look at the implementation of ExecutionMemoryPool. The code is as follows:

 /** 
  * Implements policies and bookkeeping for sharing a adjustable-sized pool of memory between tasks. 
  * 
  * Tries to ensure that each task gets a reasonable share of memory, instead of some task ramping up 
  * to a large amount first and then causing others to spill to disk repeatedly. 
  * 
  * If there are N tasks, it ensures that each task can acquire at least 1 / 2N of the memory 
  * before it has to spill, and at most 1 / N. Because N varies dynamically, we keep track of the 
  * set of active tasks and redo the calculations of 1 / 2N and 1 / N in waiting tasks whenever this 
  * set changes. This is all done by synchronizing access to mutable state and using wait() and 
  * notifyAll() to signal changes to callers. Prior to Spark 1.6, this arbitration of memory across 
  * tasks was performed by the ShuffleMemoryManager. 
  * 
  * @param lock a [[MemoryManager]] instance to synchronize on 
  * @param poolName a human-readable name for this pool, for use in log messages 
  */  
 private[memory] class ExecutionMemoryPool(  
     lock: Object,  
     poolName: String  
   ) extends MemoryPool(lock) with Logging {  
   
   /** 
    * Map from taskAttemptId -> memory consumption in bytes 
    * (Maps each taskAttemptId to the memory that task currently holds.) 
    */  
   @GuardedBy("lock")  
   private val memoryForTask = new mutable.HashMap[Long, Long]()  
   
   // Total used memory; synchronized on lock to guard against concurrent access  
   override def memoryUsed: Long = lock.synchronized {  
     memoryForTask.values.sum  
   }  
   
   /** 
    * Returns the memory consumption, in bytes, for the given task. 
    * (Returns 0 if the task is not tracked.) 
    */  
   def getMemoryUsageForTask(taskAttemptId: Long): Long = lock.synchronized {  
     memoryForTask.getOrElse(taskAttemptId, 0L)  
   }  
   
   /** 
    * Try to acquire up to `numBytes` of memory for the given task and return the number of bytes 
    * obtained, or 0 if none can be allocated. 
    * 
    * This call may block until there is enough free memory in some situations, to make sure each 
    * task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of 
    * active tasks) before it is forced to spill. This can happen if the number of tasks increase 
    * but an older task had a lot of memory already. 
    * 
    * @param numBytes number of bytes to acquire 
    * @param taskAttemptId the task attempt acquiring memory 
    * @param maybeGrowPool a callback that potentially grows the size of this pool. It takes in 
    *                      one parameter (Long) that represents the desired amount of memory by 
    *                      which this pool should be expanded. 
    * @param computeMaxPoolSize a callback that returns the maximum allowable size of this pool 
    *                           at this given moment. This is not a field because the max pool 
    *                           size is variable in certain cases. For instance, in unified 
    *                           memory management, the execution pool can be expanded by evicting 
    *                           cached blocks, thereby shrinking the storage pool. 
    * 
    * @return the number of bytes granted to the task. 
    */  
   private[memory] def acquireMemory(  
       numBytes: Long,  
       taskAttemptId: Long,  
       maybeGrowPool: Long => Unit = (additionalSpaceNeeded: Long) => Unit,  
       computeMaxPoolSize: () => Long = () => poolSize): Long = lock.synchronized {  
       
     // The requested size numBytes must be greater than 0  
     assert(numBytes > 0, s"invalid number of bytes requested: $numBytes")  
   
     // TODO: clean up this clunky method signature  
   
     // Add this task to the taskMemory map just so we can keep an accurate count of the number  
     // of active tasks, to let other tasks ramp down their memory in calls to `acquireMemory`  
       
     // If memoryForTask does not contain this task yet, add it with 0 bytes and wake up the waiting threads  
     if (!memoryForTask.contains(taskAttemptId)) {  
       memoryForTask(taskAttemptId) = 0L  
       // This will later cause waiting tasks to wake up and check numTasks again  
       lock.notifyAll()  
     }  
   
     // Keep looping until we're either sure that we don't want to grant this request (because this  
     // task would have more than 1 / numActiveTasks of the memory) or we have enough free  
     // memory to give it (we always let each task get at least 1 / (2 * numActiveTasks)).  
     // TODO: simplify this to limit each task to its own slot  
     while (true) {  
       
       // Number of currently active tasks  
       val numActiveTasks = memoryForTask.keys.size  

       // Memory currently held by this task  
       val curMem = memoryForTask(taskAttemptId)  
   
       // In every iteration of this loop, we should first try to reclaim any borrowed execution  
       // space from storage. This is necessary because of the potential race condition where new  
       // storage blocks may steal the free execution memory that this task was waiting for.  
       // maybeGrowPool is the callback passed in by UnifiedMemoryManager (maybeGrowExecutionPool());  
       // it grows the execution pool by evicting cached blocks, which shrinks the storage pool  
       maybeGrowPool(numBytes - memoryFree)  
   
       // Maximum size the pool would have after potentially growing the pool.  
       // This is used to compute the upper bound of how much memory each task can occupy. This  
       // must take into account potential free memory as well as the amount this pool currently  
       // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,  
       // we did not take into account space that could have been freed by evicting cached blocks.  
         
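       // Maximum size the pool could reach, maxPoolSize  
       val maxPoolSize = computeMaxPoolSize()  

       // Upper bound on the memory each task may hold, maxMemoryPerTask  
       val maxMemoryPerTask = maxPoolSize / numActiveTasks  

       // Lower bound each task should reach before it has to wait: poolSize / (2 * numActiveTasks).  
       // Note it is based on poolSize, so it is half of maxMemoryPerTask only when maxPoolSize == poolSize  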
       val minMemoryPerTask = poolSize / (2 * numActiveTasks)  
   
       // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks  
       // The most we may grant this task: the smaller of numBytes and max(0, maxMemoryPerTask - curMem).  
       // If the task already holds maxMemoryPerTask or more, this is 0 and nothing more is granted  
       val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))  
       // Only give it as much memory as is free, which might be none if it reached 1 / numTasks  
       // The amount actually grantable: the smaller of maxToGrant and memoryFree  
       val toGrant = math.min(maxToGrant, memoryFree)  
   
       // We want to let each task get at least 1 / (2 * numActiveTasks) before blocking;  
       // if we can't give it this much now, wait for other tasks to free up memory  
       // (this happens if older tasks allocated lots of memory before N grew)  
       if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) {  
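         // The grant would be smaller than the request, and even with it the task would still hold  
         // less than its guaranteed minimum share, so log a message  
         logInfo(s"TID $taskAttemptId waiting for at least 1/2N of $poolName pool to be free")  
         // Wait on lock (the MemoryManager instance) until another task frees memory  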
         lock.wait()  
       } else {  
         // Increase the memory recorded for this task by toGrant  
         memoryForTask(taskAttemptId) += toGrant  

         // Return the granted amount toGrant  
         return toGrant  
       }  
     }  
     0L  // Never reached  
   }  
   
   /** 
    * Release `numBytes` of memory acquired by the given task. 
    * (At most the amount the task currently holds is released.) 
    */  
   def releaseMemory(numBytes: Long, taskAttemptId: Long): Unit = lock.synchronized {  
       
     // Memory currently held by this task  
     val curMem = memoryForTask.getOrElse(taskAttemptId, 0L)  

     var memoryToFree = if (curMem < numBytes) {// The task holds less memory than it is asked to release  
       // Log a warning  
       logWarning(  
         s"Internal error: release called on $numBytes bytes but task only has $curMem bytes " +  
           s"of memory from the $poolName pool")  
       // Release only what the task actually holds  
       curMem  
     } else {  
       // Otherwise release exactly numBytes  
       numBytes  
     }  
       
     if (memoryForTask.contains(taskAttemptId)) {  
       // Decrease the memory recorded for this task by memoryToFree  
       memoryForTask(taskAttemptId) -= memoryToFree  

       // If the task no longer holds any memory, remove its entry  
       if (memoryForTask(taskAttemptId) <= 0) {  
         memoryForTask.remove(taskAttemptId)  
       }  
     }  
       
     // Wake up all threads waiting on lock, e.g. those blocked in acquireMemory()  
     lock.notifyAll() // Notify waiters in acquireMemory() that memory has been freed  
   }  
   
   /** 
    * Release all memory for the given task and mark it as inactive (e.g. when a task ends). 
    * @return the number of bytes freed. 
    * (Implemented by releasing exactly the amount the task currently holds.) 
    */  
   def releaseAllMemoryForTask(taskAttemptId: Long): Long = lock.synchronized {  
       
     // Memory currently held by the given task  
     val numBytesToFree = getMemoryUsageForTask(taskAttemptId)  

     // Release numBytesToFree bytes for the given task  
     releaseMemory(numBytesToFree, taskAttemptId)  

     // Return the number of bytes freed  
     numBytesToFree  
   }  
   
 }  

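One data structure here is especially important: memoryForTask, which maps each taskAttemptId to the amount of memory that task currently holds. The pool's memoryUsed is simply the sum of its values.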
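
The bookkeeping done on this map when memory is released can be sketched on its own. This is a minimal sketch: `TaskLedgerSketch` and `release` are hypothetical names, and the locking, logging and notifyAll() of the real method are left out.

```scala
import scala.collection.mutable

object TaskLedgerSketch {
  // Releases up to numBytes for a task and returns the amount actually released.
  // A task can never release more than it holds, and an entry that drops to 0
  // is removed so the task no longer counts as active.
  def release(ledger: mutable.Map[Long, Long], taskId: Long, numBytes: Long): Long = {
    val curMem = ledger.getOrElse(taskId, 0L)
    val memoryToFree = math.min(curMem, numBytes)
    if (ledger.contains(taskId)) {
      ledger(taskId) = curMem - memoryToFree
      if (ledger(taskId) <= 0) ledger.remove(taskId)
    }
    memoryToFree
  }
}
```

Removing empty entries matters because the number of keys in the map is what acquireMemory() uses as the number of active tasks.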

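Again, let's focus on the acquireMemory() method. Its main logic is:

1. Validate that the requested size numBytes is greater than 0;

2. If memoryForTask does not contain this task, add it with an initial value of 0 and wake up the other waiting threads;

3. In a loop:

3.1. Get the number of currently active tasks, numActiveTasks;

3.2. Get the memory this task currently holds, curMem;

3.3. Call maybeGrowPool with numBytes - memoryFree. This is the maybeGrowExecutionPool() method passed in by UnifiedMemoryManager, which grows the execution pool by evicting cached blocks and thereby shrinks the storage pool;

3.4. Compute the maximum size the pool could reach, maxPoolSize;

3.5. Compute the upper bound on memory per task, maxMemoryPerTask = maxPoolSize / numActiveTasks;

3.6. Compute the lower bound on memory per task, minMemoryPerTask = poolSize / (2 * numActiveTasks). It is half of maxMemoryPerTask only when maxPoolSize equals poolSize;

3.7. Compute maxToGrant, the most we may grant this task: the smaller of numBytes and max(0, maxMemoryPerTask - curMem). That is, if the task already holds maxMemoryPerTask or more, the result is 0 and nothing more is granted; otherwise it is the smaller of the task's remaining headroom and the requested amount;

3.8. Compute toGrant, the amount that can actually be granted: the smaller of maxToGrant and memoryFree;

3.9. If toGrant is less than the requested numBytes, and curMem plus toGrant is still less than minMemoryPerTask, log a message and wait on lock, i.e. on the MemoryManager. Otherwise, increase the task's entry in memoryForTask by toGrant and return toGrant, which leaves the loop.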
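
The per-iteration decision in steps 3.5 to 3.9 can be sketched as a pure function. This is a minimal sketch: `GrantSketch.decide` is a hypothetical name, the pool state is passed in as plain numbers, and `None` stands for the case where the real method would block in lock.wait().

```scala
object GrantSketch {
  // Returns Some(bytes granted), or None when the caller would have to wait.
  def decide(
      numBytes: Long,       // bytes requested
      curMem: Long,         // bytes this task already holds
      memoryFree: Long,     // free bytes in the execution pool
      poolSize: Long,       // current pool size
      maxPoolSize: Long,    // size the pool could grow to
      numActiveTasks: Int): Option[Long] = {
    val maxMemoryPerTask = maxPoolSize / numActiveTasks
    val minMemoryPerTask = poolSize / (2 * numActiveTasks)
    // Cap the grant at the task's remaining headroom, then at what is free.
    val maxToGrant = math.min(numBytes, math.max(0L, maxMemoryPerTask - curMem))
    val toGrant = math.min(maxToGrant, memoryFree)
    if (toGrant < numBytes && curMem + toGrant < minMemoryPerTask) None
    else Some(toGrant)
  }
}
```

With a 1000-byte pool and 4 active tasks, each task is capped at 250 bytes and guaranteed 125 bytes. A new task asking for 300 bytes with 500 bytes free is granted 250; the same task asking for 100 bytes with only 50 bytes free has to wait, because 50 bytes would leave it below the 125-byte minimum; a task already holding 250 bytes is granted 0 and is expected to spill.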
