6.1 MemoryManager

1. Overview

MemoryManager is the abstract class for memory management. TaskMemoryManager does its actual memory management work through a concrete implementation of MemoryManager.

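MemoryManager is mainly concerned with dividing an Executor's memory between execution and storage. Execution memory is the intermediate working space that Tasks need while computing, which covers shuffles, joins, sorts and aggregations. Storage memory is mainly the space used when data is cached through persist() or cache(), plus the space used to propagate internal data across the cluster.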

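MemoryManager is essentially a bookkeeping module. Like a calculator, it works out how much memory may be granted, measured in bytes and stored as a Long. Once that number is returned to the layer above, the actual allocation and use of the space is carried out by BlockManager.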
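
The bookkeeping idea can be illustrated with a minimal sketch. ToyMemoryPool below is a hypothetical class written for this note, not Spark's own pool implementation: it only tracks byte counts as Long values and never allocates memory itself.

```scala
// Hypothetical, simplified model for illustration; NOT a Spark class.
// It only does arithmetic on byte counts: no memory is allocated here.
final class ToyMemoryPool(val poolSize: Long) {
  private var used: Long = 0L

  def memoryUsed: Long = synchronized { used }
  def memoryFree: Long = synchronized { poolSize - used }

  /** Grants up to numBytes and returns the number of bytes actually granted. */
  def acquire(numBytes: Long): Long = synchronized {
    require(numBytes >= 0, "numBytes must be non-negative")
    val granted = math.min(numBytes, poolSize - used)
    used += granted
    granted
  }

  /** Returns previously granted bytes to the pool. */
  def release(numBytes: Long): Unit = synchronized {
    require(numBytes >= 0 && numBytes <= used, "cannot release more than is in use")
    used -= numBytes
  }
}
```

The caller receives a number and decides what to do with it, which mirrors the split described above: the manager counts, another component allocates.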

persist() and cache() share the same underlying code. The difference is that persist() lets you choose the storage level.

The source code describes it as follows:

/**
 * An abstract memory manager that enforces how memory is shared between execution and storage.
 *
 * In this context, execution memory refers to that used for computation in shuffles, joins,
 * sorts and aggregations, while storage memory refers to that used for caching and propagating
 * internal data across the cluster. There exists one MemoryManager per JVM.
 */
private[spark] abstract class MemoryManager(
    conf: SparkConf,
    numCores: Int,
    storageMemory: Long,
    onHeapExecutionMemory: Long) extends Logging

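Here storageMemory is a Long giving the number of bytes of memory available for storage. The comment also notes that there is exactly one MemoryManager per JVM, so it effectively behaves as a singleton that is created once at initialization.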

2. Core methods

2.1 acquireStorageMemory

Requests a region of memory for a Block.

  /**
   * Acquire N bytes of memory to cache the given block, evicting existing ones if necessary.
   * Blocks evicted in the process, if any, are added to `evictedBlocks`.
   * @return whether all N bytes were successfully granted.
   */
  def acquireStorageMemory(
      blockId: BlockId,
      numBytes: Long,
      evictedBlocks: mutable.Buffer[(BlockId, BlockStatus)]): Boolean
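  • Called on the concrete implementation, UnifiedMemoryManager, to reserve memory in advance
  • Called indirectly by MemoryStore, through acquireUnrollMemory, to request that reserved memory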
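
The contract in the Scaladoc above (grant N bytes, evict existing blocks if necessary, report the evictions through `evictedBlocks`, return a Boolean) can be sketched with a toy model. Everything here is an assumption made for illustration: ToyStorageMemory is hypothetical, block IDs are plain strings rather than BlockId, the evicted entry carries the block size rather than a BlockStatus, and the oldest-first eviction order is a simplification, not necessarily what Spark does.

```scala
import scala.collection.mutable

// Hypothetical toy model for illustration; NOT Spark's implementation.
final class ToyStorageMemory(maxBytes: Long) {
  // Insertion order doubles as block age, so `head` is the oldest block.
  private val blocks = mutable.LinkedHashMap.empty[String, Long]
  private var used = 0L

  def memoryUsed: Long = synchronized { used }

  /** Returns true if all numBytes were granted; evicted blocks are appended to evictedBlocks. */
  def acquireStorageMemory(
      blockId: String,
      numBytes: Long,
      evictedBlocks: mutable.Buffer[(String, Long)]): Boolean = synchronized {
    require(!blocks.contains(blockId), s"block $blockId is already stored")
    if (numBytes < 0 || numBytes > maxBytes) {
      false // the request can never fit, so nothing is evicted
    } else {
      // Evict oldest blocks until the request fits.
      while (maxBytes - used < numBytes && blocks.nonEmpty) {
        val (oldId, oldSize) = blocks.head
        blocks.remove(oldId)
        used -= oldSize
        evictedBlocks += ((oldId, oldSize))
      }
      blocks(blockId) = numBytes
      used += numBytes
      true
    }
  }
}
```

Passing the buffer in as a parameter, as the real signature does, lets the caller learn which blocks were dropped without changing the Boolean return type.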

2.2 acquireExecutionMemory

  /**
   * Try to acquire up to `numBytes` of execution memory for the current task and return the
   * number of bytes obtained, or 0 if none can be allocated.
   *
   * This call may block until there is enough free memory in some situations, to make sure each
   * task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
   * active tasks) before it is forced to spill. This can happen if the number of tasks increase
   * but an older task had a lot of memory already.
   */
  private[memory]
  def acquireExecutionMemory(
      numBytes: Long,
      taskAttemptId: Long,
      memoryMode: MemoryMode): Long
  • Called by TaskMemoryManager to obtain memory for computation and for holding temporary results
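
The 1 / 2N rule in the comment above can be made concrete with a small sketch. The object and method names below are hypothetical, and the logic only models the decision the comment describes (grant now, or make the task wait until it can reach its minimum share). It is not Spark's actual code and it leaves out the blocking and any upper limit per task.

```scala
// Hypothetical sketch of the 1 / 2N minimum-share rule; NOT Spark's implementation.
object ExecutionShareSketch {
  /** Minimum share a task may reach before it is forced to spill: poolSize / (2 * N). */
  def minSharePerTask(poolSize: Long, numActiveTasks: Int): Long = {
    require(numActiveTasks > 0, "need at least one active task")
    poolSize / (2L * numActiveTasks)
  }

  /**
   * Some(bytes) if the request can be answered now (possibly with fewer bytes, or 0),
   * None if the task is still below its minimum share and would have to wait.
   */
  def tryGrant(
      numBytes: Long,
      taskUsed: Long,
      free: Long,
      poolSize: Long,
      numActiveTasks: Int): Option[Long] = {
    val toGrant = math.min(numBytes, math.max(0L, free))
    val minShare = minSharePerTask(poolSize, numActiveTasks)
    if (toGrant < numBytes && taskUsed + toGrant < minShare) None
    else Some(toGrant)
  }
}
```

With a 1000-byte pool and 4 active tasks the minimum share is 125 bytes. A task holding nothing that asks for 100 bytes when only 50 are free would wait, while a task that already holds 200 bytes simply receives whatever is free, possibly 0, and is expected to spill.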

2.3 Spark's default memory page

Spark's memory management uses a kernel-like scheme of MemoryAddress -> MemoryTable -> MemoryPage, and the page is the smallest unit of allocation.

  /**
   * The default page size, in bytes.
   *
   * If user didn't explicitly set "spark.buffer.pageSize", we figure out the default value
   * by looking at the number of cores available to the process, and the total amount of memory,
   * and then divide it by a factor of safety.
   */
  val pageSizeBytes: Long = {
    val minPageSize = 1L * 1024 * 1024   // 1MB
    val maxPageSize = 64L * minPageSize  // 64MB
    val cores = if (numCores > 0) numCores else Runtime.getRuntime.availableProcessors()
    // Because of rounding to next power of 2, we may have safetyFactor as 8 in worst case
    val safetyFactor = 16
    val maxTungstenMemory: Long = tungstenMemoryMode match {
      case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.poolSize
      case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize
    }
    val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
    val default = math.min(maxPageSize, math.max(minPageSize, size))
    conf.getSizeAsBytes("spark.buffer.pageSize", default)
  }
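
To see what the formula above yields in practice, here is a standalone sketch that repeats the same arithmetic outside Spark. PageSizeSketch and its methods are hypothetical names. nextPowerOf2 is a local re-implementation that rounds up to a power of two and is assumed to match the behaviour of ByteArrayMethods.nextPowerOf2. The "spark.buffer.pageSize" override is left out.

```scala
// Hypothetical standalone re-computation of the default page size; NOT Spark code.
object PageSizeSketch {
  /** Rounds num up to the next power of two (returns num if it already is one). */
  def nextPowerOf2(num: Long): Long = {
    val highBit = java.lang.Long.highestOneBit(num)
    if (highBit == num) num else highBit << 1
  }

  /** Same arithmetic as pageSizeBytes, without the configuration override. */
  def defaultPageSize(maxTungstenMemory: Long, cores: Int): Long = {
    require(cores > 0, "cores must be positive")
    val minPageSize = 1L * 1024 * 1024  // 1MB
    val maxPageSize = 64L * minPageSize // 64MB
    val safetyFactor = 16
    val size = nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
    math.min(maxPageSize, math.max(minPageSize, size))
  }
}
```

For example, a 4 GB execution pool with 8 cores gives 4 GB / 8 / 16 = 32 MB, which is already a power of two and lies inside the 1 MB to 64 MB range, so the page size is 32 MB. A 100 MB pool with 8 cores is clamped up to the 1 MB minimum, and a 64 GB pool with 4 cores is clamped down to the 64 MB maximum.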
