Image source for this chapter: https://www.ibm.com/developerworks/cn/analytics/library/ba-cn-apache-spark-memory-management/index.html
1. Why Spark manages memory itself
From the code we walked through earlier, an RDD is internally made up of a sequence of Partitions, and each Partition is actually backed by a Block.
Spark relies on the BlockManager on each Executor to manage the storage level of the actual data. To trade space for speed, the typical strategy is to cache as much data in memory as possible while steering around the pitfalls of JVM garbage collection. Spark therefore splits memory management out into a dedicated service: every SparkEnv starts a MemoryManager that does unified memory accounting.
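As a quick illustration from the application side (a spark-shell snippet; `rdd_<rddId>_<partitionIndex>` is the naming the BlockManager uses for cached RDD partitions), the storage level that the MemoryManager ultimately has to serve is requested when an RDD is persisted:

import org.apache.spark.storage.StorageLevel

// `sc` is the SparkContext that spark-shell provides.
// Each cached partition becomes a block named rdd_<rddId>_<partitionIndex> inside the
// executor's BlockManager; the StorageLevel asks for it to be kept in memory only.
val numbers = sc.parallelize(1 to 1000000, numSlices = 8)
  .persist(StorageLevel.MEMORY_ONLY)

numbers.count()  // first action materializes and caches the blocks
numbers.count()  // second action is served from the cached blocks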
According to a talk from Duke University, Spark started this effort mainly because the JVM parameters were hard to tune across different workload types (offline analytics, streaming, machine learning, SQL), and the performance gap could reach 2x.
Spark 1.6 therefore set out to build a memory-management layer on top of the JVM.
Under this model, Spark implements a memory-management module that adapts to different workloads while still sitting on top of the JVM's garbage collector (typically G1).
2. Internal structure of MemoryManager
The MemoryManager is mainly responsible for the memory used for execution and the memory used for storage. Constructing it requires:
- numCores: the number of cores allocated locally
- storageMemory: the number of bytes reserved for storage
- onHeapExecutionMemory: the number of bytes reserved for execution; this is the on-heap part, because off-heap memory comes from Tachyon's memory space and is outside the JVM
/**
* An abstract memory manager that enforces how memory is shared between execution and storage.
*
* In this context, execution memory refers to that used for computation in shuffles, joins,
* sorts and aggregations, while storage memory refers to that used for caching and propagating
* internal data across the cluster. There exists one MemoryManager per JVM.
*/
private[spark] abstract class MemoryManager(
    conf: SparkConf,
    numCores: Int,
    storageMemory: Long,
    onHeapExecutionMemory: Long)
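For context, a sketch of how the concrete manager is chosen (simplified from Spark 1.6's SparkEnv; the constructor and apply signatures are abbreviated here and may differ slightly from the exact source):

// Simplified sketch: SparkEnv picks a concrete MemoryManager based on a config flag.
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)   // pre-1.6 static storage/execution split
  } else {
    UnifiedMemoryManager(conf, numUsableCores)      // 1.6+ unified model
  }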
The MemoryManager maintains the corresponding memory pools:
// -- Methods related to memory allocation policies and bookkeeping ------------------------------
@GuardedBy("this")
protected val storageMemoryPool = new StorageMemoryPool(this)
@GuardedBy("this")
protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")
@GuardedBy("this")
protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")
storageMemoryPool.incrementPoolSize(storageMemory)
onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)
offHeapExecutionMemoryPool.incrementPoolSize(conf.getSizeAsBytes("spark.memory.offHeap.size", 0))
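As a rough, self-contained sketch of what such a pool does (the class and method names below are illustrative and deliberately simplified, not the real StorageMemoryPool/ExecutionMemoryPool), each pool is little more than a capacity counter and a usage counter guarded by the manager's lock, which is what the @GuardedBy("this") annotations above express:

// A deliberately simplified stand-in for Spark's memory pools, for illustration only.
class SimpleMemoryPool(lock: AnyRef, val name: String) {
  private var _poolSize: Long = 0L    // capacity granted to this pool
  private var _memoryUsed: Long = 0L  // bytes currently handed out

  def poolSize: Long = lock.synchronized { _poolSize }
  def memoryUsed: Long = lock.synchronized { _memoryUsed }
  def memoryFree: Long = lock.synchronized { _poolSize - _memoryUsed }

  // Grow the pool, e.g. at construction time (as incrementPoolSize is used above) or
  // when a unified manager shifts capacity between storage and execution.
  def incrementPoolSize(delta: Long): Unit = lock.synchronized { _poolSize += delta }

  // Grant at most `numBytes`, bounded by the space still free in this pool.
  def acquire(numBytes: Long): Long = lock.synchronized {
    val granted = math.min(numBytes, _poolSize - _memoryUsed)
    _memoryUsed += granted
    granted
  }

  // Return memory to the pool.
  def release(numBytes: Long): Unit = lock.synchronized {
    _memoryUsed = math.max(0L, _memoryUsed - numBytes)
  }
}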
3. Core methods of memory management
In the newer unified memory manager model, the earlier storage + execution split is subdivided further.
After this subdivision, a new concept shows up inside the storage region: unroll. Readers who have worked on the Linux kernel will find the mechanism easy to grasp; it is essentially the buddy-style idea of reserving memory pages, brought to the JVM.
The concept is a lot like going to a restaurant: we may not know exactly how many people are coming, but we can grab seats based on a rough estimate, and those seats must be contiguous, because my friends and I have to sit at the same table. Unroll is exactly such a model for reserving contiguous memory in advance, which improves execution efficiency: it lays out data blocks that may be scattered on disk (in practice the ext4 file system underneath HDFS tries to write them contiguously) into one contiguous region of memory.
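Below is a self-contained sketch of that idea, assuming two hypothetical helpers, estimateSize and reserveUnrollMemory (the real logic lives in MemoryStore and is more involved): materialize the iterator piece by piece, keep enlarging the reservation ahead of demand, and if a reservation is refused, stop and let the caller fall back to another storage level instead of exhausting the heap.

import scala.collection.mutable.ArrayBuffer

object UnrollSketch {
  // Simplified illustration of unrolling; `estimateSize` and `reserveUnrollMemory`
  // are hypothetical callbacks standing in for Spark's size estimation and the
  // MemoryManager's acquireUnrollMemory.
  def tryUnroll[T](values: Iterator[T],
                   estimateSize: ArrayBuffer[T] => Long,
                   reserveUnrollMemory: Long => Boolean): Option[ArrayBuffer[T]] = {
    val initialReservation = 1L * 1024 * 1024  // start by reserving 1 MB (assumed)
    val checkPeriod = 16                       // re-estimate every 16 elements
    if (!reserveUnrollMemory(initialReservation)) return None

    var reserved = initialReservation
    val buffer = new ArrayBuffer[T]
    var count = 0L
    var keepUnrolling = true

    while (values.hasNext && keepUnrolling) {
      buffer += values.next()
      count += 1
      if (count % checkPeriod == 0) {
        val currentSize = estimateSize(buffer)
        if (currentSize > reserved) {
          // Ask for roughly double the current footprint so we do not have to
          // re-reserve on every element; stop unrolling if the request is refused.
          val extra = currentSize * 2 - reserved
          keepUnrolling = reserveUnrollMemory(extra)
          if (keepUnrolling) reserved += extra
        }
      }
    }
    if (keepUnrolling) Some(buffer) else None
  }
}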
The following three methods each acquire memory from a different memory pool:
/**
* Acquire N bytes of memory to cache the given block, evicting existing ones if necessary.
* Blocks evicted in the process, if any, are added to `evictedBlocks`.
* @return whether all N bytes were successfully granted.
*/
def acquireStorageMemory
/**
* Acquire N bytes of memory to unroll the given block, evicting existing ones if necessary.
*
* This extra method allows subclasses to differentiate behavior between acquiring storage
* memory and acquiring unroll memory. For instance, the memory management model in Spark
* 1.5 and before places a limit on the amount of space that can be freed from unrolling.
* Blocks evicted in the process, if any, are added to `evictedBlocks`.
*
* @return whether all N bytes were successfully granted.
*/
def acquireUnrollMemory
/**
* Try to acquire up to `numBytes` of execution memory for the current task and return the
* number of bytes obtained, or 0 if none can be allocated.
*
* This call may block until there is enough free memory in some situations, to make sure each
* task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
* active tasks) before it is forced to spill. This can happen if the number of tasks increase
* but an older task had a lot of memory already.
*/
private[memory]
def acquireExecutionMemory
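The 1/2N guarantee described in the comment above can be made concrete with a small standalone calculation (plain Scala, e.g. in a REPL; this is not Spark code):

// Fairness bounds for execution memory: with N active tasks, a task may grow up to
// 1/N of the pool and is guaranteed at least 1/(2N) before it can be forced to spill.
def executionMemoryBounds(poolSize: Long, numActiveTasks: Int): (Long, Long) = {
  val maxPerTask = poolSize / numActiveTasks          // upper bound: 1 / N
  val minPerTask = poolSize / (2L * numActiveTasks)   // guaranteed share: 1 / 2N
  (minPerTask, maxPerTask)
}

// Example: a 1 GB execution pool shared by 4 active tasks gives each task a ceiling of
// 256 MB and a guaranteed 128 MB before spilling can be forced.
executionMemoryBounds(1024L * 1024 * 1024, 4)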
Matching acquisition, there are methods for querying the memory already acquired and for releasing it:
/**
* Execution memory currently in use, in bytes.
*/
final def executionMemoryUsed: Long
/**
* Release N bytes of storage memory.
*/
def releaseStorageMemory(numBytes: Long): Unit
Spark's memory management has one more concept that closely mirrors the kernel's: the memory page. In the kernel, memory pages are managed with the buddy algorithm; Spark is similar in that an allocation hands out a contiguous run of 2^n pages to the block that needs it. The default page size in the Linux kernel is usually 4 KB, whereas in Spark the page size defaults to at least 1 MB (and at most 64 MB, as computed below).
/**
* The default page size, in bytes.
*
* If user didn't explicitly set "spark.buffer.pageSize", we figure out the default value
* by looking at the number of cores available to the process, and the total amount of memory,
* and then divide it by a factor of safety.
*/
val pageSizeBytes: Long = {
  val minPageSize = 1L * 1024 * 1024   // 1MB
  val maxPageSize = 64L * minPageSize  // 64MB
  val cores = if (numCores > 0) numCores else Runtime.getRuntime.availableProcessors()
  // Because of rounding to next power of 2, we may have safetyFactor as 8 in worst case
  val safetyFactor = 16
  val maxTungstenMemory: Long = tungstenMemoryMode match {
    case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.poolSize
    case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize
  }
  val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
  val default = math.min(maxPageSize, math.max(minPageSize, size))
  conf.getSizeAsBytes("spark.buffer.pageSize", default)
}
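To make the formula concrete, here is a standalone re-run of the same computation with assumed inputs of 2 GB of on-heap execution memory and 4 cores (nextPowerOf2 is re-implemented locally because ByteArrayMethods is a Spark-internal class):

// Round up to the next power of two; a local stand-in for ByteArrayMethods.nextPowerOf2.
def nextPowerOf2(n: Long): Long =
  if (n <= 1) 1L else java.lang.Long.highestOneBit(n - 1) << 1

val maxTungstenMemory = 2L * 1024 * 1024 * 1024   // assume a 2 GB execution pool
val cores = 4                                     // assume 4 usable cores
val safetyFactor = 16
val minPageSize = 1L * 1024 * 1024
val maxPageSize = 64L * minPageSize

// 2 GB / 4 / 16 = 32 MB, already a power of two, and within [1 MB, 64 MB]
val size = nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
val pageSize = math.min(maxPageSize, math.max(minPageSize, size))
println(pageSize / (1024 * 1024) + " MB")         // prints "32 MB"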