Image source for this chapter: https://www.ibm.com/developerworks/cn/analytics/library/ba-cn-apache-spark-memory-management/index.html
1. Why Spark manages memory itself
From the code we walked through earlier, an RDD is internally made up of a sequence of Partitions, and each Partition is actually backed by a Block.
Spark relies on the BlockManager on each Executor to manage the storage level of the actual data. To trade space for speed, the typical strategy is to cache as much data in memory as possible while steering around the pitfalls of JVM garbage collection. Spark therefore splits memory management out into a dedicated service: every SparkEnv starts a MemoryManager that does unified memory accounting.
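As a quick illustration from the application side (a spark-shell snippet; `rdd_<rddId>_<partitionIndex>` is the naming the BlockManager uses for cached RDD partitions), the storage level that the MemoryManager ultimately has to serve is requested when an RDD is persisted:

import org.apache.spark.storage.StorageLevel

// `sc` is the SparkContext that spark-shell provides.
// Each cached partition becomes a block named rdd_<rddId>_<partitionIndex> inside the
// executor's BlockManager; the StorageLevel asks for it to be kept in memory only.
val numbers = sc.parallelize(1 to 1000000, numSlices = 8)
  .persist(StorageLevel.MEMORY_ONLY)

numbers.count()  // first action materializes and caches the blocks
numbers.count()  // second action is served from the cached blocks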
According to a talk from Duke University, Spark started this effort mainly because the JVM parameters were hard to tune across different workload types (offline analytics, streaming, machine learning, SQL), and the performance gap could reach 2x.
Spark 1.6 therefore set out to build a memory-management layer on top of the JVM.
Under this model, Spark implements a memory-management module that adapts to different workloads while still sitting on top of the JVM's garbage collector (typically G1).
2. Internal structure of MemoryManager
The MemoryManager is mainly responsible for the memory used for execution and the memory used for storage. Constructing it requires:
- numCores: the number of cores allocated locally
- storageMemory: the number of bytes reserved for storage
- onHeapExecutionMemory: the number of bytes reserved for execution; this is the on-heap part, because off-heap memory comes from Tachyon's memory space and is outside the JVM
/**
* An abstract memory manager that enforces how memory is shared between execution and storage.
*
* In this context, execution memory refers to that used for computation in shuffles, joins,
* sorts and aggregations, while storage memory refers to that used for caching and propagating
* internal data across the cluster. There exists one MemoryManager per JVM.
*/
private[spark] abstract class MemoryManager(
    conf: SparkConf,
    numCores: Int,
    storageMemory: Long,
    onHeapExecutionMemory: Long)
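For context, a sketch of how the concrete manager is chosen (simplified from Spark 1.6's SparkEnv; the constructor and apply signatures are abbreviated here and may differ slightly from the exact source):

// Simplified sketch: SparkEnv picks a concrete MemoryManager based on a config flag.
val useLegacyMemoryManager = conf.getBoolean("spark.memory.useLegacyMode", false)
val memoryManager: MemoryManager =
  if (useLegacyMemoryManager) {
    new StaticMemoryManager(conf, numUsableCores)   // pre-1.6 static storage/execution split
  } else {
    UnifiedMemoryManager(conf, numUsableCores)      // 1.6+ unified model
  }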
The MemoryManager maintains the corresponding memory pools:
// -- Methods related to memory allocation policies and bookkeeping ------------------------------
@GuardedBy("this")
protected val storageMemoryPool = new StorageMemoryPool(this)
@GuardedBy("this")
protected val onHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "on-heap execution")
@GuardedBy("this")
protected val offHeapExecutionMemoryPool = new ExecutionMemoryPool(this, "off-heap execution")
storageMemoryPool.incrementPoolSize(storageMemory)
onHeapExecutionMemoryPool.incrementPoolSize(onHeapExecutionMemory)
offHeapExecutionMemoryPool.incrementPoolSize(conf.getSizeAsBytes("spark.memory.offHeap.size", 0))
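As a rough, self-contained sketch of what such a pool does (the class and method names below are illustrative and deliberately simplified, not the real StorageMemoryPool/ExecutionMemoryPool), each pool is little more than a capacity counter and a usage counter guarded by the manager's lock, which is what the @GuardedBy("this") annotations above express:

// A deliberately simplified stand-in for Spark's memory pools, for illustration only.
class SimpleMemoryPool(lock: AnyRef, val name: String) {
  private var _poolSize: Long = 0L    // capacity granted to this pool
  private var _memoryUsed: Long = 0L  // bytes currently handed out

  def poolSize: Long = lock.synchronized { _poolSize }
  def memoryUsed: Long = lock.synchronized { _memoryUsed }
  def memoryFree: Long = lock.synchronized { _poolSize - _memoryUsed }

  // Grow the pool, e.g. at construction time (as incrementPoolSize is used above) or
  // when a unified manager shifts capacity between storage and execution.
  def incrementPoolSize(delta: Long): Unit = lock.synchronized { _poolSize += delta }

  // Grant at most `numBytes`, bounded by the space still free in this pool.
  def acquire(numBytes: Long): Long = lock.synchronized {
    val granted = math.min(numBytes, _poolSize - _memoryUsed)
    _memoryUsed += granted
    granted
  }

  // Return memory to the pool.
  def release(numBytes: Long): Unit = lock.synchronized {
    _memoryUsed = math.max(0L, _memoryUsed - numBytes)
  }
}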
3. Core methods of memory management
In the newer unified memory manager model, the earlier storage + execution split is subdivided further.
After this subdivision, a new concept shows up inside the storage region: unroll. Readers who have worked on the Linux kernel will find the mechanism easy to grasp; it is essentially the buddy-style idea of reserving memory pages, brought to the JVM.
The concept is a lot like going to a restaurant: we may not know exactly how many people are coming, but we can grab seats based on a rough estimate, and those seats must be contiguous, because my friends and I have to sit at the same table. Unroll is exactly such a model for reserving contiguous memory in advance, which improves execution efficiency: it lays out data blocks that may be scattered on disk (in practice the ext4 file system underneath HDFS tries to write them contiguously) into one contiguous region of memory.
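Below is a self-contained sketch of that idea, assuming two hypothetical helpers, estimateSize and reserveUnrollMemory (the real logic lives in MemoryStore and is more involved): materialize the iterator piece by piece, keep enlarging the reservation ahead of demand, and if a reservation is refused, stop and let the caller fall back to another storage level instead of exhausting the heap.

import scala.collection.mutable.ArrayBuffer

object UnrollSketch {
  // Simplified illustration of unrolling; `estimateSize` and `reserveUnrollMemory`
  // are hypothetical callbacks standing in for Spark's size estimation and the
  // MemoryManager's acquireUnrollMemory.
  def tryUnroll[T](values: Iterator[T],
                   estimateSize: ArrayBuffer[T] => Long,
                   reserveUnrollMemory: Long => Boolean): Option[ArrayBuffer[T]] = {
    val initialReservation = 1L * 1024 * 1024  // start by reserving 1 MB (assumed)
    val checkPeriod = 16                       // re-estimate every 16 elements
    if (!reserveUnrollMemory(initialReservation)) return None

    var reserved = initialReservation
    val buffer = new ArrayBuffer[T]
    var count = 0L
    var keepUnrolling = true

    while (values.hasNext && keepUnrolling) {
      buffer += values.next()
      count += 1
      if (count % checkPeriod == 0) {
        val currentSize = estimateSize(buffer)
        if (currentSize > reserved) {
          // Ask for roughly double the current footprint so we do not have to
          // re-reserve on every element; stop unrolling if the request is refused.
          val extra = currentSize * 2 - reserved
          keepUnrolling = reserveUnrollMemory(extra)
          if (keepUnrolling) reserved += extra
        }
      }
    }
    if (keepUnrolling) Some(buffer) else None
  }
}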
The following three methods each acquire memory from a different memory pool:
/**
* Acquire N bytes of memory to cache the given block, evicting existing ones if necessary.
* Blocks evicted in the process, if any, are added to `evictedBlocks`.
* @return whether all N bytes were successfully granted.
*/
def acquireStorageMemory
/**
* Acquire N bytes of memory to unroll the given block, evicting existing ones if necessary.
*
* This extra method allows subclasses to differentiate behavior between acquiring storage
* memory and acquiring unroll memory. For instance, the memory management model in Spark
* 1.5 and before places a limit on the amount of space that can be freed from unrolling.
* Blocks evicted in the process, if any, are added to `evictedBlocks`.
*
* @return whether all N bytes were successfully granted.
*/
def acquireUnrollMemory
/**
* Try to acquire up to `numBytes` of execution memory for the current task and return the
* number of bytes obtained, or 0 if none can be allocated.
*
* This call may block until there is enough free memory in some situations, to make sure each
* task has a chance to ramp up to at least 1 / 2N of the total memory pool (where N is the # of
* active tasks) before it is forced to spill. This can happen if the number of tasks increase
* but an older task had a lot of memory already.
*/
private[memory]
def acquireExecutionMemory
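The 1/2N guarantee described in the comment above can be made concrete with a small standalone calculation (plain Scala, e.g. in a REPL; this is not Spark code):

// Fairness bounds for execution memory: with N active tasks, a task may grow up to
// 1/N of the pool and is guaranteed at least 1/(2N) before it can be forced to spill.
def executionMemoryBounds(poolSize: Long, numActiveTasks: Int): (Long, Long) = {
  val maxPerTask = poolSize / numActiveTasks          // upper bound: 1 / N
  val minPerTask = poolSize / (2L * numActiveTasks)   // guaranteed share: 1 / 2N
  (minPerTask, maxPerTask)
}

// Example: a 1 GB execution pool shared by 4 active tasks gives each task a ceiling of
// 256 MB and a guaranteed 128 MB before spilling can be forced.
executionMemoryBounds(1024L * 1024 * 1024, 4)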
Matching acquisition, there are methods for querying the memory already acquired and for releasing it:
/**
* Execution memory currently in use, in bytes.
*/
final def executionMemoryUsed: Long
/**
* Release N bytes of storage memory.
*/
def releaseStorageMemory(numBytes: Long): Unit
Spark's memory management has one more concept that closely mirrors the kernel's: the memory page. In the kernel, memory pages are managed with the buddy algorithm; Spark is similar in that an allocation hands out a contiguous run of 2^n pages to the block that needs it. The default page size in the Linux kernel is usually 4 KB, whereas in Spark the page size defaults to at least 1 MB (and at most 64 MB, as computed below).
/**
* The default page size, in bytes.
*
* If user didn't explicitly set "spark.buffer.pageSize", we figure out the default value
* by looking at the number of cores available to the process, and the total amount of memory,
* and then divide it by a factor of safety.
*/
val pageSizeBytes: Long = {
  val minPageSize = 1L * 1024 * 1024   // 1MB
  val maxPageSize = 64L * minPageSize  // 64MB
  val cores = if (numCores > 0) numCores else Runtime.getRuntime.availableProcessors()
  // Because of rounding to next power of 2, we may have safetyFactor as 8 in worst case
  val safetyFactor = 16
  val maxTungstenMemory: Long = tungstenMemoryMode match {
    case MemoryMode.ON_HEAP => onHeapExecutionMemoryPool.poolSize
    case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize
  }
  val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
  val default = math.min(maxPageSize, math.max(minPageSize, size))
  conf.getSizeAsBytes("spark.buffer.pageSize", default)
}
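To make the formula concrete, here is a standalone re-run of the same computation with assumed inputs of 2 GB of on-heap execution memory and 4 cores (nextPowerOf2 is re-implemented locally because ByteArrayMethods is a Spark-internal class):

// Round up to the next power of two; a local stand-in for ByteArrayMethods.nextPowerOf2.
def nextPowerOf2(n: Long): Long =
  if (n <= 1) 1L else java.lang.Long.highestOneBit(n - 1) << 1

val maxTungstenMemory = 2L * 1024 * 1024 * 1024   // assume a 2 GB execution pool
val cores = 4                                     // assume 4 usable cores
val safetyFactor = 16
val minPageSize = 1L * 1024 * 1024
val maxPageSize = 64L * minPageSize

// 2 GB / 4 / 16 = 32 MB, already a power of two, and within [1 MB, 64 MB]
val size = nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
val pageSize = math.min(maxPageSize, math.max(minPageSize, size))
println(pageSize / (1024 * 1024) + " MB")         // prints "32 MB"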