The difference between the Spark parameters spark.executor.memoryOverhead and spark.memory.offHeap.size

I was recently puzzled by a question about Spark executor off-heap memory. Off-heap memory itself is easy to understand, so I won't explain it here; what puzzled me was how it is configured. A look at the official docs shows two parameters for off-heap memory: spark.executor.memoryOverhead and spark.memory.offHeap.size (the latter must be used together with spark.memory.offHeap.enabled). Both describe off-heap memory, so what exactly is the difference between them?
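For reference, here is a minimal sketch (in Scala, with a made-up application name and purely illustrative sizes) of how the off-heap pair is typically supplied; spark.memory.offHeap.size only takes effect together with spark.memory.offHeap.enabled:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable Spark-managed off-heap memory.
// The 4g value is illustrative, not a recommendation.
val spark = SparkSession.builder()
  .appName("offheap-demo")                          // hypothetical app name
  .config("spark.memory.offHeap.enabled", "true")   // must be true for the size to matter
  .config("spark.memory.offHeap.size", "4g")        // off-heap memory the executor may use
  .getOrCreate()
```

The same pair can of course also be passed as --conf options to spark-submit.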
 
Since I didn't really understand the two parameters, I picked a task on the cluster and experimented with them.
 
[Figure 1]
 
I approached the question through the calculation of storage memory, because off-heap memory is included in that calculation.
 
The true storage memory (using the Spark 2.x unified-memory-manager parameters spark.memory.fraction and spark.memory.storageFraction):

storage memory = (spark.executor.memory - 300M) * spark.memory.fraction * spark.memory.storageFraction + off-heap memory
 
 
If you want the details, see my other post: https://blog.csdn.net/lquarius/article/details/106558464
 
Known values:

storage memory                  4.7G (from the Spark UI)
spark.executor.memory           1G (default)
spark.memory.fraction           0.6 (default)
spark.memory.storageFraction    0.5 (default)
spark.executor.memoryOverhead   5G
spark.memory.offHeap.size       4G
 
 
The formula needs a correction here: because of the dynamic occupancy mechanism, the storage memory shown in the UI = execution memory + storage memory.

After the correction (so this is the UI figure, not the true storage memory):

storage memory = (spark.executor.memory - 300M) * spark.memory.fraction + off-heap memory
 
 
Plugging the numbers into the formula:

4.7G = (1G - 300M) * 0.6 + off-heap memory

off-heap memory ≈ 4G
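The same arithmetic can be replayed in a couple of lines (a throwaway sketch; 4.7G is simply what the UI reported for this task, and 0.6 is the spark.memory.fraction default):

```scala
// Re-derive the off-heap share from the UI number (all sizes in MiB).
val uiStorageMemory = 4.7 * 1024                    // "Storage Memory" shown in the UI
val heap            = 1024.0                        // spark.executor.memory = 1g
val reserved        = 300.0                         // Spark's fixed reserved memory
val memoryFraction  = 0.6                           // spark.memory.fraction (default)

val onHeapUnified = (heap - reserved) * memoryFraction   // ≈ 434 MiB
val offHeap       = uiStorageMemory - onHeapUnified      // ≈ 4378 MiB, i.e. ≈ 4G

println(f"on-heap unified ≈ $onHeapUnified%.0f MiB, off-heap ≈ $offHeap%.0f MiB")
// The remainder lines up with spark.memory.offHeap.size (4G),
// not with spark.executor.memoryOverhead (5G).
```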
 
 
With that, things become clear: spark.memory.offHeap.size = 4G is the one actually calling the shots here.

spark.executor.memoryOverhead = 5G doesn't even get a look-in, so why set it at all?
 
Below are some discussions of this question I found elsewhere, for anyone interested:
https://stackoverflow.com/questions/58666517/difference-between-spark-yarn-executor-memoryoverhead-and-spark-memory-offhea
https://stackoverflow.com/questions/61263618/difference-between-spark-executor-memoryoverhead-and-spark-memory-offheap-size
 
Summary (Spark 2.x):
spark.memory.offHeap.size is the off-heap memory that the Spark executor actually uses (for its unified storage/execution memory).
spark.executor.memoryOverhead acts on the YARN side: it tells YARN that the executor will use off-heap memory and how much room to reserve for it in the container, playing towards YARN roughly the role that spark.memory.offHeap.size + spark.memory.offHeap.enabled play inside the executor. The value you set is a reservation in the container request, not the amount of memory actually used.
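To make the YARN side concrete, here is a simplified sketch of how the executor container request is sized in Spark 2.x (a rough model written for illustration, not the actual Spark source; the 10% / 384 MiB default matches the official docs quoted further down):

```scala
// Simplified model of the memory Spark asks YARN for per executor container (Spark 2.x).
// spark.executor.memoryOverhead only enlarges this request; the off-heap memory Spark
// itself manages and uses is governed by spark.memory.offHeap.size.
def executorContainerMiB(executorMemoryMiB: Long, memoryOverheadMiB: Option[Long]): Long = {
  val defaultOverhead = math.max((executorMemoryMiB * 0.10).toLong, 384L)
  executorMemoryMiB + memoryOverheadMiB.getOrElse(defaultOverhead)
}

// With spark.executor.memory=1g and spark.executor.memoryOverhead=5g, YARN is asked
// for roughly 1024 + 5120 = 6144 MiB, regardless of how much off-heap memory is used.
val requestedMiB = executorContainerMiB(1024L, Some(5L * 1024))
```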
 
When should they be used, and how?
Whenever you need off-heap memory. And when is that? In my view, pretty much always, because you never know when an executor will OOM from running short of memory. When you do use them, it is best to set spark.executor.memoryOverhead greater than or equal to spark.memory.offHeap.size.
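As a concrete (purely hypothetical) sizing that follows this rule of thumb, keeping the overhead at least as large as the off-heap size:

```scala
import org.apache.spark.SparkConf

// Hypothetical sizing: keep spark.executor.memoryOverhead >= spark.memory.offHeap.size.
val offHeapMiB  = 4096L                       // desired off-heap size
val overheadMiB = math.max(offHeapMiB, 1024L) // arbitrary 1 GiB floor for other native use

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.memory.offHeap.enabled", "true")
  .set("spark.memory.offHeap.size", s"${offHeapMiB}m")
  .set("spark.executor.memoryOverhead", s"${overheadMiB}m")
```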
 
From the official docs:

spark.executor.memoryOverhead (default: executorMemory * 0.10, with minimum of 384)
The amount of off-heap memory to be allocated per executor, in MiB unless otherwise specified. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%). This option is currently supported on YARN and Kubernetes.

spark.memory.offHeap.enabled (default: false)
If true, Spark will attempt to use off-heap memory for certain operations. If off-heap memory use is enabled, then spark.memory.offHeap.size must be positive.

spark.memory.offHeap.size (default: 0)
The absolute amount of memory in bytes which can be used for off-heap allocation. This setting has no impact on heap memory usage, so if your executors' total memory consumption must fit within some hard limit then be sure to shrink your JVM heap size accordingly. This must be set to a positive value when spark.memory.offHeap.enabled=true.
 
Other notes:
What about spark.yarn.executor.memoryOverhead?
spark.yarn.executor.memoryOverhead is deprecated; it is the old name from the early 1.x releases (e.g. the Spark 1.2 docs), although the "yarn" in the name arguably made its meaning clearer. It has since been renamed spark.executor.memoryOverhead.
In Spark 3.0 the description of spark.executor.memoryOverhead changed again; it is worth a look if you are interested, though the official wording is rather vague.
spark.driver.memoryOverhead (default: driverMemory * 0.10, with minimum of 384)
Amount of non-heap memory to be allocated per driver process in cluster mode, in MiB unless otherwise specified. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the container size (typically 6-10%). This option is currently supported on YARN, Mesos and Kubernetes. Note: Non-heap memory includes off-heap memory (when spark.memory.offHeap.enabled=true) and memory used by other driver processes (e.g. python process that goes with a PySpark driver) and memory used by other non-driver processes running in the same container. The maximum memory size of container to running driver is determined by the sum of spark.driver.memoryOverhead and spark.driver.memory.
 
 
