19/05/16 10:11:39 WARN TaskMemoryManager: leak 32.0 KB memory from org.apache.spark.shuffle.sort.ShuffleExternalSorter@567bac86
19/05/16 10:11:39 WARN TaskMemoryManager: leak a page: org.apache.spark.unsafe.memory.MemoryBlock@f338fda in task 4214
19/05/16 10:11:39 ERROR Executor: Exception in task 58.0 in stage 24.0 (TID 4214)
java.lang.OutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:100)
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:111)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:153)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.create(UnsafeExternalSorter.java:120)
at org.apache.spark.sql.execution.WindowExec$$anonfun$15$$anon$1.fetchNextPartition(WindowExec.scala:339)
at org.apache.spark.sql.execution.WindowExec$$anonfun$15$$anon$1.next(WindowExec.scala:390)
at org.apache.spark.sql.execution.WindowExec$$anonfun$15$$anon$1.next(WindowExec.scala:289)
from pyspark.sql import Window
from pyspark.sql.functions import desc, last, when


def fill_null(df):
    """
    :param df: input DataFrame
    """
    ## Forward-fill the nulls in bulkmask, mileage and price_Y
    # Define the window frame: first row of the group up to the current row
    window_before = Window.partitionBy(["adept", "adest", "ptdLabel", "schdate"]) \
        .orderBy('generated_date') \
        .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    # Define the forward-filled columns
    full_bulkmask = last(df['bulkmask'], ignorenulls=True).over(window_before)
    filled_mileage = last(df['mileage'], ignorenulls=True).over(window_before)
    filled_price_Y = last(df['price_Y'], ignorenulls=True).over(window_before)
    # Fill
    df = df.withColumn('full_bulkmask', full_bulkmask) \
        .withColumn('filled_mileage', filled_mileage) \
        .withColumn('filled_price_Y', filled_price_Y)
    # Define the window frame for the backward pass (same frame, descending order)
    window_after = Window.partitionBy(["adept", "adest", "ptdLabel", "schdate"]) \
        .orderBy(desc('generated_date')) \
        .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    # Define the backward-filled columns
    full_mileage = last(df['filled_mileage'], ignorenulls=True).over(window_after)
    full_price_Y = last(df['filled_price_Y'], ignorenulls=True).over(window_after)
    # Fill
    df = df.withColumn('full_price_Y', full_price_Y) \
        .withColumn('full_mileage', full_mileage)
    # Rows with an early generated_date that are still null get full_price_Y
    df = df.withColumn("full_bulkmask",
                       when(df["full_bulkmask"].isNull(), df["full_price_Y"])
                       .otherwise(df["full_bulkmask"]))
    # Keep the full_* columns; drop the raw and intermediate columns
    _columns1 = [c for c in df.columns if
                 c not in {"bulkmask", "mileage", "price_Y", "filled_price_Y", "filled_mileage"}]
    df = df.select(_columns1)
    return df
private void throwOom(final MemoryBlock page, final long required) {
  long got = 0;
  if (page != null) {
    got = page.size();
    taskMemoryManager.freePage(page, this);
  }
  // checkstyle.off: RegexpSinglelineJava
  throw new SparkOutOfMemoryError("Unable to acquire " + required + " bytes of memory, got " +
    got);
  // checkstyle.on: RegexpSinglelineJava
}
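The smallest unit in which ShuffleExternalSorter stores data is the MemoryBlock, and each MemoryBlock here is exactly the 32 KB that TaskMemoryManager reports as leaked. The leaked MemoryBlock is a signal that no more memory can be squeezed out of the Executor that manages it. Under unified memory management (for Spark memory management, see my earlier post: https://blog.csdn.net/crazybean_lwb/article/details/90030429 ), when execution or storage memory usage on the executor grows too large, the sorted data is spilled to disk (see UnsafeExternalSorter.java) and that memory is released (see the freeMemory function). This lets a limited amount of memory process a larger amount of data. However, on-heap objects are allocated and freed by the JVM, and Spark only estimates how much memory is in use through random sampling, so with a large data volume and an inaccurate sample it may fail to spill in time and hit an OOM. Spilling too late, or spilling too little, is therefore one suspected cause of running out of memory.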
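Approaching the problem from the side of giving the executor more memory, raising spark.memory.fraction (best left at the default 0.6, since raising it can increase garbage-collection pressure and lead to GC errors) does enlarge the memory the Executor can effectively use to some extent, but it made no difference in this case. My final inference is that the executor runs badly short of memory when WindowExec sorts the data for the window function, so the rest of this post focuses on the window function source code.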
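As a rough illustration of what raising spark.memory.fraction changes, the sketch below computes the unified memory region from the executor heap size. It assumes the documented rule that the region is a fraction of (heap minus 300 MB reserved) and that spark.memory.storageFraction defaults to 0.5; the function name and the 4096 MB heap are made up for the example, and the storage/execution split is only the initial one, since the boundary between the two is soft.

```python
RESERVED_MB = 300  # memory Spark sets aside before applying spark.memory.fraction


def unified_memory_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate the unified (execution + storage) region for a given heap size."""
    unified = (heap_mb - RESERVED_MB) * memory_fraction
    storage = unified * storage_fraction  # initial storage share, can be borrowed
    return {"unified": unified, "storage": storage, "execution": unified - storage}


# Compare the default fraction with a raised one on a hypothetical 4 GB heap.
for fraction in (0.6, 0.8):
    print(fraction, unified_memory_mb(4096, memory_fraction=fraction))
```

The gain from 0.6 to 0.8 comes out of the memory left for user data structures and JVM overhead, which is why the higher setting tends to increase GC pressure.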
/**
 * Sort and spill the current records in response to memory pressure.
 */
public long spill(long size, MemoryConsumer trigger) throws IOException {
  if (trigger != this) {
    if (readingIterator != null) {
      return readingIterator.spill();
    }
    return 0L; // this should throw exception
  }

  if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
    return 0L;
  }

  logger.info("Thread {} spilling sort data of {} to disk ({} {} so far)",
    ...
    spillWriters.size() > 1 ? " times" : " time");

  ShuffleWriteMetrics writeMetrics = new ShuffleWriteMetrics();
  final UnsafeSorterSpillWriter spillWriter =
    new UnsafeSorterSpillWriter(blockManager, fileBufferSizeBytes, writeMetrics,
      ...
  spillIterator(inMemSorter.getSortedIterator(), spillWriter);

  final long spillSize = freeMemory();
  // Note that this is more-or-less going to be a multiple of the page size, so wasted space in
  // pages will currently be counted as memory spilled even though that space isn't actually
  // written to disk. This also counts the space needed to store the sorter's pointer array.
  ...
  // Reset the in-memory sorter's pointer array only after freeing up the memory pages holding the
  // records. Otherwise, if the task is over allocated memory, then without freeing the memory
  // pages, we might not be able to get memory for the pointer array.
  ...
  totalSpillBytes += spillSize;
  return spillSize;
}
/**
 * Free this sorter's data pages.
 *
 * @return the number of bytes freed.
 */
private long freeMemory() {
  long memoryFreed = 0;
  for (MemoryBlock block : allocatedPages) {
    memoryFreed += block.size();
  }
  currentPage = null;
  pageCursor = 0;
  return memoryFreed;
}
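To make the spill-then-free cycle easier to picture, here is a toy external sorter in plain Python. It is only a sketch of the general technique, not Spark's implementation: the class name, the record-count threshold and the use of pickle and temporary files are all assumptions for illustration. Spark works on memory pages and byte sizes instead.

```python
import heapq
import pickle
import tempfile


class SpillingSorter:
    """Toy external sorter: buffers records and spills sorted runs to disk."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory  # hypothetical record-count threshold
        self.buffer = []
        self.spill_files = []

    def insert(self, record):
        # Spill before the buffer would grow past the threshold.
        if len(self.buffer) >= self.max_in_memory:
            self.spill()
        self.buffer.append(record)

    def spill(self):
        """Write the buffered records as one sorted run, then free the buffer."""
        if not self.buffer:
            return 0
        run = tempfile.TemporaryFile()
        for record in sorted(self.buffer):
            pickle.dump(record, run)
        self.spill_files.append(run)
        freed = len(self.buffer)
        self.buffer = []  # the in-memory records can now be reclaimed
        return freed

    @staticmethod
    def _read_run(run):
        run.seek(0)
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    def sorted_records(self):
        """Merge the spilled runs with whatever is still in memory."""
        runs = [self._read_run(f) for f in self.spill_files]
        runs.append(iter(sorted(self.buffer)))
        return heapq.merge(*runs)


sorter = SpillingSorter(max_in_memory=3)
for value in [5, 1, 4, 2, 8, 7, 3, 6]:
    sorter.insert(value)
print(len(sorter.spill_files), list(sorter.sorted_records()))
```

If the threshold check runs too late, or each spill frees too little, the buffer keeps growing, which is the failure mode suspected above.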
object Window {

  /**
   * Creates a [[WindowSpec]] with the partitioning defined.
   * @since 1.4.0
   */
  def partitionBy(cols: Column*): WindowSpec = {
    spec.partitionBy(cols : _*)
  }

  /**
   * Creates a [[WindowSpec]] with the ordering defined.
   * @since 1.4.0
   */
  def orderBy(cols: Column*): WindowSpec = {
    spec.orderBy(cols : _*)
  }

  /**
   * @param start boundary start, inclusive. The frame is unbounded if this is
   *              the minimum long value (`Window.unboundedPreceding`).
   * @param end boundary end, inclusive. The frame is unbounded if this is the
   *            maximum long value (`Window.unboundedFollowing`).
   * @since 2.1.0
   */
  // Note: when updating the doc for this method, also update WindowSpec.rowsBetween.
  def rowsBetween(start: Long, end: Long): WindowSpec = {
    spec.rowsBetween(start, end)
  }

  /**
   * @param start boundary start, inclusive. The frame is unbounded if this is
   *              the minimum long value (`Window.unboundedPreceding`).
   * @param end boundary end, inclusive. The frame is unbounded if this is the
   *            maximum long value (`Window.unboundedFollowing`).
   * @since 2.1.0
   */
  // Note: when updating the doc for this method, also update WindowSpec.rangeBetween.
  def rangeBetween(start: Long, end: Long): WindowSpec = {
    spec.rangeBetween(start, end)
  }
}
/**
 * A window function calculates the results of a number of window functions for a window frame.
 * Before use a frame must be prepared by passing it all the rows in the current partition. After
 * preparation the update method can be called to fill the output rows.
 */
abstract class WindowFunctionFrame {
  /**
   * Prepare the frame for calculating the results for a partition.
   *
   * @param rows to calculate the frame results for.
   */
  def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit

  /**
   * Write the current results to the target row.
   */
  def write(index: Int, current: InternalRow): Unit
}
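A window function computes with the window frame as its basic unit: once the underlying rows are prepared, the results are appended as new columns to the right of the original frame. Be aware that a window is a fairly expensive operation, because all the data of one group has to be placed in the same sorted partition; different frames can be processed in parallel. For details, see doExecute below.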
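The cost described above can be seen in a small plain-Python emulation of a forward fill, i.e. what last(col, ignorenulls=True) computes over a frame running from the start of the group to the current row. This is a sketch for intuition only, not how Spark executes it; the function name and the route/day/price sample rows are made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def forward_fill(rows, partition_key, order_key, column):
    """Carry the last non-null value of `column` forward within each group."""
    # Sorting by (partition, order) brings every group together in order,
    # mirroring the sorted partition that the window operator requires.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        last_seen = None  # reset at every group boundary
        for row in group:
            if row[column] is not None:
                last_seen = row[column]
            result.append({**row, "filled_" + column: last_seen})
    return result


rows = [
    {"route": "A", "day": 2, "price": None},
    {"route": "A", "day": 1, "price": 10},
    {"route": "B", "day": 1, "price": None},
    {"route": "A", "day": 3, "price": 12},
    {"route": "B", "day": 2, "price": 7},
]
for row in forward_fill(rows, "route", "day", "price"):
    print(row)
```

Every row of a group has to be gathered and ordered before any output row can be produced, so a single very large group puts its whole weight on one task.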
/**
 * This class calculates and outputs (windowed) aggregates over the rows in a single (sorted)
 * partition. The aggregates are calculated for each row in the group. Special processing
 * instructions, frames, are used to calculate these aggregates. Frames are processed in the order
 * specified in the window specification (the ORDER BY ... clause).
 *
 * This is quite an expensive operator because every row for a single group must be in the same
 * partition and partitions must be sorted according to the grouping and sort order. The operator
 * requires the planner to take care of the partitioning and sorting.
 */
  protected override def doExecute(): RDD[InternalRow] = {
    // Unwrap the expressions and factories from the map.
    val expressions = windowFrameExpressionFactoryPairs.flatMap(_._1)
    val factories = windowFrameExpressionFactoryPairs.map(_._2).toArray
    val inMemoryThreshold = sqlContext.conf.windowExecBufferInMemoryThreshold
    val spillThreshold = sqlContext.conf.windowExecBufferSpillThreshold

    // Start processing.
    child.execute().mapPartitions { stream =>
      new Iterator[InternalRow] {

        // Get all relevant projections.
        val result = createResultProjection(expressions)
        val grouping = UnsafeProjection.create(partitionSpec, child.output)
        ...
        // Manage the current partition.
        val buffer: ExternalAppendOnlyUnsafeRowArray =
          new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, spillThreshold)

        var bufferIterator: Iterator[UnsafeRow] = _

        val windowFunctionResult = new SpecificInternalRow(expressions.map(_.dataType))
        val frames = factories.map(_(windowFunctionResult))
        val numFrames = frames.length

        private[this] def fetchNextPartition() {
          // Collect all the rows in the current partition.
          // Before we start to fetch new input rows, make a copy of nextGroup.
          val currentGroup = nextGroup.copy()

          // clear last partition
          ...
          while (nextRowAvailable && nextGroup == currentGroup) {
            ...
          }
          ...
        }
        ...
          // 'Merge' the input row with the window function result
          join(current, windowFunctionResult)
          rowIndex += 1

          // Return the projection.
          ...
      }
    }
  }
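Solution 3: from version 2.2 onward you can tune the spark.maxRemoteBlockSizeFetchToMem parameter (remote blocks larger than this size are fetched to disk instead of being held in memory). Lowering this threshold sends data to disk more often and reduces memory pressure.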
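One way to pass this setting is on the spark-submit command line. The helper below only assembles the arguments; the function name, the job file name and the 200m value are assumptions for illustration, so pick a threshold that fits your own block sizes.

```python
def spark_submit_command(app, conf):
    """Build a spark-submit argument list with one --conf per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", "{}={}".format(key, value)]
    args.append(app)
    return args


# Illustrative value only: fetch remote blocks above 200 MB to disk.
conf = {"spark.maxRemoteBlockSizeFetchToMem": "200m"}
print(" ".join(spark_submit_command("fill_null_job.py", conf)))
```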