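Spark Source Code Analysis: TaskMemoryManager

Overview

TaskMemoryManager manages the memory allocated to each task.

In off-heap mode, a memory location can be addressed with a raw 64-bit address. In on-heap mode, a memory location is addressed by the combination of a reference to a base object and a 64-bit offset within that object.

This is a problem when we want to store pointers to data structures inside other structures, such as record pointers inside hash maps or sorting buffers. Even if we used 128 bits to address memory, we could not simply store the address of the base object, because it is not stable: the heap is reorganized by GC.

Instead, record pointers are encoded in 64-bit longs as follows: in off-heap mode, store the raw address; in on-heap mode, use the upper 13 bits of the address to store a page number and the lower 51 bits to store an offset within that page. The page number is an index into the page table array held by TaskMemoryManager, from which the base object can be retrieved. (In the encodePageNumberAndOffset code shown later, off-heap addresses are also stored as a page number plus a page-relative offset.)

This allows us to address 8192 pages. In on-heap mode, the maximum page size is limited by the maximum size of a long[] array, which allows us to address 8192 * (2^31 - 1) * 8 bytes, or approximately 140 terabytes of memory.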
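The limit quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch: the class name AddressableMemory and its members are hypothetical and exist only for illustration, while the constants (13 page-number bits, long[] pages of at most 2^31 - 1 elements of 8 bytes) come from the text.

```java
// Illustrative arithmetic only; AddressableMemory is a hypothetical name, not a Spark class.
final class AddressableMemory {
    // 13 bits of page number give 2^13 = 8192 pages.
    static final int PAGE_COUNT = 1 << 13;

    // An on-heap page is backed by a long[], which holds at most 2^31 - 1 elements of 8 bytes each.
    static final long MAX_PAGE_BYTES = ((1L << 31) - 1) * 8L;

    // Total number of bytes addressable through the page table in on-heap mode.
    static long maxAddressableBytes() {
        return PAGE_COUNT * MAX_PAGE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("max page size (bytes): " + MAX_PAGE_BYTES);
        System.out.println("max addressable (bytes): " + maxAddressableBytes());
    }
}
```

The total is just under 2^47 bytes, which is roughly 140 terabytes in decimal units (about 128 TiB).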

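TaskMemoryManager is part of Project Tungsten, which optimizes Spark's performance at the memory and CPU level.

Off-heap memory is disabled by default. It can be enabled with the spark.memory.offHeap.enabled parameter, and its size is set with the spark.memory.offHeap.size parameter.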

 

Allocating off-heap memory with Unsafe.allocateMemory

Unsafe.java

 /**
     * Allocates a new block of native memory, of the given size in bytes.  The
     * contents of the memory are uninitialized; they will generally be
     * garbage.  The resulting native pointer will never be zero, and will be
     * aligned for all value types.  Dispose of this memory by calling {@link
     * #freeMemory}, or resize it with {@link #reallocateMemory}.
     *
     * @throws IllegalArgumentException if the size is negative or too large
     *         for the native size_t type
     *
     * @throws OutOfMemoryError if the allocation is refused by the system
     *
     * @see #getByte(long)
     * @see #putByte(long, byte)
     */
    // Allocates a new block of native memory of the given size in bytes and returns its start address
    public native long allocateMemory(long bytes);
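To make the native method above concrete, here is a minimal sketch of calling it from application code. It assumes a HotSpot-style JDK where sun.misc.Unsafe is reachable through its private theUnsafe field; this is an unsupported API, and newer JDK releases may warn about or restrict its memory-access methods. The class name OffHeapDemo and its helpers are hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; not part of Spark or the JDK.
final class OffHeapDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Obtains the Unsafe singleton reflectively, since Unsafe.getUnsafe() is meant for JDK-internal callers.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Writes a long into freshly allocated native memory, reads it back, and frees the block.
    static long roundTrip(long value) {
        long address = UNSAFE.allocateMemory(8); // contents are uninitialized at this point
        try {
            UNSAFE.putLong(address, value);
            return UNSAFE.getLong(address);
        } finally {
            UNSAFE.freeMemory(address); // native memory is not reclaimed by the garbage collector
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

The returned long is the raw native address, which is why the block has to be released explicitly with freeMemory.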

 unsafe.cpp

UNSAFE_ENTRY(jlong, Unsafe_AllocateMemory(JNIEnv *env, jobject unsafe, jlong size))
  UnsafeWrapper("Unsafe_AllocateMemory");
  size_t sz = (size_t)size;
  if (sz != (julong)size || size < 0) {
    THROW_0(vmSymbols::java_lang_IllegalArgumentException());
  }
  if (sz == 0) {
    return 0;
  }
  sz = round_to(sz, HeapWordSize);
  // Allocate a new block of native memory of the requested size; os::malloc returns a pointer to its start
  void* x = os::malloc(sz, mtInternal);
  if (x == NULL) {
    THROW_0(vmSymbols::java_lang_OutOfMemoryError());
  }
  //Copy::fill_to_words((HeapWord*)x, sz / HeapWordSize);
  return addr_to_java(x);  // convert the pointer to a jlong
UNSAFE_END

The malloc function on Linux (from the Linux manual page)

       void *malloc(size_t size);
       void free(void *ptr);
       void *realloc(void *ptr, size_t size);

       The malloc() function allocates size bytes and returns a pointer to
       the allocated memory.  The memory is not initialized.  If size is 0,
       then malloc() returns either NULL, or a unique pointer value that can
       later be successfully passed to free().

       The free() function frees the memory space pointed to by ptr, which
       must have been returned by a previous call to malloc(), calloc(), or
       realloc().  Otherwise, or if free(ptr) has already been called
       before, undefined behavior occurs.  If ptr is NULL, no operation is
       performed.  
  
       The realloc() function changes the size of the memory block pointed
       to by ptr to size bytes.  The contents will be unchanged in the range
       from the start of the region up to the minimum of the old and new
       sizes.  If the new size is larger than the old size, the added memory
       will not be initialized.  

 The addr_to_java function converts an address pointer to the jlong type

inline void* addr_from_java(jlong addr) {
  // This assert fails in a variety of ways on 32-bit systems.
  // It is impossible to predict whether native code that converts
  // pointers to longs will sign-extend or zero-extend the addresses.
  //assert(addr == (uintptr_t)addr, "must not be odd high bits");
  return (void*)(uintptr_t)addr;
}

inline jlong addr_to_java(void* p) {
  assert(p == (void*)(uintptr_t)p, "must not be odd high bits");
  return (uintptr_t)p;
}

 About uintptr_t:

It is an unsigned integer type that is capable of storing a pointer, which typically means that it is the same size as a pointer.

It is optionally defined in C++11 and later standards.

A common reason to want an integer type that can hold an architecture's pointer type is to perform integer-specific operations on a pointer, or to obscure the type of a pointer by providing it as an integer "handle".

A UINT_PTR (as well as the more standardized uintptr_t) is defined to be an unsigned integer that is guaranteed to be large enough to hold a pointer value. It's typically used for tricky code where pointers are put into integer values and vice-versa.

The _W64 annotation is a note to the Microsoft compiler that when compiling for a 64-bit target, the variable should be 64 bits wide instead of the usual 32, since on 64-bit platforms, pointers are 64 bits, but unsigned ints are usually still 32 bits. This ensures that sizeof(UINT_PTR) >= sizeof(void*) for all target platforms.

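 Under the ILP32 and LP64 data models, a pointer has the same size as a long. This is not universal: under LLP64, used by 64-bit Windows, long stays 32 bits wide while pointers are 64 bits wide.

The sizes of the built-in C data types on 32-bit and 64-bit systems are described by data models such as ILP32 and LP64. The letters and digits in these names have the following meaning:
I: the int type
L: the long type
P: the pointer type
32: the listed types are 32 bits wide
64: the listed types are 64 bits wide
For example, LP64 means that the long and pointer types are 64 bits wide.
64-bit Linux uses the LP64 model: long and pointer are 64 bits wide, and the other types have the same size as on a 32-bit system.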

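How Unsafe locates the field addresses of on-heap Java objects

jniHandles.hpp

The resolve method of JNIHandles converts an object reference (an object handle) into an oop.

After an object instance is created on the heap, it is manipulated through reference-type data on the virtual machine stack. In the HotSpot VM, a reference stores the address of the object directly.

An object reference is converted to an object address as follows:

jobject handle >> oop p   >>  (address) p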

// Interface for creating and resolving local/global JNI handles

class JNIHandles : AllStatic {
  friend class VMStructs;
 private:
  static JNIHandleBlock* _global_handles;             // First global handle block
  static JNIHandleBlock* _weak_global_handles;        // First weak global handle block
  static oop _deleted_handle;                         // Sentinel marking deleted handles

 public:
  // Resolve handle into oop
  inline static oop resolve(jobject handle);

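unsafe.cpp

The index_oop_from_field_offset_long function

Given the oop that corresponds to an object reference and the offset of a member variable within the object, this function computes the address of the member variable.

The offset must lie outside the object header: the assertion requires field_offset >= oopDesc::header_size(), so an offset equal to the header size is allowed.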

inline void* index_oop_from_field_offset_long(oop p, jlong field_offset) {
  jlong byte_offset = field_offset_to_byte_offset(field_offset);
  // Don't allow unsafe to be used to read or write the header word of oops
  assert(p == NULL || field_offset >= oopDesc::header_size(), "offset must be outside of header");
#ifdef ASSERT
  if (p != NULL) {
    assert(byte_offset >= 0 && byte_offset <= (jlong)MAX_OBJECT_SIZE, "sane offset");
    if (byte_offset == (jint)byte_offset) {
      void* ptr_plus_disp = (address)p + byte_offset;
      assert((void*)p->obj_field_addr((jint)byte_offset) == ptr_plus_disp,
             "raw [ptr+disp] must be consistent with oop::field_base");
    }
    jlong p_size = HeapWordSize * (jlong)(p->size());
    assert(byte_offset < p_size, err_msg("Unsafe access: offset " INT64_FORMAT " > object's size " INT64_FORMAT, byte_offset, p_size));
  }
#endif
  if (sizeof(char*) == sizeof(jint))    // (this constant folds!)
    return (address)p + (jint) byte_offset;
  else
    return (address)p +        byte_offset;
}

 

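Locating a field: the objectFieldOffset method

Given a java.lang.reflect.Field object, this method returns the offset of that member variable within an object. Although the comment quoted below says "static field", sun.misc.Unsafe.objectFieldOffset applies to instance (non-static) fields; static fields are handled by staticFieldOffset. The offset depends only on the class: for every object of a class, a given member variable sits at the same offset.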

/***
   * Returns the memory address offset of the given static field.
   * The offset is merely used as a means to access a particular field
   * in the other methods of this class.  The value is unique to the given
   * field and the same value should be returned on each subsequent call.
   *
   * @param field the field whose offset should be returned.
   * @return the offset of the given field.
   */
  public native long objectFieldOffset(Field field);
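As an illustration of how a field offset is used, the sketch below looks up the offset of an instance field and reads the field through the (object, offset) pair. It assumes sun.misc.Unsafe can be obtained reflectively and that objectFieldOffset is still supported by the running JDK (it is deprecated in recent releases). FieldOffsetDemo, Point, offsetOf and readInt are hypothetical names.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; not part of Spark or the JDK.
final class FieldOffsetDemo {
    // A plain class with two instance fields to inspect.
    static final class Point {
        int x = 7;
        int y = 11;
    }

    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Returns the offset of an instance field within objects of the given class.
    static long offsetOf(Class<?> type, String fieldName) {
        try {
            return UNSAFE.objectFieldOffset(type.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reads an int field through the (object, offset) pair instead of a normal field access.
    static int readInt(Object target, long offset) {
        return UNSAFE.getInt(target, offset);
    }

    public static void main(String[] args) {
        long yOffset = offsetOf(Point.class, "y");
        System.out.println(readInt(new Point(), yOffset));
    }
}
```

Because the offset depends only on the class, it can be computed once and reused for every instance.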

Locating array elements: the arrayBaseOffset method

This method returns the offset of the first element for a given array class.

 /***
   * Returns the offset of the first element for a given array class.
   * To access elements of the array class, this value may be used along
   * with that returned by arrayIndexScale, if non-zero.
   *
   * @param arrayClass the class for which the first element's address should
   *                   be obtained.
   * @return the offset of the first element of the array class.
   * @see arrayIndexScale(Class)
   */
  public native int arrayBaseOffset(Class arrayClass);
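The way arrayBaseOffset and arrayIndexScale combine can be sketched as follows. This is an illustration under the assumption that sun.misc.Unsafe is reachable reflectively; ArrayOffsetDemo and its methods are hypothetical names, and the code performs no bounds checking, so an out-of-range index would read arbitrary memory.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; not part of Spark or the JDK.
final class ArrayOffsetDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Offset of element `index` inside a long[]: header size plus index times element size.
    static long elementOffset(int index) {
        long base = UNSAFE.arrayBaseOffset(long[].class);  // offset of element 0 past the array header
        long scale = UNSAFE.arrayIndexScale(long[].class); // bytes per element
        return base + index * scale;
    }

    // Reads one element through the (array, offset) pair; the caller must keep index in range.
    static long readElement(long[] array, int index) {
        return UNSAFE.getLong(array, elementOffset(index));
    }

    public static void main(String[] args) {
        long[] data = {10L, 20L, 30L};
        System.out.println(readElement(data, 2));
    }
}
```

Spark's Platform.LONG_ARRAY_OFFSET, shown in the next section, caches the base offset for long[] in the same way.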

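Memory management in Spark Tungsten

Platform wraps Unsafe

Spark's Platform class is a thin wrapper around Unsafe.

Its allocateMemory method allocates off-heap memory, and LONG_ARRAY_OFFSET is the offset of the first element of an on-heap long array, used to locate the array's data.

Platform.java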

 public static long allocateMemory(long size) {
    return _UNSAFE.allocateMemory(size);
  }

  public static void freeMemory(long address) {
    _UNSAFE.freeMemory(address);
  }


  public static final int LONG_ARRAY_OFFSET;

  static{
    // some content omitted
    LONG_ARRAY_OFFSET = _UNSAFE.arrayBaseOffset(long[].class);
  }

 UnsafeMemoryAllocator allocates off-heap memory

The UnsafeMemoryAllocator class allocates off-heap memory by delegating to Unsafe.

UnsafeMemoryAllocator.java

/**
 * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate off-heap memory.
 */
public class UnsafeMemoryAllocator implements MemoryAllocator {

  @Override
  public MemoryBlock allocate(long size) throws OutOfMemoryError {
    long address = Platform.allocateMemory(size);
    MemoryBlock memory = new MemoryBlock(null, address, size);
    if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
      memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
    }
    return memory;
  }

  @Override
  public void free(MemoryBlock memory) {
    assert (memory.obj == null) :
      "baseObject not null; are you trying to use the off-heap allocator to free on-heap memory?";
    assert (memory.pageNumber != MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER) :
      "page has already been freed";
    assert ((memory.pageNumber == MemoryBlock.NO_PAGE_NUMBER)
            || (memory.pageNumber == MemoryBlock.FREED_IN_TMM_PAGE_NUMBER)) :
      "TMM-allocated pages must be freed via TMM.freePage(), not directly in allocator free()";

    if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
      memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_FREED_VALUE);
    }
    Platform.freeMemory(memory.offset);
    // As an additional layer of defense against use-after-free bugs, we mutate the
    // MemoryBlock to reset its pointer.
    memory.offset = 0;
    // Mark the page as freed (so we can detect double-frees).
    memory.pageNumber = MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER;
  }
}

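HeapMemoryAllocator allocates on-heap memory

The HeapMemoryAllocator class allocates on-heap memory. In essence, it creates a long array to serve as the memory and wraps it in a MemoryBlock.

Platform.LONG_ARRAY_OFFSET is the offset of the first element of a long array relative to the start of the array object.

HeapMemoryAllocator.java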

public MemoryBlock allocate(long size) throws OutOfMemoryError {
    if (shouldPool(size)) {
      synchronized (this) {
        final LinkedList<WeakReference<long[]>> pool = bufferPoolsBySize.get(size);  // linked list of weak references to long arrays
        if (pool != null) {
          while (!pool.isEmpty()) {
            final WeakReference<long[]> arrayReference = pool.pop(); // take a weak reference from the head of the list
            final long[] array = arrayReference.get(); // the long array behind the weak reference; null if it was collected
            if (array != null) {
              assert (array.length * 8L >= size); // the pooled array must hold at least `size` bytes
              // wrap the long array in a MemoryBlock
              MemoryBlock memory = new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
              if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
                memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
              }
              return memory;
            }
          }
          bufferPoolsBySize.remove(size);
        }
      }
    }
    long[] array = new long[(int) ((size + 7) / 8)]; // create a long array, rounding size up to whole 8-byte words
    // wrap the long array in a MemoryBlock
    MemoryBlock memory = new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, size);
    if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) {
      memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_CLEAN_VALUE);
    }
    return memory;
  }
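The pooling idea in allocate can be isolated in a small sketch. This is a simplified illustration and not Spark's implementation: PooledLongArrayAllocator, the threshold value, and the free method are assumptions made for the example (the listing above does not show how arrays are returned to the pool), and MemoryBlock wrapping and debug filling are left out.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical, simplified allocator; it only mirrors the pooling logic of the listing above.
final class PooledLongArrayAllocator {
    // Assumed threshold for this sketch: only requests at least this large are pooled.
    private static final long POOLING_THRESHOLD_BYTES = 1024;

    // Pools of weakly referenced arrays, keyed by the requested size in bytes.
    private final Map<Long, LinkedList<WeakReference<long[]>>> pools = new HashMap<>();

    // Number of 8-byte words needed to hold sizeInBytes, rounding up.
    static int wordsFor(long sizeInBytes) {
        return (int) ((sizeInBytes + 7) / 8);
    }

    synchronized long[] allocate(long sizeInBytes) {
        if (sizeInBytes >= POOLING_THRESHOLD_BYTES) {
            LinkedList<WeakReference<long[]>> pool = pools.get(sizeInBytes);
            if (pool != null) {
                while (!pool.isEmpty()) {
                    long[] array = pool.pop().get(); // null if the GC already reclaimed the array
                    if (array != null) {
                        return array;
                    }
                }
                pools.remove(sizeInBytes); // every reference was stale, so drop the empty pool
            }
        }
        return new long[wordsFor(sizeInBytes)];
    }

    // Returns an array to the pool so a later request of the same size can reuse it.
    synchronized void free(long sizeInBytes, long[] array) {
        if (sizeInBytes >= POOLING_THRESHOLD_BYTES) {
            pools.computeIfAbsent(sizeInBytes, key -> new LinkedList<>())
                 .add(new WeakReference<>(array));
        }
    }
}
```

Holding the arrays through WeakReference means the pool never prevents the garbage collector from reclaiming them, which is why allocate has to check each reference for null.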

 

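MemoryBlock wraps the start address and byte size of a memory block

MemoryBlock extends MemoryLocation. The main difference from its parent class is that it adds two fields:

length: the size of the memory block in bytes, returned by size();

pageNumber: the index in the pageTable array at which the TaskMemoryManager stores this MemoryBlock.

MemoryBlock.java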

/**
 * A consecutive block of memory, starting at a {@link MemoryLocation} with a fixed size.
 */
public class MemoryBlock extends MemoryLocation {

  /** Special `pageNumber` value for pages which were not allocated by TaskMemoryManagers */
  public static final int NO_PAGE_NUMBER = -1;

  /**
   * Special `pageNumber` value for marking pages that have been freed in the TaskMemoryManager.
   * We set `pageNumber` to this value in TaskMemoryManager.freePage() so that MemoryAllocator
   * can detect if pages which were allocated by TaskMemoryManager have been freed in the TMM
   * before being passed to MemoryAllocator.free() (it is an error to allocate a page in
   * TaskMemoryManager and then directly free it in a MemoryAllocator without going through
   * the TMM freePage() call).
   */
  public static final int FREED_IN_TMM_PAGE_NUMBER = -2;

  /**
   * Special `pageNumber` value for pages that have been freed by the MemoryAllocator. This allows
   * us to detect double-frees.
   */
  public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3;

  private final long length;

  /**
   * Optional page number; used when this MemoryBlock represents a page allocated by a
   * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager,
   * which lives in a different package.
   */
  public int pageNumber = NO_PAGE_NUMBER;

 
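  // Off-heap allocation: obj is null, offset is the absolute address returned by malloc,
  // and length is the size requested from malloc.
  // On-heap allocation: obj is a reference to the long array, offset is the offset of the array's
  // first element relative to the start of the array object, and length is the requested size in
  // bytes; the backing array holds that size rounded up to a multiple of 8 bytes.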
  public MemoryBlock(@Nullable Object obj, long offset, long length) {
    super(obj, offset);
    this.length = length;
  }

  /**
   * Returns the size of the memory block.
   */
  public long size() {
    return length;
  }

  /**
   * Creates a memory block pointing to the memory used by the long array.
   */
  public static MemoryBlock fromLongArray(final long[] array) {
    return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L);
  }

  /**
   * Fills the memory block with the specified byte value.
   */
  public void fill(byte value) {
    Platform.setMemory(obj, offset, length, value);
  }
}

 

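MemoryLocation stores the start address of a memory block

For off-heap memory, obj is null and offset is the absolute address returned by malloc.

For on-heap memory, obj is an object reference and offset is an offset within that object.

 MemoryLocation.java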

/**
 * A memory location. Tracked either by a memory address (with off-heap allocation),
 * or by an offset from a JVM object (in-heap allocation).
 */
public class MemoryLocation {

  @Nullable
  Object obj;

  long offset;

  public MemoryLocation(@Nullable Object obj, long offset) {
    this.obj = obj;
    this.offset = offset;
  }

  public MemoryLocation() {
    this(null, 0);
  }

  public void setObjAndOffset(Object newObj, long newOffset) {
    this.obj = newObj;
    this.offset = newOffset;
  }

  public final Object getBaseObject() {
    return obj;
  }

  public final long getBaseOffset() {
    return offset;
  }
}
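The reason a single (obj, offset) pair can describe both kinds of memory is that the Unsafe accessors treat a null base object as "the offset is an absolute address". The sketch below illustrates this under the assumption that sun.misc.Unsafe is reachable reflectively; BaseAndOffsetDemo and its methods are hypothetical names.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; not part of Spark or the JDK.
final class BaseAndOffsetDemo {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // One pair of accessors serves both modes: base == null means offset is an absolute address.
    static void write(Object base, long offset, long value) {
        UNSAFE.putLong(base, offset, value);
    }

    static long read(Object base, long offset) {
        return UNSAFE.getLong(base, offset);
    }

    // On-heap: the base is a long[] and the offset is relative to the start of the array object.
    static long onHeapRoundTrip(long value) {
        long[] page = new long[4];
        long offset = UNSAFE.arrayBaseOffset(long[].class) + 2 * 8L; // third element
        write(page, offset, value);
        return page[2]; // the write landed in the array itself
    }

    // Off-heap: the base is null and the offset is an absolute native address.
    static long offHeapRoundTrip(long value) {
        long address = UNSAFE.allocateMemory(32);
        try {
            write(null, address + 16, value);
            return read(null, address + 16);
        } finally {
            UNSAFE.freeMemory(address);
        }
    }
}
```

Code written against read and write does not need to know which mode is in use, which is the property MemoryLocation relies on.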

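Memory management in TaskMemoryManager

With the groundwork laid, we come to the main subject of this article.

TaskMemoryManager manages the memory allocated to a single task.

The class provides the following:

1. A HashSet of MemoryConsumer instances that tracks every MemoryConsumer which has acquired execution memory;

2. pageTable, an array of MemoryBlock that tracks the allocated MemoryBlocks;

3. Memory allocation, with a unified representation for the addresses of on-heap and off-heap allocations;

4. Memory release.

Member variables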

/** The number of bits used to address the page table. */
  private static final int PAGE_NUMBER_BITS = 13;

  /** The number of bits used to encode offsets in data pages. */
  @VisibleForTesting
  static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS;  // 51

  /** The number of entries in the page table. */
  private static final int PAGE_TABLE_SIZE = 1 << PAGE_NUMBER_BITS;  // the page table has 2^13 = 8192 entries

  /**
   * Maximum supported data page size (in bytes). In principle, the maximum addressable page size is
   * (1L << OFFSET_BITS) bytes, which is 2+ petabytes. However, the on-heap allocator's
   * maximum page size is limited by the maximum amount of data that can be stored in a long[]
   * array, which is (2^31 - 1) * 8 bytes (or about 17 gigabytes). Therefore, we cap this at 17
   * gigabytes.
   */
  public static final long MAXIMUM_PAGE_SIZE_BYTES = ((1L << 31) - 1) * 8L;

  /** Bit mask for the lower 51 bits of a long. */
  private static final long MASK_LONG_LOWER_51_BITS = 0x7FFFFFFFFFFFFL;

  /**
   * Similar to an operating system's page table, this array maps page numbers into base object
   * pointers, allowing us to translate between the hashtable's internal 64-bit address
   * representation and the baseObject+offset representation which we use to support both in- and
   * off-heap addresses. When using an off-heap allocator, every entry in this map will be `null`.
   * When using an in-heap allocator, the entries in this map will point to pages' base objects.
   * Entries are added to this map as new data pages are allocated.
   */
  private final MemoryBlock[] pageTable = new MemoryBlock[PAGE_TABLE_SIZE];

  /**
   * Bitmap for tracking free pages.
   */
  private final BitSet allocatedPages = new BitSet(PAGE_TABLE_SIZE);

  private final MemoryManager memoryManager;

  private final long taskAttemptId;

  /**
   * Tracks whether we're in-heap or off-heap. For off-heap, we short-circuit most of these methods
   * without doing any masking or lookups. Since this branching should be well-predicted by the JIT,
   * this extra layer of indirection / abstraction hopefully shouldn't be too expensive.
   */
  final MemoryMode tungstenMemoryMode;

  /**
   * Tracks spillable memory consumers.
   */
  @GuardedBy("this")
  private final HashSet<MemoryConsumer> consumers;

The allocatePage method

This method allocates a MemoryBlock on-heap or off-heap and adds it to pageTable as a page.

/**
   * Allocate a block of memory that will be tracked in the MemoryManager's page table; this is
   * intended for allocating large blocks of Tungsten memory that will be shared between operators.
   *
   * Returns `null` if there was not enough memory to allocate the page. May return a page that
   * contains fewer bytes than requested, so callers should verify the size of returned pages.
   *
   * @throws TooLargePageException
   */
  public MemoryBlock allocatePage(long size, MemoryConsumer consumer) {
    assert(consumer != null);
    assert(consumer.getMode() == tungstenMemoryMode);
    if (size > MAXIMUM_PAGE_SIZE_BYTES) {
      throw new TooLargePageException(size);
    }

    long acquired = acquireExecutionMemory(size, consumer); // acquire the bytes the consumer asked for; fewer may be granted
    if (acquired <= 0) {
      return null;
    }

    final int pageNumber;
    synchronized (this) {
      pageNumber = allocatedPages.nextClearBit(0);
      if (pageNumber >= PAGE_TABLE_SIZE) {
        releaseExecutionMemory(acquired, consumer);
        throw new IllegalStateException(
          "Have already allocated a maximum of " + PAGE_TABLE_SIZE + " pages");
      }
      allocatedPages.set(pageNumber);
    }
    MemoryBlock page = null;
    try {
      page = memoryManager.tungstenMemoryAllocator().allocate(acquired); // allocate the MemoryBlock
    } catch (OutOfMemoryError e) {
      logger.warn("Failed to allocate a page ({} bytes), try again.", acquired);
      // there is no enough memory actually, it means the actual free memory is smaller than
      // MemoryManager thought, we should keep the acquired memory.
      synchronized (this) {
        acquiredButNotUsed += acquired;
        allocatedPages.clear(pageNumber);
      }
      // this could trigger spilling to free some pages.
      return allocatePage(size, consumer);
    }
    page.pageNumber = pageNumber;  // record the page number in the MemoryBlock
    pageTable[pageNumber] = page;  // add the MemoryBlock to pageTable as a page
    if (logger.isTraceEnabled()) {
      logger.trace("Allocate page number {} ({} bytes)", pageNumber, acquired);
    }
    return page;
  }
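The page-number bookkeeping in allocatePage, which picks the lowest free slot from a BitSet and releases it again on failure or free, can be shown in isolation. The sketch below is a simplification: PageNumberAllocator, acquire and release are hypothetical names, and the memory accounting and the actual allocation are omitted.

```java
import java.util.BitSet;

// Hypothetical helper; it mirrors only the BitSet bookkeeping used by allocatePage.
final class PageNumberAllocator {
    static final int PAGE_TABLE_SIZE = 1 << 13; // 8192 slots, matching PAGE_NUMBER_BITS = 13

    private final BitSet allocatedPages = new BitSet(PAGE_TABLE_SIZE);

    // Reserves and returns the lowest free page number.
    synchronized int acquire() {
        int pageNumber = allocatedPages.nextClearBit(0);
        if (pageNumber >= PAGE_TABLE_SIZE) {
            throw new IllegalStateException(
                "Have already allocated a maximum of " + PAGE_TABLE_SIZE + " pages");
        }
        allocatedPages.set(pageNumber);
        return pageNumber;
    }

    // Makes a page number available again, for example after a page is freed.
    synchronized void release(int pageNumber) {
        allocatedPages.clear(pageNumber);
    }
}
```

Because nextClearBit(0) always scans from the start, freed page numbers are reused before higher ones are handed out.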

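 A unified encoding for page addresses

As the analysis above shows, the memory behind a page may live on-heap or off-heap. Callers should not have to care about this, so TaskMemoryManager lets them pass in a page and an offset within it and get back an address encoded as a long. A caller then works with the memory starting at that address without needing to know where the memory comes from.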

/**
   * Given a memory page and offset within that page, encode this address into a 64-bit long.
   * This address will remain valid as long as the corresponding page has not been freed.
   *
   * @param page a data page allocated by {@link TaskMemoryManager#allocatePage}/
   * @param offsetInPage an offset in this page which incorporates the base offset. In other words,
   *                     this should be the value that you would pass as the base offset into an
   *                     UNSAFE call (e.g. page.baseOffset() + something).
   *                     In on-heap mode, offsetInPage is an offset relative to the base object;
   *                     in off-heap mode, offsetInPage is an absolute address.
   * @return an encoded page address.
   */
  public long encodePageNumberAndOffset(MemoryBlock page, long offsetInPage) {
    if (tungstenMemoryMode == MemoryMode.OFF_HEAP) {
      // In off-heap mode, an offset is an absolute address that may require a full 64 bits to
      // encode. Due to our page size limitation, though, we can convert this into an offset that's
      // relative to the page's base offset; this relative offset will fit in 51 bits.
      offsetInPage -= page.getBaseOffset();
    }
    return encodePageNumberAndOffset(page.pageNumber, offsetInPage);
  }

  @VisibleForTesting
  // Packs the page number into the upper 13 bits and the offset into the lower 51 bits.
  // The result is a logical address that decodes back to the real location held in a MemoryLocation.
  public static long encodePageNumberAndOffset(int pageNumber, long offsetInPage) {
    assert (pageNumber >= 0) : "encodePageNumberAndOffset called with invalid page";	
    return (((long) pageNumber) << OFFSET_BITS) | (offsetInPage & MASK_LONG_LOWER_51_BITS);
  }

  @VisibleForTesting
  // Extracts the page number, i.e. the upper 13 bits of the encoded address
  public static int decodePageNumber(long pagePlusOffsetAddress) {
    return (int) (pagePlusOffsetAddress >>> OFFSET_BITS);
  }
  // Extracts the offset, i.e. the lower 51 bits of the encoded address
  private static long decodeOffset(long pagePlusOffsetAddress) {	  
    return (pagePlusOffsetAddress & MASK_LONG_LOWER_51_BITS);
  }

  /**
   * Get the page associated with an address encoded by
   * {@link TaskMemoryManager#encodePageNumberAndOffset(MemoryBlock, long)}
   */
  // In on-heap mode this returns the page's base object; in off-heap mode it returns null
  public Object getPage(long pagePlusOffsetAddress) {
    if (tungstenMemoryMode == MemoryMode.ON_HEAP) {
      final int pageNumber = decodePageNumber(pagePlusOffsetAddress);
      assert (pageNumber >= 0 && pageNumber < PAGE_TABLE_SIZE);
      final MemoryBlock page = pageTable[pageNumber];
      assert (page != null);
      assert (page.getBaseObject() != null);
      return page.getBaseObject();
    } else {
      return null;
    }
  }

  /**
   * Get the offset associated with an address encoded by
   * {@link TaskMemoryManager#encodePageNumberAndOffset(MemoryBlock, long)}
   */
  public long getOffsetInPage(long pagePlusOffsetAddress) {
    final long offsetInPage = decodeOffset(pagePlusOffsetAddress);
    if (tungstenMemoryMode == MemoryMode.ON_HEAP) {
      return offsetInPage;
    } else {
      // In off-heap mode, an offset is an absolute address. In encodePageNumberAndOffset, we
      // converted the absolute address into a relative address. Here, we invert that operation:
      final int pageNumber = decodePageNumber(pagePlusOffsetAddress);
      assert (pageNumber >= 0 && pageNumber < PAGE_TABLE_SIZE);
      final MemoryBlock page = pageTable[pageNumber];
      assert (page != null);
      return page.getBaseOffset() + offsetInPage;
    }
  }

Source: TaskMemoryManager.java
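A round trip through the encoding helps confirm how the 13-bit page number and the 51-bit offset are packed. The sketch below restates the bit manipulation from the listing in a standalone class. PagePointerCodec, encodeOffHeap, decodeOffHeap, the pageBases array, and the sample base address are hypothetical names and values introduced for this example; in the real class the base offsets come from the MemoryBlock entries in pageTable.

```java
// Hypothetical standalone codec; the bit layout follows the TaskMemoryManager listing above.
final class PagePointerCodec {
    static final int PAGE_NUMBER_BITS = 13;
    static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS;     // 51
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;  // 0x7FFFFFFFFFFFFL

    // Packs the page number into the upper 13 bits and the offset into the lower 51 bits.
    static long encode(int pageNumber, long offsetInPage) {
        return (((long) pageNumber) << OFFSET_BITS) | (offsetInPage & OFFSET_MASK);
    }

    // Unsigned shift, so page numbers that set the sign bit still decode correctly.
    static int decodePageNumber(long encoded) {
        return (int) (encoded >>> OFFSET_BITS);
    }

    static long decodeOffset(long encoded) {
        return encoded & OFFSET_MASK;
    }

    // Off-heap: store the address relative to the page's base address so it fits in 51 bits.
    static long encodeOffHeap(int pageNumber, long pageBaseAddress, long absoluteAddress) {
        return encode(pageNumber, absoluteAddress - pageBaseAddress);
    }

    // Off-heap: add the page's base address back to recover the absolute address.
    static long decodeOffHeap(long encoded, long[] pageBases) {
        return pageBases[decodePageNumber(encoded)] + decodeOffset(encoded);
    }

    public static void main(String[] args) {
        long encoded = encode(8191, 123L);
        System.out.println(decodePageNumber(encoded) + " / " + decodeOffset(encoded));

        long[] pageBases = new long[1 << PAGE_NUMBER_BITS];
        pageBases[5] = 0x7f0000001000L; // made-up base address for page 5
        long offHeap = encodeOffHeap(5, pageBases[5], pageBases[5] + 40);
        System.out.println(decodeOffHeap(offHeap, pageBases) == pageBases[5] + 40);
    }
}
```

Page number 8191 sets the top bit of the long, so the encoded value is negative; decoding with the unsigned shift >>> rather than >> is what keeps the page number intact.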

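Configuring whether off-heap memory is used

Off-heap memory is disabled by default. It can be enabled with the spark.memory.offHeap.enabled parameter, and its size is set with the spark.memory.offHeap.size parameter.

MemoryManager.scala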

 /**
   * Tracks whether Tungsten memory will be allocated on the JVM heap or off-heap using
   * sun.misc.Unsafe.
   */
  final val tungstenMemoryMode: MemoryMode = {
    if (conf.get(MEMORY_OFFHEAP_ENABLED)) {
      require(conf.get(MEMORY_OFFHEAP_SIZE) > 0,
        "spark.memory.offHeap.size must be > 0 when spark.memory.offHeap.enabled == true")
      require(Platform.unaligned(),
        "No support for unaligned Unsafe. Set spark.memory.offHeap.enabled to false.")
      MemoryMode.OFF_HEAP
    } else {
      MemoryMode.ON_HEAP
    }
  }

 When a TaskMemoryManager is constructed, it calls MemoryManager#tungstenMemoryMode() to obtain the memory mode and stores it in its tungstenMemoryMode field.

 /**
   * Construct a new TaskMemoryManager.
   */
  public TaskMemoryManager(MemoryManager memoryManager, long taskAttemptId) {
    this.tungstenMemoryMode = memoryManager.tungstenMemoryMode();
    this.memoryManager = memoryManager;
    this.taskAttemptId = taskAttemptId;
    this.consumers = new HashSet<>();
  }

 The tungstenMemoryMode field can be read through the getTungstenMemoryMode method.

/**
   * Returns Tungsten memory mode
   */
  public MemoryMode getTungstenMemoryMode() {
    return tungstenMemoryMode;
  }

TaskMemoryManager keeps a reference to the MemoryManager in its memoryManager field, so when allocatePage allocates memory it calls MemoryManager#tungstenMemoryAllocator to select one of the following allocators:

  • HeapMemoryAllocator
  • UnsafeMemoryAllocator
/**
   * Allocates memory for use by Unsafe/Tungsten code.
   */
  private[memory] final val tungstenMemoryAllocator: MemoryAllocator = {
    tungstenMemoryMode match {
      case MemoryMode.ON_HEAP => MemoryAllocator.HEAP
      case MemoryMode.OFF_HEAP => MemoryAllocator.UNSAFE
    }
  }

 


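For an introduction to off-heap and on-heap memory, see "Apache Spark 内存管理详解" (a detailed explanation of Apache Spark memory management); it is not repeated here.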
