#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* malloc_usable_size() is a glibc extension */

/* Dump the bytes in [startAddr, endAddr) in hex, 4 bytes per line */
void printAddrData1Byte(void* startAddr, void* endAddr)
{
    printf("printf startAddr = %p to endAddr = %p data\n", startAddr, endAddr);
    char* pMove = (char*)startAddr;
    int i = 0;
    while (((char*)endAddr - pMove) != 0) {
        printf("%x ", (unsigned char)*pMove);
        pMove += 1;
        i++;
        if (!(i % 4))
            printf("\n");
    }
}

int main(void)
{
    char *p  = (char *)malloc(0);
    char *p1 = (char *)malloc(13);
    char *p2 = (char *)malloc(21);
    char *p3 = (char *)malloc(29);
    char *p4 = (char *)malloc(37);

    /* malloc_usable_size() returns size_t, so print it with %zu */
    printf("p size %zu\n", malloc_usable_size(p));
    printf("p1 size %zu\n", malloc_usable_size(p1));
    printf("p2 size %zu\n", malloc_usable_size(p2));
    printf("p3 size %zu\n", malloc_usable_size(p3));
    printf("p4 size %zu\n", malloc_usable_size(p4));
    printf("p addr is %p\n", (void *)p);
    printf("p1 addr is %p\n", (void *)p1);
    printf("p2 addr is %p\n", (void *)p2);
    printf("p3 addr is %p\n", (void *)p3);
    printf("p4 addr is %p\n", (void *)p4);

    free(p); free(p1); free(p2); free(p3); free(p4);
    return 0;
}
struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;   /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;        /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;          /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};
malloc_chunk details:
(The following includes lightly edited explanations by Colin Plumb.)
Chunks of memory are maintained using a `boundary tag' method as
described in e.g., Knuth or Standish. (See the paper by Paul
Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
survey of such techniques.) Sizes of free chunks are stored both
in the front of each chunk and at the end. This makes
consolidating fragmented chunks into bigger chunks very fast. The
size fields also hold bits representing whether chunks are free or
in use.
An allocated chunk looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Where "chunk" is the front of the chunk for the purpose of most of
the malloc code, but "mem" is the pointer that is returned to the
user. "Nextchunk" is the beginning of the next contiguous chunk.
Chunks always begin on even word boundaries (i.e., aligned to 2 * sizeof(size_t)),
so the mem portion
(which is returned to the user) is also on an even word boundary, and
thus at least double-word aligned.
Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The P (PREV_INUSE) bit, stored in the unused low-order bit of the
chunk size (which is always a multiple of two words), is an in-use
bit for the *previous* chunk. If that bit is *clear*, then the
word before the current chunk size contains the previous chunk
size, and can be used to find the front of the previous chunk.
The very first chunk allocated always has this bit set,
preventing access to non-existent (or non-owned) memory. If
prev_inuse is set for any given chunk, then you CANNOT determine
the size of the previous chunk, and might even get a memory
addressing fault when trying to do so.
Note that the `foot' of the current chunk is actually represented
as the prev_size of the NEXT chunk. This makes it easier to
deal with alignments etc but can be very confusing when trying
to extend or adapt this code.
The two exceptions to all this are
1. The special chunk `top' doesn't bother using the
trailing size field since there is no next contiguous chunk
that would have to index off it. After initialization, `top'
is forced to always exist. If it would become less than
MINSIZE bytes long, it is replenished.
2. Chunks allocated via mmap, which have the second-lowest-order
bit M (IS_MMAPPED) set in their size fields. Because they are
allocated one-by-one, each must contain its own trailing size field.
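When a chunk is free it must hold at least four fields, prev_size, size, fd and bk, so MINSIZE is the amount of memory these four fields occupy. When a chunk is in use, fd and bk are overwritten by user data, and its prev_size field only carries the previous chunk's size if that previous chunk is free (otherwise the previous chunk uses it for its own data); so size is the only field an in-use chunk has to maintain. MIN_CHUNK_SIZE is therefore the smallest chunk malloc can produce. On a 32-bit system, even malloc(0) gets a chunk of 4 * sizeof(size_t) = 16 bytes; subtracting the 4-byte size field leaves 12 usable bytes (the 8 bytes where fd and bk would be, plus the next chunk's prev_size field, which an in-use chunk borrows). In short, fd, bk and the size copy kept in the next chunk's prev_size field only come into play while a chunk is free.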
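Another way to look at it: a chunk has a header and a footer, both of which store the chunk size; because the footer falls inside the next chunk's region, it becomes that chunk's prev_size. While a chunk is in use, every field except size is used to store data, which increases the chunk's effective payload. Computer Systems: A Programmer's Perspective describes the same design: each block's header and footer hold the block size, allocated blocks no longer need a footer, and a block's footer is only needed when that block (the one preceding the current block) is free.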
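#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* malloc_usable_size() is a glibc extension */

/* Dump the bytes in [startAddr, endAddr) in hex, 4 bytes per line */
void printAddrData1Byte(void* startAddr, void* endAddr)
{
    printf("printf startAddr = %p to endAddr = %p data\n", startAddr, endAddr);
    char* pMove = (char*)startAddr;
    int i = 0;
    while (((char*)endAddr - pMove) != 0) {
        printf("%x ", (unsigned char)*pMove);
        pMove += 1;
        i++;
        if (!(i % 4))
            printf("\n");
    }
}

int main(void)
{
    char *p  = (char *)malloc(0);
    char *p1 = (char *)malloc(13);
    char *p2 = (char *)malloc(21);
    char *p3 = (char *)malloc(29);
    char *p4 = (char *)malloc(37);
    char *p5 = (char *)malloc(132*1024);   /* at or above the 128 KB mmap threshold */

    /* Dump the size field stored just before each user pointer. The original
       run was on a 32-bit system (p - 4); sizeof(size_t) also works on 64-bit. */
    printAddrData1Byte(p  - sizeof(size_t), p);
    printAddrData1Byte(p1 - sizeof(size_t), p1);
    printAddrData1Byte(p2 - sizeof(size_t), p2);
    printAddrData1Byte(p3 - sizeof(size_t), p3);
    printAddrData1Byte(p4 - sizeof(size_t), p4);
    printAddrData1Byte(p5 - sizeof(size_t), p5);

    printf("p size %zu\n", malloc_usable_size(p));
    printf("p1 size %zu\n", malloc_usable_size(p1));
    printf("p2 size %zu\n", malloc_usable_size(p2));
    printf("p3 size %zu\n", malloc_usable_size(p3));
    printf("p4 size %zu\n", malloc_usable_size(p4));
    printf("p5 size %zu\n", malloc_usable_size(p5));
    printf("p addr is %p\n", (void *)p);
    printf("p1 addr is %p\n", (void *)p1);
    printf("p2 addr is %p\n", (void *)p2);
    printf("p3 addr is %p\n", (void *)p3);
    printf("p4 addr is %p\n", (void *)p4);
    printf("p5 addr is %p\n", (void *)p5);

    free(p); free(p1); free(p2); free(p3); free(p4); free(p5);
    return 0;
}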
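The test results show that when the usable size is 12, the first byte of the header (the size field; on little-endian x86 this is its least-significant byte) is 0x11 in hex, i.e. 0001 0001 in binary: a chunk size of 16 with the lowest bit set. When the usable size is 20, that byte is 0x19, i.e. 0001 1001: a chunk size of 24 with the lowest bit set. The other allocations follow the same pattern, and the low three bits never change, which matches the fact that the low three bits of the size field are the A, M and P flag bits. P = 1 means the previous chunk is in use, so the experimental results are consistent with the source.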
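As mentioned earlier, M = 1 means the chunk was allocated from an mmap-mapped region. So when does malloc use mmap? The source below gives the answer: requests of at least mmap_threshold bytes are normally served with mmap, and the default threshold is 128 KB (DEFAULT_MMAP_THRESHOLD_MIN).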
The maximum overhead wastage (i.e., number of extra bytes
allocated than were requested in malloc) is less than or equal
to the minimum size, except for requests >= mmap_threshold that
are serviced via mmap(), where the worst case wastage is 2 *
sizeof(size_t) bytes plus the remainder from a system page (the
minimal mmap unit); typically 4096 or 8192 bytes.
MMAP_THRESHOLD_MAX and _MIN are the bounds on the dynamically
adjusted MMAP_THRESHOLD.
#define DEFAULT_MMAP_THRESHOLD_MIN (128 * 1024)
/* For 32-bit platforms we cannot increase the maximum mmap
threshold much because it is also the minimum value for the
maximum heap size and its alignment. Going above 512k (i.e., 1M
for new heaps) wastes too much address space. */
# if __WORDSIZE == 32
# define DEFAULT_MMAP_THRESHOLD_MAX (512 * 1024)
# else
# define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof(long))
# endif
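Since fragments inside the heap (memory obtained with brk/sbrk) cannot be returned to the OS directly, which can look like a "memory leak", why doesn't malloc use mmap for everything, so that free() could truly release memory through munmap? Why is mmap used only for large blocks of 128 KB and above?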
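In fact, sbrk, mmap and munmap, the interfaces a process uses to request and return address space from the OS, are all system calls, and calling them frequently is expensive. Moreover, memory that has been released with munmap and then requested again incurs more page faults. For example, allocating 1 MB with mmap causes a large number of page faults when the memory is first accessed (1M/4K = 256 of them); after munmap, allocating 1 MB again causes just as many page faults again. Page faults are handled by the kernel, so they cost a significant amount of kernel-mode CPU time. In addition, using mmap for small allocations fragments the address space further and increases the kernel's management burden.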
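At the same time, the heap is a contiguous region, and because the fragments inside it are not returned to the OS, a fragment that can be reused will most likely be accessed again without any system call or page fault, which greatly reduces CPU usage. glibc's malloc therefore weighs the behavioral differences, strengths and weaknesses of sbrk and mmap, and by default uses mmap to obtain address space only for large blocks (128 KB); the threshold can also be changed with mallopt(M_MMAP_THRESHOLD,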