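I have recently been reading several open-source C/C++ libraries. Each of them optimizes and documents its own memory allocation, and they all touch on byte alignment of allocations and on memory paging.
Byte alignment in memory allocation is something I have long known about without really understanding, and I rarely think about the performance cost it can carry. In a system that must be highly concurrent, fast, and make the most of its resources, though, it deserves attention, so I took this opportunity to get a rough understanding of it.
Let's start with the definitions in glibc's malloc.c: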
/*
  -----------------------  Chunk representations -----------------------
*/


/*
  This struct declaration is misleading (but accurate and necessary).
  It declares a "view" into memory allowing access to necessary
  fields at known offsets from a given base. See explanation below.
*/

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};


/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
        trailing size field since there is no next contiguous chunk
        that would have to index off it. After initialization, `top'
        is forced to always exist.  If it would become less than
        MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
        bit M (IS_MMAPPED) set in their size fields.  Because they are
        allocated one-by-one, each must contain its own trailing size field.

*/

/*
  ---------- Size and alignment checks and conversions ----------
*/

/* conversion from malloc headers to user pointers, and back */

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

/* The smallest size we can malloc is an aligned minimal chunk */

#define MINSIZE  \
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

/* Check if m has acceptable alignment */

#define aligned_OK(m)  (((unsigned long)(m) & MALLOC_ALIGN_MASK) == 0)

#define misaligned_chunk(p) \
  ((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem (p)) \
   & MALLOC_ALIGN_MASK)


/*
   Check if a request is so large that it would wrap around zero when
   padded and aligned. To simplify some other code, the bound is made
   low enough so that adding MINSIZE will also not wrap around zero.
*/

#define REQUEST_OUT_OF_RANGE(req)                                 \
  ((unsigned long) (req) >=                                       \
   (unsigned long) (INTERNAL_SIZE_T) (-2 * MINSIZE))

/* pad request bytes into a usable size -- internal version */

#define request2size(req)                                         \
  (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)  ?             \
   MINSIZE :                                                      \
   ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/* Same, except also perform argument check */

#define checked_request2size(req, sz)                             \
  if (REQUEST_OUT_OF_RANGE (req)) {                               \
    __set_errno (ENOMEM);                                         \
    return 0;                                                     \
  }                                                               \
  (sz) = request2size (req);

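The listing defines many macros; we only look at the most important ones. request2size performs the alignment of a request. MINSIZE is the smallest chunk that malloc will ever carve out: 16 bytes on a 32-bit system and 32 bytes on a 64-bit system. MALLOC_ALIGNMENT is the alignment in bytes. In this version it defaults to 2 * SIZE_SZ, and since size_t is 4 bytes on 32-bit systems and 8 bytes on 64-bit systems, MALLOC_ALIGNMENT comes out as 8 and 16 respectively.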
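To see where the 16 and 32 come from, here is a minimal sketch that mirrors the struct and the MINSIZE macro outside of glibc. The names chunk_sketch, SKETCH_* and the two helper functions are made up for illustration, and the sketch assumes INTERNAL_SIZE_T is size_t and MALLOC_ALIGNMENT is 2 * SIZE_SZ, as discussed above.

```c
#include <stddef.h>

/* Hypothetical mirror of struct malloc_chunk; size_t stands in for
   INTERNAL_SIZE_T (an assumption, it is the usual default). */
struct chunk_sketch {
    size_t prev_size;
    size_t size;
    struct chunk_sketch *fd;
    struct chunk_sketch *bk;
    struct chunk_sketch *fd_nextsize;
    struct chunk_sketch *bk_nextsize;
};

/* Assumed defaults: SIZE_SZ = sizeof(size_t), alignment = 2 * SIZE_SZ. */
#define SKETCH_SIZE_SZ    (sizeof(size_t))
#define SKETCH_ALIGNMENT  (2 * SKETCH_SIZE_SZ)
#define SKETCH_ALIGN_MASK (SKETCH_ALIGNMENT - 1)

/* Smallest chunk: everything in front of fd_nextsize, i.e. the two
   size fields plus the fd and bk pointers. */
#define SKETCH_MIN_CHUNK  (offsetof(struct chunk_sketch, fd_nextsize))

/* MINSIZE is the smallest chunk rounded up to the alignment. */
#define SKETCH_MINSIZE \
    ((SKETCH_MIN_CHUNK + SKETCH_ALIGN_MASK) & ~SKETCH_ALIGN_MASK)

static size_t sketch_alignment(void) { return SKETCH_ALIGNMENT; }
static size_t sketch_minsize(void)   { return SKETCH_MINSIZE; }
```

On a typical 64-bit Linux build, where size_t and pointers are both 8 bytes, sketch_alignment() and sketch_minsize() should give 16 and 32; on a 32-bit build they should give 8 and 16.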
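In practice, the value chosen for the alignment parameter (MALLOC_ALIGNMENT) has to satisfy two conditions:
1. It must be a power of 2
2. It must be an integer multiple of the size of void *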
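The two conditions can be checked with a couple of bit tricks. The sketch below is only an illustration; is_power_of_two, alignment_ok and align_up are hypothetical helpers, not glibc functions. The power-of-two requirement is what makes the mask arithmetic in the macros above valid: rounding up with `& ~(a - 1)` only works when a has a single bit set.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when a is a nonzero power of two (exactly one bit set). */
static bool is_power_of_two(size_t a)
{
    return a != 0 && (a & (a - 1)) == 0;
}

/* Checks both conditions from the list above. */
static bool alignment_ok(size_t a)
{
    return is_power_of_two(a) && a % sizeof(void *) == 0;
}

/* Rounds n up to the next multiple of a.
   Only correct when a is a power of two. */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}
```

For example, alignment_ok(16) holds, alignment_ok(24) fails the power-of-two test, and align_up(25, 16) gives 32.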
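It follows from request2size that on a 64-bit system a request of 1 to 24 bytes consumes 32 bytes of memory, while a request of 25 bytes consumes 48 bytes. On a 32-bit system a request of 1 to 12 bytes consumes 16 bytes, while a request of 13 bytes consumes 24 bytes.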
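These numbers can be reproduced with a small stand-alone version of the macro. This is a sketch, not glibc code: request2size_sketch is a hypothetical function that takes SIZE_SZ as a parameter so both the 32-bit and 64-bit cases can be tried on one machine, and it assumes MALLOC_ALIGNMENT is 2 * SIZE_SZ and MIN_CHUNK_SIZE is 4 * SIZE_SZ (two size fields plus two pointers of the same width). Note that only one SIZE_SZ of overhead is added rather than two: as the comment in the listing says, the foot of a chunk is the prev_size field of the next chunk, so an in-use chunk can spill its user data into that field.

```c
#include <stddef.h>

/* Re-implementation of request2size with SIZE_SZ as a parameter.
   Assumes alignment = 2 * size_sz and MIN_CHUNK_SIZE = 4 * size_sz. */
static size_t request2size_sketch(size_t req, size_t size_sz)
{
    size_t align_mask = 2 * size_sz - 1;
    size_t minsize    = (4 * size_sz + align_mask) & ~align_mask;

    /* Add one size field of overhead, then round up to the alignment. */
    size_t padded = req + size_sz + align_mask;

    return padded < minsize ? minsize : (padded & ~align_mask);
}
```

With size_sz = 8, requests of 1 and 24 bytes both give 32 and a request of 25 gives 48; with size_sz = 4, a request of 12 gives 16 and a request of 13 gives 24, matching the figures above.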
Finally, here is someone else's write-up on how to implement a simple malloc function: http://blog.codinglabs.org/articles/a-malloc-tutorial.html