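In a multithreaded program I have been developing recently, I noticed that after a thread exits via pthread_exit(), the process's VSZ does not shrink; as more such threads come and go, VSZ keeps growing.
At first I suspected a memory leak somewhere in the program, but I checked every place that calls new and found none.
Analysing the process with pmap showed that, compared with a process in which no thread has exited, there are a few extra memory regions like the ones below; everything else is identical.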
pmap 19661
...................
00007f80eeb5c000 4K ----- [ anon ]
00007f80eeb5d000 8192K rwx-- [ anon ]
................................
In gdb, nothing meaningful can be seen at these addresses.
valgrind did not report any problem either:
valgrind --tool=memcheck --leak-check=full -v --track-origins=yes --log-file=val.log --track-fds=yes --time-stamp=yes --show-reachable=yes my_app
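So I began to suspect that the thread's resources were not being released when it exited.
Searching online, I found some articles on chinaunix with the following advice on releasing thread resources safely:
If a thread is joinable, the main thread (or whichever thread is responsible for reaping) must call pthread_join() to reclaim it.
If you do not want the reaping thread to block and would rather let the system reclaim the thread's resources automatically, i.e. without calling pthread_join(), the thread must be detached.
Whether a thread is created joinable or detached is set with pthread_attr_setdetachstate().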
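My reaping thread has other work to do and cannot block for long. In addition, printing the time before and after pthread_join() showed that it could still take up to 5 seconds to return even when the thread had already exited.
So in the end I went with pthread_exit() + detached rather than pthread_exit() + pthread_join().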
Now back to the question of why the regions are 4K and 8M.
First, download the glibc source; pthread_create.c is in the nptl directory.
__pthread_create_2_1()->ALLOCATE_STACK()->allocate_stack()
/* Allocate some anonymous memory. If possible use the cache. */
-->get_cached_stack()
Attach gdb to the application:
(gdb) p stack_cache ===========> libc's symbol table must be readable for this to work
$3 = {next = 0x7f80edb599c0, prev = 0x7f80ed3589c0}
(gdb) p sizeof(struct pthread)
$6 = 2304 ==================> 2304 bytes, less than one 4K page
(gdb) p *(struct pthread *)0x7f80eeb5b700
$9 = {{header = {tcb = 0x7f80eeb5b700, dtv = 0x1ff1190, self = 0x7f80eeb5b700, multiple_threads = 1, gscope_flag = 0, sysinfo = 0, stack_guard = 16092494444486863360, pointer_guard = 1023798611218601545,
vgetcpu_cache = {0, 0}, private_futex = 128, rtld_must_xmm_save = 0, __private_tm = {0x0, 0x0, 0x0, 0x0, 0x0}, __unused2 = 0, rtld_savespace_sse = {{{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0,
0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0,
0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, {{0, 0, 0, 0}, {0, 0, 0,
0}, {0, 0, 0, 0}, {0, 0, 0, 0}}}, __padding = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, __padding = {0x7f80eeb5b700, 0x1ff1190, 0x7f80eeb5b700, 0x1, 0x0, 0xdf54066f81962a00,
0xe3543699f391649, 0x0, 0x0, 0x80, 0x0
list = 0x7f80eeb5b9e0, futex_offset = -32, list_op_pending = 0x0}, cleanup = 0x0, cleanup_jmp_buf = 0x7f80eeb5af30, cancelhandling = 2, flags = 0, specific_1stblock = {{seq = 1, data = 0x11bbbc0}, {
seq = 0, data = 0x0}
parent_cancelhandling = 0, lock = 0, setxid_futex = 0, cpuclock_offset = 2940773369682629, joinid = 0x0, result = 0x0, schedparam = {__sched_priority = 0}, schedpolicy = 0,
start_routine = 0x69e32f
exception_class = 0, exception_cleanup = 0, private_1 = 0, private_2 = 0}, stackblock = 0x7f80ee35b000, stackblock_size = 8392704, guardsize = 4096, reported_guardsize = 4096, tpp = 0x0, res = {
retrans = 0, retry = 0, options = 0, nscount = 0, nsaddr_list = {{sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, {sin_family = 0, sin_port = 0,
sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, {sin_family = 0, sin_port = 0, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}}, id = 0, dnsrch = {0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, defdname = '\000'
s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {
addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}, {addr = {s_addr = 0}, mask = 0}}, qhook = 0, rhook = 0, res_h_errno = 0, _vcsock = 0, _flags = 0, _u = {
pad = '\000'
(gdb) p ((struct pthread *)0x7f80eeb5b700)->stackblock_size
$10 = 8392704 ============> 8M + 4K (8196K)
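Comparing this with the pmap output: the 8192K rwx region is the thread's stack. The 4K region has no permissions (-----) and matches guardsize = 4096, so it is the stack's guard page, not the thread control block. The 2304-byte struct pthread sits at the very top of the stack block: 0x7f80eeb5b700 + 2304 = 0x7f80eeb5c000, which is exactly stackblock + stackblock_size. stackblock_size itself (8392704) is the 8M stack plus the 4K guard.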
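Judging from the addresses, the stacks sit back to back. allocate_stack() first tries to reuse a block from stack_cache, which holds the stacks of threads that have already finished; if nothing suitable is cached, it maps a new block with mmap(), and consecutive mappings typically end up adjacent to each other.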
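So where is this memory released? Let's look at pthread_join():
pthread_join()->__free_tcb()->__deallocate_stack()
This is why, for a joinable thread, pthread_join() has to be called explicitly: the thread's control block and stack are only handed back on this path.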