malloc failure causes a thread deadlock

Environment: Linux 3.44 / libc.so.6 2.17
Stack trace:

Thread 1 (Thread 0x7fcae15e9740 (LWP 17012)):
#0  0x00007fcadededbd8 in pthread_once () from /lib64/libpthread.so.0
#1  0x00007fcadeb2a08c in backtrace () from /lib64/libc.so.6
#2  0x00007fcadea95dd4 in __libc_message () from /lib64/libc.so.6
#3  0x00007fcadea9bbf7 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fcadea9f125 in _int_malloc () from /lib64/libc.so.6
#5  0x00007fcadeaa011c in malloc () from /lib64/libc.so.6
#6  0x00007fcae13ee8a3 in _dl_map_object () from /lib64/ld-linux-x86-64.so.2
#7  0x00007fcae13f98d1 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#8  0x00007fcae13f5314 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9  0x00007fcae13f925b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#10 0x00007fcadeb50912 in do_dlopen () from /lib64/libc.so.6
#11 0x00007fcae13f5314 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x00007fcadeb509d2 in __libc_dlopen_mode () from /lib64/libc.so.6
#13 0x00007fcadeb29f75 in init () from /lib64/libc.so.6
#14 0x00007fcadededbe0 in pthread_once () from /lib64/libpthread.so.0
#15 0x00007fcadeb2a08c in backtrace () from /lib64/libc.so.6
#16 0x00007fcadea95dd4 in __libc_message () from /lib64/libc.so.6
#17 0x00007fcadea9bbf7 in malloc_printerr () from /lib64/libc.so.6
#18 0x00007fcadea9f125 in _int_malloc () from /lib64/libc.so.6
#19 0x00007fcadeaa0b3a in calloc () from /lib64/libc.so.6
#20 0x0000000000510ad0 in pal_mem_calloc (type=MTYPE_LS_PREFIX, size=12) at pal_memory.c:52
#21 0x00000000005d19af in mfh_calloc (type=MTYPE_LS_PREFIX, size=12) at memory.c:168
#22 0x000000000056c6cd in ls_prefix_new (size=8) at ls_prefix.c:23
#23 0x000000000054e9c2 in ls_node_set (table=0x1c640d0, prefix=0x7fffeb616490) at ls_table.c:67
#24 0x000000000054f238 in ls_node_get (table=0x1c640d0, p=0x7fffeb616490) at ls_table.c:362
#25 0x00000000004b01e6 in ospf6_lsdb_add (lsdb=0x1c63a30, lsa=0x1d77f20) at ospf6_lsdb.c:316
#26 0x000000000049bca4 in ospf6_ls_retransmit_add (nbr=0x1c62b40, lsa=0x1d77f20) at ospf6_flood.c:140
#27 0x000000000049c5e8 in ospf6_flood_through_interface (oi=0x1c65d50, inbr=0x0, lsa=0x1d77f20) at ospf6_flood.c:396
#28 0x000000000049ca08 in ospf6_flood_through_as (top=0x1c5e2c0, inbr=0x0, lsa=0x1d77f20) at ospf6_flood.c:501
#29 0x000000000049cac0 in ospf6_flood_through (inbr=0x0, lsa=0x1d77f20) at ospf6_flood.c:519
#30 0x000000000049507a in ospf6_lsa_originate (top=0x1c5e2c0, type=5, param=0x1d78c60) at ospf6_lsa.c:3339
#31 0x00000000004cf778 in ospf6_redist_map_lsa_refresh (top=0x1c5e2c0, map=0x1d78c60) at ospf6_nsm.c:726
#32 0x00000000004cfa89 in ospf6_redist_map_update (table=0x1c5ea90, ri=0x1f40e40, type=5 '\005', parent=0x1c5e2c0) at ospf6_nsm.c:851
#33 0x00000000004d0b09 in ospf6_redistribute_timer (t=0x7fffeb616870) at ospf6_nsm.c:1240
#34 0x0000000000552526 in thread_call (thread=0x7fffeb616870) at thread.c:1283
#35 0x0000000000469290 in ospf6_start (daemon_mode=1, config_file=0x0, vty_port=2606, progname=0x7fffeb6176b4 "ospf6d") at ospf6_main.c:207
#36 0x0000000000468e6f in main (argc=2, argv=0x7fffeb616a18, envp=0x7fffeb616a30) at ../../platform/linux/ospf6.c:170

I searched online and am recording the possible causes I found here.
Problems with signal handlers

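Open-source code rarely puts much logic inside a signal handler. Why?
The reason is that a signal can interrupt a thread's running code at any moment, and when the signal handler is inserted at that point, some functions can end up being re-entered. In the example above, thread 1 is creating a new object and is in the middle of malloc when a signal interrupts it, and the signal handler itself calls malloc. malloc is not reentrant (it is not async-signal-safe), so the thread hangs.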
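To make the re-entrancy point concrete, here is a minimal sketch of a handler that avoids the problem by doing nothing except setting a flag. The names `got_signal`, `on_signal` and `flag_handler_demo` are made up for this illustration, and SIGUSR1 is an arbitrary choice of signal; it assumes SIGUSR1 is not blocked in the calling thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by normal code. volatile sig_atomic_t is
   the type the C standard allows a handler to write safely. */
static volatile sig_atomic_t got_signal = 0;

/* The handler only records that the signal arrived: no malloc, no printf,
   no locks, nothing that is not async-signal-safe. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical demo: install the handler, send SIGUSR1 to ourselves, and
   do the real work afterwards in normal (non-handler) context.
   Returns 1 if the signal was observed, 0 otherwise. */
int flag_handler_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return 0;

    got_signal = 0;
    raise(SIGUSR1);   /* the handler runs before raise() returns */

    if (got_signal) {
        /* Safe place for malloc, logging, cleanup and so on. */
        return 1;
    }
    return 0;
}
```

Calling `flag_handler_demo()` from `main` should return 1. In a real daemon the flag would typically be polled from the main loop rather than checked right after `raise`.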

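A second problem: a child process inherits many resources from its parent, including signal settings. His program set up its signal handling first and only afterwards called pthread_create to start many worker threads, without blocking the signal in them. As a result, every thread could end up running that signal handler, and all of the threads hung.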

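There are many ways to solve this. The usual one is to do only a small amount of work in the signal handler and have it notify the other threads so that they release their own resources.
For a multithreaded program, it fits the design better to dedicate a single thread to signals and have it handle them in a blocking, synchronous way. For example, before spawning the child threads, call pthread_sigmask so that the signal cannot interrupt them, and in the main thread use the blocking sigwait call to process signals synchronously. Complex operations are fine at that point, with no need to worry about re-entrancy.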
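The pthread_sigmask plus sigwait pattern described above might look roughly like the following sketch. The function names `worker` and `sigwait_demo` are hypothetical, SIGUSR1 is an arbitrary choice, and the worker sends the signal itself only to stand in for an external `kill`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

/* Worker thread: it inherits the signal mask of its creator, so SIGUSR1
   is blocked here and can never interrupt it. For the demo it sends a
   process-directed SIGUSR1, standing in for an external `kill -USR1`. */
static void *worker(void *arg)
{
    (void)arg;
    kill(getpid(), SIGUSR1);
    return NULL;
}

/* Hypothetical demo: returns the signal number received by sigwait,
   or -1 on error. */
int sigwait_demo(void)
{
    sigset_t set;
    pthread_t tid;
    int signo = 0;

    /* Block SIGUSR1 BEFORE creating any thread so that every thread
       created afterwards starts with the signal blocked. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* The main thread waits synchronously. When sigwait returns we are in
       ordinary thread context, so malloc, locks and logging are all fine. */
    if (sigwait(&set, &signo) != 0)
        signo = -1;

    pthread_join(tid, NULL);
    return signo;
}
```

Built with `-pthread`, a call to `sigwait_demo()` should return SIGUSR1. Because the signal is blocked in every thread, it stays pending until the sigwait call accepts it, so no asynchronous handler ever runs.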

A case closer to this failure:
nginx: worker process: malloc(): memory corruption

I used valgrind to check for memory leaks, and it detected some errors:

==2243== Invalid write of size 1
==2243== at 0x4A08088: memcpy (mc_replace_strmem.c:628)
==2243== by 0x4448C9: ngx_http_proxy_subs_headers (ngx_http_proxy_subs_filter.c:149)
==2243== by 0x45B2FB: ngx_http_proxy_create_request (ngx_http_proxy_module.c:1235)
==2243== by 0x43EA7E: ngx_http_upstream_init_request (ngx_http_upstream.c:505)
==2243== by 0x43EE92: ngx_http_upstream_init (ngx_http_upstream.c:446)
==2243== by 0x4361C0: ngx_http_read_client_request_body (ngx_http_request_body.c:59)
==2243== by 0x459972: ngx_http_proxy_handler (ngx_http_proxy_module.c:703)
==2243== by 0x42BD23: ngx_http_core_content_phase (ngx_http_core_module.c:1396)
==2243== by 0x4269A2: ngx_http_core_run_phases (ngx_http_core_module.c:877)
==2243== by 0x426A9D: ngx_http_handler (ngx_http_core_module.c:860)
==2243== by 0x430661: ngx_http_process_request (ngx_http_request.c:1874)
==2243== by 0x430D97: ngx_http_process_request_headers (ngx_http_request.c:1318)
==2243== Address 0x5a1f29a is not stack'd, malloc'd or (recently) free'd
==2243==
==2243== Invalid write of size 8
==2243== at 0x4A080B3: memcpy (mc_replace_strmem.c:628)
==2243== by 0x4448C9: ngx_http_proxy_subs_headers (ngx_http_proxy_subs_filter.c:149)
==2243== by 0x45B2FB: ngx_http_proxy_create_request (ngx_http_proxy_module.c:1235)
==2243== by 0x43EA7E: ngx_http_upstream_init_request (ngx_http_upstream.c:505)
==2243== by 0x43EE92: ngx_http_upstream_init (ngx_http_upstream.c:446)
==2243== by 0x4361C0: ngx_http_read_client_request_body (ngx_http_request_body.c:59)
==2243== by 0x459972: ngx_http_proxy_handler (ngx_http_proxy_module.c:703)
==2243== by 0x42BD23: ngx_http_core_content_phase (ngx_http_core_module.c:1396)
==2243== by 0x4269A2: ngx_http_core_run_phases (ngx_http_core_module.c:877)
==2243== by 0x426A9D: ngx_http_handler (ngx_http_core_module.c:860)
==2243== by 0x430661: ngx_http_process_request (ngx_http_request.c:1874)

This was due to ngx_copy() writing out of bounds, which was caused by my own code. I modified
the corresponding code, and it has been running fine since.
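As an illustration of the kind of fix involved, here is a sketch of a copy helper that refuses to write past the end of its destination buffer. `bounded_copy` and `bounded_copy_demo` are hypothetical names and are not part of nginx; the actual fix in the quoted thread is not shown in the source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst only if they fit
   before dst_end. Returns a pointer just past the copied bytes, so calls
   can be chained, or NULL if the copy would overrun the buffer. */
static char *bounded_copy(char *dst, const char *dst_end,
                          const void *src, size_t n)
{
    if (dst == NULL || dst > dst_end || (size_t)(dst_end - dst) < n)
        return NULL;
    memcpy(dst, src, n);
    return dst + n;
}

/* Hypothetical demo: returns the number of bytes written, or -1 if any
   step behaves unexpectedly. */
int bounded_copy_demo(void)
{
    char buf[8];
    const char *end = buf + sizeof buf;
    char *p;

    p = bounded_copy(buf, end, "Host", 4);   /* fits: 4 of 8 bytes used */
    if (p == NULL)
        return -1;
    p = bounded_copy(p, end, ": ", 2);       /* fits: 6 of 8 bytes used */
    if (p == NULL)
        return -1;

    /* 9 more bytes do not fit in the 2 remaining: the copy is rejected
       instead of silently corrupting whatever follows the buffer. */
    if (bounded_copy(p, end, "localhost", 9) != NULL)
        return -1;

    return (int)(p - buf);
}
```

An unchecked memcpy in the last step would write 7 bytes past the buffer. On the heap, that kind of overrun can damage allocator metadata and only surface later, inside an unrelated malloc or calloc call.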

pthread_once() self deadlock
