Handling kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [insmod:55902]

Run the modified tcrypt.c program mentioned earlier (it only runs the md5 and sha1 digest algorithms):
insmod tcrypt.ko sec=2 mode=400
It reports kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [insmod:55902]
along with a stack trace:

[106091.127829] 
testing speed of async md5 (md5-generic)
[106091.127831] test  0 (   16 byte blocks,   16 bytes per update,   1 updates): 
[106093.127071] 6929212 opers/sec, 110867392 bytes/sec
[106093.127072] test  1 (   64 byte blocks,   16 bytes per update,   4 updates): 3149079 opers/sec, 201541088 bytes/sec
[106095.126940] test  2 (   64 byte blocks,   64 bytes per update,   1 updates): 
[106097.126759] 4028096 opers/sec, 257798144 bytes/sec
[106097.126761] test  3 (  256 byte blocks,   16 bytes per update,  16 updates): 1154874 opers/sec, 295647872 bytes/sec
[106099.127126] test  4 (  256 byte blocks,   64 bytes per update,   4 updates): 1623307 opers/sec, 415566592 bytes/sec
[106101.127270] test  5 (  256 byte blocks,  256 bytes per update,   1 updates): 
[106103.126713] 1810068 opers/sec, 463377408 bytes/sec
[106103.126715] test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates): 311122 opers/sec, 318589440 bytes/sec
[106105.126920] test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates): 537416 opers/sec, 550314496 bytes/sec
[106107.127073] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates): 
[106109.127042] 577204 opers/sec, 591056896 bytes/sec
[106109.127044] test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates): 164557 opers/sec, 337012736 bytes/sec
[106111.126772] test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates): 
[106112.131736] INFO: rcu_sched self-detected stall on CPU
[106112.131746] 	1-...: (20933 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=5201 
[106112.131747] 	 (t=21000 jiffies g=178681 c=178680 q=640)
[106112.131753] Task dump for CPU 1:
[106112.131753] insmod          R  running task        0 56468 128644 0x00000088
[106112.131756]  ffff88003c643e08 ffffffff8108a2e9 0000000000000001 0000000000000001
[106112.131757]  ffff88003c643e20 ffffffff8108ca29 ffffffff81e46000 ffff88003c643e50
[106112.131758]  ffffffff81136f41 ffff88003c6585c0 ffffffff81e45e00 0000000000000000
[106112.131759] Call Trace:
[106112.131761]  <IRQ>
[106112.131765]  [] sched_show_task+0xe9/0x150
[106112.131767]  [] dump_cpu_task+0x39/0x40
[106112.131769]  [] rcu_dump_cpu_stacks+0x80/0xbc
[106112.131770]  [] rcu_check_callbacks+0x70b/0x860
[106112.131772]  [] ? __acct_update_integrals+0x30/0xb0
[106112.131773]  [] ? tick_sched_do_timer+0x30/0x30
[106112.131774]  [] update_process_times+0x2f/0x60
[106112.131775]  [] tick_sched_handle.isra.13+0x25/0x60
[106112.131776]  [] tick_sched_timer+0x3d/0x70
[106112.131777]  [] __hrtimer_run_queues+0xe6/0x280
[106112.142164]  [] hrtimer_interrupt+0xa8/0x1a0
[106112.142167] INFO: rcu_sched detected stalls on CPUs/tasks:
[106112.142174]  [] local_apic_timer_interrupt+0x35/0x60
[106112.142213]  [] smp_apic_timer_interrupt+0x3d/0x50
[106112.142216]  [] apic_timer_interrupt+0x7f/0x90
[106112.142217]  <EOI>
[106112.146252]  [] ? md5_transform+0x6c2/0x7f0
[106112.146255]  [] md5_update+0xde/0x130
[106112.146256]  [] crypto_shash_update+0x38/0x100
[106112.146257]  [] shash_ahash_update+0x2c/0x50
[106112.146258]  [] shash_async_update+0x12/0x20
[106112.146261]  [] test_ahash_speed_common.constprop.8+0x24d/0x820 [tcrypt]
[106112.146262]  [] ? 0xffffffffa007a000
[106112.146263]  [] do_test+0x108/0x31a [tcrypt]
[106112.146264]  [] ? 0xffffffffa007a000
[106112.146265]  [] tcrypt_mod_init+0x49/0x95 [tcrypt]
[106112.146266]  [] ? 0xffffffffa007a000
[106112.146267]  [] do_one_initcall+0x3d/0x150
[106112.146269]  [] ? __might_sleep+0x4a/0x90
[106112.146270]  [] ? do_init_module+0x27/0x1d8
[106112.146287]  [] ? kmem_cache_alloc_trace+0x46/0x170
[106112.146289]  [] do_init_module+0x60/0x1d8
[106112.146291]  [] load_module+0x1245/0x1940
[106112.146292]  [] ? __symbol_put+0x40/0x40
[106112.146293]  [] ? vfs_read+0x113/0x130
[106112.146295]  [] SYSC_finit_module+0x96/0xd0
[106112.146296]  [] SyS_finit_module+0xe/0x10
[106112.146297]  [] do_syscall_64+0x4d/0xb0
[106112.146299]  [] entry_SYSCALL64_slow_path+0x25/0x25
[106112.146302] 	1-...: (20933 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=5201 
[106112.146304] 	(detected by 3, t=21002 jiffies, g=178681, c=178680, q=640)
[106112.146310] Task dump for CPU 1:
[106112.146311] insmod          R  running task        0 56468 128644 0x00000088
[106112.146313]  ffffffff8113711d 0000000000000001 ffffffffa006f5c0 ffffc9001166fe98
[106112.146315]  ffffc9001166fe78 ffffffff810d2205 ffffffffa006f5c0 ffffffff810cf470
[106112.146316]  0000000000000000 ffffffffa006f5d8 ffffffff8119ba93 ffff88000000001c
[106112.146317] Call Trace:
[106112.146320]  [] ? do_init_module+0x60/0x1d8
[106112.146322]  [] ? load_module+0x1245/0x1940
[106112.146323]  [] ? __symbol_put+0x40/0x40
[106112.146324]  [] ? vfs_read+0x113/0x130
[106112.146326]  [] ? SYSC_finit_module+0x96/0xd0
[106112.146328]  [] ? SyS_finit_module+0xe/0x10
[106112.146328]  [] ? do_syscall_64+0x4d/0xb0
[106112.146330]  [] ? entry_SYSCALL64_slow_path+0x25/0x25
[106113.126964] 274310 opers/sec, 561787904 bytes/sec
[106113.126966] test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates): 299329 opers/sec, 613025792 bytes/sec
[106115.127105] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates): 
[106117.127334] 301106 opers/sec, 616665088 bytes/sec
[106117.127336] test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates):  84647 opers/sec, 346716160 bytes/sec
[106119.127368] test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates): 147179 opers/sec, 602845184 bytes/sec
[106121.126869] test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates): 152129 opers/sec, 623122432 bytes/sec
[106123.127106] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 
[106125.127288] 153462 opers/sec, 628580352 bytes/sec
[106125.127289] test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates):  42507 opers/sec, 348217344 bytes/sec
[106127.126819] test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  74270 opers/sec, 608423936 bytes/sec
[106129.127145] test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates):  77195 opers/sec, 632381440 bytes/sec
[106131.127150] test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates):  77426 opers/sec, 634273792 bytes/sec
[106133.127428] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 
[106135.127394]  77476 opers/sec, 634683392 bytes/sec
[106135.127404] 
testing speed of async sha1 (sha1-avx2)
[106135.127428] test  0 (   16 byte blocks,   16 bytes per update,   1 updates): 
[106137.127053] 5125425 opers/sec,  82006808 bytes/sec
[106137.127055] test  1 (   64 byte blocks,   16 bytes per update,   4 updates): 2207846 opers/sec, 141302176 bytes/sec
[106139.127001] test  2 (   64 byte blocks,   64 bytes per update,   1 updates): 
[106141.126965] 3462426 opers/sec, 221595264 bytes/sec
[106141.126966] test  3 (  256 byte blocks,   16 bytes per update,  16 updates): 786876 opers/sec, 201440256 bytes/sec
[106143.127014] test  4 (  256 byte blocks,   64 bytes per update,   4 updates): 1094996 opers/sec, 280319104 bytes/sec
[106145.127384] test  5 (  256 byte blocks,  256 bytes per update,   1 updates): 
[106147.127465] 2027543 opers/sec, 519051136 bytes/sec
[106147.127467] test  6 ( 1024 byte blocks,   16 bytes per update,  64 updates): 229002 opers/sec, 234498048 bytes/sec
[106149.127480] test  7 ( 1024 byte blocks,  256 bytes per update,   4 updates): 586406 opers/sec, 600479744 bytes/sec
[106151.127461] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates): 
[106153.127037] 792399 opers/sec, 811416576 bytes/sec
[106153.127038] test  9 ( 2048 byte blocks,   16 bytes per update, 128 updates): 117870 opers/sec, 241397760 bytes/sec
[106155.127327] test 10 ( 2048 byte blocks,  256 bytes per update,   8 updates): 309381 opers/sec, 633612288 bytes/sec
[106157.127566] test 11 ( 2048 byte blocks, 1024 bytes per update,   2 updates): 399969 opers/sec, 819136512 bytes/sec
[106159.127546] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates): 
[106160.998382] BUG: workqueue lockup - pool
[106160.998386]  cpus=1 node=0 flags=0x0 nice=0 stuck for 69s!
[106160.998387] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!
[106160.998402] Showing busy workqueues and worker pools:
[106160.998539] workqueue events_long: flags=0x0
[106160.998557]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106160.998559]     pending: gc_worker
[106161.008435] workqueue events_power_efficient: flags=0x80
[106161.008453]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106161.008456]     pending: neigh_periodic_work
[106161.126951] 428257 opers/sec, 877070336 bytes/sec
[106161.126952] test 13 ( 4096 byte blocks,   16 bytes per update, 256 updates): 
[106161.859069] workqueue vmstat: flags=0xc
[106161.859091]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106161.859094]     pending: vmstat_update
[106161.880889] workqueue xfs-log/sda3: flags=0x1c
[106161.880910]   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256
[106161.880913]     pending: xfs_log_worker
[106161.893143] workqueue xfs-log/sda1: flags=0x1c
[106161.893168]   pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
[106161.893171]     pending: xfs_log_worker
[106163.127411]  59184 opers/sec, 242417664 bytes/sec
[106163.127413] test 14 ( 4096 byte blocks,  256 bytes per update,  16 updates): 161039 opers/sec, 659615744 bytes/sec
[106165.127604] test 15 ( 4096 byte blocks, 1024 bytes per update,   4 updates): 205362 opers/sec, 841164800 bytes/sec
[106167.127309] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 
[106169.127599] 227169 opers/sec, 930484224 bytes/sec
[106169.127600] test 17 ( 8192 byte blocks,   16 bytes per update, 512 updates):  29139 opers/sec, 238706688 bytes/sec
[106171.127605] test 18 ( 8192 byte blocks,  256 bytes per update,  32 updates):  79406 opers/sec, 650498048 bytes/sec
[106173.127122] test 19 ( 8192 byte blocks, 1024 bytes per update,   8 updates): 103823 opers/sec, 850522112 bytes/sec
[106175.127301] test 20 ( 8192 byte blocks, 4096 bytes per update,   2 updates): 
[106175.136399] INFO: rcu_sched self-detected stall on CPU
[106175.136404] 	1-...: (83881 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=20924 
[106175.136405] 	 (t=84005 jiffies g=178681 c=178680 q=3014)
[106175.136411] Task dump for CPU 1:
[106175.136412] insmod          R  running task        0 56468 128644 0x00000088
[106175.136414]  ffff88003c643e08 ffffffff8108a2e9 0000000000000001 0000000000000001
[106175.136416]  ffff88003c643e20 ffffffff8108ca29 ffffffff81e46000 ffff88003c643e50
[106175.136417]  ffffffff81136f41 ffff88003c6585c0 ffffffff81e45e00 0000000000000000
[106175.136418] Call Trace:
[106175.136419]  <IRQ>
[106175.136424]  [] sched_show_task+0xe9/0x150
[106175.136425]  [] dump_cpu_task+0x39/0x40
[106175.136442]  [] rcu_dump_cpu_stacks+0x80/0xbc
[106175.136444]  [] rcu_check_callbacks+0x70b/0x860
[106175.136446]  [] ? __acct_update_integrals+0x30/0xb0
[106175.136447]  [] ? tick_sched_do_timer+0x30/0x30
[106175.136449]  [] update_process_times+0x2f/0x60
[106175.136449]  [] tick_sched_handle.isra.13+0x25/0x60
[106175.136450]  [] tick_sched_timer+0x3d/0x70
[106175.136451]  [] __hrtimer_run_queues+0xe6/0x280
[106175.136452]  [] hrtimer_interrupt+0xa8/0x1a0
[106175.136454]  [] local_apic_timer_interrupt+0x35/0x60
[106175.136456]  [] smp_apic_timer_interrupt+0x3d/0x50
[106175.136457]  [] apic_timer_interrupt+0x7f/0x90
[106175.136457]  <EOI>
[106175.136461]  [] ? _loop0+0x2eb/0xe56 [sha1_ssse3]
[106175.136464]  [] ? log_store+0x116/0x200
[106175.136465]  [] ? sha1_base_init+0x40/0x40 [sha1_ssse3]
[106175.136466]  [] ? sha1_apply_transform_avx2+0x1a/0x30 [sha1_ssse3]
[106175.136467]  [] sha1_update+0xd3/0x130 [sha1_ssse3]
[106175.136468]  [] sha1_avx2_update+0x15/0x20 [sha1_ssse3]
[106175.136470]  [] crypto_shash_update+0x38/0x100
[106175.136470]  [] shash_ahash_update+0x2c/0x50
[106175.136471]  [] shash_async_update+0x12/0x20
[106175.136473]  [] test_ahash_speed_common.constprop.8+0x24d/0x820 [tcrypt]
[106175.136474]  [] ? 0xffffffffa007a000
[106175.136475]  [] do_test+0x12c/0x31a [tcrypt]
[106175.136476]  [] ? 0xffffffffa007a000
[106175.136477]  [] tcrypt_mod_init+0x49/0x95 [tcrypt]
[106175.136477]  [] ? 0xffffffffa007a000
[106175.136479]  [] do_one_initcall+0x3d/0x150
[106175.136480]  [] ? __might_sleep+0x4a/0x90
[106175.136481]  [] ? do_init_module+0x27/0x1d8
[106175.136483]  [] ? kmem_cache_alloc_trace+0x46/0x170
[106175.136484]  [] do_init_module+0x60/0x1d8
[106175.136486]  [] load_module+0x1245/0x1940
[106175.136487]  [] ? __symbol_put+0x40/0x40
[106175.136488]  [] ? vfs_read+0x113/0x130
[106175.136490]  [] SYSC_finit_module+0x96/0xd0
[106175.136491]  [] SyS_finit_module+0xe/0x10
[106175.136492]  [] do_syscall_64+0x4d/0xb0
[106175.136493]  [] entry_SYSCALL64_slow_path+0x25/0x25
[106177.127473] 112143 opers/sec, 918675456 bytes/sec
[106177.127474] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 
[106179.127611] 112208 opers/sec, 919212032 bytes/sec
[106179.127632] 
testing speed of multibuffer sha1 (sha1-avx2)
[106179.127633] test  0 (   16 byte blocks,   16 bytes per update,   1 updates):   7572 cycles/operation,   59 cycles/byte
[106179.127636] test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   9172 cycles/operation,   17 cycles/byte
[106179.127639] test  5 (  256 byte blocks,  256 bytes per update,   1 updates):  14792 cycles/operation,    7 cycles/byte
[106179.127678] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  37386 cycles/operation,    4 cycles/byte
[106179.127690] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  63652 cycles/operation,    3 cycles/byte
[106179.127708] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 121012 cycles/operation,    3 cycles/byte
[106179.127742] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 237186 cycles/operation,    3 cycles/byte
[106179.127820] 
testing speed of multibuffer md5 (md5-generic)
[106179.127820] test  0 (   16 byte blocks,   16 bytes per update,   1 updates):   6570 cycles/operation,   51 cycles/byte
[106179.127823] test  2 (   64 byte blocks,   64 bytes per update,   1 updates):   8340 cycles/operation,   16 cycles/byte
[106179.127826] test  5 (  256 byte blocks,  256 bytes per update,   1 updates):  16230 cycles/operation,    7 cycles/byte
[106179.127831] test  8 ( 1024 byte blocks, 1024 bytes per update,   1 updates):  48646 cycles/operation,    5 cycles/byte
[106179.127845] test 12 ( 2048 byte blocks, 2048 bytes per update,   1 updates):  91754 cycles/operation,    5 cycles/byte
[106179.127871] test 16 ( 4096 byte blocks, 4096 bytes per update,   1 updates): 178620 cycles/operation,    5 cycles/byte
[106179.127921] test 21 ( 8192 byte blocks, 8192 bytes per update,   1 updates): 352184 cycles/operation,    5 cycles/byte

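Running a large amount of heavy work caused the CPU soft lockup.
A soft lockup is a kernel "soft deadlock". It does not take the whole system down, but one or more processes (or kernel threads) stay stuck in some state, usually in kernel space. In many cases this comes from misuse of kernel locks. Concretely, the soft-lockup detector reports a CPU that has run in kernel mode without rescheduling (so the per-CPU watchdog task never gets to run) for longer than 2 × kernel.watchdog_thresh seconds (20 s with the default value of 10). This is consistent with the 22 s in the message above.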
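A rough estimate from the log above shows why this run trips the detector. Tests 0 to 21 run for each of md5 and sha1 with sec=2, so the ahash loop alone keeps CPU 1 busy for about 22 × 2 × 2 = 88 s. That matches the timestamps (106091 → 106179).

The following is a minimal sketch of that arithmetic. It assumes that each test lasts exactly sec seconds, that the tests run back to back inside tcrypt_mod_init without yielding, and that the detector fires after 2 × watchdog_thresh seconds. busy_secs and check_lockup_risk are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: estimate how long tcrypt keeps one CPU busy inside module init.
# Assumptions (not measured): every speed test runs for exactly `sec` seconds,
# tests run back to back without yielding, and the soft-lockup detector
# fires after 2 * kernel.watchdog_thresh seconds.

busy_secs() {
    # $1: tests per algorithm, $2: number of algorithms, $3: sec= parameter
    echo $(( $1 * $2 * $3 ))
}

check_lockup_risk() {
    # $1: estimated busy time in seconds, $2: kernel.watchdog_thresh
    limit=$(( 2 * $2 ))
    if [ "$1" -gt "$limit" ]; then
        echo "busy ${1}s > ${limit}s: soft lockup expected"
    else
        echo "busy ${1}s <= ${limit}s"
    fi
}
```

For this run, check_lockup_risk "$(busy_secs 22 2 2)" 10 prints "busy 88s > 20s: soft lockup expected". The per-test sec value, multiplied over many tests in a single init call, is what adds up.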
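Fix: raise the watchdog threshold.

echo 30 > /proc/sys/kernel/watchdog_thresh

This takes effect immediately, but only until the next reboot. sysctl -w does the same thing:

sysctl -w kernel.watchdog_thresh=30

To make it permanent, open /etc/sysctl.conf (vi /etc/sysctl.conf), add the line below, and reload it with sysctl -p:

kernel.watchdog_thresh=30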
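The permanent variant can be scripted so that running it again does not append duplicate lines. This is a minimal sketch, assuming GNU sed (for sed -i) and the 0..60 range the kernel accepts for watchdog_thresh (0 disables the detector). persist_watchdog_thresh is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write kernel.watchdog_thresh into a sysctl config file idempotently.
# Assumptions: GNU sed (-i), valid range 0..60, and the target file is
# passed as $2 (defaults to /etc/sysctl.conf, which needs root).

persist_watchdog_thresh() {
    value=$1
    conf=${2:-/etc/sysctl.conf}
    case $value in
        ''|*[!0-9]*) echo "not a number: $value" >&2; return 1 ;;
    esac
    if [ "$value" -gt 60 ]; then
        echo "out of range (0..60): $value" >&2
        return 1
    fi
    touch "$conf" || return 1
    if grep -q '^kernel\.watchdog_thresh' "$conf"; then
        # Replace the existing setting instead of adding a second line.
        sed -i "s/^kernel\.watchdog_thresh.*/kernel.watchdog_thresh=$value/" "$conf"
    else
        echo "kernel.watchdog_thresh=$value" >> "$conf"
    fi
}
```

As root, run persist_watchdog_thresh 30 && sysctl -p to write the setting and load it right away.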

After the change I ran insmod tcrypt.ko sec=2 mode=400 again, and the following messages still appeared:

kernel:BUG: workqueue lockup - pool
kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!
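One likely reason these messages survive the change is that the workqueue lockup detector is a separate watchdog. It has its own threshold, workqueue.watchdog_thresh, which defaults to 30 s and can also be set on the kernel command line. kernel.watchdog_thresh does not affect it, and the 69 s and 42 s reported above are both past that default.

The sketch below only reads the two values side by side. It assumes a kernel built with CONFIG_WQ_WATCHDOG, which exposes the workqueue threshold at /sys/module/workqueue/parameters/watchdog_thresh. read_param and show_lockup_thresholds are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: print the soft-lockup and workqueue-lockup thresholds together.
# Assumption: CONFIG_WQ_WATCHDOG is enabled, so the workqueue threshold is
# readable at /sys/module/workqueue/parameters/watchdog_thresh.

read_param() {
    # $1: file to read; prints "n/a" if it is missing or unreadable
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "n/a"
    fi
}

show_lockup_thresholds() {
    echo "kernel.watchdog_thresh:    $(read_param /proc/sys/kernel/watchdog_thresh)"
    echo "workqueue.watchdog_thresh: $(read_param /sys/module/workqueue/parameters/watchdog_thresh)"
}
```

Raising only the first value leaves the second at its default. This fits the soft lockup message disappearing while the workqueue lockup message remains.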

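Tracing the stack: in both traces above, insmod (PID 56468) is running on CPU 1 inside the hash update path (md5_update / sha1_update, reached through crypto_shash_update and shash_async_update). That path is called from test_ahash_speed_common → do_test → tcrypt_mod_init. In other words, the entire benchmark runs inside the module's init function.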


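After asking for advice, my lead pointed out that the soft lockup was caused by issuing too many commands in a single I/O submission. Issue closed.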
