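Run the modified tcrypt.c program mentioned earlier (it only runs the digest algorithms md5 and sha1):
insmod tcrypt.ko sec=2 mode=400
This reports the error: kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [insmod:55902]
together with the following stack trace: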
[106091.127829]
testing speed of async md5 (md5-generic)
[106091.127831] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates):
[106093.127071] 6929212 opers/sec, 110867392 bytes/sec
[106093.127072] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 3149079 opers/sec, 201541088 bytes/sec
[106095.126940] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates):
[106097.126759] 4028096 opers/sec, 257798144 bytes/sec
[106097.126761] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 1154874 opers/sec, 295647872 bytes/sec
[106099.127126] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 1623307 opers/sec, 415566592 bytes/sec
[106101.127270] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates):
[106103.126713] 1810068 opers/sec, 463377408 bytes/sec
[106103.126715] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 311122 opers/sec, 318589440 bytes/sec
[106105.126920] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 537416 opers/sec, 550314496 bytes/sec
[106107.127073] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates):
[106109.127042] 577204 opers/sec, 591056896 bytes/sec
[106109.127044] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 164557 opers/sec, 337012736 bytes/sec
[106111.126772] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates):
[106112.131736] INFO: rcu_sched self-detected stall on CPU
[106112.131746] 1-...: (20933 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=5201
[106112.131747] (t=21000 jiffies g=178681 c=178680 q=640)
[106112.131753] Task dump for CPU 1:
[106112.131753] insmod R running task 0 56468 128644 0x00000088
[106112.131756] ffff88003c643e08 ffffffff8108a2e9 0000000000000001 0000000000000001
[106112.131757] ffff88003c643e20 ffffffff8108ca29 ffffffff81e46000 ffff88003c643e50
[106112.131758] ffffffff81136f41 ffff88003c6585c0 ffffffff81e45e00 0000000000000000
[106112.131759] Call Trace:
[106112.131761]
[106112.131765] [] sched_show_task+0xe9/0x150
[106112.131767] [] dump_cpu_task+0x39/0x40
[106112.131769] [] rcu_dump_cpu_stacks+0x80/0xbc
[106112.131770] [] rcu_check_callbacks+0x70b/0x860
[106112.131772] [] ? __acct_update_integrals+0x30/0xb0
[106112.131773] [] ? tick_sched_do_timer+0x30/0x30
[106112.131774] [] update_process_times+0x2f/0x60
[106112.131775] [] tick_sched_handle.isra.13+0x25/0x60
[106112.131776] [] tick_sched_timer+0x3d/0x70
[106112.131777] [] __hrtimer_run_queues+0xe6/0x280
[106112.142164] [] hrtimer_interrupt+0xa8/0x1a0
[106112.142167] INFO: rcu_sched detected stalls on CPUs/tasks:
[106112.142174] [] local_apic_timer_interrupt+0x35/0x60
[106112.142213] [] smp_apic_timer_interrupt+0x3d/0x50
[106112.142216] [] apic_timer_interrupt+0x7f/0x90
[106112.142217]
[106112.146252] [] ? md5_transform+0x6c2/0x7f0
[106112.146255] [] md5_update+0xde/0x130
[106112.146256] [] crypto_shash_update+0x38/0x100
[106112.146257] [] shash_ahash_update+0x2c/0x50
[106112.146258] [] shash_async_update+0x12/0x20
[106112.146261] [] test_ahash_speed_common.constprop.8+0x24d/0x820 [tcrypt]
[106112.146262] [] ? 0xffffffffa007a000
[106112.146263] [] do_test+0x108/0x31a [tcrypt]
[106112.146264] [] ? 0xffffffffa007a000
[106112.146265] [] tcrypt_mod_init+0x49/0x95 [tcrypt]
[106112.146266] [] ? 0xffffffffa007a000
[106112.146267] [] do_one_initcall+0x3d/0x150
[106112.146269] [] ? __might_sleep+0x4a/0x90
[106112.146270] [] ? do_init_module+0x27/0x1d8
[106112.146287] [] ? kmem_cache_alloc_trace+0x46/0x170
[106112.146289] [] do_init_module+0x60/0x1d8
[106112.146291] [] load_module+0x1245/0x1940
[106112.146292] [] ? __symbol_put+0x40/0x40
[106112.146293] [] ? vfs_read+0x113/0x130
[106112.146295] [] SYSC_finit_module+0x96/0xd0
[106112.146296] [] SyS_finit_module+0xe/0x10
[106112.146297] [] do_syscall_64+0x4d/0xb0
[106112.146299] [] entry_SYSCALL64_slow_path+0x25/0x25
[106112.146302] 1-...: (20933 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=5201
[106112.146304] (detected by 3, t=21002 jiffies, g=178681, c=178680, q=640)
[106112.146310] Task dump for CPU 1:
[106112.146311] insmod R running task 0 56468 128644 0x00000088
[106112.146313] ffffffff8113711d 0000000000000001 ffffffffa006f5c0 ffffc9001166fe98
[106112.146315] ffffc9001166fe78 ffffffff810d2205 ffffffffa006f5c0 ffffffff810cf470
[106112.146316] 0000000000000000 ffffffffa006f5d8 ffffffff8119ba93 ffff88000000001c
[106112.146317] Call Trace:
[106112.146320] [] ? do_init_module+0x60/0x1d8
[106112.146322] [] ? load_module+0x1245/0x1940
[106112.146323] [] ? __symbol_put+0x40/0x40
[106112.146324] [] ? vfs_read+0x113/0x130
[106112.146326] [] ? SYSC_finit_module+0x96/0xd0
[106112.146328] [] ? SyS_finit_module+0xe/0x10
[106112.146328] [] ? do_syscall_64+0x4d/0xb0
[106112.146330] [] ? entry_SYSCALL64_slow_path+0x25/0x25
[106113.126964] 274310 opers/sec, 561787904 bytes/sec
[106113.126966] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 299329 opers/sec, 613025792 bytes/sec
[106115.127105] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates):
[106117.127334] 301106 opers/sec, 616665088 bytes/sec
[106117.127336] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates): 84647 opers/sec, 346716160 bytes/sec
[106119.127368] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 147179 opers/sec, 602845184 bytes/sec
[106121.126869] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 152129 opers/sec, 623122432 bytes/sec
[106123.127106] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates):
[106125.127288] 153462 opers/sec, 628580352 bytes/sec
[106125.127289] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 42507 opers/sec, 348217344 bytes/sec
[106127.126819] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 74270 opers/sec, 608423936 bytes/sec
[106129.127145] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 77195 opers/sec, 632381440 bytes/sec
[106131.127150] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates): 77426 opers/sec, 634273792 bytes/sec
[106133.127428] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates):
[106135.127394] 77476 opers/sec, 634683392 bytes/sec
[106135.127404]
testing speed of async sha1 (sha1-avx2)
[106135.127428] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates):
[106137.127053] 5125425 opers/sec, 82006808 bytes/sec
[106137.127055] test 1 ( 64 byte blocks, 16 bytes per update, 4 updates): 2207846 opers/sec, 141302176 bytes/sec
[106139.127001] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates):
[106141.126965] 3462426 opers/sec, 221595264 bytes/sec
[106141.126966] test 3 ( 256 byte blocks, 16 bytes per update, 16 updates): 786876 opers/sec, 201440256 bytes/sec
[106143.127014] test 4 ( 256 byte blocks, 64 bytes per update, 4 updates): 1094996 opers/sec, 280319104 bytes/sec
[106145.127384] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates):
[106147.127465] 2027543 opers/sec, 519051136 bytes/sec
[106147.127467] test 6 ( 1024 byte blocks, 16 bytes per update, 64 updates): 229002 opers/sec, 234498048 bytes/sec
[106149.127480] test 7 ( 1024 byte blocks, 256 bytes per update, 4 updates): 586406 opers/sec, 600479744 bytes/sec
[106151.127461] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates):
[106153.127037] 792399 opers/sec, 811416576 bytes/sec
[106153.127038] test 9 ( 2048 byte blocks, 16 bytes per update, 128 updates): 117870 opers/sec, 241397760 bytes/sec
[106155.127327] test 10 ( 2048 byte blocks, 256 bytes per update, 8 updates): 309381 opers/sec, 633612288 bytes/sec
[106157.127566] test 11 ( 2048 byte blocks, 1024 bytes per update, 2 updates): 399969 opers/sec, 819136512 bytes/sec
[106159.127546] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates):
[106160.998382] BUG: workqueue lockup - pool
[106160.998386] cpus=1 node=0 flags=0x0 nice=0 stuck for 69s!
[106160.998387] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!
[106160.998402] Showing busy workqueues and worker pools:
[106160.998539] workqueue events_long: flags=0x0
[106160.998557] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106160.998559] pending: gc_worker
[106161.008435] workqueue events_power_efficient: flags=0x80
[106161.008453] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106161.008456] pending: neigh_periodic_work
[106161.126951] 428257 opers/sec, 877070336 bytes/sec
[106161.126952] test 13 ( 4096 byte blocks, 16 bytes per update, 256 updates):
[106161.859069] workqueue vmstat: flags=0xc
[106161.859091] pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[106161.859094] pending: vmstat_update
[106161.880889] workqueue xfs-log/sda3: flags=0x1c
[106161.880910] pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256
[106161.880913] pending: xfs_log_worker
[106161.893143] workqueue xfs-log/sda1: flags=0x1c
[106161.893168] pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
[106161.893171] pending: xfs_log_worker
[106163.127411] 59184 opers/sec, 242417664 bytes/sec
[106163.127413] test 14 ( 4096 byte blocks, 256 bytes per update, 16 updates): 161039 opers/sec, 659615744 bytes/sec
[106165.127604] test 15 ( 4096 byte blocks, 1024 bytes per update, 4 updates): 205362 opers/sec, 841164800 bytes/sec
[106167.127309] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates):
[106169.127599] 227169 opers/sec, 930484224 bytes/sec
[106169.127600] test 17 ( 8192 byte blocks, 16 bytes per update, 512 updates): 29139 opers/sec, 238706688 bytes/sec
[106171.127605] test 18 ( 8192 byte blocks, 256 bytes per update, 32 updates): 79406 opers/sec, 650498048 bytes/sec
[106173.127122] test 19 ( 8192 byte blocks, 1024 bytes per update, 8 updates): 103823 opers/sec, 850522112 bytes/sec
[106175.127301] test 20 ( 8192 byte blocks, 4096 bytes per update, 2 updates):
[106175.136399] INFO: rcu_sched self-detected stall on CPU
[106175.136404] 1-...: (83881 ticks this GP) idle=cd3/140000000000001/0 softirq=418992/418992 fqs=20924
[106175.136405] (t=84005 jiffies g=178681 c=178680 q=3014)
[106175.136411] Task dump for CPU 1:
[106175.136412] insmod R running task 0 56468 128644 0x00000088
[106175.136414] ffff88003c643e08 ffffffff8108a2e9 0000000000000001 0000000000000001
[106175.136416] ffff88003c643e20 ffffffff8108ca29 ffffffff81e46000 ffff88003c643e50
[106175.136417] ffffffff81136f41 ffff88003c6585c0 ffffffff81e45e00 0000000000000000
[106175.136418] Call Trace:
[106175.136419]
[106175.136424] [] sched_show_task+0xe9/0x150
[106175.136425] [] dump_cpu_task+0x39/0x40
[106175.136442] [] rcu_dump_cpu_stacks+0x80/0xbc
[106175.136444] [] rcu_check_callbacks+0x70b/0x860
[106175.136446] [] ? __acct_update_integrals+0x30/0xb0
[106175.136447] [] ? tick_sched_do_timer+0x30/0x30
[106175.136449] [] update_process_times+0x2f/0x60
[106175.136449] [] tick_sched_handle.isra.13+0x25/0x60
[106175.136450] [] tick_sched_timer+0x3d/0x70
[106175.136451] [] __hrtimer_run_queues+0xe6/0x280
[106175.136452] [] hrtimer_interrupt+0xa8/0x1a0
[106175.136454] [] local_apic_timer_interrupt+0x35/0x60
[106175.136456] [] smp_apic_timer_interrupt+0x3d/0x50
[106175.136457] [] apic_timer_interrupt+0x7f/0x90
[106175.136457]
[106175.136461] [] ? _loop0+0x2eb/0xe56 [sha1_ssse3]
[106175.136464] [] ? log_store+0x116/0x200
[106175.136465] [] ? sha1_base_init+0x40/0x40 [sha1_ssse3]
[106175.136466] [] ? sha1_apply_transform_avx2+0x1a/0x30 [sha1_ssse3]
[106175.136467] [] sha1_update+0xd3/0x130 [sha1_ssse3]
[106175.136468] [] sha1_avx2_update+0x15/0x20 [sha1_ssse3]
[106175.136470] [] crypto_shash_update+0x38/0x100
[106175.136470] [] shash_ahash_update+0x2c/0x50
[106175.136471] [] shash_async_update+0x12/0x20
[106175.136473] [] test_ahash_speed_common.constprop.8+0x24d/0x820 [tcrypt]
[106175.136474] [] ? 0xffffffffa007a000
[106175.136475] [] do_test+0x12c/0x31a [tcrypt]
[106175.136476] [] ? 0xffffffffa007a000
[106175.136477] [] tcrypt_mod_init+0x49/0x95 [tcrypt]
[106175.136477] [] ? 0xffffffffa007a000
[106175.136479] [] do_one_initcall+0x3d/0x150
[106175.136480] [] ? __might_sleep+0x4a/0x90
[106175.136481] [] ? do_init_module+0x27/0x1d8
[106175.136483] [] ? kmem_cache_alloc_trace+0x46/0x170
[106175.136484] [] do_init_module+0x60/0x1d8
[106175.136486] [] load_module+0x1245/0x1940
[106175.136487] [] ? __symbol_put+0x40/0x40
[106175.136488] [] ? vfs_read+0x113/0x130
[106175.136490] [] SYSC_finit_module+0x96/0xd0
[106175.136491] [] SyS_finit_module+0xe/0x10
[106175.136492] [] do_syscall_64+0x4d/0xb0
[106175.136493] [] entry_SYSCALL64_slow_path+0x25/0x25
[106177.127473] 112143 opers/sec, 918675456 bytes/sec
[106177.127474] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates):
[106179.127611] 112208 opers/sec, 919212032 bytes/sec
[106179.127632]
testing speed of multibuffer sha1 (sha1-avx2)
[106179.127633] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 7572 cycles/operation, 59 cycles/byte
[106179.127636] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 9172 cycles/operation, 17 cycles/byte
[106179.127639] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 14792 cycles/operation, 7 cycles/byte
[106179.127678] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 37386 cycles/operation, 4 cycles/byte
[106179.127690] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 63652 cycles/operation, 3 cycles/byte
[106179.127708] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 121012 cycles/operation, 3 cycles/byte
[106179.127742] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 237186 cycles/operation, 3 cycles/byte
[106179.127820]
testing speed of multibuffer md5 (md5-generic)
[106179.127820] test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 6570 cycles/operation, 51 cycles/byte
[106179.127823] test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 8340 cycles/operation, 16 cycles/byte
[106179.127826] test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 16230 cycles/operation, 7 cycles/byte
[106179.127831] test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 48646 cycles/operation, 5 cycles/byte
[106179.127845] test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 91754 cycles/operation, 5 cycles/byte
[106179.127871] test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 178620 cycles/operation, 5 cycles/byte
[106179.127921] test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 352184 cycles/operation, 5 cycles/byte
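Running a large number of high-load programs caused a CPU soft lockup.
A soft lockup is a "soft deadlock" in the kernel: the watchdog reports it when a CPU stays in kernel mode for too long without giving other tasks a chance to run. The bug does not hang the system completely, but some processes (or kernel threads) stay stuck in one state, usually in kernel space. In many cases it is caused by a problem with kernel lock usage.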
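To pick the watchdog reports out of a long dmesg dump like the one above, a small filter can help. The following is a minimal sketch, not part of the original procedure: the function name, the regular expressions, and the SAMPLE list are illustrative assumptions, and the sample lines are copied from the log above.

```python
import re

# Patterns for the three kinds of watchdog report seen in the log above.
PATTERNS = {
    "soft lockup": re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s"),
    "rcu stall": re.compile(r"INFO: rcu_sched (?:self-)?detected stall"),
    "workqueue lockup": re.compile(r"BUG: workqueue lockup - pool"),
}
# dmesg timestamp such as "[106112.131736]" at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_watchdog_events(lines):
    """Return (timestamp or None, kind, stripped line) for each report."""
    events = []
    for line in lines:
        ts = TIMESTAMP.match(line)
        stamp = float(ts.group(1)) if ts else None
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((stamp, kind, line.strip()))
                break
    return events


# Lines copied from the log above (the first one has no dmesg timestamp).
SAMPLE = [
    "kernel:NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [insmod:55902]",
    "[106112.131736] INFO: rcu_sched self-detected stall on CPU",
    "[106113.126964] 274310 opers/sec, 561787904 bytes/sec",
    "[106160.998387] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!",
]

for stamp, kind, _ in find_watchdog_events(SAMPLE):
    print(stamp, kind)
```

In practice the same function could be fed the lines of a saved dmesg file instead of SAMPLE.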
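Workaround: raise the watchdog threshold.
To apply it temporarily (lost on reboot), use either of these:
echo 30 > /proc/sys/kernel/watchdog_thresh
sysctl -w kernel.watchdog_thresh=30
To make it persistent, edit /etc/sysctl.conf and add the setting:
vi /etc/sysctl.conf
kernel.watchdog_thresh=30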
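For reference, the kernel's soft lockup threshold is twice watchdog_thresh, so the default value of 10 gives 20 seconds (consistent with the "stuck for 22s" report), and a value of 30 gives 60 seconds. The sketch below only illustrates that arithmetic; the function names are assumptions for this note, and reading /proc only works on a Linux host that exposes the file.

```python
def soft_lockup_threshold(watchdog_thresh):
    """Soft lockup threshold in seconds: 2 * kernel.watchdog_thresh."""
    return 2 * watchdog_thresh


def read_watchdog_thresh(path="/proc/sys/kernel/watchdog_thresh"):
    """Return the current watchdog_thresh, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


# Compare the default setting with the value used in this note.
for thresh in (10, 30):
    print(f"watchdog_thresh={thresh} -> soft lockup after "
          f"{soft_lockup_threshold(thresh)}s")

current = read_watchdog_thresh()
if current is not None:
    print(f"current: watchdog_thresh={current} -> "
          f"{soft_lockup_threshold(current)}s")
```

Raising the threshold only delays the report; it does not remove whatever keeps the CPU busy.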
After the change, run insmod tcrypt.ko sec=2 mode=400 again.
The following messages still appear:
kernel:BUG: workqueue lockup - pool
kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!
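One possible reason these messages survive the change: as far as I know, "workqueue lockup" reports come from a separate workqueue watchdog (CONFIG_WQ_WATCHDOG) with its own threshold, workqueue.watchdog_thresh, which defaults to 30 seconds in mainline and is not affected by kernel.watchdog_thresh. Treat that, and the sysfs path used below, as assumptions to verify on the running kernel. The sketch parses such a report line and, where the parameter file exists, reads the workqueue threshold; all names in it are illustrative.

```python
import re

# Matches e.g. "... pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 42s!"
WQ_LOCKUP = re.compile(
    r"BUG: workqueue lockup - pool (?P<fields>.*?) stuck for (?P<secs>\d+)s!"
)


def parse_workqueue_lockup(line):
    """Return the pool fields and stuck time as a dict, or None if no match."""
    m = WQ_LOCKUP.search(line)
    if m is None:
        return None
    # Split "cpus=1 node=0 ..." into key/value pairs (values kept as strings).
    info = dict(pair.split("=", 1) for pair in m.group("fields").split())
    info["stuck_secs"] = int(m.group("secs"))
    return info


def read_wq_watchdog_thresh(
    path="/sys/module/workqueue/parameters/watchdog_thresh",  # assumed path
):
    """Return the workqueue watchdog threshold, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


line = ("kernel:BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 "
        "nice=-20 stuck for 42s!")
print(parse_workqueue_lockup(line))
print("workqueue watchdog_thresh:", read_wq_watchdog_thresh())
```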
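Next, I traced the stack information.
After I asked for advice, my team lead pointed out that the kernel soft lockup was caused by issuing too many commands in a single I/O submission. Issue closed.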