hung_task_timeout_secs causes a panic during testing

The kernel panic hit during testing:

kern  :err   : [  616.222229] INFO: task mdadm:3234 blocked for more than 120 seconds.
kern  :err   : [  616.228548]       Not tainted 5.0.0 #1
kern  :err   : [  616.229854] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kern  :info  : [  616.233379] mdadm           D    0  3234      1 0x00000000
kern  :warn  : [  616.243301] Call Trace:
kern  :warn  : [  616.244494]  ? __schedule+0x256/0x850
kern  :warn  : [  616.245617]  schedule+0x33/0x80
kern  :warn  : [  616.246685]  mddev_suspend+0x11b/0x190
kern  :warn  : [  616.247790]  ? finish_wait+0x80/0x80
kern  :warn  : [  616.248859]  suspend_lo_store+0x60/0xc0
kern  :warn  : [  616.249952]  md_attr_store+0x83/0xd0
kern  :warn  : [  616.251045]  kernfs_fop_write+0x116/0x190
kern  :warn  : [  616.252587]  __vfs_write+0x36/0x1b0
kern  :warn  : [  616.260611]  ? selinux_file_permission+0xf0/0x130
kern  :warn  : [  616.266710]  ? _cond_resched+0x19/0x30
kern  :warn  : [  616.268379]  vfs_write+0xb7/0x1b0
kern  :warn  : [  616.269872]  ksys_write+0x4f/0xb0
kern  :warn  : [  616.271475]  do_syscall_64+0x5b/0x1b0
kern  :warn  : [  616.273464]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kern  :warn  : [  616.275352] RIP: 0033:0x7f06141d8730
kern  :warn  : [  616.277155] Code: Bad RIP value.
kern  :warn  : [  616.278694] RSP: 002b:00007fffc03cdb08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
kern  :warn  : [  616.281593] RAX: ffffffffffffffda RBX: 000000005c7e568e RCX: 00007f06141d8730
kern  :warn  : [  616.284438] RDX: 0000000000000013 RSI: 00007fffc03cdbf0 RDI: 0000000000000004
kern  :warn  : [  616.287056] RBP: 00007fffc03cdbc0 R08: 000000000000eeff R09: 000000000000001d
kern  :warn  : [  616.292241] R10: 0000000000000073 R11: 0000000000000246 R12: 000055bbb843c800
kern  :warn  : [  616.294510] R13: 00007fffc03cec30 R14: 0000000000000000 R15: 0000000000000000
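
To pick reports like this out of a long kernel log, something along these lines can help. It is only a sketch: the helper name hung_tasks is made up for illustration, and it assumes the message format shown in the log above ("INFO: task NAME:PID blocked for more than N seconds.").

```shell
#!/bin/sh
# hung_tasks (hypothetical helper): read kernel log text on stdin and print
# "name pid seconds" for every hung-task report found.
# Assumes the message format shown in the log above.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Typical use (reading the kernel log may require root):
#   dmesg | hung_tasks
```

For the log above this prints the task name mdadm, its PID and the 120-second threshold on one line.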

A Google search turned up the following explanation:
This is a known bug. By default Linux uses up to 40% of the available memory for file system caching.
After this mark has been reached, the file system flushes all outstanding data to disk, causing all following I/O to go synchronous.
For flushing this data out to disk there is a time limit of 120 seconds by default.
In the case here the I/O subsystem is not fast enough to flush the data within 120 seconds.
This happens especially on systems with a lot of memory.

The problem is solved in later kernels and there is no "fix" from Oracle.
I fixed this by lowering the mark for flushing the cache from 40% to 10% by setting "vm.dirty_ratio=10" in /etc/sysctl.conf.
This setting does not influence overall database performance since you hopefully use Direct IO and bypass the file system cache completely.

 

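Linux lets a share of available memory (40% in the case quoted above; check your own value with cat /proc/sys/vm/dirty_ratio) hold cached file data that has not yet been written to disk. When that data is flushed, I/O becomes synchronous, and if the flush takes longer than the timeout (120 s) the message above is triggered.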
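
To see where a machine stands, the current thresholds and the amount of dirty data can be read straight from /proc. The snippet below is a rough sketch; dirty_limit_kb is a hypothetical helper, and its result is only an estimate, because the kernel applies the ratio to the memory it considers dirtyable rather than to a number you pass in.

```shell
#!/bin/sh
# dirty_limit_kb (hypothetical helper): rough size in kB of the dirty-page
# limit for a given amount of memory ($1, in kB) and ratio ($2, in percent).
dirty_limit_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Show the current thresholds and how much dirty data is waiting right now.
# Guarded so the script still runs where these files are missing.
if [ -r /proc/sys/vm/dirty_ratio ]; then
    echo "vm.dirty_ratio            = $(cat /proc/sys/vm/dirty_ratio)"
    echo "vm.dirty_background_ratio = $(cat /proc/sys/vm/dirty_background_ratio)"
    grep -E '^(Dirty|Writeback):' /proc/meminfo || true
fi
```

As an illustration, with 16000000 kB of memory a ratio of 40 allows roughly 6400000 kB of dirty data, while a ratio of 10 allows roughly 1600000 kB, so far less has to be written out in one go.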

 

Root cause:

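The message comes from the Kconfig option CONFIG_DETECT_HUNG_TASK. Enabling it also brings in CONFIG_DEFAULT_HUNG_TASK_TIMEOUT, which defaults to 120 (seconds). The detector by itself only logs the warning shown above; the kernel panics only when kernel.hung_task_panic is set to 1.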

I won't ramble on; read the relevant documentation yourself:

https://cateee.net/lkddb/web-lkddb/DETECT_HUNG_TASK.html

https://cateee.net/lkddb/web-lkddb/DEFAULT_HUNG_TASK_TIMEOUT.html
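
Whether a given kernel was built with these options can usually be checked from its config file. This is a sketch under assumptions: hung_task_config is a hypothetical helper, and the path /boot/config-<release> is where many distributions install the config, which may not hold on your system.

```shell
#!/bin/sh
# hung_task_config (hypothetical helper): print the hung-task options found
# in the kernel config file given as $1.
hung_task_config() {
    grep -E '^CONFIG_(DETECT_HUNG_TASK|DEFAULT_HUNG_TASK_TIMEOUT)=' "$1"
}

# Assumed location of the running kernel's config file.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    hung_task_config "$cfg" || echo "no hung-task options set in $cfg"
fi
```

The value in effect at runtime can be read with cat /proc/sys/kernel/hung_task_timeout_secs, the file named in the log message itself.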

 

Solutions:

1. /sbin/sysctl -w vm.dirty_ratio=10 (lower the share of available memory that the cache may occupy)

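2. echo noop > /sys/block/sda/queue/scheduler (change the I/O scheduler; I have not tried this and do not understand schedulers well. On kernels that only have the multi-queue block layer, such as 5.0, the scheduler list in that file offers none instead of noop.)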

3. /sbin/sysctl -w kernel.hung_task_timeout_secs=0 or echo 0 > /proc/sys/kernel/hung_task_timeout_secs

(a value of 0 disables the timeout check)

4. Disable CONFIG_DETECT_HUNG_TASK
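
Values set with sysctl -w or echo are lost on reboot. To keep options 1 and 3, the settings can go into a sysctl configuration file, as the quoted answer did with /etc/sysctl.conf. The sketch below only prints the lines; sysctl_fragment is a hypothetical helper and the file name under /etc/sysctl.d is an assumption.

```shell
#!/bin/sh
# sysctl_fragment (hypothetical helper): print sysctl.conf-style lines for
# options 1 and 3. $1 = dirty ratio in percent, $2 = hung-task timeout in
# seconds (0 disables the check).
sysctl_fragment() {
    printf 'vm.dirty_ratio = %s\n' "$1"
    printf 'kernel.hung_task_timeout_secs = %s\n' "$2"
}

# Preview only; nothing is written.
sysctl_fragment 10 0

# To apply, as root (the file name is an assumption):
#   sysctl_fragment 10 0 > /etc/sysctl.d/90-hung-task.conf
#   sysctl -p /etc/sysctl.d/90-hung-task.conf
```

Keep in mind that a timeout of 0 only silences the report; a task that is blocked stays blocked.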

 
