Linux Kernel Deadlock Detection

D-State Detection

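The core idea is to create a kernel monitoring thread (khungtaskd) that periodically loops over every process (task) in the D (uninterruptible sleep) state.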
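
To make the idea concrete, here is a minimal user-space model of one monitoring pass. It is a sketch, not the kernel code: the names Task and scan are hypothetical, and the rule "a D-state task whose context-switch count has not changed since the previous pass is reported" reflects my understanding of kernel/hung_task.c. The model assumes that passes are spaced one timeout apart.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for the fields the monitor looks at."""
    pid: int
    comm: str
    state: str                   # "D", "R", "S", ...
    switch_count: int            # total context switches so far
    last_switch_count: int = -1  # value recorded on the previous pass


def scan(tasks, timeout=120):
    """One monitoring pass; returns the report lines for hung tasks."""
    reports = []
    for t in tasks:
        if t.state != "D":
            continue  # only uninterruptible tasks are checked
        if t.switch_count != t.last_switch_count:
            # The task has been scheduled since the last pass: not hung.
            t.last_switch_count = t.switch_count
            continue
        reports.append(
            f"INFO: task {t.comm}:{t.pid} blocked for more than "
            f"{timeout} seconds."
        )
    return reports
```

Calling scan twice on the same list, with no change in switch_count between the calls, reports the D-state task on the second call; the first call only records the counts.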

Kernel configuration: CONFIG_DETECT_HUNG_TASK

    Kernel hacking  --->
           [*] Detect Hung Tasks  
           (120) Default timeout for hung task detection (in seconds) (NEW)  
           [ ]   Panic (Reboot) On Hung Tasks (NEW)         

Once a process has stayed in the D state for more than 120 seconds, the kernel prints:

INFO: task sync:16015 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sync            D c0512378     0 16015   1807 0x00000000
[] (__schedule+0x1d0/0x414) from [] (io_schedule+0x64/0x8c)
[] (io_schedule+0x64/0x8c) from [] (sleep_on_page+0x8/0x10)
[] (sleep_on_page+0x8/0x10) from [] (__wait_on_bit+0x78/0xb0)
[] (__wait_on_bit+0x78/0xb0) from [] (wait_on_page_bit+0xb4/0xbc)
[] (wait_on_page_bit+0xb4/0xbc) from [] (filemap_fdatawait_range+0xd4/0x130)
[] (filemap_fdatawait_range+0xd4/0x130) from [] (filemap_fdatawait+0x38/0x40)
[] (filemap_fdatawait+0x38/0x40) from [] (sync_inodes_sb+0x108/0x13c)
[] (sync_inodes_sb+0x108/0x13c) from [] (iterate_supers+0xa4/0xec)
[] (iterate_supers+0xa4/0xec) from [] (sys_sync+0x34/0x9c)
[] (sys_sync+0x34/0x9c) from [] (ret_fast_syscall+0x0/0x30)
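
When many such reports pile up in the kernel log, it can help to pull out the task name, PID and timeout automatically. The following is a small sketch that parses only the "INFO: task ..." header line shown above; the function name parse_hung_task is hypothetical, and the pattern assumes the message format matches this example.

```python
import re

# Matches the header line of a hung-task report, with or without a
# leading dmesg timestamp.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def parse_hung_task(line):
    """Return (comm, pid, seconds) for a report header, else None."""
    m = HUNG_RE.search(line)
    if m is None:
        return None
    return m.group("comm"), int(m.group("pid")), int(m.group("secs"))
```

For the report above, parse_hung_task returns the task name "sync", the PID 16015 and the timeout 120; lines that are not report headers, such as the stack frames, yield None.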

To disable the message: echo 0 > /proc/sys/kernel/hung_task_timeout_secs

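You can also check manually: use top or ps to see the process state, then run cat /proc/pid/status (where pid is the process ID) and look for State:    D (disk sleep). To view the kernel stack of that process, run cat /proc/pid/stack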
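
The manual check can be scripted. The sketch below walks /proc, picks out processes whose State line starts with D, and reads their kernel stacks. The function names are hypothetical; reading /proc/pid/stack normally requires root and a kernel built with stack trace support, so the helper returns a placeholder string when the file cannot be read.

```python
import os


def parse_state(status_text):
    """Return the value of the State: line, e.g. 'D (disk sleep)'."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return None


def d_state_pids(proc="/proc"):
    """List PIDs whose /proc/<pid>/status reports the D state."""
    pids = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "status")) as f:
                state = parse_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state and state.startswith("D"):
            pids.append(int(name))
    return sorted(pids)


def read_stack(pid, proc="/proc"):
    """Return the kernel stack of a process, or a note if unreadable."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            return f.read()
    except OSError as exc:
        return f"<stack unavailable: {exc}>"
```

A typical use is to loop over d_state_pids() and print read_stack(pid) for each entry. Note that this only looks at thread-group leaders, not at every thread under /proc/pid/task.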

R-State Detection

   Kernel hacking  ---> 
      -*- Kernel debugging 
      [*]   Detect Hard and Soft Lockups 
      [ ]     Panic (Reboot) On Soft Lockups 

CONFIG_LOCKUP_DETECTOR=y

So far I have not managed to reproduce a task stuck in the R state.

Further Options

CONFIG_DEBUG_SPINLOCK=y catches problems such as the use of uninitialized spinlocks. Combined with the NMI watchdog, it can also uncover spinlock deadlocks.

CONFIG_DEBUG_MUTEXES=y detects and reports mutex errors.
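
To check which of the options mentioned in this article are enabled in a given kernel, the config text can be parsed directly. This is a sketch under assumptions: the function names are hypothetical, and load_running_config relies on /proc/config.gz, which only exists when the kernel was built with CONFIG_IKCONFIG_PROC. Otherwise, pass in the contents of the .config file from the build tree.

```python
import gzip

# The options discussed in this article.
OPTIONS = (
    "CONFIG_DETECT_HUNG_TASK",
    "CONFIG_LOCKUP_DETECTOR",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
)


def enabled_options(config_text, options=OPTIONS):
    """Map each option name to True if the config sets it to 'y'."""
    set_to_y = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as '# CONFIG_FOO is not set'.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            set_to_y.add(name)
    return {opt: opt in set_to_y for opt in options}


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes the file exists)."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

For example, enabled_options(load_running_config()) returns a dictionary with one True/False entry per option.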

 
