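Linux Kernel Deadlock Detection

I. Deadlock
A deadlock occurs when two or more processes or threads compete for resources and end up waiting on each other indefinitely.

For example, process A needs resource X and process B needs resource Y, but X is held by B and Y is held by A, and neither releases what it holds. The result is a deadlock.

Common kinds of deadlock:

1. Recursive deadlock (a task tries to take a lock it already holds)

2. AB-BA deadlock (two tasks take the same two locks in opposite order)

Detection technique: Lockdep
Principle: lockdep tracks the state of every lock and the dependencies between locks, then validates them against a set of rules to make sure the dependencies stay consistent.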
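As a rough illustration of the principle above, here is a minimal user-space sketch in C. It is not lockdep's actual implementation. It assumes each lock class is a small integer and records "B was taken while A was held" as an edge A -> B; a new edge is refused if it would close a cycle. The names dep_graph, add_dependency, reachable, and MAX_CLASSES are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8   /* toy limit for this sketch, not a lockdep value */

/* dep[a][b] == true means "b was acquired while a was held" (edge a -> b). */
struct dep_graph {
    bool dep[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct dep_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(const struct dep_graph *g, int from, int to,
                      bool visited[MAX_CLASSES])
{
    if (from == to)
        return true;
    visited[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (g->dep[from][next] && !visited[next] &&
            reachable(g, next, to, visited))
            return true;
    }
    return false;
}

/*
 * Record "acquire 'next' while holding 'held'".
 * Returns false and records nothing if 'held' is already reachable from
 * 'next', because the new edge would then close a cycle. This also rejects
 * held == next, which corresponds to the recursive case.
 */
static bool add_dependency(struct dep_graph *g, int held, int next)
{
    bool visited[MAX_CLASSES] = { false };

    assert(held >= 0 && held < MAX_CLASSES);
    assert(next >= 0 && next < MAX_CLASSES);

    if (reachable(g, next, held, visited))
        return false;
    g->dep[held][next] = true;
    return true;
}
```

With classes A = 0 and B = 1, add_dependency(&g, A, B) succeeds, and a later add_dependency(&g, B, A) is refused. That is the AB-BA pattern, and it is flagged from the ordering alone, without the two threads ever having to hang.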

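II. Lockdep Kernel Configuration
The options below apply to spinlocks and mutexes.

They are described in detail in the kernel file lib/Kconfig.debug and appear in the configuration menu under Kernel hacking -> Lock Debugging:

CONFIG_PROVE_LOCKING=y: proves locking correctness and reports possible deadlocks
CONFIG_DEBUG_LOCKDEP: extra runtime checks that debug the lock dependency engine itself
CONFIG_LOCK_STAT: tracks lock contention points and gives more detailed information
CONFIG_DEBUG_RT_MUTEXES: detects violations of RT mutex semantics and RT mutex related deadlocks
CONFIG_DEBUG_LOCK_ALLOC: detects incorrect freeing of live locks (locks that are still held)
CONFIG_DEBUG_ATOMIC_SLEEP: detects sleeping inside atomic sections
CONFIG_DEBUG_LOCKING_API_SELFTESTS: boot-time self-tests of the locking API
CONFIG_LOCK_TORTURE_TEST: torture tests for locking

Turning all of these on is fine for a debug kernel, but it is best not to enable them in production: they use a lot of memory and slow the kernel down.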

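Reported output
Reports are emitted through WARN*() and cover:
deadlocks/lock inversion scenarios,
circular lock dependencies,
and hard IRQ/soft IRQ safe/unsafe locking bugs
III. Deadlock Detection Examples
1. Experiment 1: Hidden Locking
1) Simplified code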

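do_each_thread(g, t) {    /* 'g': process ptr; 't': thread ptr */
    task_lock(t);
    [ … ]
    get_task_comm(tasknm, t);
    task_unlock(t);
} while_each_thread(g, t);
The loop iterates over the task structure of every thread: take the lock, read the task information, release the lock. At first glance nothing looks wrong.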

2) Kernel output

# insmod thrd_showall_buggy.ko
[ 1404.479012] thrd_showall_buggy: loading out-of-tree module taints kernel.
[ 1404.484444] thrd_showall_buggy: module verification failed: signature and/or required key missing - tainting kernel
[ 1404.510962] thrd_showall_buggy: inserted
[ 1404.516417] ============================================
[ 1404.517142] WARNING: possible recursive locking detected
[ 1404.517826] 5.0.0+ #2 Tainted: G OE
[ 1404.518250] --------------------------------------------
[ 1404.518432] insmod/1348 is trying to acquire lock:
[ 1404.519375] 000000001759de9e (&(&p->alloc_lock)->rlock){+.+.}, at: __get_task_comm+0x38/0x88
[ 1404.521885]
[ 1404.521885] but task is already holding lock:
[ 1404.522282] 000000001759de9e (&(&p->alloc_lock)->rlock){+.+.}, at: showthrds_buggy+0x9c/0x504 [thrd_showall_buggy]
[ 1404.523738]
[ 1404.523738] other info that might help us debug this:
[ 1404.524108] Possible unsafe locking scenario:
[ 1404.524108]
[ 1404.524359] CPU0
[ 1404.524451] ----
[ 1404.524588] lock(&(&p->alloc_lock)->rlock);
[ 1404.524774] lock(&(&p->alloc_lock)->rlock);
[ 1404.525658]
[ 1404.525658] *** DEADLOCK ***
[ 1404.525658]
[ 1404.526054] May be due to missing lock nesting notation
[ 1404.526054]
[ 1404.526665] 1 lock held by insmod/1348:
[ 1404.527124] #0: 000000001759de9e (&(&p->alloc_lock)->rlock){+.+.}, at: showthrds_buggy+0x9c/0x504 [thrd_showall_buggy]
[ 1404.528286]
[ 1404.528286] stack backtrace:
[ 1404.529195] CPU: 1 PID: 1348 Comm: insmod Kdump: loaded Tainted: G OE 5.0.0+ #2
[ 1404.530369] Hardware name: linux,dummy-virt (DT)
[ 1404.531230] Call trace:
[ 1404.531459] dump_backtrace+0x0/0x52c
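3) Output analysis

WARNING: possible recursive locking detected    // recursive locking was detected
[ 1404.518432] insmod/1348 is trying to acquire lock:    // the lock being requested
000000001759de9e (&(&p->alloc_lock)->rlock){+.+.}, at: __get_task_comm+0x38/0x88

The "at:" field gives the function name, the offset into it, and the function size, which makes the code easy to locate.

[ 1404.521885] but task is already holding lock:    // the same lock is already held

at: showthrds_buggy+0x9c/0x504 [thrd_showall_buggy]    // where the task took it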

Meaning of the {+.+.} symbols
'+' means the lock was acquired with IRQs enabled.

'.' means the lock was acquired with IRQs disabled and not in IRQ context.

For the full details see: https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt
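To make the notation easier to read, here is a small user-space C sketch that decodes a four-character usage string such as "+.+.". The character meanings paraphrase lockdep-design.txt. The position order (hardirq, hardirq-read, softirq, softirq-read) is an assumption that matches the four-character form printed by kernels around 5.0; other versions may print a different number of characters. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Meaning of one usage character, paraphrasing lockdep-design.txt. */
static const char *usage_char_meaning(char c)
{
    switch (c) {
    case '.': return "acquired while irqs disabled and not in irq context";
    case '-': return "acquired in irq context";
    case '+': return "acquired with irqs enabled";
    case '?': return "acquired in irq context with irqs enabled";
    default:  return NULL;   /* not a usage character */
    }
}

/* Assumed position order for the four-character form such as "+.+." */
static const char *usage_position_name(size_t pos)
{
    static const char *const names[] = {
        "hardirq", "hardirq-read", "softirq", "softirq-read"
    };
    return pos < sizeof(names) / sizeof(names[0]) ? names[pos] : NULL;
}

/* Print one line per character; returns 0 on success, -1 on bad input. */
static int describe_usage(const char *bits)
{
    assert(bits != NULL);
    if (strlen(bits) != 4)
        return -1;
    /* Validate everything first so nothing is printed for bad input. */
    for (size_t i = 0; i < 4; i++)
        if (usage_char_meaning(bits[i]) == NULL)
            return -1;
    for (size_t i = 0; i < 4; i++)
        printf("%-12s '%c': %s\n", usage_position_name(i), bits[i],
               usage_char_meaning(bits[i]));
    return 0;
}
```

Calling describe_usage("+.+.") prints four lines, one per position. The string is passed without the surrounding braces.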

The analysis above shows that get_task_comm tries to acquire the very lock its caller already holds, which causes the deadlock. A look at the kernel source confirms that the function takes the lock itself.

#define get_task_comm(buf, tsk) ({                      \
    BUILD_BUG_ON(sizeof(buf) != TASK_COMM_LEN);         \
    __get_task_comm(buf, sizeof(buf), tsk);             \
})

char *__get_task_comm(char *buf, size_t buf_size, struct task_struct *tsk)
{
    task_lock(tsk);
    strncpy(buf, tsk->comm, buf_size);
    task_unlock(tsk);
    return buf;
}
EXPORT_SYMBOL_GPL(__get_task_comm);
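The hidden second acquisition can be mimicked in user space. The sketch below is only a toy model, not kernel code: it keeps a list of the locks a task currently holds and refuses to add a lock that is already on the list, which is roughly the situation behind "possible recursive locking detected". The names held_locks, note_acquire, and note_release are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 16   /* toy limit for this sketch */

/* Locks currently held by one task, identified by address. */
struct held_locks {
    const void *lock[MAX_HELD];
    size_t depth;
};

static bool is_held(const struct held_locks *h, const void *lock)
{
    for (size_t i = 0; i < h->depth; i++)
        if (h->lock[i] == lock)
            return true;
    return false;
}

/*
 * Record an acquisition. Returns false if the task already holds this
 * lock (a real non-recursive spinlock would spin forever here) or if
 * the toy table is full.
 */
static bool note_acquire(struct held_locks *h, const void *lock)
{
    assert(h != NULL);
    if (is_held(h, lock) || h->depth == MAX_HELD)
        return false;
    h->lock[h->depth++] = lock;
    return true;
}

/* Forget the most recent acquisition of 'lock', if there is one. */
static void note_release(struct held_locks *h, const void *lock)
{
    for (size_t i = h->depth; i-- > 0; ) {
        if (h->lock[i] == lock) {
            for (size_t j = i; j + 1 < h->depth; j++)
                h->lock[j] = h->lock[j + 1];
            h->depth--;
            return;
        }
    }
}
```

Replaying the buggy loop against this model, the first note_acquire() stands for task_lock(t) and succeeds; the second one, standing for the task_lock() inside __get_task_comm(), returns false. If the lock is released before the helper is called, the second acquisition is accepted.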
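4) Fix: simplified version

do_each_thread(g, t) {
    task_lock(t);

    task_unlock(t);
    get_task_comm(tasknm, t);    /* takes and releases the task lock itself, so call it unlocked */
    task_lock(t);

    task_unlock(t);
} while_each_thread(g, t);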

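2. Experiment 2: AB-BA Locking
1) Model

Two threads take the same two locks, lockA and lockB, in opposite order.

2) Lock order in the program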

Thread 1:
    spin_lock(&lockA);
    spin_lock(&lockB);

    spin_unlock(&lockB);
    spin_unlock(&lockA);

Thread 2:
    spin_lock(&lockB);
    spin_lock(&lockA);

    spin_unlock(&lockA);
    spin_unlock(&lockB);
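One common way to avoid the opposite ordering shown above is to give every pair of locks a single canonical order. The following user-space sketch uses C11 atomic_flag as a toy spinlock and always takes the lock with the lower address first, whichever way round the caller names them. It is an illustration under those assumptions, not kernel code; toy_spinlock, pick_order, lock_pair, and unlock_pair are invented names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy spinlock built on a C11 atomic flag (user space only). */
struct toy_spinlock {
    atomic_flag locked;
};
#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void toy_lock(struct toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Choose a canonical order for two distinct locks: lower address first. */
static void pick_order(struct toy_spinlock *x, struct toy_spinlock *y,
                       struct toy_spinlock **first, struct toy_spinlock **second)
{
    assert(x != y);
    if ((uintptr_t)x < (uintptr_t)y) {
        *first = x;
        *second = y;
    } else {
        *first = y;
        *second = x;
    }
}

/* Take both locks in canonical order, regardless of argument order. */
static void lock_pair(struct toy_spinlock *x, struct toy_spinlock *y)
{
    struct toy_spinlock *first, *second;

    pick_order(x, y, &first, &second);
    toy_lock(first);
    toy_lock(second);
}

/* Release in the reverse of the acquisition order. */
static void unlock_pair(struct toy_spinlock *x, struct toy_spinlock *y)
{
    struct toy_spinlock *first, *second;

    pick_order(x, y, &first, &second);
    toy_unlock(second);
    toy_unlock(first);
}
```

With this helper, one thread may call lock_pair(&lockA, &lockB) and another lock_pair(&lockB, &lockA); both end up acquiring the locks in the same order, so the AB-BA cycle cannot form. In the simple two-thread case shown earlier, rewriting thread 2 to take lockA before lockB achieves the same result.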

3) Kernel detection output

# insmod deadlock_eg_AB-BA.ko lock_ooo=1
key missing - tainting kernel
[ 190.895374] deadlock_eg_AB-BA: inserted (param: lock_ooo=1)
[ 190.924925] thrd_work():115: *** thread PID 1616 on cpu 0 now ***
[ 190.936420] thrd_work():115: *** thread PID 1617 on cpu 1 now ***
[ 190.937541] iteration #0 on cpu #1
[ 190.938060] Thread #0: locking: we do: lockA --> lockB
[ 190.939223] Thread #1: locking: we do: lockB --> lockA
[ 190.941822] iteration #0 on cpu #0
[ 190.946014] B
[ 190.946185] A
[ 190.946231] B
[ 190.949057] A
[ 190.949818] A
[ 190.950828] irq event stamp: 12493
[ 190.950846]
[ 190.952232] hardirqs last enabled at (12493): [] kmem_cache_free+0x6b0/0x1178
[ 190.953328] hardirqs last disabled at (12492): [] kmem_cache_free+0x660/0x1178
[ 190.953951] ======================================================
[ 190.953983] WARNING: possible circular locking dependency detected
[ 190.955155] softirqs last enabled at (12436): [] fpsimd_restore_current_state+0x4fc/0x53c
[ 190.957546] 5.0.0+ #2 Tainted: G OE
[ 190.957646] ------------------------------------------------------
[ 190.957880] softirqs last disabled at (12434): [] fpsimd_restore_current_state+0x328/0x53c
[ 190.960741] thrd_0/0/1616 is trying to acquire lock:
[ 190.964268] (ptrval) (lockB){+.+.}, at: thrd_work+0x1e8/0x6c0 [deadlock_eg_AB_BA]
[ 190.973906]
[ 190.973906] but task is already holding lock:
[ 190.975638] (ptrval) (lockA){+.+.}, at: thrd_work+0x130/0x6c0 [deadlock_eg_AB_BA]
[ 190.979383]
[ 190.979383] which lock already depends on the new lock.
[ 190.979383]
[ 190.984836]
[ 190.984836] the existing dependency chain (in reverse order) is:
[ 190.989808]
[ 190.989808] -> #1 (lockA){+.+.}:
[ 190.991925] validate_chain+0x1250/0x14a0
[ 190.992364] __lock_acquire+0xae4/0xc08
[ 190.993684] lock_acquire+0x664/0x6b8
[ 190.998583] _raw_spin_lock+0x54/0xb0
[ 190.999824] thrd_work+0x3f0/0x6c0 [deadlock_eg_AB_BA]
[ 191.001019] kthread+0x3c0/0x3cc
[ 191.002301]
[ 191.002301] -> #0 (lockB){+.+.}:
[ 191.006355] check_prevs_add+0x148/0x2cc
[ 191.007502] validate_chain+0x1250/0x14a0
[ 191.010230] __lock_acquire+0xae4/0xc08
[ 191.011146] lock_acquire+0x664/0x6b8
[ 191.012896] _raw_spin_lock+0x54/0xb0
[ 191.016753] thrd_work+0x1e8/0x6c0 [deadlock_eg_AB_BA]
[ 191.020368] kthread+0x3c0/0x3cc
[ 191.022408]
[ 191.022408] other info that might help us debug this:
[ 191.022408]
[ 191.025625] Possible unsafe locking scenario:
[ 191.025625]
[ 191.030342]        CPU0                    CPU1
[ 191.034011]        ----                    ----
[ 191.035514]   lock(lockA);
[ 191.037973]                                lock(lockB);
[ 191.042529]                                lock(lockA);
[ 191.045536]   lock(lockB);
[ 191.047178]
[ 191.047178] *** DEADLOCK ***
[ 191.047178]
[ 191.051286] 1 lock held by thrd_0/0/1616:
[ 191.053763] #0: (ptrval) (lockA){+.+.}, at: thrd_work+0x130/0x6c0 [deadlock_eg_AB_BA]
[ 191.058936]
[ 191.058936] stack backtrace:
[ 191.060266] CPU: 0 PID: 1616 Comm: thrd_0/0 Kdump: loaded Tainted: G OE 5.0.0+ #2
[ 191.061426] Hardware name: linux,dummy-virt (DT)
[ 191.062226] Call trace:
[ 191.062582] dump_backtrace+0x0/0x52c
[ 191.063168] show_stack+0x24/0x30

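4) Analysis

WARNING: possible circular locking dependency detected
Possible unsafe locking scenario:
[ 191.030342]        CPU0                    CPU1
[ 191.034011]        ----                    ----
[ 191.035514]   lock(lockA);
[ 191.037973]                                lock(lockB);
[ 191.042529]                                lock(lockA);
[ 191.045536]   lock(lockB);

Thread thrd_0/0 (PID 1616) already holds lockA (taken at thrd_work+0x130) and is trying to take lockB (at thrd_work+0x1e8), while the other code path takes lockB first and then lockA. If each thread gets its first lock before the other one releases, both wait forever. Taking the two locks in the same order on every code path removes the cycle.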
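IV. Lock Statistics
The kernel provides lock statistics so that heavily contended locks are easy to identify.

A lock is contended when a context tries to acquire it while it is already held and therefore has to wait until it is released. Heavy contention can become a serious performance bottleneck.

Kernel configuration: CONFIG_LOCK_STAT

Commands

Clear the statistics: echo 0 > /proc/lock_stat

Enable statistics collection: echo 1 > /proc/sys/kernel/lock_stat

Disable statistics collection: echo 0 > /proc/sys/kernel/lock_stat

View the statistics: cat /proc/lock_stat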
V. Programming Advice for Lockdep
Use the lockdep_assert_held family of macros. Source location: include/linux/lockdep.h

#define lockdep_assert_held(l) do {                             \
        WARN_ON(debug_locks && !lockdep_is_held(l));            \
    } while (0)

#define lockdep_assert_held_write(l) do {                       \
        WARN_ON(debug_locks && !lockdep_is_held_type(l, 0));    \
    } while (0)

#define lockdep_assert_held_read(l) do {                        \
        WARN_ON(debug_locks && !lockdep_is_held_type(l, 1));    \
    } while (0)

#define lockdep_assert_held_once(l) do {                        \
        WARN_ON_ONCE(debug_locks && !lockdep_is_held(l));       \
    } while (0)
If the assertion fails, a warning is emitted through WARN_ON (WARN_ON_ONCE for the _once variant).
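The usual pattern is to put the assertion at the top of every helper that documents "caller must hold the lock". The sketch below is a user-space analogy in C, not kernel code: toy_lock only remembers whether it is held, and toy_assert_held stands in for lockdep_assert_held by printing a warning and bumping a counter. All names here (toy_lock, toy_assert_held, toy_warnings, counter, counter_inc_locked) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy lock that only remembers whether it is held (single-threaded demo). */
struct toy_lock {
    bool held;
};

/* Counts failed assertions; stands in for the kernel's WARN_ON output. */
static int toy_warnings;

#define toy_assert_held(l) do {                                      \
        if (!(l)->held) {                                            \
            toy_warnings++;                                          \
            fprintf(stderr, "WARNING: lock not held at %s:%d\n",     \
                    __FILE__, __LINE__);                             \
        }                                                            \
    } while (0)

struct counter {
    struct toy_lock lock;
    int value;
};

/* Caller must hold c->lock; the assertion documents and checks that. */
static void counter_inc_locked(struct counter *c)
{
    assert(c != NULL);
    toy_assert_held(&c->lock);
    c->value++;
}

/* Correct wrapper: takes the lock, calls the helper, releases the lock. */
static void counter_inc(struct counter *c)
{
    c->lock.held = true;     /* stands in for taking the lock */
    counter_inc_locked(c);
    c->lock.held = false;    /* stands in for releasing it */
}
```

Calling counter_inc() produces no warning, while calling counter_inc_locked() directly without the lock prints one. Like WARN_ON, the check only warns; it does not stop the update from happening.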

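VI. Possible Problems When Using Lockdep
Problems

Repeatedly loading and unloading modules can exceed lockdep's internal limit on lock classes. In practice, either avoid repeated load/unload cycles or reboot the system.

With large data structures that contain a great many locks, failing to initialize every lock properly can overflow lockdep.

Message: WARNING lock debugging disabled!! - possibly due to a lockdep warning. This can appear because lockdep already issued a warning earlier and stopped checking after that first report.

Solutions

Reboot the system and retry.
KCSAN can detect data races.
deadlock, a script provided by the eBPF tooling
helgrind and TSan check multithreaded applications for data races.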
