Automatic reboot on Linux kernel failures (watchdog and panic)

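Most embedded devices have a hardware watchdog, so that when the system runs into an abnormal condition it reboots automatically instead of hanging.
Even on a system without a hardware watchdog, the Linux kernel can be made to reboot automatically when a panic occurs.
1) watchdog
watchdog settings under BusyBox
# watchdog --help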
BusyBox v1.27.1 (2018-06-12 09:43:31 CST) multi-call binary.

Usage: watchdog [-t N[ms]] [-T N[ms]] [-F] DEV

Periodically write to watchdog device DEV

    -T N    Reboot after N seconds if not reset (default 60)
    -t N    Reset every N seconds (default 30)
    -F      Run in foreground

Use 500ms to specify period in milliseconds
# watchdog -t 30 /dev/watchdog

Testing the watchdog
First use ps to check whether a watchdog daemon is already running:
# ps
1071 root 0:00 watchdog -t 30 /dev/watchdog
# kill -9 1071
[17925.103865] watchdog watchdog0: watchdog did not stop!
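The settings of a running watchdog daemon apparently cannot be changed directly, so kill it first and then start it again with the new values. The kernel message above indicates that the hardware watchdog keeps counting after the daemon is killed, so restart the daemon promptly.
# watchdog -T 1 -t 30 /dev/watchdog
With a 1 s timeout and a 30 s reset interval, the board should reboot after about 1 s, which shows whether the watchdog is actually active.

By default watchdog runs in the background; -F (Run in foreground) keeps it in the foreground.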
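The test above works only because the reset interval (-t) is longer than the timeout (-T). For normal operation the relation has to be the other way round. The following is a minimal sketch of a start-up wrapper that checks this before launching the daemon. The helper names wdt_args_ok and start_watchdog are made up for illustration and are not part of BusyBox.

```shell
#!/bin/sh
# Sketch: sanity-check watchdog timing before starting the daemon.
# wdt_args_ok and start_watchdog are hypothetical helper names.

# Succeeds when the reset interval is shorter than the hardware timeout,
# i.e. the daemon writes to the device before the watchdog can fire.
wdt_args_ok() {
    timeout=$1    # value passed to -T, in seconds
    interval=$2   # value passed to -t, in seconds
    [ "$interval" -lt "$timeout" ]
}

# Start the BusyBox watchdog daemon only if the timing is sane.
# The device defaults to /dev/watchdog, as used in the examples above.
start_watchdog() {
    timeout=$1
    interval=$2
    dev=${3:-/dev/watchdog}
    if wdt_args_ok "$timeout" "$interval"; then
        watchdog -T "$timeout" -t "$interval" "$dev"
    else
        echo "refusing: -t $interval >= -T $timeout would reboot the board" >&2
        return 1
    fi
}
```

For example, start_watchdog 60 30 starts the daemon with the BusyBox defaults, while start_watchdog 1 30 is rejected.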

2) panic
/proc/sys/kernel/panic holds the time, in seconds, that the kernel waits before rebooting after a kernel panic. A value of 0 disables automatic reboot on panic. Default: 0
# cat /proc/sys/kernel/panic
0
# echo 5 > /proc/sys/kernel/panic
# cat /proc/sys/kernel/panic
5
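A value written to /proc is lost at the next boot, so it usually has to be set again from an init script (or made persistent with the panic=5 kernel command-line parameter or a kernel.panic = 5 line in /etc/sysctl.conf). The following is a minimal sketch of such an init-script helper. The function name set_panic_timeout is hypothetical, and its optional second argument exists only so that the function can be tried against an ordinary file.

```shell
#!/bin/sh
# Sketch: set the panic reboot delay and verify that it took effect.
# set_panic_timeout is a hypothetical helper name.
set_panic_timeout() {
    seconds=$1
    file=${2:-/proc/sys/kernel/panic}
    echo "$seconds" > "$file" || return 1
    # Read the value back to confirm that the kernel accepted it.
    [ "$(cat "$file")" = "$seconds" ]
}
```

Calling set_panic_timeout 5 as root has the same effect as the echo 5 above, and additionally returns a non-zero status if the value did not stick.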
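3) oops
A kernel oops is different from a panic. A panic stops the system (and reboots it when /proc/sys/kernel/panic is non-zero), whereas an oops raised by a device driver usually does not. An oops is typically caused by the kernel dereferencing an invalid pointer. In a user-space program the same fault usually ends in a segmentation fault (segfault), from which the program cannot recover by itself; in kernel space it is called an oops. Setting panic_on_oops to 1 turns an oops into a panic: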
# cat /proc/sys/kernel/panic_on_oops
0
# echo 1 > /proc/sys/kernel/panic_on_oops
# cat /proc/sys/kernel/panic_on_oops
1
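How the two settings combine can be summarized in a few lines of shell. This is a simplified sketch only: the function name oops_policy is hypothetical, only values of 0 and above are considered, and the kernel may still panic on an oops in some situations (for example in interrupt context) regardless of panic_on_oops.

```shell
#!/bin/sh
# Sketch: describe what an oops leads to for given sysctl values.
# oops_policy is a hypothetical helper name.
oops_policy() {
    panic_on_oops=$1   # content of /proc/sys/kernel/panic_on_oops
    panic=$2           # content of /proc/sys/kernel/panic
    if [ "$panic_on_oops" -eq 0 ]; then
        echo "oops: kill task, keep running"
    elif [ "$panic" -gt 0 ]; then
        echo "oops: panic, reboot after ${panic}s"
    else
        echo "oops: panic, halt"
    fi
}
```

On a target it could be called as oops_policy "$(cat /proc/sys/kernel/panic_on_oops)" "$(cat /proc/sys/kernel/panic)".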
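4) How to trigger a kernel panic
Writing c to /proc/sysrq-trigger makes the kernel crash immediately. With panic_on_oops set to 1 and a panic timeout of 5, as configured above, the resulting oops becomes a panic and the board reboots after 5 seconds: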
/ # echo c > /proc/sysrq-trigger
[ 124.661311] sysrq: SysRq : Trigger a crash
[ 124.664614] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 124.671127] pgd = ffff800037dcd000
[ 124.673850] [00000000] *pgd=00000000b7dcf003, *pud=00000000b7db9003, *pmd=0000000000000000
[ 124.680493] Internal error: Oops: 96000046 [#1] SMP
[ 124.684399] Modules linked in: pfe(C)
[ 124.687340] CPU: 0 PID: 1057 Comm: sh Tainted: G C 4.1.35-rt41 #1
[ 124.693126] Hardware name: LS1012A RDB Board (DT)
[ 124.696894] task: ffff800039547180 ti: ffff8000397d8000 task.ti: ffff8000397d8000
[ 124.702896] PC is at sysrq_handle_crash+0x14/0x20
[ 124.706666] LR is at __handle_sysrq+0x11c/0x190
[ 124.710295] pc : [] lr : [] pstate: 60000145
[ 124.716219] sp : ffff8000397dbd40
[ 124.718870] x29: ffff8000397dbd40 x28: ffff8000397d8000
[ 124.723131] x27: ffff800000830000 x26: 0000000000000040
[ 124.727391] x25: 000000000000011a x24: 0000000000000015
[ 124.731650] x23: 0000000000000000 x22: 0000000000000004
[ 124.735910] x21: ffff800000bc3f50 x20: 0000000000000063
[ 124.740170] x19: ffff800000ba7000 x18: 00000000fffffff0
[ 124.744430] x17: 0000ffffb27ba280 x16: ffff80000019c9f0
[ 124.748690] x15: ffff800000ba7f20 x14: ffff800000c2ac48
[ 124.752950] x13: ffff800000ba7000 x12: ffff800000c2a000
[ 124.757210] x11: 0000000000000000 x10: 000000000000007c
[ 124.761469] x9 : 0000000000000170 x8 : 0000000000000002
[ 124.765729] x7 : 0000000000000001 x6 : 0000000000000170
[ 124.769989] x5 : 000000000000238c x4 : 0000000000000000
[ 124.774248] x3 : 0000000000000000 x2 : 0000000000000770
[ 124.778508] x1 : 0000000000000000 x0 : 0000000000000001
[ 124.782766]
[ 124.783957] Process sh (pid: 1057, stack limit = 0xffff8000397d8028)
[ 124.789048] Stack: (0xffff8000397dbd40 to 0xffff8000397dc000)
[ 124.793655] bd40: 397dbd80 ffff8000 003cd014 ffff8000 00000002 00000000 fffffffb ffffffff
[ 124.800211] bd60: 397dbeb8 ffff8000 00000002 00000000 80000000 00000000 00000015 00000000
[ 124.806767] bd80: 397dbda0 ffff8000 001f9300 ffff8000 397b1280 ffff8000 0019e258 ffff8000
[ 124.813323] bda0: 397dbdc0 ffff8000 0019b57c ffff8000 394f7b00 ffff8000 12acd760 00000000
[ 124.819880] bdc0: 397dbe40 ffff8000 0019bee0 ffff8000 394f7b00 ffff8000 12acd760 00000000
[ 124.826436] bde0: 397dbeb8 ffff8000 00000002 00000000 80000000 00000000 00000015 00000000
[ 124.832992] be00: 0000011a 00000000 394f7b00 ffff8000 397dbeb8 ffff8000 3967b600 ffff8000
[ 124.839548] be20: 397dbe40 ffff8000 0019beb0 ffff8000 394f7b00 ffff8000 12acd760 00000000
[ 124.846104] be40: 397dbe80 ffff8000 0019ca34 ffff8000 394f7b00 ffff8000 394f7b00 ffff8000
[ 124.852660] be60: 12acd760 00000000 00000002 00000000 80000000 00000000 00000000 00000000
[ 124.859216] be80: f8ebea80 0000ffff 00085db0 ffff8000 00000000 00000000 00000001 00000000
[ 124.865773] bea0: ffffffff ffffffff b27ba268 0000ffff 00000200 dead0000 00000000 00000000
[ 124.872329] bec0: 00000001 00000000 12acd760 00000000 00000002 00000000 12acd761 00000000
[ 124.878885] bee0: 12ac0063 00000000 00000000 00000000 80808080 00800000 00000010 00000000
[ 124.885441] bf00: 00000040 00000000 fffffff0 ffffffff 01010101 01010101 004f72e8 00000000
[ 124.891997] bf20: 01010101 01010101 00000000 00000000 004b84a8 00000000 00000008 00000000
[ 124.898553] bf40: 00000000 00000000 b27ba280 0000ffff 00000000 00000000 004f7000 00000000
[ 124.905109] bf60: 00000001 00000000 12acd760 00000000 00000002 00000000 12acd760 00000000
[ 124.911664] bf80: 00000020 00000000 00000000 00000000 004f7000 00000000 12ac8440 00000000
[ 124.918220] bfa0: 00000000 00000000 f8ebea80 0000ffff 0040e8f8 00000000 f8ebe110 0000ffff
[ 124.924776] bfc0: b27ba268 0000ffff 80000000 00000000 00000001 00000000 00000040 00000000
[ 124.931331] bfe0: 00000000 00000000 00000000 00000000 5f787200 6c616f63 65637365 6573755f
[ 124.937882] Call trace:
[ 124.939841] [] sysrq_handle_crash+0x14/0x20
[ 124.944446] [] write_sysrq_trigger+0x54/0x68
[ 124.949122] [] proc_reg_write+0x58/0x88
[ 124.953450] [] __vfs_write+0x1c/0xf8
[ 124.957568] [] vfs_write+0x90/0x1b8
[ 124.961616] [] SyS_write+0x44/0xa0
[ 124.965595] Code: 52800020 b9039820 d5033e9f d2800001 (39000020)
[ 124.970501] ---[ end trace bff4483b3310fae9 ]---
[ 124.974199] Kernel panic - not syncing: Fatal exception
[ 124.978384] Rebooting in 5 seconds..
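Because this test crashes the kernel on purpose, it can help to check the preconditions first. The following is a minimal sketch of such a guard. The function name trigger_test_panic is hypothetical, and its two optional arguments exist only so that it can be tried against ordinary files instead of /proc.

```shell
#!/bin/sh
# Sketch: crash the kernel for testing, but only if it will reboot afterwards.
# trigger_test_panic is a hypothetical helper name.
trigger_test_panic() {
    trigger=${1:-/proc/sysrq-trigger}
    panic_file=${2:-/proc/sys/kernel/panic}
    if [ ! -w "$trigger" ]; then
        echo "cannot write $trigger (no SysRq support, or not root?)" >&2
        return 1
    fi
    if [ "$(cat "$panic_file")" -eq 0 ]; then
        echo "panic timeout is 0: the board would hang instead of rebooting" >&2
        return 1
    fi
    sync                  # flush pending writes; the crash does not sync
    echo c > "$trigger"   # same command as in the transcript above
}
```

Run as root without arguments, it behaves like the echo c above, but refuses to crash a system that would not reboot by itself.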

