[HOWTO]: Common Linux/Android Debugging Tools

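This article describes some commonly used Linux/Android debugging tools and how to use them. It serves as a personal memo and is updated continuously.

Note: most of this material is not my original work. It was collected from various places and I have not traced every original author, so no sources are cited. If this causes any offense, please leave a comment or send me a private message and I will fix it as soon as possible.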


FIQ-Debugger

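The FIQ debugger is a system debugging facility built into the kernel.

On the ARM architecture, an FIQ plays a role similar to an NMI. The FIQ debugger registers the serial port interrupt as an FIQ, and the serial FIQ service routine implements a set of system debugging commands.

Normally the serial port works as an ordinary console. In minicom, entering the switch command "Ctrl + A" followed by "F" (which sends a break) switches the serial port to FIQ debugger mode.

Because the FIQ acts as a non-maskable interrupt, this approach is well suited to cases where a CPU is hung: while the CPU is hung, the FIQ debugger can still print the state of the faulting CPU. The most commonly used command is sysrq.

To use the FIQ debugger, the kernel must be built with the following options: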

CONFIG_FIQ_DEBUGGER                         // enable the FIQ debugger
CONFIG_FIQ_DEBUGGER_CONSOLE                 // allow switching between the FIQ debugger and the console
CONFIG_FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE  // serial port starts in console mode by default
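
Before chasing a missing "debug>" prompt, it can help to confirm that the options above are actually set in the kernel configuration. The following is a minimal sketch; the helper name check_kconfig is hypothetical, and it assumes you have a plain-text config file available (for example the .config from the kernel build tree).

```shell
# check_kconfig FILE OPTION...
# Reports, for each option, whether it is set to "y" in the given kernel config file.
# (Hypothetical helper, not part of any kernel or Android tooling.)
check_kconfig() {
    cfg=$1
    shift
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
        fi
    done
}
```

Example use against a build tree: check_kconfig .config CONFIG_FIQ_DEBUGGER CONFIG_FIQ_DEBUGGER_CONSOLE CONFIG_FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE. On a running device the configuration is only readable from /proc/config.gz if the kernel was built with CONFIG_IKCONFIG_PROC, in which case it has to be decompressed first.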

FIQ debugger commands:

debug> help
FIQ Debugger commands:
 pc            PC status
 regs          Register dump
 allregs       Extended Register dump
 bt            Stack trace
 reboot [<c>]  Reboot with command <c>
 reset [<c>]   Hard reset with command <c>
 irqs          Interupt status
 sleep         Allow sleep while in FIQ
 nosleep       Disable sleep while in FIQ
 console       Switch terminal to console
 cpu           Current CPU
 cpu <number>  Switch to CPU<number>
 ps            Process list
 sysrq         sysrq options
 sysrq <param> Execute sysrq with <param>


SysRq

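When tracking down a system hang, you sometimes hit this situation: the system is dead but does not reset. Without a reset, the fault stack trace that would normally be printed beforehand is never produced. In this case, provided interrupts are still enabled, you can use the SysRq key combination to dump the system stack information on demand.

To enable SysRq, the kernel must be built with CONFIG_MAGIC_SYSRQ. On a kernel that supports SysRq, /proc/sys/kernel/sysrq controls whether SysRq is enabled. For more about sysrq, see the kernel document Documentation/sysrq.txt (Documentation/admin-guide/sysrq.rst in newer kernels).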
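
The value in /proc/sys/kernel/sysrq is either 0 (disabled), 1 (everything enabled), or a bitmask of allowed function groups. The sketch below decodes such a value, following the bit assignments given in the kernel's sysrq documentation; the function name sysrq_describe and the short group labels are my own shorthand, not kernel terminology.

```shell
# sysrq_describe VALUE
# Prints which SysRq function groups a /proc/sys/kernel/sysrq value allows.
# (Hypothetical helper; the labels are informal names for the documented bits.)
sysrq_describe() {
    val=$1
    if [ "$val" -eq 0 ]; then echo "disabled"; return; fi
    if [ "$val" -eq 1 ]; then echo "all functions enabled"; return; fi
    out=""
    for pair in 2:loglevel 4:keyboard 8:dumps 16:sync \
                32:remount-ro 64:signal 128:reboot 256:rt-nice; do
        bit=${pair%%:*}     # numeric bit value
        name=${pair#*:}     # informal label for that bit
        if [ $(( val & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}
```

For example, sysrq_describe "$(cat /proc/sys/kernel/sysrq)" reports the current setting, and writing 1 to that file as root enables everything. This value only restricts the keyboard combination; root can still trigger any command by writing to /proc/sysrq-trigger.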

The SysRq debugging commands are as follows:

*  What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'b'     - Will immediately reboot the system without syncing or unmounting your disks.
'c'    - Will perform a system crash by a NULL pointer dereference. A crashdump will be taken if configured.
'd'    - Shows all locks that are held.
'e'     - Send a SIGTERM to all processes, except for init.
'f'    - Will call oom_kill to kill a memory hog process.
'g'    - Used by kgdb (kernel debugger)
'h'     - Will display help (actually any other key than those listed here will display help. but 'h' is easy to remember :-)
'i'     - Send a SIGKILL to all processes, except for init.
'j'     - Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.
'k'     - Secure Access Key (SAK) Kills all programs on the current virtual console. NOTE: See important comments below in SAK section.
'l'     - Shows a stack backtrace for all active CPUs.
'm'     - Will dump current memory info to your console.
'n'    - Used to make RT tasks nice-able
'o'     - Will shut your system off (if configured and supported).
'p'     - Will dump the current registers and flags to your console.
'q'     - Will dump per CPU lists of all armed hrtimers (but NOT regular timer_list timers) and detailed information about all
          clockevent devices.
'r'     - Turns off keyboard raw mode and sets it to XLATE.
's'     - Will attempt to sync all mounted filesystems.
't'     - Will dump a list of current tasks and their information to your console.
'u'     - Will attempt to remount all mounted filesystems read-only.
'v'    - Forcefully restores framebuffer console
'v'    - Causes ETM buffer dump [ARM-specific]
'w'    - Dumps tasks that are in uninterruptable (blocked) state.
'x'    - Used by xmon interface on ppc/powerpc platforms.
'y'    - Show global CPU Registers [SPARC-64 specific]
'z'    - Dump the ftrace buffer
'0'-'9' - Sets the console log level, controlling which kernel messages will be printed to your console. ('0', for example would make
          it so that only emergency messages like PANICs or OOPSes would make it to your console.)
*  Okay, so what can I use them for?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
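For example, when debugging a hang (the system is dead and unresponsive), we need to find the processes in state D. Such processes are in Uninterruptible Sleep and do not accept any external signal, which means they cannot be killed with kill: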

echo w > /proc/sysrq-trigger

P.S. For reference, the process/thread states are:

"R (running)",  /*   0 */
"S (sleeping)",  /*   1 */
"D (disk sleep)", /*   2 */
"T (stopped)",  /*   4 */
"t (tracing stop)", /*   8 */
"Z (zombie)",  /*  16 */
"X (dead)",  /*  32 */
"x (dead)",  /*  64 */
"K (wakekill)",  /* 128 */
"W (waking)",  /* 256 */

Most processes are normally in state S (sleeping). If you find processes in states such as D (disk sleep), T (stopped), or Z (zombie), they deserve a careful look.
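
When the shell is still responsive, D-state processes can also be listed directly from /proc instead of reading the sysrq 'w' dump from the kernel log. This is a minimal sketch, assuming a standard procfs layout with Name:, State:, and Pid: lines in /proc/<pid>/status and an awk on the device; the function name list_d_state is hypothetical.

```shell
# list_d_state [PROC_ROOT]
# Prints "PID STATE NAME" for every process whose state is D (uninterruptible sleep).
# PROC_ROOT defaults to /proc; passing another directory is only useful for testing.
list_d_state() {
    proc_root=${1:-/proc}
    for f in "$proc_root"/[0-9]*/status; do
        [ -r "$f" ] || continue     # the process may already have exited
        awk '/^Name:/  { name = $2 }
             /^State:/ { state = $2 }
             /^Pid:/   { pid = $2 }
             END { if (state == "D") print pid, state, name }' "$f" 2>/dev/null
    done
}
```

Running list_d_state a few times in a row helps tell a process that is permanently stuck in D from one that is merely waiting on I/O for a moment.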

debuggerd

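debuggerd is an Android daemon that dumps a process's runtime information for analysis when the process crashes. The crash dump it produces is plain text and is saved under /data/tombstones/ (an apt name). Up to 10 files are kept; once there are more than 10, the oldest file is overwritten. Starting with Android 4.2, debuggerd is also a command-line utility: it can print the native stack of a running process without interrupting its execution. Usage: debuggerd -b <pid>

This helps in analyzing what a process is doing, but its most useful application is that it makes it very easy to locate the code in a native process where a deadlock, or an infinite loop caused by faulty logic, occurs.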


devmem

busybox includes devmem, a tool for reading and writing physical memory directly:

devmem is a small program that reads and writes from physical memory using /dev/mem.

Usage: devmem ADDRESS [WIDTH [VALUE]]

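For example, suppose we need to inspect the configuration of some GPIO pins. The GPIO configuration registers are mapped into a special memory region, the SFRs (Special Function Registers), so we only need to read the corresponding memory address. Below, we read the value at 0x13470000 and then write 0x0 to it: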

# busybox devmem 0x13470000 32                                 
0x00022222
# busybox devmem 0x13470000 32 0x0
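
Writing 0x0 as above overwrites every field in the register. In practice you often want to change only a few bits and keep the rest, that is, a read-modify-write. The sketch below computes the new value; the helper name reg_update and the mask and field values in the example are hypothetical and must be taken from your SoC's datasheet.

```shell
# reg_update OLD MASK FIELD
# Returns OLD with the bits selected by MASK replaced by the corresponding bits of FIELD.
# All arguments may be given in hex (0x...) or decimal.
reg_update() {
    old=$1
    mask=$2
    field=$3
    printf '0x%08X\n' $(( ($old & ~$mask) | ($field & $mask) ))
}
```

For instance, to set a hypothetical 4-bit field at bits 7:4 to 1 while leaving the other bits alone: old=$(busybox devmem 0x13470000 32), then busybox devmem 0x13470000 32 "$(reg_update "$old" 0xF0 0x10)". With the value read above (0x00022222), this writes 0x00022212.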

--to be continued...
