Kernel Debugging with trace-kprobe

trace-kprobe

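Introduction

When debugging the kernel, you often need to know whether a function ran, what arguments it received, and what it returned. kprobes can answer all of these, and a kprobe written as a kernel module can even modify a return value. This article covers how to use kprobes through the trace interface.

Using do_filp_open as an example, let's walk through the basics of kprobe events in trace. The code of do_filp_open is: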

struct file *do_filp_open(int dfd, struct filename *pathname,
		const struct open_flags *op)
{
	struct nameidata nd;
	int flags = op->lookup_flags;
	struct file *filp;

	set_nameidata(&nd, dfd, pathname);
	filp = path_openat(&nd, op, flags | LOOKUP_RCU);
	if (unlikely(filp == ERR_PTR(-ECHILD)))
		filp = path_openat(&nd, op, flags);
	if (unlikely(filp == ERR_PTR(-ESTALE)))
		filp = path_openat(&nd, op, flags | LOOKUP_REVAL);
	restore_nameidata();
	return filp;
}

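Inspecting arguments and the return value

First, we want to see the dfd argument and the file name inside pathname. dfd is the file descriptor the lookup starts from; when the caller does not specify one it is usually AT_FDCWD (-100, which prints as 0xffffff9c for a 32-bit int). pathname is a struct, and the file name is stored in struct filename->name. Let's find the offset of name within struct filename: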

$ gdb vmlinux # debug vmlinux directly with gdb

(gdb) p &(((struct filename *)0)->name)
$1 = (const char **) 0x0  # the offset of name is 0

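Arguments are passed in registers. On x86_64, arguments 1 through 6 are in %rdi, %rsi, %rdx, %rcx, %r8, %r9 (written as %di, %si, %dx, %cx, %r8, %r9 in kprobe event definitions). Test with the following script: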

#!/bin/bash
trace_dir=/sys/kernel/debug/tracing/

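# In the echo below, %si is the register holding pathname, which is the address of a struct filename.
# +0(%si) reads the value stored at offset 0 from that address, which is the name pointer.
# The outer +0() then dereferences that pointer, and :string prints what it points to as a string.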
echo 'p:t1 do_filp_open dfd=%di name=+0(+0(%si)):string' >> $trace_dir/kprobe_events

# $retval is the return value
echo 'r:t2 do_filp_open ret=$retval' >> $trace_dir/kprobe_events 

echo 1 > $trace_dir/events/kprobes/t1/enable
echo 1 > $trace_dir/events/kprobes/t2/enable
echo 1 > $trace_dir/tracing_on
cat testfile
echo 0 > $trace_dir/events/kprobes/t1/enable
echo 0 > $trace_dir/events/kprobes/t2/enable
echo 0 > $trace_dir/tracing_on
echo > $trace_dir/kprobe_events

Output:

<...>-7317  [006] .... 88731.883298: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7317  [006] d... 88731.883304: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e48e0400

As expected, dfd is 0xffffff9c (AT_FDCWD) and name is testfile. The return value of do_filp_open is 0xffff9c04e48e0400, a pointer to the opened struct file.
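To check that 0xffffff9c really is AT_FDCWD, the value can be reinterpreted as a signed 32-bit integer. The following is a minimal sketch; the helper name to_s32 is made up for illustration and is not part of any tracing tool.

```shell
#!/bin/bash
# Reinterpret a value printed by the trace as a signed 32-bit int,
# which is how the kernel treats the `int dfd` argument.
# to_s32 is a hypothetical helper name.
to_s32() {
    local v=$(( $1 ))                 # accepts a 0x-prefixed hex value
    if [ "$v" -ge $(( 0x80000000 )) ]; then
        v=$(( v - 0x100000000 ))      # wrap values with the sign bit set
    fi
    echo "$v"
}

to_s32 0xffffff9c    # prints -100, the value of AT_FDCWD
```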

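The file name is also stored in struct file->f_path.dentry->d_name.name. Let's verify that the file name reachable from the return value is the one we opened. The offsets within each struct are: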

(gdb) p &((struct file *)0)->f_path.dentry
$1 = (struct dentry **) 0x18 
(gdb) p &((struct dentry *)0)->d_name.name
$2 = (const unsigned char **) 0x28 

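f_path in struct file and d_name in struct dentry are embedded structs rather than pointers, so each offset is taken through the whole member path in a single gdb expression. The offsets are 0x18 (24) and 0x28 (40).
Change the statement that watches the return value in the script above to: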

echo 'r:t2 do_filp_open ret=$retval ret_name=+0(+40(+24($retval))):string' >> $trace_dir/kprobe_events

Output:

<...>-7469  [006] .... 90981.880665: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7469  [006] d... 90981.880673: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb500 ret_name="testfile"
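The decimal offsets 24 and 40 in the ret_name argument are the hexadecimal offsets reported by gdb. As a rough sketch, the argument can be generated from the gdb values so that the conversion is not done by hand; the function name fetcharg is an assumption made for this example, and the offsets apply only to the kernel build examined here.

```shell
#!/bin/bash
# Offsets as printed by gdb for this particular kernel build.
dentry_off=0x18   # &((struct file *)0)->f_path.dentry
name_off=0x28     # &((struct dentry *)0)->d_name.name

# Build the nested dereference: the innermost offset is applied first.
# fetcharg is a hypothetical helper; printf %d converts the hex offsets.
fetcharg() {
    printf '+0(+%d(+%d($retval))):string\n' "$2" "$1"
}

fetcharg "$dentry_off" "$name_off"   # prints +0(+40(+24($retval))):string
```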

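Inspecting a return value in the middle of the function

Suppose we want to see the return value of the statement filp = path_openat(&nd, op, flags | LOOKUP_RCU);.
First disassemble vmlinux and find the call to path_openat inside do_filp_open: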

ffffffff812e57f0 <do_filp_open>:
... # code omitted
ffffffff812e5869:       48 89 45 a8             mov    %rax,-0x58(%rbp)
ffffffff812e586d:       83 ca 40                or     $0x40,%edx
ffffffff812e5870:       4c 89 ee                mov    %r13,%rsi
ffffffff812e5873:       4c 89 e7                mov    %r12,%rdi
ffffffff812e5876:       65 48 8b 04 25 80 5c    mov    %gs:0x15c80,%rax
ffffffff812e587d:       01 00
ffffffff812e587f:       4c 89 a0 50 0b 00 00    mov    %r12,0xb50(%rax)
ffffffff812e5886:       e8 d5 d6 ff ff          callq  ffffffff812e2f60 <path_openat>
ffffffff812e588b:       48 83 f8 f6             cmp    $0xfffffffffffffff6,%rax
ffffffff812e588f:       48 89 c3                mov    %rax,%rbx
ffffffff812e5892:       74 33                   je     ffffffff812e58c7 <do_filp_open+0xd7>

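The disassembly shows that the call is on the line ffffffff812e5886: e8 d5 d6 ff ff callq ffffffff812e2f60 <path_openat>. To capture its return value we probe the next instruction, ffffffff812e588b: 48 83 f8 f6 cmp $0xfffffffffffffff6,%rax, at which point %rax holds the value returned by path_openat. fffffffffffffff6 (-10) is -ECHILD.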

First compute the offset of the instruction to probe from the start of do_filp_open:
offset = 0xffffffff812e588b - 0xffffffff812e57f0 = 0x9b = 155
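The subtraction can also be done in the shell. This is a small sketch under the assumption that both addresses are 16-digit kernel text addresses sharing the leading ffffffff, which is stripped so the arithmetic stays well within range; insn_offset is a name invented for this example.

```shell
#!/bin/bash
# Print the decimal offset of an instruction from the start of its function.
# Assumption: both addresses begin with ffffffff, as in the disassembly above.
insn_offset() {
    local func=${1#ffffffff} insn=${2#ffffffff}
    echo $(( 0x$insn - 0x$func ))
}

insn_offset ffffffff812e57f0 ffffffff812e588b   # prints 155
```

The result can then be substituted into the probe definition, for example "p:t3 do_filp_open+$(insn_offset ffffffff812e57f0 ffffffff812e588b) fp=%ax".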

Change the script to the following to observe it:

#!/bin/bash
trace_dir=/sys/kernel/debug/tracing/
echo 'p:t1 do_filp_open dfd=%di name=+0(+0(%si)):string' >> $trace_dir/kprobe_events
echo 'r:t2 do_filp_open ret=$retval ret_name=+0(+40(+24($retval))):string' >> $trace_dir/kprobe_events

# print the return value of path_openat
echo 'p:t3 do_filp_open+155 fp=%ax' >> $trace_dir/kprobe_events

echo 1 > $trace_dir/events/kprobes/t1/enable
echo 1 > $trace_dir/events/kprobes/t2/enable
echo 1 > $trace_dir/events/kprobes/t3/enable
echo 1 > $trace_dir/tracing_on
cat testfile
cat testfile2 # this file does not exist
echo 0 > $trace_dir/events/kprobes/t1/enable
echo 0 > $trace_dir/events/kprobes/t2/enable
echo 0 > $trace_dir/events/kprobes/t3/enable
echo 0 > $trace_dir/tracing_on
echo > $trace_dir/kprobe_events

testfile2 does not exist. The trace output is:

<...>-7492  [006] .... 91462.485985: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7492  [006] d.Z. 91462.485992: t3: (do_filp_open+0x9b/0x110) fp=0xffff9c04e2ae7900
<...>-7492  [006] d... 91462.485992: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e2ae7900 ret_name="testfile"

......

<...>-7493  [006] .... 91462.486939: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile2"
<...>-7493  [006] d.Z. 91462.486946: t3: (do_filp_open+0x9b/0x110) fp=0xfffffffffffffffe
<...>-7493  [006] d... 91462.486949: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)

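The log shows that testfile exists, so opening it returns a valid file pointer. testfile2 does not exist, so the call returns 0xfffffffffffffffe, which is -2, i.e. -ENOENT. An error value is not a valid pointer, so the file name cannot be read through it and ret_name shows (fault).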
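Telling an error value apart from a real pointer by eye gets tedious. Below is a minimal sketch of a decoder, assuming a 64-bit kernel where error pointers occupy the top 4095 values of the address space; decode_ret is a hypothetical helper name. It works on the hex string itself, so it does not rely on the shell's handling of 64-bit overflow.

```shell
#!/bin/bash
# Print "-N" if a 64-bit return value from the trace is an error pointer
# (0xfffffffffffff001 .. 0xffffffffffffffff), or "ok" otherwise.
decode_ret() {
    local hex=${1#0x}
    case "$hex" in
        fffffffffffff000) echo ok ;;   # -4096 is outside the errno range
        fffffffffffff???)              # 13 leading f digits: an error value
            echo "-$(( 0x1000 - 0x${hex#fffffffffffff} ))" ;;
        *) echo ok ;;
    esac
}

decode_ret 0xfffffffffffffffe   # prints -2  (-ENOENT)
decode_ret 0xfffffffffffffff6   # prints -10 (-ECHILD)
decode_ret 0xffff9c04e2ae7900   # prints ok
```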

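Setting a filter

trace produces a lot of output, so you can set a filter to print only what you care about. Each event has a filter file, in which you can use operators such as && || ! == > <; an event is printed only when the filter condition is true.
Let's add a filter to t1 in our script so that it shows only the file names testfile and testfile2: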

#!/bin/bash
trace_dir=/sys/kernel/debug/tracing/

echo 'p:t1 do_filp_open dfd=%di name=+0(+0(%si)):string' >> $trace_dir/kprobe_events
echo 'r:t2 do_filp_open ret=$retval ret_name=+0(+40(+24($retval))):string' >> $trace_dir/kprobe_events

# print only when the opened file name is testfile or testfile2
echo 'name=="testfile" || name=="testfile2"' >> $trace_dir/events/kprobes/t1/filter

echo 1 > $trace_dir/events/kprobes/t1/enable
echo 1 > $trace_dir/events/kprobes/t2/enable
echo 1 > $trace_dir/tracing_on
cat testfile
cat testfile2
echo 0 > $trace_dir/events/kprobes/t1/enable
echo 0 > $trace_dir/events/kprobes/t2/enable
echo 0 > $trace_dir/tracing_on
echo > $trace_dir/kprobe_events

Output:

<...>-7628  [006] d... 92891.051256: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bcae00 ret_name="cat"
<...>-7628  [006] d... 92891.051291: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bca300 ret_name="ld-2.28.so"
<...>-7628  [006] d... 92891.051410: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcbf00 ret_name="ld.so.cache"
<...>-7628  [006] d... 92891.051428: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb800 ret_name="libc-2.28.so"
<...>-7628  [006] d... 92891.051606: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb100 ret_name="locale-archive"
<...>-7628  [006] .... 92891.051659: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7628  [006] d... 92891.051666: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bca800 ret_name="testfile"
trace_open.sh-7627  [005] d... 92891.051827: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e488b000 ret_name="xterm-256color"
<...>-7629  [006] d... 92891.052110: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bca600 ret_name="cat"
<...>-7629  [006] d... 92891.052133: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bca400 ret_name="ld-2.28.so"
<...>-7629  [006] d... 92891.052242: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcbe00 ret_name="ld.so.cache"
<...>-7629  [006] d... 92891.052258: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb600 ret_name="libc-2.28.so"
<...>-7629  [006] d... 92891.052425: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb900 ret_name="locale-archive"
<...>-7629  [006] .... 92891.052467: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile2"
<...>-7629  [006] d... 92891.052475: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
<...>-7629  [006] d... 92891.052508: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bca700 ret_name="locale.alias"
<...>-7629  [006] d... 92891.052544: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
<...>-7629  [006] d... 92891.052553: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
<...>-7629  [006] d... 92891.052559: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb200 ret_name="libc.mo"
<...>-7629  [006] d... 92891.052592: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bca000 ret_name="gconv-modules.cache"
trace_open.sh-7627  [005] d... 92891.052716: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e488af00 ret_name="enable"
trace_open.sh-7627  [005] d... 92891.070695: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e488b000 ret_name="enable"

As you can see, t1 printed only testfile and testfile2, while t2, which has no filter, printed everything.
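To cut down the t2 output as well, the same kind of expression could be written for its ret_name field. The sketch below only builds the filter string; build_filter is a name made up for this example, and whether a string filter on ret_name behaves exactly like the one on name is an assumption that should be checked on the target kernel.

```shell
#!/bin/bash
# Build a filter expression that matches any of the given values of a field.
# build_filter is a hypothetical helper, not part of the tracing interface.
build_filter() {
    local field=$1 expr="" v
    shift
    for v in "$@"; do
        # Join the comparisons with " || " once expr is non-empty.
        expr="${expr:+$expr || }$field==\"$v\""
    done
    echo "$expr"
}

build_filter ret_name testfile testfile2
# prints: ret_name=="testfile" || ret_name=="testfile2"
```

The output would be redirected into $trace_dir/events/kprobes/t2/filter in the script. Note that a failed open has ret_name=(fault), so such a filter would presumably hide the failing open of testfile2.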
