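When debugging the kernel, you often need to know whether a function was executed, what it returned, and so on. kprobes can do this, and a kprobe module written in C can even modify return values. This article focuses on using kprobes through the trace (ftrace) interface.
We use the do_filp_open function as an example to show the basic use of kprobes in trace. The code of do_filp_open is as follows: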
struct file *do_filp_open(int dfd, struct filename *pathname,
        const struct open_flags *op)
{
    struct nameidata nd;
    int flags = op->lookup_flags;
    struct file *filp;
    set_nameidata(&nd, dfd, pathname);
    filp = path_openat(&nd, op, flags | LOOKUP_RCU);
    if (unlikely(filp == ERR_PTR(-ECHILD)))
        filp = path_openat(&nd, op, flags);
    if (unlikely(filp == ERR_PTR(-ESTALE)))
        filp = path_openat(&nd, op, flags | LOOKUP_REVAL);
    restore_nameidata();
    return filp;
}
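First, we want to see the input parameter dfd and the file name inside pathname. dfd is the file descriptor of the directory where the lookup starts; when none is specified it is normally AT_FDCWD (-100, which is 0xffffff9c as a 32-bit int). pathname is a structure, and the file name is stored in struct filename->name. Let's find the offset of name within struct filename:
$ gdb vmlinux # debug vmlinux directly with gdb
(gdb) p &(((struct filename *)0)->name)
$1 = (const char **) 0x0 # the offset of name is 0
Arguments are passed in registers. On x86_64, arguments 1 to 6 are in %rdi, %rsi, %rdx, %rcx, %r8, %r9. Test with the following script: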
#!/bin/bash
trace_dir=/sys/kernel/debug/tracing/
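# In the echo below, %si is the register holding pathname, which is the address of a struct filename. +0(%si) reads the memory at that address plus offset 0, i.e. the name pointer; the outer +0() dereferences that pointer, and :string prints what it points to as a string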
echo 'p:t1 do_filp_open dfd=%di name=+0(+0(%si)):string' >> $trace_dir/kprobe_events
# $retval is the return value
echo 'r:t2 do_filp_open ret=$retval' >> $trace_dir/kprobe_events
echo 1 > $trace_dir/events/kprobes/t1/enable
echo 1 > $trace_dir/events/kprobes/t2/enable
echo 1 > $trace_dir/tracing_on
cat testfile
echo 0 > $trace_dir/events/kprobes/t1/enable
echo 0 > $trace_dir/events/kprobes/t2/enable
echo 0 > $trace_dir/tracing_on
echo > $trace_dir/kprobe_events
The output is as follows:
<...>-7317 [006] .... 88731.883298: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7317 [006] d... 88731.883304: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e48e0400
The values of dfd and name are what we expected: dfd is 0xffffff9c (AT_FDCWD) and name is testfile. The return value of do_filp_open is 0xffff9c04e48e0400, which is the pointer to the opened file object.
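As a quick check that 0xffffff9c really is AT_FDCWD (-100), the value can be reinterpreted as a signed 32-bit integer. This is only a small sketch; the helper name to_s32 is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: interpret a 32-bit hex value as a signed integer.
to_s32() {
    v=$(( $1 ))
    # Values with bit 31 set are negative in two's complement.
    if [ "$v" -ge $(( 0x80000000 )) ]; then
        v=$(( v - 0x100000000 ))
    fi
    echo "$v"
}

to_s32 0xffffff9c    # the dfd value printed by probe t1
```

The result matches the definition of AT_FDCWD as -100 in the kernel headers.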
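struct file->f_path.dentry->d_name.name also holds the file name. Let's verify that the file name reachable from the return value is the file we opened. The offsets within each structure are: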
(gdb) p &((struct file *)0)->f_path.dentry
$1 = (struct dentry **) 0x18
(gdb) p &((struct dentry *)0)->d_name.name
$2 = (const unsigned char **) 0x28
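Because f_path in struct file and d_name in struct dentry are embedded structures rather than pointers, each gdb expression above directly yields the combined offset of the nested member.
Change the return-value probe in the script above to: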
echo 'r:t2 do_filp_open ret=$retval ret_name=+0(+40(+24($retval))):string' >> $trace_dir/kprobe_events
The output is as follows:
<...>-7469 [006] .... 90981.880665: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7469 [006] d... 90981.880673: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb500 ret_name="testfile"
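The decimal offsets 24 and 40 in the fetch argument are simply the hex offsets 0x18 and 0x28 reported by gdb. A minimal sketch of deriving the argument string from those offsets is shown below; the variable names are illustrative, and the offsets apply only to the kernel build examined here.

```shell
# Offsets taken from the gdb session (specific to this kernel build).
f_dentry_off=$(( 0x18 ))   # offset of f_path.dentry in struct file
d_name_off=$(( 0x28 ))     # offset of d_name.name in struct dentry

# Innermost: read the dentry pointer from the file pointer in $retval,
# then read the name pointer from the dentry, then dereference it as a string.
fetcharg="+0(+${d_name_off}(+${f_dentry_off}(\$retval))):string"
echo "ret_name=$fetcharg"
```

On a different kernel version or configuration the offsets may differ, so they should be looked up again in gdb before reuse.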
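As another example, suppose we want to see the return value of this statement:
filp = path_openat(&nd, op, flags | LOOKUP_RCU);
First disassemble vmlinux and find the call to path_openat inside do_filp_open: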
ffffffff812e57f0 <do_filp_open>:
... # code omitted
ffffffff812e5869: 48 89 45 a8 mov %rax,-0x58(%rbp)
ffffffff812e586d: 83 ca 40 or $0x40,%edx
ffffffff812e5870: 4c 89 ee mov %r13,%rsi
ffffffff812e5873: 4c 89 e7 mov %r12,%rdi
ffffffff812e5876: 65 48 8b 04 25 80 5c mov %gs:0x15c80,%rax
ffffffff812e587d: 01 00
ffffffff812e587f: 4c 89 a0 50 0b 00 00 mov %r12,0xb50(%rax)
ffffffff812e5886: e8 d5 d6 ff ff callq ffffffff812e2f60
ffffffff812e588b: 48 83 f8 f6 cmp $0xfffffffffffffff6,%rax
ffffffff812e588f: 48 89 c3 mov %rax,%rbx
ffffffff812e5892: 74 33 je ffffffff812e58c7
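The disassembly shows that the call is made at
ffffffff812e5886: e8 d5 d6 ff ff callq ffffffff812e2f60
so its return value should be traced at the next instruction:
ffffffff812e588b: 48 83 f8 f6 cmp $0xfffffffffffffff6,%rax
Here 0xfffffffffffffff6 (-10) is -ECHILD.
First compute the offset of the instruction to be traced from the start of do_filp_open:
offset = 0xffffffff812e588b - 0xffffffff812e57f0 = 0x9b = 155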
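The subtraction can also be done in the shell. The sketch below uses only the low 32 bits of each address, which is an assumption that holds here because both addresses share the same upper half (ffffffff); it also avoids overflowing signed 64-bit shell arithmetic. The variable names are illustrative.

```shell
# Low 32 bits of the two addresses from the disassembly.
func_lo=812e57f0     # start of do_filp_open
probe_lo=812e588b    # instruction right after the call to path_openat

off=$(( 0x$probe_lo - 0x$func_lo ))
printf 'do_filp_open+%d (0x%x)\n' "$off" "$off"
```

The decimal form is what the probe definition uses, and the hex form is what trace prints in the event location.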
We change the script as follows to observe it:
#!/bin/bash
trace_dir=/sys/kernel/debug/tracing/
echo 'p:t1 do_filp_open dfd=%di name=+0(+0(%si)):string' >> $trace_dir/kprobe_events
echo 'r:t2 do_filp_open ret=$retval ret_name=+0(+40(+24($retval))):string' >> $trace_dir/kprobe_events
# print the return value of path_openat
echo 'p:t3 do_filp_open+155 fp=%ax' >> $trace_dir/kprobe_events
echo 1 > $trace_dir/events/kprobes/t1/enable
echo 1 > $trace_dir/events/kprobes/t2/enable
echo 1 > $trace_dir/events/kprobes/t3/enable
echo 1 > $trace_dir/tracing_on
cat testfile
cat testfile2 # this file does not exist
echo 0 > $trace_dir/events/kprobes/t1/enable
echo 0 > $trace_dir/events/kprobes/t2/enable
echo 0 > $trace_dir/events/kprobes/t3/enable
echo 0 > $trace_dir/tracing_on
echo > $trace_dir/kprobe_events
Since testfile2 does not exist, the trace output is as follows:
<...>-7492 [006] .... 91462.485985: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7492 [006] d.Z. 91462.485992: t3: (do_filp_open+0x9b/0x110) fp=0xffff9c04e2ae7900
<...>-7492 [006] d... 91462.485992: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e2ae7900 ret_name="testfile"
......
<...>-7493 [006] .... 91462.486939: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile2"
<...>-7493 [006] d.Z. 91462.486946: t3: (do_filp_open+0x9b/0x110) fp=0xfffffffffffffffe
<...>-7493 [006] d... 91462.486949: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
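The log shows that testfile exists, so opening it returns a valid file pointer. testfile2 does not exist, so the return value is 0xfffffffffffffffe, which is -2, i.e. -ENOENT. No file name can be read through an error value, so ret_name shows (fault).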
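To tell error returns from real pointers when reading such logs, the ret value can be checked against the kernel's error-pointer range (the top 4095 values of the address space). The following is a rough sketch, not part of the trace tooling: decode_ret is a hypothetical helper, and it assumes the value is written as 0x followed by exactly 16 hex digits.

```shell
# Hypothetical helper: classify a 64-bit return value printed by a kretprobe.
decode_ret() {
    hex=${1#0x}
    high=${hex%????????}    # upper 8 hex digits
    low=${hex#????????}     # lower 8 hex digits
    # Error pointers lie in 0xfffffffffffff001..0xffffffffffffffff.
    if [ "$high" = "ffffffff" ] && [ "$(( 0x$low ))" -ge "$(( 0xfffff001 ))" ]; then
        echo "error: errno $(( 0x100000000 - 0x$low ))"
    else
        echo "pointer"
    fi
}

decode_ret 0xfffffffffffffffe   # return value for testfile2
decode_ret 0xffff9c04e2ae7900   # return value for testfile
```

Splitting the value into two 32-bit halves keeps the arithmetic within the signed 64-bit range that shells use.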
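trace produces a lot of output, so you can set a filter to record only the events you care about. Each event has a filter file that accepts expressions built from operators such as && || ! == > <, and the event is printed only when its filter expression evaluates to true.
We add a filter to t1 in the script so that t1 only shows the file names testfile and testfile2: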
#!/bin/bash
trace_dir=/sys/kernel/debug/tracing/
echo 'p:t1 do_filp_open dfd=%di name=+0(+0(%si)):string' >> $trace_dir/kprobe_events
echo 'r:t2 do_filp_open ret=$retval ret_name=+0(+40(+24($retval))):string' >> $trace_dir/kprobe_events
# print only when the opened file name is testfile or testfile2
echo 'name=="testfile" || name=="testfile2"' >> $trace_dir/events/kprobes/t1/filter
echo 1 > $trace_dir/events/kprobes/t1/enable
echo 1 > $trace_dir/events/kprobes/t2/enable
echo 1 > $trace_dir/tracing_on
cat testfile
cat testfile2
echo 0 > $trace_dir/events/kprobes/t1/enable
echo 0 > $trace_dir/events/kprobes/t2/enable
echo 0 > $trace_dir/tracing_on
echo > $trace_dir/kprobe_events
The output is as follows:
<...>-7628 [006] d... 92891.051256: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bcae00 ret_name="cat"
<...>-7628 [006] d... 92891.051291: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bca300 ret_name="ld-2.28.so"
<...>-7628 [006] d... 92891.051410: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcbf00 ret_name="ld.so.cache"
<...>-7628 [006] d... 92891.051428: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb800 ret_name="libc-2.28.so"
<...>-7628 [006] d... 92891.051606: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb100 ret_name="locale-archive"
<...>-7628 [006] .... 92891.051659: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile"
<...>-7628 [006] d... 92891.051666: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bca800 ret_name="testfile"
trace_open.sh-7627 [005] d... 92891.051827: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e488b000 ret_name="xterm-256color"
<...>-7629 [006] d... 92891.052110: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bca600 ret_name="cat"
<...>-7629 [006] d... 92891.052133: t2: (do_open_execat+0x83/0x190 <- do_filp_open) ret=0xffff9c04e8bca400 ret_name="ld-2.28.so"
<...>-7629 [006] d... 92891.052242: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcbe00 ret_name="ld.so.cache"
<...>-7629 [006] d... 92891.052258: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb600 ret_name="libc-2.28.so"
<...>-7629 [006] d... 92891.052425: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb900 ret_name="locale-archive"
<...>-7629 [006] .... 92891.052467: t1: (do_filp_open+0x0/0x110) dfd=0xffffff9c name="testfile2"
<...>-7629 [006] d... 92891.052475: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
<...>-7629 [006] d... 92891.052508: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bca700 ret_name="locale.alias"
<...>-7629 [006] d... 92891.052544: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
<...>-7629 [006] d... 92891.052553: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xfffffffffffffffe ret_name=(fault)
<...>-7629 [006] d... 92891.052559: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bcb200 ret_name="libc.mo"
<...>-7629 [006] d... 92891.052592: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e8bca000 ret_name="gconv-modules.cache"
trace_open.sh-7627 [005] d... 92891.052716: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e488af00 ret_name="enable"
trace_open.sh-7627 [005] d... 92891.070695: t2: (do_sys_openat2+0x201/0x290 <- do_filp_open) ret=0xffff9c04e488b000 ret_name="enable"
As shown, t1 only printed testfile and testfile2, while t2, which has no filter, printed every open.
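When more file names need to be matched, the filter expression can be generated instead of typed by hand. The sketch below builds the same kind of expression used for t1; build_name_filter is a hypothetical helper, and it assumes the probe field is called name, as in the probe definition above.

```shell
# Hypothetical helper: join file names into a filter expression on the name field.
build_name_filter() {
    expr=""
    for f in "$@"; do
        # Separate alternatives with the logical OR operator.
        if [ -n "$expr" ]; then
            expr="$expr || "
        fi
        expr="${expr}name==\"$f\""
    done
    echo "$expr"
}

build_name_filter testfile testfile2
```

The resulting string can then be written to $trace_dir/events/kprobes/t1/filter in the same way as in the script above.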