Opening up a new world for learning the Linux kernel with QEMU

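I recently interviewed a candidate who, while at university, had extended QEMU to implement device modeling, fault injection, and information extraction. I came away from the interview inspired, so I read up on QEMU myself and found that it can be used to debug the Linux kernel, which opened up a whole new way to learn the kernel.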

1. Build a debuggable Linux kernel

https://www.kernel.org/

I downloaded the latest version at the time, linux-5.13.12.tar.xz

Configuration

Configure through the TUI:

~/linux-5.13.12$ make menuconfig

Enable the debug-related options:

Kernel hacking  --->
  Compile-time checks and compiler options  --->
    [*] Compile the kernel with debug info
    [*]   Provide GDB scripts for kernel debugging

Disable kernel address space layout randomization (KASLR):

Processor type and features  --->
  [ ]   Randomize the address of the kernel image (KASLR)

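Disable the signature check by clearing the key path shown below (set it to an empty string); the build fails if it points to a .pem file that is not in the source tree: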

-*- Cryptographic API  --->
  Certificates for signature checking  --->
    (debian/canonical-certs.pem) Additional X.509 keys for default system keyring
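
After leaving menuconfig it can be worth confirming that the options above actually landed in .config. The following is a minimal sketch, not part of the kernel tree: check_debug_config is a hypothetical helper name, and it assumes the usual .config spellings CONFIG_DEBUG_INFO, CONFIG_GDB_SCRIPTS, CONFIG_RANDOMIZE_BASE and CONFIG_SYSTEM_TRUSTED_KEYS for the menu entries shown above.

```shell
#!/bin/sh
# check_debug_config: hypothetical helper (not shipped with the kernel).
# Usage: check_debug_config [path/to/.config]
# Prints one line per check and returns non-zero if any check fails.
check_debug_config() {
    cfg="${1:-.config}"
    rc=0
    # Options that must be enabled for source-level debugging.
    for opt in CONFIG_DEBUG_INFO CONFIG_GDB_SCRIPTS; do
        if grep -q "^${opt}=y" "$cfg"; then
            echo "ok: ${opt}=y"
        else
            echo "MISSING: ${opt}=y"
            rc=1
        fi
    done
    # KASLR must be off so that addresses match the symbols in vmlinux.
    if grep -q '^CONFIG_RANDOMIZE_BASE=y' "$cfg"; then
        echo "WRONG: CONFIG_RANDOMIZE_BASE is still enabled"
        rc=1
    else
        echo "ok: CONFIG_RANDOMIZE_BASE is off"
    fi
    # The trusted-keys path should be empty.
    if grep -q '^CONFIG_SYSTEM_TRUSTED_KEYS="..*"' "$cfg"; then
        echo "WRONG: CONFIG_SYSTEM_TRUSTED_KEYS is not empty"
        rc=1
    else
        echo "ok: CONFIG_SYSTEM_TRUSTED_KEYS is empty or unset"
    fi
    return $rc
}
```

For example, after sourcing the function, run check_debug_config ~/linux-5.13.12/.config. The same options can also be changed without the TUI through the kernel's scripts/config tool (--enable, --disable, --set-str).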

Build

Start the build (this is slow):

~/linux-5.13.12$ make -j$(nproc)

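2. Build an initramfs

What is initramfs?

initramfs is a root filesystem packed as a cpio archive (usually compressed). It is commonly packaged together with the Linux kernel image vmlinuz into a boot image such as boot.img.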

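Why is initramfs needed?

The kernel faces a chicken-and-egg problem at boot. After the boot loader loads the kernel image vmlinuz, the kernel needs to mount the root filesystem on disk. If the kernel has no built-in driver for that disk, it must first load one, but the drivers live under /lib/modules, which can only be read once the root filesystem is mounted. initramfs breaks this deadlock: it is a small root filesystem containing the necessary drivers and tools. The boot loader loads the initramfs into memory, the kernel unpacks it as the root directory / and runs the /init script, which mounts the real root filesystem from disk. Finally, a command such as exec chroot . /sbin/init switches the root directory to the mounted disk filesystem and runs /sbin/init to start the remaining processes and services.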
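
To make the hand-over concrete, here is a rough sketch of what such an /init could look like. It is an illustration under stated assumptions, not what any particular distribution ships: write_real_init is a hypothetical helper that only writes the example script to a file, /dev/sda1 and ext4 are placeholder choices, the kernel is assumed to have CONFIG_DEVTMPFS, and the sketch uses BusyBox's switch_root applet rather than the chroot form mentioned above.

```shell
#!/bin/sh
# write_real_init: hypothetical helper that writes an example /init
# script to the path given as $1 and marks it executable.
write_real_init() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Example /init for an initramfs that hands over to a disk root filesystem.
# ASSUMPTIONS: root device /dev/sda1, filesystem ext4, CONFIG_DEVTMPFS=y.
mkdir -p /proc /sys /dev /newroot
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
# Load the storage driver the disk needs here (for example with insmod).
mount -t ext4 /dev/sda1 /newroot
# Switch to the real root and start its init as PID 1.
exec switch_root /newroot /sbin/init
EOF
    chmod +x "$1"
}
```

The initramfs built in this article never takes this path: its /init simply starts a shell and stays in the RAM filesystem, which is all that is needed for debugging.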

Here I use BusyBox to build a minimal initramfs that provides the basic user-space programs.

https://www.busybox.net/

I downloaded the latest version at the time, busybox-1.33.1.tar.bz2

Configuration

In make menuconfig, choose a statically linked build:

Settings  --->
  [*] Build static binary (no shared libs)

Build

Build and install into the _install directory:

make -j$(nproc) && make install

Create the initramfs

cd ./_install
mkdir dev 
sudo mknod dev/console c 5 1
sudo mknod dev/ram b 1 0 
touch init
chmod +x init

The contents of the init script:

#!/bin/sh
echo "INIT SCRIPT"
mkdir /proc
mkdir /sys
mount -t proc none /proc
mount -t sysfs none /sys
mkdir /tmp
mount -t tmpfs none /tmp
mknod -m 666 /dev/ttyS0 c 4 64
echo -e "\nThis boot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
setsid cttyhack /bin/sh
exec /bin/sh

Pack it in cpio format:

find . -print0 | cpio --null -ov --format=newc | gzip -9 >  ../initramfs.cpio.gz
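
A quick sanity check before booting is to list the archive and confirm that it contains the init script at its top level, since the kernel will look for /init. The sketch below assumes gzip and GNU cpio on the host; initramfs_has_init is a hypothetical helper name.

```shell
#!/bin/sh
# initramfs_has_init: hypothetical helper.
# Returns 0 if the gzip-compressed cpio archive $1 has an entry named
# "init" (or "./init", which is how "find ." names it) at its top level.
initramfs_has_init() {
    gzip -dc "$1" | cpio -t 2>/dev/null | grep -qx -e 'init' -e './init'
}
```

For example: initramfs_has_init ../initramfs.cpio.gz && echo "init found".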

3. Boot it with QEMU

qemu-system-x86_64 \
  -s \
  -kernel ~/linux-5.13.12/arch/x86_64/boot/bzImage \
  -initrd ~/busybox-1.33.1/initramfs.cpio.gz \
  -append "nokaslr root=/dev/ram init=/init console=ttyS0" \
  -nographic

It boots successfully:

......
[    2.516738] Run /init as init process
INIT SCRIPT
[    2.543858] tsc: Refined TSC clocksource calibration: 2499.989 MHz
[    2.545084] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x24092f7b636, max_idle_ns: 440795303187 ns
[    2.547960] clocksource: Switched to clocksource tsc

This boot took 2.55 seconds

/ # uname -a
Linux (none) 5.13.12 #3 SMP Thu Aug 26 15:20:22 CST 2021 x86_64 GNU/Linux
/ #
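
A note on the flags: -s is QEMU's shorthand for -gdb tcp::1234, which opens a gdb stub on port 1234. QEMU also accepts -S, which freezes the guest CPU at startup until the debugger resumes it; that is handy when a breakpoint has to be set before the kernel starts running. The sketch below only prints the command line instead of running it; print_qemu_cmd and its "wait" argument are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# print_qemu_cmd: hypothetical helper that prints (does not run) the
# QEMU launch command.
#   $1 = path to bzImage
#   $2 = path to initramfs.cpio.gz
#   $3 = "wait" to add -S (guest stays frozen until gdb continues it)
print_qemu_cmd() {
    extra=""
    if [ "${3:-}" = "wait" ]; then
        extra=" -S"
    fi
    echo "qemu-system-x86_64 -s${extra} -kernel $1 -initrd $2 -append \"nokaslr root=/dev/ram init=/init console=ttyS0\" -nographic"
}
```

For example, print_qemu_cmd ~/linux-5.13.12/arch/x86_64/boot/bzImage ~/busybox-1.33.1/initramfs.cpio.gz wait prints a command that boots the guest paused, waiting for gdb.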

4. How do we debug it?

First, add the following to ~/.gdbinit so that gdb is allowed to auto-load the kernel's helper scripts:

    add-auto-load-safe-path ~/linux-5.13.12/scripts/gdb/vmlinux-gdb.py

From the ~/linux-5.13.12/ directory, connect with gdb:

~/linux-5.13.12$ gdb vmlinux
(gdb) target remote :1234
Remote debugging using :1234
0xffffffff81bf73ee in native_safe_halt () at ./arch/x86/include/asm/irqflags.h:51
51      asm volatile("sti; hlt": : :"memory");
(gdb)

Set a breakpoint:

(gdb) b cmdline_proc_show
Note: breakpoint 1 also set at pc 0xffffffff813d2140.
Breakpoint 2 at 0xffffffff813d2140: file fs/proc/cmdline.c, line 8.
(gdb) c
Continuing.

Running cat /proc/cmdline inside the QEMU guest triggers it:

(gdb) c
Continuing.

Breakpoint 1, cmdline_proc_show (m=0xffff888005253690, v=0x1 ) at fs/proc/cmdline.c:8
8   {
(gdb) bt
#0  cmdline_proc_show (m=0xffff888005253690, v=0x1 ) at fs/proc/cmdline.c:8
#1  0xffffffff81349a50 in seq_read_iter (iocb=0xffffc90000603d28, iter=0xffffc90000603d00) at fs/seq_file.c:230
#2  0xffffffff813c755e in proc_reg_read_iter (iocb=<optimized out>, iter=<optimized out>) at fs/proc/inode.c:300
#3  0xffffffff81358897 in call_read_iter (file=0xffff888004a55600, iter=0xffffc90000603d00, kio=0xffffc90000603d28)
    at ./include/linux/fs.h:2108
#4  generic_file_splice_read (in=0xffff888004a55600, ppos=0xffffc90000603de8, pipe=<optimized out>, len=<optimized out>,
    flags=<optimized out>) at fs/splice.c:311
#5  0xffffffff81358d61 in do_splice_to (in=in@entry=0xffff888004a55600, ppos=ppos@entry=0xffffc90000603de8,
    pipe=pipe@entry=0xffff888005207f00, len=65536, len@entry=16777216, flags=flags@entry=0) at fs/splice.c:796
#6  0xffffffff81358e4c in splice_direct_to_actor (in=in@entry=0xffff888004a55600, sd=sd@entry=0xffffc90000603e30,
    actor=actor@entry=0xffffffff813591d0 ) at fs/splice.c:870
#7  0xffffffff81359049 in do_splice_direct (in=in@entry=0xffff888004a55600, ppos=ppos@entry=0xffffc90000603eb0,
    out=out@entry=0xffff888004a55200, opos=opos@entry=0xffffc90000603eb8, len=len@entry=16777216, flags=flags@entry=0)
    at fs/splice.c:979
#8  0xffffffff8131b183 in do_sendfile (out_fd=out_fd@entry=1, in_fd=in_fd@entry=3, ppos=ppos@entry=0x0 ,
    count=count@entry=16777216, max=2147483647, max@entry=0) at fs/read_write.c:1260
#9  0xffffffff8131b796 in __do_sys_sendfile64 (count=16777216, offset=0x0 , in_fd=3, out_fd=1)
    at fs/read_write.c:1325
#10 __se_sys_sendfile64 (count=16777216, offset=0, in_fd=3, out_fd=1) at fs/read_write.c:1311
#11 __x64_sys_sendfile64 (regs=<optimized out>) at fs/read_write.c:1311
#12 0xffffffff81be4c80 in do_syscall_64 (nr=<optimized out>, regs=0xffffc90000603f58) at arch/x86/entry/common.c:47
#13 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:112
#14 0x0000000000000000 in ?? ()
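
The interactive session above can also be captured in a gdb command file, so the same breakpoint is set on every run. This is a small sketch: write_gdb_cmds is a hypothetical helper that just writes the file, and the commands inside it are the ones used in the session above.

```shell
#!/bin/sh
# write_gdb_cmds: hypothetical helper that writes a gdb command file to $1.
write_gdb_cmds() {
    cat > "$1" <<'EOF'
# Connect to the gdb stub that QEMU opens with -s.
target remote :1234
# Break where /proc/cmdline is rendered, then resume the guest.
break cmdline_proc_show
continue
# Runs once the breakpoint is hit: print the call stack.
bt
EOF
}
```

Usage, from the kernel source directory: write_gdb_cmds /tmp/cmdline.gdb, then gdb -x /tmp/cmdline.gdb vmlinux, and finally cat /proc/cmdline in the guest to hit the breakpoint.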

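5. What else is needed to debug epoll?

Networking has to be set up; here I used a host-only configuration.

First, set up a bridge and a TAP interface on the host

Create the bridge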

sudo ip link add br0 type bridge

Assign an IP address to the bridge

sudo ip addr add 192.168.100.50/24 brd 192.168.100.255 dev br0

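Create the TAP interface (named tap0 explicitly, since the following steps refer to it by that name)

sudo ip tuntap add dev tap0 mode tap user $(whoami)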
ip tuntap show

Add the TAP interface to the bridge

sudo ip link set tap0 master br0

Bring both up

sudo ip link set dev br0 up
sudo ip link set dev tap0 up
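
Since these host changes do not survive a reboot and are easy to leave half-applied, it can help to keep setup and teardown together in one script. The sketch below wraps the commands above and adds a matching teardown. The names run, net_up, net_down and the DRY_RUN variable are hypothetical; with DRY_RUN set, the commands are only printed. It names the TAP device tap0 explicitly, which is an assumption consistent with the later steps.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# net_up: create bridge br0 and TAP tap0, attach tap0 to br0, bring both up.
net_up() {
    run sudo ip link add br0 type bridge
    run sudo ip addr add 192.168.100.50/24 brd 192.168.100.255 dev br0
    run sudo ip tuntap add dev tap0 mode tap user "$(whoami)"
    run sudo ip link set tap0 master br0
    run sudo ip link set dev br0 up
    run sudo ip link set dev tap0 up
}

# net_down: undo net_up in reverse order.
net_down() {
    run sudo ip link set dev tap0 down
    run sudo ip tuntap del dev tap0 mode tap
    run sudo ip link set dev br0 down
    run sudo ip link del br0
}
```

Running DRY_RUN=1 net_up first shows exactly what would be executed before anything on the host is changed.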

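Then configure networking in the BusyBox guest

Add the e1000 NIC driver (this requires e1000 to be built as a module, i.e. CONFIG_E1000=m)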

cd ./_install
mkdir -p lib/modules
cp ~/linux-5.13.12/drivers/net/ethernet/intel/e1000/e1000.ko lib/modules/

In the init script, load the driver and configure the network (add these lines before the line that starts the shell)


insmod /lib/modules/e1000.ko
ifconfig lo 127.0.0.1 netmask 255.0.0.0 up
ifconfig eth0 192.168.100.224 netmask 255.255.255.0 broadcast 192.168.100.255 up

Repack the initramfs

Finally, add the network device options when starting QEMU

-netdev tap,ifname=tap0,id=tap0,script=no,downscript=no -device e1000,netdev=tap0

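6. Debugging the epoll callback

I was never quite clear on how epoll's callback gets invoked. With this setup, I can simply debug it.

Set a breakpoint on ep_poll_callback. The backtrace below was captured when data arrived on a TCP socket: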

#0  ep_poll_callback (wait=0xffff88800368a410, mode=1, sync=16, key=0xc3) at fs/eventpoll.c:1128
#1  0xffffffff810fe25e in __wake_up_common (wq_head=wq_head@entry=0xffff8880046dbac0, mode=mode@entry=1,
    nr_exclusive=nr_exclusive@entry=1, wake_flags=wake_flags@entry=16, key=key@entry=0xc3, bookmark=bookmark@entry=0xffffc90000003970)
    at kernel/sched/wait.c:108
#2  0xffffffff810fe39c in __wake_up_common_lock (wq_head=0xffff8880046dbac0, mode=mode@entry=1, nr_exclusive=nr_exclusive@entry=1,
    wake_flags=wake_flags@entry=16, key=key@entry=0xc3) at kernel/sched/wait.c:138
#3  0xffffffff810fe690 in __wake_up_sync_key (wq_head=<optimized out>, mode=mode@entry=1, key=key@entry=0xc3)
    at kernel/sched/wait.c:205
#4  0xffffffff81993a9b in sock_def_readable (sk=0xffff88800576e900) at net/core/sock.c:2925
#5  0xffffffff81a72c1f in tcp_data_ready (sk=sk@entry=0xffff88800576e900) at net/ipv4/tcp_input.c:4938
#6  0xffffffff81a73f64 in tcp_data_queue (sk=sk@entry=0xffff88800576e900, skb=skb@entry=0xffff888005778800)
    at net/ipv4/tcp_input.c:5008
#7  0xffffffff81a74760 in tcp_rcv_established (sk=sk@entry=0xffff88800576e900, skb=skb@entry=0xffff888005778800)
    at net/ipv4/tcp_input.c:5896
#8  0xffffffff81a82718 in tcp_v4_do_rcv (sk=sk@entry=0xffff88800576e900, skb=skb@entry=0xffff888005778800) at net/ipv4/tcp_ipv4.c:1694
#9  0xffffffff81a84c1d in tcp_v4_rcv (skb=0xffff888005778800) at net/ipv4/tcp_ipv4.c:2077
#10 0xffffffff81a524a0 in ip_protocol_deliver_rcu (net=0xffffffff828e44c0 , skb=0xffff888005778800,
    protocol=<optimized out>) at net/ipv4/ip_input.c:204
#11 0xffffffff81a52668 in ip_local_deliver_finish (net=<optimized out>, sk=<optimized out>, skb=<optimized out>)
    at ./include/linux/skbuff.h:2539
#12 0xffffffff81a526f2 in NF_HOOK (sk=0x0 , pf=2 '\002', hook=1, in=<optimized out>, out=0x0 ,
    okfn=0xffffffff81a52620 , skb=0xffff888005778800, net=0xffffffff828e44c0 )
    at ./include/linux/netfilter.h:301
#13 NF_HOOK (pf=2 '\002', sk=0x0 , out=0x0 , okfn=0xffffffff81a52620 ,
    in=<optimized out>, skb=0xffff888005778800, net=0xffffffff828e44c0 , hook=1) at ./include/linux/netfilter.h:295
#14 ip_local_deliver (skb=0xffff888005778800) at net/ipv4/ip_input.c:252
#15 0xffffffff81a528b9 in dst_input (skb=<optimized out>) at ./include/linux/skbuff.h:975
#16 ip_sublist_rcv_finish (head=head@entry=0xffffc90000003c48) at net/ipv4/ip_input.c:551
#17 0xffffffff81a52a3f in ip_list_rcv_finish (sk=0x0 , head=0xffffc90000003cd8, net=0xffffffff828e44c0 )
    at net/ipv4/ip_input.c:601
#18 ip_sublist_rcv (head=head@entry=0xffffc90000003cd8, dev=dev@entry=0xffff8880035ee000, net=net@entry=0xffffffff828e44c0 )
    at net/ipv4/ip_input.c:609
#19 0xffffffff81a52cb4 in ip_list_rcv (head=0xffffc90000003d50, pt=<optimized out>, orig_dev=<optimized out>)
    at net/ipv4/ip_input.c:644
#20 0xffffffff819b8e88 in __netif_receive_skb_list_ptype (orig_dev=0xffff8880035ee000, pt_prev=0xffffffff8296aa80 ,
    head=0xffffc90000003d50) at net/core/dev.c:5505
#21 __netif_receive_skb_list_core (head=head@entry=0xffff8880035eecf8, pfmemalloc=pfmemalloc@entry=false) at net/core/dev.c:5550
#22 0xffffffff819b9051 in __netif_receive_skb_list (head=0xffff8880035eecf8) at net/core/dev.c:5602
#23 netif_receive_skb_list_internal (head=head@entry=0xffff8880035eecf8) at net/core/dev.c:5712
#24 0xffffffff819b922e in gro_normal_list (napi=napi@entry=0xffff8880035eebf0) at net/core/dev.c:5866
#25 0xffffffff819ba091 in gro_normal_list (napi=0xffff8880035eebf0) at net/core/dev.c:6588
#26 napi_complete_done (n=0xffff8880035eebf0, work_done=<optimized out>) at net/core/dev.c:6588
#27 0xffffffffa000421a in ?? ()
#28 0xffffffff810c30d2 in __rcu_read_unlock () at ./include/linux/rcupdate.h:710
#29 rcu_read_unlock () at ./include/linux/rcupdate.h:710
#30 __queue_work (cpu=91230208, wq=0x40, work=0xffff8880035eebf0) at kernel/workqueue.c:1502
#31 0xffffffff819ba5d1 in __napi_poll (n=n@entry=0xffff8880035eebf0, repoll=repoll@entry=0xffffc90000003f1f) at net/core/dev.c:7008
#32 0xffffffff819baaaf in napi_poll (repoll=0xffffc90000003f30, n=0xffff8880035eebf0) at net/core/dev.c:7075
#33 net_rx_action (h=<optimized out>) at net/core/dev.c:7162
#34 0xffffffff81e000e0 in __do_softirq () at kernel/softirq.c:559
#35 0xffffffff810a91a4 in invoke_softirq () at kernel/softirq.c:433
#36 __irq_exit_rcu () at kernel/softirq.c:637
#37 irq_exit_rcu () at kernel/softirq.c:649
#38 0xffffffff81be611d in common_interrupt (regs=0xffffffff82603d68, error_code=<optimized out>) at arch/x86/kernel/irq.c:240
#39 0xffffffff81c00cde in asm_common_interrupt () at ./arch/x86/include/asm/idtentry.h:629
#40 0x0000000000000000 in ?? ()

The whole call stack is clear at a glance.
