KE caused by use after free

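Problem background:

In standby, pressing the Power key or letting the device go to sleep automatically triggers a KE every time.

Analysis:

The mtklog contains a db, which confirms that a KE (kernel exception) occurred. Take the db together with the vmlinux (it must come from the same build as the software running on the device), unpack the db with the GAT tool to extract SYS_MINI_RDUMP, then debug it with gdb: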

$ arm-linux-androideabi-gdb vmlinux SYS_MINI_RDUMP
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=arm-linux-android".
For bug reporting instructions, please see:
...
Reading symbols from KE/vmlinux...done.
[New LWP 103]
[New LWP 1]
[New LWP 2]
[New LWP 3]
[New LWP 4]
[New LWP 5]
[New LWP 6]
[New LWP 7]
[New LWP 8]
Core was generated by `console=tty0 console=ttyMT0,921600n1 root=/dev/ram vmalloc=496M slub_max_order='.
#0 0xd9027400 in ?? ()
(gdb) bt
#0 0xd9027400 in ?? ()
#1 0xc00985c0 in early_suspend (work=) at kernel-3.10/kernel/power/earlysuspend.c:144
#2 0xc0042b60 in process_one_work (worker=0xdaf6d000, work=0xc0e056a0 )
at kernel-3.10/kernel/workqueue.c:2216
#3 0xc0043098 in worker_thread (__worker=0xd90273dc) at kernel-3.10/kernel/workqueue.c:2348
#4 0xc004c0bc in kthread (_create=0xdbc69df0) at kernel-3.10/kernel/kthread.c:200
#5 0xc000f308 in ret_from_fork () at kernel-3.10/arch/arm/kernel/entry-common.S:91
#6 0xc000f308 in ret_from_fork () at kernel-3.10/arch/arm/kernel/entry-common.S:91
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) f 1
#1 0xc00985c0 in early_suspend (work=) at kernel-3.10/kernel/power/earlysuspend.c:144
144 pos->suspend(pos);
(gdb) list
139 if (pos->suspend != NULL) {
140 if (!(forbid_id & (0x1 << count))) {
141 /* if (earlysuspend_debug_mask & DEBUG_VERBOSE) */
142 pr_warn("ES handlers %d: [%pf], level: %d\n", count, pos->suspend,
143 pos->level);
144 pos->suspend(pos);
145 }
146 count++;
147 }
148 }

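These commands show that the crash happened at kernel-3.10/kernel/power/earlysuspend.c:144 and that execution jumped into invalid memory (the frame #0 address 0xd9027400 is not a valid code address). To see exactly why it died, dump the assembly:

(gdb) disas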

0xc009859c <+348>: b 0xc00984e8

0xc00985a0 <+352>: ldr r3, [r4, #8]
0xc00985a4 <+356>: mov r1, r6
0xc00985a8 <+360>: movw r0, #15956 ; 0x3e54
0xc00985ac <+364>: movt r0, #49338 ; 0xc0ba
0xc00985b0 <+368>: bl 0xc09849d8
0xc00985b4 <+372>: ldr r3, [r4, #12]
0xc00985b8 <+376>: mov r0, r4
0xc00985bc <+380>: blx r3
=> 0xc00985c0 <+384>: b 0xc0098550
0xc00985c4 <+388>: addsgt r3, r10, r8, lsr sp
0xc00985c8 <+392>: rscgt r5, r0, r8, asr #12
End of assembler dump.
(gdb) info reg
r0 0xd90273dc 3640816604
r1 0x60785a 6322266
r2 0x0 0
r3 0xd90273e8 3640816616
r4 0xd90273dc 3640816604
r5 0xc0f08180 3236987264
r6 0xc 12
r7 0xc0e05648 3235927624
r8 0xdbc54400 3687138304
r9 0xdafcc030 3673997360
r10 0xdb70b6d8 3681597144
r11 0xdafcddec 3674004972
r12 0xc10af654 3238721108
sp 0xdafcddd0 0xdafcddd0
lr 0xc00985c0 3221849536
pc 0xc00985c0 0xc00985c0
cpsr 0x200b0013 537591827

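The instruction just above the => marker is BLX, a function call. (Per AAPCS, the address shown in any frame other than #0 is a return address, that is, an instruction that has not executed yet, and the instruction before it is the call.) So pos->suspend must be bad. Which driver registered this suspend callback through register_early_suspend()? The information in the db cannot tell us; we need to add logging.

First, look at where pos lives in memory: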

(gdb) p pos

$2 = (struct early_suspend *) 0xd90273dc

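The next step is to find out where this pointer came from. In the early_suspend() function in earlysuspend.c, pos is taken from the early_suspend_handlers list, and tracing the code shows that entries are added to that list in register_early_suspend(). Adding a print to that function and capturing the boot log should therefore expose the bad pos->suspend. The function with the print added: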

void register_early_suspend(struct early_suspend *handler)
{
    struct list_head *pos;
    mutex_lock(&early_suspend_lock);
    list_for_each(pos, &early_suspend_handlers) {
        struct early_suspend *e;
        e = list_entry(pos, struct early_suspend, link);
        if (e->level > handler->level)
            break;
    }
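    /* Debug only: log the handler address, the address of its suspend field, and the callback symbol. */
    printk("#---^_^-->>>[%s],[0x%lx],[0x%lx],[%pf]\n", __func__, (long)handler, (long)&handler->suspend, handler->suspend);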
    list_add_tail(&handler->link, pos);
    early_suspend_count++;
    if ((state & SUSPENDED) && handler->suspend)
        handler->suspend(handler);
    mutex_unlock(&early_suspend_lock);
}

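Build, flash, and reproduce the problem (because the added print changes the code, you may need to reproduce the KE, capture a new db, and print pos->suspend in gdb again so that it can be compared against the boot log), then capture the boot log and search for 0xd90273e8: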

[ 21.423978]<2>.(1)[1:swapper/0]#---^_^-->>>[register_early_suspend],[0xd90273dc],[0xd90273e8],[gfx1xm_early_suspend]
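
The search key is 0xd90273e8 rather than the pos value 0xd90273dc because the second value in the print is &handler->suspend, which sits at a fixed offset inside struct early_suspend. The user-space sketch below mirrors that layout with 32-bit fields and computes the offset, which matches the `ldr r3, [r4, #12]` in the disassembly. It assumes the field order link, level, suspend, resume and 4-byte pointers on this ARM target; the names list_head32, early_suspend32 and suspend_field_addr are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit mirror of struct list_head: two 4-byte pointers. */
struct list_head32 {
    uint32_t next;
    uint32_t prev;
};

/* Assumed mirror of struct early_suspend as laid out on 32-bit ARM. */
struct early_suspend32 {
    struct list_head32 link;  /* offset 0 */
    int32_t level;            /* offset 8 */
    uint32_t suspend;         /* offset 12, function pointer in the real struct */
    uint32_t resume;          /* offset 16, function pointer in the real struct */
};

/* Address of the suspend field for a handler located at base. */
static uint32_t suspend_field_addr(uint32_t base)
{
    return base + (uint32_t)offsetof(struct early_suspend32, suspend);
}
```

With the pos value from gdb, suspend_field_addr(0xd90273dc) evaluates to 0xd90273e8, the address searched for in the boot log.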

Now it is clear: the culprit is gfx1xm_early_suspend(), so the KE is confined to the gfx1xm driver.
Looking at the gfx1xm driver code, xxx_probe() calls register_early_suspend():

gfx1xm_dev->early_fp.level = EARLY_SUSPEND_LEVEL_DISABLE_FB - 1;
gfx1xm_dev->early_fp.suspend = gfx1xm_early_suspend;
gfx1xm_dev->early_fp.resume = gfx1xm_late_resume;
register_early_suspend(&gfx1xm_dev->early_fp);

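The problem reproduces only when the gfx1xm device is not present. After the device is removed, the gfx1xm_dev structure is freed, so the memory holding the registered early_suspend structure becomes free as well and may be allocated and reused by another module. This is a typical use after free.

The correct fix is to remove this registration when the device is not present, but the driver's current xxx_remove() has no matching unregister call, and that is what causes the problem.

With that understood, the fix is simple: add the unregister call to the remove function: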

static int gfx1xm_remove(struct spi_device *spi)
{
    struct gfx1xm_dev *gfx1xm_dev = spi_get_drvdata(spi);
    FUNC_ENTRY();
    klog("unregister_early_suspend(&gfx1xm_dev->early_fp)\n");
    /* Unlink the handler before gfx1xm_dev is freed. */
    unregister_early_suspend(&gfx1xm_dev->early_fp);
    ......
}

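Build, flash, and boot: verified OK.

Root cause:

Use after free.

The driver's suspend structure is freed when the driver is removed, but because the suspend handler is never unregistered, the early suspend path still uses the structure, which causes the KE.

Solution:

Unregister the suspend handler in remove(): unregister_early_suspend(&gfx1xm_dev->early_fp);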
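
To make the register/unregister pairing concrete, here is a minimal user-space model of the pattern. It is only a sketch, not the kernel implementation: struct handler, register_handler, unregister_handler, run_suspend, struct demo_dev, demo_probe and demo_remove are all hypothetical stand-ins for early_suspend_handlers, register_early_suspend(), unregister_early_suspend(), early_suspend() and the gfx1xm probe/remove pair, and locking and level ordering are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's early_suspend handler node. */
struct handler {
    struct handler *prev, *next;
    void (*suspend)(struct handler *h);
};

/* Circular list head, standing in for early_suspend_handlers. */
static struct handler handlers = { &handlers, &handlers, NULL };
static int suspend_calls;

/* Append h to the tail of the handler list. */
static void register_handler(struct handler *h)
{
    h->prev = handlers.prev;
    h->next = &handlers;
    handlers.prev->next = h;
    handlers.prev = h;
}

/* Unlink h so that the list no longer points at its memory. */
static void unregister_handler(struct handler *h)
{
    h->prev->next = h->next;
    h->next->prev = h->prev;
    h->prev = h->next = h;
}

/* Walk the list and call each suspend callback; return how many ran. */
static int run_suspend(void)
{
    struct handler *pos;
    int called = 0;

    for (pos = handlers.next; pos != &handlers; pos = pos->next) {
        if (pos->suspend != NULL) {
            pos->suspend(pos);
            called++;
        }
    }
    return called;
}

static void demo_suspend(struct handler *h)
{
    (void)h;
    suspend_calls++;
}

/* Hypothetical device object that embeds its handler, like gfx1xm_dev->early_fp. */
struct demo_dev {
    int id;
    struct handler early_fp;
};

static struct demo_dev *demo_probe(void)
{
    struct demo_dev *dev = calloc(1, sizeof(*dev));

    if (dev == NULL)
        return NULL;
    dev->early_fp.suspend = demo_suspend;
    register_handler(&dev->early_fp);
    return dev;
}

static void demo_remove(struct demo_dev *dev)
{
    /* Unlink first; freeing without this leaves a dangling node in the list. */
    unregister_handler(&dev->early_fp);
    free(dev);
}
```

Calling demo_probe(), then run_suspend(), then demo_remove() leaves the list empty, so a later run_suspend() touches no freed memory. Dropping the unregister_handler() call from demo_remove() would reproduce the bug described above: the list would still point into the freed demo_dev.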
