Debugging a driver blue screen on Windows 10 1903 x64

Background

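Our product includes a driver used for security protection. It has been running stably for several years, in both 32-bit and 64-bit builds. There were occasional blue screens along the way, but they were always caused by hard-coded offsets that broke when a new OS version or a major update was released, and fixing those offsets resolved them.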

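This time, however, users reported blue screens on 1903. We set up two test machines, both running Windows 10 x64 1903: the one with an i7 CPU reproduced the blue screen, while the other, with an E-series CPU, ran normally.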

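The crash occurred inside an internal system function, which made it hard to locate. A colleague had already spent several days and nights analyzing it. To keep the wide rollout of the 1903 update from causing widespread blue screens, we had to fix this quickly, or the whole security team's jobs might be on the line! So I put my other work aside and joined the debugging effort.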

Process

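Because the crash was inside a system function, our code had to be affecting some system state. We ruled out stack corruption caused by a change in the number of parameters of a system function, and we ruled out our code corrupting kernel memory, among other possibilities. With nothing else left to try, we started disabling parts of the code to narrow down where the problem was.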

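By cutting code this way, we found that the blue screen occurred whenever the code called the function that reads the system registry and that code was protected with the VM packer (VM-based code virtualization). Without the VM protection everything worked, so we pinned the problem on the VM packer.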

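Once VM-protected, the code can no longer be debugged directly, so we could only set a breakpoint at the entry of the system function and check whether the arguments passed in after VM processing were abnormal. They all looked fine.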

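Next we compared how execution proceeded inside the system function with and without VM protection, checking whether the internal call branches were the same. Stepping in level by level, we found them identical. The only difference came at the very end: reading one block of memory caused a blue screen in the VM-protected build and worked in the unprotected build.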

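So we focused on why reading that memory failed. The memory regions in the two runs looked much alike. We compared the page attributes of that memory in the failing and working cases and, strangely, they were the same; it was not a case of the memory being valid in one run and invalid in the other. So why did one read fail while the other succeeded?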
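
The page-attribute comparison described above can be sketched as follows. This is only an illustration, assuming the raw 64-bit entries of 4 KB pages have been copied out of a debugger by hand; the helper names and the sample values are hypothetical and not taken from the original investigation. The bit positions follow the x86-64 page-table entry layout.

```python
# Attribute bits of an x86-64 page-table entry mapping a 4 KB page.
# Bits 12..51 hold the physical frame number and are ignored here.
PTE_FLAGS = {
    0: "P",     # present
    1: "RW",    # writable
    2: "US",    # user-accessible
    3: "PWT",   # page-level write-through
    4: "PCD",   # page-level cache disable
    5: "A",     # accessed
    6: "D",     # dirty
    7: "PAT",   # page attribute table (PS in higher-level entries)
    8: "G",     # global
    63: "NX",   # execute disable
}


def pte_attributes(pte):
    """Return the set of attribute names whose bit is set in the entry."""
    return {name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1}


def differing_attributes(pte_a, pte_b):
    """Return the attribute names that are set in one entry but not the other."""
    return pte_attributes(pte_a) ^ pte_attributes(pte_b)


if __name__ == "__main__":
    # Hypothetical entries: different physical frames, same attributes.
    working = 0x8000000012345863
    failing = 0x80000000ABCDE863
    print(sorted(pte_attributes(working)))
    print(differing_attributes(working, failing))
```

An empty result from `differing_attributes` corresponds to the situation in the text: the two pages carry identical attributes, so the cause has to be looked for elsewhere.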

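At this point we began to suspect CPU register state, such as the paging-related registers or the interrupt flag. Comparing the registers between the working and failing cases, we found that bit 19 of the EFLAGS register was 1 in the working case and 0 in the failing case. A test program confirmed it: the VM packer was mishandling this flag bit. Earlier systems had never used the high bits of EFLAGS; Windows 10 1903 was the first to do so, which is what exposed the problem.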
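
A register comparison like this is easy to get wrong by eye, so a small decoder helps. The sketch below is an assumption-laden illustration: the two EFLAGS values are hypothetical, chosen only so that they differ in bit 19, and the function names are made up for this example. Bit positions are zero-based, as in the Intel manual, so bit 19 is VIF; counting from one instead would make the nineteenth bit AC (bit 18), so the numbering convention is worth double-checking against the raw mask.

```python
# EFLAGS flag names by zero-based bit position (Intel SDM numbering).
# IOPL occupies bits 12-13 and is decoded separately.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
    19: "VIF", 20: "VIP", 21: "ID",
}


def decode_eflags(value):
    """Return (list of set flag names in bit order, IOPL value)."""
    names = [name for bit, name in sorted(EFLAGS_BITS.items())
             if (value >> bit) & 1]
    iopl = (value >> 12) & 3
    return names, iopl


def diff_eflags(good, bad):
    """Map each differing bit to its (good, bad) values."""
    changed = good ^ bad
    return {
        EFLAGS_BITS.get(bit, f"bit{bit}"): ((good >> bit) & 1, (bad >> bit) & 1)
        for bit in range(32)
        if (changed >> bit) & 1
    }


if __name__ == "__main__":
    good = 0x00080246  # hypothetical value from the working run
    bad = 0x00000246   # hypothetical value from the failing run
    print(decode_eflags(good))
    print(diff_eflags(good, bad))
```

With these sample values the only difference reported is VIF, set in the working run and clear in the failing one, which matches the observation in the text.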

Solution

Once the problem was found, fixing the VM engine did not take long, but it is still worth understanding what bit 19 of the EFLAGS register does.

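Bit 19 is the VIF (Virtual Interrupt Flag) bit. According to the Intel manual, whether it takes effect is controlled by the PVI (Protected-Mode Virtual Interrupts) bit of the CR4 register; roughly speaking, it is part of a protected-mode virtual interrupt mechanism. The manual describes the flag as follows: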

20.4 PROTECTED-MODE VIRTUAL INTERRUPTS
The IA-32 processors (beginning with the Pentium processor) also support the VIF and VIP flags in the EFLAGS
register in protected mode by setting the PVI (protected-mode virtual interrupt) flag in the CR4 register. Setting
the PVI flag allows applications running at privilege level 3 to execute the CLI and STI instructions without causing
a general-protection exception (#GP) or affecting hardware interrupts.
When the PVI flag is set to 1, the CPL is 3, and the IOPL is less than 3, the STI and CLI instructions set and clear
the VIF flag in the EFLAGS register, leaving IF unaffected. In this mode of operation, an application running in
protected mode and at a CPL of 3 can inhibit interrupts in the same manner as is described in Section 20.3.2, “Class
2—Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism”, for a
virtual-8086 mode task. When the application executes the CLI instruction, the processor clears the VIF flag. If the
processor receives a maskable hardware interrupt, the processor invokes the protected-mode interrupt handler.
This handler checks the state of the VIF flag in the EFLAGS register. If the VIF flag is clear (indicating that the active
task does not want to have interrupts handled now), the handler sets the VIP flag in the EFLAGS image on the stack
and returns to the privilege-level 3 application, which continues program execution. When the application executes
a STI instruction to set the VIF flag, the processor automatically invokes the general-protection exception handler,
which can then handle the pending interrupt. After handling the pending interrupt, the handler typically sets the VIF
flag and clears the VIP flag in the EFLAGS image on the stack and executes a return to the application program. The
next time the processor receives a maskable hardware interrupt, the processor will handle it in the normal manner
for interrupts received while the processor is operating at a CPL of 3.
As with the virtual mode extension (enabled with the VME flag in the CR4 register), the protected-mode virtual
interrupt extension only affects maskable hardware interrupts (interrupt vectors 32 through 255). NMI interrupts
and exceptions are handled in the normal manner.
When protected-mode virtual interrupts are disabled (that is, when the PVI flag in control register CR4 is set to 0,
the CPL is less than 3, or the IOPL value is 3), then the CLI and STI instructions execute in a manner compatible
with the Intel486 processor. That is, if the CPL is greater (less privileged) than the I/O privilege level (IOPL), a
general-protection exception occurs. If the IOPL value is 3, CLI and STI clear or set the IF flag, respectively.
PUSHF, POPF, IRET and INT are executed like in the Intel486 processor, regardless of whether protected-mode
virtual interrupts are enabled.
It is only possible to enter virtual-8086 mode through a task switch or the execution of an IRET instruction, and it
is only possible to leave virtual-8086 mode by faulting to a protected-mode interrupt handler (typically the general-
protection exception handler, which in turn calls the virtual 8086-mode monitor). In both cases, the EFLAGS
register is saved and restored. This is not true, however, in protected mode when the PVI flag is set and the
processor is not in virtual-8086 mode. Here, it is possible to call a procedure at a different privilege level, in which
case the EFLAGS register is not saved or modified. However, the states of VIF and VIP flags are never examined by
the processor when the CPL is not 3.
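
The CLI/STI rules in the excerpt can be condensed into a small model. This is a simplified sketch for protected mode only (virtual-8086 mode is not modeled), written from the quoted text rather than from any real emulator; the function and exception names are hypothetical.

```python
IF_BIT = 1 << 9     # interrupt enable flag
VIF_BIT = 1 << 19   # virtual interrupt flag
VIP_BIT = 1 << 20   # virtual interrupt pending


class GeneralProtectionFault(Exception):
    """Stands in for a #GP exception in this model."""


def execute_cli_sti(instr, eflags, cr4_pvi, cpl):
    """Return the EFLAGS value after executing 'cli' or 'sti'.

    Simplified protected-mode model of the rules quoted above.
    """
    if instr not in ("cli", "sti"):
        raise ValueError("instr must be 'cli' or 'sti'")
    iopl = (eflags >> 12) & 3

    # Protected-mode virtual interrupts: CR4.PVI = 1, CPL = 3, IOPL < 3.
    if cr4_pvi and cpl == 3 and iopl < 3:
        if instr == "cli":
            return eflags & ~VIF_BIT      # IF is left unaffected
        if eflags & VIP_BIT:
            # STI with an interrupt pending hands control to the #GP handler.
            raise GeneralProtectionFault("STI with VIP set")
        return eflags | VIF_BIT

    # Otherwise the Intel486-compatible behavior applies.
    if cpl > iopl:
        raise GeneralProtectionFault("CPL greater than IOPL")
    return eflags | IF_BIT if instr == "sti" else eflags & ~IF_BIT


if __name__ == "__main__":
    # Ring 3 with PVI enabled: STI touches VIF only.
    print(hex(execute_cli_sti("sti", 0x202, True, 3)))
    # Ring 0: CLI clears IF, and VIF is never consulted.
    print(hex(execute_cli_sti("cli", 0x202, True, 0)))
```

The model also reflects the last sentence of the excerpt: when the CPL is not 3, VIF and VIP are never examined by these instructions.
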
Reference: https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/o_fe12b1e2a880e0ce-996.html
