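From kernel mode to user mode, from x64 to WoW64, from assembly to managed code
Windows 7 x64. When the fault occurs, no newly started process shows any UI. For example, Ctrl+Shift+Esc does not bring up Task Manager, and typing notepad into the Win+R dialog does not show a Notepad window either. Operations in Explorer, Ctrl+Alt+Del, and so on still work normally.
The problem is intermittent but occurs frequently.
While the fault was present, a blue screen was forced with the Scroll Lock keyboard crash, and the resulting dump was analyzed.
Find the addresses of the Task Manager processes: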
2: kd> !process 0 0 taskmgr.exe
PROCESS fffffa80c8649b10
SessionId: 1 Cid: 127c Peb: 7fffffdd000 ParentCid: 071c
DirBase: 71411000 ObjectTable: fffff8a00af72640 HandleCount: 90.
Image: taskmgr.exe
PROCESS fffffa80c86db340
SessionId: 1 Cid: 046c Peb: 7fffffdc000 ParentCid: 071c
DirBase: 728e9000 ObjectTable: fffff8a009aac710 HandleCount: 90.
Image: taskmgr.exe
There are two taskmgr processes. Look at what each of their threads is doing:
2: kd> !PROCESS fffffa80c8649b10
PROCESS fffffa80c8649b10
SessionId: 1 Cid: 127c Peb: 7fffffdd000 ParentCid: 071c
DirBase: 71411000 ObjectTable: fffff8a00af72640 HandleCount: 90.
Image: taskmgr.exe
VadRoot fffffa80c8b05d90 Vads 85 Clone 0 Private 1247. Modified 1. Locked 0.
DeviceMap fffff8a001579db0
Token fffff8a00af89060
ElapsedTime 00:03:48.304
UserTime 00:00:00.000
KernelTime 00:00:00.000
QuotaPoolUsage[PagedPool] 158768
QuotaPoolUsage[NonPagedPool] 10080
Working Set Sizes (now,min,max) (2788, 50, 345) (11152KB, 200KB, 1380KB)
PeakWorkingSetSize 3184
VirtualSize 84 Mb
PeakVirtualSize 85 Mb
PageFaultCount 3766
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 1794
THREAD fffffa80ca65e060 Cid 127c.1284 Teb: 000007fffffde000 Win32Thread: fffff900c320b580 WAIT: (UserRequest) UserMode Non-Alertable
fffffa80cac36b40 Mutant - owning thread fffffa80c895c950
Impersonation token: fffff8a00a75f060 (Level Impersonation)
DeviceMap fffff8a001579db0
Owning Process fffffa80c8649b10 Image: taskmgr.exe
Attached Process N/A Image: N/A
Wait Start TickCount 57688 Ticks: 14610 (0:00:03:48.281)
Context Switch Count 52 IdealProcessor: 6 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x00000000ffd52b9c
Stack Init fffff88002968250 Current fffff88002967da0
Base fffff88002969000 Limit fffff88002961000 Call fffff880029682a0
Priority 14 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP RetAddr Call Site
fffff880`02967de0 fffff800`03eda772 nt!KiSwapContext+0x7a
fffff880`02967f20 fffff800`03edcf9f nt!KiCommitThreadWait+0x1d2
fffff880`02967fb0 fffff800`041d1cde nt!KeWaitForSingleObject+0x19f
fffff880`02968050 fffff800`03ed4753 nt!NtWaitForSingleObject+0xde
fffff880`029680c0 00000000`778bbd7a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`029680c0)
00000000`0018c3e8 fffff800`03eccb10 0x778bbd7a
fffff880`029682a0 00000000`00000000 nt!KiCallUserMode
THREAD fffffa80c8864720 Cid 127c.083c Teb: 000007fffffdb000 Win32Thread: fffff900c3229340 WAIT: (WrUserRequest) UserMode Non-Alertable
fffffa80cb41a280 SynchronizationEvent
Not impersonating
DeviceMap fffff8a001579db0
Owning Process fffffa80c8649b10 Image: taskmgr.exe
Attached Process N/A Image: N/A
Wait Start TickCount 57687 Ticks: 14611 (0:00:03:48.296)
Context Switch Count 1 IdealProcessor: 8 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x00000000ffd4df20
Stack Init fffff88005e12c70 Current fffff88005e12370
Base fffff88005e13000 Limit fffff88005e0c000 Call 0000000000000000
Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP RetAddr Call Site
fffff880`05e123b0 fffff800`03eda772 nt!KiSwapContext+0x7a
fffff880`05e124f0 fffff800`03ed9c8a nt!KiCommitThreadWait+0x1d2
fffff880`05e12580 fffff960`000ef297 nt!KeWaitForMultipleObjects+0x272
fffff880`05e12840 00000000`00000000 win32k!xxxRealSleepThread+0x2ab
2: kd> !object fffffa80cac36b40
Object: fffffa80cac36b40 Type: (fffffa80c8097f30) Mutant
ObjectHeader: fffffa80cac36b10 (new version)
HandleCount: 36 PointerCount: 61
Directory Object: fffff8a007544ba0 Name: DictManager_GlobalLocker
Look at the user-mode code of thread 0:
2: kd> .PROCESS /r fffffa80c8649b10;.thread fffffa80ca65e060
Implicit process is now fffffa80`c8649b10
Implicit thread is now fffffa80`ca65e060
2: kd> kb
*** Stack trace for last set context - .thread/.cxr resets it
RetAddr : Args to Child : Call Site
fffff800`03eda772 : 00000000`00000110 fffffa80`ca65e060 fffffa80`00000000 ffffffff`ffffffff : nt!KiSwapContext+0x7a
fffff800`03edcf9f : 00000000`00000160 00000000`00000000 fffff683`00000000 00000000`00000000 : nt!KiCommitThreadWait+0x1d2
fffff800`041d1cde : 0000007f`ffffff00 fffffa80`00000006 00000000`00000001 00000000`0018cb00 : nt!KeWaitForSingleObject+0x19f
fffff800`03ed4753 : fffffa80`ca65e060 00000000`ffffffff 00000000`00000000 fffffa80`cac36b40 : nt!NtWaitForSingleObject+0xde
00000000`778bbd7a : 000007fe`fdb110ac 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
000007fe`fdb110ac : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`35000035 : ntdll!NtWaitForSingleObject+0xa
*** ERROR: Symbol file could not be found. Defaulted to export symbols for SOGOUPY.IME -
000007fe`f89451e6 : 00000000`003e2640 00000000`00240031 00000000`00000000 00000000`00000160 : KERNELBASE!WaitForSingleObjectEx+0x79
00000000`003e2640 : 00000000`00240031 00000000`00000000 00000000`00000160 00000000`003e2640 : SOGOUPY!ImeExtension+0x452236
00000000`00240031 : 00000000`00000000 00000000`00000160 00000000`003e2640 000007fe`f8929800 : 0x3e2640
00000000`00000000 : 00000000`00000160 00000000`003e2640 000007fe`f8929800 00000000`c00b0001 : 0x240031
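Thread 0 of the other taskmgr process is likewise blocked on Mutant fffffa80cac36b40, which SOGOUPY.IME requests directly.
A mutant is the kernel object behind a mutex, in this case the mutex that CreateMutex returns for the name DictManager_GlobalLocker. Because SOGOUPY requests the mutant directly, it was presumably created by a CreateMutex call inside SOGOUPY and is expected to be released by that same module. Mutant fffffa80cac36b40 is owned by thread fffffa80c895c950. Look at that thread's stack backtrace: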
2: kd> !thread fffffa80c895c950
THREAD fffffa80c895c950 Cid 039c.0750 Teb: 000000007ef8e000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
fffffa80ca7d8bf0 SynchronizationEvent
Not impersonating
DeviceMap fffff8a001579db0
Owning Process fffffa80c88b2500 Image: Imclient.exe
Attached Process N/A Image: N/A
Wait Start TickCount 21499 Ticks: 50799 (0:00:13:13.734)
Context Switch Count 3
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x00000000103483b5
Stack Init fffff880050d0c70 Current fffff880050d07c0
Base fffff880050d1000 Limit fffff880050cb000 Call 0
Priority 8 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Kernel stack not resident.
Child-SP RetAddr : Args to Child : Call Site
fffff880`050d0800 fffff800`03eda772 : 00000000`00000110 fffffa80`c895c950 fffffa80`00000000 ffffffff`ffffffff : nt!KiSwapContext+0x7a
fffff880`050d0940 fffff800`03edcf9f : 00000000`000002d8 00000000`00000001 00000000`00000000 00000000`00000004 : nt!KiCommitThreadWait+0x1d2
fffff880`050d09d0 fffff800`041d1cde : 00000000`00000000 0000007f`00000006 00000000`00000001 fffff880`050d0a00 : nt!KeWaitForSingleObject+0x19f
fffff880`050d0a70 fffff800`03ed4753 : fffffa80`c895c950 00000000`007b0138 00000000`00000000 fffffa80`ca7d8bf0 : nt!NtWaitForSingleObject+0xde
fffff880`050d0ae0 00000000`754b2e09 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`050d0ae0)
00000000`05f8eff8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x754b2e09
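The Ticks fields in the !thread outputs above can be cross-checked by hand. The following is a minimal sketch that assumes a clock interval of 15.625 ms (1/64 s) per tick. That interval is an assumption, not something the dump states, but it reproduces the durations the debugger printed. The function name ticks_to_duration is made up for this illustration.

```python
from fractions import Fraction

# Assumption: one tick lasts 1/64 s (15.625 ms); the dump does not state this.
TICK_SECONDS = Fraction(1, 64)

def ticks_to_duration(ticks):
    """Format a tick count as d:hh:mm:ss.mmm, truncating to whole milliseconds."""
    total_ms = int(ticks * TICK_SECONDS * 1000)  # int() truncates the Fraction
    ms = total_ms % 1000
    total_s = total_ms // 1000
    seconds = total_s % 60
    minutes = (total_s // 60) % 60
    hours = (total_s // 3600) % 24
    days = total_s // 86400
    return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"

# Tick counts copied from the thread outputs above.
print(ticks_to_duration(14610))  # taskmgr thread 0
print(ticks_to_duration(50799))  # Imclient thread that owns the mutant
```

Under that assumption, the taskmgr thread has been waiting for about 3 minutes 48 seconds and the mutant owner for about 13 minutes 13 seconds, so the owner was already stuck long before Task Manager was started.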
Load the user-mode DLL symbols:
2: kd> .process /r fffffa80c88b2500
Implicit process is now fffffa80`c88b2500
2: kd> .reload /user
Loading User Symbols
.....
2: kd> .thread fffffa80c895c950
Implicit thread is now fffffa80`c895c950
2: kd> k
*** Stack trace for last set context - .thread/.cxr resets it
Child-SP RetAddr Call Site
fffff880`050d0800 fffff800`03eda772 nt!KiSwapContext+0x7a
fffff880`050d0940 fffff800`03edcf9f nt!KiCommitThreadWait+0x1d2
fffff880`050d09d0 fffff800`041d1cde nt!KeWaitForSingleObject+0x19f
fffff880`050d0a70 fffff800`03ed4753 nt!NtWaitForSingleObject+0xde
fffff880`050d0ae0 00000000`754b2e09 nt!KiSystemServiceCopyEnd+0x13
00000000`05f8eff8 00000000`754b2bf1 wow64cpu!CpupSyscallStub+0x9
00000000`05f8f000 00000000`7552d286 wow64cpu!Thunk0ArgReloadState+0x23
00000000`05f8f0c0 00000000`7552c69e wow64!RunCpuSimulation+0xa
00000000`05f8f110 00000000`778d98ec wow64!Wow64LdrpInitialize+0x42a
00000000`05f8f660 00000000`7789a36e ntdll! ?? ::FNODOBFM::`string'+0x22b74
00000000`05f8f6d0 00000000`00000000 ntdll!LdrInitializeThunk+0xe
The user-mode frames show that this is a 32-bit program running on an x64 OS. Use .thread /w to view its 32-bit user-mode state:
2: kd> .thread /w fffffa80c895c950
Implicit thread is now fffffa80`c895c950
The context is partially valid. Only x86 user-mode context is available.
x86 context set
2: kd:x86> .reload /user
Loading User Symbols
.....
Loading Wow64 Symbols
................................................................
........
2: kd:x86> kb
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
060beed8 77a9ebae 000002d8 00000000 00000000 ntdll_77a50000!ZwWaitForSingleObject+0x15
060bef3c 77a9ea92 00000000 00000000 007b0000 ntdll_77a50000!RtlpWaitOnCriticalSection+0x13e
060bef64 77a9ed13 007b0138 71bbccc3 00078000 ntdll_77a50000!RtlEnterCriticalSection+0x150
060bf040 77a83431 000001f8 00000200 00000000 ntdll_77a50000!RtlpAllocateHeap+0x159
060bf0c4 77a87f1c 007b0000 00800000 000001f8 ntdll_77a50000!RtlAllocateHeap+0x23a
060bf110 77a87161 00000010 71bbcd23 067181d4 ntdll_77a50000!RtlpAllocateUserBlock+0xae
060bf1a0 77a7e172 067181d4 00000004 00000000 ntdll_77a50000!RtlpLowFragHeapAllocFromContext+0x802
*** ERROR: Symbol file could not be found. Defaulted to export symbols for SOGOUPY.IME -
060bf214 103593d8 007b0000 00000000 00000004 ntdll_77a50000!RtlAllocateHeap+0x206
WARNING: Stack unwind information not available. Following frames may be wrong.
060bf22c 1033667c 00000004 060bf244 1009f5e8 SOGOUPY!ImeExtension+0x2dc5f8
060bf238 1009f5e8 00000004 060bf260 100ff2be SOGOUPY!ImeExtension+0x2b989c
060bf244 100ff2be 00000001 06718198 066686a0 SOGOUPY!ImeExtension+0x22808
060bf260 103cf743 00000001 067181d4 7ba032c2 SOGOUPY!ImeExtension+0x824de
060bf2b4 103cdb9a 07dcb958 07dcb978 07dcb978 SOGOUPY!ImeExtension+0x352963
060bf398 103d0406 0fe80004 013350bb 00000000 SOGOUPY!ImeExtension+0x350dba
060bf3e0 103cafb8 0fe80004 013350bb 00000000 SOGOUPY!ImeExtension+0x353626
060bf440 103935af 0fe80004 060bf3fc 7ba03e6e SOGOUPY!ImeExtension+0x34e1d8
060bfe18 10392463 06666d10 10392400 06666d10 SOGOUPY!ImeExtension+0x3167cf
060bfe34 1034840c 00810fe0 7ba03e06 00000000 SOGOUPY!ImeExtension+0x315683
060bfe70 770d336a 06666d10 060bfebc 77a89902 SOGOUPY!ImeExtension+0x2cb62c
060bfe7c 77a89902 06666d10 71bbc23f 00000000 KERNEL32!BaseThreadInitThunk+0xe
060bfebc 77a898d5 103483b5 06666d10 00000000 ntdll_77a50000!__RtlUserThreadStart+0x70
060bfed4 00000000 103483b5 06666d10 00000000 ntdll_77a50000!_RtlUserThreadStart+0x1b
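This shows that SOGOUPY, hooked into the Imclient.exe process, is trying to allocate from the heap. The heap allocator needs critical section 7b0138, which is currently held by another thread. This thread is the one that should release the mutex, but while it is blocked on the critical section it cannot do so, and that mutex in turn blocks thread 0 of every new process (such as taskmgr).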
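The blocking pattern can be reproduced in miniature. This is only a sketch using Python's threading locks as stand-ins for the named mutex and the heap critical section. The timeouts exist so that the demonstration terminates, and the function name simulate is made up for this illustration. It is not how SOGOUPY or the Windows heap is implemented.

```python
import threading

def simulate():
    """Return True if the 'new process' managed to take the mutex, else False."""
    mutex = threading.Lock()      # stand-in for DictManager_GlobalLocker
    heap_lock = threading.Lock()  # stand-in for the heap critical section

    # A third party already owns the heap lock and does not release it in time.
    heap_lock.acquire()

    holding_mutex = threading.Event()

    def mutex_owner():
        # Take the mutex, then block on the heap lock while still holding it.
        with mutex:
            holding_mutex.set()
            if heap_lock.acquire(timeout=0.5):  # timeout only so the demo ends
                heap_lock.release()

    owner = threading.Thread(target=mutex_owner)
    owner.start()
    holding_mutex.wait()

    # A newly started process now tries to take the same mutex.
    got_mutex = mutex.acquire(timeout=0.1)
    if got_mutex:
        mutex.release()

    owner.join()
    heap_lock.release()
    return got_mutex

print(simulate())
```

The call should print False: the newcomer cannot take the mutex because its owner is parked on a second lock. In the dump there is no timeout anywhere, so the wait never ends.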
Look at this critical section:
2: kd:x86> !cs 7b0138
-----------------------------------------
Critical section = 0x007b0138 (+0x7B0138)
DebugInfo = 0x77b54960
LOCKED
LockCount = 0x6
WaiterWoken = No
OwningThread = 0x00000574
RecursionCount = 0x45
LockSemaphore = 0x2D8
SpinCount = 0x00000fa0
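This critical section is owned by the thread with ID 0x574. Switch back to the 64-bit context, list every thread of the process, and find thread 574: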
2: kd:x86> !wow64exts.sw
The context is partially valid. Only x86 user-mode context is available.
Switched to Host mode
2: kd> !process fffffa80c88b2500
PROCESS fffffa80c88b2500
SessionId: 1 Cid: 039c Peb: 7efdf000 ParentCid: 12cc
DirBase: a7bc6000 ObjectTable: fffff8a009e62dc0 HandleCount: 374.
Image: Imclient.exe
VadRoot fffffa80c8284cc0 Vads 263 Clone 0 Private 8997. Modified 15. Locked 0.
DeviceMap fffff8a001579db0
Token fffff8a009e79a30
ElapsedTime 00:13:14.995
UserTime 00:00:00.000
KernelTime 00:00:00.015
QuotaPoolUsage[PagedPool] 600168
QuotaPoolUsage[NonPagedPool] 32780
Working Set Sizes (now,min,max) (15475, 50, 345) (61900KB, 200KB, 1380KB)
PeakWorkingSetSize 15475
VirtualSize 360 Mb
PeakVirtualSize 360 Mb
PageFaultCount 21550
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 12665
Job fffffa80cb7705a0
……
……
THREAD fffffa80c894d5a0 Cid 039c.0574 Teb: 000000007ef8b000 Win32Thread: fffff900c3524c10 WAIT: (UserRequest) UserMode Non-Alertable
fffffa80c92879a0 NotificationEvent
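In the Cid field, the part before the dot is the process ID and the part after it is the thread ID, both in hex, so OwningThread = 0x574 corresponds to Cid 039c.0574. When the !process output is long, that lookup can be scripted. A minimal sketch, assuming the output has been saved as text. The helper name find_thread_by_tid is made up for this illustration, and the sample lines are cut short after the Teb value.

```python
import re

# Matches the start of a THREAD line: ETHREAD address, then Cid as pid.tid (hex).
THREAD_RE = re.compile(
    r"THREAD\s+([0-9a-f]+)\s+Cid\s+([0-9a-f]+)\.([0-9a-f]+)", re.IGNORECASE
)

def find_thread_by_tid(process_output, tid):
    """Return the ETHREAD address whose Cid thread ID equals tid, or None."""
    for match in THREAD_RE.finditer(process_output):
        ethread, _pid, thread_id = match.groups()
        if int(thread_id, 16) == tid:
            return ethread
    return None

# Two THREAD lines from the Imclient.exe output, truncated after the Teb value.
sample = """
THREAD fffffa80c895c950 Cid 039c.0750 Teb: 000000007ef8e000
THREAD fffffa80c894d5a0 Cid 039c.0574 Teb: 000000007ef8b000
"""

print(find_thread_by_tid(sample, 0x574))
```

The address it returns is the one to pass to .thread /w in the next step.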
Look at the stack backtrace of thread 574:
2: kd> .THREAD /w fffffa80c894d5a0
Implicit thread is now fffffa80`c894d5a0
x86 context set
2: kd:x86> k
*** Stack trace for last set context - .thread/.cxr resets it
# ChildEBP RetAddr
00 0a130798 756a15ce ntdll_77a50000!NtWaitForSingleObject+0x15
01 0a130804 770d1194 KERNELBASE!WaitForSingleObjectEx+0x98
02 0a13081c 770d1148 KERNEL32!WaitForSingleObjectExImplementation+0x75
03 0a130830 6c73d375 KERNEL32!WaitForSingleObject+0x12
04 0a130860 6c73f6b2 mscorwks!ClrWaitForSingleObject+0x24
05 0a130f94 6c7406ab mscorwks!DoFaultReportWorker+0xaf
06 0a130fd4 6c77e3aa mscorwks!DoFaultReport+0x120
07 0a130ff8 6c77d69b mscorwks!WatsonLastChance+0x3f
08 0a1314b4 6c77d6f0 mscorwks!EEPolicy::LogFatalError+0x3ec
09 0a1314cc 6c6f16e9 mscorwks!EEPolicy::HandleFatalError+0x4d
0a 0a1314e4 6c5d96d1 mscorwks!FastNExportSEH+0x7b
0b 0a131510 6c5d9875 mscorwks!CPFH_RealFirstPassHandler+0x664
0c 0a131534 77ab3529 mscorwks!COMPlusFrameHandler+0x15a
0d 0a131558 77ab34fb ntdll_77a50000!ExecuteHandler2+0x26
0e 0a13157c 77ab349c ntdll_77a50000!ExecuteHandler+0x24
0f 0a131608 77a60143 ntdll_77a50000!RtlDispatchException+0x127
10 0a131614 0a131620 ntdll_77a50000!KiUserExceptionDispatcher+0xf
WARNING: Frame IP not in any known module. Following frames may be wrong.
11 0a131ba4 77a83431 0xa131620
12 0a131c28 77a87f1c ntdll_77a50000!RtlAllocateHeap+0x23a
13 0a131c74 77a87161 ntdll_77a50000!RtlpAllocateUserBlock+0xae
14 0a131d04 77a7e172 ntdll_77a50000!RtlpLowFragHeapAllocFromContext+0x802
15 0a131d78 77a88e3d ntdll_77a50000!RtlAllocateHeap+0x206
16 0a131d90 756ad1cf ntdll_77a50000!RtlAllocateAndInitializeSid+0x35
17 0a131dc4 6c83144c KERNELBASE!AllocateAndInitializeSid+0x2c
18 0a131e18 6c831593 mscorwks!ContainsUnmappableChars+0xb1
……
……
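The backtrace is more than 3000 lines long. It looks as if some exception fired, the handler called RtlAllocateHeap, RtlAllocateHeap raised another exception, that handler called RtlAllocateHeap again, and so on.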
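With a trace that long, counting how often each call site appears is a quick way to confirm a loop like this. A minimal sketch: the full trace is elided in this write-up, so the input below is a synthetic stand-in made by repeating three frames from the excerpt above, and count_frames is a name made up for this illustration.

```python
from collections import Counter

def count_frames(k_lines):
    """Count how often each call site (last column of `k` output) appears."""
    counts = Counter()
    for line in k_lines:
        fields = line.split()
        # Frame lines have four columns: frame number, ChildEBP, RetAddr, call site.
        if len(fields) == 4:
            counts[fields[3]] += 1
    return counts

# Three frames copied from the excerpt above.
cycle = [
    "0f 0a131608 77a60143 ntdll_77a50000!RtlDispatchException+0x127",
    "10 0a131614 0a131620 ntdll_77a50000!KiUserExceptionDispatcher+0xf",
    "12 0a131c28 77a87f1c ntdll_77a50000!RtlAllocateHeap+0x23a",
]
# Synthetic input: the real 3000-line trace is not reproduced here.
synthetic_trace = cycle * 3

for call_site, hits in count_frames(synthetic_trace).most_common():
    print(hits, call_site)
```

On a real trace, call sites with very high counts that alternate between the exception dispatcher and the heap allocator would support the recursive-exception reading.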
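The native backtrace of thread 574 contains mscorwks, so it is probably a managed C# thread, and SOS can be used to look at the C# code. That requires copying the .NET folder from the faulting machine. The dump also has to be opened in the 32-bit WinDbg, because the 64-bit WinDbg cannot load this sos.dll.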
2: kd:x86> lmDvm mscorwks
Browse full module list
start end module name
00000000`6c530000 00000000`6cadb000 mscorwks (deferred)
Image path: C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
Image name: mscorwks.dll
Browse all global symbols functions data
Timestamp: Wed Sep 29 11:53:04 2010 (4CA2B820)
CheckSum: 005B5052
ImageSize: 005AB000
File version: 2.0.50727.5420
Product version: 2.0.50727.5420
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft® .NET Framework
InternalName: mscorwks.dll
OriginalFilename: mscorwks.dll
ProductVersion: 2.0.50727.5420
FileVersion: 2.0.50727.5420 (Win7SP1.050727-5400)
FileDescription: Microsoft .NET Runtime Common Language Runtime - WorkStation
LegalCopyright: © Microsoft Corporation. All rights reserved.
Comments: Flavor=Retail
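That is, copy the C:\Windows\Microsoft.NET\Framework\v2.0.50727 folder from the faulting machine (making sure its mscorwks.dll is version 2.0.50727.5420) to the analysis machine, for example to D:\v2.0.50727, and load sos and mscordacwks from it: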
2: kd:x86> .load D:\v2.0.50727\sos.dll
2: kd:x86> .load D:\v2.0.50727\mscordacwks.dll
!clrstack can then be used to view the C# stack backtrace. However, something seems to be wrong here...
2: kd:x86> !clrstack
OS Thread Id: 0x0 (2)
Unable to walk the managed stack. The current thread is likely not a
managed thread. You can run !threads to get a list of managed threads in
the process
2: kd:x86> !threads
ThreadCount: 4
UnstartedThread: 0
BackgroundThread: 3
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
1 220 0000000000823328 6020 Enabled 0000000000000000:0000000000000000 000000000081e660 0 STA
2 300 0000000000834298 b220 Enabled 0000000000000000:0000000000000000 000000000081e660 0 MTA (Finalizer)
3 574 0000000007ec4948 180b220 Disabled 0000000000000000:0000000000000000 000000000081e660 2 MTA (GC) (Threadpool Worker) (00000000029d10b4) (nested exceptions)
4 7b8 0000000007ed0c78 80a220 Enabled 0000000000000000:0000000000000000 000000000081e660 0 MTA (Threadpool Completion Port)
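New processes show no UI because their thread 0 is blocked on the mutex DictManager_GlobalLocker.
The direct cause of the hang is that this mutex, which the Sogou input method SOGOUPY uses, is never released.
It is never released because thread fffffa80c895c950 in the Imclient process, which should release it, is itself blocked on critical section 7b0138.
Critical section 7b0138 should be released by thread 574 in Imclient, but that thread hit an exception, and its exception-handling code has also entered a wait.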
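The chain summarized above can be written down as data and walked mechanically. A minimal sketch: the addresses and names are copied from the dump, while the WAITS table and the wait_chain helper are made up for this illustration.

```python
# Each waiting thread maps to (resource it waits on, thread that owns the resource).
# An owner of None means the resource has no owning thread (an event).
WAITS = {
    "taskmgr.exe thread fffffa80ca65e060": (
        "Mutant DictManager_GlobalLocker (fffffa80cac36b40)",
        "Imclient.exe thread fffffa80c895c950",
    ),
    "Imclient.exe thread fffffa80c895c950": (
        "CriticalSection 0x007b0138",
        "Imclient.exe thread fffffa80c894d5a0 (TID 0x574)",
    ),
    "Imclient.exe thread fffffa80c894d5a0 (TID 0x574)": (
        "NotificationEvent fffffa80c92879a0",
        None,
    ),
}

def wait_chain(start, waits):
    """Follow waiter -> owner links from start; stop at a cycle or an unowned resource."""
    chain = []
    seen = set()
    current = start
    while current is not None and current in waits and current not in seen:
        seen.add(current)
        resource, owner = waits[current]
        chain.append((current, resource))
        current = owner
    return chain

for thread, resource in wait_chain("taskmgr.exe thread fffffa80ca65e060", WAITS):
    print(f"{thread} waits on {resource}")
```

The walk ends at thread 574 waiting on an event inside the fault-reporting path, which is why nothing further up the chain can make progress.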
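To resolve it, uninstall or update the Sogou input method.
Alternatively, make Imclient crash immediately when it catches this kind of exception.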