ACPI_BIOS_USING_OS_MEMORY
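Recently our BIOS hit a strange bug. At first, with 4 GB of memory installed, BIOS Setup showed only 3 GB. After a BIOS code change, the Setup menu finally showed 4 GB. The display was now correct, but the board could no longer boot into the OS: every attempt went straight to a white-on-blue blue screen. The crash is shown in Figure 1.
Figure 1
Probably because the SWPM knew I had once written an article about ACPI debugging, she asked me to help analyze this problem. To be honest, I have no experience with blue-screen analysis and had never done one :(. But I was not very busy, so I treated it as a practice exercise. I set up WinDbg and plugged a 1394-to-PCIe adapter into the board (the development board has no 1394 port). Then I let the debuggee run; as soon as it blue-screens, WinDbg breaks in. Let's look at the blue-screen information.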
Waiting to reconnect...
Connected to Windows 6001 x86 compatible target, ptr64 FALSE
Kernel Debugger connection established.
Symbol search path is: D:/Vista32-sp1symbol;D:/websymbols-sp1;C:/WINNT/Symbols
Executable search path is:
Windows Kernel Version 6001 MP (1 procs) Free x86 compatible
Built by: 6001.18000.x86fre.longhorn_rtm.080118-1840
Kernel base = 0x8203d000 PsLoadedModuleList = 0x82154c70
System Uptime: not available
*** Fatal System Error: 0x000000a5
(0x00001000,0x00000000,0xFFFFFF00,0x00000105)
Break instruction exception - code 80000003 (first chance)
A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.
A fatal system error has occurred.
Connected to Windows 6001 x86 compatible target, ptr64 FALSE
Loading Kernel Symbols
........................................
Loading User Symbols
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck A5, {1000, 0, ffffff00, 105}
Probably caused by : acpi.sys ( acpi!MapPhysMem+39 )
Followup: MachineOwner
---------
nt!RtlpBreakWithStatusInstruction:
820f5514 cc int 3
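From the output above we know that the system hit a fatal error with code 0x000000a5 (Fatal System Error: 0x000000a5), and that it was probably caused by acpi.sys (acpi!MapPhysMem+39). That is all this output tells us. For more detail, enter:
0: kd> !analyze -v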
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
ACPI_BIOS_ERROR (a5)
The ACPI Bios in the system is not fully compliant with the ACPI specification.
The first value indicates where the incompatibility lies:
This bug check covers a great variety of ACPI problems. If a kernel debugger
is attached, use "!analyze -v". This command will analyze the precise problem,
and display whatever information is most useful for debugging the specific
error.
Arguments:
Arg1: 00001000, ACPI_BIOS_USING_OS_MEMORY
ACPI had a fatal error when processing a memory
operation region.The memory operation region tried to
map memory that has been allocated for OS usage.
Arg2: 00000000, The high portion of the physical address
of the memory region.
Arg3: ffffff00, The low portion of the physical address
of the memory region.
Arg4: 00000105, The length of memory being mapped.
Debugging Details:
------------------
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0xA5
PROCESS_NAME: System
CURRENT_IRQL: 2
LOCK_ADDRESS: 821715e0 -- (!locks 821715e0)
Resource @ nt!PiEngineLock (0x821715e0) Exclusively owned
Threads: 85d28668-01<*>
1 total locks, 1 locks currently held
PNP_TRIAGE:
Lock address : 0x821715e0
Thread Count : 1
Thread address: 0x85d28668
Thread wait : 0x28
LAST_CONTROL_TRANSFER: from 8210a2d7 to 820f5514
STACK_TEXT:
8039920c 8210a2d7 00000003 bac8867b 00000000 nt!RtlpBreakWithStatusInstruction
8039925c 8210adbd 00000003 00000105 ffffff00 nt!KiBugCheckDebugBreak+0x1c
80399628 8210a163 000000a5 00001000 00000000 nt!KeBugCheck2+0x66d
8039964c 80627376 000000a5 00001000 00000000 nt!KeBugCheckEx+0x1e
80399674 80627f9b ffffff00 00000105 85d60e58 acpi!MapPhysMem+0x39
80399690 8062aba1 85d5f000 ffffff00 00000105 acpi!MapUnmapPhysMem+0x2f
803996b4 8062f4cb 85d5f000 00000000 00008000 acpi!OpRegion+0xcb
803996d0 806287dc 85d5f000 85d60e58 00000000 acpi!ParseTerm+0x14d
803996f8 80629c75 00000000 00000000 806374b8 acpi!RunContext+0x65
80399710 80629d40 85d5f000 00000000 806374b8 acpi!InsertReadyQueue+0xa7
8039972c 8062991b 85d5f000 00000000 0000205b acpi!RestartContext+0x27
80399768 80623af9 85d5f000 86121380 85d5f000 acpi!SyncLoadDDB+0xde
8039977c 8063c4b9 ffd19010 8039979c 80636fc4 acpi!AMLILoadDDB+0x66
80399794 8063c521 86a972b0 00000000 8200df60 acpi!ACPIInitializeDDB+0x37
803997b0 8063c645 8200dec0 80637b60 80637d00 acpi!ACPIInitializeDDBs+0x47
803997c4 8061b43c 86a97bf8 86a97a38 00000000 acpi!ACPIInitialize+0xe9
803997f4 80640f7c 86a97bf8 86a7e3b0 80640ec8 acpi!ACPIInitStartACPI+0x6a
80399820 80616e4b 86a97bf8 86a7e3b0 86a7e3b0 acpi!ACPIRootIrpStartDevice+0xb4
80399850 820f9053 86a97bf8 86a97a38 803998cc acpi!ACPIDispatchIrp+0xff
80399868 821a1605 00000000 85d104c0 86a97878 nt!IofCallDriver+0x63
80399884 8204912a 803998a8 82048f47 86a97878 nt!PnpAsynchronousCall+0x96
803998d0 821a24f6 82048f47 86a97878 86a7edc8 nt!PnpStartDevice+0xb7
8039992c 821a23b1 86a97878 00000012 00000000 nt!PnpStartDeviceNode+0x13a
80399948 8219f4db 00000000 00000000 8216f530 nt!PipProcessStartPhase1+0x65
80399b44 820489e8 85d58260 00000000 80399b88 nt!PipProcessDevNodeTree+0x187
80399b9c 820488f8 00000000 86a7e5d0 833fa3b8 nt!PnpDeviceActionWorker+0xde
80399bb8 82396078 00000000 00000007 00000000 nt!PnpRequestDeviceAction+0x127
80399c34 82398ff1 808108c4 8080e430 00000000 nt!IopInitializeBootDrivers+0x3b0
80399c94 8239ccb3 808108c4 85d28990 85d28668 nt!IoInitSystem+0x5af
80399d74 82195af1 80399dc0 82212a1c 808108c4 nt!Phase1InitializationDiscard+0xb86
80399d7c 82212a1c 808108c4 bac889e7 00000000 nt!Phase1Initialization+0xd
80399dc0 8206ba3e 82195ae4 808108c4 00000000 nt!PspSystemThreadStartup+0x9d
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
STACK_COMMAND: kb
FOLLOWUP_IP:
acpi!MapPhysMem+39
80627376 6a00 push 0
SYMBOL_STACK_INDEX: 4
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: acpi
IMAGE_NAME: acpi.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 47918b80
SYMBOL_NAME: acpi!MapPhysMem+39
FAILURE_BUCKET_ID: 0xA5_acpi!MapPhysMem+39
BUCKET_ID: 0xA5_acpi!MapPhysMem+39
Followup: MachineOwner
---------
0: kd> lmvm acpi
start end module name
8060e000 80654000 acpi (pdb symbols) C:/WINNT/Symbols/sys/acpi.pdb
Loaded symbol image file: acpi.sys
Image path: /SystemRoot/system32/drivers/acpi.sys
Image name: acpi.sys
Timestamp: Fri Jan 18 21:32:48 2008 (47918B80)
CheckSum: 00041A1F
ImageSize: 00046000
File version: 6.0.6001.18000
Product version: 6.0.6001.18000
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 3.7 Driver
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft® Windows® Operating System
InternalName: ACPI.sys
OriginalFilename: ACPI.sys
ProductVersion: 6.0.6001.18000
FileVersion: 6.0.6001.18000 (longhorn_rtm.080118-1840)
FileDescription: ACPI Driver for NT
LegalCopyright: © Microsoft Corporation. All rights reserved.
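This time the information is more detailed: it includes the call stack and the acpi.pdb symbol file information. After analysis, I concluded the following about the blue screen. The immediate cause is the code at acpi!MapPhysMem+39, and that code fails because the ACPI memory map in the BIOS is wrong: memory used by the OS has been claimed by the BIOS.
So the cause is roughly clear, but which piece of BIOS code is at fault? I calmed down and searched my memory. Since the error is in ACPI, it should be in the ASL code. I remembered that in the ASL code I had read, you usually declare an OperationRegion when you need system resources. For example, to use some I/O resources I would write: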
OperationRegion( IO_, SystemIO, 0x62, 5 )
To use SystemMemory, write it like this:
OperationRegion(XPEX, SystemMemory, 0xE0020100, 0x100)
The bug analysis info reports the following error:
Arguments:
Arg1: 00001000, ACPI_BIOS_USING_OS_MEMORY
ACPI had a fatal error when processing a memory
operation region.The memory operation region tried to
map memory that has been allocated for OS usage.
Arg2: 00000000, The high portion of the physical address
of the memory region.
Arg3: ffffff00, The low portion of the physical address
of the memory region.
Arg4: 00000105, The length of memory being mapped.
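Taken literally, I think the ASL code declares a SystemMemory region, and the acpi.sys driver failed when it parsed that region. So which piece of code is responsible? Arg3 and Arg4 give it away. It should be code written like this: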
OperationRegion(???, SystemMemory, 0xffffff00, 0x105)
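The step from the bugcheck arguments to the suspected OperationRegion can be sketched in Python. This is only an illustration: the helper name decode_a5_region is made up here, and the way Arg2 and Arg3 are combined simply follows the "high portion" / "low portion" wording in the !analyze output.

```python
def decode_a5_region(arg2, arg3, arg4):
    """Return (start, end_exclusive) of the physical range named by a
    bugcheck 0xA5 / ACPI_BIOS_USING_OS_MEMORY report.

    arg2: high 32 bits of the physical address
    arg3: low 32 bits of the physical address
    arg4: length of the mapping in bytes
    """
    start = (arg2 << 32) | arg3
    return start, start + arg4


# Values copied from the !analyze -v output in this article.
start, end = decode_a5_region(0x00000000, 0xFFFFFF00, 0x00000105)
print(hex(start), hex(end))        # start and exclusive end of the region
print(end > 0x1_0000_0000)         # does the region run past the 4 GiB mark?
```

With these values the region starts at 0xFFFFFF00 and its exclusive end is 0x100000005, so it reaches a few bytes past the 4 GiB boundary. That is plain arithmetic; whether it is related to the failure is not something the debugger output confirms.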
So I started searching the ASL code and eventually found the culprit's trail. The following ASL code is the prime suspect:
Scope(\)
{
    OperationRegion(ATFB, SystemMemory, 0xFFFFFF00, 0x105) // Relocatable OperationRegion.
    Field(ATFB, AnyAcc, NoLock, Preserve) // Field
    {
        BCMD, 8,
        DID, 32,
        INFO, 2048,
    }
}
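As a quick cross-check that this region is the one in the bugcheck, the field widths declared under ATFB can be added up and compared with Arg4. The small Python sketch below does only that arithmetic; the helper name field_bytes is made up for illustration.

```python
# Field widths in bits, copied from the Field(ATFB, ...) declaration.
ATFB_FIELDS = {"BCMD": 8, "DID": 32, "INFO": 2048}


def field_bytes(fields):
    """Total size of a field list in bytes, rounded up to a whole byte."""
    total_bits = sum(fields.values())
    return (total_bits + 7) // 8


size = field_bytes(ATFB_FIELDS)
print(size, hex(size))  # 8 + 32 + 2048 = 2088 bits = 261 bytes = 0x105
```

The total matches both the length in the OperationRegion declaration and Arg4 of the bugcheck (0x105).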
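After the BIOS team removed this code, the board worked normally and booted happily into the OS. The bug was fixed, but why does declaring such a region cause an error? The explanation from the BIOS team is that this code is never used, but its address range conflicts with the resources that the BIOS reports to the OS. Hence the blue screen :).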
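The kind of address conflict described here is an ordinary interval overlap. The Python sketch below shows such a check. Note that the ranges in os_ranges are invented purely for illustration: the article does not say which resource the region actually collided with, and the real memory map of the board is not known.

```python
def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) share a byte."""
    return start_a < start_b + len_b and start_b < start_a + len_a


# The region declared by the suspect ASL code: (start, length).
ATFB = (0xFFFFFF00, 0x105)

# HYPOTHETICAL ranges handed to the OS. These values are made up for this
# example and are NOT taken from the board discussed in the article.
os_ranges = [
    ("hypothetical RAM below 4 GiB", 0x0010_0000, 0xBFF0_0000),
    ("hypothetical RAM remapped above 4 GiB", 0x1_0000_0000, 0x4000_0000),
]

conflicts = [name for name, start, length in os_ranges
             if ranges_overlap(ATFB[0], ATFB[1], start, length)]
print(conflicts)
```

A check like this, run against the memory map the BIOS really reports, would be one way to catch such a declaration before the OS does.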
Peter