Previously I posted instructions for finding the source of a data abort; see Windows CE: Finding the cause of a Data Abort. This post walks through those steps to find the source in a real application. This is specific to Windows CE and later.
I have this data abort:
AKY=00000005 PC=02c138ac(lan91c111.dll+0x000038ac) RA=02c138a8(lan91c111.dll+0x000038a8) BVA=06000000 FSR=00000007
From this, I can see that it is in lan91c111.dll. Lan91c111.dll is my Ethernet driver. I was just making changes to it, so I could go back and review my changes for hints. But let's find it from the Data Abort output.
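If you handle these lines often, the fields can be pulled out mechanically. The following is only a minimal sketch, assuming the line has exactly the layout shown above; `AbortInfo` and `parse_data_abort` are hypothetical names made up for illustration, not part of any Windows CE tool.

```c
#include <stdio.h>
#include <string.h>

/* Fields of interest from a data abort line (hypothetical struct). */
typedef struct {
    char module[64];          /* module named in the PC field */
    unsigned long pc_offset;  /* offset after "+0x" in the PC field */
    unsigned long ra_offset;  /* offset after "+0x" in the RA field */
} AbortInfo;

/* Hypothetical helper: returns 1 on success, 0 if the line does not
   have the assumed "PC=addr(module+0xoffset) RA=addr(module+0xoffset)"
   layout. */
int parse_data_abort(const char *line, AbortInfo *out)
{
    const char *pc = strstr(line, "PC=");
    const char *ra = strstr(line, "RA=");

    if (pc == NULL || ra == NULL)
        return 0;

    /* Skip the absolute address, keep the module name and its offset. */
    if (sscanf(pc, "PC=%*x(%63[^+]+0x%lx", out->module, &out->pc_offset) != 2)
        return 0;

    /* For RA only the offset is kept. */
    if (sscanf(ra, "RA=%*x(%*[^+]+0x%lx", &out->ra_offset) != 1)
        return 0;

    return 1;
}
```

Passing the data abort line above to `parse_data_abort` fills in the module name lan91c111.dll, a PC offset of 0x38ac and an RA offset of 0x38a8.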
- We can also see that the Return Address (RA) is at 0x000038a8
- Subtract 0x1000 from it to find the Module Offset (MO): 0x000038a8 - 0x1000 = 0x000028a8
- We can now look up the Module Offset in lan91c111.map. Here is a small section of lan91c111.map:
Address Publics by Value Rva+Base Lib:Object
0001:00002780 READ_ETH_USHORT_32BIT_MODE 10003780 f LAN91C111_Init.obj
0001:00002798 WRITE_ETH_USHORT_32BIT_MODE 10003798 f LAN91C111_Init.obj
0001:000027cc WRITE_ETH_CUSTOMIZE_USHORT 100037cc f LAN91C111_Init.obj
0001:00002820 LAN91C_Write16 10003820 f LAN91C111_Init.obj
0001:00002884 LAN91C111_MiniportInitialize 10003884 f LAN91C111_Init.obj
0001:00002fc4 LAN91C111_MiniportISR 10003fc4 f LAN91C111_Intr.obj
Looking at the addresses we find that the MO is between 00002884 LAN91C111_MiniportInitialize and 00002fc4 LAN91C111_MiniportISR. That tells us that the Data Abort occurred in LAN91C111_MiniportInitialize. To calculate the Instruction Offset (IO), subtract the Function Offset (FO) from the Module Offset: 0x000028a8 - 0x00002884 = 0x24. The same calculation starting from the PC (0x000038ac - 0x1000 = 0x000028ac) gives 0x28, the instruction immediately after it.
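The lookup and the subtraction can be sketched in a few lines of C. This is an illustration only: the table holds just the rows from the map excerpt above, the 0x1000 adjustment is the one used in this post, and `MapEntry` and `locate` are hypothetical names.

```c
#include <stddef.h>

/* One "Publics by Value" row: offset in the code section and symbol name. */
typedef struct {
    unsigned long offset;
    const char *name;
} MapEntry;

/* Rows copied from the lan91c111.map excerpt, in ascending offset order. */
static const MapEntry kMapExcerpt[] = {
    { 0x2780UL, "READ_ETH_USHORT_32BIT_MODE" },
    { 0x2798UL, "WRITE_ETH_USHORT_32BIT_MODE" },
    { 0x27ccUL, "WRITE_ETH_CUSTOMIZE_USHORT" },
    { 0x2820UL, "LAN91C_Write16" },
    { 0x2884UL, "LAN91C111_MiniportInitialize" },
    { 0x2fc4UL, "LAN91C111_MiniportISR" },
};
static const size_t kMapExcerptCount =
    sizeof(kMapExcerpt) / sizeof(kMapExcerpt[0]);

/* The text subtracts 0x1000 from the DLL offset to get the Module Offset. */
#define DLL_TO_MODULE_DELTA 0x1000UL

/* Hypothetical helper: finds the function that contains dll_offset and
   stores the Instruction Offset. Returns NULL if the offset lies below
   the first entry in the table. */
const MapEntry *locate(unsigned long dll_offset,
                       unsigned long *instruction_offset)
{
    const MapEntry *found = NULL;
    unsigned long module_offset;
    size_t i;

    if (dll_offset < DLL_TO_MODULE_DELTA)
        return NULL;
    module_offset = dll_offset - DLL_TO_MODULE_DELTA;

    /* Keep the last entry that starts at or below the Module Offset. */
    for (i = 0; i < kMapExcerptCount; i++) {
        if (kMapExcerpt[i].offset <= module_offset)
            found = &kMapExcerpt[i];
    }

    if (found != NULL)
        *instruction_offset = module_offset - found->offset;
    return found;
}
```

With the values from this data abort, `locate(0x38a8, &io)` (the RA) resolves to LAN91C111_MiniportInitialize with an Instruction Offset of 0x24, and `locate(0x38ac, &io)` (the PC) resolves to the same function with 0x28. Because the table is only an excerpt, any offset above 0x2fc4 is attributed to LAN91C111_MiniportISR; a real lookup needs the whole map file.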
- The file that contains LAN91C111_MiniportInitialize is LAN91C111_Init.c, which we know from the name of the object file that the function is in (LAN91C111_Init.obj). What we need, though, is the COD file, which contains the C code as comments mixed with the assembly code that was generated when the file was compiled. The COD files are in the same folder as the OBJ files. If you don't have the COD files, set WINCECOD=1 and rebuild.
Looking in the COD file, find the function; in this case LAN91C111_MiniportInitialize. This is what mine looks like:
00000 AREA |.rdata| { |??_C@_1BC@ECFINIDN@?$AA?$CK?$AAp?$AAt?$AAr?$AA?5?$AA?$CF?$AAX?$AA?6?$AA?$AA@| }, DATA, READONLY, SELECTION=2 ; comdat any
|??_C@_1BC@ECFINIDN@?$AA?$CK?$AAp?$AAt?$AAr?$AA?5?$AA?$CF?$AAX?$AA?6?$AA?$AA@| DCB "*"
DCB 0x0, "p", 0x0, "t", 0x0, "r", 0x0, " ", 0x0, "%", 0x0, "X"
DCB 0x0, 0xa, 0x0, 0x0, 0x0 ; `string'
; Function compile flags: /Ogsy
00000 AREA |.text| { |LAN91C111_MiniportInitialize| }, CODE, ARM, SELECTION=1 ; comdat noduplicate
00000 |LAN91C111_MiniportInitialize| PROC
; 144 : {
00000 |$L48878|
00000 e92d47f0 stmdb sp!, {r4 - r10, lr}
00004 e24dd054 sub sp, sp, #0x54
00008 |$M48876|
00008 e1a06003 mov r6, r3
0000c e1a04002 mov r4, r2
00010 e1a07001 mov r7, r1
; 145 : NDIS_STATUS Status = NDIS_STATUS_SUCCESS;
; 146 : UINT ArrayIndex;
; 147 : PMINIPORT_ADAPTER Adapter;
; 148 : USHORT temp;
; 149 :
; 150 : LPVOID lpIOBase;
; 151 : BOOL RetVal;
; 152 : DWORD *ptr = NULL;
; 153 : WCHAR szFunctionName[] = L"LAN91C111_MiniportInitialize()";
00014 e59f1720 ldr r1, [pc, #0x720]
00018 e28d0014 add r0, sp, #0x14
0001c e3a0203e mov r2, #0x3E
00020 eb000000 bl memcpy
; 154 :
; 155 :
; 156 : RETAILMSG( 1, (TEXT("*ptr %X\n"), *ptr ));
00024 e3a03000 mov r3, #0
00028 e5931000 ldr r1, [r3]
0002c e59f0704 ldr r0, [pc, #0x704]
00030 eb000000 bl NKDbgPrintfW
The numbers on the left of the assembly code are offsets from the start of the function (Instruction Offsets), and we can see that at offset 0x28 we have:
00028 e5931000 ldr r1, [r3]
This loads from the address held in r3, which the previous instruction set to 0. It is the *ptr dereference in the C code above it:
; 156 : RETAILMSG( 1, (TEXT("*ptr %X\n"), *ptr ));
Now the hard part: why is dereferencing the pointer a problem? In this case it is because ptr is NULL, but you may need to get out a debugger to find the cause. At least we now know where the problem is.
In some cases, you may need to start with the Program Counter (PC) instead of the Return Address (RA) to find the source of the problem.
Update 10 June 2008
Sometimes the assembly line found is not really the source of the problem. This can be because of the CPU instruction pipeline. In the following real problem that I just had, the Instruction Offset of the data abort was 0x1C:
; 1047 : BOOL bRet = TRUE; // This will be set to FALSE by an unsuccesful IOCTL call
; 1048 : *lpBytesReturned= 0; // Make sure this is initially zero.
00010 e59d7020 ldr r7, [sp, #0x20]
; 1049 :
; 1050 : RETAILMSG( 1, (TEXT("XXX_IOControl code\n")));
00014 e59f0b00 ldr r0, [pc, #0xB00]
00018 e3a03000 mov r3, #0
0001c e5873000 str r3, [r7]
00020 e3a04001 mov r4, #1
00024 eb000000 bl NKDbgPrintfW
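But in this case the actual problem belongs to the source line a few lines up, which the listing starts at offset 0x10: the dereference of the pointer lpBytesReturned on line 1048. The instruction at 0x10 loads the pointer into r7, and the store through r7 at 0x1C is interleaved with the code for the RETAILMSG that follows. The application developer had passed in the value of, rather than the pointer to, the data.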
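To make the caller's mistake concrete, here is a small sketch in plain C. `XXX_IOControl_sketch` is a hypothetical stand-in that keeps only the lpBytesReturned handling from the listing; it is not the real driver entry point or its signature, and `call_with_pointer` is likewise made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver entry point; only the
   lpBytesReturned handling from the listing is reproduced. */
int XXX_IOControl_sketch(unsigned long *lpBytesReturned)
{
    /* This catches NULL only. A non-NULL value passed by mistake
       still gets through and faults on the store below. */
    if (lpBytesReturned == NULL)
        return 0;

    *lpBytesReturned = 0;   /* line 1048 in the listing */
    return 1;
}

/* Correct caller: passes the address of the variable. */
unsigned long call_with_pointer(void)
{
    unsigned long bytesReturned = 0xFFFFFFFFUL;

    /* The bug described above amounts to passing
       (unsigned long *)bytesReturned instead of &bytesReturned. */
    XXX_IOControl_sketch(&bytesReturned);
    return bytesReturned;
}
```

A NULL check in the driver would not have caught this particular bug, since the value passed in was not NULL; the fix belongs in the calling application.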
Reposted from: http://geekswithblogs.net/BruceEitman/archive/2008/06/02/platform-builder-find-the-source-of-a-data-abort-an.aspx