Symptom: the following messages appear in the VxWorks log:
A:tmd_check curr_buff 0x2d26ee50 len 14224 extra 0x0 {stack.c:667}
A:tmd_check 0xdeadbeef app 0x2e89e660 {stack.c:641}
A:tmd_check {stack.c:651}
A:tmd_check OVERWRITTEN buffer {stack.c:661}
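This indicates that memory has been overwritten.
1. Run lkAddr 0x2d26ee50 to list the symbols around the address. The address lies in the data segment (which holds initialized global variables), right next to the global variable gifa_pbLaserExpPwr.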
% lkAddr 0x2d26ee50
0x2d26db60 gOACurrentState abs
0x2d26ee60 gifa_pbLaserExpPwr data
0x2d26ee68 gRecord_TP_Info data
0x2d26ee6c gIfaLogModuleInfo data
0x2d27038c gIfaDbgInfo data
0x2d2725f0 neConfigTable.0 bss (local)
0x2d272610 neConfigTable.1 bss (local)
0x2d272630 timeDefects.2 bss (local)
0x2d272638 portDefects.3 bss (local)
0x2d272684 autuDefects.4 bss (local)
0x2d2726a0 timeDefects.5 bss (local)
0x2d2726a8 neConfigTable.6 bss (local)
value = 0 = 0x0
2. Dump the memory at the overwritten address
% d 0x2d26ee50
2d26ee50:  2d27 25e0 2d26 db50 0000 1bc8 0000 0000   *-'%.-&.P........*
2d26ee60:  0000 0000 0000 0000 0000 0001 0000 0000   *................*
2d26ee70:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ee80:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ee90:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eea0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eeb0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eec0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eed0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eee0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eef0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef00:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef10:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef20:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef30:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef40:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
value = 21 = 0x15
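2d27 25e0: start address of the next memory block
2d26 db50: start address of the previous memory block
0000 1bc8: total length of the block (0x1bc8 * 2 = 14224 bytes)
00: the free byte in the header structure. It must hold one of the three values below, so it has been corrupted.
#define BUFFER_USED 0xE8 // weird numbers to decrease likelihood of hitting on accident
#define BUFFER_CACHE 0x7E
#define BUFFER_FREE 0xC6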
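To make the header layout above concrete, here is a minimal C sketch that decodes the first 16 bytes of a block as they appear in the dump. The names blk_hdr_view, be32, decode_hdr and free_flag_valid are illustrative, and the field offsets (next at 0, previous at 4, word count at 8, free byte at 12) are inferred from the dump, not taken from the real header definition.

```c
#include <stdint.h>

#define BUFFER_USED  0xE8
#define BUFFER_CACHE 0x7E
#define BUFFER_FREE  0xC6

/* Hypothetical decoded view of a block header; not the real struct. */
typedef struct {
    uint32_t next;    /* start address of the next block */
    uint32_t prev;    /* start address of the previous block */
    uint32_t nWords;  /* block length in 2-byte words */
    uint8_t  free;    /* BUFFER_USED, BUFFER_CACHE or BUFFER_FREE */
} blk_hdr_view;

/* Read a 32-bit big-endian value from raw bytes. */
uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the first 16 dumped bytes of a block (offsets are assumptions). */
blk_hdr_view decode_hdr(const uint8_t raw[16])
{
    blk_hdr_view h;
    h.next   = be32(raw);
    h.prev   = be32(raw + 4);
    h.nWords = be32(raw + 8);
    h.free   = raw[12];
    return h;
}

/* The free byte is valid only if it is one of the three magic values. */
int free_flag_valid(uint8_t f)
{
    return f == BUFFER_USED || f == BUFFER_CACHE || f == BUFFER_FREE;
}
```

Fed the 16 bytes at 0x2d26ee50 from the dump, the decoded free byte is 0x00, which matches none of the three values, so the header is flagged as corrupted.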
3. Inspect the debug information at the tail of the block
lptr = (unsigned int *)((char *)hdr + 2*hdr->nWords - hdr->extra * 4 - 8);
lptr = 0x2d26ee50 + 0x3790 - 0 - 8 = 0x2d2725d8
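The address arithmetic above can be checked with a small helper. This is only a sketch: trailer_addr is a hypothetical name, and addresses are treated as plain 32-bit integers so the calculation can run on a host machine without touching target memory.

```c
#include <stdint.h>

/*
 * Mirrors: lptr = (char *)hdr + 2*hdr->nWords - hdr->extra*4 - 8
 *   hdr_addr : address of the block header
 *   n_words  : hdr->nWords, the block length in 2-byte words
 *   extra    : hdr->extra, multiplied by 4 as in the formula
 */
uint32_t trailer_addr(uint32_t hdr_addr, uint32_t n_words, uint32_t extra)
{
    return hdr_addr + 2u * n_words - 4u * extra - 8u;
}
```

For the block in question, trailer_addr(0x2d26ee50, 0x1bc8, 0) gives 0x2d2725d8, the address dumped next.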
0x2d2725d8 ----------- content
2d2725d0:                      dead beef 2e89 e660   *        .....`*
2d2725e0:  2d4b c060 2d26 ee50 0012 4d40 e800 0000   *-K.`-&.P..M@....*
2d2725f0:  0100 0000 0000 0000 0000 0000 0000 0000   *................*
2d272600:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d272610:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d272620:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
dead beef: overflow detection marker
2e89 e660: debug information (appid)
Running ti 0x2e89e660 shows that this ID belongs to the task TM_daemon (the main_task of all tasks). The owning task is not meaningful here. It only shows that the block holds global variables, because TM_daemon creates the memory for all of them.
% ti 0x2e89e660
  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
TM_daemon  5c7f8        2e89e660 179 PEND+T     2ac978   2e89e338 3d0004  2586
stack: base 0x2e89e660 end 0x2e89abc0 size 14992 high 4736 margin 10256
options: 0xc
VX_DEALLOC_STACK VX_FP_TASK
VxWorks Events
--------------
Events Pended on : Not Pended
Received Events  : 0x0
Options          : N/A
r0  = 71728    sp  = 2e89e338 r2  = 0        r3  = 0
r4  = 1770     r5  = 2e89e660 r6  = 2029230  r7  = b3
r8  = 12958dc  r9  = 4        r10 = 2e89e660 r11 = 8
r12 = 2ac7fc   r13 = 0        r14 = 0        r15 = 420000
r16 = 30000    r17
4. Set a hardware breakpoint
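bh 0x2d26ee50, 0x3, 0, 0, 0
The breakpoint suspends the task that writes to this address, which shows that the ifa task is the one causing the crash.
In the Ifa_Init function, line 828:
ifx_saveEndTime_moveToNext(&gIfaHandleMsgData);
Stack at the time of the problem: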
task 0x7f95980 ifa stack
[0]0x2d1cb9f0 IfaEpgSllHead+0
[1]0x27e7e58 Ifa_Init+1448
[2]0x5f7bc tmd_TaskEnd+992
[3]0x3fe09c vxTaskEntry+84
[4]0x0 (UNKNOWN)+0
5. Set the breakpoint again and check the state of the task. It confirms that ifa is the task performing the memory write.
% i ifa
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
ifa 5f62c 7f95980 228 SUSPEND 27ec3ec 7f937a8 3d0002 0
6. Step 1 already gave the relevant symbols, so review the code that uses them. The finding is that the index into gifa_pbLaserExpPwr can be -1.
int gifa_pbLaserExpPwr[2];
/* ***********code **************/
gifa_pbLaserExpPwr [uiFuncPortNo-1]=pPortCfg->sPortCfg.pbLaserExpPwr;
/* ***********code **************/
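One possible guard is sketched below. It is not the actual fix, which the note does not show. It validates the port number before using it as an index. The helper name ifa_setLaserExpPwr, the constant IFA_LASER_PORTS, and the assumption that uiFuncPortNo is an unsigned, 1-based port number are all illustrative.

```c
#define IFA_LASER_PORTS 2   /* matches int gifa_pbLaserExpPwr[2] */

int gifa_pbLaserExpPwr[IFA_LASER_PORTS];

/*
 * Store value for a 1-based port number (assumed convention).
 * Returns 0 on success, or -1 when the port number is out of range,
 * including 0, which would otherwise address the element at index -1.
 */
int ifa_setLaserExpPwr(unsigned int uiFuncPortNo, int value)
{
    if (uiFuncPortNo < 1 || uiFuncPortNo > IFA_LASER_PORTS) {
        return -1;  /* reject instead of writing outside the array */
    }
    gifa_pbLaserExpPwr[uiFuncPortNo - 1] = value;
    return 0;
}
```

With this check a port number of 0 is rejected, so the word just before the array (here the free byte of the block header) is never written.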
7. Memory dump after the fix (normal)
% d 0x2d26ee50
2d26ee50:  2d27 25e0 2d26 db50 0000 1bc8 e800 0000   *-'%.-&.P........*
2d26ee60:  ffff ffce 0000 0000 0000 0001 4946 4100   *............IFA.*
2d26ee70:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ee80:  0000 0000 0000 0000 0000 0000 4552 524f   *............ERRO*
2d26ee90:  5200 0000 0000 0000 0000 0000 0000 0000   *R...............*
2d26eea0:  0000 0000 0000 0000 0000 0000 0000 0003   *................*
2d26eeb0:  00c8 4572 726f 7220 696e 666f 2e00 0000   *..Error info....*
2d26eec0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eed0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eee0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26eef0:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef00:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef10:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef20:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
2d26ef30:  0000 0000 4d53 4700 0000 0000 0000 0000   *....MSG.........*
2d26ef40:  0000 0000 0000 0000 0000 0000 0000 0000   *................*
value = 21 = 0x15
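e800 0000 is the word that had been overwritten. Global variables are allocated in one place by tm_task (the layout is fixed at compile time), and a single memory block may contain several global variables.
Background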
-> bh address, access, task, count, quiet
access: 0 - instruction,
1 - read/write data,
2 - read data,
3 - write data
The basic structure of a memory block is as follows:
Big Endian
0x12345678
low address                    high address
----------------------------------------->
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   12   |   34   |   56   |   78   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
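As a quick illustration of this byte order, the following sketch writes a 32-bit value into a byte array with the most significant byte at the lowest address, which is how the memory dumps in this note are to be read. store_be32 is an illustrative helper, not a VxWorks API.

```c
#include <stdint.h>

/* Store v into out[0..3], most significant byte at the lowest address. */
void store_be32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}
```

Calling store_be32(0x12345678, buf) leaves buf holding 12 34 56 78 from low address to high address, matching the diagram.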