Debugging VxWorks with Hardware Breakpoints

In the VxWorks 5.5 shell, the following command sets a hardware breakpoint:
-> bh   address, access, task, count, quiet
access: 0 - instruction
        1 - read/write data
        2 - read data
        3 - write data
For example, to monitor data writes to the address 0x27b5600, use:
-> bh   0x27b5600, 3, 0, 0, 0
When any task tries to write data to the address 0x27b5600, the breakpoint fires and that task is suspended.
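The access codes can be captured in a small lookup, which is handy when a script or a wrapper has to build bh commands. This is only a sketch: the enum and the function name are hypothetical, and the values are simply the ones listed above.

```c
#include <stddef.h>

/* Access codes accepted by the bh shell command, as listed above.
 * The enum and function names are hypothetical, for illustration only. */
enum bh_access {
    BH_ACCESS_INSTRUCTION = 0,
    BH_ACCESS_READ_WRITE  = 1,
    BH_ACCESS_READ        = 2,
    BH_ACCESS_WRITE       = 3
};

/* Return a description of an access code, or NULL if it is not in the list. */
const char *bh_access_name(int access)
{
    switch (access) {
    case BH_ACCESS_INSTRUCTION: return "instruction";
    case BH_ACCESS_READ_WRITE:  return "read/write data";
    case BH_ACCESS_READ:        return "read data";
    case BH_ACCESS_WRITE:       return "write data";
    default:                    return NULL;
    }
}
```

For the command above, bh_access_name(3) returns "write data".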

Here is an example of debugging a stack overflow with a hardware breakpoint. It comes from an IPv6 CR, which makes a good demonstration.
---------------------
1. Background
---------------------
In IPv6, when an interface is configured with a new address, the switch sends out an NS (Neighbor Solicitation) message to determine whether the address is already used by another switch.
If it is, the switch receives an NA (Neighbor Advertisement) message in response and gives up the address. This process is called DAD (Duplicate Address Detection). DAD is performed for both the IPv6 management interface and the other general IPv6 interfaces.

----------------
2. Problem
----------------
When the tester assigns the same IPv6 management address to different switches, she gets the following error message:
SW WARNING checkStack: task: 2 tid: 0x27699a8 name: tNetTask size: 9984 cur: 248 high: 9984 margin: 0
It means that the stack of the task tNetTask has overflowed or been corrupted while processing the incoming DAD NA message.

----------------------
3. Investigation
----------------------
This issue might be caused by stack overflow or by stack corruption, so we need to reproduce it and analyze the stack information.
Step (1): Make the related tasks breakable. Since tNetTask is the task that overflows in this case, we start with it.
In the shell, run the following command:
-> taskOptionsSet(tNetTask, 7, 5)
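The arguments are a bit mask (7) and the new option bits (5). A minimal model of the effect, assuming the usual VxWorks 5.x option values (they match the "options: 0x5" decoding that ti prints later in this article): the mask selects the three low bits, and the new value 5 leaves VX_UNBREAKABLE (0x2) cleared, which is what makes the task breakable. The function name is hypothetical and this is not the VxWorks source.

```c
/* Option bits, assuming the VxWorks 5.x taskLib.h values. */
#define VX_SUPERVISOR_MODE 0x0001
#define VX_UNBREAKABLE     0x0002
#define VX_DEALLOC_STACK   0x0004

/* Hypothetical model: bits selected by mask are replaced by the
 * corresponding bits of new_options; all other bits are kept. */
unsigned int apply_task_options(unsigned int old_options,
                                unsigned int mask,
                                unsigned int new_options)
{
    return (old_options & ~mask) | (new_options & mask);
}
```

With old options 0x7, apply_task_options(0x7, 7, 5) yields 0x5, that is, VX_SUPERVISOR_MODE and VX_DEALLOC_STACK without VX_UNBREAKABLE.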

Step (2): Select the address to be monitored.
We need to select an address in the stack of tNetTask to monitor.
In the shell, the following command shows some general stack information for the task tNetTask.
-> ti tNetTask
---------------------------------------------------------------------------------------------------------------
  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tNetTask   netTask       2692518  50 READY        1423c0  2692420       0     0

stack: base 0x2692518  end 0x268fe08  size 9984   high 2344   margin 7640 

options: 0x5
VX_SUPERVISOR_MODE    VX_DEALLOC_STACK    

VxWorks Events
--------------
Events Pended on    : Not Pended
Received Events       : 0x0
Options                    : N/A

r0     =        0   sp     =  2692420   r2     =        0    r3     =        0
r4     =        0   r5      =        0       r6     =        0    r7     =        0
r8     =        0   r9      =        0      r10    =        0    r11    =        0
r12    =        0   r13    =        0      r14    =        0    r15    =        0
r16    =        0   r17    =        0      r18    =        0    r19    =        0
r20    =        0   r21    =        0      r22    =        0    r23    =        0
r24    =        0   r25    =        0      r26    =        0    r27    =        0
r28    =        0   r29    =    ffffffff      r30    =   b030    r31    =  17e0700
msr   =  b030     lr     =        0       ctr    =        0     pc     =   1423c0
cr     = 20000043  xer =        0
value = 0 = 0x0
-------------------------------------------------------------------------------------------------------------

As we can see, the stack end address is 0x268fe08. Let us display the memory near this address.
-> d 0x268fe08, 20, 4
-------------------------------------------------------------------------------------------------
0268fe00:                                744e6574 5461736b   *   tNetTask*
0268fe10:  00eeeeee eeeeeeee eeeeeeee eeeeeeee   *................*
0268fe20:  eeeeeeee eeeeeeee eeeeeeee eeeeeeee   *................*
0268fe30:  eeeeeeee eeeeeeee eeeeeeee eeeeeeee   *................*
0268fe40:  eeeeeeee eeeeeeee eeeeeeee eeeeeeee   *................*
0268fe50:  eeeeeeee eeeeeeee                                 *................*
value = 21 = 0x15
--------------------------------------------------------------------------------------------------

As shown above, the name of tNetTask is stored at its stack end address. Normally it changes only if the stack overflows or is corrupted, so we select this address as the one to monitor.
-> bh 0x268fe08,3,0,0,0
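The 0xee bytes in the dump are the stack fill pattern, and the "high" and "margin" figures printed by ti are consistent with scanning for where that pattern ends. The sketch below models such a scan on a plain byte buffer. The function names are hypothetical, it ignores the task name bytes stored at the stack end, and it is not the VxWorks implementation.

```c
#include <stddef.h>

#define STACK_FILL_BYTE 0xee  /* fill value visible in the dump above */

/* Count the bytes at the low-address end of a stack region that still hold
 * the fill value. The stack grows downward, so this is the unused margin. */
size_t stack_margin(const unsigned char *stack_end, size_t stack_size)
{
    size_t margin = 0;
    while (margin < stack_size && stack_end[margin] == STACK_FILL_BYTE)
        margin++;
    return margin;
}

/* High-water mark: the part of the stack that has been written at least once. */
size_t stack_high(const unsigned char *stack_end, size_t stack_size)
{
    return stack_size - stack_margin(stack_end, stack_size);
}
```

A margin of 0, as in the warning message from section 2, means that even the lowest byte of the stack no longer holds the fill value.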

Step (3): Reproduce the problem
When I reproduce the problem, the hardware breakpoint fires with the following information:
------------------------------------------------------------------------------------------------------------------------------------------------
Break at 0x0268fe08: G_MacAddrCapacity+0x4933c0   Task: 0x2692518 (tNetÞ®/}DìWò¸°:Ú7ðPð)
------------------------------------------------------------------------------------------------------------------------------------------------

It is obvious that the address 0x268fe08 was overwritten by tNetTask itself, so I suspect that the problem is a stack overflow rather than stack corruption. I still need to dump and analyze the stack information to confirm this and to find the reason for the overflow.

Step (4): Dump and Analyze the stack of tNetTask
This time, "ti tNetTask" no longer works as before, because the task name stored at the stack end has been overwritten.
-> ti tNetTask
----------------------------------------
Undefined symbol: tNetTask
-----------------------------------------

We can use its TID instead. The TID of tNetTask, 0x2692518, is given in the break message in Step (3). We could also get the TID with the command "i".
-> ti 0x2692518
----------------------------------------------------------------------------------------------------------------------
  NAME        ENTRY       TID    PRI   STATUS      PC       SP     ERRNO  DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tNetÞ®/}DnetTask         2692518  50 SUSPEND       a0d08  268f8b0       0     0

stack: base 0x2692518  end 0x268fe08  size 9984   high 9984   margin 0    

options: 0x5
VX_SUPERVISOR_MODE  VX_DEALLOC_STACK    

VxWorks Events
--------------
Events Pended on    : Not Pended
Received Events     : 0x0
Options             : N/A

r0     =   ba78c4     sp     =  268f8b0   r2     =        0         r3    =  12be6e8
r4     =  268fe0c      r5     =        412   r6     =        0         r7    = 3e07841c
r8     =        0         r9     =  1520000   r10   =      14c      r11    =        0
r12    =        0       r13    =        0        r14   =        0       r15    =        0
r16    =        0       r17    =        0        r18   =        0       r19    =  124d1b8
r20    =  2690b40   r21    =      420      r22   =  124d1bc   r23    =  2690e60
r24    =        0       r25    =        0        r26   =  2690d40   r27    =        4
r28    =  268f930   r29    =  268f930     r30   =  15235a8   r31    =  2690d60
msr    =     b030      lr    =   107a04     ctr   =      137       pc     =    a0d08
cr     = 20842043  xer    =        0
value = 0 = 0x0
----------------------------------------------------------------------------------------------------------------------

We can see that tNetTask has been suspended by the hardware breakpoint. The sp register holds the address of the top stack frame, 0x268f8b0, which is lower than the stack end address 0x268fe08. Since the stack grows from high addresses to low addresses, the stack pointer has already run past the end of the stack.
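How far the stack pointer has run past the end follows directly from the two addresses. A small sketch of the arithmetic (the function name is hypothetical):

```c
#include <stdint.h>

/* Bytes by which a downward-growing stack has run past its end address.
 * Returns 0 while sp is still inside the stack. */
uint32_t stack_overrun_bytes(uint32_t stack_end, uint32_t sp)
{
    return (sp < stack_end) ? (stack_end - sp) : 0;
}
```

For the values above, stack_overrun_bytes(0x268fe08, 0x268f8b0) is 0x558, or 1368 bytes. For the sp value 0x2692420 shown in Step (2), the result is 0.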

VxWorks has a shell command that prints the stack trace of a task:
-> tt 0x2692518
--------------------------------------------------
trcStack aborted: error in top frame
--------------------------------------------------

In our case it does not work, because the part of the stack that overflowed may have been overwritten by other tasks. I have to reconstruct the call stack by hand.
-> d 0x268f8b0, 50, 4
--------------------------------------------------------------------------------------------------
0268f8b0:  0268f8d0 00000000 00000000 00000000   *.h..............*
0268f8c0:  00000000 0268f930 015235a8 02690d60   *.....h.0.R5..i.`*
0268f8d0:  0268f910 00ba78c4 00000000 00000000   *.h....x.........*
0268f8e0:  00000000 00000000 00000000 00000000   *................*
0268f8f0:  00000000 00000000 02690d40 02690e60   *.........i.@.i.`*
0268f900:  0268f930 0268f920 02690d60 02690d60   *.h.0.h. .i.`.i.`*
0268f910:  026909a0 004ca2ec 00000000 00000000   *.i...L..........*
0268f920:  00000000 00000000 00000000 00000000   *................*
0268f930:  00000000 00000000 00000000 00000000   *................*
0268f940:  00000000 00000000 00000000 00000000   *................*
0268f950:  00000000 00000000 00000000 00000000   *................*
0268f960:  00000000 00000000 00000000 00000000   *................*
0268f970:  00000000 00000000                     *................*
value = 21 = 0x15
--------------------------------------------------------------------------------------------------

The word at address 0x268f8b0 (the current sp) has the value 0x0268f8d0, which is the address of the next stack frame, that is, the frame of the caller. Let us analyze this stack frame:
-------------------------------------------------------------------------------------------------
0268f8d0:  0268f910 00ba78c4 00000000 00000000   *.h....x.........*
-------------------------------------------------------------------------------------------------

The word at address 0x0268f8d4 is the return address. We can look up the function it belongs to:
-> lkAddr 0x00ba78c4
----------------------------------------------------------
0x00ba780c BF_set_key                text    
0x00ba7a30 BIO_new                    text    
0x00ba7ac8 BIO_set                     text    
0x00ba7b80 BIO_free                    text    
0x00ba7c50 BIO_read                   text    
0x00ba7d8c BIO_write                   text    
0x00ba7efc BIO_puts                    text    
0x00ba8014 BIO_gets                   text    
0x00ba813c BIO_int_ctrl                text    
0x00ba8164 BIO_ptr_ctrl                text    
0x00ba81a0 BIO_ctrl                     text    
0x00ba82b8 BIO_callback_ctrl        text    
value = 0 = 0x0
-----------------------------------------------------------
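Reading this listing amounts to finding the last symbol whose address is not greater than the return address: 0x00ba78c4 lies after BF_set_key (0x00ba780c) and before BIO_new (0x00ba7a30). A minimal sketch of that lookup over a sorted table follows. The struct and function names are hypothetical, and this is not how lkAddr itself is implemented.

```c
#include <stddef.h>
#include <stdint.h>

struct symbol {
    uint32_t    addr;
    const char *name;
};

/* Return the symbol with the greatest address <= target, or NULL if the
 * target lies below the first symbol. The table must be sorted by
 * ascending address. */
const struct symbol *nearest_symbol(const struct symbol *table, size_t count,
                                    uint32_t target)
{
    const struct symbol *best = NULL;
    for (size_t i = 0; i < count && table[i].addr <= target; i++)
        best = &table[i];
    return best;
}
```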

So it belongs to the function BF_set_key. Applying the same method to each frame in turn, we eventually get the whole call stack:
-------------------------------------
vxTaskEntry()
netTask()
dec21x40RxIntHandle()
dec21x40Recv()
endRcvRtnCall()
muxReceive()
endEtherInputHookRtn()
rcip6InputSniffer()
ipv6ProcessFrame()
ifyDipRx()
processIngressPacket()
ifyRpcInProcLocalPkt()
v6ProcLocalPkt()
v6InnerProcLocalPtk()
v6NdRx()
v6procNbrAdv()
ifyDADComplete()
duReport()
bf_encrypt_NP_info()
BF_set_key()
------------------------------------
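The manual walk can be sketched in code. The sketch assumes the frame layout seen in the dump above: the word at offset 0 of a frame points to the caller's frame, and the word at offset 4 of the caller's frame holds the return address. It works on a captured dump rather than live memory, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A captured memory dump: words[i] holds the 32-bit word at base + 4*i. */
struct mem_dump {
    uint32_t        base;
    const uint32_t *words;
    size_t          count;
};

/* Read one word from the dump; returns 0 on success, -1 if out of range. */
static int dump_read(const struct mem_dump *m, uint32_t addr, uint32_t *out)
{
    if (addr < m->base || (addr - m->base) % 4 != 0)
        return -1;
    size_t idx = (addr - m->base) / 4;
    if (idx >= m->count)
        return -1;
    *out = m->words[idx];
    return 0;
}

/* Follow the back chain starting at sp. For each caller frame found, store
 * the return address saved at frame + 4. Stops when a frame pointer does
 * not move toward higher addresses or leaves the dump. Returns the number
 * of return addresses stored. */
size_t walk_back_chain(const struct mem_dump *m, uint32_t sp,
                       uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    uint32_t frame, lr;
    while (n < max && dump_read(m, sp, &frame) == 0 && frame > sp
           && dump_read(m, frame + 4, &lr) == 0) {
        ret_addrs[n++] = lr;
        sp = frame;
    }
    return n;
}
```

Fed with the words dumped from 0x268f8b0, the walk yields 0x00ba78c4 and then 0x004ca2ec before the chain leaves the dumped range. Each address would then be resolved with lkAddr as shown earlier.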

---------------------
4. Root Cause
---------------------
Further investigation shows that the call stack itself has no errors. But when I look into the code of the function bf_encrypt_NP_info, I find that it declares a huge local structure:
int bf_encrypt_NP_info(const unsigned char *inText, char *retText)
{
        char iv[8];
        int enc_data_length=0;
        BF_KEY key;
        …
}

typedef struct bf_key_st
    {
    BF_LONG P[BF_ROUNDS+2];
    BF_LONG S[4*256];                    /* 4*4*256 = 4096 bytes */
    } BF_KEY;

In Step (2), we saw that the stack size of tNetTask is only 9984 bytes, far less than that of tMainTask (81232). When the function bf_encrypt_NP_info is called, its local variables use up the remaining free space on the stack, and the stack overflows.
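The size of the structure can be checked with sizeof. The sketch below rests on two assumptions: BF_LONG is 32 bits wide (implied by the 4*4*256 = 4096 annotation above) and BF_ROUNDS is 16 (the usual Blowfish round count, which the code above does not show). The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 32-bit BF_LONG and BF_ROUNDS == 16 (not stated above). */
#define BF_ROUNDS 16
typedef uint32_t BF_LONG;

typedef struct bf_key_st {
    BF_LONG P[BF_ROUNDS + 2];   /* 18 * 4 =   72 bytes */
    BF_LONG S[4 * 256];         /* 1024 * 4 = 4096 bytes */
} BF_KEY;

/* Stack space left for all other frames once one object of the given
 * size has been placed on a stack of stack_size bytes. */
size_t stack_left_after(size_t stack_size, size_t object_size)
{
    return (object_size < stack_size) ? stack_size - object_size : 0;
}
```

Under these assumptions sizeof(BF_KEY) is 4168 bytes, so this one local variable takes about 42% of the 9984-byte stack, and stack_left_after(9984, sizeof(BF_KEY)) leaves 5816 bytes for every other frame in the call chain listed in section 3.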
