[Share] Basic Diagnosis of SRIO Errors

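I've noticed a lot of SRIO threads on the forum recently. I happened to put together some SRIO notes for a customer, so I'm sharing them here to get the ball rolling.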

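To be honest, I don't know SRIO very well; at least, I haven't read the whole SRIO specification O(∩_∩)O~. To write this up for the customer I consulted a lot of documents and books, so if anything is wrong, discussion and corrections are welcome. (There are plenty of SRIO experts on the forum, such as Zhan and Allen, hehe...)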

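SRIO is not actually the generic name; what you will find on Google is LP-Serial (I'll keep you guessing about what that stands for, for now). It is a protocol: it defines how two devices that both comply with it communicate. Note that it defines communication between two devices, not three or four, and this is an important point to understand. For example, if a switch connects 3 DSPs and 2 FPGAs, all 5 can communicate over SRIO, but fundamentally each of the 3 DSPs and 2 FPGAs is talking to the switch. Put another way, the protocol operates link by link, between the two ends of each link. I raise this because many customers report that packets sent from a DSP to an FPGA never arrive. If there is a switch in between, you need to check the DSP-to-switch link and the switch-to-FPGA link separately, rather than looking at DSP-to-FPGA as a whole. That is the essence of the protocol.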
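The per-link point turns naturally into a debugging checklist: walk the path hop by hop and stop at the first link that is not up. A minimal sketch; the `srio_link` struct, the `first_bad_hop` helper, and how `port_ok` gets filled in (for example from SPn_ERR_STAT on each end) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One physical SRIO link: exactly two devices, one on each end. */
typedef struct {
    const char *from;
    const char *to;
    bool port_ok;   /* true if both ends of this link report Port OK */
} srio_link;

/* Walk a multi-hop path (e.g. DSP -> switch -> FPGA) and return the index
 * of the first link that is not up, or -1 if every hop is Port OK. */
static int first_bad_hop(const srio_link *path, size_t n_links)
{
    for (size_t i = 0; i < n_links; i++) {
        if (!path[i].port_ok)
            return (int)i;
    }
    return -1;
}
```

For a DSP -> switch -> FPGA path, a result of 1 means the DSP side is fine and the switch-to-FPGA link is the one to debug.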

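Next, the protocol itself. It specifies that SRIO carries data at the physical layer in fixed packet formats. If you are working with a raw SRIO IP core, you have to assemble these packets yourself; if you use a TI chip, congratulations: TI's LSU (Load/Store Unit) assembles them for you, and you only need to configure the LSU registers. So when someone asks how to fill in the LSU, the answer is that once you understand the packet formats in the protocol and how the LSU fields map onto them, you won't have any questions. (Easier said than done, of course...)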
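To make "how the LSU maps onto the protocol" concrete, here is a minimal sketch of the packet-type fields an LSU-style engine fills in for common transfers. The enum, struct and function names are hypothetical and say nothing about TI's actual LSU register layout; only the (ftype, ttype) values come from the RapidIO logical-layer specification.

```c
#include <stdint.h>

/* Hypothetical transfer kinds a driver might expose. */
typedef enum {
    XFER_NREAD, XFER_NWRITE, XFER_NWRITE_R, XFER_SWRITE,
    XFER_MAINT_READ, XFER_MAINT_WRITE, XFER_DOORBELL
} xfer_kind;

/* The two header fields that select the packet type. */
typedef struct {
    uint8_t ftype;  /* format type: selects the packet class */
    uint8_t ttype;  /* transaction type within the class (0 if unused) */
} rio_type;

/* Map a transfer kind to its RapidIO (ftype, ttype) pair. */
static rio_type rio_type_for(xfer_kind k)
{
    switch (k) {
    case XFER_NREAD:       return (rio_type){2, 4};   /* Request class: NREAD */
    case XFER_NWRITE:      return (rio_type){5, 4};   /* Write class: NWRITE */
    case XFER_NWRITE_R:    return (rio_type){5, 5};   /* Write class: NWRITE_R */
    case XFER_SWRITE:      return (rio_type){6, 0};   /* Streaming write, no ttype */
    case XFER_MAINT_READ:  return (rio_type){8, 0};   /* Maintenance read request */
    case XFER_MAINT_WRITE: return (rio_type){8, 1};   /* Maintenance write request */
    case XFER_DOORBELL:    return (rio_type){10, 0};  /* Doorbell, no ttype */
    }
    return (rio_type){0, 0};  /* unreachable for valid kinds */
}
```

The destination ID, address and byte count are the other fields you program; together with this pair they determine the packet the LSU builds.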

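That brings us to SRIO error detection. When you hit an SRIO error (setting aside hardware signal-integrity problems, which require looking at the eye diagram), the first thing we usually check is the SPn_ERR_STAT register at offset 0x158. Its bit fields are listed in the table below. The register can be read as three groups: the port status; the input and output error-stop bits; and the retry-stop bits. Below we explain what each state means, group by group.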

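| Bit | Name | Description |
|-----|------|-------------|
| 0 | Port_Uninitialized | Input and output ports are not initialized. Bits 0 and 1 are mutually exclusive: at any time exactly one of them is 1 (set and cleared by hardware). |
| 1 | Port_Ok | Input and output ports are initialized, and the two link partners are exchanging error-free control symbols (set and cleared by hardware). |
| 2 | Port_Error | The input or output port has hit an error the hardware cannot recover from, mainly a link-response that was never received or was received in error. |
| 4 | Port_Write_Pnd | The port is requesting a port-write maintenance operation to report its error status. The port-write destination is configured in advance; when a port error occurs, a port-write maintenance packet is sent to it. |
| 8 | Input_Error_STP | The input port has detected a transmission error and is error-stopped (set and cleared by hardware). |
| 9 | Input_Error_ENC | The input port has detected a transmission error at some point; set together with bit 8; write 1 to clear. |
| 10 | Input_Retry_STP | The input port is in the retry-stopped state. |
| 16 | Output_Error_STP | The output port has detected a transmission error and is error-stopped (set and cleared by hardware). |
| 17 | Output_Error_ENC | The output port has detected a transmission error at some point; set together with bit 16; write 1 to clear. |
| 18 | Output_Retry_STP | The output port is in the retry-stopped state (set and cleared by hardware). |
| 19 | Output_Retried | Output port retry flag; set together with bit 18; write 1 to clear. |
| 20 | Output_Retry_Enc | The output port has been in the output retry-stopped state at some point. |
| 24 | Output_Degrd_Enc | The output port's degraded-error count has reached or exceeded its threshold. |
| 25 | Output_Fail_Enc | The output port's failed-error count has reached or exceeded its threshold. |
| 26 | Output_Pkt_Drop | The output port dropped a packet (switch devices only). |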
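When logging SPn_ERR_STAT, printing the names of the set bits is easier to read than a raw hex value. A minimal sketch using the bit positions from the table above (bit 0 = least significant bit); the helper names are hypothetical, and how you read the register (CSL call or raw pointer) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions and names as listed in the SPn_ERR_STAT table. */
typedef struct { unsigned bit; const char *name; } err_stat_field;

static const err_stat_field kErrStatFields[] = {
    {0, "Port_Uninitialized"}, {1, "Port_Ok"},          {2, "Port_Error"},
    {4, "Port_Write_Pnd"},     {8, "Input_Error_STP"},  {9, "Input_Error_ENC"},
    {10, "Input_Retry_STP"},   {16, "Output_Error_STP"}, {17, "Output_Error_ENC"},
    {18, "Output_Retry_STP"},  {19, "Output_Retried"},  {20, "Output_Retry_Enc"},
    {24, "Output_Degrd_Enc"},  {25, "Output_Fail_Enc"}, {26, "Output_Pkt_Drop"},
};

/* Store the names of set bits (lowest bit first) in out[], up to max
 * entries, and return the total number of known bits that are set. */
static size_t decode_err_stat(uint32_t value, const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof kErrStatFields / sizeof kErrStatFields[0]; i++) {
        if (value & (UINT32_C(1) << kErrStatFields[i].bit)) {
            if (n < max)
                out[n] = kErrStatFields[i].name;
            n++;
        }
    }
    return n;
}
```

For example, 0x00030002 decodes to Port_Ok, Output_Error_STP and Output_Error_ENC: the link is up, but the output side is error-stopped.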

 

Port uninitialized and Port Ok

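Port Uninitialized and Port OK form one pair of states: the port is either uninitialized or OK. The port normally starts out uninitialized and only becomes Port OK after the user configures (initializes) it. Port initialization consists mainly of aligning the receive clock window and settling the port width; in most cases the port width is fixed by configuration, so only the receive clock window needs adjusting.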

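The receive clock window is adjusted as follows: the two connected devices continuously send each other training control symbols and link-request control symbols. A port that successfully receives and detects a control symbol replies with an idle control symbol, and a port that receives the idle control symbol clears Port Uninitialized and moves to Port OK.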
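Because the port only reaches Port OK after both ends finish this exchange, software typically polls Port_Ok (bit 1) with a timeout after configuring the port. A minimal sketch; `read_err_stat_fn` and `wait_port_ok` are hypothetical, standing in for however you actually read SPn_ERR_STAT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor returning the current SPn_ERR_STAT value.
 * On hardware this would be a CSL call or a volatile register read. */
typedef uint32_t (*read_err_stat_fn)(void *ctx);

#define ERR_STAT_PORT_OK (UINT32_C(1) << 1)

/* Poll until Port_Ok is set or max_polls reads have been made.
 * Returns true if the port reached Port OK. */
static bool wait_port_ok(read_err_stat_fn read_stat, void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_stat(ctx) & ERR_STAT_PORT_OK)
            return true;   /* bit 1 set: initialization finished */
        /* still uninitialized (bit 0): clock-window alignment not done yet */
    }
    return false;
}
```

A real loop would also wait briefly between reads; if it times out, check the other end of the link and the signal quality before anything else.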

 

Input and Output Error Stop

Input error-stop and output error-stop always occur as a pair.

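Error scenario

1. Device A sends a packet to device B.
2. Device B detects an error in a received idle control symbol or packet, so device B enters the input error-stopped state (this bit is set to 1, and Input_Error_ENC is also set).
3. Device B sends a PNA (packet-not-accepted) control symbol to device A.
4. On receiving the PNA, device A stops sending anything, keeps a copy of the packet that failed, and enters the output error-stopped state (this bit is set to 1, and Output_Error_ENC is also set).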

 

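Recovery scenario

Precondition: device A is in output error-stopped and device B is in input error-stopped.

1. Device A sends a link-request to device B.
2. Device B replies to device A with a link-response and clears its input error-stopped state.
3. Device A receives the link-response and clears its output error-stopped state.
4. Device A resumes by resending the packet that failed, or by sending a higher-priority packet.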
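The two error-stop scenarios above can be captured as a toy state model that mirrors the register bits on each end. This is an illustrative sketch only: it models just the A-to-B direction, the type and function names are hypothetical, and control-symbol transport is collapsed into direct state changes.

```c
#include <stdbool.h>

typedef struct {
    bool a_output_error_stop;  /* mirrors A's Output_Error_STP (bit 16) */
    bool b_input_error_stop;   /* mirrors B's Input_Error_STP (bit 8) */
    int  saved_pkt;            /* packet A keeps for resending; -1 if none */
} err_stop_link;

/* B receives packet pkt from A; corrupt simulates a CRC or symbol error. */
static void b_receive(err_stop_link *l, int pkt, bool corrupt)
{
    if (l->b_input_error_stop)
        return;                          /* B drops everything while stopped */
    if (corrupt) {
        l->b_input_error_stop = true;    /* B enters input error-stop, sends PNA */
        l->a_output_error_stop = true;   /* A receives PNA and stops sending */
        l->saved_pkt = pkt;              /* A keeps a copy of the failed packet */
    }
}

/* A sends link-request, B answers with link-response. Returns the packet A
 * resends, or -1 if the link was not error-stopped. */
static int recover_error_stop(err_stop_link *l)
{
    if (!l->a_output_error_stop || !l->b_input_error_stop)
        return -1;
    l->b_input_error_stop = false;       /* B clears input error-stop as it answers */
    l->a_output_error_stop = false;      /* A clears output error-stop on link-response */
    int pkt = l->saved_pkt;
    l->saved_pkt = -1;
    return pkt;
}
```

Note that packets arriving at B while it is error-stopped are simply dropped, which is why A must keep its own copy for the resend.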

 

Input and Output Retry Stop

Input retry-stop and output retry-stop always occur as a pair.

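Error scenario

1. Device A sends a packet to device B.
2. Device B hits a temporary condition that keeps it from accepting the packet (for example, no free buffer to receive it), so device B discards the packet and enters the input retry-stopped state (this bit is set to 1).
3. Device B sends a PR (packet-retry) control symbol to device A.
4. On receiving the PR, device A stops sending anything, keeps a copy of the packet that failed, and enters the output retry-stopped state (this bit is set to 1, and Output_Retry_Enc is also set).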

 

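Recovery scenario

Precondition: device A is in output retry-stopped and device B is in input retry-stopped.

1. Device A sends a restart-from-retry control symbol to device B.
2. On receiving restart-from-retry, device B clears its input retry-stopped state and starts accepting packets again.
3. Device A clears its output retry-stopped state and resumes by resending the packet that failed, or by sending a higher-priority packet.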
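The retry-stop handshake can be modeled the same way, with B's buffer availability as the trigger. Again a toy sketch with hypothetical names, A-to-B direction only; control symbols are collapsed into direct state changes.

```c
#include <stdbool.h>

typedef struct {
    bool a_output_retry_stop;  /* mirrors A's Output_Retry_STP (bit 18) */
    bool b_input_retry_stop;   /* mirrors B's Input_Retry_STP (bit 10) */
    int  b_free_buffers;       /* free receive buffers at B */
    int  saved_pkt;            /* packet A keeps for resending; -1 if none */
    int  delivered;            /* packets B has accepted */
} retry_link;

/* A sends pkt to B. Returns true if B accepted it. */
static bool send_pkt(retry_link *l, int pkt)
{
    if (l->a_output_retry_stop)
        return false;                    /* A sends nothing while retry-stopped */
    if (l->b_free_buffers == 0) {
        l->b_input_retry_stop = true;    /* B discards the packet and sends PR */
        l->a_output_retry_stop = true;   /* A receives PR and stops */
        l->saved_pkt = pkt;              /* A keeps the packet for later */
        return false;
    }
    l->b_free_buffers--;
    l->delivered++;
    return true;
}

/* A sends restart-from-retry: both ends clear retry-stop and A resends. */
static bool restart_from_retry(retry_link *l)
{
    if (!l->a_output_retry_stop)
        return false;
    l->b_input_retry_stop = false;
    l->a_output_retry_stop = false;
    int pkt = l->saved_pkt;
    l->saved_pkt = -1;
    return send_pkt(l, pkt);             /* may retry-stop again if B is still full */
}
```

If B still has no buffer when the restart arrives, the resend simply triggers another retry-stop, which matches the "temporary condition" nature of this error.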

 

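It is worth pointing out that this register is the most basic one for diagnosing SRIO errors. There is more advanced material as well, but unfortunately I only half understand it; I'll share it next time once I've learned more!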

 

 

 

 

First, I don't often give praise for support but I must say Travis, Karthik and Derek from TI have been extremely instrumental in getting my SRIO environment to work and bringing me up to speed on tips and tricks for SRIO. It has been very nice to see success and progress. Thanks again guys!
So in an effort to consolidate much of what I have learned, I will post here some information that I would have found extremely helpful 5 weeks ago :)

My environment:

I was using an ATCA chassis with an MCH that has an SRIO switch. For more on my setup, please see this post: http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/164695.aspx There I list the parts I am using and some commands I found helpful when working with the MCH.

 

My Goal:

Get DSP-A on C6678 EVM-A to send a message to DSP-B on C6678 EVM-B via an SRIO switch.

 

TI Example Programs:

When the TI programs are in loopback mode, they seem to work just fine. They use the CSL (Chip Support Library) to access registers. The CSL is nice. Plus, once you open a CSL handle, you can access the structure with all the registers yourself (see the example in the "Things to check after Port Ok" section).

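To switch between loopback and "normal" mode, use these CSL calls (the second argument is the port number):

```c
CSL_SRIO_SetLoopbackMode(hSrio, 0);   /* port 0: internal loopback */
CSL_SRIO_SetNormalMode(hSrio, 0);     /* port 0: normal (external link) operation */
```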

While it seemed to make sense for me to use the MultiCoreLoopback example or the ChipToChip example, it turns out these are extremely complex and make it difficult to learn/understand what is going on.

Travis recommended using the loopbackDioIsr example project. This project is simpler, using fewer of the DSP's queuing capabilities. The program essentially accesses the exact memory address given in the LSU (Load/Store Unit) registers, on the device with the programmed destination ID. So in loopback mode it is just another location in DSP-A; in normal mode it is a memory location in DSP-B (assuming you set up the destination IDs). Be careful: this also means you can write to any location in memory. Any!

Switch and routing issues:

- Remember that all device IDs have to be unique. So if you use the same example program on each DSP (A and B), it won't work unless you change the device IDs.

- Remember that the switch needs to be configured to route packets by destination ID. This can be done with maintenance packets or by configuring the switch directly. In my case, the switch has a default configuration file that I modified to route from DSP-A to DSP-B.

- Make sure the switch enables input and output on the port you are using. If they are not enabled, only maintenance packets will be received; all other types will be dropped.

- My switch could only accept one connection from one device ID. The TI SRIO peripheral can have multiple port connections, but they always share the same device ID. The examples are written to make four 1-lane connections to the other device, so you might need to adjust that part of the examples if your switch is like mine.

- Travis says port-writes should be disabled unless you have a specific reason to use them.

- Some switches need to be specially enabled to accept 16-bit destination IDs. Just something to keep in mind.
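Two of the points above (unique device IDs, and a switch route for every destination ID) are easy to sanity-check in software before debugging the link itself. A minimal sketch; the `route_entry` table is hypothetical and does not reflect how any particular switch stores its routes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route-table entry: packets for dest_id leave on out_port. */
typedef struct {
    uint16_t dest_id;
    uint8_t  out_port;
} route_entry;

/* True if no two endpoints share a device ID. */
static bool ids_unique(const uint16_t *ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ids[i] == ids[j])
                return false;
    return true;
}

/* Output port for dest_id, or -1 if the table has no route for it. */
static int route_lookup(const route_entry *tbl, size_t n, uint16_t dest_id)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].dest_id == dest_id)
            return tbl[i].out_port;
    return -1;
}
```

Running the same example image on both DSPs is exactly the case `ids_unique` catches; a -1 from `route_lookup` corresponds to a destination ID the switch was never configured to route.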

Understanding Ack IDs (from Travis):

Normal handshaking at the physical layer would be like this:

1. Device A sends a packet to device B with ackID n.
2. There is a transmission error on packet ackID n.
3. Device B sees a CRC error and goes into the input error-stopped state (drops all new RX packets).
4. Device B sends a PNA control symbol to device A.
5. On reception of the PNA, device A goes into the output error-stopped state (stops sending any new packets).
6. Device A sends a link-request/input-status (LR input status) control symbol to device B.
7. On reception of the LR input status, device B sends a link maintenance response (link-response) control symbol indicating that packet ackID n was the PNA'd packet. Device B then also returns to normal mode.
8. On reception of the link-response, device A returns to normal mode and starts resending packets to device B, starting with packet ackID n.
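The last step implies a bookkeeping rule: the sender must hold every unacknowledged packet, because the link-response tells it where to restart. A minimal sketch of that retransmit window, assuming 5-bit ackIDs (modulo 32) and an 8-packet window; the `tx_window` type and function names are hypothetical, and the normal packet-accepted path that retires packets is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define ACKID_MOD       32u  /* assumption: 5-bit ackIDs */
#define MAX_OUTSTANDING 8u   /* assumption: window size */

typedef struct {
    uint32_t payload[MAX_OUTSTANDING]; /* unacknowledged packets, oldest first */
    uint8_t  first_ackid;              /* ackID of payload[0] */
    size_t   count;                    /* number of unacknowledged packets */
    uint8_t  next_ackid;               /* ackID for the next new packet */
} tx_window;

/* Queue a packet for transmission; returns its ackID, or -1 if full. */
static int tx_send(tx_window *w, uint32_t payload)
{
    if (w->count == MAX_OUTSTANDING)
        return -1;
    if (w->count == 0)
        w->first_ackid = w->next_ackid;
    w->payload[w->count++] = payload;
    int id = w->next_ackid;
    w->next_ackid = (uint8_t)((w->next_ackid + 1u) % ACKID_MOD);
    return id;
}

/* Handle a link-response reporting n, the ackID the receiver expects next
 * (the PNA'd packet). Packets before n were accepted and are retired; the
 * rest are copied to out[] in resend order. Returns how many to resend. */
static size_t tx_link_response(tx_window *w, uint8_t n, uint32_t *out)
{
    size_t accepted = (size_t)((n + ACKID_MOD - w->first_ackid) % ACKID_MOD);
    if (accepted > w->count)
        accepted = w->count;           /* guard against a bogus response */
    size_t resend = w->count - accepted;
    for (size_t i = 0; i < resend; i++) {
        w->payload[i] = w->payload[accepted + i];  /* compact the window */
        out[i] = w->payload[i];
    }
    w->first_ackid = (uint8_t)((w->first_ackid + accepted) % ACKID_MOD);
    w->count = resend;
    return resend;
}
```

The resent packets keep their original ackIDs (n onward), so `next_ackid` is left unchanged.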

Things to check after "Port Ok":

After you get Port OK, if you are having problems sending messages, here are some registers you should check (the offsets listed are for port 0):

- ERR_Stat (TI register 0xB158)

- LM_RESP (TI register 0xB144)

- ERR_Det (TI register 0xC040)

- SP0_CTL (TI register 0xB15C)

Together, these told me that the switch was not accepting my packets, and in the end led to the discovery that the switch had not enabled input and output messaging and was only accepting maintenance packets.
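When you end up checking these registers over and over, a small dump helper saves time. A minimal sketch, assuming `base` points at the memory-mapped SRIO register block (on the C6678 it would come from your CSL setup); the offsets are the port-0 ones listed above, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Port-0 debug registers, offsets relative to the SRIO register base. */
typedef struct { uint32_t offset; const char *name; } dbg_reg;

static const dbg_reg kPort0DebugRegs[] = {
    {0xB158, "ERR_Stat"},
    {0xB144, "LM_RESP"},
    {0xC040, "ERR_Det"},
    {0xB15C, "SP0_CTL"},
};
#define N_DEBUG_REGS (sizeof kPort0DebugRegs / sizeof kPort0DebugRegs[0])

/* Read each register into out[], in the same order as kPort0DebugRegs. */
static void read_port0_debug_regs(const volatile uint8_t *base, uint32_t *out)
{
    for (size_t i = 0; i < N_DEBUG_REGS; i++)
        out[i] = *(const volatile uint32_t *)(base + kPort0DebugRegs[i].offset);
}

/* Print the values captured by read_port0_debug_regs. */
static void print_port0_debug_regs(const uint32_t *vals)
{
    for (size_t i = 0; i < N_DEBUG_REGS; i++)
        printf("%-8s (0x%04X) = 0x%08X\n", kPort0DebugRegs[i].name,
               (unsigned)kPort0DebugRegs[i].offset, (unsigned)vals[i]);
}
```

Reading all four values first and printing afterwards keeps the snapshot as close in time as possible.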

If you ever see the Output Error Stop condition or the Input Error Stop condition, there is a magic number that is to be written to a register. In fact, Travis recommends doing this no matter what after receiving "Port Ok".

```c
/* i is the port index; 0x2003F044 is the value recommended by TI support */
hSrio->RIO_PLM[i].RIO_PLM_SP_LONG_CS_TX1 = 0x2003F044;
System_printf("SRIO (Core %i): Correct Output Error Stop Condition.\n", coreNum);
```

After you have sent messages using the LSU, there is an LSU status register that is very helpful for indicating if the transfer was good or not.

Maintenance Packets:

Here is a short snippet of code I wrote to read a register from the switch via a maintenance packet. This function works with the dioIsr example.

```c
/* coreNum, srioLsuIsrServiced, SRIO_DIO_LSU_ISR_TIMEOUT and DEVICE_ID4_8BIT
 * come from the dioIsr example project. */
static Int32 maintanenceReadReg(Srio_SockHandle handle, UInt32 srioReg)
{
    Srio_SockAddrInfo to;
    uint16_t          counter;
    uint32_t          startTime;   /* unsigned so (TSCL - startTime) survives counter wrap */
    UInt8            *pReadRespBuf;
    UInt8            *pTmpRead;

    /* 4-byte buffer that receives the maintenance read response */
    pReadRespBuf = (UInt8 *)Osal_srioDataBufferMalloc(4);
    if (pReadRespBuf == NULL)
    {
        System_printf("Error: pReadRespBuf Memory allocation failed.\n");
        return -1;   /* do not continue with a NULL buffer */
    }

    /* Pre-fill with a known pattern so a missing response is easy to spot */
    pTmpRead = pReadRespBuf;
    for (counter = 0; counter < 4; counter++)
    {
        *pTmpRead++ = 0x55;
    }

    to.dio.rapidIOMSB = 0x0;
    to.dio.rapidIOLSB = srioReg;          /* register offset on the target, e.g. 0x15C */
    to.dio.dstID      = DEVICE_ID4_8BIT;  /* destination ID of the target (the switch here) */
    to.dio.ttype      = 0;                /* maintenance read request */
    to.dio.ftype      = 8;                /* maintenance packet */

    /* Send the DIO request. */
    if (Srio_sockSend_DIO(handle, pReadRespBuf, 4, &to) < 0)
    {
        System_printf("Error: (Core %d): Could not send message.\n", coreNum);
        Osal_srioDataBufferFree(pReadRespBuf, 4);   /* free on every error path */
        return -1;
    }

    /* Wait for the interrupt to occur without touching the peripheral.
     * Other useful work could be done here, such as invoking a scheduler. */
    startTime = TSCL;
    while ((!srioLsuIsrServiced) && ((TSCL - startTime) < SRIO_DIO_LSU_ISR_TIMEOUT));

    if (!srioLsuIsrServiced)
    {
        System_printf("ISR didn't happen within set time - %d cycles. Example failed !!!\n",
                      SRIO_DIO_LSU_ISR_TIMEOUT);
        Osal_srioDataBufferFree(pReadRespBuf, 4);
        return -1;
    }

    /* The response data is in pReadRespBuf[0..3]; copy it out here if needed */
    Osal_srioDataBufferFree(pReadRespBuf, 4);
    return 0;
}
```

Calling the function:

```c
maintanenceReadReg(mySrioSocket, 0x15C);

/* Clear all pending LSU interrupts and reset the ISR flag before the next transfer */
CSL_SRIO_ClearLSUPendingInterrupt(hSrioCSL, 0xFFFFFFFF, 0xFFFFFFFF);
srioLsuIsrServiced = 0;
```

Clearing the pending LSU interrupt (and resetting srioLsuIsrServiced) is important: it has to happen after each transmission (at least in this example).

Various Other Posts to check:

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/168310.aspx


http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/167006.aspx

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/165949.aspx

 

I am sure I have forgotten a few things, but hopefully this will get you started. Post away; hopefully Travis, Derek or Karthik will see it and be able to help!

 

Good luck!

 

Brandy

PS - thanks again guys, it feels great to be moving forward!!

Reposted from: https://www.cnblogs.com/fpga/archive/2013/03/06/2947194.html
