Hi3559AV100 NNIE Development (1): Parsing the Parameters of the RFCN (.wk) LoadModel Function
From here on, my posts will focus mainly on the NNIE development series. This post covers Hi3559AV100 NNIE development (1): the RFCN NNIE LoadModel function and its parameters. Working through LoadModel makes the contents of a .wk file much easier to understand, and it gives a reference for what to change when loading the .wk file of a different model.
In the RFCN demo, the model parameters are extracted from the RFCN .wk model file through a function call, as follows:
    static SAMPLE_SVP_NNIE_MODEL_S s_stRfcnModel = {0};

    HI_CHAR *pcModelName = "./data/nnie_model/detection/inst_rfcn_resnet50_cycle_352x288.wk";

    /* function input arguments */
    SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName, &s_stRfcnModel);
The prototype of SAMPLE_COMM_SVP_NNIE_LoadModel is:
    HI_S32 SAMPLE_COMM_SVP_NNIE_LoadModel(
        HI_CHAR *pszModelFile,
        SAMPLE_SVP_NNIE_MODEL_S *pstNnieModel)
The SAMPLE_SVP_NNIE_MODEL_S parameter of LoadModel is defined as:
    typedef struct hiSAMPLE_SVP_NNIE_MODEL_S
    {
        SVP_NNIE_MODEL_S stModel;
        SVP_MEM_INFO_S   stModelBuf; /* stores the model file */
    } SAMPLE_SVP_NNIE_MODEL_S;
Its member SVP_NNIE_MODEL_S is defined as:
    /* NNIE model */
    typedef struct hiSVP_NNIE_MODEL_S
    {
        SVP_NNIE_RUN_MODE_E enRunMode;  /* enum: run mode of the network model */

        HI_U32 u32TmpBufSize;           /* temp buffer size (auxiliary memory) */
        HI_U32 u32NetSegNum;            /* number of segments executed by NNIE, range [1,8] */
        SVP_NNIE_SEG_S astSeg[SVP_NNIE_MAX_NET_SEG_NUM];                /* information on the segments executed on the NNIE engine */
        SVP_NNIE_ROIPOOL_INFO_S astRoiInfo[SVP_NNIE_MAX_ROI_LAYER_NUM]; /* ROIPooling info */

        SVP_MEM_INFO_S stBase;          /* other network information */
    } SVP_NNIE_MODEL_S;
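enRunMode: an enumeration giving the run mode of the network model. It has two values: SVP_NNIE_RUN_MODE_CHIP (the model can only run on the chip) and SVP_NNIE_RUN_MODE_FUNC_SIM (the model can only be used for functional simulation on a PC). Printing this field shows which mode the RFCN model was built for.
u32TmpBufSize: the size of the auxiliary (temporary) buffer.
u32NetSegNum: the number of network segments that NNIE executes, in the range 1 to 8. A model may be split into several segments at run time, some executed on NNIE and others on the CPU or DSP. Before showing a diagram, here is the SVP reference design for extension layers:
When a network contains Non-support layers, it has to be split, and the unsupported parts are implemented by the user on the CPU, DSP, or similar; these are collectively called non-NNIE execution. The whole network then runs in segments of the form NNIE -> non-NNIE -> NNIE ... Taking Faster RCNN as an example, see the figure below:
u32NetSegNum is the number of segments that run on NNIE. If a whole model runs on NNIE, u32NetSegNum is 1. As the figure above shows, the NNIE part of Faster RCNN runs in two segments, so u32NetSegNum = 2.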
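To make the counting rule concrete, here is a minimal sketch. It assumes a network can be described as a flat list of layers, each tagged as NNIE or non-NNIE; the `exec_t` type and `count_nnie_segments` function are made up for illustration and are not part of the SDK. Each maximal run of consecutive NNIE layers counts as one segment.

```c
#include <stddef.h>

/* Hypothetical executor tag for one layer (illustration only, not an SDK type). */
typedef enum { EXEC_NNIE, EXEC_NON_NNIE } exec_t;

/* Count maximal runs of consecutive NNIE layers; each run is one NNIE segment. */
unsigned count_nnie_segments(const exec_t *layers, size_t n)
{
    unsigned segs = 0;
    for (size_t i = 0; i < n; i++) {
        /* A segment starts at an NNIE layer that is first or follows a non-NNIE layer. */
        if (layers[i] == EXEC_NNIE && (i == 0 || layers[i - 1] != EXEC_NNIE))
            segs++;
    }
    return segs;
}
```

With a layout of NNIE, NNIE, non-NNIE, NNIE (the Faster RCNN pattern described above, with the non-NNIE part in the middle) the function returns 2; a network that runs entirely on NNIE gives 1.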
astSeg[SVP_NNIE_MAX_NET_SEG_NUM]: an array of structures. SVP_NNIE_MAX_NET_SEG_NUM (the maximum number of segments) is defined as 8 in hi_nnie.h. Each element describes one NNIE segment. The fields are those of SVP_NNIE_SEG_S (segment information), a member of SVP_NNIE_MODEL_S:
    /* Segment information */
    typedef struct hiSVP_NNIE_SEG_S
    {
        SVP_NNIE_NET_TYPE_E enNetType; /* type of the segment */
        HI_U16 u16SrcNum;              /* number of input nodes of the segment */
        HI_U16 u16DstNum;              /* number of output nodes of the segment */
        HI_U16 u16RoiPoolNum;          /* number of RoiPooling and PSRoiPooling layers in the segment */

        HI_U16 u16MaxStep;             /* maximum number of "frames" in an RNN/LSTM sequence */

        HI_U32 u32InstOffset;
        HI_U32 u32InstLen;

        SVP_NNIE_NODE_S astSrcNode[SVP_NNIE_MAX_INPUT_NUM];  /* i-th input node; SVP_NNIE_MAX_INPUT_NUM is 16 */

        SVP_NNIE_NODE_S astDstNode[SVP_NNIE_MAX_OUTPUT_NUM]; /* i-th output node; SVP_NNIE_MAX_OUTPUT_NUM is 16 */

        HI_U32 au32RoiIdx[SVP_NNIE_MAX_ROI_LAYER_NUM_OF_SEG]; /* Roipooling info index: index of the segment's i-th RoiPooling or
                                                                 PsRoiPooling in the SVP_NNIE_ROIPOOL_INFO_S array of SVP_NNIE_MODEL_S;
                                                                 SVP_NNIE_MAX_ROI_LAYER_NUM_OF_SEG is 2 */
    } SVP_NNIE_SEG_S;
enNetType in SVP_NNIE_SEG_S is an enumeration:
    /* Network type; see the table later for examples */
    typedef enum hiSVP_NNIE_NET_TYPE_E
    {
        SVP_NNIE_NET_TYPE_CNN       = 0x0, /* Non-ROI input cnn net: ordinary CNN/DNN */
        SVP_NNIE_NET_TYPE_ROI       = 0x1, /* With ROI input cnn net: takes the boxes output by the RPN stage */
        SVP_NNIE_NET_TYPE_RECURRENT = 0x2, /* RNN or LSTM net */

        SVP_NNIE_NET_TYPE_BUTT
    } SVP_NNIE_NET_TYPE_E;
It defines three network types (SVP_NNIE_NET_TYPE_BUTT is only the end marker). SVP_NNIE_NET_TYPE_CNN is an ordinary CNN. SVP_NNIE_NET_TYPE_ROI is a network type that carries the box information output by the RPN stage; as I understand it, this relates to the Proposal layer of the Faster RCNN NNIE model, which holds the RPN output boxes and is executed by the CPU. SVP_NNIE_NET_TYPE_RECURRENT is an RNN (recurrent neural network) or LSTM (long short-term memory) network.
u16SrcNum in SVP_NNIE_SEG_S: the number of input nodes of the segment, that is, how many inputs the segment has. It is also the number of valid elements in the astSrcNode array.
u16DstNum in SVP_NNIE_SEG_S: the number of output nodes of the segment, that is, how many outputs the segment has. It is also the number of valid elements in the astDstNode array.
astSrcNode and astDstNode in SVP_NNIE_SEG_S: the details of the segment's input and output nodes. Their type is SVP_NNIE_NODE_S (node):
    /* Node information */
    typedef struct hiSVP_NNIE_NODE_S
    {
        SVP_BLOB_TYPE_E enType;  /* type of the node */
        union
        {
            struct
            {
                HI_U32 u32Width;  /* width of the node's memory shape */
                HI_U32 u32Height; /* height of the node's memory shape */
                HI_U32 u32Chn;    /* channel count of the node's memory shape */
            } stWhc;

            HI_U32 u32Dim;        /* vector dimension of the node's memory */
        } unShape;

        HI_U32 u32NodeId;         /* Id of the node in the network */
        HI_CHAR szName[SVP_NNIE_NODE_NAME_LEN]; /* Report layer bottom name or data layer bottom name */
    } SVP_NNIE_NODE_S;
enType in SVP_NNIE_NODE_S (node), which sits inside SVP_NNIE_SEG_S (segment) inside SVP_NNIE_MODEL_S (model), is an enumeration of type SVP_BLOB_TYPE_E:
    /* Blob type */
    typedef enum hiSVP_BLOB_TYPE_E
    {
        SVP_BLOB_TYPE_S32      = 0x0, /* Blob elements are S32 */

        SVP_BLOB_TYPE_U8       = 0x1, /* Blob elements are U8 */

        /* channel = 3 */
        SVP_BLOB_TYPE_YVU420SP = 0x2, /* Blob memory layout is YVU420SP */

        /* channel = 3 */
        SVP_BLOB_TYPE_YVU422SP = 0x3, /* Blob memory layout is YVU422SP */

        SVP_BLOB_TYPE_VEC_S32  = 0x4, /* Blob stores vectors; each element is S32 */

        SVP_BLOB_TYPE_SEQ_S32  = 0x5, /* Blob stores sequences; elements are S32 */

        SVP_BLOB_TYPE_BUTT
    } SVP_BLOB_TYPE_E;
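(Taking Faster RCNN as an example) Printing astSeg from SVP_NNIE_MODEL_S, that is, the input and output node information of the two NNIE segments, gives the following:
From the printout, first look at the segment and node types. They matter for the later analysis, because some of the initialization that follows branches on these types, as in the table below:
Now compare the printout with the network diagram below. My understanding of the node name szName is that an input node shows the name of the layer's bottom and an output node shows the name of its top. Segment 1 has one input node, the input of the data layer. The printout shows four output nodes, while the diagram shows only three: conv5, rpn_bbox_pred and rpn_cls_prob_reshape. The extra one is rpn_cls_score. In the RuyiStudio network diagram, rpn_cls_score sits in the middle of the first segment and is not an output, so why is it reported as one? The answer is in the network description file (.prototxt). Here is the rpn_cls_score layer:
    layer {
      name: "rpn_cls_score"
      type: "Convolution"
      bottom: "rpn/output"
      top: "rpn_cls_score_report"
      convolution_param {
        num_output: 18   # 2(bg/fg) * 9(anchors)
        kernel_size: 1 pad: 0 stride: 1
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
      }
    }
The top of this layer is rpn_cls_score_report, that is, rpn_cls_score with the suffix _report. Section 3.2.7 of the HiSVP Development Guide describes this:
As it shows, adding _report to the top of an intermediate layer makes that top an output of the segment, which is why the printout lists rpn_cls_score among the outputs of this segment.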
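The naming relationship seen here (top "rpn_cls_score_report", reported node "rpn_cls_score") can be mimicked with plain string handling. The sketch below is only that: a hypothetical helper, `strip_report_suffix`, that drops a trailing "_report" from a top name. It is an assumption based on the names in this post and says nothing about how the mapper is implemented internally.

```c
#include <stddef.h>
#include <string.h>

/* Copy `top` into `out` (capacity out_len, which must be at least 1),
 * dropping a trailing "_report" if present.
 * Returns 1 if the suffix was found, 0 otherwise. */
int strip_report_suffix(const char *top, char *out, size_t out_len)
{
    static const char suffix[] = "_report";
    const size_t slen = sizeof(suffix) - 1;
    size_t len = strlen(top);
    int marked = (len >= slen && strcmp(top + len - slen, suffix) == 0);

    if (marked)
        len -= slen;          /* cut the marker off */
    if (len >= out_len)
        len = out_len - 1;    /* truncate to fit the output buffer */
    memcpy(out, top, len);
    out[len] = '\0';
    return marked;
}
```

A helper like this can be handy when checking a prototxt against the node names printed from astDstNode.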
Clicking the Mark button in RuyiStudio opens the network topology of the automatically marked prototxt (Faster RCNN as the example):
Hi3559AV100 NNIE Development (2): How the RFCN (.wk) LoadModel and NNIE Init Functions Run
Analyzing the implementation of LoadModel and the NNIE Init function, together with the parameter walkthrough in the previous post, goes a long way toward understanding how NNIE initialization works and gives a reference for porting other models to NNIE. Below are the execution of the RFCN LoadModel function and the initialization of the NNIE RFCN parameters.
pszModelFile is the path and file name of the .wk model for the RFCN object detector. The pstNnieModel structure is analyzed in detail in the previous post (Hi3559AV100 NNIE Development (1): Parsing the Parameters of the RFCN (.wk) LoadModel Function: https://www.cnblogs.com/iFrank/p/14500648.html).
In the RFCN demo, the model parameters are extracted from the RFCN .wk model file through this call:
    static SAMPLE_SVP_NNIE_MODEL_S s_stRfcnModel = {0};

    HI_CHAR *pcModelName = "./data/nnie_model/detection/inst_rfcn_resnet50_cycle_352x288.wk";

    /* function input arguments */
    SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName, &s_stRfcnModel);
Now for LoadModel itself, going inside the function to clear up the details. Here is SAMPLE_COMM_SVP_NNIE_LoadModel in full:
    /* SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName,
                                      &s_stRfcnModel); */
    HI_S32 SAMPLE_COMM_SVP_NNIE_LoadModel(
        HI_CHAR * pszModelFile,
        SAMPLE_SVP_NNIE_MODEL_S *pstNnieModel)
    {
        HI_S32 s32Ret = HI_INVALID_VALUE;
        HI_U64 u64PhyAddr = 0;
        HI_U8 *pu8VirAddr = NULL;
        HI_SL slFileSize = 0;
        /*Get model file size*/
        FILE *fp=fopen(pszModelFile,"rb");
        SAMPLE_SVP_CHECK_EXPR_RET(NULL == fp,s32Ret,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error, open model file failed!\n");
        s32Ret = fseek(fp,0L,SEEK_END); /* move the file pointer to the end of the file */
        SAMPLE_SVP_CHECK_EXPR_GOTO(-1 == s32Ret,FAIL_0,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error, fseek failed!\n");
        slFileSize = ftell(fp); /* get the file size in bytes */
        SAMPLE_SVP_CHECK_EXPR_GOTO(slFileSize <= 0,FAIL_0,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error, ftell failed!\n");
        s32Ret = fseek(fp,0L,SEEK_SET); /* move the file pointer back to the start */
        SAMPLE_SVP_CHECK_EXPR_GOTO(-1 ==s32Ret,FAIL_0,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error, fseek failed!\n");

        /*malloc model file mem: allocate physical and virtual memory sized to the file*/
        s32Ret = SAMPLE_COMM_SVP_MallocMem("SAMPLE_NNIE_MODEL",
                                           NULL,
                                           (HI_U64*)&u64PhyAddr, /* 0 */
                                           (void**)&pu8VirAddr,  /* NULL */
                                           slFileSize);          /* ftell(fp) */
        SAMPLE_SVP_CHECK_EXPR_GOTO(HI_SUCCESS != s32Ret,FAIL_0,SAMPLE_SVP_ERR_LEVEL_ERROR,
            "Error(%#x),Malloc memory failed!\n",s32Ret);

        pstNnieModel->stModelBuf.u32Size = (HI_U32)slFileSize;     /* file size */
        pstNnieModel->stModelBuf.u64PhyAddr = u64PhyAddr;          /* physical address */
        pstNnieModel->stModelBuf.u64VirAddr = (HI_U64)pu8VirAddr;  /* virtual address */

        /* read the whole wk file into the virtual address */
        s32Ret = fread(pu8VirAddr,slFileSize,1,fp);
        SAMPLE_SVP_CHECK_EXPR_GOTO(1 != s32Ret,FAIL_1,SAMPLE_SVP_ERR_LEVEL_ERROR,
            "Error,read model file failed!\n");

        /*load model: parse the network model out of the wk data buffer*/
        s32Ret = HI_MPI_SVP_NNIE_LoadModel(&pstNnieModel->stModelBuf, /* in: model data buf */
                                           &pstNnieModel->stModel);   /* out: network model structure */

        SAMPLE_SVP_CHECK_EXPR_GOTO(HI_SUCCESS != s32Ret,FAIL_1,SAMPLE_SVP_ERR_LEVEL_ERROR,
            "Error,HI_MPI_SVP_NNIE_LoadModel failed!\n");

        fclose(fp);
        return HI_SUCCESS;
    FAIL_1:
        SAMPLE_SVP_MMZ_FREE(pstNnieModel->stModelBuf.u64PhyAddr,pstNnieModel->stModelBuf.u64VirAddr);
        pstNnieModel->stModelBuf.u32Size = 0;
    FAIL_0:
        if (NULL != fp)
        {
            fclose(fp);
        }

        return HI_FAILURE;
    }
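LoadModel performs the following steps:
(1) get the size of the .wk file;
(2) allocate physical and virtual memory for the .wk file according to that size;
(3) read the .wk file into the virtual address;
(4) parse the network model information from the .wk data buffer.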
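Steps (1) to (3) are ordinary C file handling and can be tried on a PC. The sketch below follows the same fopen / fseek / ftell / fread sequence, with the assumption that plain malloc stands in for the MMZ allocation (SAMPLE_COMM_SVP_MallocMem), so there is no physical address. Step (4), HI_MPI_SVP_NNIE_LoadModel, needs the SDK and is left out. The function name `read_whole_file` is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a heap buffer.
 * Returns NULL on failure. On success *size_out holds the byte count
 * and the caller must free() the returned buffer. */
unsigned char *read_whole_file(const char *path, long *size_out)
{
    unsigned char *buf = NULL;
    long size;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0L, SEEK_END) != 0)      /* step (1): seek to the end ... */
        goto fail;
    size = ftell(fp);                      /* ... and read the offset as the size */
    if (size <= 0)
        goto fail;
    if (fseek(fp, 0L, SEEK_SET) != 0)      /* rewind before reading */
        goto fail;
    buf = malloc((size_t)size);            /* step (2): malloc replaces the MMZ allocation */
    if (buf == NULL)
        goto fail;
    if (fread(buf, (size_t)size, 1, fp) != 1) /* step (3): one read of the whole file */
        goto fail;
    fclose(fp);
    *size_out = size;
    return buf;

fail:
    free(buf);                             /* free(NULL) is a no-op */
    fclose(fp);
    return NULL;
}
```

As in the sample, an empty file is treated as an error, and a single cleanup path closes the file on every failure.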
After these steps the model is stored in s_stRfcnModel.stModel. For what this structure holds, see my previous post, Hi3559AV100 NNIE Development (1): Parsing the Parameters of the RFCN (.wk) LoadModel Function: https://www.cnblogs.com/iFrank/p/14500648.html. Here is a brief list of the segments and their input and output nodes (using Faster RCNN as the example, because the HiSVP documentation uses it for its detailed explanations, so this matches the documentation):
Once SAMPLE_COMM_SVP_NNIE_LoadModel has extracted the parameters from the .wk model, the structure pointer is handed over with s_stRfcnNnieParam.pstModel = &s_stRfcnModel.stModel;. Next comes the NNIE parameter initialization for the RFCN algorithm:
    /*
    stNnieCfg.pszPic= NULL;
    stNnieCfg.u32MaxInputNum = 1; //max input image num in each batch
    stNnieCfg.u32MaxRoiNum = 300;
    stNnieCfg.aenNnieCoreId[0] = SVP_NNIE_ID_0; //set NNIE core for 0-th Seg
                                                //the NNIE engine with index 0
    stNnieCfg.aenNnieCoreId[1] = SVP_NNIE_ID_0; //set NNIE core for 1-th Seg
    stNnieCfg.aenNnieCoreId[2] = SVP_NNIE_ID_0; //set NNIE core for 2-th Seg

    s_stRfcnNnieParam.pstModel = &s_stRfcnModel.stModel;
    s_stRfcnSoftwareParam.apcRpnDataLayerName[0] = "rpn_cls_score";
    s_stRfcnSoftwareParam.apcRpnDataLayerName[1] = "rpn_bbox_pred";

    s32Ret = SAMPLE_SVP_NNIE_Rfcn_ParamInit(&stNnieCfg,
                                            &s_stRfcnNnieParam,
                                            &s_stRfcnSoftwareParam);
    */

    /* the initialization function */
    static HI_S32 SAMPLE_SVP_NNIE_Rfcn_ParamInit(
        SAMPLE_SVP_NNIE_CFG_S* pstCfg,
        SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam, /* pstModel */
        SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S* pstSoftWareParam)
    {
        HI_S32 s32Ret = HI_SUCCESS;
        /*init hardware para*/
        s32Ret = SAMPLE_COMM_SVP_NNIE_ParamInit(pstCfg,pstNnieParam);
        SAMPLE_SVP_CHECK_EXPR_GOTO(HI_SUCCESS != s32Ret,INIT_FAIL_0,SAMPLE_SVP_ERR_LEVEL_ERROR,
            "Error(%#x),SAMPLE_COMM_SVP_NNIE_ParamInit failed!\n",s32Ret);

        /*init software para*/
        s32Ret = SAMPLE_SVP_NNIE_Rfcn_SoftwareInit(pstCfg,pstNnieParam,pstSoftWareParam);
        SAMPLE_SVP_CHECK_EXPR_GOTO(HI_SUCCESS != s32Ret,INIT_FAIL_0,SAMPLE_SVP_ERR_LEVEL_ERROR,
            "Error(%#x),SAMPLE_SVP_NNIE_Rfcn_SoftwareInit failed!\n",s32Ret);

        return s32Ret;
    INIT_FAIL_0:
        s32Ret = SAMPLE_SVP_NNIE_Rfcn_Deinit(pstNnieParam,pstSoftWareParam,NULL);
        SAMPLE_SVP_CHECK_EXPR_RET(HI_SUCCESS != s32Ret,s32Ret,SAMPLE_SVP_ERR_LEVEL_ERROR,
            "Error(%#x),SAMPLE_SVP_NNIE_Rfcn_Deinit failed!\n",s32Ret);
        return HI_FAILURE;

    }
This function does quite a lot. In short, it uses stNnieCfg and related information to initialize s_stRfcnNnieParam, then uses s_stRfcnNnieParam and related information to initialize s_stRfcnSoftwareParam.
The RFCN NNIE initialization consists of two functions: SAMPLE_COMM_SVP_NNIE_ParamInit (hardware parameters) and SAMPLE_SVP_NNIE_Rfcn_SoftwareInit (software parameters).
First, SAMPLE_COMM_SVP_NNIE_ParamInit:
    HI_S32 SAMPLE_COMM_SVP_NNIE_ParamInit(SAMPLE_SVP_NNIE_CFG_S *pstNnieCfg,
                                          SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam)
    {
        .......

        /*NNIE parameter initialization */
        s32Ret = SAMPLE_SVP_NNIE_ParamInit(pstNnieCfg,pstNnieParam);
        .......
    }
After validating its input parameters, this function simply calls SAMPLE_SVP_NNIE_ParamInit, so we go straight to the implementation of SAMPLE_SVP_NNIE_ParamInit (NNIE parameter initialization). It first calls:
    /*fill forward info*/
    s32Ret = SAMPLE_SVP_NNIE_FillForwardInfo(pstNnieCfg,pstNnieParam);
In essence, this function uses the information in pstNnieParam->pstModel->astSeg to initialize the two structures pstNnieParam->astForwardWithBboxCtrl and pstNnieParam->astSegData.
The second key call in SAMPLE_SVP_NNIE_ParamInit is SAMPLE_SVP_NNIE_GetTaskAndBlobBufSize, with these arguments:
    /*Get taskInfo and Blob mem size*/
    s32Ret = SAMPLE_SVP_NNIE_GetTaskAndBlobBufSize(pstNnieCfg,
        pstNnieParam,
        &u32TotalTaskBufSize, /* in/out: 0 on input; on output, the sum of the auxiliary memory of all segments */
        &u32TmpBufSize,       /* in/out: 0 on input; on output, the model's auxiliary memory size */
        astBlobSize,          /* in/out: empty on input; on output, auxiliary memory of the first input and output node of each segment */
        &u32TotalSize);
This function computes the auxiliary memory sizes of each segment and of the nodes within each segment. The load step has already given us the model's auxiliary memory size (pstNnieParam->pstModel->u32TmpBufSize), but not those of the segments and their nodes, so this function obtains them. It first calls the low-level API HI_MPI_SVP_NNIE_GetTskBufSize to get the auxiliary memory of each segment of the network task, stored in pstNnieParam->au32TaskBufSize, and then calls SAMPLE_SVP_NNIE_GetBlobMemSize to compute the Blob auxiliary memory of the first input node of segment 1 and of the first output node of every segment.
Back in SAMPLE_SVP_NNIE_ParamInit: once SAMPLE_SVP_NNIE_GetTaskAndBlobBufSize returns, u32TotalSize is the total auxiliary memory size (model, segments and nodes). The third call, SAMPLE_COMM_SVP_MallocCached, then allocates that memory:
    /*Malloc mem*/
    s32Ret = SAMPLE_COMM_SVP_MallocCached("SAMPLE_NNIE_TASK",NULL,(HI_U64*)&u64PhyAddr,(void**)&pu8VirAddr,u32TotalSize);
Next, SAMPLE_COMM_SVP_FlushCache writes the cached data back to memory. Its implementation:
    HI_S32 SAMPLE_COMM_SVP_FlushCache(HI_U64 u64PhyAddr, HI_VOID *pvVirAddr, HI_U32 u32Size)
    {
        HI_S32 s32Ret = HI_SUCCESS;
        s32Ret = HI_MPI_SYS_MmzFlushCache(u64PhyAddr, pvVirAddr,u32Size);

        /* Flushes the cache contents to memory and invalidates the cache contents.
           This interface should be paired with HI_MPI_SYS_MmzAlloc_Cached. */
        return s32Ret;
    }
The virtual and physical addresses obtained are then used to set the memory addresses in pstNnieParam->stTaskBuf, pstNnieParam->stTmpBuf, pstNnieParam->astForwardWithBboxCtrl[i].stTmpBuf, pstNnieParam->astForwardWithBboxCtrl[i].stTskBuf, pstNnieParam->astForwardCtrl[i].stTskBuf and pstNnieParam->astSegData[i].astSrc[j]. This is the real initialization; SAMPLE_SVP_NNIE_FillForwardInfo also initialized these structures earlier, but that was only a "false init". This completes SAMPLE_SVP_NNIE_ParamInit, called from SAMPLE_COMM_SVP_NNIE_ParamInit.
Now SAMPLE_SVP_NNIE_Rfcn_SoftwareInit. It takes three arguments; compared with SAMPLE_COMM_SVP_NNIE_ParamInit it adds s_stRfcnSoftwareParam, which is used to set the RPN data layer names and to look up the RPN input data. These have to be adjusted to the model actually used in a project. The parameter setup and the call are:
    stNnieCfg.pszPic= NULL;
    stNnieCfg.u32MaxInputNum = 1; //max input image num in each batch
    stNnieCfg.u32MaxRoiNum = 300;
    stNnieCfg.aenNnieCoreId[0] = SVP_NNIE_ID_0; //set NNIE core for 0-th Seg
                                                //the NNIE engine with index 0
    stNnieCfg.aenNnieCoreId[1] = SVP_NNIE_ID_0; //set NNIE core for 1-th Seg
    stNnieCfg.aenNnieCoreId[2] = SVP_NNIE_ID_0; //set NNIE core for 2-th Seg

    s_stRfcnNnieParam.pstModel = &s_stRfcnModel.stModel;
    s_stRfcnSoftwareParam.apcRpnDataLayerName[0] = "rpn_cls_score";
    s_stRfcnSoftwareParam.apcRpnDataLayerName[1] = "rpn_bbox_pred";

    s32Ret = SAMPLE_SVP_NNIE_Rfcn_ParamInit(&stNnieCfg,
                                            &s_stRfcnNnieParam,
                                            &s_stRfcnSoftwareParam);
Next we step into SAMPLE_SVP_NNIE_Rfcn_SoftwareInit, declared as:
    /******************************************************************************
    * function : Rfcn software para init
    ******************************************************************************/
    static HI_S32 SAMPLE_SVP_NNIE_Rfcn_SoftwareInit(
        SAMPLE_SVP_NNIE_CFG_S* pstCfg,
        SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
        SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S* pstSoftWareParam)
The main job of the function body is to fill in s_stRfcnSoftwareParam. A long run of assignments performs the software initialization of the RPN parameters, in these stages:
(1) Init Rpn para;
    pstSoftWareParam->u32MaxRoiNum = pstCfg->u32MaxRoiNum;
    pstSoftWareParam->u32ClassNum = 21;
    pstSoftWareParam->u32NumRatioAnchors = 3;
    pstSoftWareParam->u32NumScaleAnchors = 3;
    pstSoftWareParam->au32Scales[0] = 8 * SAMPLE_SVP_NNIE_QUANT_BASE;
    pstSoftWareParam->au32Scales[1] = 16 * SAMPLE_SVP_NNIE_QUANT_BASE;
    pstSoftWareParam->au32Scales[2] = 32 * SAMPLE_SVP_NNIE_QUANT_BASE;
    pstSoftWareParam->au32Ratios[0] = 0.5 * SAMPLE_SVP_NNIE_QUANT_BASE;
    pstSoftWareParam->au32Ratios[1] = 1 * SAMPLE_SVP_NNIE_QUANT_BASE;
    pstSoftWareParam->au32Ratios[2] = 2 * SAMPLE_SVP_NNIE_QUANT_BASE;
    pstSoftWareParam->u32OriImHeight = pstNnieParam->astSegData[0].astSrc[0].unShape.stWhc.u32Height;
    pstSoftWareParam->u32OriImWidth = pstNnieParam->astSegData[0].astSrc[0].unShape.stWhc.u32Width;
    pstSoftWareParam->u32MinSize = 16;
    pstSoftWareParam->u32FilterThresh = 0;
    pstSoftWareParam->u32SpatialScale = (HI_U32)(0.0625 * SAMPLE_SVP_NNIE_QUANT_BASE);
    pstSoftWareParam->u32NmsThresh = (HI_U32)(0.7 * SAMPLE_SVP_NNIE_QUANT_BASE);
    pstSoftWareParam->u32FilterThresh = 0;
    pstSoftWareParam->u32NumBeforeNms = 6000;
    for(i = 0; i < pstSoftWareParam->u32ClassNum; i++)
    {
        pstSoftWareParam->au32ConfThresh[i] = 1;
        pstSoftWareParam->af32ScoreThr[i] = 0.8f;
    }
    pstSoftWareParam->u32ValidNmsThresh = (HI_U32)(0.3 * 4096);
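Most of the values in step (1) are real numbers stored as unsigned fixed-point integers by multiplying with SAMPLE_SVP_NNIE_QUANT_BASE. The sketch below assumes that base is 4096 (2^12); that value is not shown in this post and is inferred from the last line of the listing, which writes 0.3 * 4096 directly. The names `QUANT_BASE`, `to_q12` and `from_q12` are made up for the example.

```c
#include <stdint.h>

#define QUANT_BASE 4096u  /* assumed value of SAMPLE_SVP_NNIE_QUANT_BASE (2^12) */

/* Convert a non-negative real number to fixed point, truncating toward zero
 * in the same way as the (HI_U32) casts in the listing. */
uint32_t to_q12(double x)
{
    return (uint32_t)(x * QUANT_BASE);
}

/* Convert a fixed-point value back to a real number, e.g. for printing. */
double from_q12(uint32_t q)
{
    return (double)q / QUANT_BASE;
}
```

Under that assumption, the spatial scale 0.0625 (1/16) is stored as 256, the ratio 0.5 as 2048, and the NMS threshold 0.7 as 2867 after truncation.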
(2) Set the RPN input data information, looked up by the names of the RPN data layers;
    for(i = 0; i < 2; i++)
    {
        for(j = 0; j < pstNnieParam->pstModel->astSeg[0].u16DstNum; j++)
        {
            if(0 == strncmp(pstNnieParam->pstModel->astSeg[0].astDstNode[j].szName,
                    pstSoftWareParam->apcRpnDataLayerName[i],
                    SVP_NNIE_NODE_NAME_LEN))
            {
                pstSoftWareParam->aps32Conv[i] =(HI_S32*)pstNnieParam->astSegData[0].astDst[j].u64VirAddr;
                pstSoftWareParam->au32ConvHeight[i] = pstNnieParam->pstModel->astSeg[0].astDstNode[j].unShape.stWhc.u32Height;
                pstSoftWareParam->au32ConvWidth[i] = pstNnieParam->pstModel->astSeg[0].astDstNode[j].unShape.stWhc.u32Width;
                pstSoftWareParam->au32ConvChannel[i] = pstNnieParam->pstModel->astSeg[0].astDstNode[j].unShape.stWhc.u32Chn;
                break;
            }
        }
        SAMPLE_SVP_CHECK_EXPR_RET((j == pstNnieParam->pstModel->astSeg[0].u16DstNum),
            HI_FAILURE,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error,failed to find report node %s!\n",
            pstSoftWareParam->apcRpnDataLayerName[i]);
        if(0 == i)
        {
            pstSoftWareParam->u32ConvStride = pstNnieParam->astSegData[0].astDst[j].u32Stride;
        }
    }

    pstSoftWareParam->stRpnBbox.enType = SVP_BLOB_TYPE_S32;
    pstSoftWareParam->stRpnBbox.unShape.stWhc.u32Chn = 1;
    pstSoftWareParam->stRpnBbox.unShape.stWhc.u32Height = pstCfg->u32MaxRoiNum;
    pstSoftWareParam->stRpnBbox.unShape.stWhc.u32Width = SAMPLE_SVP_COORDI_NUM;
    pstSoftWareParam->stRpnBbox.u32Stride = SAMPLE_SVP_NNIE_ALIGN16(SAMPLE_SVP_COORDI_NUM*sizeof(HI_U32));
    pstSoftWareParam->stRpnBbox.u32Num = 1;
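The core of step (2) is a name lookup among the valid output nodes of segment 0. Isolated from the SDK types, it might look like the sketch below. `node_t` is a stripped-down stand-in for SVP_NNIE_NODE_S that keeps only the name, and `NODE_NAME_LEN` is an assumed placeholder for SVP_NNIE_NODE_NAME_LEN, whose real value is not given in this post.

```c
#include <string.h>

#define NODE_NAME_LEN 32  /* assumed length; placeholder for SVP_NNIE_NODE_NAME_LEN */

/* Simplified stand-in for SVP_NNIE_NODE_S: only the name is kept. */
typedef struct {
    char name[NODE_NAME_LEN];
} node_t;

/* Return the index of the first of `count` valid nodes with this name, or -1.
 * `count` plays the role of u16DstNum: entries beyond it are not examined. */
int find_node(const node_t *nodes, unsigned count, const char *name)
{
    for (unsigned j = 0; j < count; j++) {
        if (strncmp(nodes[j].name, name, NODE_NAME_LEN) == 0)
            return (int)j;
    }
    return -1;
}
```

The SDK code detects a failed lookup by checking j == u16DstNum after the loop; returning -1 is just a more explicit way of signalling the same condition. If the RPN layer names in a different model change, apcRpnDataLayerName has to change with them or this lookup fails.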
(3) The last step sets up the memory used by the Rfcn software part;
    u32RpnTmpBufSize = SAMPLE_SVP_NNIE_RpnTmpBufSize(pstSoftWareParam->u32NumRatioAnchors,
        pstSoftWareParam->u32NumScaleAnchors,pstSoftWareParam->au32ConvHeight[0],
        pstSoftWareParam->au32ConvWidth[0]);
    u32RpnTmpBufSize = SAMPLE_SVP_NNIE_ALIGN16(u32RpnTmpBufSize);
    u32RpnBboxBufSize = pstSoftWareParam->stRpnBbox.u32Num*
        pstSoftWareParam->stRpnBbox.unShape.stWhc.u32Height*pstSoftWareParam->stRpnBbox.u32Stride;
    u32GetResultTmpBufSize = SAMPLE_SVP_NNIE_Rfcn_GetResultTmpBuf(pstCfg->u32MaxRoiNum,pstSoftWareParam->u32ClassNum);
    u32GetResultTmpBufSize = SAMPLE_SVP_NNIE_ALIGN16(u32GetResultTmpBufSize);
    u32ClassNum = pstSoftWareParam->u32ClassNum;
    u32DstRoiSize = SAMPLE_SVP_NNIE_ALIGN16(u32ClassNum*pstCfg->u32MaxRoiNum*sizeof(HI_U32)*SAMPLE_SVP_NNIE_COORDI_NUM);
    u32DstScoreSize = SAMPLE_SVP_NNIE_ALIGN16(u32ClassNum*pstCfg->u32MaxRoiNum*sizeof(HI_U32));
    u32ClassRoiNumSize = SAMPLE_SVP_NNIE_ALIGN16(u32ClassNum*sizeof(HI_U32));
    u32TotalSize = u32RpnTmpBufSize + u32RpnBboxBufSize + u32GetResultTmpBufSize + u32DstRoiSize +
        u32DstScoreSize + u32ClassRoiNumSize;

    s32Ret = SAMPLE_COMM_SVP_MallocCached("SAMPLE_RFCN_INIT",NULL,(HI_U64*)&u64PhyAddr,
        (void**)&pu8VirAddr,u32TotalSize);
    SAMPLE_SVP_CHECK_EXPR_RET(HI_SUCCESS != s32Ret,s32Ret,SAMPLE_SVP_ERR_LEVEL_ERROR,
        "Error,Malloc memory failed!\n");
    memset(pu8VirAddr,0, u32TotalSize);
    SAMPLE_COMM_SVP_FlushCache(u64PhyAddr,(void*)pu8VirAddr,u32TotalSize);

    pstSoftWareParam->stRpnTmpBuf.u64PhyAddr = u64PhyAddr;
    pstSoftWareParam->stRpnTmpBuf.u64VirAddr = (HI_U64)(pu8VirAddr);
    pstSoftWareParam->stRpnTmpBuf.u32Size = u32RpnTmpBufSize;

    pstSoftWareParam->stRpnBbox.u64PhyAddr = u64PhyAddr+u32RpnTmpBufSize;
    pstSoftWareParam->stRpnBbox.u64VirAddr = (HI_U64)(pu8VirAddr)+u32RpnTmpBufSize;

    pstSoftWareParam->stGetResultTmpBuf.u64PhyAddr = u64PhyAddr+u32RpnTmpBufSize+u32RpnBboxBufSize;
    pstSoftWareParam->stGetResultTmpBuf.u64VirAddr = (HI_U64)(pu8VirAddr+u32RpnTmpBufSize+u32RpnBboxBufSize);
    pstSoftWareParam->stGetResultTmpBuf.u32Size = u32GetResultTmpBufSize;

    pstSoftWareParam->stDstRoi.enType = SVP_BLOB_TYPE_S32;
    pstSoftWareParam->stDstRoi.u64PhyAddr = u64PhyAddr+u32RpnTmpBufSize+u32RpnBboxBufSize+u32GetResultTmpBufSize;
    pstSoftWareParam->stDstRoi.u64VirAddr = (HI_U64)(pu8VirAddr+u32RpnTmpBufSize+u32RpnBboxBufSize+u32GetResultTmpBufSize);
    pstSoftWareParam->stDstRoi.u32Stride = SAMPLE_SVP_NNIE_ALIGN16(u32ClassNum*pstSoftWareParam->u32MaxRoiNum*sizeof(HI_U32)*SAMPLE_SVP_NNIE_COORDI_NUM);
    pstSoftWareParam->stDstRoi.u32Num = 1;
    pstSoftWareParam->stDstRoi.unShape.stWhc.u32Chn = 1;
    pstSoftWareParam->stDstRoi.unShape.stWhc.u32Height = 1;
    pstSoftWareParam->stDstRoi.unShape.stWhc.u32Width = u32ClassNum*pstSoftWareParam->u32MaxRoiNum*SAMPLE_SVP_NNIE_COORDI_NUM;

    pstSoftWareParam->stDstScore.enType = SVP_BLOB_TYPE_S32;
    pstSoftWareParam->stDstScore.u64PhyAddr = u64PhyAddr+u32RpnTmpBufSize+u32RpnBboxBufSize+u32GetResultTmpBufSize+u32DstRoiSize;
    pstSoftWareParam->stDstScore.u64VirAddr = (HI_U64)(pu8VirAddr+u32RpnTmpBufSize+u32RpnBboxBufSize+u32GetResultTmpBufSize+u32DstRoiSize);
    pstSoftWareParam->stDstScore.u32Stride = SAMPLE_SVP_NNIE_ALIGN16(u32ClassNum*pstSoftWareParam->u32MaxRoiNum*sizeof(HI_U32));
    pstSoftWareParam->stDstScore.u32Num = 1;
    pstSoftWareParam->stDstScore.unShape.stWhc.u32Chn = 1;
    pstSoftWareParam->stDstScore.unShape.stWhc.u32Height = 1;
    pstSoftWareParam->stDstScore.unShape.stWhc.u32Width = u32ClassNum*pstSoftWareParam->u32MaxRoiNum;

    pstSoftWareParam->stClassRoiNum.enType = SVP_BLOB_TYPE_S32;
    pstSoftWareParam->stClassRoiNum.u64PhyAddr = u64PhyAddr+u32RpnTmpBufSize+u32RpnBboxBufSize+u32GetResultTmpBufSize+u32DstRoiSize+u32DstScoreSize;
    pstSoftWareParam->stClassRoiNum.u64VirAddr = (HI_U64)(pu8VirAddr+u32RpnTmpBufSize+u32RpnBboxBufSize+u32GetResultTmpBufSize+u32DstRoiSize+u32DstScoreSize);
    pstSoftWareParam->stClassRoiNum.u32Stride = SAMPLE_SVP_NNIE_ALIGN16(u32ClassNum*sizeof(HI_U32));
    pstSoftWareParam->stClassRoiNum.u32Num = 1;
    pstSoftWareParam->stClassRoiNum.unShape.stWhc.u32Chn = 1;
    pstSoftWareParam->stClassRoiNum.unShape.stWhc.u32Height = 1;
    pstSoftWareParam->stClassRoiNum.unShape.stWhc.u32Width = u32ClassNum;
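The pattern in step (3) is: compute every sub-buffer size, make one allocation for the total, then hand out consecutive slices by adding the running offset to both the physical and the virtual base address. A compact sketch of that pattern follows. `region_t`, `align16` and `carve_regions` are names made up for illustration; the sketch also pads every slice to 16 bytes, which is a simplification (in the listing, u32RpnBboxBufSize is not passed through ALIGN16 itself, although its stride is).

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of 16. */
static uint32_t align16(uint32_t n)
{
    return (n + 15u) / 16u * 16u;
}

/* Hypothetical descriptor of one slice; loosely mirrors a phy/vir/size triple. */
typedef struct {
    uint64_t phy;
    uint64_t vir;
    uint32_t size;
} region_t;

/* Lay out `count` slices back to back inside one allocation that starts at
 * (phy, vir). sizes[i] is the requested size of slice i.
 * Returns the total number of bytes consumed. */
uint32_t carve_regions(uint64_t phy, uint64_t vir,
                       const uint32_t *sizes, size_t count, region_t *out)
{
    uint32_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        out[i].phy  = phy + offset;   /* same offset on both address spaces */
        out[i].vir  = vir + offset;
        out[i].size = align16(sizes[i]);
        offset += out[i].size;
    }
    return offset;
}
```

Keeping a single running offset avoids the long hand-written sums of sizes seen in the listing, where forgetting one term would silently overlap two buffers.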
With all these assignments to pstSoftWareParam (s_stRfcnSoftwareParam) done, the parameters are ready for the NNIE worker thread function SAMPLE_SVP_NNIE_Rfcn_ViToVo that runs later:
    pstParam = &s_stRfcnNnieParam; /* assigned again inside SAMPLE_SVP_NNIE_Rfcn_ViToVo */
    pstSwParam = &s_stRfcnSoftwareParam;
This completes the analysis of RFCN LoadModel and the RFCN NNIE initialization functions.
Hi3559AV100 NNIE Development (3): Generating a .wk File with RuyiStudio, Configuring mobilefacenet.cfg
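My posts from here on focus mainly on the NNIE development series. This one is Hi3559AV100 NNIE development (3): generating a .wk file with RuyiStudio and configuring mobilefacenet.cfg. My current project needs a .wk built from the mobilefacenet network. I create a project in RuyiStudio (I may write a separate post on installing and configuring RuyiStudio if time allows), configure mobilefacenet.cfg in the project, load the trained mobilefacenet.caffemodel and mobilefacenet.prototxt, and generate mobilefacenet.wk. A key part of this is setting the parameters in mobilefacenet.cfg, which this post walks through. First, the full list of parameters to configure:
    [prototxt_file]
    [caffemodel_file]
    [batch_num]
    [net_type]
    [sparse_rate]
    [compile_mode]
    [is_simulation]
    [log_level]
    [instruction_name]
    [RGB_order]
    [data_scale]
    [internal_stride]
    [image_list]
    [image_type]
    [mean_file]
    [norm_type]
Each is explained in turn below:
(1) prototxt_file
The network description file. NNIE mapper imposes specific constraints on the prototxt: the format of the input layer, the layer format, activation layers, Scale and Bias layers, RNN and LSTM layers, plus the special cases of reporting intermediate layers, high-precision configuration, and forcing supported layers to run on the CPU. I describe the last three in detail:
① An intermediate layer is one that is not at the end of a network segment. To output the result of an intermediate layer, mark it by adding the "_report" identifier to the layer's "top" field. If an intermediate layer has several tops that all need to be output, each top can be marked.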
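    layer {
      name: "conv5"
      type: "Convolution"
      bottom: "conv4"
      top: "conv5_report"
      convolution_param {
        num_output: 256
        kernel_size: 3
        pad: 1
        stride: 1
      }
    }
② With user-specified precision (compile_mode=2), appending the high-precision marker "_hp" (16 bit) to a layer's name makes that layer use high-precision input. Any layer can be marked, in this format:
    layer {
      name: "conv5_hp"
      type: "Convolution"
      bottom: "conv4"
      top: "conv5"
      convolution_param {
        num_output: 256
        kernel_size: 3
        pad: 1
        stride: 1
      }
    }
③ A layer the mapper supports can be switched to CPU execution by appending _cpu to its name field (the _cpu marker covers CPU, DSP and any other non-NNIE execution), in this format:
    layer {
      bottom: "rpn_cls_score"
      top: "rpn_cls_score_reshape"
      name: "rpn_cls_score_reshape_cpu"
      type: "Reshape"
      reshape_param {
        shape {
          dim: 0
          dim: 2
          dim: -1
          dim: 0
        }
      }
    }
(2) caffemodel_file:
The network model data file.
(3) [batch_num]
0/1: single mode (one image);
>1: batch mode (several images). In single mode the mapper handles one image per task and all internal storage is allocated for that one image, which reduces the number of data scheduling operations. In batch mode, batch_num images are computed together in FC layers, so compute resources are used more fully. (Maximum value: 256.)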
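(4) [net_type]:
The network type.
0: CNN (any network without LSTM/RNN/ROIPooling/PSROIPooling);
1: ROI/PSROI (networks containing ROI Pooling and PSROI Pooling);
2: Recurrent (networks containing LSTM or RNN);
(5) [sparse_rate] ---> (range 0 to 1, default 0)
The NNIE engine compresses parameters to reduce bandwidth usage. To improve the compression ratio, the FC parameters can be sparsified.
sparse_rate specifies what fraction of the FC parameters is set to 0. With 0.5, for example, 50% of the FC parameters become 0. Because the data is sparser, the compression module achieves a better compression ratio. The higher the sparsity, the less parameter bandwidth FC computation needs, but accuracy drops somewhat.
(6) [compile_mode]
0: Low-bandwidth (default): the quantization algorithm minimizes the bit width of parameters and data so that the system needs the least bandwidth, at some cost in accuracy;
1: High-precision: best accuracy, but lower performance;
2: User-specify: the user must mark in the prototxt every layer that uses high-precision computation; see the prototxt_file section for the marking rule;
(7) [is_simulation]
The conversion target of the network model.
0: Chip. The model is converted to a wk file to be loaded on the chip; instruction simulation also uses this mode;
1: Simulation. The model is converted to a wk file to be loaded in PC simulation; functional simulation uses this mode;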
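(8) [log_level]
Controls whether a log file is produced and how verbose it is. This parameter may be omitted, in which case no log file is written.
0: print the flow of the main function, the cfg file and similar information;
1: print the file information parsed by nnie_mapper, including image_list, prototxt and the memory allocation process;
2: print the intermediate representation;
3: print full details. This produces a large number of files and the conversion takes much longer, so use it with care;
(9) [instruction_name]
The name of the knowledge base file generated by nnie_mapper. The default name is inst.wk; the user can change it.
(10) [RGB_order] ---> values: {RGB, BGR}, default: BGR
When image_type is 0, this parameter has no effect;
When image_type is 1, whatever this parameter is set to, the input on the board must be a BGR_Planar image;
When image_type is 3 or 5, it selects whether the YUV image data is converted to an RGB Planar or a BGR Planar image before being fed to the network.
This parameter may be omitted.
(11) [data_scale]
Scaling factor for data preprocessing, a floating-point number used together with norm_type. This parameter may be omitted; the default is 0.00390625 = 1/256. FLT_MAX equals 3.402823466e+38.
(12) [internal_stride]
The alignment of intermediate results, chosen to match the best read/write efficiency of the DDR chips. Required values: 16 for DDR3, 32 for DDR4. May be left out; the default is 16;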
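(13) [image_list]
The list file of reference images, or the feature map file, that NNIE mapper uses for data quantization. This setting depends on image_type. If the network input is a grayscale or RGB image, that is, image_type is not 0, image_list is the list of reference images. The list looks like the figure below, and the following image formats are supported:
(14) [image_type]
The type of data fed to the network when it actually runs. This setting is related to image_list.
0: the network input is SVP_BLOB_TYPE_S32 (see the HiSVP API Reference) or a vector type (VEC_S32 and SEQ_S32); image_list must then be a feature map file;
1: the network input is SVP_BLOB_TYPE_U8 (ordinary grayscale and RGB images); image_list must then be a list file of RGB or grayscale images;
3: the network input is SVP_BLOB_TYPE_YUV420SP;
5: the network input is SVP_BLOB_TYPE_YUV422SP;
When set to 3 or 5, image_list is a list file of RGB images.
(15) [mean_file]
When norm_type is 1 or 4, this is the mean file xxx.binaryproto;
When norm_type is 2 or 5, this is the channel mean file;
When norm_type is 0 or 3, the mean_file item must still be present, but its value can be an invalid path such as null. In the channel mean file mean.txt, the floating-point number on each line is the mean of the corresponding channel; a single-channel input has just one value.
(16) [norm_type]
The preprocessing applied to the network input. Note that when image_type is 0, norm_type can only be 0. When image_type is 3 or 5, the network input is a YUV image, but the NNIE hardware converts it to RGB or BGR according to RGB_order, so norm_type is configured the same way as for image_type 1.
0: no preprocessing;
1: mean file, subtract the image mean;
2: channel mean_value, subtract the channel mean;
3: data_scale, multiply the pixel values by data_scale;
4: mean file with data_scale, subtract the image mean and then multiply by data_scale;
5: channel mean_value with data_scale, subtract the channel mean and then multiply by data_scale.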
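The six norm_type rules reduce to three arithmetic forms. The sketch below applies them to a single value; it is an illustration of the list above, not mapper code. The `preprocess` function is hypothetical, and `mean` stands for whichever mean applies (image mean for types 1 and 4, channel mean for types 2 and 5).

```c
/* Apply the norm_type rule to one input value.
 * mean:  the image mean or channel mean that applies to this value
 * scale: data_scale
 * Unknown norm_type values are treated like 0 (no preprocessing). */
double preprocess(int norm_type, double value, double mean, double scale)
{
    switch (norm_type) {
    case 1:
    case 2:
        return value - mean;            /* subtract the mean only */
    case 3:
        return value * scale;           /* scale only */
    case 4:
    case 5:
        return (value - mean) * scale;  /* subtract first, then scale */
    default:
        return value;                   /* 0: leave the value unchanged */
    }
}
```

With norm_type 5, a channel mean of 128 and data_scale 0.0078125 (1/128), the inputs 0 and 255 map to -1.0 and 0.9921875, so the 8-bit range lands in roughly [-1, 1).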
(17) [is_check_prototxt]
Flag for checking the network description file.
0: mapper mode, convert the prototxt, caffemodel and so on.
1: network filter mode, check whether the prototxt file conforms to the supported specification.
My current mobilefacenet.cfg:
    [prototxt_file] ./mark_prototxt/mobilefacenet_mark_nnie_20210205133124.prototxt
    [caffemodel_file] ./data/mobilefacenet.prototxt.caffemodel
    [batch_num] 256
    [net_type] 0
    [sparse_rate] 0
    [compile_mode] 0
    [is_simulation] 1
    [log_level] 3
    [instruction_name] ./mobileface_func
    [RGB_order] BGR
    [data_scale] 0.0078125
    [internal_stride] 16
    [image_list] ./data/images/imageList.txt
    [image_type] 1
    [mean_file] ./data/pixel_mean.txt
    [norm_type] 5
Hi3559AV100 NNIE Development (4): mobilefacenet.cfg Configuration Pitfalls and How SVP_NNIE_Cnn Works
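The earlier posts covered the basics of NNIE development. The next few focus on Mobilefacenet NNIE development: building the chip version of mobilefacenet.wk and running the mobilefacenet network on the Hi3559AV100, with a USB camera as input and the result shown on the VO HDMI output through the MPP platform. This post is Hi3559AV100 NNIE development (4): pitfalls in configuring mobilefacenet.cfg and an analysis of how SVP_NNIE_Cnn works. My current project needs a .wk built from the mobilefacenet network. Below are the problems I ran into while generating the .wk and how I solved them, followed by a step-by-step analysis of SVP_NNIE_Cnn, as groundwork for running mobilefacenet on the board later.
CNN_convert_bin_and_print_featuremap.py and Get Caffe Output both preprocess by multiplying by [data_scale] first and then subtracting the mean [mean_file], whereas quantization during .wk generation subtracts the mean first and then multiplies by the scale.
Here is how that preprocessing step handles the input data:
    data = inputs
    if norm_type == '4' or norm_type == '5':
        data = data * float(data_scale)
data is a uint8 array that has been multiplied by [data_scale] first, so the order of operations differs from the one NNIE uses when generating the .wk file. For the input preprocessing in mobilefacenet.cfg, norm_type = 5 means the channel mean is subtracted from the input and the result is then multiplied by data_scale, as shown below:
So in practice the mean file has to be adjusted. The conversion is:
(data - 128.0) * 0.0078125 <==> data * 0.0078125 - 1
The change needed is therefore to set the value in the [mean_file] pixel_mean_compare.txt to 1.0: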
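The equivalence can be checked numerically. The sketch below, with hypothetical function names, computes both orders of operations and the adjusted mean (mean * scale) that a scale-first tool needs in order to match the mapper's subtract-first result. With mean 128 and scale 0.0078125 = 1/128, the adjusted mean is exactly 1.0.

```c
/* Mapper order (norm_type 5): subtract the mean, then scale. */
double mean_then_scale(double v, double mean, double scale)
{
    return (v - mean) * scale;
}

/* Script order: scale first, then subtract the mean. */
double scale_then_mean(double v, double mean, double scale)
{
    return v * scale - mean;
}

/* Mean to give a scale-first tool so that it reproduces the mapper result:
 * (v - mean) * scale == v * scale - mean * scale. */
double scale_first_mean(double mean, double scale)
{
    return mean * scale;
}
```

Because both 128 and 1/128 are powers of two, the two forms agree exactly for every 8-bit input, not just approximately. With other means or scales the results may differ in the last floating-point bits.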
mobilefacenet.wk is finally generated, with the output below. Actual testing is left for the next step.
    begin parameter compressing....
    end parameter compressing
    begin compress index generating....
    end compress index generating
    begin binary code generating....
    .........................................................................................................................
    ....................................................................end binary code generating
    begin quant files writing....
    end quant files writing
    .
    ===============D:\Hi3559_NNIE\3559\mobileface\mobileface.cfg Successfully!===============

    End [RuyiStudio Wk NNIE Mapper] [D:\Hi3559_NNIE\3559\mobileface\mobileface.cfg] mobileface
Next is the execution flow of SAMPLE_SVP_NNIE_Cnn, which breaks down into eight steps:
    HI_CHAR *pcSrcFile = "./data/nnie_image/y/0_28x28.y";
    HI_CHAR *pcModelName = "./data/nnie_model/classification/inst_mnist_cycle.wk";

    /*Set configuration parameter */
    stNnieCfg.pszPic= pcSrcFile;
    stNnieCfg.u32MaxInputNum = u32PicNum; //max input image num in each batch
    stNnieCfg.u32MaxRoiNum = 0;
    stNnieCfg.aenNnieCoreId[0] = SVP_NNIE_ID_0;//set NNIE core
    s_stCnnSoftwareParam.u32TopN = 5;

    /*Sys init ---step1*/
    SAMPLE_COMM_SVP_CheckSysInit();

    /*CNN Load model ------step2*/
    s32Ret = SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName,&s_stCnnModel);

    /*CNN parameter initialization -------step3*/
    /*Cnn software parameters are set in SAMPLE_SVP_NNIE_Cnn_SoftwareParaInit,
      if user has changed net struct, please make sure the parameter settings in
      SAMPLE_SVP_NNIE_Cnn_SoftwareParaInit function are correct*/
    s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;
    s32Ret = SAMPLE_SVP_NNIE_Cnn_ParamInit(&stNnieCfg,&s_stCnnNnieParam,&s_stCnnSoftwareParam);

    /*record tskBuf -------step4*/
    s32Ret = HI_MPI_SVP_NNIE_AddTskBuf(&(s_stCnnNnieParam.astForwardCtrl[0].stTskBuf));

    /*Fill src data -------step5*/
    SAMPLE_SVP_TRACE_INFO("Cnn start!\n");
    stInputDataIdx.u32SegIdx = 0;
    stInputDataIdx.u32NodeIdx = 0;
    s32Ret = SAMPLE_SVP_NNIE_FillSrcData(&stNnieCfg,&s_stCnnNnieParam,&stInputDataIdx);

    /*NNIE process(process the 0-th segment) -------step6*/
    stProcSegIdx.u32SegIdx = 0;
    s32Ret = SAMPLE_SVP_NNIE_Forward(&s_stCnnNnieParam,&stInputDataIdx,&stProcSegIdx,HI_TRUE);

    /*Software process --------step7*/
    /*if user has changed net struct, please make sure SAMPLE_SVP_NNIE_Cnn_GetTopN
      function's input datas are correct*/
    s32Ret = SAMPLE_SVP_NNIE_Cnn_GetTopN(&s_stCnnNnieParam,&s_stCnnSoftwareParam);

    /*Print result --------step8*/
    SAMPLE_SVP_TRACE_INFO("Cnn result:\n");
    s32Ret = SAMPLE_SVP_NNIE_Cnn_PrintResult(&(s_stCnnSoftwareParam.stGetTopN),
        s_stCnnSoftwareParam.u32TopN);
Step 1, SAMPLE_COMM_SVP_CheckSysInit, initializes the MPP system. It mainly performs Sys_Init and VB_Init and configures the MPP memory pools. In outline:
    HI_VOID SAMPLE_COMM_SVP_CheckSysInit(HI_VOID)
    {
        ..............
        SAMPLE_COMM_SVP_SysInit()
        {
            /* part of the flow is omitted; only the key calls are listed */
            HI_MPI_SYS_Exit();
            HI_MPI_VB_Exit();

            memset(&struVbConf,0,sizeof(VB_CONFIG_S));

            struVbConf.u32MaxPoolCnt = 2;
            struVbConf.astCommPool[1].u64BlkSize = 768*576*2;
            struVbConf.astCommPool[1].u32BlkCnt = 1;

            s32Ret = HI_MPI_VB_SetConfig((const VB_CONFIG_S *)&struVbConf); /* set the MPP video buffer pool attributes */

            s32Ret = HI_MPI_VB_Init();  /* initialize the MPP buffer pools */

            s32Ret = HI_MPI_SYS_Init(); /* initialize the MPP system */
        }

        .............
    }
Step 2, SAMPLE_COMM_SVP_NNIE_LoadModel, parses the network model from the model data that has already been loaded into a buffer. The implementation is fairly involved; the parameters and the execution flow were covered in earlier posts, which you can consult if needed:
Hi3559AV100 NNIE Development (1): Parsing the Parameters of the RFCN (.wk) LoadModel Function (https://www.cnblogs.com/iFrank/p/14500648.html)
Hi3559AV100 NNIE Development (2): How the RFCN (.wk) LoadModel and NNIE Init Functions Run (https://www.cnblogs.com/iFrank/p/14503482.html)
Step 3 is SAMPLE_SVP_NNIE_Cnn_ParamInit. Here are the call and the definition, to make the analysis easier:
    /*Set configuration parameter*/
    stNnieCfg.pszPic= pcSrcFile;
    stNnieCfg.u32MaxInputNum = u32PicNum; //max input image num in each batch
    stNnieCfg.u32MaxRoiNum = 0;
    stNnieCfg.aenNnieCoreId[0] = SVP_NNIE_ID_0;//set NNIE core

    s_stCnnSoftwareParam.u32TopN = 5;

    s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;
    s32Ret = SAMPLE_SVP_NNIE_Cnn_ParamInit(&stNnieCfg,
                                           &s_stCnnNnieParam,
                                           &s_stCnnSoftwareParam);

    static HI_S32 SAMPLE_SVP_NNIE_Cnn_ParamInit(SAMPLE_SVP_NNIE_CFG_S* pstNnieCfg,
                                                SAMPLE_SVP_NNIE_PARAM_S *pstCnnPara,
                                                SAMPLE_SVP_NNIE_CNN_SOFTWARE_PARAM_S* pstCnnSoftWarePara)
    {
        ........

        /*init hardware para*/
        s32Ret = SAMPLE_COMM_SVP_NNIE_ParamInit(pstNnieCfg,
                                                pstCnnPara);

        /*init software para*/
        if(pstCnnSoftWarePara!=NULL)
        {
            s32Ret = SAMPLE_SVP_NNIE_Cnn_SoftwareParaInit(pstNnieCfg,
                                                          pstCnnPara,
                                                          pstCnnSoftWarePara);
            /* error check omitted; its message is
               "Error(%#x),SAMPLE_SVP_NNIE_Cnn_SoftwareParaInit failed!\n" */
        }

        ........
    }
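The SAMPLE_COMM_SVP_NNIE_ParamInit function and its parameters are analyzed in an earlier post:
Hi3559AV100 NNIE Development (2): RFCN (.wk) LoadModel and NNIE Init function execution flow analysis (https://www.cnblogs.com/iFrank/p/14503482.html). That post goes into detail, so it is not repeated here.
For SAMPLE_SVP_NNIE_Cnn_SoftwareParaInit, the definition is given first: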
/* "..." marks literals and operands that are missing from this listing */
static HI_S32 SAMPLE_SVP_NNIE_Cnn_SoftwareParaInit(SAMPLE_SVP_NNIE_CFG_S* pstNnieCfg,
    SAMPLE_SVP_NNIE_PARAM_S *pstCnnPara,
    SAMPLE_SVP_NNIE_CNN_SOFTWARE_PARAM_S* pstCnnSoftWarePara)
{
    HI_U32 u32GetTopNMemSize = ...;
    HI_U32 u32GetTopNAssistBufSize = ...;
    HI_U32 u32GetTopNPerFrameSize = ...;
    HI_U32 u32TotalSize = ...;
    HI_U32 u32ClassNum = pstCnnPara->pstModel->astSeg[...].astDstNode[...]...;
    HI_U64 u64PhyAddr = ...;
    HI_U8* pu8VirAddr = ...;
    HI_S32 s32Ret = ...;

    u32GetTopNPerFrameSize = pstCnnSoftWarePara->u32TopN*...;
    u32GetTopNMemSize = SAMPLE_SVP_NNIE_ALIGN16(u32GetTopNPerFrameSize)*pstNnieCfg->...;
    u32GetTopNAssistBufSize = u32ClassNum*...;
    u32TotalSize = u32GetTopNMemSize+...;

    s32Ret = SAMPLE_COMM_SVP_MallocMem(...,NULL,(HI_U64*)&...,(... **)&...,...);
    SAMPLE_SVP_CHECK_EXPR_RET(HI_SUCCESS != ...);
    memset(pu8VirAddr, ...);

    pstCnnSoftWarePara->stGetTopN.u32Num= pstNnieCfg->...;
    pstCnnSoftWarePara->stGetTopN.unShape.stWhc.u32Chn = ...;
    pstCnnSoftWarePara->stGetTopN.unShape.stWhc.u32Height = ...;
    pstCnnSoftWarePara->stGetTopN.unShape.stWhc.u32Width = u32GetTopNPerFrameSize/...;
    pstCnnSoftWarePara->stGetTopN.u32Stride = ...;
    pstCnnSoftWarePara->stGetTopN.u64PhyAddr = ...;
    pstCnnSoftWarePara->stGetTopN.u64VirAddr = ...;

    pstCnnSoftWarePara->stAssistBuf.u32Size = ...;
    pstCnnSoftWarePara->stAssistBuf.u64PhyAddr = u64PhyAddr+...;
    pstCnnSoftWarePara->stAssistBuf.u64VirAddr = (HI_U64)pu8VirAddr+...;
    ...
}
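The main job of the function body is to assign the members of s_stCnnSoftwareParam, so it consists largely of assignment statements. The meaning of each member of the s_stCnnSoftwareParam structure will be examined when it is needed. The function also allocates MMZ memory in user space. With these two functions analyzed, step 3, SAMPLE_SVP_NNIE_Cnn_ParamInit(), is complete.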
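The size bookkeeping in this function can be sketched in isolation. This is a minimal illustration under stated assumptions, not the SDK code: `TopNUnit`, `ALIGN16`, `TopNLayout` and `top_n_layout` are hypothetical names, the 8-byte record size is an assumption, and the per-batch multiplier is assumed to be the maximum input number taken from the configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sample's 16-byte alignment macro. */
#define ALIGN16(x) (((x) + 15u) & ~15u)

/* Hypothetical Top-N record: one class id plus its confidence. */
typedef struct {
    uint32_t class_id;
    uint32_t confidence;
} TopNUnit;

typedef struct {
    uint32_t top_n_bytes;   /* aligned Top-N results for the whole batch */
    uint32_t assist_bytes;  /* scratch space: one record per class */
    uint32_t total_bytes;   /* size of the single allocation */
} TopNLayout;

/* Compute how one allocation is split between Top-N results and scratch. */
TopNLayout top_n_layout(uint32_t top_n, uint32_t class_num, uint32_t max_input_num)
{
    TopNLayout layout;
    uint32_t per_frame;

    assert(top_n > 0 && class_num >= top_n && max_input_num > 0);

    per_frame = top_n * (uint32_t)sizeof(TopNUnit);
    layout.top_n_bytes = ALIGN16(per_frame) * max_input_num;
    layout.assist_bytes = class_num * (uint32_t)sizeof(TopNUnit);
    layout.total_bytes = layout.top_n_bytes + layout.assist_bytes;
    return layout;
}
```

Under these assumptions, the scratch region starts top_n_bytes after the base address, which mirrors how stAssistBuf is addressed as an offset from u64PhyAddr and pu8VirAddr in the listing.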
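HI_MPI_SVP_NNIE_AddTskBuf() (step 4) records the TskBuf address information. Its purpose and caveats are:
① Recording the TskBuf address information reduces the number of kernel-space memory mappings and improves efficiency;
② The recorded TskBuf address information is managed in a linked list whose default length is 32. The length can be configured through the module parameter nnie_max_tskbuf_num;
③ If HI_MPI_SVP_NNIE_AddTskBuf is not called to record the TskBuf address information in the system beforehand, every later call to Forward/ForwardWithBbox maps and unmaps the kernel-space virtual address of the TskBuf, which is less efficient.
The function call and definition are: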
/*SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName,&s_stCnnModel);*/
s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;

s32Ret = HI_MPI_SVP_NNIE_AddTskBuf(&(s_stCnnNnieParam.astForwardCtrl[0].stTskBuf));

//definition
HI_S32 HI_MPI_SVP_NNIE_AddTskBuf(const SVP_MEM_INFO_S* pstTskBuf);
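Step 5 fills the source (src) data. This function is critical: it processes the supplied image data, ./data/nnie_image/y/0_28x28.y. To make the data-handling function easier to analyze, the call context is given first: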
stNnieCfg.pszPic= pcSrcFile;
stNnieCfg.u32MaxInputNum = u32PicNum;
stNnieCfg.u32MaxRoiNum = 0;
stNnieCfg.aenNnieCoreId[0] = SVP_NNIE_ID_0;

s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;

stInputDataIdx.u32SegIdx = 0;
stInputDataIdx.u32NodeIdx = 0;
s32Ret = SAMPLE_SVP_NNIE_FillSrcData(&stNnieCfg, &s_stCnnNnieParam, &stInputDataIdx);
To make the function's role clearer, its definition is given first for the analysis that follows (minor details omitted):
static HI_S32 SAMPLE_SVP_NNIE_FillSrcData(SAMPLE_SVP_NNIE_CFG_S* pstNnieCfg,
    SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
    SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S* pstInputDataIdx)
Overall, the function does the following:
① Open the file: fopen(pstNnieCfg->pszPic,"rb");
//file name definition
HI_CHAR *pcSrcFile = "./data/nnie_image/y/0_28x28.y";

stNnieCfg.pszPic= pcSrcFile;

//function definition
HI_S32 SAMPLE_SVP_NNIE_FillSrcData(SAMPLE_SVP_NNIE_CFG_S* pstNnieCfg,
    SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
    SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S* pstInputDataIdx)
//function call
SAMPLE_SVP_NNIE_FillSrcData(&stNnieCfg,
                            &s_stCnnNnieParam,
                            &stInputDataIdx);

fp = fopen(pstNnieCfg->pszPic,"rb");
② Determine u32VarSize, the element size that fixes how many bytes the later fread calls read:
/*get data size s32Ret = fread(pu8PicAddr,u32Dim*u32VarSize,1,fp);*/
if(SVP_BLOB_TYPE_U8 <= pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].enType &&
   SVP_BLOB_TYPE_YVU422SP >= pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].enType)
{
    u32VarSize = sizeof(HI_U8);
}
else
{
    u32VarSize = sizeof(HI_U32);
}
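③ Next, an if-else branch selects on pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].enType (judging by its definition, this value should come from the parameters of the input .wk model; it can be printed with printf later to check the actual value). fread then reads the data through the file pointer fp into the blob's memory address, the cache contents are flushed to memory and invalidated, and finally fclose(fp) is called.
First, the type of the enType parameter:
Code implementation: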
/*fill src data*/
if(SVP_BLOB_TYPE_SEQ_S32 == pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].enType)
{
    u32Dim = pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].unShape.stSeq.u32Dim;
    u32Stride = pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Stride;
    pu32StepAddr = (HI_U32*)(pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].unShape.stSeq.u64VirAddrStep);
    pu8PicAddr = (HI_U8*)(pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u64VirAddr);
    for(n = 0; n < pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Num; n++)
    {
        for(i = 0;i < *(pu32StepAddr+n); i++)
        {
            s32Ret = fread(pu8PicAddr,u32Dim*u32VarSize,1,fp);
            SAMPLE_SVP_CHECK_EXPR_GOTO(1 != s32Ret,FAIL,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error,Read image file failed!\n");
            pu8PicAddr += u32Stride;
        }
        u32TotalStepNum += *(pu32StepAddr+n);
    }
    SAMPLE_COMM_SVP_FlushCache(pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u64PhyAddr,
        (HI_VOID *) pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u64VirAddr,
        u32TotalStepNum*u32Stride);
}
else
{
    u32Height = pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].unShape.stWhc.u32Height;
    u32Width = pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].unShape.stWhc.u32Width;
    u32Chn = pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].unShape.stWhc.u32Chn;
    u32Stride = pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Stride;
    pu8PicAddr = (HI_U8*)(pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u64VirAddr);
    if(SVP_BLOB_TYPE_YVU420SP== pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].enType)
    {
        for(n = 0; n < pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Num; n++)
        {
            for(i = 0; i < u32Chn*u32Height/2; i++)
            {
                s32Ret = fread(pu8PicAddr,u32Width*u32VarSize,1,fp);
                SAMPLE_SVP_CHECK_EXPR_GOTO(1 != s32Ret,FAIL,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error,Read image file failed!\n");
                pu8PicAddr += u32Stride;
            }
        }
    }
    else if(SVP_BLOB_TYPE_YVU422SP== pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].enType)
    {
        for(n = 0; n < pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Num; n++)
        {
            for(i = 0; i < u32Height*2; i++)
            {
                s32Ret = fread(pu8PicAddr,u32Width*u32VarSize,1,fp);
                SAMPLE_SVP_CHECK_EXPR_GOTO(1 != s32Ret,FAIL,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error,Read image file failed!\n");
                pu8PicAddr += u32Stride;
            }
        }
    }
    else
    {
        for(n = 0; n < pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Num; n++)
        {
            for(i = 0;i < u32Chn; i++)
            {
                for(j = 0; j < u32Height; j++)
                {
                    s32Ret = fread(pu8PicAddr,u32Width*u32VarSize,1,fp);
                    SAMPLE_SVP_CHECK_EXPR_GOTO(1 != s32Ret,FAIL,SAMPLE_SVP_ERR_LEVEL_ERROR,"Error,Read image file failed!\n");
                    pu8PicAddr += u32Stride;
                }
            }
        }
    }
    SAMPLE_COMM_SVP_FlushCache(pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u64PhyAddr,
        (HI_VOID *) pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u64VirAddr,
        pstNnieParam->astSegData[u32SegIdx].astSrc[u32NodeIdx].u32Num*u32Chn*u32Height*u32Stride);
}

fclose(fp);
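The row-by-row read in the final else branch can be sketched on its own. The helper below is a hypothetical, simplified version (the name `fill_rows` and its signature are not from the SDK): each row of `width_bytes` bytes is read into a buffer whose rows start `stride` bytes apart, so any padding at the end of a row is left untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read chn * height rows from fp into dst.
 * Each row holds width_bytes of payload; consecutive rows in dst are
 * stride bytes apart (stride >= width_bytes).
 * Returns 0 on success, -1 if the file ends early.
 */
int fill_rows(FILE *fp, uint8_t *dst, uint32_t chn, uint32_t height,
              uint32_t width_bytes, uint32_t stride)
{
    uint32_t row;

    assert(fp != NULL && dst != NULL);
    assert(width_bytes > 0 && stride >= width_bytes);

    for (row = 0; row < chn * height; row++) {
        if (fread(dst, width_bytes, 1, fp) != 1) {
            return -1;      /* short read: the file holds fewer rows */
        }
        dst += stride;      /* skip the padding at the end of the row */
    }
    return 0;
}
```

For a 28x28 single-channel U8 image, the call would be fill_rows(fp, buffer, 1, 28, 28, stride), with the stride taken from the blob rather than assumed equal to the width.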
For step 6, the function call and the parameter definitions are given first for ease of analysis:
s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;
/* SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName,&s_stCnnModel); */

stInputDataIdx.u32SegIdx = 0;
stInputDataIdx.u32NodeIdx = 0;

stProcSegIdx.u32SegIdx = 0;

s32Ret = SAMPLE_SVP_NNIE_Forward(&s_stCnnNnieParam,
                                 &stInputDataIdx,
                                 &stProcSegIdx,
                                 HI_TRUE);

static HI_S32 SAMPLE_SVP_NNIE_Forward(
    SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
    SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S* pstInputDataIdx,
    SAMPLE_SVP_NNIE_PROCESS_SEG_INDEX_S* pstProcSegIdx,
    HI_BOOL bInstant)
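Inside SAMPLE_SVP_NNIE_Forward: ① SAMPLE_COMM_SVP_FlushCache flushes the data held in the cache to memory; ② HI_MPI_SVP_NNIE_Forward runs CNN prediction on the input sample(s) and produces the output response for the corresponding sample(s); ③ HI_MPI_SVP_NNIE_Query queries the status of a task running on the NNIE. In blocking mode it waits until the queried task has finished; in non-blocking mode it only reports the current status and takes no other action.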
For step 7, the function call and the parameter details are:
s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;
/* SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName,&s_stCnnModel); */

s_stCnnSoftwareParam.u32TopN = 5;
SAMPLE_SVP_NNIE_Cnn_ParamInit(&stNnieCfg, //this function assigns the members of s_stCnnSoftwareParam
                              &s_stCnnNnieParam,
                              &s_stCnnSoftwareParam);

s32Ret = SAMPLE_SVP_NNIE_Cnn_GetTopN(&s_stCnnNnieParam,
                                     &s_stCnnSoftwareParam);

HI_S32 SAMPLE_SVP_NNIE_Cnn_GetTopN(SAMPLE_SVP_NNIE_PARAM_S*pstNnieParam,
    SAMPLE_SVP_NNIE_CNN_SOFTWARE_PARAM_S* pstSoftwareParam)
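This function normally does not need to be modified, and its internals are not described for now. One point to note: if the network structure has been changed, make sure the input data of SAMPLE_SVP_NNIE_Cnn_GetTopN is still correct.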
For step 8, the function call and the parameter details are:
s_stCnnNnieParam.pstModel = &s_stCnnModel.stModel;
/* SAMPLE_COMM_SVP_NNIE_LoadModel(pcModelName,&s_stCnnModel); */

s_stCnnSoftwareParam.u32TopN = 5;
SAMPLE_SVP_NNIE_Cnn_ParamInit(&stNnieCfg, //this function assigns the members of s_stCnnSoftwareParam
                              &s_stCnnNnieParam,
                              &s_stCnnSoftwareParam);

s32Ret = SAMPLE_SVP_NNIE_Cnn_PrintResult(&(s_stCnnSoftwareParam.stGetTopN),
                                         s_stCnnSoftwareParam.u32TopN);

HI_S32 SAMPLE_SVP_NNIE_Cnn_PrintResult(SVP_BLOB_S *pstGetTopN,
    HI_U32 u32TopN)
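If you have any questions, feel free to raise them for discussion. A later post will present the NNIE implementation of mobilefacenet.
(5) Successful simulation quantization of mobilefacenet.wk and comparison of intermediate-layer data with the output of CNN_convert_bin_and_print_featuremap.py
The earlier posts covered the basics of NNIE development. The next few posts focus on Mobilefacenet NNIE development: building the chip version of mobilefacenet.wk and running the mobilefacenet network on Hi3559AV100, with an external USB camera and the result sent through the MPP platform to VO HDMI for display. This post is Hi3559AV100 NNIE Development (5): successful simulation quantization of mobilefacenet.wk and comparison of intermediate-layer data with the output of CNN_convert_bin_and_print_featuremap.py. So far, mobilefacenet.wk has been quantized successfully in simulation on the PC, in preparation for comparing against the data produced when mobilefacenet.wk is loaded on the on-board chip.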
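Operating system: Windows 10
Simulation tool: Ruyi Studio 2.0.28
Development platform: Hi3559AV100
Network model: Mobilefacenet
Framework: Caffe
Before testing, the RuyiStudio development environment must be set up and able to run a project correctly, the mobilefacenet network must have been trained to produce mobilefacenet.caffemodel, and mobilefacenet.prototxt must be finalized. Because every layer of mobilefacenet is supported by NNIE, no manual modification, addition, or deletion is needed; the mark and check operations in marked_prototxt can be used to verify that the corresponding network structure is generated correctly.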
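Before the test, here is the general NNIE quantization workflow:
(1) Convert any non-caffemodel model to a caffemodel, because the NNIE on the Hi35xx series only supports caffemodel models;
(2) Configure the simulation quantization parameters (that is, mobilefacenet.cfg) and run PC simulation quantization to obtain intermediate-layer output A (in the mapper_quant directory);
(3) Use the Python intermediate-layer output tool provided with RuyiStudio to obtain intermediate-layer output B (in the data/output directory);
(4) Use the Ruyi Studio Vector Comparison tool to compare A and B, inspect the error, and keep it within an acceptable range (using the CosineSimilarity metric);
(5) Configure the quantization parameters for on-board chip execution, generate the mobilefacenet.wk file, and run it on the board to obtain output C;
(6) Compare A and C so that the error between simulation and on-board execution stays within an acceptable range.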
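After the project is created, first configure the settings used to generate mobilefacenet.wk. Note the following points:
(1) First set is_simulation to Simulation for the simulation test. Once the comparison results are correct, switch to Inst/Chip to generate the wk file that runs on the board. Because every layer of mobilefacenet is supported by NNIE, no manual modification, addition, or deletion is needed; the mark and check operations in marked_prototxt can be used to verify that the corresponding network structure is generated correctly;
(2) log_level = 3 outputs the results of all intermediate layers. Enable it during simulation comparison and debugging so that the vectors can be compared;
(3) batch_num must not be set too large, otherwise an error is reported. It is currently set to batch_num = 16;
(4) The image_list setting is critical: it determines what image data is actually fed to the model. image_type defaults to U8; RGB_order gives the channel order of the RGB image fed to the network; norm_type is the preprocessing applied to the image data. Here channel mean_value with data_scale is selected, which subtracts the mean from the input image data and scales it. data_scale is set to 0.0078125, that is 1/128, and pixel_mean.txt is as shown in the figure below, so that pixel values originally in [0,255] are mapped into [-1,1]. The content of imageList.txt is given below:
(5) mapper_quant holds all the output information, and Mobileface_func.wk is the generated simulation wk file. Note: the output stored in mapper_quant is that of the last image in the selected image_list file. This matters later, because it identifies which image to use when comparing vectors against the intermediate-layer results produced by the .py script.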
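The mean-and-scale preprocessing in point (4) amounts to one subtraction and one multiplication per pixel. A minimal sketch follows; the function name `normalize_pixel` is hypothetical, and the mean of 127.5 used in the example is an assumption, since the actual values come from pixel_mean.txt.

```c
#include <assert.h>
#include <stdint.h>

/* Subtract the channel mean, then scale: (pixel - mean) * data_scale. */
float normalize_pixel(uint8_t pixel, float mean, float data_scale)
{
    assert(data_scale > 0.0f);
    return ((float)pixel - mean) * data_scale;
}
```

With data_scale = 0.0078125 (1/128) and an assumed mean of 127.5, pixel 0 maps to about -0.996 and pixel 255 to about 0.996, which is the [-1,1] range described above.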
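The specific configuration of mobileface.cfg is given here (for the individual .cfg parameters, see: Hi3559AV100 NNIE Development (3): RuyiStudio .wk file generation process and mobilefacenet.cfg parameter configuration, https://www.cnblogs.com/iFrank/p/14515089.html)
Then click the make Wk button in the upper-left corner of RuyiStudio. The dialog shown below appears; click OK to generate mobileface.wk:
CNN_convert_bin_and_print_featuremap.py is given below (RuyiStudio version 2.0.28). Place this file in the data directory of the mobileface project: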
#from __future__ import print_function
import caffe
import pickle
from datetime import datetime
Hi3559AV100 NNIE Development (6): data-flow analysis of SAMPLE_SVP_NNIE_Rfcn_ViToVo(), the key thread function of the NNIE implementation in RFCN
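The earlier posts covered the basics of NNIE development. The next few posts focus on Mobilefacenet NNIE development: building the chip version of mobilefacenet.wk and running the mobilefacenet network on Hi3559AV100, with an external USB camera and the result sent through the MPP platform to VO HDMI for display. This post is Hi3559AV100 NNIE Development (6): a data-flow analysis of SAMPLE_SVP_NNIE_Rfcn_ViToVo(), the key thread function in RFCN. Analyzing this thread function shows in detail how the .wk model data is processed and how post-processing such as drawing detection boxes is implemented.
First, the call to SAMPLE_SVP_NNIE_Rfcn_ViToVo() is given as a reference for the analysis that follows: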
static pthread_t s_hNnieThread = 0; //global definition

HI_CHAR acThreadName[16] = {0};//local definition

/******************************************
 Create work thread
******************************************/
snprintf(acThreadName, 16, "NNIE_ViToVo");
prctl(PR_SET_NAME, (unsigned long)acThreadName, 0,0,0);
pthread_create(&s_hNnieThread, 0, SAMPLE_SVP_NNIE_Rfcn_ViToVo, NULL);
The prototype of prctl is shown below. PR_SET_NAME sets the name of the calling thread, using the value in the location pointed to by (char *) arg2.
int prctl(int option, unsigned long arg2, unsigned long arg3,
          unsigned long arg4, unsigned long arg5);
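A minimal, Linux-only sketch of the PR_SET_NAME behaviour follows. The helper names `set_thread_name` and `get_thread_name` are hypothetical; the only real API used is prctl with PR_SET_NAME and PR_GET_NAME. The kernel keeps at most 16 bytes for the name, including the terminating null byte, which matches the 16-byte acThreadName buffer in the sample.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Name the calling thread; longer names are cut to 15 characters. */
int set_thread_name(const char *name)
{
    char buf[16] = {0};

    assert(name != NULL);
    strncpy(buf, name, sizeof(buf) - 1);   /* buf stays null-terminated */
    return prctl(PR_SET_NAME, (unsigned long)buf, 0, 0, 0);
}

/* Read back the calling thread's name into a 16-byte buffer. */
int get_thread_name(char out[16])
{
    assert(out != NULL);
    return prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0);
}
```

Both helpers act on the thread that calls them, so a worker that wants its own name would call set_thread_name at the start of its thread function.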
The following analyzes the internal implementation of SAMPLE_SVP_NNIE_Rfcn_ViToVo(HI_VOID* pArgs). It involves five functions, whose roles are noted in the comments:
static HI_VOID* SAMPLE_SVP_NNIE_Rfcn_ViToVo(HI_VOID* pArgs)
{
    .... //parameter definitions

    while (HI_FALSE == s_bNnieStopSignal)
    {
        //get one processed frame from the channel: channel 1, output stExtFrmInfo
        s32Ret = HI_MPI_VPSS_GetChnFrame(s32VpssGrp,
                                         as32VpssChn[1],
                                         &stExtFrmInfo,
                                         s32MilliSec);
        ......

        //get one processed frame from the channel: channel 0, output stBaseFrmInfo
        s32Ret = HI_MPI_VPSS_GetChnFrame(s32VpssGrp,
                                         as32VpssChn[0],
                                         &stBaseFrmInfo,
                                         s32MilliSec);
        ......

        //key processing function
        s32Ret = SAMPLE_SVP_NNIE_Rfcn_Proc(
                        pstParam,
                        pstSwParam,
                        &stExtFrmInfo,
                        stBaseFrmInfo.stVFrame.u32Width,
                        stBaseFrmInfo.stVFrame.u32Height);
        ......

        //Draw rect
        s32Ret = SAMPLE_COMM_SVP_NNIE_FillRect(
                        &stBaseFrmInfo,
                        &(pstSwParam->stRect),
                        0x0000FF00); //green
        ......

        //send the video frame to the specified output channel for display
        s32Ret = HI_MPI_VO_SendFrame(voLayer,
                                     voChn,
                                     &stBaseFrmInfo,
                                     s32MilliSec);
        ......

        BASE_RELEASE:
            s32Ret = HI_MPI_VPSS_ReleaseChnFrame(s32VpssGrp,as32VpssChn[0], &stBaseFrmInfo);
            ......

        EXT_RELEASE:
            s32Ret = HI_MPI_VPSS_ReleaseChnFrame(s32VpssGrp,as32VpssChn[1], &stExtFrmInfo);
            .......
    }

    return HI_NULL;
}
Below are the parameters and calls of HI_MPI_VPSS_GetChnFrame (VPSS outputs on two channels here). Its job is to get one processed frame from a channel:
HI_S32 s32VpssGrp = 0;
HI_S32 as32VpssChn[] = {VPSS_CHN0, VPSS_CHN1};
VIDEO_FRAME_INFO_S stBaseFrmInfo;
VIDEO_FRAME_INFO_S stExtFrmInfo;
HI_S32 s32MilliSec = 20000;

//get one processed frame from the channel: channel 1, output stExtFrmInfo
s32Ret = HI_MPI_VPSS_GetChnFrame(s32VpssGrp,
                                 as32VpssChn[1],
                                 &stExtFrmInfo,
                                 s32MilliSec);

//get one processed frame from the channel: channel 0, output stBaseFrmInfo
s32Ret = HI_MPI_VPSS_GetChnFrame(s32VpssGrp,
                                 as32VpssChn[0],
                                 &stBaseFrmInfo,
                                 s32MilliSec);

//function definition
HI_S32 HI_MPI_VPSS_GetChnFrame(VPSS_GRP VpssGrp, VPSS_CHN VpssChn,
    VIDEO_FRAME_INFO_S *pstVideoFrame, HI_S32 s32MilliSec);
SAMPLE_SVP_NNIE_Rfcn_Proc is the key point of the whole RFCN NNIE data-processing flow and the source of the detection and box information. Its parameters and call are analyzed below:
SAMPLE_SVP_NNIE_PARAM_S *pstParam;
SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam;

pstParam = &s_stRfcnNnieParam;
pstSwParam = &s_stRfcnSoftwareParam;
/*
    s_stRfcnNnieParam and s_stRfcnSoftwareParam are set up by the earlier operations

    s_stRfcnNnieParam.pstModel = &s_stRfcnModel.stModel;
    s_stRfcnSoftwareParam.apcRpnDataLayerName[0] = "rpn_cls_score";
    s_stRfcnSoftwareParam.apcRpnDataLayerName[1] = "rpn_bbox_pred";
    s32Ret = SAMPLE_SVP_NNIE_Rfcn_ParamInit(&stNnieCfg,
                                            &s_stRfcnNnieParam,
                                            &s_stRfcnSoftwareParam);
*/

VIDEO_FRAME_INFO_S stBaseFrmInfo;
VIDEO_FRAME_INFO_S stExtFrmInfo;

//stBaseFrmInfo and stExtFrmInfo are the outputs of the following calls
/*
    s32Ret = HI_MPI_VPSS_GetChnFrame(s32VpssGrp,
                                     as32VpssChn[1],
                                     &stExtFrmInfo,
                                     s32MilliSec);
    s32Ret = HI_MPI_VPSS_GetChnFrame(s32VpssGrp,
                                     as32VpssChn[0],
                                     &stBaseFrmInfo,
                                     s32MilliSec);
*/

s32Ret = SAMPLE_SVP_NNIE_Rfcn_Proc(
                pstParam,
                pstSwParam,
                &stExtFrmInfo,
                stBaseFrmInfo.stVFrame.u32Width,
                stBaseFrmInfo.stVFrame.u32Height);
Next, the definition and internal flow of SAMPLE_SVP_NNIE_Rfcn_Proc:
static HI_S32 SAMPLE_SVP_NNIE_Rfcn_Proc(
    SAMPLE_SVP_NNIE_PARAM_S *pstParam,
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam,
    VIDEO_FRAME_INFO_S* pstExtFrmInfo,
    HI_U32 u32BaseWidth,HI_U32 u32BaseHeight)
{
    ...... //parameter definitions

    s32Ret = SAMPLE_SVP_NNIE_Forward(pstParam,
                                     &stInputDataIdx,
                                     &stProcSegIdx,
                                     HI_TRUE);
    ......

    /*RPN*/
    s32Ret = SAMPLE_SVP_NNIE_Rfcn_Rpn(pstParam,
                                      pstSwParam);

    if(0 != pstSwParam->stRpnBbox.unShape.stWhc.u32Height)
    {
        s32Ret = SAMPLE_SVP_NNIE_ForwardWithBbox(
                        pstParam,
                        &stInputDataIdx,
                        &pstSwParam->stRpnBbox,
                        &stProcSegIdx,
                        HI_TRUE);
        ......

        s32Ret = SAMPLE_SVP_NNIE_ForwardWithBbox(
                        pstParam,
                        &stInputDataIdx,
                        &pstSwParam->stRpnBbox,
                        &stProcSegIdx,
                        HI_TRUE);
        ......

        s32Ret = SAMPLE_SVP_NNIE_Rfcn_GetResult(pstParam,
                                                pstSwParam);
    }
    else
    { ...... }

    s32Ret = SAMPLE_SVP_NNIE_RoiToRect(
                    &(pstSwParam->stDstScore),
                    &(pstSwParam->stDstRoi),
                    &(pstSwParam->stClassRoiNum),
                    pstSwParam->af32ScoreThr,
                    HI_TRUE,
                    &(pstSwParam->stRect),
                    pstExtFrmInfo->stVFrame.u32Width,
                    pstExtFrmInfo->stVFrame.u32Height,
                    u32BaseWidth,
                    u32BaseHeight);
    ......

    return s32Ret;
}
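The last call, SAMPLE_SVP_NNIE_RoiToRect, receives both the Ext frame size and the Base frame size, so box coordinates found on the frame fed to the network are mapped onto the frame that is displayed. A minimal sketch of that rescaling, assuming the same divide, multiply and clear-lowest-bit sequence that the sample's RoiToRect code uses (the helper name `scale_coord` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a coordinate from a source frame of src_size pixels to a
 * destination frame of dst_size pixels, then round down to an even
 * value by clearing the lowest bit.
 */
uint32_t scale_coord(int32_t coord, uint32_t src_size, uint32_t dst_size)
{
    assert(coord >= 0 && src_size > 0);
    return (uint32_t)((float)coord / (float)src_size * (float)dst_size) & ~1u;
}
```

Each corner of a rectangle would go through this mapping twice, once with the widths for x and once with the heights for y.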
First, the function call and the parameter details, to make the function easier to analyze:
//In SAMPLE_SVP_NNIE_Rfcn_ViToVo:
SAMPLE_SVP_NNIE_PARAM_S *pstParam;
pstParam = &s_stRfcnNnieParam; //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc

static HI_S32 SAMPLE_SVP_NNIE_Rfcn_Proc(
    SAMPLE_SVP_NNIE_PARAM_S *pstParam, /*after being passed in, this parameter receives the SP420
                                         assignments and is then passed to SAMPLE_SVP_NNIE_Forward*/
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam,
    VIDEO_FRAME_INFO_S* pstExtFrmInfo,
    HI_U32 u32BaseWidth,HI_U32 u32BaseHeight)

......

SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S stInputDataIdx = {0};
SAMPLE_SVP_NNIE_PROCESS_SEG_INDEX_S stProcSegIdx = {0};

stInputDataIdx.u32SegIdx = 0;
stInputDataIdx.u32NodeIdx = 0;

/*SP420*/
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u64VirAddr = pstExtFrmInfo->stVFrame.u64VirAddr[0];
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u64PhyAddr = pstExtFrmInfo->stVFrame.u64PhyAddr[0];
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u32Stride = pstExtFrmInfo->stVFrame.u32Stride[0];

/*NNIE process 0-th seg*/
stProcSegIdx.u32SegIdx = 0;
s32Ret = SAMPLE_SVP_NNIE_Forward(pstParam,
                                 &stInputDataIdx,
                                 &stProcSegIdx,
                                 HI_TRUE);

......

//function definition
static HI_S32 SAMPLE_SVP_NNIE_Forward(
    SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
    SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S* pstInputDataIdx,
    SAMPLE_SVP_NNIE_PROCESS_SEG_INDEX_S* pstProcSegIdx,
    HI_BOOL bInstant)
As its name suggests, this function implements the NNIE forward pass. The functions it calls are shown below:
/******************************************************************************
* function : NNIE Forward
******************************************************************************/
static HI_S32 SAMPLE_SVP_NNIE_Forward(SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
    SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S* pstInputDataIdx,
    SAMPLE_SVP_NNIE_PROCESS_SEG_INDEX_S* pstProcSegIdx,HI_BOOL bInstant)
{
    ......

    SAMPLE_COMM_SVP_FlushCache(
        pstNnieParam->astForwardCtrl[pstProcSegIdx->u32SegIdx].stTskBuf.u64PhyAddr,
        (HI_VOID *) pstNnieParam->astForwardCtrl[pstProcSegIdx->u32SegIdx].stTskBuf.u64VirAddr,
        pstNnieParam->astForwardCtrl[pstProcSegIdx->u32SegIdx].stTskBuf.u32Size);

    /*set input blob according to node name*/
    if(pstInputDataIdx->u32SegIdx != pstProcSegIdx->u32SegIdx)
    {
        for(i = 0; i < pstNnieParam->pstModel->astSeg[pstProcSegIdx->u32SegIdx].u16SrcNum; i++)
        {
            ......
        }
    }

    /*NNIE_Forward: CNN-type network prediction with multi-node input and output.
      Runs CNN prediction on the input sample(s) and outputs the response
      for the corresponding sample(s)*/
    s32Ret = HI_MPI_SVP_NNIE_Forward(
        &hSvpNnieHandle,
        pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astSrc,
        pstNnieParam->pstModel,
        pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst,
        &pstNnieParam->astForwardCtrl[pstProcSegIdx->u32SegIdx],
        bInstant);

    if(bInstant)
    {
        /*Wait NNIE finish; enNnieId is the ID of the NNIE engine that executes the network segment*/
        while(HI_ERR_SVP_NNIE_QUERY_TIMEOUT == (s32Ret = HI_MPI_SVP_NNIE_Query(pstNnieParam->astForwardCtrl[pstProcSegIdx->u32SegIdx].enNnieId,
            hSvpNnieHandle, &bFinish, HI_TRUE)))
        {
            ......
        }
    }

    bFinish = HI_FALSE;
    for(i = 0; i < pstNnieParam->astForwardCtrl[pstProcSegIdx->u32SegIdx].u32DstNum; i++)
    {
        if(SVP_BLOB_TYPE_SEQ_S32 == pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].enType)
        {
            ......

            SAMPLE_COMM_SVP_FlushCache(
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64PhyAddr,
                (HI_VOID *) pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64VirAddr,
                u32TotalStepNum*pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u32Stride);
        }
        else
        {
            SAMPLE_COMM_SVP_FlushCache(pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64PhyAddr,
                (HI_VOID *) pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64VirAddr,
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u32Num*
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].unShape.stWhc.u32Chn*
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].unShape.stWhc.u32Height*
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u32Stride);
        }
    }

    return s32Ret;
}
Next is the SAMPLE_SVP_NNIE_Rfcn_Rpn sub-function, which handles the RPN stage. Its call context and parameters are:
//In SAMPLE_SVP_NNIE_Rfcn_ViToVo:
SAMPLE_SVP_NNIE_PARAM_S *pstParam;  //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc
pstParam = &s_stRfcnNnieParam;      //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc

SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam;
pstSwParam = &s_stRfcnSoftwareParam;

static HI_S32 SAMPLE_SVP_NNIE_Rfcn_Proc(
    SAMPLE_SVP_NNIE_PARAM_S *pstParam, /*after being passed in, this parameter receives the SP420
                                         assignments and is then passed to SAMPLE_SVP_NNIE_Forward*/
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam, //passed directly to SAMPLE_SVP_NNIE_Rfcn_Rpn
    VIDEO_FRAME_INFO_S* pstExtFrmInfo,
    HI_U32 u32BaseWidth,HI_U32 u32BaseHeight)

/*SP420*/
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u64VirAddr = pstExtFrmInfo->stVFrame.u64VirAddr[0];
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u64PhyAddr = pstExtFrmInfo->stVFrame.u64PhyAddr[0];
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u32Stride = pstExtFrmInfo->stVFrame.u32Stride[0];

//after this, pstParam is first processed by SAMPLE_SVP_NNIE_Forward and then passed to SAMPLE_SVP_NNIE_Rfcn_Rpn

/*RPN*/
s32Ret = SAMPLE_SVP_NNIE_Rfcn_Rpn(pstParam,
                                  pstSwParam);

//function definition, used to do rpn
HI_S32 SAMPLE_SVP_NNIE_Rfcn_Rpn(
    SAMPLE_SVP_NNIE_PARAM_S*pstNnieParam,
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S* pstSoftwareParam)
Below is the internal implementation of SAMPLE_SVP_NNIE_Rfcn_Rpn. Although the function takes two parameters, it does nothing with pstNnieParam. It works on a large number of members of the pstSoftwareParam structure, mainly by calling SVP_NNIE_Rpn and SAMPLE_COMM_SVP_FlushCache:
HI_S32 SAMPLE_SVP_NNIE_Rfcn_Rpn(SAMPLE_SVP_NNIE_PARAM_S*pstNnieParam,
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S* pstSoftwareParam)
{
    HI_S32 s32Ret = HI_SUCCESS;
    s32Ret = SVP_NNIE_Rpn(pstSoftwareParam->aps32Conv,pstSoftwareParam->u32NumRatioAnchors,
        pstSoftwareParam->u32NumScaleAnchors,pstSoftwareParam->au32Scales,
        pstSoftwareParam->au32Ratios,pstSoftwareParam->u32OriImHeight,
        pstSoftwareParam->u32OriImWidth,pstSoftwareParam->au32ConvHeight,
        pstSoftwareParam->au32ConvWidth,pstSoftwareParam->au32ConvChannel,
        pstSoftwareParam->u32ConvStride,pstSoftwareParam->u32MaxRoiNum,
        pstSoftwareParam->u32MinSize,pstSoftwareParam->u32SpatialScale,
        pstSoftwareParam->u32NmsThresh,pstSoftwareParam->u32FilterThresh,
        pstSoftwareParam->u32NumBeforeNms,(HI_U32*)pstSoftwareParam->stRpnTmpBuf.u64VirAddr,
        (HI_S32*)pstSoftwareParam->stRpnBbox.u64VirAddr,
        &pstSoftwareParam->stRpnBbox.unShape.stWhc.u32Height);
    SAMPLE_COMM_SVP_FlushCache(pstSoftwareParam->stRpnBbox.u64PhyAddr,
        (HI_VOID *) pstSoftwareParam->stRpnBbox.u64VirAddr,
        pstSoftwareParam->stRpnBbox.u32Num*
        pstSoftwareParam->stRpnBbox.unShape.stWhc.u32Chn*
        pstSoftwareParam->stRpnBbox.unShape.stWhc.u32Height*
        pstSoftwareParam->stRpnBbox.u32Stride);
    SAMPLE_SVP_CHECK_EXPR_RET(HI_SUCCESS != s32Ret,s32Ret,SAMPLE_SVP_ERR_LEVEL_ERROR,
        "Error,SVP_NNIE_Rpn failed!\n");
    return s32Ret;
}
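After the preceding functions have run, an if test decides whether SAMPLE_SVP_NNIE_ForwardWithBbox is executed. It runs when the following condition holds:
if(0 != pstSwParam->stRpnBbox.unShape.stWhc.u32Height)
Otherwise, when the condition is not met, the following assignments are executed:
for (i = 0; i < pstSwParam->stClassRoiNum.unShape.stWhc.u32Width; i++)
{
    *(((HI_U32*)(HI_UL)pstSwParam->stClassRoiNum.u64VirAddr)+i) = 0;
}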
SAMPLE_SVP_NNIE_ForwardWithBbox is executed twice, each time processing a different NNIE segment (the 1st and then the 2nd), as follows:
//In SAMPLE_SVP_NNIE_Rfcn_ViToVo:
SAMPLE_SVP_NNIE_PARAM_S *pstParam;  //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc
pstParam = &s_stRfcnNnieParam;      //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc

SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam;
pstSwParam = &s_stRfcnSoftwareParam;

static HI_S32 SAMPLE_SVP_NNIE_Rfcn_Proc(
    SAMPLE_SVP_NNIE_PARAM_S *pstParam, /*after being passed in, this parameter receives the SP420
                                         assignments and is then passed to SAMPLE_SVP_NNIE_Forward*/
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam, /*passed directly to SAMPLE_SVP_NNIE_Rfcn_Rpn,
                                         then to SAMPLE_SVP_NNIE_ForwardWithBbox*/
    VIDEO_FRAME_INFO_S* pstExtFrmInfo,
    HI_U32 u32BaseWidth,HI_U32 u32BaseHeight)

/*RPN*/
s32Ret = SAMPLE_SVP_NNIE_Rfcn_Rpn(pstParam,
                                  pstSwParam); /*this function fills in pstSwParam, which is then
                                                 passed to SAMPLE_SVP_NNIE_ForwardWithBbox*/

/*before SAMPLE_SVP_NNIE_ForwardWithBbox, the following values were already assigned
  and passed to SAMPLE_SVP_NNIE_Forward*/
stInputDataIdx.u32SegIdx = 0;
stInputDataIdx.u32NodeIdx = 0;

s32Ret = SAMPLE_SVP_NNIE_Forward(pstParam,
                                 &stInputDataIdx, //stInputDataIdx has already been used in this call
                                 &stProcSegIdx,
                                 HI_TRUE);

//function calls
/*NNIE process 1-th seg, the input data comes from 3-rd report node of 0-th seg,
  the input roi comes from RPN results*/
stInputDataIdx.u32SegIdx = 0;
stInputDataIdx.u32NodeIdx = 3;

stProcSegIdx.u32SegIdx = 1;

s32Ret = SAMPLE_SVP_NNIE_ForwardWithBbox(
                pstParam,
                &stInputDataIdx,
                &pstSwParam->stRpnBbox,
                &stProcSegIdx,
                HI_TRUE);

/*NNIE process 2-nd seg, the input data comes from 4-th report node of 0-th seg
  the input roi comes from RPN results*/
stInputDataIdx.u32SegIdx = 0;
stInputDataIdx.u32NodeIdx = 4;

stProcSegIdx.u32SegIdx = 2;

s32Ret = SAMPLE_SVP_NNIE_ForwardWithBbox(
                pstParam,
                &stInputDataIdx,
                &pstSwParam->stRpnBbox,
                &stProcSegIdx,
                HI_TRUE);
The implementation of the function is:
/******************************************************************************
* function : NNIE ForwardWithBbox
******************************************************************************/
static HI_S32 SAMPLE_SVP_NNIE_ForwardWithBbox(
    SAMPLE_SVP_NNIE_PARAM_S *pstNnieParam,
    SAMPLE_SVP_NNIE_INPUT_DATA_INDEX_S* pstInputDataIdx,
    SVP_SRC_BLOB_S astBbox[],
    SAMPLE_SVP_NNIE_PROCESS_SEG_INDEX_S* pstProcSegIdx,
    HI_BOOL bInstant)
{
    ......

    SAMPLE_COMM_SVP_FlushCache(pstNnieParam->astForwardWithBboxCtrl[pstProcSegIdx->u32SegIdx].stTskBuf.u64PhyAddr,
        (HI_VOID *) pstNnieParam->astForwardWithBboxCtrl[pstProcSegIdx->u32SegIdx].stTskBuf.u64VirAddr,
        pstNnieParam->astForwardWithBboxCtrl[pstProcSegIdx->u32SegIdx].stTskBuf.u32Size);

    /*set input blob according to node name*/
    if(pstInputDataIdx->u32SegIdx != pstProcSegIdx->u32SegIdx)
    {
        for(i = 0; i < pstNnieParam->pstModel->astSeg[pstProcSegIdx->u32SegIdx].u16SrcNum; i++)
        {
            for(j = 0; j < pstNnieParam->pstModel->astSeg[pstInputDataIdx->u32SegIdx].u16DstNum; j++)
            {
                ......
            }
            ......
        }
    }

    /*NNIE_ForwardWithBbox*/
    s32Ret = HI_MPI_SVP_NNIE_ForwardWithBbox(
        &hSvpNnieHandle,
        pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astSrc,
        astBbox,
        pstNnieParam->pstModel, //only ROI/PSROI network types are supported
        pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst,
        &pstNnieParam->astForwardWithBboxCtrl[pstProcSegIdx->u32SegIdx],
        bInstant);

    ......

    if(bInstant)
    {
        /*Wait NNIE finish*/
        while(HI_ERR_SVP_NNIE_QUERY_TIMEOUT == (s32Ret = HI_MPI_SVP_NNIE_Query(pstNnieParam->astForwardWithBboxCtrl[pstProcSegIdx->u32SegIdx].enNnieId,
            hSvpNnieHandle, &bFinish, HI_TRUE)))
        {
            ......
        }
    }

    bFinish = HI_FALSE;

    for(i = 0; i < pstNnieParam->astForwardWithBboxCtrl[pstProcSegIdx->u32SegIdx].u32DstNum; i++)
    {
        if(SVP_BLOB_TYPE_SEQ_S32 == pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].enType)
        {
            ......

            SAMPLE_COMM_SVP_FlushCache(pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64PhyAddr,
                (HI_VOID *) pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64VirAddr,
                u32TotalStepNum*pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u32Stride);
        }
        else
        {
            SAMPLE_COMM_SVP_FlushCache(pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64PhyAddr,
                (HI_VOID *) pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u64VirAddr,
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u32Num*
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].unShape.stWhc.u32Chn*
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].unShape.stWhc.u32Height*
                pstNnieParam->astSegData[pstProcSegIdx->u32SegIdx].astDst[i].u32Stride);
        }
    }

    return s32Ret;
}
The next sub-function obtains the NNIE RFCN result. This requires that the network structure and the input data be consistent. The call is as follows:
//In SAMPLE_SVP_NNIE_Rfcn_ViToVo:
SAMPLE_SVP_NNIE_PARAM_S *pstParam;  //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc
pstParam = &s_stRfcnNnieParam;      //this value is passed to SAMPLE_SVP_NNIE_Rfcn_Proc

SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam;
pstSwParam = &s_stRfcnSoftwareParam;

static HI_S32 SAMPLE_SVP_NNIE_Rfcn_Proc(
    SAMPLE_SVP_NNIE_PARAM_S *pstParam, /*after being passed in, this parameter receives the SP420
                                         assignments and is then passed to SAMPLE_SVP_NNIE_Forward*/
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S *pstSwParam, /*passed directly to SAMPLE_SVP_NNIE_Rfcn_Rpn,
                                         then to SAMPLE_SVP_NNIE_ForwardWithBbox*/
    VIDEO_FRAME_INFO_S* pstExtFrmInfo,
    HI_U32 u32BaseWidth,HI_U32 u32BaseHeight)

pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u64VirAddr = pstExtFrmInfo->stVFrame.u64VirAddr[0];
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u64PhyAddr = pstExtFrmInfo->stVFrame.u64PhyAddr[0];
pstParam->astSegData[stInputDataIdx.u32SegIdx].astSrc[stInputDataIdx.u32NodeIdx].u32Stride = pstExtFrmInfo->stVFrame.u32Stride[0];

s32Ret = SAMPLE_SVP_NNIE_Forward(pstParam, //parameter passed in
                                 &stInputDataIdx,
                                 &stProcSegIdx,
                                 HI_TRUE);

/*RPN*/
s32Ret = SAMPLE_SVP_NNIE_Rfcn_Rpn(pstParam,    //parameter passed in
                                  pstSwParam); //parameter passed in

s32Ret = SAMPLE_SVP_NNIE_ForwardWithBbox(
                pstParam, //parameter passed in
                &stInputDataIdx,
                &pstSwParam->stRpnBbox,
                &stProcSegIdx,
                HI_TRUE);

//function call

/*GetResult*/
/*if user has changed net struct, please make sure SAMPLE_SVP_NNIE_Rfcn_GetResult
  function's input datas are correct*/
s32Ret = SAMPLE_SVP_NNIE_Rfcn_GetResult(pstParam,
                                        pstSwParam);

//function definition
HI_S32 SAMPLE_SVP_NNIE_Rfcn_GetResult(SAMPLE_SVP_NNIE_PARAM_S*pstNnieParam,
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S* pstSoftwareParam)
This function retrieves the NNIE RFCN result. Its core is the call to SVP_NNIE_Rfcn_GetResult, as shown below:
HI_S32 SAMPLE_SVP_NNIE_Rfcn_GetResult(SAMPLE_SVP_NNIE_PARAM_S*pstNnieParam,
    SAMPLE_SVP_NNIE_RFCN_SOFTWARE_PARAM_S* pstSoftwareParam)
{
    HI_S32 s32Ret = HI_SUCCESS;
    HI_U32 i = 0;
    HI_S32* ps32Proposal = (HI_S32*)pstSoftwareParam->stRpnBbox.u64VirAddr;

    ......

    for(i = 0; i < pstSoftwareParam->stRpnBbox.unShape.stWhc.u32Height; i++)
    {
        *(ps32Proposal+SAMPLE_SVP_NNIE_COORDI_NUM*i) /= SAMPLE_SVP_NNIE_QUANT_BASE;
        *(ps32Proposal+SAMPLE_SVP_NNIE_COORDI_NUM*i+1) /= SAMPLE_SVP_NNIE_QUANT_BASE;
        *(ps32Proposal+SAMPLE_SVP_NNIE_COORDI_NUM*i+2) /= SAMPLE_SVP_NNIE_QUANT_BASE;
        *(ps32Proposal+SAMPLE_SVP_NNIE_COORDI_NUM*i+3) /= SAMPLE_SVP_NNIE_QUANT_BASE;
    }
    //this function is used to get RFCN result
    s32Ret = SVP_NNIE_Rfcn_GetResult(
        (HI_S32*)pstNnieParam->astSegData[1].astDst[0].u64VirAddr,
        pstNnieParam->astSegData[1].astDst[0].u32Stride,
        (HI_S32*)pstNnieParam->astSegData[2].astDst[0].u64VirAddr,
        pstNnieParam->astSegData[2].astDst[0].u32Stride,
        (HI_S32*)pstSoftwareParam->stRpnBbox.u64VirAddr,
        pstSoftwareParam->stRpnBbox.unShape.stWhc.u32Height,
        pstSoftwareParam->au32ConfThresh,pstSoftwareParam->u32MaxRoiNum,
        pstSoftwareParam->u32ClassNum,pstSoftwareParam->u32OriImWidth,
        pstSoftwareParam->u32OriImHeight,pstSoftwareParam->u32ValidNmsThresh,
        (HI_U32*)pstSoftwareParam->stGetResultTmpBuf.u64VirAddr,
        (HI_S32*)pstSoftwareParam->stDstScore.u64VirAddr,
        (HI_S32*)pstSoftwareParam->stDstRoi.u64VirAddr,
        (HI_S32*)pstSoftwareParam->stClassRoiNum.u64VirAddr);
    ......
    return s32Ret;
}
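The divisions by SAMPLE_SVP_NNIE_QUANT_BASE convert fixed-point values back to ordinary units. A small sketch follows, assuming a base of 4096 (12 fractional bits); that value and the helper names `fixed_to_int` and `fixed_to_float` are assumptions for illustration, so check the sample headers for the actual definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of SAMPLE_SVP_NNIE_QUANT_BASE: 12 fractional bits. */
#define QUANT_BASE 4096

/* Integer part of a fixed-point value, as done for proposal coordinates. */
int32_t fixed_to_int(int32_t fixed)
{
    return fixed / QUANT_BASE;
}

/* Real value of a fixed-point number, as done for confidence scores. */
float fixed_to_float(int32_t fixed)
{
    return (float)fixed / (float)QUANT_BASE;
}
```

The integer form discards the fraction, which is acceptable for pixel coordinates, while scores keep the fraction because they are compared against floating-point thresholds.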
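The last sub-function called by SAMPLE_SVP_NNIE_Rfcn_Proc converts the ROIs (regions of interest) into rectangles for drawing. RFCN detects 21 classes here, with class 0 being the background. The call and the implementation are as follows: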
/*draw result, this sample has 21 classes:
 class 0:background     class 1:plane           class 2:bicycle
 class 3:bird           class 4:boat            class 5:bottle
 class 6:bus            class 7:car             class 8:cat
 class 9:chair          class10:cow             class11:diningtable
 class 12:dog           class13:horse           class14:motorbike
 class 15:person        class16:pottedplant     class17:sheep
 class 18:sofa          class19:train           class20:tvmonitor*/
s32Ret = SAMPLE_SVP_NNIE_RoiToRect(
    &(pstSwParam->stDstScore),
    &(pstSwParam->stDstRoi),
    &(pstSwParam->stClassRoiNum),
    pstSwParam->af32ScoreThr,
    HI_TRUE,
    &(pstSwParam->stRect),
    pstExtFrmInfo->stVFrame.u32Width,
    pstExtFrmInfo->stVFrame.u32Height,
    u32BaseWidth,
    u32BaseHeight);

/******************************************************************************
* function : roi to rect
******************************************************************************/
HI_S32 SAMPLE_SVP_NNIE_RoiToRect(SVP_BLOB_S *pstDstScore,
    SVP_BLOB_S *pstDstRoi, SVP_BLOB_S *pstClassRoiNum, HI_FLOAT *paf32ScoreThr,
    HI_BOOL bRmBg, SAMPLE_SVP_NNIE_RECT_ARRAY_S *pstRect,
    HI_U32 u32SrcWidth, HI_U32 u32SrcHeight, HI_U32 u32DstWidth, HI_U32 u32DstHeight)
{
    HI_U32 i = 0, j = 0;
    HI_U32 u32RoiNumBias = 0;
    HI_U32 u32ScoreBias = 0;
    HI_U32 u32BboxBias = 0;
    HI_FLOAT f32Score = 0.0f;
    HI_S32* ps32Score = (HI_S32*)pstDstScore->u64VirAddr;
    HI_S32* ps32Roi = (HI_S32*)pstDstRoi->u64VirAddr;
    HI_S32* ps32ClassRoiNum = (HI_S32*)pstClassRoiNum->u64VirAddr;
    HI_U32 u32ClassNum = pstClassRoiNum->unShape.stWhc.u32Width;
    HI_U32 u32RoiNumTmp = 0;

    ......

    pstRect->u32TotalNum = 0;
    pstRect->u32ClsNum = u32ClassNum;
    if (bRmBg)
    {
        pstRect->au32RoiNum[0] = 0;
        u32RoiNumBias += ps32ClassRoiNum[0];
        for (i = 1; i < u32ClassNum; i++)
        {
            u32ScoreBias = u32RoiNumBias;
            u32BboxBias = u32RoiNumBias * SAMPLE_SVP_NNIE_COORDI_NUM;
            u32RoiNumTmp = 0;
            /*if the confidence score greater than result thresh, the result will be drawed*/
            if (((HI_FLOAT)ps32Score[u32ScoreBias] / SAMPLE_SVP_NNIE_QUANT_BASE >=
                paf32ScoreThr[i]) && (ps32ClassRoiNum[i] != 0))
            {
                for (j = 0; j < (HI_U32)ps32ClassRoiNum[i]; j++)
                {
                    /*Score is descend order*/
                    f32Score = (HI_FLOAT)ps32Score[u32ScoreBias + j] / SAMPLE_SVP_NNIE_QUANT_BASE;
                    if ((f32Score < paf32ScoreThr[i]) || (u32RoiNumTmp >= SAMPLE_SVP_NNIE_MAX_ROI_NUM_OF_CLASS))
                    {
                        break;
                    }

                    pstRect->astRect[i][u32RoiNumTmp].astPoint[0].s32X = (HI_U32)((HI_FLOAT)ps32Roi[u32BboxBias + j*SAMPLE_SVP_NNIE_COORDI_NUM] / (HI_FLOAT)u32SrcWidth * (HI_FLOAT)u32DstWidth) & (~1);
                    pstRect->astRect[i][u32RoiNumTmp].astPoint[0].s32Y = (HI_U32)((HI_FLOAT)ps32Roi[u32BboxBias + j*SAMPLE_SVP_NNIE_COORDI_NUM + 1] / (HI_FLOAT)u32SrcHeight * (HI_FLOAT)u32DstHeight) & (~1);

                    pstRect->astRect[i][u32RoiNumTmp].astPoint[1].s32X = (HI_U32)((HI_FLOAT)ps32Roi[u32BboxBias + j*SAMPLE_SVP_NNIE_COORDI_NUM + 2] / (HI_FLOAT)u32SrcWidth * (HI_FLOAT)u32DstWidth) & (~1);
                    pstRect->astRect[i][u32RoiNumTmp].astPoint[1].s32Y = pstRect->astRect[i][u32RoiNumTmp].astPoint[0].s32Y;

                    pstRect->astRect[i][u32RoiNumTmp].astPoint[2].s32X = pstRect->astRect[i][u32RoiNumTmp].astPoint[1].s32X;
                    pstRect->astRect[i][u32RoiNumTmp].astPoint[2].s32Y = (HI_U32)((HI_FLOAT)ps32Roi[u32BboxBias + j*SAMPLE_SVP_NNIE_COORDI_NUM + 3] / (HI_FLOAT)u32SrcHeight * (HI_FLOAT)u32DstHeight) & (~1);

                    pstRect->astRect[i][u32RoiNumTmp].astPoint[3].s32X = pstRect->astRect[i][u32RoiNumTmp].astPoint[0].s32X;
                    pstRect->astRect[i][u32RoiNumTmp].astPoint[3].s32Y = pstRect->astRect[i][u32RoiNumTmp].astPoint[2].s32Y;

                    u32RoiNumTmp++;
                }

            }

            pstRect->au32RoiNum[i] = u32RoiNumTmp;
            pstRect->u32TotalNum += u32RoiNumTmp;
            u32RoiNumBias += ps32ClassRoiNum[i];
        }

    }
    return HI_SUCCESS;
}
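Each corner computed in SAMPLE_SVP_NNIE_RoiToRect follows the same pattern: scale the ROI coordinate from the source frame size to the destination frame size, then clear the lowest bit with `& (~1)` so the result is even. The helper here isolates that arithmetic as a sketch. The name scale_to_even is hypothetical, and the sketch assumes non-negative coordinates, which the sample code does not check.

```c
#include <assert.h>
#include <stdint.h>

/* Scale one ROI coordinate from the source dimension to the destination
 * dimension and round the result down to an even value. */
static uint32_t scale_to_even(int32_t coord, uint32_t src_size, uint32_t dst_size)
{
    assert(src_size > 0); /* avoid division by zero */
    assert(coord >= 0);   /* a negative float cast to unsigned is undefined */
    float scaled = (float)coord / (float)src_size * (float)dst_size;
    return (uint32_t)scaled & ~1u; /* clear bit 0: even alignment */
}
```

For example, an x coordinate of 100 in a 352-pixel-wide frame maps to 545.45 in a 1920-pixel-wide frame, which truncates to 545 and is then aligned down to 544.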
SAMPLE_COMM_SVP_NNIE_FillRect works with the VGS to draw the boxes. The call and the implementation are as follows; the comments describe each step:
//Draw rect
s32Ret = SAMPLE_COMM_SVP_NNIE_FillRect(
    &stBaseFrmInfo,
    &(pstSwParam->stRect),
    0x0000FF00); // green

HI_S32 SAMPLE_COMM_SVP_NNIE_FillRect(
    VIDEO_FRAME_INFO_S *pstFrmInfo,
    SAMPLE_SVP_NNIE_RECT_ARRAY_S* pstRect,
    HI_U32 u32Color)
{
    VGS_HANDLE VgsHandle = -1;
    HI_S32 s32Ret = HI_SUCCESS;
    HI_U32 i, j;
    VGS_TASK_ATTR_S stVgsTask;     // attributes of the VGS task
    VGS_ADD_COVER_S stVgsAddCover; // configuration of the COVER drawn by the VGS
    static HI_U32 u32Frm = 0;
    u32Frm++;
    if (0 == pstRect->u32TotalNum)
    {
        return s32Ret;
    }
    s32Ret = HI_MPI_VGS_BeginJob(&VgsHandle); // start a job
    if (s32Ret != HI_SUCCESS)
    {
        ......
        return s32Ret;
    }

    memcpy(&stVgsTask.stImgIn, pstFrmInfo, sizeof(VIDEO_FRAME_INFO_S));
    memcpy(&stVgsTask.stImgOut, pstFrmInfo, sizeof(VIDEO_FRAME_INFO_S));

    stVgsAddCover.enCoverType = COVER_QUAD_RANGLE;  // arbitrary quadrilateral COVER
    stVgsAddCover.u32Color = u32Color;              // RGB888
    stVgsAddCover.stQuadRangle.bSolid = HI_FALSE;   // hollow COVER
    stVgsAddCover.stQuadRangle.u32Thick = 2;        // border thickness, 2-pixel aligned

    for (i = 0; i < pstRect->u32ClsNum; i++)
    {
        for (j = 0; j < pstRect->au32RoiNum[i]; j++)
        {
            memcpy(stVgsAddCover.stQuadRangle.stPoint, pstRect->astRect[i][j].astPoint, sizeof(pstRect->astRect[i][j].astPoint));

            // for a COVER task, the input and output images share the same buffer
            // add a COVER task to a job that has already been started; the task attributes must be within the VGS capabilities
            s32Ret = HI_MPI_VGS_AddCoverTask(VgsHandle, &stVgsTask, &stVgsAddCover);
            if (s32Ret != HI_SUCCESS)
            {
                SAMPLE_PRT("HI_MPI_VGS_AddCoverTask fail,Error(%#x)\n", s32Ret);
                HI_MPI_VGS_CancelJob(VgsHandle);
                return s32Ret;
            }

        }

    }
    // submit the job
    s32Ret = HI_MPI_VGS_EndJob(VgsHandle);
    if (s32Ret != HI_SUCCESS)
    {
        SAMPLE_PRT("HI_MPI_VGS_EndJob fail,Error(%#x)\n", s32Ret);
        HI_MPI_VGS_CancelJob(VgsHandle);
        return s32Ret;
    }

    return s32Ret;
}
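The colour argument 0x0000FF00 passed to SAMPLE_COMM_SVP_NNIE_FillRect is an RGB888 value, as the u32Color comment notes. As a small illustration of how such a value is composed, the helper here packs three 8-bit channels into the low 24 bits of a 32-bit word. The name pack_rgb888 is hypothetical and is not part of the SDK.

```c
#include <stdint.h>

/* Pack 8-bit red, green and blue channels into 0x00RRGGBB. */
static uint32_t pack_rgb888(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```

With this layout, pack_rgb888(0, 255, 0) yields 0x0000FF00, the green used in the call, and pack_rgb888(255, 0, 0) would give a red box instead.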
The last function executed by the SAMPLE_SVP_NNIE_Rfcn_ViToVo thread function is HI_MPI_VO_SendFrame, which sends the video frame to the specified output channel for display. The call is:
s32Ret = HI_MPI_VO_SendFrame(voLayer,
    voChn,
    &stBaseFrmInfo,
    s32MilliSec);
Hi3559AV100 NNIE Development (6): data-flow analysis of the key NNIE thread function in RFCN -> SAMPLE_SVP_NNIE_Rfcn_ViToVo()
(7) Comparing intermediate-layer data: Ruyistudio mobileface_func.wk output vs. on-board mobileface_chip.wk output
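Earlier posts covered the whole NNIE development flow, including Hi3559AV100 NNIE Development (5), on the successful simulated quantization of mobilefacenet.wk and its comparison with the intermediate-layer data output by CNN_convert_bin_and_print_featuremap.py: https://www.cnblogs.com/iFrank/p/14528882.html. This post, Hi3559AV100 NNIE Development (7), compares the intermediate-layer data Ruyistudio outputs for mobileface_func.wk with the data produced by running mobileface_chip.wk on the board. Running mobileface on the NNIE of the Hi3559AV100 platform verifies that the mobileface.caffemodel model is correct, which prepares the ground for later MPP development.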
Operating system: Windows 10
Simulation tool: Ruyi Studio 2.0.28
Development platform: Hi3559AV100 (SDK020)
Network model: Mobilefacenet (CNN)
Framework: Caffe 1.0
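The NNIE software and toolchain currently support only the Caffe framework, based on Caffe 1.0. Network models from other frameworks must first be converted to Caffe models.
Taking a model trained in Caffe as an example, the NNIE development flow is shown in the figure. Training in Caffe and conversion with the NNIE mapper tool both happen offline. Depending on the mode selected, the mapper converts *.caffemodel into a data and instruction file that can be loaded and executed on the simulator, the simulation library, or the board. Early in development, the simulator is typically used for a first assessment of the trained model's accuracy, performance, and bandwidth. Once these meet expectations, the simulation library is used for full functional simulation, and finally the program is ported to the board.
Before the test, here is the general NNIE quantization flow, together with my test results: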
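(1) Convert any non-Caffe model to a caffemodel, because the NNIE on the Hi35xx series only supports caffemodel models;
(2) Configure the simulation quantization parameters (i.e. mobilefacenet.cfg) and run the quantization simulation on the PC to obtain intermediate-layer output A (in the mapper_quant directory);
(3) Use the Python intermediate-layer output tool shipped with RuyiStudio to obtain intermediate-layer output B (in the data/output directory);
(4) Compare A and B with Ruyi Studio's Vector Comparison tool and keep the error within an acceptable range (using the CosineSimilarity metric); in my test the similarity was 0.9914857;
(5) Configure the quantization parameters for on-chip execution, generate the mobilefacenet.wk file, and run it on the board to obtain output C;
(6) Compare A and C so that the gap between simulation and board stays within an acceptable range; in my test the similarity was 0.9946077;
(7) Proceed to MPP development, where the key is handling the NNIE blob data.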
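Mobileface.wk takes a .bgr image as input when it runs on the board. My earlier post, "OpenCV-2.4.13 installation on VS2015 and .jpg/.bmp to .bgr conversion for Hi35xx" (https://www.cnblogs.com/iFrank/p/14552094.html), already describes the procedure; the OpenCV-based .jpg-to-.bgr code is given again here: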
/* The original listing opens with five standard-library #include lines whose header names are missing from the source. */
#include "opencv2/opencv.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"

using namespace cv;

typedef unsigned char U_CHAR;

int main()
{
    const char *filename = "C:/Users/PC/Desktop/jpg_bgr/10_MariaCallas_28_f.jpg";
    const char *outname = "C:/Users/PC/Desktop/jpg_bgr/10_MariaCallas_28_f.bgr";
    int flag = 1;

    cv::Mat img = cv::imread(filename);
    if (!img.data)
    {
        printf("read image error\n");
        return -1;
    }

    // resize
    resize(img, img, Size(112, 112)); //224x224
    //imshow("img",img);
    //waitKey(0);

    U_CHAR *data = (U_CHAR*)img.data;
    int step = img.step;
    printf("Step: %d, height: %d, width: %d\n",
           step, img.rows, img.cols);

    FILE *fp = fopen(outname, "wb");
    int h = img.rows;
    int w = img.cols;
    int c = img.channels();

    for (int k = 0; k /* the rest of the listing is missing from the source */

4. Comparing the RuyiStudio mobileface_func.wk simulation output with the on-board mobileface_chip.wk output
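4.1 Code changes (set the other NNIE parameters to match your own board)
First modify the code, mainly the SAMPLE_SVP_NNIE_Cnn function, as follows (after that, simply add the result output):
Adjust the other parameters to your own needs and board; I won't go into them here. After the program runs in the board's terminal, it produces an output.hex file, which is the data used in the comparison that follows. The file contains hexadecimal values that represent floating-point numbers.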
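The post does not show the layout of output.hex. As a sketch only, and assuming each entry is a 32-bit word in two's-complement fixed point with 12 fractional bits (so dividing by 4096 recovers the real value), one entry could be decoded as follows. Both that format and the name hex_word_to_float are assumptions, not taken from the SDK or from the author's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define QUANT_BASE 4096.0f /* assumed scale: 12 fractional bits */

/* Decode one hexadecimal word such as "00001000" into a float,
 * treating it as a signed 32-bit fixed-point number. */
static float hex_word_to_float(const char *hex)
{
    assert(hex != NULL);
    uint32_t raw = (uint32_t)strtoul(hex, NULL, 16);
    int32_t fixed;
    memcpy(&fixed, &raw, sizeof fixed); /* reinterpret the bits as signed */
    return (float)fixed / QUANT_BASE;
}
```

Under these assumptions "00001000" decodes to 1.0 and "fffff000" to -1.0. If the dump actually stores raw IEEE-754 floats or a different scale, the decoding step has to change accordingly.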
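4.2 Comparing the mobileface_chip.wk output
The Vector Comparison settings in RuyiStudio and the comparison results are shown here, together with the .prototxt Graph View diagram. After Compare finishes, double-click the row you want to inspect and a detailed comparison view opens. The results show a similarity of 0.99460775, which meets expectations.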