【TensorRT】Deploying Faster RCNN

Written after going through the samples on GitHub, to deepen my understanding.

Faster RCNN is a two-stage model, which makes it somewhat more complicated to deploy than a one-stage network. This sample uses a TensorRT plugin library called RPROI_TRT, which fuses the RPN and ROIPooling.

This part is quite important, so it comes first:

layer {
  name: "RPROIFused"
  type: "RPROI"
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'conv5_3'
  bottom: 'im_info'
  top: 'rois'
  top: 'pool5'
  region_proposal_param {
    feature_stride: 16
    prenms_top: 6000
    nms_max_out: 300
    anchor_ratio_count: 3
    anchor_scale_count: 3
    iou_threshold: 0.7
    min_box_size: 16
    anchor_ratio: 0.5
    anchor_ratio: 1.0
    anchor_ratio: 2.0
    anchor_scale: 8.0
    anchor_scale: 16.0
    anchor_scale: 32.0
  }
  roi_pooling_param {
    pooled_h: 7
    pooled_w: 7
    spatial_scale: 0.0625
  }
}

The important parameters:

rois: the rois produced here are in fact the rois that the model finally outputs.

nms_max_out: the maximum number of bboxes kept after NMS; it must match the params.nmsMaxOut setting in the program.
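In the sample these values live in a small parameter struct; a minimal sketch along the lines of the sample's SampleFasterRCNNParams (the values mirror the prototxt above and the PASCAL VOC class count):

struct SampleFasterRCNNParams : public samplesCommon::CaffeSampleParams
{
    int outputClsSize; // number of output classes (21 = 20 VOC classes + background)
    int nmsMaxOut;     // maximum number of boxes kept after NMS
};

SampleFasterRCNNParams params;
params.outputClsSize = 21;
params.nmsMaxOut = 300; // must equal nms_max_out in the RPROIFused layer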


1、Preprocessing the input

Input image size: 375x500x3

Note: the channel order of the input data must match the channel order the network was designed for; here it is BGR, as specified in the paper.

// Note: the input shape is N, C, H, W
float* data = new float[N * INPUT_C * INPUT_H * INPUT_W];
// Per-channel means to subtract
float pixelMean[3]{ 102.9801f, 115.9465f, 122.7717f }; // also in BGR order
for (int i = 0, volImg = INPUT_C * INPUT_H * INPUT_W; i < N; ++i)
{
    // Note how the data is organized in memory: the PPM buffer is
    // interleaved HWC in RGB order; the network wants planar CHW in BGR
    for (int c = 0; c < INPUT_C; ++c)
    {
        for (unsigned j = 0, volChl = INPUT_H * INPUT_W; j < volChl; ++j)
        {
            // 2 - c reverses RGB -> BGR while j * INPUT_C strides over pixels
            data[i * volImg + c * volChl + j] = float(ppms[i].buffer[j * INPUT_C + 2 - c]) - pixelMean[c];
        }
    }
}
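As a concrete check of the index math: for output channel c = 0 (the B plane of the CHW tensor), 2 - c = 2 picks the B component of each interleaved RGB pixel, and pixelMean[0] = 102.9801f is indeed the B-channel mean.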

2、Defining the network

Nothing special to say here; see the GitHub sample directly.
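In short, this step parses the Caffe deploy file and weights with TensorRT's Caffe parser after registering the bundled plugins, so that the RPROI layer above resolves to the RPROI_TRT plugin. A minimal sketch, assuming a gLogger logger and an already-created network (the file names follow the sample's data files):

#include "NvCaffeParser.h"
#include "NvInferPlugin.h"

// Register RPROI_TRT and the other bundled plugins before parsing;
// otherwise the parser cannot resolve the RPROI layer type.
initLibNvInferPlugins(&gLogger, "");

auto parser = nvcaffeparser1::createCaffeParser();
const nvcaffeparser1::IBlobNameToTensor* blobNameToTensor
    = parser->parse("faster_rcnn_test_iplugin.prototxt",  // deploy file
                    "VGG16_faster_rcnn_final.caffemodel", // weights
                    *network, nvinfer1::DataType::kFLOAT);

// Mark the tensors discussed in section 3 as network outputs
for (const char* name : {"bbox_pred", "cls_prob", "rois"})
    network->markOutput(*blobNameToTensor->find(name));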

3、Building the engine

A note on the input and output tensors:

Inputs:

  • data is the buffer filled in the preprocessing step, holding the image data
  • imInfo holds the image information, {rows, columns, the scale for each image in a batch}; see the sketch after these lists for how it is filled

Outputs:

  • bbox_pred holds the predicted offsets, in order: the offsets to the height, width, center_x and center_y
  • cls_prob holds the predicted class scores for each bbox
  • rois holds the height, width, center_x and center_y of each bbox
  • count is deprecated and can be ignored; it will be removed in a future version
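For reference, a minimal sketch of how the sample fills the im_info host buffer (N, INPUT_H and INPUT_W as in the preprocessing step; a scale of 1 means the rois come back in the original image space):

// im_info holds {rows, cols, scale} for each image in the batch
float* imInfo = new float[N * 3];
for (int i = 0; i < N; ++i)
{
    imInfo[i * 3 + 0] = float(INPUT_H); // number of rows
    imInfo[i * 3 + 1] = float(INPUT_W); // number of columns
    imInfo[i * 3 + 2] = 1.0f;           // image scale
}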

4、Running the engine

Nothing special to say here either; see the GitHub sample directly.
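For completeness, the inference call boils down to copying the inputs to the device, running the execution context, and copying the outputs back. Roughly, using the samplesCommon::BufferManager that verifyOutput below also relies on (mEngine and mParams as elsewhere in the sample):

samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);
auto context = mEngine->createExecutionContext();

// processInput(...) fills the host-side "data" and "im_info" buffers here

buffers.copyInputToDevice(); // H2D for all input bindings
bool status = context->execute(mParams.batchSize, buffers.getDeviceBindings().data());
buffers.copyOutputToHost();  // D2H for bbox_pred, cls_prob, rois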

5、Verifying the output

Handling the outputs:

bool SampleFasterRCNN::verifyOutput(const samplesCommon::BufferManager& buffers)
{
    const int batchSize = mParams.batchSize;
    const int nmsMaxOut = mParams.nmsMaxOut;
    const int outputClsSize = mParams.outputClsSize;      // 21 (20 VOC classes + background)
    const int outputBBoxSize = mParams.outputClsSize * 4; // 21 x 4

    const float* imInfo = static_cast<const float*>(buffers.getHostBuffer("im_info"));
    const float* deltas = static_cast<const float*>(buffers.getHostBuffer("bbox_pred"));
    const float* clsProbs = static_cast<const float*>(buffers.getHostBuffer("cls_prob"));
    float* rois = static_cast<float*>(buffers.getHostBuffer("rois"));

    // Unscale back to raw image space
    for (int i = 0; i < batchSize; ++i)
    {
        for (int j = 0; j < nmsMaxOut * 4 && imInfo[i * 3 + 2] != 1; ++j)
        {
            rois[i * nmsMaxOut * 4 + j] /= imInfo[i * 3 + 2];
        }
    }

    std::vector<float> predBBoxes(batchSize * nmsMaxOut * outputBBoxSize, 0);

    // Clip the bboxes to the image and convert them to top-left / bottom-right corner representation
    bboxTransformInvAndClip(rois, deltas, predBBoxes.data(), imInfo, batchSize, nmsMaxOut, outputClsSize);

    const float nmsThreshold = 0.3f;
    const float score_threshold = 0.8f;
    const std::vector<std::string> classes{"background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
        "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
        "train", "tvmonitor"};

    // The sample passes if there is at least one detection for each item in the batch
    bool pass = true;

    // Usually batchSize = 1
    for (int i = 0; i < batchSize; ++i)
    {
        // There are nmsMaxOut output bboxes in total; each bbox has outputClsSize candidate classes,
        // and each class has its own four coordinate parameters, so predBBoxes has size
        // nmsMaxOut * 21 * 4 per image, i.e. each bbox occupies 21 x 4 floats.
        // Correspondingly, clsProbs holds 21 scores per bbox.
        float* bbox = predBBoxes.data() + i * nmsMaxOut * outputBBoxSize;

        const float* scores = clsProbs + i * nmsMaxOut * outputClsSize;
        int numDetections = 0;
        // Iterate over the classes; c is also each class's offset within a bbox's memory block
        for (int c = 1; c < outputClsSize; ++c) // Skip the background
        {
            std::vector<std::pair<float, int>> scoreIndex;
            // Iterate over the bboxes for this class
            for (int r = 0; r < nmsMaxOut; ++r)
            {
                // Keep the boxes whose score exceeds the threshold, sorted by descending score
                if (scores[r * outputClsSize + c] > score_threshold)
                {
                    scoreIndex.push_back(std::make_pair(scores[r * outputClsSize + c], r));
                    std::stable_sort(scoreIndex.begin(), scoreIndex.end(),
                        [](const std::pair<float, int>& pair1, const std::pair<float, int>& pair2) {
                            return pair1.first > pair2.first;
                        });
                }
            }

            // Apply NMS algorithm

            // NMS is applied a second time here. Many people wonder why there are two NMS passes:
            // the first NMS runs inside the RPN, merging bboxes before ROI pooling feeds the
            // second stage; the second NMS runs at the very end of inference, because the second
            // stage refines the bboxes again, so some of them overlap more and must be removed.
            const std::vector<int> indices = nonMaximumSuppression(scoreIndex, bbox, c, outputClsSize, nmsThreshold);

            numDetections += static_cast<int>(indices.size());

            // Show results
            for (unsigned k = 0; k < indices.size(); ++k)
            {
                const int idx = indices[k];
                const std::string storeName
                    = classes[c] + "-" + std::to_string(scores[idx * outputClsSize + c]) + ".ppm";
                gLogInfo << "Detected " << classes[c] << " in " << mPPMs[i].fileName << " with confidence "
                         << scores[idx * outputClsSize + c] * 100.0f << "% "
                         << " (Result stored in " << storeName << ")." << std::endl;

                const samplesCommon::BBox b{bbox[idx * outputBBoxSize + c * 4], bbox[idx * outputBBoxSize + c * 4 + 1],
                    bbox[idx * outputBBoxSize + c * 4 + 2], bbox[idx * outputBBoxSize + c * 4 + 3]};
                writePPMFileWithBBox(storeName, mPPMs[i], b);
            }
        }
        pass &= numDetections >= 1;
    }
    return pass;
}
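The heavy lifting above happens in bboxTransformInvAndClip, which applies the standard Faster RCNN inverse box transform: each delta (dx, dy, dw, dh) shifts and rescales its proposal, and the result is clipped to the image. A sketch of the decoding for a single proposal/class pair (w, h, ctr_x, ctr_y are the proposal's width, height and center; im_w, im_h the image size; the function name here is my own):

#include <algorithm>
#include <cmath>

// Standard Faster RCNN box decoding, clipped to the image;
// a per-entry sketch of what bboxTransformInvAndClip computes.
void decodeBox(float w, float h, float ctr_x, float ctr_y,
               float dx, float dy, float dw, float dh,
               float im_w, float im_h, float box[4])
{
    const float pred_ctr_x = dx * w + ctr_x; // shift the center by delta * size
    const float pred_ctr_y = dy * h + ctr_y;
    const float pred_w = std::exp(dw) * w;   // rescale width and height
    const float pred_h = std::exp(dh) * h;

    // Corner (top-left / bottom-right) representation, clipped to the image
    box[0] = std::max(std::min(pred_ctr_x - 0.5f * pred_w, im_w - 1.0f), 0.0f); // x1
    box[1] = std::max(std::min(pred_ctr_y - 0.5f * pred_h, im_h - 1.0f), 0.0f); // y1
    box[2] = std::max(std::min(pred_ctr_x + 0.5f * pred_w, im_w - 1.0f), 0.0f); // x2
    box[3] = std::max(std::min(pred_ctr_y + 0.5f * pred_h, im_h - 1.0f), 0.0f); // y2
}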

The output format is determined by the model structure defined in Caffe; working it out for a given model requires its own analysis.

 
