Written while working through the GitHub samples, to deepen my own understanding.
Faster RCNN is a two-stage model, so it is somewhat more troublesome to deploy than a one-stage network. This sample uses a TensorRT plugin called RPROI_TRT, which fuses the RPN and ROI Pooling into one layer.
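Before the Caffe parser can resolve this fused layer, the plugin library has to be registered with TensorRT. A minimal sketch of that step, assuming the standard initLibNvInferPlugins entry point from NvInferPlugin.h (the wrapper function here is only illustrative):

#include "NvInfer.h"
#include "NvInferPlugin.h"

// Register TensorRT's built-in plugin creators (RPROI_TRT among them) so that the
// Caffe parser can resolve the "RPROI" layer type in the prototxt below.
// `logger` is whatever nvinfer1::ILogger implementation the application already uses.
bool registerPlugins(nvinfer1::ILogger& logger)
{
    return initLibNvInferPlugins(&logger, ""); // "" selects the default plugin namespace
}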
This part is the most important, so let's look at it first:
layer {
  name: "RPROIFused"
  type: "RPROI"
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'conv5_3'
  bottom: 'im_info'
  top: 'rois'
  top: 'pool5'
  region_proposal_param {
    feature_stride: 16
    prenms_top: 6000
    nms_max_out: 300
    anchor_ratio_count: 3
    anchor_scale_count: 3
    iou_threshold: 0.7
    min_box_size: 16
    anchor_ratio: 0.5
    anchor_ratio: 1.0
    anchor_ratio: 2.0
    anchor_scale: 8.0
    anchor_scale: 16.0
    anchor_scale: 32.0
  }
  roi_pooling_param {
    pooled_h: 7
    pooled_w: 7
    spatial_scale: 0.0625
  }
}
The more important parameters:
rois: the rois here are exactly the rois that the model finally outputs.
nms_max_out: the maximum number of bboxes kept after NMS; it has to match params.nmsMaxOut in the host code.
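To keep the prototxt and the host code consistent, the program-side parameters mirror the values above. A minimal sketch (the struct is only an illustration of the fields the sample's parameter block carries, not its exact definition):

// Illustrative parameter block; the sample keeps these values in its own params struct.
struct FasterRCNNParams
{
    int batchSize{1};
    int nmsMaxOut{300};    // must equal nms_max_out in the RPROIFused layer above
    int outputClsSize{21}; // 20 PASCAL VOC classes + background
};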
Input image size: 375x500x3.
Note: the channel order of the input data has to match the order the network was designed with; here, following the original paper's setup, it is BGR.
// Note: the input shape is N, C, H, W
float* data = new float[N * INPUT_C * INPUT_H * INPUT_W];
// Per-channel means to subtract, also in BGR order
float pixelMean[3]{102.9801f, 115.9465f, 122.7717f};
for (int i = 0, volImg = INPUT_C * INPUT_H * INPUT_W; i < N; ++i)
{
    // Note how the data is organized in memory: planar CHW per image
    for (int c = 0; c < INPUT_C; ++c)
    {
        for (unsigned j = 0, volChl = INPUT_H * INPUT_W; j < volChl; ++j)
        {
            // The PPM buffer is interleaved RGB; "2 - c" flips it to BGR while subtracting the mean
            data[i * volImg + c * volChl + j] = float(ppms[i].buffer[j * INPUT_C + 2 - c]) - pixelMean[c];
        }
    }
}
Nothing much to say here; just read the code on GitHub.
One thing worth emphasizing is the input and output tensors:
Inputs: data (the preprocessed N x 3 x 375 x 500 image tensor) and im_info (height, width and scale for each image).
Outputs: bbox_pred (the per-class box regression deltas), cls_prob (the per-class scores) and rois (the proposals produced by the RPROIFused layer).
Again, nothing much to say about the inference call itself; just read the code on GitHub.
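Roughly, the forward pass copies the host-side inputs to the device, runs the engine, and copies the three outputs back. A minimal sketch, assuming the BufferManager helper from the TensorRT samples' common code (the tensor names match the prototxt; mEngine, mParams and SampleUniquePtr are the sample's own members and are only assumed here):

// Assumes mEngine (the built ICudaEngine) and mParams.batchSize already exist.
samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);
auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());

// Fill the host buffers for "data" and "im_info" as shown above, then:
buffers.copyInputToDevice();                          // H2D copy of all input tensors
bool status = context->execute(mParams.batchSize,     // implicit-batch execution
                               buffers.getDeviceBindings().data());
buffers.copyOutputToHost();                           // D2H copy of "bbox_pred", "cls_prob", "rois"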
Processing the output:
bool SampleFasterRCNN::verifyOutput(const samplesCommon::BufferManager& buffers)
{
    const int batchSize = mParams.batchSize;
    const int nmsMaxOut = mParams.nmsMaxOut;
    const int outputClsSize = mParams.outputClsSize;      // 21
    const int outputBBoxSize = mParams.outputClsSize * 4; // 21 x 4
    const float* imInfo = static_cast<const float*>(buffers.getHostBuffer("im_info"));
    const float* deltas = static_cast<const float*>(buffers.getHostBuffer("bbox_pred"));
    const float* clsProbs = static_cast<const float*>(buffers.getHostBuffer("cls_prob"));
    float* rois = static_cast<float*>(buffers.getHostBuffer("rois"));
    // Unscale back to raw image space
    for (int i = 0; i < batchSize; ++i)
    {
        for (int j = 0; j < nmsMaxOut * 4 && imInfo[i * 3 + 2] != 1; ++j)
        {
            rois[i * nmsMaxOut * 4 + j] /= imInfo[i * 3 + 2];
        }
    }
    std::vector<float> predBBoxes(batchSize * nmsMaxOut * outputBBoxSize, 0);
    // Apply the regression deltas to the rois, clip the boxes to the image,
    // and switch to the (x1, y1, x2, y2) corner representation
    bboxTransformInvAndClip(rois, deltas, predBBoxes.data(), imInfo, batchSize, nmsMaxOut, outputClsSize);
    const float nmsThreshold = 0.3f;
    const float score_threshold = 0.8f;
    const std::vector<std::string> classes{"background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car",
        "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
        "train", "tvmonitor"};
    // The sample passes if there is at least one detection for each item in the batch
    bool pass = true;
    // Usually batchSize = 1
    for (int i = 0; i < batchSize; ++i)
    {
        // There are nmsMaxOut output bboxes; each bbox can belong to any of the outputClsSize classes,
        // and every class has its own four box coordinates, so predBBoxes has size nmsMaxOut * 21 * 4
        // (each bbox occupies 21 x 4 floats). Correspondingly, clsProbs holds 21 scores per bbox.
        float* bbox = predBBoxes.data() + i * nmsMaxOut * outputBBoxSize;
        const float* scores = clsProbs + i * nmsMaxOut * outputClsSize;
        int numDetections = 0;
        // Iterate over the classes, locating each class's offset in memory
        for (int c = 1; c < outputClsSize; ++c) // Skip the background
        {
            std::vector<std::pair<float, int>> scoreIndex;
            // Iterate over the bboxes for this class
            for (int r = 0; r < nmsMaxOut; ++r)
            {
                // Keep the boxes whose score exceeds the threshold, sorted by score
                if (scores[r * outputClsSize + c] > score_threshold)
                {
                    scoreIndex.push_back(std::make_pair(scores[r * outputClsSize + c], r));
                    std::stable_sort(scoreIndex.begin(), scoreIndex.end(),
                        [](const std::pair<float, int>& pair1, const std::pair<float, int>& pair2) {
                            return pair1.first > pair2.first;
                        });
                }
            }
            // Apply NMS algorithm
            // NMS is applied a second time here; many people wonder why there are two NMS passes:
            // the first NMS runs inside the RPN to merge proposals before ROI pooling feeds them into
            // the second stage, while this second NMS runs at the very end of inference, because the
            // second stage refines the boxes again, some of them overlap more heavily afterwards, and
            // those duplicates have to be removed.
            const std::vector<int> indices = nonMaximumSuppression(scoreIndex, bbox, c, outputClsSize, nmsThreshold);
            numDetections += static_cast<int>(indices.size());
            // Show results
            for (unsigned k = 0; k < indices.size(); ++k)
            {
                const int idx = indices[k];
                const std::string storeName
                    = classes[c] + "-" + std::to_string(scores[idx * outputClsSize + c]) + ".ppm";
                gLogInfo << "Detected " << classes[c] << " in " << mPPMs[i].fileName << " with confidence "
                         << scores[idx * outputClsSize + c] * 100.0f << "% "
                         << " (Result stored in " << storeName << ")." << std::endl;
                const samplesCommon::BBox b{bbox[idx * outputBBoxSize + c * 4], bbox[idx * outputBBoxSize + c * 4 + 1],
                    bbox[idx * outputBBoxSize + c * 4 + 2], bbox[idx * outputBBoxSize + c * 4 + 3]};
                writePPMFileWithBBox(storeName, mPPMs[i], b);
            }
        }
        pass &= numDetections >= 1;
    }
    return pass;
}
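Of the helpers used above, bboxTransformInvAndClip is the one doing real math: it applies the standard Faster R-CNN inverse box transform and clips the result to the image. A minimal sketch of what it computes (the signature mirrors the call above, but the body is an illustration, not necessarily the sample's exact code):

#include <algorithm>
#include <cmath>

// rois:       batchSize * nmsMaxOut * 4 corner-format boxes (already unscaled above)
// deltas:     batchSize * nmsMaxOut * numCls * 4 regression outputs ("bbox_pred")
// predBBoxes: same layout as deltas, receives the final corner-format boxes
// imInfo:     batchSize * 3, per image (height, width, scale)
void bboxTransformInvAndClip(const float* rois, const float* deltas, float* predBBoxes,
                             const float* imInfo, int batchSize, int nmsMaxOut, int numCls)
{
    for (int i = 0; i < batchSize * nmsMaxOut; ++i)
    {
        const float w  = rois[i * 4 + 2] - rois[i * 4 + 0] + 1.0f;
        const float h  = rois[i * 4 + 3] - rois[i * 4 + 1] + 1.0f;
        const float cx = rois[i * 4 + 0] + 0.5f * w;
        const float cy = rois[i * 4 + 1] + 0.5f * h;
        const float* info = imInfo + (i / nmsMaxOut) * 3; // the image this roi belongs to
        for (int c = 0; c < numCls; ++c)
        {
            // Predicted deltas for this (roi, class) pair
            const float dx = deltas[(i * numCls + c) * 4 + 0];
            const float dy = deltas[(i * numCls + c) * 4 + 1];
            const float dw = deltas[(i * numCls + c) * 4 + 2];
            const float dh = deltas[(i * numCls + c) * 4 + 3];
            // Inverse transform: shift the center, rescale width and height
            const float predCx = cx + dx * w;
            const float predCy = cy + dy * h;
            const float predW  = std::exp(dw) * w;
            const float predH  = std::exp(dh) * h;
            float* out = predBBoxes + (i * numCls + c) * 4;
            // Back to corners, clipped to [0, width-1] x [0, height-1]
            out[0] = std::max(std::min(predCx - 0.5f * predW, info[1] - 1.0f), 0.0f);
            out[1] = std::max(std::min(predCy - 0.5f * predH, info[0] - 1.0f), 0.0f);
            out[2] = std::max(std::min(predCx + 0.5f * predW, info[1] - 1.0f), 0.0f);
            out[3] = std::max(std::min(predCy + 0.5f * predH, info[0] - 1.0f), 0.0f);
        }
    }
}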
The output layout is dictated by the model structure defined in the Caffe prototxt; for a different model it still has to be worked out case by case.