Using the OpenCV Deep Learning (dnn) Framework
flyfish
Environment:
Operating system: Ubuntu 18.04
OpenCV version: 4.0.1
opencv_contrib version: 4.0.1
The dnn module only runs pre-trained models; it does not provide training.
1. Building OpenCV from source
Preparation
Remove the conflicting x264 packages first, then install the dependencies:
sudo apt-get remove x264 libx264-dev
Install the following packages, choosing according to your needs:
sudo apt-get install build-essential checkinstall cmake pkg-config yasm
sudo apt-get install git gfortran
sudo apt-get install libjpeg8-dev libjasper-dev libpng12-dev
sudo apt-get install libtiff5-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev
sudo apt-get install libxine2-dev libv4l-dev
sudo apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev
sudo apt-get install qt5-default libgtk2.0-dev libtbb-dev
sudo apt-get install libatlas-base-dev
sudo apt-get install libfaac-dev libmp3lame-dev libtheora-dev
sudo apt-get install libvorbis-dev libxvidcore-dev
sudo apt-get install libopencore-amrnb-dev libopencore-amrwb-dev
sudo apt-get install x264 v4l-utils
Optional dependencies:
sudo apt-get install libprotobuf-dev protobuf-compiler
sudo apt-get install libgoogle-glog-dev libgflags-dev
sudo apt-get install libgphoto2-dev libeigen3-dev libhdf5-dev doxygen
The opencv and opencv_contrib source trees should sit side by side in the same parent directory.
$ cd opencv
$ mkdir build
$ cd build
OPENCV_EXTRA_MODULES_PATH points to the modules directory inside opencv_contrib.
Configuration that builds the opencv_contrib modules as well:
cmake -D CMAKE_BUILD_TYPE=Release \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
-D OPENCV_GENERATE_PKGCONFIG=YES \
-D WITH_1394=OFF ..
To build more components, switch the options ON or OFF as needed:
cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D INSTALL_C_EXAMPLES=ON \
-D INSTALL_PYTHON_EXAMPLES=ON \
-D WITH_TBB=ON \
-D WITH_V4L=ON \
-D WITH_QT=ON \
-D WITH_OPENGL=ON \
-D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
-D BUILD_EXAMPLES=ON ..
$ make -j4
$ sudo make install
$ sudo ldconfig
Then add the PKG_CONFIG_PATH environment variable to ~/.bashrc:
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
Shared library registration
Refresh the shared library cache with ldconfig:
sudo ldconfig -v
Verification
$ cd opencv/samples/cpp/example_cmake
$ cmake .
$ make
$ ./opencv_example
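Besides the bundled sample, a tiny standalone program can confirm that the installed headers and libraries (including the dnn module) are found. Below is a minimal sketch; the file name check_opencv.cpp is arbitrary, and with OPENCV_GENERATE_PKGCONFIG=YES it can usually be compiled with g++ check_opencv.cpp -o check_opencv $(pkg-config --cflags --libs opencv4).
// check_opencv.cpp - sanity check that the core and dnn modules are usable
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/dnn.hpp>
int main()
{
    // CV_VERSION comes from the installed headers and should read 4.0.1 here.
    std::cout << "OpenCV header version: " << CV_VERSION << std::endl;
    // Constructing an empty Net proves that libopencv_dnn links correctly.
    cv::dnn::Net net;
    std::cout << "dnn module linked, empty net: " << std::boolalpha << net.empty() << std::endl;
    return 0;
}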
If the OpenCV build itself fails with an error like the following:
/usr/bin/ld: /usr/local/lib/libgflags.a(gflags.cc.o): relocation R_X86_64_PC32 against symbol `stderr@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
modules/sfm/CMakeFiles/opencv_sfm.dir/build.make:320: recipe for target 'lib/libopencv_sfm.so.4.1.0' failed
make[2]: *** [lib/libopencv_sfm.so.4.1.0] Error 1
CMakeFiles/Makefile2:14822: recipe for target 'modules/sfm/CMakeFiles/opencv_sfm.dir/all' failed
make[1]: *** [modules/sfm/CMakeFiles/opencv_sfm.dir/all] Error 2
Makefile:162: recipe for target 'all' failed
make: *** [all] Error 2
The fix is to rebuild and install gflags as a shared library:
git clone https://github.com/gflags/gflags.git
cd gflags
mkdir build && cd build
cmake -DBUILD_SHARED_LIBS=ON -DBUILD_STATIC_LIBS=ON ..
make
sudo make install
2. Notes on the main functions
The official dnn samples include the following:
samples/dnn/classification.cpp
samples/dnn/colorization.cpp
samples/dnn/object_detection.cpp
samples/dnn/openpose.cpp
samples/dnn/segmentation.cpp
samples/dnn/text_detection.cpp
Confidence threshold: detections scoring below it are discarded.
Non-maximum suppression threshold: the overlap threshold used to suppress duplicate boxes.
Choose one of the computation backends:
0: automatically (by default)
1: Halide language (http://halide-lang.org/)
2: Intel’s Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit)
3: OpenCV implementation
The corresponding enum values are:
cv::dnn::DNN_BACKEND_DEFAULT,
cv::dnn::DNN_BACKEND_HALIDE,
cv::dnn::DNN_BACKEND_INFERENCE_ENGINE,
cv::dnn::DNN_BACKEND_OPENCV,
cv::dnn::DNN_BACKEND_VKCOM
Choose one of the target computation devices:
0: CPU target (by default)
1: OpenCL,
2: OpenCL fp16 (half-float precision)
3: VPU
The corresponding enum values are:
cv::dnn::DNN_TARGET_CPU,
cv::dnn::DNN_TARGET_OPENCL,
cv::dnn::DNN_TARGET_OPENCL_FP16,
cv::dnn::DNN_TARGET_MYRIAD,
cv::dnn::DNN_TARGET_VULKAN,
cv::dnn::DNN_TARGET_FPGA
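These numeric option values map one-to-one onto the enum values above, which is why the samples can pass parser.get<int>("backend") and parser.get<int>("target") straight to the network. A minimal sketch of selecting the defaults explicitly:
#include <opencv2/dnn.hpp>
// Configure an already-loaded network to run on the plain OpenCV CPU path.
void useCpuBackend(cv::dnn::Net& net)
{
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_OPENCV); // option value 3
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);      // option value 0
}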
Take object classification as an example.
std::vector<std::string> classes;
The class names can be hard-coded directly, or read from a file:
std::string file = parser.get<String>("classes");
std::ifstream ifs(file.c_str());
if (!ifs.is_open())
CV_Error(Error::StsError, "File " + file + " not found");
std::string line;
while (std::getline(ifs, line))
{
classes.push_back(line);
}
Loading the model. OpenCV provides the following readers:
Net cv::dnn::readNet
Net cv::dnn::readNetFromCaffe
Net cv::dnn::readNetFromDarknet
Net cv::dnn::readNetFromONNX
Net cv::dnn::readNetFromTensorflow
Net cv::dnn::readNetFromTorch
Supported model file extensions:
*.caffemodel (Caffe, http://caffe.berkeleyvision.org/)
*.pb (TensorFlow, https://www.tensorflow.org/)
*.t7 | *.net (Torch, http://torch.ch/)
*.weights (Darknet, https://pjreddie.com/darknet/)
*.bin (DLDT, https://software.intel.com/openvino-toolkit)
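readNet() picks the importer from the file extension, so in most cases a single call is enough; the readNetFrom*() functions do the same job when the framework is known in advance. A short sketch (the model file names are placeholders, not files shipped with OpenCV):
#include <opencv2/dnn.hpp>
int main()
{
    using namespace cv::dnn;
    // The extension tells readNet() which importer to use.
    Net caffeNet   = readNet("deploy.caffemodel", "deploy.prototxt");    // Caffe
    Net tfNet      = readNet("frozen_graph.pb");                         // TensorFlow
    Net darknetNet = readNetFromDarknet("yolov3.cfg", "yolov3.weights"); // Darknet, explicit reader
    return 0;
}
After loading, select the preferred backend and target: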
net.setPreferableBackend(DNN_BACKEND_OPENCV);
net.setPreferableTarget(DNN_TARGET_CPU);
Create a 4D blob from the frame:
Mat frame, blob;
Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
inpHeight > 0 ? inpHeight : frame.rows);
blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);
Explanation of the overload:
void cv::dnn::blobFromImage ( InputArray image,
OutputArray blob,
double scalefactor = 1.0,
const Size & size = Size(),
const Scalar & mean = Scalar(),
bool swapRB = false,
bool crop = false,
int ddepth = CV_32F
)
image: the input image (e.g., a video frame).
blob: the output blob.
scalefactor: multiplier applied to the image values.
size: the spatial size of the output blob, i.e., the input size the network was trained with.
mean: scalar with mean values that are subtracted from the channels. The values are expected in (mean-R, mean-G, mean-B) order if the image is in BGR order and swapRB is true.
swapRB: whether to swap the first and last channels, i.e., whether the image is treated as RGB or BGR.
crop: if true, the input image is resized so that one side equals the corresponding dimension in size and the other is equal or larger, and a center crop is then taken; if false, the image is resized directly to size without cropping (the aspect ratio is not preserved).
ddepth: depth of the output blob; CV_32F (float) and CV_8U (unsigned char) are supported.
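As a concrete illustration of these parameters, the sketch below preprocesses a BGR image the way many ImageNet-trained Caffe models expect: 224x224 input, per-channel mean subtraction, no scaling or cropping. The file name and the mean values are assumptions for illustration and depend on the actual model:
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
int main()
{
    cv::Mat img = cv::imread("input.jpg");           // BGR image (placeholder path)
    cv::Mat blob;
    cv::dnn::blobFromImage(img, blob,
                           1.0,                       // scalefactor: keep pixel values as-is
                           cv::Size(224, 224),        // size: the network's training input size
                           cv::Scalar(104, 117, 123), // mean: model-specific per-channel mean (BGR order here)
                           false,                      // swapRB: keep BGR order for Caffe-style models
                           false);                     // crop: plain resize, no center crop
    // blob now has shape 1 x 3 x 224 x 224 (NCHW) and type CV_32F by default.
    return 0;
}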
Running the model:
net.setInput(blob);
std::vector<Mat> outs;
net.forward(outs, outNames);
Here outNames is obtained from std::vector<String> outNames = net.getUnconnectedOutLayersNames();
void cv::dnn::Net::forward ( OutputArrayOfArrays outputBlobs,
const std::vector< String > & outBlobNames
)
Runs a forward pass to compute the outputs of the layers listed in outBlobNames.
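Putting these pieces together, a minimal classification pass could look like the sketch below. The model and image file names are placeholders, and the post-processing assumes the network ends in a single layer of class scores:
#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
int main()
{
    using namespace cv;
    using namespace cv::dnn;
    Net net = readNet("model.caffemodel", "model.prototxt");   // placeholder model files
    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);
    Mat img = imread("input.jpg");                             // placeholder image
    Mat blob;
    blobFromImage(img, blob, 1.0, Size(224, 224), Scalar(104, 117, 123), false, false);
    net.setInput(blob);
    Mat prob = net.forward();                                  // 1 x N vector of class scores
    // Pick the class with the highest score.
    Point classIdPoint;
    double confidence;
    minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classIdPoint);
    std::cout << "class id: " << classIdPoint.x << ", confidence: " << confidence << std::endl;
    return 0;
}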
If you build with Qt (qmake), a .pro file like the following can be used:
TEMPLATE = app
CONFIG += console c++11
CONFIG -= app_bundle
CONFIG -= qt
INCLUDEPATH += /usr/local/include \
/usr/local/include/opencv4 \
/usr/local/include/opencv4/opencv2
LIBS += /usr/local/lib/libopencv_calib3d.so \
/usr/local/lib/libopencv_core.so \
/usr/local/lib/libopencv_highgui.so \
/usr/local/lib/libopencv_imgproc.so \
/usr/local/lib/libopencv_imgcodecs.so\
/usr/local/lib/libopencv_objdetect.so\
/usr/local/lib/libopencv_photo.so \
/usr/local/lib/libopencv_dnn.so \
/usr/local/lib/libopencv_features2d.so \
/usr/local/lib/libopencv_stitching.so \
/usr/local/lib/libopencv_flann.so\
/usr/local/lib/libopencv_videoio.so \
/usr/local/lib/libopencv_video.so\
/usr/local/lib/libopencv_ml.so
SOURCES += \
main.cpp
HEADERS += \
common.hpp
3. Two example programs
Object detection is used as the example here.
The official sample, which takes quite a few parameters: https://github.com/opencv/opencv/blob/master/samples/dnn/object_detection.cpp
#include <fstream>
#include <sstream>
#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include "common.hpp"
std::string keys =
"{ help h | | Print help message. }"
"{ @alias | | An alias name of model to extract preprocessing parameters from models.yml file. }"
"{ zoo | models.yml | An optional path to file with preprocessing parameters }"
"{ device | 0 | camera device number. }"
"{ input i | | Path to input image or video file. Skip this argument to capture frames from a camera. }"
"{ framework f | | Optional name of an origin framework of the model. Detect it automatically if it does not set. }"
"{ classes | | Optional path to a text file with names of classes to label detected objects. }"
"{ thr | .5 | Confidence threshold. }"
"{ nms | .4 | Non-maximum suppression threshold. }"
"{ backend | 0 | Choose one of computation backends: "
"0: automatically (by default), "
"1: Halide language (http://halide-lang.org/), "
"2: Intel's Deep Learning Inference Engine (https://software.intel.com/openvino-toolkit), "
"3: OpenCV implementation }"
"{ target | 0 | Choose one of target computation devices: "
"0: CPU target (by default), "
"1: OpenCL, "
"2: OpenCL fp16 (half-float precision), "
"3: VPU }";
using namespace cv;
using namespace dnn;
float confThreshold, nmsThreshold;
std::vector<std::string> classes;
void postprocess(Mat& frame, const std::vector<Mat>& out, Net& net);
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);
void callback(int pos, void* userdata);
std::vector<String> getOutputsNames(const Net& net);
int main(int argc, char** argv)
{
CommandLineParser parser(argc, argv, keys);
const std::string modelName = parser.get<String>("@alias");
const std::string zooFile = parser.get<String>("zoo");
keys += genPreprocArguments(modelName, zooFile);
parser = CommandLineParser(argc, argv, keys);
parser.about("Use this script to run object detection deep learning networks using OpenCV.");
if (argc == 1 || parser.has("help"))
{
parser.printMessage();
return 0;
}
confThreshold = parser.get<float>("thr");
nmsThreshold = parser.get<float>("nms");
float scale = parser.get<float>("scale");
Scalar mean = parser.get<Scalar>("mean");
bool swapRB = parser.get<bool>("rgb");
int inpWidth = parser.get<int>("width");
int inpHeight = parser.get<int>("height");
CV_Assert(parser.has("model"));
std::string modelPath = findFile(parser.get<String>("model"));
std::string configPath = findFile(parser.get<String>("config"));
// Open file with classes names.
if (parser.has("classes"))
{
std::string file = parser.get<String>("classes");
std::ifstream ifs(file.c_str());
if (!ifs.is_open())
CV_Error(Error::StsError, "File " + file + " not found");
std::string line;
while (std::getline(ifs, line))
{
classes.push_back(line);
}
}
// Load a model.
Net net = readNet(modelPath, configPath, parser.get<String>("framework"));
net.setPreferableBackend(parser.get<int>("backend"));
net.setPreferableTarget(parser.get<int>("target"));
std::vector<String> outNames = net.getUnconnectedOutLayersNames();
// Create a window
static const std::string kWinName = "Deep learning object detection in OpenCV";
namedWindow(kWinName, WINDOW_NORMAL);
int initialConf = (int)(confThreshold * 100);
createTrackbar("Confidence threshold, %", kWinName, &initialConf, 99, callback);
// Open a video file or an image file or a camera stream.
VideoCapture cap;
if (parser.has("input"))
cap.open(parser.get<String>("input"));
else
cap.open(parser.get<int>("device"));
// Process frames.
Mat frame, blob;
while (waitKey(1) < 0)
{
cap >> frame;
if (frame.empty())
{
waitKey();
break;
}
// Create a 4D blob from a frame.
Size inpSize(inpWidth > 0 ? inpWidth : frame.cols,
inpHeight > 0 ? inpHeight : frame.rows);
blobFromImage(frame, blob, scale, inpSize, mean, swapRB, false);
// Run a model.
net.setInput(blob);
if (net.getLayer(0)->outputNameToIndex("im_info") != -1) // Faster-RCNN or R-FCN
{
resize(frame, frame, inpSize);
Mat imInfo = (Mat_<float>(1, 3) << inpSize.height, inpSize.width, 1.6f);
net.setInput(imInfo, "im_info");
}
std::vector<Mat> outs;
net.forward(outs, outNames);
postprocess(frame, outs, net);
// Put efficiency information.
std::vector<double> layersTimes;
double freq = getTickFrequency() / 1000;
double t = net.getPerfProfile(layersTimes) / freq;
std::string label = format("Inference time: %.2f ms", t);
putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
imshow(kWinName, frame);
}
return 0;
}
void postprocess(Mat& frame, const std::vector<Mat>& outs, Net& net)
{
static std::vector<int> outLayers = net.getUnconnectedOutLayers();
static std::string outLayerType = net.getLayer(outLayers[0])->type;
std::vector<int> classIds;
std::vector<float> confidences;
std::vector<Rect> boxes;
if (outLayerType == "DetectionOutput")
{
// Network produces output blob with a shape 1x1xNx7 where N is a number of
// detections and an every detection is a vector of values
// [batchId, classId, confidence, left, top, right, bottom]
CV_Assert(outs.size() > 0);
for (size_t k = 0; k < outs.size(); k++)
{
float* data = (float*)outs[k].data;
for (size_t i = 0; i < outs[k].total(); i += 7)
{
float confidence = data[i + 2];
if (confidence > confThreshold)
{
int left = (int)data[i + 3];
int top = (int)data[i + 4];
int right = (int)data[i + 5];
int bottom = (int)data[i + 6];
int width = right - left + 1;
int height = bottom - top + 1;
if (width * height <= 1)
{
left = (int)(data[i + 3] * frame.cols);
top = (int)(data[i + 4] * frame.rows);
right = (int)(data[i + 5] * frame.cols);
bottom = (int)(data[i + 6] * frame.rows);
width = right - left + 1;
height = bottom - top + 1;
}
classIds.push_back((int)(data[i + 1]) - 1); // Skip 0th background class id.
boxes.push_back(Rect(left, top, width, height));
confidences.push_back(confidence);
}
}
}
}
else if (outLayerType == "Region")
{
for (size_t i = 0; i < outs.size(); ++i)
{
// Network produces output blob with a shape NxC where N is a number of
// detected objects and C is a number of classes + 4 where the first 4
// numbers are [center_x, center_y, width, height]
float* data = (float*)outs[i].data;
for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
{
Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
Point classIdPoint;
double confidence;
minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
if (confidence > confThreshold)
{
int centerX = (int)(data[0] * frame.cols);
int centerY = (int)(data[1] * frame.rows);
int width = (int)(data[2] * frame.cols);
int height = (int)(data[3] * frame.rows);
int left = centerX - width / 2;
int top = centerY - height / 2;
classIds.push_back(classIdPoint.x);
confidences.push_back((float)confidence);
boxes.push_back(Rect(left, top, width, height));
}
}
}
}
else
CV_Error(Error::StsNotImplemented, "Unknown output layer type: " + outLayerType);
std::vector<int> indices;
NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
for (size_t i = 0; i < indices.size(); ++i)
{
int idx = indices[i];
Rect box = boxes[idx];
drawPred(classIds[idx], confidences[idx], box.x, box.y,
box.x + box.width, box.y + box.height, frame);
}
}
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 255, 0));
std::string label = format("%.2f", conf);
if (!classes.empty())
{
CV_Assert(classId < (int)classes.size());
label = classes[classId] + ": " + label;
}
int baseLine;
Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = max(top, labelSize.height);
rectangle(frame, Point(left, top - labelSize.height),
Point(left + labelSize.width, top + baseLine), Scalar::all(255), FILLED);
putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar());
}
void callback(int pos, void*)
{
confThreshold = pos * 0.01f;
}
Contents of the sample's CMakeLists.txt:
ocv_install_example_src(dnn *.cpp *.hpp CMakeLists.txt)
set(OPENCV_DNN_SAMPLES_REQUIRED_DEPS
opencv_core
opencv_imgproc
opencv_dnn
opencv_imgcodecs
opencv_videoio
opencv_highgui)
ocv_check_dependencies(${OPENCV_DNN_SAMPLES_REQUIRED_DEPS})
if(NOT BUILD_EXAMPLES OR NOT OCV_DEPENDENCIES_FOUND)
return()
endif()
project(dnn_samples)
ocv_include_modules_recurse(${OPENCV_DNN_SAMPLES_REQUIRED_DEPS})
file(GLOB_RECURSE dnn_samples RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} *.cpp)
foreach(sample_filename ${dnn_samples})
ocv_define_sample(tgt ${sample_filename} dnn)
ocv_target_link_libraries(${tgt} ${OPENCV_LINKER_LIBS} ${OPENCV_DNN_SAMPLES_REQUIRED_DEPS})
endforeach()
A simpler version, object_detection_yolo (https://github.com/spmallick/learnopencv/blob/master/ObjectDetection-YOLO/object_detection_yolo.cpp):
// This code is written at BigVision LLC. It is based on the OpenCV project. It is subject to the license terms in the LICENSE file found in this distribution and at http://opencv.org/license.html
// Usage example: ./object_detection_yolo.out --video=run.mp4
// ./object_detection_yolo.out --image=bird.jpg
#include <fstream>
#include <sstream>
#include <iostream>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
const char* keys =
"{help h usage ? | | Usage examples: \n\t\t./object_detection_yolo.out --image=dog.jpg \n\t\t./object_detection_yolo.out --video=run_sm.mp4}"
"{image i || input image }"
"{video v || input video }"
;
using namespace cv;
using namespace dnn;
using namespace std;
// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4; // Non-maximum suppression threshold
int inpWidth = 416; // Width of network's input image
int inpHeight = 416; // Height of network's input image
vector<string> classes;
// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& out);
// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame);
// Get the names of the output layers
vector<String> getOutputsNames(const Net& net);
int main(int argc, char** argv)
{
CommandLineParser parser(argc, argv, keys);
parser.about("Use this script to run object detection using YOLO3 in OpenCV.");
if (parser.has("help"))
{
parser.printMessage();
return 0;
}
// Load names of classes
string classesFile = "coco.names";
ifstream ifs(classesFile.c_str());
string line;
while (getline(ifs, line)) classes.push_back(line);
// Give the configuration and weight files for the model
String modelConfiguration = "yolov3.cfg";
String modelWeights = "yolov3.weights";
// Load the network
Net net = readNetFromDarknet(modelConfiguration, modelWeights);
net.setPreferableBackend(DNN_BACKEND_OPENCV);
net.setPreferableTarget(DNN_TARGET_CPU);
// Open a video file or an image file or a camera stream.
string str, outputFile;
VideoCapture cap;
VideoWriter video;
Mat frame, blob;
try {
outputFile = "yolo_out_cpp.avi";
if (parser.has("image"))
{
// Open the image file
str = parser.get<String>("image");
ifstream ifile(str);
if (!ifile) throw("error");
cap.open(str);
str.replace(str.end()-4, str.end(), "_yolo_out_cpp.jpg");
outputFile = str;
}
else if (parser.has("video"))
{
// Open the video file
str = parser.get<String>("video");
ifstream ifile(str);
if (!ifile) throw("error");
cap.open(str);
str.replace(str.end()-4, str.end(), "_yolo_out_cpp.avi");
outputFile = str;
}
// Open the webcam
else cap.open(parser.get<int>("device"));
}
catch(...) {
cout << "Could not open the input image/video stream" << endl;
return 0;
}
// Get the video writer initialized to save the output video
if (!parser.has("image")) {
video.open(outputFile, VideoWriter::fourcc('M','J','P','G'), 28, Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));
}
// Create a window
static const string kWinName = "Deep learning object detection in OpenCV";
namedWindow(kWinName, WINDOW_NORMAL);
// Process frames.
while (waitKey(1) < 0)
{
// get frame from the video
cap >> frame;
// Stop the program if reached end of video
if (frame.empty()) {
cout << "Done processing !!!" << endl;
cout << "Output file is stored as " << outputFile << endl;
waitKey(3000);
break;
}
// Create a 4D blob from a frame.
blobFromImage(frame, blob, 1/255.0, Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);
//Sets the input to the network
net.setInput(blob);
// Runs the forward pass to get output of the output layers
vector<Mat> outs;
net.forward(outs, getOutputsNames(net));
// Remove the bounding boxes with low confidence
postprocess(frame, outs);
// Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)
vector<double> layersTimes;
double freq = getTickFrequency() / 1000;
double t = net.getPerfProfile(layersTimes) / freq;
string label = format("Inference time for a frame : %.2f ms", t);
putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));
// Write the frame with the detection boxes
Mat detectedFrame;
frame.convertTo(detectedFrame, CV_8U);
if (parser.has("image")) imwrite(outputFile, detectedFrame);
else video.write(detectedFrame);
imshow(kWinName, frame);
}
cap.release();
if (!parser.has("image")) video.release();
return 0;
}
// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& outs)
{
vector<int> classIds;
vector<float> confidences;
vector<Rect> boxes;
for (size_t i = 0; i < outs.size(); ++i)
{
// Scan through all the bounding boxes output from the network and keep only the
// ones with high confidence scores. Assign the box's class label as the class
// with the highest score for the box.
float* data = (float*)outs[i].data;
for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
{
Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
Point classIdPoint;
double confidence;
// Get the value and location of the maximum score
minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
if (confidence > confThreshold)
{
int centerX = (int)(data[0] * frame.cols);
int centerY = (int)(data[1] * frame.rows);
int width = (int)(data[2] * frame.cols);
int height = (int)(data[3] * frame.rows);
int left = centerX - width / 2;
int top = centerY - height / 2;
classIds.push_back(classIdPoint.x);
confidences.push_back((float)confidence);
boxes.push_back(Rect(left, top, width, height));
}
}
}
// Perform non maximum suppression to eliminate redundant overlapping boxes with
// lower confidences
vector<int> indices;
NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
for (size_t i = 0; i < indices.size(); ++i)
{
int idx = indices[i];
Rect box = boxes[idx];
drawPred(classIds[idx], confidences[idx], box.x, box.y,
box.x + box.width, box.y + box.height, frame);
}
}
// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
//Draw a rectangle displaying the bounding box
rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 178, 50), 3);
//Get the label for the class name and its confidence
string label = format("%.2f", conf);
if (!classes.empty())
{
CV_Assert(classId < (int)classes.size());
label = classes[classId] + ":" + label;
}
//Display the label at the top of the bounding box
int baseLine;
Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
top = max(top, labelSize.height);
rectangle(frame, Point(left, top - round(1.5*labelSize.height)), Point(left + round(1.5*labelSize.width), top + baseLine), Scalar(255, 255, 255), FILLED);
putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0,0,0),1);
}
// Get the names of the output layers
vector<String> getOutputsNames(const Net& net)
{
static vector<String> names;
if (names.empty())
{
//Get the indices of the output layers, i.e. the layers with unconnected outputs
vector<int> outLayers = net.getUnconnectedOutLayers();
//get the names of all the layers in the network
vector<String> layersNames = net.getLayerNames();
// Get the names of the output layers in names
names.resize(outLayers.size());
for (size_t i = 0; i < outLayers.size(); ++i)
names[i] = layersNames[outLayers[i] - 1];
}
return names;
}
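A side note: on OpenCV 4.x the hand-written getOutputsNames() helper above is equivalent to asking the network directly, so the following shorter form (a sketch assuming OpenCV 4.x) can be used instead:
#include <opencv2/dnn.hpp>
// Same result as getOutputsNames(): names of all layers with unconnected outputs.
std::vector<cv::String> outputNames(cv::dnn::Net& net)
{
    return net.getUnconnectedOutLayersNames();
}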