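Deploying PaddleOCRv3 with OpenVINO
This post covers deploying the PaddleOCRv3 recognition model (rec) on OpenVINO with C++, from the end of training through to inference with OpenVINO. The software versions used are:
PaddleOCR (release 2.5; throughout this post, release 2.5 is referred to as PaddleOCRv3)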
openVINO 2021.4
opencv 4.4.0
paddlepaddle-gpu: 2.2.2.post111
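The PaddleOCRv3 project provides a conversion script, tools/export_model.py. Run it from the command line, specifying the config file, the trained model weights, and the output directory.
Here the trained model is under ./output/myOCR_model: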
python tools/export_model.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec_my2.yml -o Global.pretrained_model=./output/myOCR_model/best_accuracy Global.save_inference_dir=./inference/myOCR_model/
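This step produces the inference model (inference.pdmodel and inference.pdiparams) in the save directory. Next, convert it to ONNX with paddle2onnx: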
paddle2onnx --model_dir D:\myAPP\pythonDoc\PaddleOCRv3\inference\myOCR_model --model_filename inference.pdmodel --params_file inference.pdiparams --save_file D:\myAPP\pythonDoc\PaddleOCRv3\inference\myOCR_model/onnx/PaddleOCRv3.onnx --opset_version 11
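This produces the model in ONNX format.
Use Netron to inspect the input node of the ONNX model.
Note: the batch size and image width of the ONNX model's input are dynamic (they show up as -1 in the ONNX model), but OpenVINO 2021.4 does not infer these dimensions automatically, so the input shape has to be fixed at conversion time with --input_shape=[1,3,48,320]. The Model Optimizer is not covered in detail here; run it with --help to see its options.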
python "G:\openVINO\install\openvino_2021.4.752\deployment_tools\model_optimizer\mo.py" --input_model="K:\model\PaddleOCR\onnx\ppocr.onnx" --output_dir="K:\model\PaddleOCR\onnx\opv" --model_name="ppocr" --data_type=FP32 --input_shape=[1,3,48,320]
This produces the OpenVINO IR model (ppocr.xml and ppocr.bin).
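Because the input shape is now fixed at [1,3,48,320], every image has to be scaled to height 48 and then padded on the right up to width 320. The following is a minimal sketch of that geometry only, mirroring what the preprocessing code in this post does. The names ResizePlan and planResize are hypothetical helpers for illustration, and clamping the width to at least 1 pixel is an added assumption that the full listing does not make.

```cpp
#include <algorithm>
#include <cassert>

// Result of planning the resize for a fixed-shape input (hypothetical helper type).
struct ResizePlan {
    int resizedWidth;  // width after scaling the image to the target height
    int padRight;      // number of columns to pad on the right to reach the target width
};

// Keep the aspect ratio, scale to targetHeight, cap the width at targetWidth.
ResizePlan planResize(int srcWidth, int srcHeight,
                      int targetHeight = 48, int targetWidth = 320)
{
    assert(srcWidth > 0 && srcHeight > 0 && targetHeight > 0 && targetWidth > 0);
    const double ratio = static_cast<double>(srcWidth) / srcHeight;
    // Truncation toward zero, as in the full listing.
    int resizedWidth = static_cast<int>(targetHeight * ratio);
    // Assumption: never let the width collapse to 0 for very tall, narrow crops.
    resizedWidth = std::max(1, std::min(resizedWidth, targetWidth));
    return { resizedWidth, targetWidth - resizedWidth };
}
```

For a 200x50 crop this gives a resized width of 192 and 128 columns of padding; a crop wider than the 320:48 ratio is squeezed to 320 with no padding.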
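Configuration: all OpenVINO 2021 releases are configured the same way, and OpenVINO 2021.4 is Intel's long-term support (LTS) release. OpenCV is only used for image handling and has nothing to do with inference itself. The preprocessing step pads the input image with cv::copyMakeBorder, which appears to be available in all OpenCV 4.x versions; I am not sure from which 3.x version it is supported, so if you want to copy the code and run it as-is, use a 4.x version.
I am using OpenVINO 2021.4 and OpenCV 4.4.0.
Additional Include Directories
Additional Library Directories
Linker / Input
DLL dependencies
The DLL dependencies can be resolved in three ways: 1) add the DLL directories to the system environment variables; 2) copy the DLLs into the same directory as the exe; 3) set them in the Visual Studio project properties under Debugging / Environment. My setting is: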
path=G:\openVINO\install\openvino_2021.4.752\deployment_tools\ngraph\lib;D:\opencv440\opencv\build\x64\vc15\bin;G:\openVINO\install\openvino_2021.4.752\deployment_tools\inference_engine\external\tbb\bin;G:\openVINO\install\openvino_2021.4.752\deployment_tools\inference_engine\bin\intel64\Release;
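The OpenVINO call flow is: load the model, prepare the input and output blobs, then loop over: fill the input blob ==> infer ==> post-process. See the official documentation for details. I first searched online for ready-made code to reuse, but nothing quite fit, so there is no shortcut here: patiently reading the documentation pays off, and the English gets easier the more of it you read.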
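The "fill the input blob" step is mostly a layout conversion: OpenCV stores pixels interleaved (HWC), while the network input is planar (NCHW). The following is a minimal sketch of that copy on plain buffers, assuming a continuous float image and a batch size of 1. The function name hwcToChw is hypothetical, and std::vector stands in for both the cv::Mat data and the blob buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy an interleaved HWC float image into planar CHW order.
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            std::size_t height, std::size_t width, std::size_t channels)
{
    assert(hwc.size() == height * width * channels);
    const std::size_t planeSize = height * width;  // pixels per channel plane
    std::vector<float> chw(hwc.size());
    for (std::size_t pid = 0; pid < planeSize; ++pid) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            // Source: pixel pid, channel ch. Destination: plane ch, position pid.
            chw[ch * planeSize + pid] = hwc[pid * channels + ch];
        }
    }
    return chw;
}
```

For a 1x2 image with three channels, the interleaved values {1,2,3, 4,5,6} become {1,4, 2,5, 3,6}: all first-channel values first, then the second channel, then the third.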
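In post-processing, the output shape is [batch,40,39]: 40 is the number of time steps, so at most 40 characters can be recognized, and 39 is the dictionary size plus 1. The first element is blank, followed by your dictionary. My dictionary here is std::string dict = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ "; before training I did not notice an extra empty line in the txt file, so the total number of characters is 1+10+26+1=38. During training a blank entry is automatically inserted at the first position, which is why the prediction at index 0 has to be excluded when parsing the output.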
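Given that shape, the output for one image can be read as 40 rows of 39 scores each, stored back to back in one flat buffer. The following is a minimal sketch, in plain C++ without OpenCV, of picking the best class and its score for every row. Step and argmaxPerStep are hypothetical names; the full listing uses cv::minMaxLoc for the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Best class and its score for one time step (hypothetical helper type).
struct Step {
    int index;   // class index, 0 means blank
    float prob;  // score of that class
};

// data points to steps * classes floats laid out row by row.
std::vector<Step> argmaxPerStep(const float* data, std::size_t steps, std::size_t classes)
{
    assert(data != nullptr && classes > 0);
    std::vector<Step> best;
    best.reserve(steps);
    for (std::size_t t = 0; t < steps; ++t) {
        const float* row = data + t * classes;  // start of row t
        std::size_t arg = 0;
        for (std::size_t c = 1; c < classes; ++c) {
            if (row[c] > row[arg])
                arg = c;  // the first maximum wins on ties
        }
        best.push_back({ static_cast<int>(arg), row[arg] });
    }
    return best;
}
```

For the model in this post the call would be argmaxPerStep(outputData, 40, 39), where outputData is the float pointer to the output blob.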
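To decode the characters, think of the output as an image with 40 rows and 39 columns. Take the index of the maximum value in each row, which gives 40 indices. Check whether the first index is zero (zero means blank was predicted); if it is not, keep it. Then, starting from the second index, compare each index with the one before it and keep it if it differs from the previous index and is not zero. After all 40 indices have been visited, the kept indices form the final index list.
Example
maxIndex: index of the maximum value in each row, length 40
[0,23,0,0,4,4,4,0,0,35, ...]
The indices that are kept are
[23,4,35, ...]
See the post-processing code for details. Talk is cheap, show me the code.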
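Before the full listing, the collapse rule from the example above can be sketched in isolation. This is a minimal sketch of greedy decoding under the assumptions stated in this post: index 0 is blank, and the dictionary string has a placeholder character at position 0 so that class indices can be used directly. The function name ctcGreedyDecode is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Keep an index only if it is not blank (0) and differs from the previous step.
std::string ctcGreedyDecode(const std::vector<int>& maxIndex, const std::string& dict)
{
    std::string text;
    int previous = 0;  // treat "before the first step" as blank
    for (int index : maxIndex) {
        if (index != 0 && index != previous) {
            // Assumption: every non-blank class index has a character in dict.
            assert(index > 0 && static_cast<std::size_t>(index) < dict.size());
            text += dict[index];
        }
        previous = index;
    }
    return text;
}
```

With the dictionary string used in the listing below, ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", the example sequence [0,23,0,0,4,4,4,0,0,35] decodes to "L2X". A blank between two equal indices separates them, so [4,0,4] decodes to "22" while [4,4] decodes to "2".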
paddleOCR.h
#pragma once
#ifndef _PADDLEOCR_H_
#define _PADDLEOCR_H_
#include "opencv.hpp"
#include "inference_engine.hpp"
void showImage(const cv::Mat & image, std::string name, int waitMode = 1, int windowMode = 1);
/*
Function name:
normalizeImage
Parameters:
@image: Source image
@out: Destination image
@mean: per-channel mean of the training set
@stdv: per-channel standard deviation of the training set
Converts image from [0,255] to [0,1], then image -= mean, image /= stdv
*/
void normalizeImage(const cv::Mat & image, cv::Mat & out, std::vector<double>mean, std::vector<double>stdv);
/*
Function name:
paddingImage
Parameters:
@image: Source image
@out: Destination image of the same type as src and the size Size(src.cols+left+right,
src.rows+top+bottom)
@top: top pixels
@left: left pixels
@bottom: bottom pixels
@right: right pixels
@borderType: commonly used values are cv::BORDER_REPLICATE and cv::BORDER_CONSTANT
For more details on borderType see https://docs.opencv.org/3.4/d2/de8/group__core__array.html#ga209f2f4869e304c82d07739337eae7c5
@value: Border value if borderType==BORDER_CONSTANT
This function uses OpenCV's cv::copyMakeBorder to pad the image
More details https://docs.opencv.org/3.4/d2/de8/group__core__array.html#ga2ac1049c2c3dd25c2b41bffe17658a36
*/
void paddingImage(const cv::Mat & image, cv::Mat & out,
int top, int left, int bottom, int right,
int borderType, const cv::Scalar& value = cv::Scalar());
/*
Function name :
paddleOCRPreprocess
Parameters:
@image: Source image
@out: Destination image
@targetHeight: target image height ,in paddleOCRv3, input height is 48
@targetWidth: target image width, in paddleOCRv3, input width is 320
@mean: channels mean of the training set
@stdv: standard deviation of the training set,size of mean and stdv should be equal to image.channels()
Briefs:
image ==> padding ==> normalize
*/
void paddleOCRPreprocess(const cv::Mat & image, cv::Mat & out, const int targetHeight, const int targetWidth,
std::vector<double>mean, std::vector<double>stdv);
void paddleOCRPostProcess(cv::Mat &output, std::string &result, float &prob);
void demo();
#endif // !_PADDLEOCR_H_
paddleOCR.cpp
#include <ctime>
#include <iostream>
#include "opencv.hpp"
#include "paddleOCR.h"
#define SPEED_TEST
void showImage(const cv::Mat & image, std::string name, int waitMode, int windowMode)
{
if (image.empty())
{
std::cout << "ERROR: In showImage the input image is empty!\n";
return;
}
if (waitMode < 0)
waitMode = 0;
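//only cv::WINDOW_NORMAL (0) and cv::WINDOW_AUTOSIZE (1) are accepted, anything else falls back to autosize
if (windowMode != 0 && windowMode != 1)
windowMode = cv::WINDOW_AUTOSIZE;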
cv::namedWindow(name, windowMode);
cv::imshow(name, image);
cv::waitKey(waitMode);
}
void normalizeImage(const cv::Mat &image, cv::Mat & out, std::vector<double>mean, std::vector<double>stdv)
{
if (image.empty())
throw "normalizeImage input image is empty()!";
if (mean.size() != stdv.size())
throw "normalizeImage mean.size() != stdv.size()!";
if (mean.size() != image.channels())
throw "normalizeImage mean.size() != image.channels()";
for (double stdv_item : stdv)
{
//if standard deviation is zero, the image's all pixels are same
if (stdv_item == 0)
throw "normalizeImage stdv is zero";
}
//scale [0,255] to [0,1] as documented in the header, so the divisor is 255, not 256
image.convertTo(out, CV_32F, 1.0 / 255.0, 0);
if (out.channels() == 1)
{
out -= mean[0];
out /= stdv[0];
}
else if (out.channels() > 1)
{
std::vector<cv::Mat> channelImage;
cv::split(out, channelImage);
for (int i = 0; i < out.channels(); i++)
{
channelImage[i] -= mean[i];
channelImage[i] /= stdv[i];
}
cv::merge(channelImage, out);
}
return;
}
void paddingImage(const cv::Mat & image, cv::Mat & out,
int top, int left, int bottom, int right,
int borderType, const cv::Scalar& value)
{
if (image.empty())
throw "padding input image is empty()!";
cv::copyMakeBorder(image, out, top, bottom, left, right, borderType, value);
return;
}
void paddleOCRPreprocess(const cv::Mat & image, cv::Mat & out, const int targetHeight, const int targetWidth,
std::vector<double>mean,std::vector<double>stdv)
{
if (image.empty())
throw "paddleOCRPreprocess : input image is empty()\n";
if (targetHeight <= 0 || targetWidth <= 0)
throw "paddleOCRPreprocess target size error targetHeight<=0 || targetWidth<=0";
//Resize image
//Adjust the height of the original image to match the height of the target image
int sourceWidth = image.cols;
int sourceHeight = image.rows;
//double targetWHRatio = (double)targetWidth / targetHeight;
double sourceWHRatio = (double)sourceWidth / sourceHeight;
int newHeight = targetHeight;
int newWidth = newHeight * sourceWHRatio;
if (newWidth > targetWidth)
newWidth = targetWidth;
cv::resize(image, out, cv::Size(newWidth, newHeight));
//Normalize image
normalizeImage(out, out, mean, stdv);
//Padding image
//the resized image's height is always equal to targetHeight,but width will not
if (newWidth < targetWidth)
{
int right = targetWidth - newWidth;
//paddingImage(out, out, 0, 0, 0, right, cv::BORDER_REPLICATE);// pad by replicating the edge pixels
paddingImage(out, out, 0, 0, 0, right, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));// pad with zeros
}
//showImage(out, "padding",1,0);
}
void paddleOCRPostProcess(cv::Mat &output, std::string &result, float &prob)
{
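//index 0 ('.') is only a placeholder for blank and is never emitted
std::string dict = ".-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
//reset the outputs first so that an empty input yields an empty result
result = "";
prob = 0;
if (output.empty())
return ;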
int h = output.rows;
int w = output.cols;
std::vector<int> maxIndex;
std::vector<float>maxProb;
double maxVal;
cv::Point maxLoc;
for (int row = 0; row < h; row++)
{
cv::Mat temp(1, w,CV_32FC1,output.ptr<float>(row));
cv::minMaxLoc(temp, NULL, &maxVal, NULL, &maxLoc);
maxIndex.push_back(maxLoc.x);
maxProb.push_back((float)maxVal);
}
std::vector<int>selectedIndex;
std::vector<float>selectedProb;
//keep every position in maxIndex whose index differs from the previous one and is not 0
//handle the first element separately
if (maxIndex.size() != 0 && maxIndex[0] != 0)
{
selectedIndex.push_back(maxIndex[0]);
selectedProb.push_back(maxProb[0]);
}
for (int i = 1; i < maxIndex.size() ; i++)
{
if (maxIndex[i] != maxIndex[i - 1] && maxIndex[i] != 0)
{
selectedIndex.push_back(maxIndex[i]);
selectedProb.push_back(maxProb[i]);
}
}
double meanProb = 0;
for (int i = 0; i < selectedIndex.size(); i++)
{
result += dict[selectedIndex[i]];
meanProb += selectedProb[i];
}
if (selectedIndex.size() == 0)
meanProb = 0;
else
meanProb /= selectedIndex.size();
prob = meanProb;
return ;
}
void demo()
{
using namespace std;
string xmlPath, binPath, imageDirs;
xmlPath = "K:\\model\\PaddleOCR\\onnx\\opv\\ppocr.xml";
binPath = "K:\\model\\PaddleOCR\\onnx\\opv\\ppocr.bin";
imageDirs = "K:\\imageData\\OCR\\ocr_dataset\\test";
//imageDirs = "\\\\192.168.1.247\\Pictures\\imageAndModel\\paddle_OCR_dataset\\test\\real\\t2x";
//imageDirs = "K:\\imageData\\OCR\\ocr_dataset\\bg_black";
string inputNodeName = "x", outputNodeName = "softmax_2.tmp_0";
vector<double>mean = { 0.5,0.5,0.5 };
vector<double>stdv = { 0.5,0.5,0.5 };
const int targetHeight = 48;
const int targetWidth = 320;
vector<cv::String> imagePathList;
cv::glob(imageDirs, imagePathList,1);
//1. Create Inference Engine Core
InferenceEngine::Core core;
//InferenceEngine::CNNNetwork network;
InferenceEngine::ExecutableNetwork executable_network;
//2. (Optional). Configure Input and Output of the Model
//3. Load the Model to the Device
executable_network = core.LoadNetwork(xmlPath, "CPU");
// show some information
InferenceEngine::ConstInputsDataMap inputInfo = executable_network.GetInputsInfo();
for (auto inputIter : inputInfo)
{
cout << "input node name :" << inputIter.first << endl;
}
InferenceEngine::ConstOutputsDataMap outputInfo = executable_network.GetOutputsInfo();
for (auto outputIter : outputInfo)
{
cout << "output node name : " << outputIter.first << endl;
//with Netron you can see that the output node name is: softmax_2.tmp_0
}
//4. Create an Inference Request
InferenceEngine::InferRequest inferRequest = executable_network.CreateInferRequest();
inferRequest.Infer(); //warmup
//5.1 Prepare input blob
//input data must be aligned (resized manually) with a given blob size and have a correct color format
InferenceEngine::Blob::Ptr inputBlobPtr = inferRequest.GetBlob(inputNodeName);
InferenceEngine::SizeVector inputSize = inputBlobPtr->getTensorDesc().getDims();
auto inputdata = inputBlobPtr->buffer()
.as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type *>();
//let's see the input size (NCHW)
cout << "the input blob size: ";
for (size_t item : inputSize)
cout << item << ",";
cout << endl;
//5.2 Prepare output blob
InferenceEngine::Blob::Ptr outputBlobPtr = inferRequest.GetBlob(outputNodeName);
auto outputData = outputBlobPtr->buffer().
as<InferenceEngine::PrecisionTrait<InferenceEngine::Precision::FP32>::value_type *>();//obtained once here rather than on every iteration
InferenceEngine::SizeVector outputSize = outputBlobPtr->getTensorDesc().getDims();
cout << "output blob size:";
for (size_t item : outputSize)
cout << item << ",";
cout << endl;
cv::Mat image, input, output;
for (cv::String imageDir : imagePathList)
{
image = cv::imread(imageDir);
if (image.empty())
continue;
showImage(image, "original image",1,0);
#ifdef SPEED_TEST
static double totalTime = 0;
static double totalNum = 0;
clock_t start_time = clock();
#endif // SPEED_TEST
size_t channels = inputSize[1];
size_t inputHeight = inputSize[2];
size_t inputWidth = inputSize[3];
size_t imageSize = inputHeight * inputWidth;
//Preprocess
paddleOCRPreprocess(image, input, targetHeight, targetWidth, mean, stdv);
//Prepare input data
for (size_t pid = 0; pid < imageSize; ++pid)
{
for (size_t ch = 0; ch < channels; ++ch)
{
inputdata[imageSize*ch + pid] = input.at<cv::Vec3f>(pid)[ch];
}
}
//6. Infer
//for(int i=0;i<100;i++)
inferRequest.Infer();
cv::Mat temp(outputSize[1],outputSize[2],CV_32FC1,outputData );
output = temp;
//cout << output;
std::string result;
float prob;
//7. Postprocess
paddleOCRPostProcess(output,result,prob);
#ifdef SPEED_TEST
clock_t end_time = clock();
double run_time = 1000.0 * (end_time - start_time) / CLOCKS_PER_SEC; //milliseconds, computed in floating point
totalNum+=1;
totalTime += run_time;
cout << "run time:" << run_time << endl;
cout << "total time: " << totalTime <<", totalNum: "<<totalNum<<", mean time: "<< totalTime/totalNum<<endl;
#endif // SPEED_TEST
cv::Mat textScore = cv::Mat::zeros(200, 200, CV_8UC3);
cv::putText(textScore, "text:" + result, cv::Point(10, 50), 1, 1, cv::Scalar(0, 255, 0));
cv::putText(textScore, "score:" + std::to_string((int)(prob*100)), cv::Point(10, 120), 1, 1, cv::Scalar(0, 2, 250));
showImage(textScore, "textScore");
int c = cv::waitKey(0);
if (c == 27)
break;
}
return;
}
main.cpp
#include <iostream>
#include "paddleOCR.h"
int main()
{
std::cout << "Hello World!\n";
demo();
//testFunction();
}