OpenCv + Qt5.12.2 Text Recognition

OpenCv + Qt5.12.2 Text Detection and Text Recognition

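Preface

It has been a while since my last update. I published four posts in total last year, and lately I have been working on audio/video live-streaming services, which meant learning and building up experience all over again. The pandemic was also fairly severe last year, and once the lockdown was lifted I stayed busy. Now that I finally have some time, I have picked OpenCV back up. The code for this post was actually finished right after the previous post on OpenCV and cameras; I just never wrote it up, so this post gets that done first.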

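Introduction

Both text detection and text recognition are built on OpenCV's own extension (contrib) modules. The basic flow follows the "OpenCV text detection and recognition module" article; my contribution is integrating Qt with OpenCV and packaging the result as a project. The rough workflow is: select an image, select a function, save the image.

Searching both domestic and international sites, I found several essentially identical articles whose original source cannot be traced. I list them all here: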

OpenCV 文字檢測與識別模塊 - 台部落 / OpenCV 文字检测与识别模块 - CSDN

OPENCV 文字检测与识别模块 - 灰信网

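The articles are essentially the same. The CSDN and 灰信网 versions are identical. The 台部落 version differs only in its resource paths: it points to the original model locations, while CSDN and 灰信网 both point to the same network drive. The 台部落 and CSDN posts carry the same author name, however, which leaves 灰信网 as the likely repost.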

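Resource paths

Build setup was covered in the previous two posts; see: OpenCv4.4.0+Qt5.12.2+OpenCv-Contrib-4.4.0.

Here are the resources needed this time:

Text detection

The resource files are described in textDetector.hpp, lines 37-39. The details are as follows: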

/** @brief TextDetectorCNN class provides the functionallity of text bounding box detection.
 This class is representing to find bounding boxes of text words given an input image.
 This class uses OpenCV dnn module to load pre-trained model described in @cite LiaoSBWL17.
 The original repository with the modified SSD Caffe version: https://github.com/MhLiao/TextBoxes.
 Model can be downloaded from [DropBox](https://www.dropbox.com/s/g8pjzv2de9gty8g/TextBoxes_icdar13.caffemodel?dl=0).
 Modified .prototxt file with the model description can be found in `opencv_contrib/modules/text/samples/textbox.prototxt`.
 */

textbox.prototxt - located in the local module directory; just follow the path given above.

TextBoxes_icdar13.caffemodel - TextBoxes_icdar13.caffemodel

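Text recognition

The required resources are described on the relevant pages: OpenCV.org, text_recognition_cnn.cpp. Those pages only list the paths, though, and the download paths mentioned in the original blog's excerpt

    cout << "   Demo of text recognition CNN for text detection." << endl
         << "   Max Jaderberg et al.: Reading Text in the Wild with Convolutional Neural Networks, IJCV 2015" << endl
         << "   Caffe Model files (textbox.prototxt, TextBoxes_icdar13.caffemodel)" << endl
         // ... the rest of the usage text, including the download links, is cut off in the source

no longer work.

vgg_text survives only as a few snapshot files: two fairly small files, with the model itself gone. In the end I used the CSDN blogger's copy and downloaded it from Baidu Netdisk, which was painful.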

The other resource files are almost all under the module's file path:

trained_classifierNM1.xml
trained_classifierNM2.xml
OCRHMM_transitions_table.xml
OCRHMM_knn_model_data.xml.gz
trained_classifier_erGrouping.xml

The path is:

opencv_contrib-4.4.0\modules\text\samples

Some other image resources can also be found in that directory.
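The header later in this post declares a `fileExists` helper, but its body is not shown. As a minimal sketch, assuming a plain `std::ifstream` probe is good enough, the check could look like the following. `missingResources` is a hypothetical helper, not part of the original project; it reports which of the files listed above is absent from the working directory instead of naming only one.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Assumed implementation: the post only declares fileExists().
// Returns true when the file can be opened for reading.
bool fileExists(const std::string &filename)
{
    std::ifstream f(filename.c_str());
    return f.good();
}

// Hypothetical helper: collect every name that cannot be opened,
// so the caller can log exactly which resource is missing.
std::vector<std::string> missingResources(const std::vector<std::string> &names)
{
    std::vector<std::string> missing;
    for (const std::string &name : names)
    {
        if (!fileExists(name))
            missing.push_back(name);
    }
    return missing;
}
```

A caller would pass the five file names listed above and abort if the returned vector is not empty.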

Code

Header file:

#ifndef MAINWINDOW_H
#define MAINWINDOW_H

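// The header names on these #include lines were lost when the post was extracted.
// The list below is an assumption: the headers that the code shown in this post needs.
#include <QMainWindow>
#include <QFileDialog>
#include <QDebug>
#include <QDir>
#include <string>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/text.hpp>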

class ParallelExtracCSER: public cv::ParallelLoopBody
{
private:
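    std::vector<cv::Mat> &channels;
    std::vector<std::vector<cv::text::ERStat>> &regions;
    std::vector<cv::Ptr<cv::text::ERFilter>> erFiter_1;
    std::vector<cv::Ptr<cv::text::ERFilter>> erFiter_2;
public:
    ParallelExtracCSER(std::vector<cv::Mat> &_channels, std::vector<std::vector<cv::text::ERStat>> &_regions,
                       std::vector<cv::Ptr<cv::text::ERFilter>> _erFiter_1, std::vector<cv::Ptr<cv::text::ERFilter>> _erFiter_2)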
        : channels(_channels), regions(_regions), erFiter_1(_erFiter_1), erFiter_2(_erFiter_2){}
    virtual void operator()( const cv::Range &r) const CV_OVERRIDE
    {
        for(int c = r.start; c < r.end; c++)
        {
            erFiter_1[c]->run(channels[c], regions[c]);
            erFiter_2[c]->run(channels[c], regions[c]);
        }
    }
    ParallelExtracCSER & operator=(const ParallelExtracCSER &a);
};

template <class T>
class ParallelOCR: public cv::ParallelLoopBody
{
private:
    std::vector<cv::Mat> &detections;
    std::vector<std::string> &outputs;
    std::vector< std::vector<cv::Rect> > &boxes;
    std::vector< std::vector<std::string> > &words;
    std::vector< std::vector<float> > &confidences;
    std::vector< cv::Ptr<T> > &ocrs;
public:
    ParallelOCR(std::vector<cv::Mat> &_detections, std::vector<std::string> &_outputs, std::vector< std::vector<cv::Rect> > &_boxes,
                std::vector< std::vector<std::string> > &_words, std::vector< std::vector<float> > &_confidences,
                std::vector< cv::Ptr<T> > &_ocrs):detections(_detections),outputs(_outputs),boxes(_boxes),words(_words),confidences(_confidences),ocrs(_ocrs)
    {}

    virtual void operator()(const cv::Range &r) const CV_OVERRIDE
    {
        for(int c=r.start; c < r.end; c++)
        {
            ocrs[c%ocrs.size()]->run(detections[c], outputs[c], &boxes[c], &words[c], &confidences[c], cv::text::OCR_LEVEL_WORD);
        }
    }
    ParallelOCR & operator=(const ParallelOCR &a);
};

namespace Ui {
class MainWindow;
}

class MainWindow : public QMainWindow
{
    Q_OBJECT

public:
    explicit MainWindow(QWidget *parent = nullptr);
    ~MainWindow();

private:
    Ui::MainWindow *ui;
    void WindowInit();
    std::string sourcePath;
    void showImage(cv::Mat &image);
    bool fileExists(const std::string &filename);
    void textboxDraw(cv::Mat src, std::vector<cv::Rect> &groups, std::vector<float> &probs, std::vector<int> &indexes);
    bool isRepetitive(const std::string &s);
    void erDraw(std::vector<cv::Mat> &channels, std::vector<std::vector<cv::text::ERStat>> &regions, std::vector<cv::Vec2i> group, cv::Mat segmentation);

public slots:
    void slot_importImage();
    void slot_saveImage();
    void slot_textDetector();
    void slot_textRecognizer();
};


#endif // MAINWINDOW_H

The MainWindow class is the main control module. The other two classes, ParallelExtracCSER and ParallelOCR, are the business classes that implement the core functionality.
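ParallelOCR picks its engine with `ocrs[c % ocrs.size()]`, and the recognition slot further down feeds it ranges of at most `numOcrs` detections at a time. The sketch below illustrates that scheduling in plain C++ without OpenCV. `Batch`, `makeBatches`, and `engineIndex` are hypothetical names used only for illustration.

```cpp
#include <vector>

// Half-open range [begin, end) of detection indices processed together.
struct Batch
{
    int begin;
    int end;
};

// Split `total` detections into consecutive batches of at most `batchSize`.
std::vector<Batch> makeBatches(int total, int batchSize)
{
    std::vector<Batch> batches;
    if (batchSize <= 0)
        return batches;
    for (int i = 0; i < total; i += batchSize)
    {
        const int end = (i + batchSize <= total) ? i + batchSize : total;
        batches.push_back(Batch{i, end});
    }
    return batches;
}

// Engine chosen for a detection, mirroring ocrs[c % ocrs.size()].
int engineIndex(int detection, int engineCount)
{
    return detection % engineCount;
}
```

Because each batch holds at most as many consecutive indices as there are engines, no two detections in the same batch map to the same engine, which matters when an OCR instance cannot safely be used from two threads at once.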

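Function implementations

Slot functions

These correspond to the four main features: image import, image save, text detection, and text recognition.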

1. slot_importImage()
void MainWindow::slot_importImage()
{
    QString imagePath = QFileDialog::getOpenFileName(this,"选择图片","./","*png *jpg *jpeg");
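    QImage image;
    if(!image.load(imagePath))
    {
        // Covers both a cancelled dialog (empty path) and an unreadable file.
        qDebug() << "Failed to load image:" << imagePath;
        return;
    }
    qDebug() << "Image loaded:" << imagePath;
    sourcePath = QDir::toNativeSeparators(imagePath).toStdString();
    qDebug() << "Image path:" << QDir::toNativeSeparators(imagePath);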
    int imageWidth = image.width();
    int imageHeight = image.height();

    if(imageWidth > 640)
    {
        imageHeight = (640*10 / imageWidth) * imageHeight /10;
        imageWidth = 640;
    }

    if(imageHeight > 480)
    {
        imageWidth = (480*10 / imageHeight) * imageWidth /10;
        imageHeight = 480;
    }

    image = image.scaled(imageWidth, imageHeight, Qt::IgnoreAspectRatio, Qt::SmoothTransformation);
    this->resize(imageWidth*2+2,imageHeight);
    ui->label_source->setPixmap(QPixmap::fromImage(image));
}
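The slot scales with `(640*10 / imageWidth) * imageHeight / 10`, which truncates the ratio to one decimal digit before multiplying. For a 1000x500 image this gives a height of 300 instead of 320, so the preview is slightly squashed. Below is a minimal sketch of the same fit-inside-640x480 logic that multiplies before dividing. `fitWithin` is a hypothetical helper, not part of the original project.

```cpp
#include <utility>

// Returns {width, height} scaled down to fit inside maxW x maxH while keeping
// the aspect ratio. Sizes that already fit are returned unchanged.
std::pair<int, int> fitWithin(int width, int height, int maxW, int maxH)
{
    if (width > maxW)
    {
        // Multiply first so the ratio is not truncated early.
        height = static_cast<int>(static_cast<long long>(height) * maxW / width);
        width = maxW;
    }
    if (height > maxH)
    {
        width = static_cast<int>(static_cast<long long>(width) * maxH / height);
        height = maxH;
    }
    return {width, height};
}
```

The resulting pair could be passed straight to `QImage::scaled`, and the same helper would also serve the display function later in the post.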
2. slot_saveImage()
void MainWindow::slot_saveImage()
{
    if(currentActive.isEmpty() || sourcePath.empty())
    {
        qDebug() << "currentActive is " << currentActive.isEmpty() << " sourcePath: " << sourcePath.empty();
        return;
    }
    QString source_path_name = QString::fromStdString(sourcePath);
    size_t pos = sourcePath.find('.');
    if(pos == std::string::npos)
    {
        qDebug() << QString::fromStdString(sourcePath) << " image format is invalid";
        return;
    }
    QStringList sourcePaths = source_path_name.split('.');
    QString saveName = sourcePaths.at(0) + "_" + currentActive + "." + sourcePaths.at(1);
    if(ui->label_result->pixmap()->save(saveName, sourcePaths.at(1).toStdString().c_str()))
    {
        qDebug() << saveName << " save success.";
    }
    else
    {
        qDebug() << saveName << " save fail.";
    }
}
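One caveat with the slot above: it splits the path on the first `.`, so a path such as `C:\Users\john.doe\pic.png` yields the wrong base name and extension. A minimal sketch of a more defensive approach, in plain C++, looks for the last dot after the last path separator. `makeSaveName` is a hypothetical helper, not part of the original project.

```cpp
#include <cstddef>
#include <string>

// Builds "<base>_<tag>.<ext>" from a source path.
// Returns an empty string when the file name has no usable extension.
std::string makeSaveName(const std::string &path, const std::string &tag)
{
    const std::size_t sep = path.find_last_of("/\\");
    const std::size_t nameStart = (sep == std::string::npos) ? 0 : sep + 1;
    const std::size_t dot = path.rfind('.');

    // Reject: no dot, dot inside a directory name or leading the file name,
    // or a trailing dot with nothing after it.
    if (dot == std::string::npos || dot <= nameStart || dot + 1 == path.size())
        return std::string();

    return path.substr(0, dot) + "_" + tag + path.substr(dot);
}
```

In a Qt project, `QFileInfo` offers the same information (`completeBaseName()`, `suffix()`), which may be the more natural choice there.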
3. slot_textDetector()
void MainWindow::slot_textDetector()
{
    const std::string modelArch = "textbox.prototxt" ;
    const std::string moddelWeights = "TextBoxes_icdar13.caffemodel";
    if(!fileExists(modelArch) || !fileExists(moddelWeights))
    {
        qDebug() << "Model files not found in the current directory. Aborting!";
        return;
    }

    if(sourcePath.empty())
    {
        qDebug() << "图片路径无效,请检查图片是否存在!";
        return;
    }
    cv::Mat image = cv::imread(sourcePath, cv::IMREAD_COLOR);
    if(image.empty())
    {
        qDebug() << "image is empty" << sourcePath.c_str();
        return;
    }

    qDebug() << "Starting Text Box Demo";
    cv::Ptr<cv::text::TextDetectorCNN> textSpotter = cv::text::TextDetectorCNN::create(modelArch, moddelWeights);
    std::vector<cv::Rect> bbox;
    std::vector<float> outProbabillities;
    textSpotter->detect(image, bbox, outProbabillities);
    std::vector<int> indexes;
    cv::dnn::NMSBoxes(bbox, outProbabillities, 0.4f, 0.5f, indexes);

    cv::Mat imageCopy = image.clone();
//    float threshold = 0.5;
//    for(int i = 0; i < bbox.size(); i++)
//    {
//        if(outProbabillities[i] > threshold)
//        {
//            cv::Rect rect = bbox[i];
//            cv::rectangle(imageCopy,rect,cv::Scalar(255,0,0),2);
//        }
//    }
    textboxDraw(imageCopy, bbox, outProbabillities, indexes);
    showImage(imageCopy);

    imageCopy = image.clone();
    cv::Ptr<cv::text::OCRHolisticWordRecognizer> wordSpotter =
            cv::text::OCRHolisticWordRecognizer::create("dictnet_vgg_deploy.prototxt", "dictnet_vgg.caffemodel", "dictnet_vgg_labels.txt");
    for(size_t i = 0; i < indexes.size(); i++)
    {
        cv::Mat wordImg;
        cv::cvtColor(image(bbox[indexes[i]]),wordImg, cv::COLOR_BGR2GRAY);
        std::string word;
        std::vector<float> confs;
        wordSpotter->run(wordImg, word, nullptr, nullptr, &confs);

        cv::Rect currrentBox = bbox[indexes[i]];
        rectangle(imageCopy, currrentBox, cv::Scalar( 0, 255, 255 ), 2, cv::LINE_AA);

        int baseLine = 0;
        cv::Size labelSize = cv::getTextSize(word, cv::FONT_HERSHEY_PLAIN, 1, 1, &baseLine);
        int yLeftBottom = std::max(currrentBox.y, labelSize.height);
        rectangle(imageCopy, cv::Point(currrentBox.x, yLeftBottom - labelSize.height),
                  cv::Point(currrentBox.x +labelSize.width, yLeftBottom + baseLine), cv::Scalar( 255, 255, 255 ), cv::FILLED);

        putText(imageCopy, word, cv::Point(currrentBox.x , yLeftBottom), cv::FONT_HERSHEY_PLAIN, 1, cv::Scalar( 0,0,0 ), 1, cv::LINE_AA);
    }
    showImage(imageCopy);
}
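The detector slot hands its boxes to `cv::dnn::NMSBoxes` with a score threshold of 0.4 and an overlap threshold of 0.5. To make those two numbers concrete, here is a rough sketch of greedy non-maximum suppression in plain C++. It illustrates the general technique only and is not OpenCV's implementation; `Box`, `iou`, and `greedyNms` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct Box
{
    int x, y, w, h;
};

// Intersection-over-union of two axis-aligned boxes.
double iou(const Box &a, const Box &b)
{
    const int x1 = std::max(a.x, b.x);
    const int y1 = std::max(a.y, b.y);
    const int x2 = std::min(a.x + a.w, b.x + b.w);
    const int y2 = std::min(a.y + a.h, b.y + b.h);
    const double inter = static_cast<double>(std::max(0, x2 - x1)) * std::max(0, y2 - y1);
    const double uni = static_cast<double>(a.w) * a.h + static_cast<double>(b.w) * b.h - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Keep the highest-scoring boxes, dropping any box that overlaps an
// already kept box by more than iouThreshold. Returns indices into `boxes`.
std::vector<int> greedyNms(const std::vector<Box> &boxes, const std::vector<float> &scores,
                           float scoreThreshold, float iouThreshold)
{
    std::vector<int> order;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i)
    {
        if (scores[i] > scoreThreshold)
            order.push_back(i);
    }
    // Highest score first.
    std::stable_sort(order.begin(), order.end(),
                     [&scores](int a, int b) { return scores[a] > scores[b]; });

    std::vector<int> kept;
    for (int idx : order)
    {
        bool suppressed = false;
        for (int k : kept)
        {
            if (iou(boxes[idx], boxes[k]) > iouThreshold)
            {
                suppressed = true;
                break;
            }
        }
        if (!suppressed)
            kept.push_back(idx);
    }
    return kept;
}
```

The returned indices play the same role as `indexes` in the slot: they select which entries of the box and probability vectors are drawn and passed on to word recognition.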
4. slot_textRecognizer()

void MainWindow::slot_textRecognizer()
{
    if(sourcePath.empty())
    {
        qDebug() << "图片路径无效,请检查图片是否存在!";
        return;
    }
    cv::Mat image = cv::imread(sourcePath, cv::IMREAD_COLOR);
    if(image.empty())
    {
        qDebug() << "image is empty" << sourcePath.c_str();
        return;
    }

    bool downsize = false;
    int RegionType = 1;
    int GroupingAlgorithm = 0;
    int Recongnition = 0;
    cv::String regionTypeString[2] = {"ERStats","MSER"};
    cv::String GroupingAlgorithmsStr[2] = {"exhaustive_search", "multioriented"};
    cv::String recognitionsStr[2] = {"Tesseract", "NM_chain_features + KNN"};

    std::vector<cv::Mat> channels;
    std::vector<std::vector<cv::text::ERStat>> regions(2);

    cv::Mat gray,outImage;
    // Create ERFilter objects with the 1st and 2nd stage default classifiers
    // since er algorithm is not reentrant we need one filter for channel
    std::vector< cv::Ptr<cv::text::ERFilter> > erFilters1;
    std::vector< cv::Ptr<cv::text::ERFilter> > erFilters2;

    if(!fileExists("trained_classifierNM1.xml") || !fileExists("trained_classifierNM2.xml")
            || !fileExists("OCRHMM_transitions_table.xml") || !fileExists("OCRHMM_knn_model_data.xml.gz") || !fileExists("trained_classifier_erGrouping.xml"))
    {
        qDebug() << "One or more classifier/OCR resource files not found in the current directory!";
        return;
    }

    for(int i = 0; i<2; i++ )
    {
        cv::Ptr<cv::text::ERFilter> erFilter1 = cv::text::createERFilterNM1(cv::text::loadClassifierNM1("trained_classifierNM1.xml"), 8, 0.00015f, 0.13f, 0.2f, true, 0.1f);
        cv::Ptr<cv::text::ERFilter> erFilter2 = cv::text::createERFilterNM2(cv::text::loadClassifierNM2("trained_classifierNM2.xml"), 0.5);
        erFilters1.push_back(erFilter1);
        erFilters2.push_back(erFilter2);
    }

    int numOcrs = 10;
    std::vector<cv::Ptr<cv::text::OCRTesseract>> ocrs;
    for(int o = 0; o < numOcrs; o++)
    {
        ocrs.push_back(cv::text::OCRTesseract::create());
    }

    cv::Mat transitionP;
    std::string filename = "OCRHMM_transitions_table.xml";
    cv::FileStorage fs(filename, cv::FileStorage::READ);
    fs["transition_probabilities"] >> transitionP;
    fs.release();

    cv::Mat emissionP = cv::Mat::eye(62, 62, CV_64FC1);
    std::string voc = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

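    std::vector< cv::Ptr<cv::text::OCRHMMDecoder>> decoders;

    // NOTE: the listing is cut off here in the source. Everything between the loop header
    // `for(int o = 0; o < ...` and the declaration of `contours` (inside the MSER branch
    // of the region-type switch) is missing, including where `tAll` is declared and started.
        std::vector<std::vector<cv::Point>> contours;
        std::vector<cv::Rect> bboxes;
        cv::Ptr<cv::MSER> mesr = cv::MSER::create(21, (int)(0.00002*gray.cols*gray.rows), (int)(0.05*gray.cols * gray.rows), 1, 0.7);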
        mesr->detectRegions(gray, contours, bboxes);

        if(contours.size() > 0)
            MSERsToERStats(gray, contours, regions);
    }
    break;
    }

    std::vector< std::vector<cv::Vec2i>> nmRegionGroups;
    std::vector<cv::Rect> nmBoxes;
    switch (GroupingAlgorithm) {
    case 0:
        cv::text::erGrouping(image, channels, regions, nmRegionGroups, nmBoxes, cv::text::ERGROUPING_ORIENTATION_HORIZ);
        break;
    case 1:
        cv::text::erGrouping(image, channels, regions, nmRegionGroups, nmBoxes, cv::text::ERGROUPING_ORIENTATION_ANY, "trained_classifier_erGrouping.xml", 0.5);
        break;
    }

    /*Text Recognition (OCR)*/
    int bottom_bar_height = outImage.rows/7 ;
    cv::copyMakeBorder(image, outImage, 0, bottom_bar_height, 0, 0, cv::BORDER_CONSTANT, cv::Scalar(150, 150, 150));
    float scale_font = (float)(bottom_bar_height /85.0);
    std::vector<std::string> words_detection;
    float min_confidence1 = 0.f, min_confidence2 = 0.f;

    if (Recongnition == 0)
    {
        min_confidence1 = 51.f;
        min_confidence2 = 60.f;
    }

    std::vector<cv::Mat> detections;

    for (int i=0; i<(int)nmBoxes.size(); i++)
    {
        rectangle(outImage, nmBoxes[i].tl(), nmBoxes[i].br(), cv::Scalar(255,255,0),3);

        cv::Mat group_img = cv::Mat::zeros(image.rows+2, image.cols+2, CV_8UC1);
        erDraw(channels, regions, nmRegionGroups[i], group_img);
        group_img(nmBoxes[i]).copyTo(group_img);
        copyMakeBorder(group_img,group_img,15,15,15,15,cv::BORDER_CONSTANT,cv::Scalar(0));
        detections.push_back(group_img);
    }
    std::vector<std::string> outputs((int)detections.size());
    std::vector< std::vector<cv::Rect> > boxes((int)detections.size());
    std::vector< std::vector<std::string> > words((int)detections.size());
    std::vector< std::vector<float> > confidences((int)detections.size());
    // parallel process detections in batches of ocrs.size() (== num_ocrs)
    for (int i=0; i<(int)detections.size(); i=i+(int)numOcrs)
    {
        cv::Range r;
        if (i+(int)numOcrs <= (int)detections.size())
            r = cv::Range(i,i+(int)numOcrs);
        else
            r = cv::Range(i,(int)detections.size());

        switch(Recongnition)
        {
        case 0: // Tesseract
            qDebug() << "+++++";
            cv::parallel_for_(r, ParallelOCR<cv::text::OCRTesseract>(detections, outputs, boxes, words, confidences, ocrs));
            qDebug() << "---";
            break;
        case 1: // NM_chain_features + KNN
            cv::parallel_for_(r, ParallelOCR<cv::text::OCRHMMDecoder>(detections, outputs, boxes, words, confidences, decoders));
            break;
        }
    }
    for(auto &it : outputs)
    {
        qDebug() << QString::fromStdString(it);
    }
    for (int i=0; i<(int)detections.size(); i++)
    {
        outputs[i].erase(remove(outputs[i].begin(), outputs[i].end(), '\n'), outputs[i].end());
        //cout << "OCR output = \"" << outputs[i] << "\" length = " << outputs[i].size() << endl;
        if (outputs[i].size() < 3)
            continue;

        for (int j=0; j<(int)boxes[i].size(); j++)
        {
            boxes[i][j].x += nmBoxes[i].x-15;
            boxes[i][j].y += nmBoxes[i].y-15;

            //cout << "  word = " << words[j] << "\t confidence = " << confidences[j] << endl;
            if ((words[i][j].size() < 2) || (confidences[i][j] < min_confidence1) ||
                    ((words[i][j].size()==2) && (words[i][j][0] == words[i][j][1])) ||
                    ((words[i][j].size()< 4) && (confidences[i][j] < min_confidence2)) ||
                    isRepetitive(words[i][j]))
                continue;
            words_detection.push_back(words[i][j]);
            rectangle(outImage, boxes[i][j].tl(), boxes[i][j].br(), cv::Scalar(255,0,255),3);
            cv::Size word_size = getTextSize(words[i][j], cv::FONT_HERSHEY_SIMPLEX, (double)scale_font, (int)(3*scale_font), nullptr);
            cv::rectangle(outImage, boxes[i][j].tl()-cv::Point(3,word_size.height+3), boxes[i][j].tl()+cv::Point(word_size.width,0), cv::Scalar(255,0,255),-1);
            cv::putText(outImage, words[i][j], boxes[i][j].tl()-cv::Point(1,1), cv::FONT_HERSHEY_SIMPLEX, scale_font, cv::Scalar(255,255,255),(int)(3*scale_font));
        }
    }
    tAll = ((double)cv::getTickCount() - tAll)*1000/cv::getTickFrequency();
    int text_thickness = 1+(outImage.rows/500);
    std::string fps_info = cv::format("%2.1f Fps. %dx%d", (float)(1000 / tAll), image.cols, image.rows);
    cv::putText(outImage, fps_info, cv::Point( 10,outImage.rows-5 ), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);
    cv::putText(outImage, regionTypeString[RegionType], cv::Point((int)(outImage.cols*0.5), outImage.rows - (int)(bottom_bar_height/ 1.5)), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);
    cv::putText(outImage, GroupingAlgorithmsStr[GroupingAlgorithm], cv::Point((int)(outImage.cols*0.5),outImage.rows-((int)(bottom_bar_height /3)+4) ), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);
    cv::putText(outImage, recognitionsStr[Recongnition], cv::Point((int)(outImage.cols*0.5),outImage.rows-5 ), cv::FONT_HERSHEY_DUPLEX, scale_font, cv::Scalar(255,0,0), text_thickness);
    showImage(outImage);
}
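The inner loop above discards recognized words through a chain of conditions plus a call to `isRepetitive`, whose body is not shown in this post. The sketch below restates that filter as a single predicate in plain C++. The `isRepetitive` body is an assumption, modelled on the helper of the same name in OpenCV's text-module samples as I recall it; `keepWord` is a hypothetical name.

```cpp
#include <string>

// Assumed body (the post only declares this function): flag words in which
// more than half of the characters are 'i', 'l' or 'I', a typical result of
// vertical strokes being misread as letters.
bool isRepetitive(const std::string &s)
{
    int count = 0;
    for (char ch : s)
    {
        if (ch == 'i' || ch == 'l' || ch == 'I')
            ++count;
    }
    return count > (static_cast<int>(s.size()) + 1) / 2;
}

// Mirrors the rejection conditions in the inner loop of slot_textRecognizer():
// returns true when the word should be drawn and stored.
bool keepWord(const std::string &word, float confidence,
              float minConfidence1, float minConfidence2)
{
    if (word.size() < 2 || confidence < minConfidence1)
        return false;                      // too short or too uncertain
    if (word.size() == 2 && word[0] == word[1])
        return false;                      // "ll", "oo", ...
    if (word.size() < 4 && confidence < minConfidence2)
        return false;                      // short words need more confidence
    return !isRepetitive(word);
}
```

With the Tesseract thresholds used in the slot (51 and 60), `keepWord("cat", 55.f, 51.f, 60.f)` rejects the word while `keepWord("hello", 70.f, 51.f, 60.f)` accepts it.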

Control function

void MainWindow::WindowInit()
{
    // Set up the menus
    QMenu* file = ui->menuBar->addMenu(QString("文件"));
    QAction* importImage = file->addAction(QString("选择图片"));
    QAction* saveImage = file->addAction(QString("保存"));

    QMenu* funtion = ui->menuBar->addMenu(QString("功能"));
    QAction* textDetector = funtion->addAction(QString("文字检测"));
    QAction* textRecognizer = funtion->addAction(QString("文字识别"));

    // Connect the signals to the slot functions
    connect(importImage,&QAction::triggered,this,&MainWindow::slot_importImage);
    connect(saveImage,&QAction::triggered,this,&MainWindow::slot_saveImage);
    connect(textDetector,&QAction::triggered,this,&MainWindow::slot_textDetector);
    connect(textRecognizer,&QAction::triggered,this,&MainWindow::slot_textRecognizer);
}

Qt image display function

A helper that displays an image, scaling it down to fit.

void MainWindow::showImage(cv::Mat &image)
{
    cv::Mat outImage;
    cv::cvtColor(image, outImage, cv::COLOR_BGR2RGB);
    QImage qImage = QImage((const unsigned char*)(outImage.data),outImage.cols,outImage.rows,outImage.step,QImage::Format_RGB888);
    int imageWidth = qImage.width();
    int imageHeight = qImage.height();

    if(imageWidth > 640)
    {
        imageHeight = (640*10 / imageWidth) * imageHeight /10;
        imageWidth = 640;
    }

    if(imageHeight > 480)
    {
        imageWidth = (480*10 / imageHeight) * imageWidth /10;
        imageHeight = 480;
    }

    qImage = qImage.scaled(imageWidth, imageHeight, Qt::IgnoreAspectRatio, Qt::SmoothTransformation);
    ui->label_result->setPixmap(QPixmap::fromImage(qImage));
}
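The display function calls `cv::cvtColor` with `COLOR_BGR2RGB` because OpenCV loads color images in B, G, R byte order while `QImage::Format_RGB888` expects R, G, B. It also passes `outImage.step` because a row may occupy more bytes than `width * 3`. As a rough illustration of both points in plain C++, here is a channel swap over a raw pixel buffer. `swapRedBlue` is a hypothetical helper, not something the project uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Swap the first and third byte of every 3-byte pixel, row by row.
// `step` is the number of bytes per row, which may exceed width * 3
// when rows are padded; padding bytes are left untouched.
void swapRedBlue(std::vector<std::uint8_t> &pixels, int width, int height, std::size_t step)
{
    for (int y = 0; y < height; ++y)
    {
        std::uint8_t *row = pixels.data() + static_cast<std::size_t>(y) * step;
        for (int x = 0; x < width; ++x)
            std::swap(row[3 * x], row[3 * x + 2]);
    }
}
```

Ignoring the row stride and assuming tightly packed rows is a common cause of skewed images when handing a pixel buffer from one library to another.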

Drawing text boxes

void MainWindow::textboxDraw(cv::Mat src, std::vector<cv::Rect>& groups, std::vector<float>& probs, std::vector<int>& indexes)
{
    for (size_t i = 0; i < indexes.size(); i++)
    {
        if (src.type() == CV_8UC3)
        {
            cv::Rect currrentBox = groups[indexes[i]];
            cv::rectangle(src, currrentBox, cv::Scalar( 0, 255, 255 ), 2, cv::LINE_AA);
            cv::String cvlabel = cv::format("%.2f", probs[indexes[i]]);
            qDebug() << "text box: " << currrentBox.size().width << " " << /* ... the rest of the listing is cut off in the source */

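Source code

That is the basic flow. Explanations of the relevant functions are included above; for more detail, see the blog posts listed earlier, which I will not repeat here.

Source code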
