Calling tesseract_ocr to Implement OCR (Part 2)

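Abstract

This document records how I used tesseract_ocr to implement character recognition. It covers explanations of the functions and a project example. If you repost it, please credit the source.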


Project Example

The tesseract_ocr calls are split into three sub-functions here: init_tesseract(), ocr(), and end_tesseract().

void init_tesseract(tesseract::TessBaseAPI *api)
{
    /* Initializes the Tesseract API: language data path and language pack, engine mode, whitelist, and page segmentation mode. */
    api->Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY); // returns 0 on success, -1 on failure
    api->SetVariable("tessedit_char_whitelist", "0123456789"); // whitelist, i.e. the set of characters known in advance
    /*
    Init() arguments:
    1st: path to the language data; NULL (the null pointer, not the string "NULL") uses the default location
    2nd: "eng" is the name of the language pack
    3rd: OCR engine mode
    0 = legacy Tesseract engine only          tesseract::OEM_TESSERACT_ONLY
    1 = LSTM neural network only              tesseract::OEM_LSTM_ONLY (Tesseract 4.x; 3.x had OEM_CUBE_ONLY for the Cube engine)
    2 = legacy Tesseract + LSTM               tesseract::OEM_TESSERACT_LSTM_COMBINED (OEM_TESSERACT_CUBE_COMBINED in 3.x)
    3 = default, based on what is available   tesseract::OEM_DEFAULT
    The whitelist is not supported by the LSTM engine.
    */
    api->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
    /*
    PSM_OSD_ONLY                Orientation and script detection only.
    PSM_AUTO_OSD                Automatic page segmentation with orientation and script detection. (OSD)
    PSM_AUTO_ONLY               Automatic page segmentation, but no OSD, or OCR.
    PSM_AUTO                    Fully automatic page segmentation, but no OSD.
    PSM_SINGLE_COLUMN           Assume a single column of text of variable sizes.
    PSM_SINGLE_BLOCK_VERT_TEXT  Assume a single uniform block of vertically aligned text.
    PSM_SINGLE_BLOCK            Assume a single uniform block of text. (Default!)
    PSM_SINGLE_LINE             Treat the image as a single text line.
    PSM_SINGLE_WORD             Treat the image as a single word.
    PSM_CIRCLE_WORD             Treat the image as a single word in a circle.
    PSM_SINGLE_CHAR             Treat the image as a single character.
    PSM_COUNT                   Number of enum entries.
    */
}

char* ocr(tesseract::TessBaseAPI *api, cv::Mat inputImg, float &conf)
{
    char* showtxt;
    // Arguments: pixel buffer, width, height, bytes per pixel, bytes per line.
    // channels() equals bytes per pixel only for 8-bit images (CV_8UC1, CV_8UC3, ...).
    api->SetImage((uchar*)inputImg.data, inputImg.cols, inputImg.rows, inputImg.channels(), inputImg.step);
    //api->SetRectangle(0, 0, inputImg.cols, inputImg.rows);
    //Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
    /*
    enum PageIteratorLevel {
        RIL_BLOCK,     // Block of text/image/separator line.
        RIL_PARA,      // Paragraph within a block.
        RIL_TEXTLINE,  // Line within a paragraph.
        RIL_WORD,      // Word within a textline.
        RIL_SYMBOL     // Symbol/character within a word.
    };
    */
    //api->SetAccuracyVSpeed(tesseract:);
    //api->SetOutputName("out");
    showtxt = api->GetUTF8Text(); // recognized text; the caller must free it with delete[]
    conf = api->MeanTextConf();   // mean confidence of the recognized text (0 to 100)
    return showtxt;
}

void end_tesseract(tesseract::TessBaseAPI *api)
{
    api->Clear();
    api->End();
}

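This article covers only the first two sub-functions:
* init_tesseract()
* api->Init() takes three arguments. The first is the path to the language data and can be set as needed. The second is the name of the language pack to load; several packs can be loaded at once, for example "eng+chi_sim". The third is the OCR engine mode. Note that the whitelist is not supported by the LSTM engine. Setting a whitelist restricts the recognition range: in the program above, results are chosen only from 0123456789.
* api->SetPageSegMode()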
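
The two string arguments discussed above can be illustrated with a small sketch. Both helpers are hypothetical and use only the standard library. join_languages() builds the "eng+chi_sim" style string for the second argument of Init(). filter_to_whitelist() is a possible fallback for cases where the engine ignores the whitelist. It is not equivalent to the built-in whitelist, because it discards unwanted characters after recognition instead of constraining the recognizer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins language pack names with '+',
// e.g. {"eng", "chi_sim"} becomes "eng+chi_sim".
std::string join_languages(const std::vector<std::string>& langs)
{
    std::string joined;
    for (std::size_t i = 0; i < langs.size(); ++i) {
        if (i > 0) {
            joined += '+';
        }
        joined += langs[i];
    }
    return joined;
}

// Hypothetical fallback: keeps only the bytes of `text` that appear in
// `whitelist`. It works byte by byte, so it assumes an ASCII whitelist
// such as "0123456789".
std::string filter_to_whitelist(const std::string& text,
                                const std::string& whitelist)
{
    std::string kept;
    for (char c : text) {
        if (whitelist.find(c) != std::string::npos) {
            kept += c;
        }
    }
    return kept;
}
```

For example, the joined string could be passed as api->Init(NULL, join_languages({"eng", "chi_sim"}).c_str(), tesseract::OEM_DEFAULT), assuming both language packs are installed.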
