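Optical character recognition (OCR) originally referred to a technique for printed characters: a paper document is scanned optically into a black-and-white bitmap image, and recognition software then converts the characters in that image into text that a word processor can edit further. The term has since broadened to cover using deep learning and related techniques to detect the characters in an image and return both the text content and its position in the image, usually as the coordinates of the four corner points of each text box (this second part of the definition is my own interpretation).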
Original image (left) and visualization of the recognition result (right)
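This article uses Baidu's PaddleOCR toolkit, built on the PaddlePaddle framework, for the following reasons:
1. It is developed by a Chinese company and ships with extensive Chinese-language tutorials and documentation, which makes it easy to use and learn; it is beginner-friendly.
2. It is highly extensible: its interfaces are already exposed and can be called directly, and it provides lightweight networks and development modules for a wide range of deployment scenarios; it is developer-friendly.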
GitHub - PaddlePaddle/PaddleOCR: Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices) https://github.com/PaddlePaddle/PaddleOCR
It can be used from the terminal: cd to the project root and run
```bash
# layout analysis + table recognition
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure
# layout analysis
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/1.png --type=structure --table=false --ocr=false
# table recognition
paddleocr --image_dir=PaddleOCR/ppstructure/docs/table/table.jpg --type=structure --layout=false
```
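paddleocr.py is the main module. --image_dir is the path of the image to recognize, and --type, --table and --layout (plus --ocr in the layout-only command) together select the recognition mode.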
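The three commands differ only in which flags they pass. The sketch below builds them from Python, for example to launch them with subprocess.run. The flags are exactly those in the commands above; the mode names ("layout+table", "layout", "table") and the build_command helper are hypothetical labels of mine, not part of PaddleOCR.

```python
from __future__ import annotations

import shlex

# Extra flags per mode, copied from the three commands above.
# The mode names are hypothetical labels used only in this sketch.
MODE_FLAGS = {
    "layout+table": [],                          # layout analysis + table recognition
    "layout": ["--table=false", "--ocr=false"],  # layout analysis only
    "table": ["--layout=false"],                 # table recognition only
}


def build_command(image_dir: str, mode: str) -> list[str]:
    """Return the paddleocr argument list for one of the three structure modes."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}, expected one of {sorted(MODE_FLAGS)}")
    return ["paddleocr", f"--image_dir={image_dir}", "--type=structure", *MODE_FLAGS[mode]]


if __name__ == "__main__":
    # Print each command as it would be typed in a shell
    for mode in MODE_FLAGS:
        print(shlex.join(build_command("PaddleOCR/ppstructure/docs/table/1.png", mode)))
```

Passing the returned list to subprocess.run(cmd, check=True) from the project root should behave like typing the command; keeping it as a list rather than one string avoids shell-quoting problems with paths.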
Screenshot of the interface
Using this API, I wrote a simple recognition script, predict.py:
"""
1.img_path 为您想要识别的图片所在地址(注意:路径不能有中文!)
2.Exit?输入exit即可退出
"""
import os
import cv2
from paddleocr import PPStructure, draw_structure_result, save_structure_res
table_engine = PPStructure(show_log=True)
while True:
save_folder = 'Output'
img_path = input('\nPlease enter img path:')
if img_path == '':
img_path = 'Input/emotion/ocr6.jpg'
print(f'Image path: {img_path}')
img = cv2.imread(img_path)
result = table_engine(img)
save_name = os.path.basename(img_path).split('.')[0]
save_structure_res(result, save_folder, save_name)
for line in result:
line.pop('img')
print(line)
from PIL import Image
font_path = 'PaddleOCR/doc/fonts/simfang.ttf' # PaddleOCR下提供字体包
image = Image.open(img_path).convert('RGB')
im_show = draw_structure_result(image, result, font_path=font_path)
im_show = Image.fromarray(im_show)
im_show.save(f'Output/{save_name}/result.jpg')
print(
'\n------------------------------------------------It is show time !--------------------------------------------------------')
for i in range(result[0]['res'].__len__()):
ocr_res = result[0]['res'][i]['text']
print(f'ocr result[{i + 1}]: {ocr_res}')
exit = input(f'\nExit?')
if exit == 'exit':
break
Input
Input path: Input/emotion/ocr13.jpg
Raw output
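The key line is result = table_engine(img): it takes img, the image array loaded by cv2.imread (not the image path), and returns result, which is structured as follows: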
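1. result is a list with one dict per detected layout region (in this example it has two entries, a Figure and a Title, as the terminal output below shows)
2. result[0] is a dict with 4 keys: type (the region type), bbox (the region's box as [x1, y1, x2, y2]), img (the cropped region image, which predict.py pops before printing) and res
3. result[0]['res'] is a list whose length equals the number of detected text blocks (5 for the Figure region in this example)
4. result[0]['res'][0] is a dict with 3 keys holding all the information about the first detected text block
4.1 result[0]['res'][0]['text']: the recognized text of the first text block
4.2 result[0]['res'][0]['confidence']: the recognition confidence of the first text block
4.3 result[0]['res'][0]['text_region']: the coordinates of the four corner points of the first text block's (possibly rotated) quadrilateral detection box
4.3.1 result[0]['res'][0]['text_region'][0][0] and result[0]['res'][0]['text_region'][0][1] are the x and y coordinates of the first corner point (in the output below, the points run clockwise from the top-left corner)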
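The nested indexing above gets verbose quickly. The following is a minimal sketch that walks every region of result and yields each text block together with an axis-aligned box computed from its four corner points. sample_result is a trimmed copy of the Figure region printed in the terminal output below; quad_to_rect and iter_text_blocks are hypothetical helper names, not PaddleOCR API. It also assumes, without having checked every region type, that a region whose res is not a list (such as a Table) holds no text blocks, and skips it.

```python
# Trimmed copy of the Figure region printed in the terminal output below
# ('img' already popped; two of its five text blocks kept).
sample_result = [
    {
        "type": "Figure",
        "bbox": [12, 49, 466, 460],
        "res": [
            {"text": "WhenI'mbored nobody", "confidence": 0.9314340353012085,
             "text_region": [[20.0, 54.0], [458.0, 59.0], [458.0, 93.0], [20.0, 87.0]]},
            {"text": "Haha ;)", "confidence": 0.9406118988990784,
             "text_region": [[17.0, 413.0], [121.0, 418.0], [119.0, 451.0], [16.0, 446.0]]},
        ],
    }
]


def quad_to_rect(quad):
    """Collapse four corner points into an axis-aligned (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


def iter_text_blocks(result, min_confidence=0.0):
    """Yield (region type, text, confidence, rect) for every text block in every region."""
    for region in result:
        blocks = region["res"]
        # Assumption: regions whose res is not a list (e.g. Table) hold no text blocks.
        if not isinstance(blocks, list):
            continue
        for block in blocks:
            if block["confidence"] >= min_confidence:
                rect = quad_to_rect(block["text_region"])
                yield region["type"], block["text"], block["confidence"], rect


if __name__ == "__main__":
    for region_type, text, conf, rect in iter_text_blocks(sample_result, min_confidence=0.5):
        print(f"{region_type:<8}{conf:.3f}  {rect}  {text}")
```

For the first block, the corners [[20, 54], [458, 59], [458, 93], [20, 87]] collapse to (20.0, 54.0, 458.0, 93.0). The axis-aligned box slightly over-covers a tilted line, which is usually acceptable for cropping or sorting.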
Terminal output
D:\DLSoftware\Anaconda3\envs\paddle\python.exe C:/Users/cleste/Desktop/PaddleOCR-release-2.5/predict.py
[2022/06/17 12:44:20] ppocr DEBUG: Namespace(Output='./Output', alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir=None, cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./Output', det=True, det_algorithm='DB', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_fce_box_type='poly', det_limit_side_len=960, det_limit_type='max', det_model_dir='C:\\Users\\cleste/.paddleocr/whl\\det\\ch\\ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_box_type='quad', det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir=None, ir_optim=True, label_list=['0', '180'], lang='ch', layout=True, layout_label_map=None, layout_path_model='lp://PubLayNet/ppyolov2_r50vd_dcn_365e_publaynet/config', max_batch_size=10, max_text_length=25, min_subgraph_size=15, mode='structure', ocr=True, ocr_version='PP-OCRv3', precision='fp32', process_id=0, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='C:\\Users\\cleste\\Desktop\\PaddleOCR-release-2.5\\ppocr\\utils\\ppocr_keys_v1.txt', rec_image_shape='3, 48, 320', rec_model_dir='C:\\Users\\cleste/.paddleocr/whl\\rec\\ch\\ch_PP-OCRv3_rec_infer', save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], show_log=True, structure_version='PP-STRUCTURE', table=True, table_char_dict_path='C:\\Users\\cleste\\Desktop\\PaddleOCR-release-2.5\\ppocr\\utils\\dict\\table_structure_dict.txt', table_max_len=488, 
table_model_dir='C:\\Users\\cleste/.paddleocr/whl\\table\\en_ppocr_mobile_v2.0_table_structure_infer', total_process_num=1, type='ocr', use_angle_cls=False, use_dilation=False, use_gpu=True, use_mp=False, use_onnx=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
Please enter img path:Input/emotion/ocr13.jpg
Image path: Input/emotion/ocr13.jpg
[2022/06/17 12:44:36] ppocr DEBUG: dt_boxes num : 5, elapse : 0.040994882583618164
[2022/06/17 12:44:36] ppocr DEBUG: rec_res num : 5, elapse : 0.028000593185424805
[2022/06/17 12:44:36] ppocr DEBUG: dt_boxes num : 4, elapse : 0.02500295639038086
[2022/06/17 12:44:36] ppocr DEBUG: rec_res num : 4, elapse : 0.016002893447875977
{'type': 'Figure', 'bbox': [12, 49, 466, 460], 'res': [{'text': "WhenI'mbored nobody", 'confidence': 0.9314340353012085, 'text_region': [[20.0, 54.0], [458.0, 59.0], [458.0, 93.0], [20.0, 87.0]]}, {'text': 'textme,butassoonas', 'confidence': 0.947204053401947, 'text_region': [[21.0, 102.0], [448.0, 102.0], [448.0, 133.0], [21.0, 133.0]]}, {'text': "I'm busy.....", 'confidence': 0.8454315662384033, 'text_region': [[21.0, 141.0], [236.0, 148.0], [235.0, 179.0], [20.0, 171.0]]}, {'text': 'still nobodytext me', 'confidence': 0.8983144164085388, 'text_region': [[20.0, 182.0], [391.0, 185.0], [391.0, 218.0], [20.0, 215.0]]}, {'text': 'Haha ;)', 'confidence': 0.9406118988990784, 'text_region': [[17.0, 413.0], [121.0, 418.0], [119.0, 451.0], [16.0, 446.0]]}]}
{'type': 'Title', 'bbox': [12, 53, 454, 396], 'res': [{'text': "WhenI'mborednobod", 'confidence': 0.9059685468673706, 'text_region': [[21.0, 58.0], [447.0, 61.0], [447.0, 90.0], [21.0, 87.0]]}, {'text': 'textme,butassoonas', 'confidence': 0.9522207975387573, 'text_region': [[21.0, 103.0], [448.0, 103.0], [448.0, 133.0], [21.0, 133.0]]}, {'text': "I'mbusy..", 'confidence': 0.9493505954742432, 'text_region': [[22.0, 140.0], [206.0, 148.0], [204.0, 178.0], [20.0, 171.0]]}, {'text': 'stillnobodytextme', 'confidence': 0.9422968626022339, 'text_region': [[20.0, 183.0], [388.0, 186.0], [388.0, 218.0], [20.0, 215.0]]}]}
------------------------------------------------It is show time !--------------------------------------------------------
ocr result[1]: WhenI'mbored nobody
ocr result[2]: textme,butassoonas
ocr result[3]: I'm busy.....
ocr result[4]: still nobodytext me
ocr result[5]: Haha ;)
Exit?exit
Process finished with exit code 0
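In the output above, the Figure region [12, 49, 466, 460] and the Title region [12, 53, 454, 396] overlap, so the same four lines are recognized twice with slightly different text, and the "show time" loop only prints result[0]. One way to merge them is a greedy, non-maximum-suppression-style pass: sort all text blocks by confidence, drop any block whose axis-aligned box overlaps an already-kept block with IoU above a threshold, then sort the survivors top to bottom. This is my own post-processing sketch, not something PaddleOCR does; the 0.5 threshold and the helper names are assumptions, while the nine text blocks are copied verbatim from the two printed dicts.

```python
def block(text, confidence, quad):
    return {"text": text, "confidence": confidence, "text_region": quad}


# The two regions printed above (texts, confidences and corner points copied verbatim).
sample_result = [
    {"type": "Figure", "bbox": [12, 49, 466, 460], "res": [
        block("WhenI'mbored nobody", 0.9314340353012085, [[20.0, 54.0], [458.0, 59.0], [458.0, 93.0], [20.0, 87.0]]),
        block("textme,butassoonas", 0.947204053401947, [[21.0, 102.0], [448.0, 102.0], [448.0, 133.0], [21.0, 133.0]]),
        block("I'm busy.....", 0.8454315662384033, [[21.0, 141.0], [236.0, 148.0], [235.0, 179.0], [20.0, 171.0]]),
        block("still nobodytext me", 0.8983144164085388, [[20.0, 182.0], [391.0, 185.0], [391.0, 218.0], [20.0, 215.0]]),
        block("Haha ;)", 0.9406118988990784, [[17.0, 413.0], [121.0, 418.0], [119.0, 451.0], [16.0, 446.0]]),
    ]},
    {"type": "Title", "bbox": [12, 53, 454, 396], "res": [
        block("WhenI'mborednobod", 0.9059685468673706, [[21.0, 58.0], [447.0, 61.0], [447.0, 90.0], [21.0, 87.0]]),
        block("textme,butassoonas", 0.9522207975387573, [[21.0, 103.0], [448.0, 103.0], [448.0, 133.0], [21.0, 133.0]]),
        block("I'mbusy..", 0.9493505954742432, [[22.0, 140.0], [206.0, 148.0], [204.0, 178.0], [20.0, 171.0]]),
        block("stillnobodytextme", 0.9422968626022339, [[20.0, 183.0], [388.0, 186.0], [388.0, 218.0], [20.0, 215.0]]),
    ]},
]


def quad_to_rect(quad):
    """Axis-aligned (x_min, y_min, x_max, y_max) around four corner points."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_regions(result, iou_thresh=0.5):
    """Keep the most confident block among overlapping duplicates; return texts top to bottom."""
    blocks = [b for region in result if isinstance(region["res"], list) for b in region["res"]]
    blocks.sort(key=lambda b: b["confidence"], reverse=True)
    kept = []
    for b in blocks:
        rect = quad_to_rect(b["text_region"])
        if all(iou(rect, quad_to_rect(k["text_region"])) <= iou_thresh for k in kept):
            kept.append(b)
    kept.sort(key=lambda b: quad_to_rect(b["text_region"])[1])  # sort by top edge
    return [b["text"] for b in kept]


if __name__ == "__main__":
    for i, text in enumerate(merge_regions(sample_result), start=1):
        print(f"ocr result[{i}]: {text}")
```

On this data each duplicated line keeps its higher-confidence reading, for example "I'mbusy.." (0.949) wins over "I'm busy....." (0.845), and the five surviving lines come out in reading order.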
Folder output
Output path: Output/ocr13
Output contents:
(2) A text file containing res
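To reuse saved results without rerunning the model, the text file can be read back. The following is a minimal sketch that assumes the file is named res.txt and holds one JSON-encoded region per line; that is my reading of save_structure_res in the 2.5 release, not something shown in this post, so check the file in your own Output folder first. load_structure_res is a hypothetical helper name.

```python
import json
import os


def load_structure_res(save_folder, save_name):
    """Read <save_folder>/<save_name>/res.txt back into a list of region dicts.

    Assumption: one JSON-encoded region per line, as save_structure_res appears
    to write in the 2.5 release; verify against your PaddleOCR version.
    """
    txt_path = os.path.join(save_folder, save_name, "res.txt")
    regions = []
    with open(txt_path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                regions.append(json.loads(line))
    return regions
```

With the folders used in predict.py this would be called as load_structure_res('Output', 'ocr13'), after which [b['text'] for b in regions[0]['res']] recovers the same text list as the "show time" printout.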