1. Install Python 3.9
Link: https://pan.baidu.com/s/1IgF1RwGyV7Qu-FqspeloYg
Extraction code: pn9k
2. Install PaddlePaddle
# GPU version, for CUDA 9 or CUDA 10
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# CPU version
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
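After installing, a quick check can confirm that the package imports and runs. The following is a minimal sketch, assuming the module is importable as `paddle` and that its `paddle.utils.run_check()` helper is available in the installed version; the function name `check_paddle` is hypothetical and only reports the state of the installation.

```python
import importlib
import importlib.util


def check_paddle():
    """Return the installed PaddlePaddle version, or None if it is not installed."""
    if importlib.util.find_spec("paddle") is None:
        return None
    paddle = importlib.import_module("paddle")
    # run_check() runs a small computation to confirm the installation works
    paddle.utils.run_check()
    return paddle.__version__


version = check_paddle()
print("PaddlePaddle version:", version if version else "not installed")
```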
3. Install the PaddleOCR whl package
pip install "paddleocr>=2.0.1"
4. Test
paddleocr --image_dir ./imgs_en/img_12.jpg --use_angle_cls true --lang en --use_gpu false
5. Run recognition from a Python program
from paddleocr import PaddleOCR,draw_ocr
# PaddleOCR supports Chinese, English, French, German, Korean and Japanese.
# Set the `lang` parameter to `ch`, `en`, `fr`, `german`, `korean` or `japan`
# to select the corresponding language model.
ocr = PaddleOCR(use_angle_cls=True, lang='en') # need to run only once to download and load model into memory
img_path = './imgs_en/img_12.jpg'
result = ocr.ocr(img_path, cls=True)
for line in result:
    print(line)
# draw result
from PIL import Image
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./fonts/simfang.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')
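Depending on the installed paddleocr release, `ocr.ocr()` may return either a flat list of lines (the layout the code above expects) or one list of lines per image. The second layout is an assumption about newer releases and should be checked against the installed version. The sketch below uses a hypothetical helper, `flatten_result`, that accepts both layouts and yields `(box, text, score)` tuples. The sample data is made up for illustration.

```python
def flatten_result(result):
    """Return a flat list of (box, text, score) tuples from an ocr() result.

    Assumption: a line looks like [box, (text, score)], and some releases
    wrap the lines of each image in one extra list.
    """
    def is_line(item):
        return (
            isinstance(item, (list, tuple))
            and len(item) == 2
            and isinstance(item[1], (list, tuple))
            and len(item[1]) == 2
            and isinstance(item[1][0], str)
        )

    lines = []
    for item in result or []:
        if item is None:        # an image with no detected text
            continue
        if is_line(item):
            lines.append(item)  # flat layout: item is a single line
        else:
            lines.extend(item)  # nested layout: item holds one image's lines
    return [(box, text, score) for box, (text, score) in lines]


# Made-up sample data in both layouts
box = [[0, 0], [10, 0], [10, 5], [0, 5]]
flat = [[box, ("hello", 0.98)]]
nested = [[[box, ("hello", 0.98)]]]

for b, text, score in flatten_result(nested):
    print(text, score)
```

With such a helper, the `boxes`, `txts` and `scores` lists passed to `draw_ocr` can be built from the tuples regardless of which layout is returned.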
The steps above run recognition with the standard pretrained models. The steps below cover training your own model.
5. Install PaddlePaddle
pip3 install --upgrade pip
# If your machine has CUDA 9 or CUDA 10 installed, run the following command
python3 -m pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple
# If your machine only has a CPU, run the following command
python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
6. Install PPOCRLabel
Windows
pip install PPOCRLabel # install
PPOCRLabel --lang ch # run
Linux
pip3 install PPOCRLabel
pip3 install trash-cli
PPOCRLabel --lang ch
Split the labeled data into training, validation and test sets:
cd ./PPOCRLabel # switch to the PPOCRLabel folder
python gen_ocr_train_val_test.py --trainValTestRatio 6:2:2 --datasetRootPath ../train_data
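To make the `6:2:2` ratio concrete, the sketch below shuffles a list of file names and splits it into three parts. It only illustrates the idea and is not the implementation of `gen_ocr_train_val_test.py`; the function name `split_by_ratio`, the fixed seed and the sample file names are assumptions.

```python
import random


def split_by_ratio(items, ratio="6:2:2", seed=0):
    """Shuffle items and split them into train/val/test lists by ratio."""
    parts = [float(p) for p in ratio.split(":")]
    if len(parts) != 3 or sum(parts) <= 0:
        raise ValueError("ratio must look like '6:2:2'")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    total = sum(parts)
    n = len(shuffled)
    n_train = int(n * parts[0] / total)
    n_val = int(n * parts[1] / total)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test


images = [f"img_{i}.jpg" for i in range(10)]  # made-up file names
train, val, test = split_by_ratio(images)
print(len(train), len(val), len(test))
```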
7. Download the PaddleOCR source code from GitHub (PaddlePaddle/PaddleOCR).
8. Edit the model configuration file configs/det/det_mv3_db.yml.
9. Download the pretrained model and update the training paths in the configuration file.
10. Adjust the number of worker threads.
11. Update the evaluation set path in det_mv3_db.yml.
12. When training finishes, the configuration used for the run can be found in the yml file under ./output/db_mv3.
13. Convert the trained checkpoint into inference files with the following command:
python tools/export_model.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./output/db_mv3/iter_epoch_1200 Global.save_inference_dir=./output/db_mv3_infer/
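The `-o Key=Value` syntax in the export command can also carry the configuration changes from steps 8 to 11, instead of editing the yml file by hand. The sketch below assembles such a training command. The key names (`Global.pretrained_model`, `Train.dataset.data_dir`, `Train.loader.num_workers`, `Eval.dataset.data_dir`) are assumptions based on the layout of `det_mv3_db.yml` in the PaddleOCR repository and should be checked against the local copy. All paths and values are placeholders, and `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(config, overrides):
    """Build a tools/train.py command line with -o Key=Value overrides."""
    cmd = ["python", "tools/train.py", "-c", config]
    if overrides:
        cmd.append("-o")
        cmd.extend(f"{key}={value}" for key, value in overrides.items())
    return cmd


# Placeholder values: replace them with your own paths and settings
overrides = {
    "Global.pretrained_model": "./pretrain_models/your_pretrained_model",  # step 9
    "Train.dataset.data_dir": "./train_data/",                             # step 9
    "Train.loader.num_workers": 2,                                         # step 10
    "Eval.dataset.data_dir": "./train_data/",                              # step 11
}

cmd = build_train_command("configs/det/det_mv3_db.yml", overrides)
print(shlex.join(cmd))
```

Passing overrides on the command line leaves the repository's yml file untouched, which makes it easier to compare runs.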
14. Replace the inference files under C->usr->.paddleocr->whl->det with the exported ones.
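The replacement in step 14 can be scripted so that the previous files are kept as backups. The sketch below is a minimal example under stated assumptions: `replace_inference_files` is a hypothetical helper, the directories are temporary stand-ins for the exported folder and the det cache folder, and `inference.pdmodel` is used only as a sample file name.

```python
import shutil
import tempfile
from pathlib import Path


def replace_inference_files(exported_dir, target_dir):
    """Copy exported model files into target_dir, backing up files they overwrite."""
    exported_dir, target_dir = Path(exported_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(exported_dir.iterdir()):
        if not src.is_file():
            continue
        dst = target_dir / src.name
        if dst.exists():
            # keep the old file next to the new one with a .bak suffix
            shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
        shutil.copy2(src, dst)
        copied.append(dst.name)
    return copied


# Demonstration with temporary stand-in directories
with tempfile.TemporaryDirectory() as tmp:
    exported = Path(tmp) / "db_mv3_infer"
    target = Path(tmp) / "det"
    exported.mkdir()
    target.mkdir()
    (exported / "inference.pdmodel").write_text("new")
    (target / "inference.pdmodel").write_text("old")
    copied = replace_inference_files(exported, target)
    new_text = (target / "inference.pdmodel").read_text()
    backup_text = (target / "inference.pdmodel.bak").read_text()

print(copied)
```

An alternative that avoids touching the cache folder may be to pass the exported directory through the `det_model_dir` argument of `PaddleOCR`, if the installed version supports it.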