Setting up tessdata on CentOS 7 (tesseract 4.1.1-rc2-20-g01fb)


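Note: when you run yum-config-manager,
the system may report: yum-config-manager: command not found
This happens because the command is not installed by default; it ships in the yum-utils package.
Install that package with: yum install yum-utils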


yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
yum update
yum install tesseract
pip install pytesseract


Download the chi_sim.traineddata language model file (Simplified Chinese) from https://github.com/tesseract-ocr/tessdata
and move it to /usr/share/tesseract/4/tessdata/
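
Before running OCR it can help to confirm that the model file actually landed in the tessdata directory. The following is a minimal sketch, not part of the original setup: the helper name missing_languages is hypothetical, and the default path is simply the directory used in the step above, which may differ on other installations.

```python
from pathlib import Path

# Directory from the step above; an assumption that may not hold for other packages.
TESSDATA_DIR = Path("/usr/share/tesseract/4/tessdata")


def missing_languages(languages, tessdata_dir=TESSDATA_DIR):
    """Return the language codes that have no <code>.traineddata file in tessdata_dir."""
    tessdata_dir = Path(tessdata_dir)
    return [
        lang for lang in languages
        if not (tessdata_dir / f"{lang}.traineddata").is_file()
    ]


if __name__ == "__main__":
    missing = missing_languages(["chi_sim"])
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All requested model files are in place.")
```

If chi_sim is reported as missing, check that the file was moved into the directory tesseract actually reads from.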

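Test from the command line. The second argument is the output base name and tesseract appends .txt itself, so this writes 3.txt:
tesseract 3.jpg 3 -l chi_sim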

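Python test script:
import pytesseract
# OCR the test image with the Simplified Chinese model and print the recognized text
print(pytesseract.image_to_string('/opt/pyd/3.jpg', lang='chi_sim', config='--psm 11'))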

Note: the value 11 in config='--psm 11' is the page segmentation mode and should be chosen to suit your images. The available modes are:

    0     Orientation and script detection (OSD) only.
    1     Automatic page segmentation with OSD.
    2     Automatic page segmentation, but no OSD, or OCR.
    3     Fully automatic page segmentation, but no OSD. (Default)
    4     Assume a single column of text of variable sizes.
    5     Assume a single uniform block of vertically aligned text.
    6     Assume a single uniform block of text.
    7     Treat the image as a single text line.
    8     Treat the image as a single word.
    9     Treat the image as a single word in a circle.
    10    Treat the image as a single character.
    11    Sparse text. Find as much text as possible in no particular order.
    12    Sparse text with OSD.
    13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
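
Since the best mode depends on the image, one practical approach is to run the same image through a few modes and compare the output. The sketch below is only an illustration: psm_config and compare_psm_modes are hypothetical helper names, the default modes (3, 6, 11) are an arbitrary starting set, and it assumes pytesseract and the chi_sim model are installed as described above.

```python
def psm_config(psm):
    """Build the pytesseract config string for a page segmentation mode (0-13)."""
    if isinstance(psm, bool) or not isinstance(psm, int) or not 0 <= psm <= 13:
        raise ValueError(f"psm must be an integer from 0 to 13, got {psm!r}")
    return f"--psm {psm}"


def compare_psm_modes(image_path, modes=(3, 6, 11), lang="chi_sim"):
    """Run OCR once per mode and return a dict mapping mode to recognized text."""
    # Imported here so psm_config can be used even where pytesseract is absent.
    import pytesseract

    return {
        psm: pytesseract.image_to_string(image_path, lang=lang, config=psm_config(psm))
        for psm in modes
    }
```

For example, compare_psm_modes('/opt/pyd/3.jpg') returns the text recognized under each of the three default modes, so you can pick the one that reads your images best.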
