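Note: when you run yum-config-manager, the system may report: yum-config-manager: command not found
This happens because the command is not installed by default. It is provided by the yum-utils package,
which you can install with: yum install yum-utils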
yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
sudo rpm --import https://build.opensuse.org/projects/home:Alexander_Pozdnyakov/public_key
yum update
yum install tesseract
pip install pytesseract
Download the chi_sim.traineddata language model file (Simplified Chinese) from https://github.com/tesseract-ocr/tessdata
Move the model file to /usr/share/tesseract/4/tessdata/
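Before running OCR, it can help to confirm that the model file actually landed in the tessdata directory. The following is a minimal sketch using only the standard library; the helper name has_language is hypothetical, and the default directory is simply the path used in this note, so adjust it if your package installs tessdata elsewhere.

```python
from pathlib import Path

# Assumed default location, taken from the path used in this note.
TESSDATA_DIR = Path("/usr/share/tesseract/4/tessdata")


def has_language(lang, tessdata_dir=TESSDATA_DIR):
    """Return True if <lang>.traineddata exists as a file in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    print("chi_sim installed:", has_language("chi_sim"))
```

If this prints False, check the directory path and the file name before debugging anything else.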
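tesseract 3.jpg 3 -l chi_sim
Note: the second argument is the output base name. tesseract appends .txt itself, so this command writes 3.txt.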
Python test script
import pytesseract
# image_to_string accepts a file path as well as a PIL image
print(pytesseract.image_to_string('/opt/pyd/3.jpg', lang='chi_sim', config='--psm 11'))
Note: the value 11 in config='--psm 11' should be adjusted to suit your input. The available page segmentation modes are:
//0 Orientation and script detection (OSD) only.
//1 Automatic page segmentation with OSD.
//2 Automatic page segmentation, but no OSD, or OCR.
//3 Fully automatic page segmentation, but no OSD. (Default)
//4 Assume a single column of text of variable sizes.
//5 Assume a single uniform block of vertically aligned text.
//6 Assume a single uniform block of text.
//7 Treat the image as a single text line.
//8 Treat the image as a single word.
//9 Treat the image as a single word in a circle.
//10 Treat the image as a single character.
//11 Sparse text. Find as much text as possible in no particular order.
//12 Sparse text with OSD.
//13 Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
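Since the best mode depends on the image, one practical approach is to run the same image through a few candidate modes and compare the results by eye. The sketch below is only an illustration: compare_psm_modes is a hypothetical helper, the default modes (3, 6, 11) are an arbitrary starting set, and the ocr parameter exists only so the function can be exercised without pytesseract installed. By default it calls pytesseract.image_to_string exactly as in the test script above.

```python
def compare_psm_modes(image_path, modes=(3, 6, 11), lang="chi_sim", ocr=None):
    """Run OCR once per page segmentation mode and return {mode: text}."""
    if ocr is None:
        # Imported lazily so the helper can be tested with a stand-in OCR function.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    for mode in modes:
        # The list above defines modes 0 through 13 only.
        if not 0 <= mode <= 13:
            raise ValueError(f"psm must be between 0 and 13, got {mode}")
        results[mode] = ocr(image_path, lang=lang, config=f"--psm {mode}")
    return results
```

For example, compare_psm_modes('/opt/pyd/3.jpg') returns a dictionary keyed by mode number; printing each value side by side makes it easier to pick the mode that suits your images.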