Installing Tesseract and pytesseract on 64-bit Windows 7: easy Chinese text recognition with Python, a complete guide!

OCR (Optical Character Recognition) is the process of analyzing the text in an image file and extracting it as characters.

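Tesseract: an open-source OCR engine. It was originally developed at HP Labs and later released as open source; Google has since improved it, fixed bugs, optimized it, and re-released it. The current version at the time of writing is 4.0.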

step1: Install Tesseract

Download this file from the official site:

tesseract-ocr-setup-4.00.00dev.exe

Download URL: https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows

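Once the download finishes, run the installer, and during setup select the language packs you want to install.

By default the installer also configures the system environment variables to point at the installation directory, so afterwards you can run tesseract from a command prompt in any directory.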
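To check that the installer really put tesseract on the PATH, a small sketch like the following may help. It uses only the standard library; the helper names `find_tesseract` and `build_ocr_command` and the file names `4.png` / `result` are placeholders for illustration, and the command follows the usual `tesseract <image> <output base> -l <lang>` form.

```python
import shutil
import subprocess

def build_ocr_command(image_path, output_base, lang="eng", exe="tesseract"):
    """Build the argument list for: tesseract <image> <output base> -l <lang>."""
    return [exe, image_path, output_base, "-l", lang]

def find_tesseract():
    """Return the full path of tesseract if it is on the PATH, else None."""
    return shutil.which("tesseract")

if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract is not on the PATH; check the environment variables")
    else:
        # Print the version banner to confirm the executable runs
        subprocess.run([exe, "--version"], check=False)
        # Show the command that would recognize a Simplified Chinese image
        print(build_ocr_command("4.png", "result", lang="chi_sim", exe=exe))
```

If `find_tesseract()` returns None, the environment variable was probably not set, and the full path to tesseract.exe has to be used instead.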

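The tessdata directory inside the installation directory holds the language data files, along with the files that back the options you may use on the command line. The installer includes the English data by default.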

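If you forgot to select the language data during installation, download the files you need (for example chi_sim.traineddata for Simplified Chinese) from the following URL and move them into the tessdata directory.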

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
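A quick way to see which languages are actually available is to list the `.traineddata` files in tessdata. This is a minimal sketch using only the standard library; the helper name `list_languages` is made up here, and the directory path is an assumed default install location that you should adjust.

```python
import os

def list_languages(tessdata_dir):
    """Return the sorted language codes found as *.traineddata files."""
    suffix = ".traineddata"
    return sorted(
        name[:-len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )

if __name__ == "__main__":
    # Assumed default install location; adjust to your own machine
    tessdata = r"C:/Program Files (x86)/Tesseract-OCR/tessdata"
    if os.path.isdir(tessdata):
        langs = list_languages(tessdata)
        print(langs)
        print("chi_sim installed:", "chi_sim" in langs)
    else:
        print("tessdata directory not found:", tessdata)
```

If `chi_sim` is missing from the list, Chinese recognition will not work until the data file is copied in.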

step2: Install pytesseract

Install it with pip: pip install pytesseract
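To confirm the installation from Python, something like the sketch below can be used. The helper name `missing_modules` is hypothetical; the check also covers `PIL`, which is provided by the Pillow package and is imported by the test code in this guide.

```python
import importlib.util

def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    # pytesseract is the wrapper; PIL is provided by the Pillow package
    missing = missing_modules(["pytesseract", "PIL"])
    print("missing:", missing if missing else "none")
```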

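step3: Edit the pytesseract.py source file

# tesseract_cmd = 'tesseract'  # original line; this needs to be changed

tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'  # change to your own install path

# Without this change you get: FileNotFoundError: [WinError 2] The system cannot find the file specified.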
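As an alternative to editing the library file, pytesseract exposes the same variable as `pytesseract.pytesseract.tesseract_cmd`, so it can be set from your own script. A minimal sketch, assuming the candidate paths below (the second one is only a guess for a 64-bit install) and a hypothetical helper `resolve_tesseract_cmd`:

```python
import os

def resolve_tesseract_cmd(candidates, default="tesseract"):
    """Return the first existing path in `candidates`, else `default`."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return default

# Assumed install locations; change them to match your machine
CANDIDATES = [
    r"C:/Program Files (x86)/Tesseract-OCR/tesseract.exe",
    r"C:/Program Files/Tesseract-OCR/tesseract.exe",
]

try:
    import pytesseract
except ImportError:
    pytesseract = None  # pytesseract is not installed

if pytesseract is not None:
    # Override the module-level variable instead of editing pytesseract.py
    pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(CANDIDATES)
```

Setting the variable at runtime survives a reinstall or upgrade of pytesseract, which an edit inside the package does not.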

# f = open(output_file_name)  # original line; this needs to be changed

f = open(output_file_name, encoding='utf-8')

# Without this change you get: UnicodeDecodeError: 'gbk' codec can't decode byte 0xyy in position xxx: illegal multibyte sequence
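The reason for this change: on a Chinese-language Windows system, `open()` without an `encoding` argument falls back to the locale encoding (GBK here), while Tesseract writes its text output as UTF-8. The standard-library sketch below shows the mismatch with a temporary file and a made-up sample string standing in for real OCR output.

```python
import os
import tempfile

text = "中文识别"  # sample string, standing in for OCR output

# Write the sample as UTF-8, the way the OCR output file is encoded
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with an explicit encoding returns the original text
with open(path, encoding="utf-8") as f:
    utf8_read = f.read()

# Decoding the same bytes as GBK either fails or yields garbled text
with open(path, "rb") as f:
    raw = f.read()
try:
    gbk_read = raw.decode("gbk")
except UnicodeDecodeError as exc:
    gbk_read = None
    print("gbk failed:", exc)

os.remove(path)
print(utf8_read == text, gbk_read == text)
```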


step4: Test code

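import pytesseract
from PIL import Image

# Open the image to recognize (replace with your own path)
image = Image.open(r'C:\Users\yaohongfu\Desktop\4.png')
# lang='chi_sim' selects the Simplified Chinese language data
vcode = pytesseract.image_to_string(image, lang='chi_sim')
print(vcode)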

# coding: utf-8
# Test one page
import pytesseract
from PIL import Image

def processImage():
    image = Image.open(r'C:\Users\yaohongfu\Desktop\4.png')
    # Background thresholding; optional
    image = image.point(lambda x: 0 if x < 143 else 255)
    newFilePath = 'raw-test.png'
    image.save(newFilePath)
    # For images of Chinese text, use lang='chi_sim'
    content = pytesseract.image_to_string(Image.open(newFilePath), lang='chi_sim')
    print(content)

processImage()
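To make the thresholding step concrete: `image.point(...)` applies the lambda to every pixel value, so values below 143 become 0 (black) and everything else becomes 255 (white). The pure-Python sketch below mimics that mapping on a plain list of grayscale values; the function name `binarize` and the sample values are made up for illustration, and 143 is simply the threshold used in the script.

```python
def binarize(pixels, threshold=143):
    """Map each grayscale value to 0 (below threshold) or 255 (otherwise)."""
    return [0 if value < threshold else 255 for value in pixels]

if __name__ == "__main__":
    sample = [0, 100, 142, 143, 200, 255]  # made-up grayscale values
    print(binarize(sample))  # [0, 0, 0, 255, 255, 255]
```

With Pillow, converting the image to grayscale first (`image.convert('L')`) is a common way to make sure the lambda sees one value per pixel. Which threshold works best depends on the image, so 143 should be treated as a starting point.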

# Reference sites

https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows

https://github.com/tesseract-ocr/tesseract/wiki/Data-Files

https://github.com/madmaze/pytesseract

https://github.com/tesseract-ocr/tesseract

https://github.com/tesseract-ocr/tesseract/wiki
