[macOS] Installing Tesseract 4.x

Downloading and installing Tesseract

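        I recently installed Tesseract on a Mac. Installing it directly with Homebrew (brew install tesseract) failed: one dependency package could not be downloaded, and after many retries the server hosting that package was still unreachable. I then tried brew install tesseract --HEAD to build the latest version, but compilation failed, reporting only that my system version was too old. In the end, following https://github.com/tesseract-ocr/tesseract/wiki/Compiling#macos, the installation succeeded: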

brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
brew install gcc
brew install pango
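
Before moving on to the build, it can help to confirm that the build tools are actually on PATH. The following is a minimal sketch, not part of the official instructions: missing_tools is a hypothetical helper, and the command names passed to it are assumptions (Homebrew installs GNU libtool under the name glibtool).

```shell
# Hypothetical helper: print the name of every given command that is
# not found on PATH. Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names for the packages installed above.
missing_tools automake autoconf glibtool pkg-config
```

If this prints nothing, all four commands were found; any name it does print is a candidate for re-running the matching brew install.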

 

git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
./configure CC=gcc-8 CXX=g++-8 CPPFLAGS=-I/usr/local/opt/icu4c/include LDFLAGS=-L/usr/local/opt/icu4c/lib
make -j
sudo make install  # if desired
make training # if installed with training dependencies
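
The configure line above hardcodes /usr/local/opt/icu4c, which may not match where Homebrew keeps icu4c on every machine. As a rough sketch, the two flags could be derived from whatever prefix brew --prefix icu4c reports. The helper name icu_configure_flags is hypothetical, and the snippet only prints the resulting command for review instead of running it.

```shell
# Hypothetical helper: given an icu4c install prefix, print the
# CPPFLAGS/LDFLAGS arguments in the form used by the configure line.
icu_configure_flags() {
    prefix="$1"
    printf 'CPPFLAGS=-I%s/include LDFLAGS=-L%s/lib\n' "$prefix" "$prefix"
}

# Only query Homebrew when it is installed; print the command, do not run it.
if command -v brew >/dev/null 2>&1; then
    echo "./configure $(icu_configure_flags "$(brew --prefix icu4c)")"
fi
```

The CC and CXX settings are left out on purpose: the gcc-8 and g++-8 names depend on which GCC version Homebrew installed, so they need to be checked by hand.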

Finally, run tesseract --version to check that the installation succeeded.

Installing Tesseract language data

Running tesseract --list-langs at this point reports an error:

linfangfangdeMacBook-Pro:tesseract linfangfang$ tesseract --list-langs
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
List of available languages (0):
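
The error means that no eng.traineddata file exists in the data directory Tesseract is looking at. A small sketch of the same check done by hand follows. The helper has_traineddata is hypothetical, and the fallback path is simply the one shown in the error message above.

```shell
# Hypothetical helper: succeed only if <dir>/<lang>.traineddata exists
# and is non-empty.
has_traineddata() {
    [ -s "$1/$2.traineddata" ]
}

# Use TESSDATA_PREFIX when set, otherwise the path from the error message.
tessdata_dir="${TESSDATA_PREFIX:-/usr/local/share/tessdata}"
if has_traineddata "$tessdata_dir" eng; then
    echo "eng.traineddata found in $tessdata_dir"
else
    echo "eng.traineddata missing from $tessdata_dir"
fi
```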

The Tesseract language data files are all hosted at https://github.com/tesseract-ocr/tessdata_fast and have to be downloaded manually:

Clone the repository under /usr/local/share, for example:

linfangfangdeMacBook-Pro:share linfangfang$ git clone https://github.com/tesseract-ocr/tessdata_fast
Cloning into 'tessdata_fast'...
remote: Enumerating objects: 243, done.
remote: Total 243 (delta 0), reused 0 (delta 0), pack-reused 243
Receiving objects: 100% (243/243), 335.11 MiB | 1.92 MiB/s, done.
Resolving deltas: 100% (40/40), done.
Checking out files: 100% (163/163), done.
linfangfangdeMacBook-Pro:share linfangfang$
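
The clone above transfers about 335 MiB. If only a few languages are needed, fetching individual files may be lighter. The sketch below only prints candidate curl commands for review. The helper traineddata_url is hypothetical, and both the branch name main and the raw-file URL layout are assumptions that should be checked against the repository page first.

```shell
# Hypothetical helper: build a raw-file URL for one language file.
# Assumption: files are served from a branch named "main" unless a
# different branch is passed as the second argument.
traineddata_url() {
    printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/%s/%s.traineddata\n' "${2:-main}" "$1"
}

# Print the download commands instead of running them.
for lang in eng chi_sim; do
    echo "curl -L -o $lang.traineddata $(traineddata_url "$lang")"
done
```

The downloaded .traineddata files would then go into the tessdata folder in place of the full copy.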

Then copy the files from the tessdata_fast clone into the tessdata folder:

linfangfangdeMacBook-Pro:share linfangfang$ sudo cp -rf tessdata_fast/* tessdata/
Password:
linfangfangdeMacBook-Pro:share linfangfang$ cd tessdata
linfangfangdeMacBook-Pro:tessdata linfangfang$ ls
COPYING				kaz.traineddata
README.md			khm.traineddata
afr.traineddata			kir.traineddata
amh.traineddata			kmr.traineddata
ara.traineddata			kor.traineddata
asm.traineddata			kor_vert.traineddata
aze.traineddata			lao.traineddata
aze_cyrl.traineddata		lat.traineddata
bel.traineddata			lav.traineddata
ben.traineddata			lit.traineddata
bod.traineddata			ltz.traineddata
bos.traineddata			mal.traineddata
bre.traineddata			mar.traineddata
bul.traineddata			mkd.traineddata
cat.traineddata			mlt.traineddata
ceb.traineddata			mon.traineddata
ces.traineddata			mri.traineddata
chi_sim.traineddata		msa.traineddata
chi_sim_vert.traineddata	mya.traineddata
chi_tra.traineddata		nep.traineddata
chi_tra_vert.traineddata	nld.traineddata
chr.traineddata			nor.traineddata
configs				oci.traineddata
cos.traineddata			ori.traineddata
cym.traineddata			osd.traineddata
dan.traineddata			pan.traineddata
deu.traineddata			pdf.ttf
div.traineddata			pol.traineddata
dzo.traineddata			por.traineddata
ell.traineddata			pus.traineddata
eng.traineddata			que.traineddata
enm.traineddata			ron.traineddata
epo.traineddata			rus.traineddata
est.traineddata			san.traineddata
eus.traineddata			script
fao.traineddata			sin.traineddata
fas.traineddata			slk.traineddata
fil.traineddata			slv.traineddata
fin.traineddata			snd.traineddata
fra.traineddata			spa.traineddata
frk.traineddata			spa_old.traineddata
frm.traineddata			sqi.traineddata
fry.traineddata			srp.traineddata
gla.traineddata			srp_latn.traineddata
gle.traineddata			sun.traineddata
glg.traineddata			swa.traineddata
grc.traineddata			swe.traineddata
guj.traineddata			syr.traineddata
hat.traineddata			tam.traineddata
heb.traineddata			tat.traineddata
hin.traineddata			tel.traineddata
hrv.traineddata			tessconfigs
hun.traineddata			tgk.traineddata
hye.traineddata			tha.traineddata
iku.traineddata			tir.traineddata
ind.traineddata			ton.traineddata
isl.traineddata			tur.traineddata
ita.traineddata			uig.traineddata
ita_old.traineddata		ukr.traineddata
jav.traineddata			urd.traineddata
jpn.traineddata			uzb.traineddata
jpn_vert.traineddata		uzb_cyrl.traineddata
kan.traineddata			vie.traineddata
kat.traineddata			yid.traineddata
kat_old.traineddata		yor.traineddata

linfangfangdeMacBook-Pro:~ linfangfang$ tesseract --list-langs
List of available languages (161):
afr
amh
ara
asm
aze
aze_cyrl
...
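
Rather than scanning the long list by eye, a specific language can be checked with grep. This is a sketch only: lang_listed is a hypothetical helper that reads the --list-langs output on standard input and matches whole lines, so chi_sim does not match chi_sim_vert.

```shell
# Hypothetical helper: succeed if the language code given as $1 appears
# as a complete line in the text read from standard input.
lang_listed() {
    grep -qxF -- "$1"
}

# Only run the check when tesseract is on PATH.
if command -v tesseract >/dev/null 2>&1; then
    if tesseract --list-langs 2>&1 | lang_listed chi_sim; then
        echo "chi_sim is available"
    else
        echo "chi_sim is not installed"
    fi
fi
```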

This shows that the language data is installed correctly. A quick test:

linfangfangdeMacBook-Pro:~ linfangfang$ tesseract /Users/linfangfang/Desktop/pic.jpg /Users/linfangfang/Desktop/out -l chi_sim
Tesseract Open Source OCR Engine v4.0.0-306-gb67f with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 326
Detected 67 diacritics
linfangfangdeMacBook-Pro:~ linfangfang$ 

This creates a file named out.txt on the desktop containing the text recognized from the image.
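
The same command extends naturally to a folder of images. The following is a minimal sketch under stated assumptions: ocr_output_base and the IMG_DIR variable are hypothetical names, the images are assumed to be .jpg files with an extension in the file name, and chi_sim is the language used in the test above.

```shell
# Hypothetical helper: strip the file extension, so that pic.jpg maps
# to the output base pic (tesseract appends .txt to it, as with out above).
ocr_output_base() {
    printf '%s\n' "${1%.*}"
}

# IMG_DIR is a hypothetical variable; it defaults to the current directory.
img_dir="${IMG_DIR:-.}"
for img in "$img_dir"/*.jpg; do
    [ -e "$img" ] || continue                       # no .jpg files matched
    command -v tesseract >/dev/null 2>&1 || break   # tesseract not installed
    tesseract "$img" "$(ocr_output_base "$img")" -l chi_sim
done
```

Each image then gets a text file next to it with the same base name.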
