1. Installation
# Download the source code
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
# Possible error: Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
# Fix: install the missing build tools, then rerun ./autogen.sh
## yum install automake -y
## yum install libtool -y
./configure
# ./configure may fail with one of the following errors; a fix is linked for each
# Error 1: configure: error: Your compiler does not have the necessary C++17 support! Cannot proceed.
# Fix: https://segmentfault.com/a/1190000041832780
# Error 2: configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.
# Fix: https://segmentfault.com/a/1190000041833110
make && make install
ldconfig
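Once the build and ldconfig have finished, a quick sanity check can confirm that the binary is on the PATH and show which language files it can see. This is a minimal sketch, not part of the official instructions: have_cmd is a helper name introduced here for illustration, and the hint about /usr/local/bin assumes the default install prefix.

```shell
# Hypothetical helper: succeeds if the named command is found on PATH
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd tesseract; then
    # Print the installed version and the libraries it was built against
    tesseract --version
    # List the language data files Tesseract can find
    tesseract --list-langs
else
    # Assumption: a default source install places the binary in /usr/local/bin
    echo "tesseract not found on PATH; check that /usr/local/bin is in PATH" >&2
fi
```

If --list-langs reports no languages, the language data from the next step has not been installed yet.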
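2. Download the language data
Official download: https://github.com/tesseract-ocr/tessdata
Upload the downloaded files to the /usr/local/share/tessdata/ directory on the Linux machine
If you are developing in Java, the tess4j-5.2.1.jar package also bundles tessdata language files. You can extract them from the jar and upload them to the same directory, but it only includes two: eng and osd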
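As an alternative to uploading files by hand, the language files can be fetched directly on the server. The sketch below is one possible approach under stated assumptions: the raw-file URL pattern and the main branch name are assumptions about the tessdata repository layout, tessdata_url and fetch_lang are hypothetical helper names, and wget is assumed to be installed.

```shell
# Assumed defaults: adjust to match your setup
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"
# Assumption: raw files are served from the main branch of the tessdata repo
TESSDATA_BASE="https://github.com/tesseract-ocr/tessdata/raw/main"

# Print the download URL for one language code, e.g. eng or chi_sim
tessdata_url() {
    printf '%s/%s.traineddata\n' "$TESSDATA_BASE" "$1"
}

# Hypothetical helper: download one language file into TESSDATA_DIR
fetch_lang() {
    mkdir -p "$TESSDATA_DIR" &&
    wget -O "$TESSDATA_DIR/$1.traineddata" "$(tessdata_url "$1")"
}

# Usage (needs network access and write permission on TESSDATA_DIR):
#   fetch_lang eng
#   fetch_lang chi_sim
```

Only the languages you actually need have to be downloaded; each is a single .traineddata file.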
Official documentation: https://tesseract-ocr.github.io/tessdoc/Compiling.html
Other tutorials:
How to support OCR with tess4j in a Linux environment
Installing the latest Tesseract-OCR (5.0) on Linux (CentOS 7)