Install CentOS system dependencies
yum install -y automake autoconf libtool gcc gcc-c++
yum install -y libpng-devel libjpeg-devel libtiff-devel
yum -y install python-devel
yum -y install openssl-devel
yum -y install opencv
yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
yum install -y libffi libffi-devel
yum install libmount -y
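The following installs the software the Linux system needs.
If openssl (the openssl-devel package above) is not installed, the build fails with the error "command 'gcc' failed with exit status 1".
For tesseract 3.0, install leptonica 1.72: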
wget http://www.leptonica.org/source/leptonica-1.72.tar.gz
tar xvzf leptonica-1.72.tar.gz
cd leptonica-1.72/
./configure
make && make install
For tesseract 4.0, install leptonica 1.74.4 instead. Download page:
http://www.leptonica.org/download.html
wget http://www.leptonica.org/source/leptonica-1.74.4.tar.gz
tar xvzf leptonica-1.74.4.tar.gz
cd leptonica-1.74.4/
./configure
make && make install
Install tesseract 3.0 (tesseract-ocr)
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.zip
unzip 3.04.zip
cd tesseract-3.04/
autoreconf -I /usr/share/aclocal
./configure
make && make install
sudo ldconfig
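Install tesseract 4.0
wget -O tesseract-4.00.00dev.zip https://codeload.github.com/tesseract-ocr/tesseract/zip/4.00.00dev
unzip tesseract-4.00.00dev.zip
cd tesseract-4.00.00dev
Or use this mirror instead (it is a tar.gz archive, so extract it with tar and cd into tesseract-4.0.0-beta.1):
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/4.0.0-beta.1.tar.gz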
autoreconf -I /usr/share/aclocal
./autogen.sh
./configure --prefix=$HOME/local/
make
make install
ldconfig
If ./autogen.sh fails because autoconf and related packages are missing, install them from source:
wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
tar -zxvf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure
make && make install
wget http://ftp.gnu.org/gnu/automake/automake-1.14.tar.gz
tar -zxvf automake-1.14.tar.gz
cd automake-1.14
./bootstrap.sh
./configure
make && make install
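Download autoconf-archive from the mirror below (either upload it to the server or fetch it there with wget), then extract and install it: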
http://mirrors.ustc.edu.cn/gnu/autoconf-archive/
wget http://mirrors.ustc.edu.cn/gnu/autoconf-archive/autoconf-archive-2018.03.13.tar.xz
xz -d autoconf-archive-2018.03.13.tar.xz
tar xvf autoconf-archive-2018.03.13.tar
cd autoconf-archive-2018.03.13
./configure
make && make install
Installing glib requires the following:
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.40.tar.gz
tar -zvxf pcre-8.40.tar.gz
cd pcre-8.40
./configure
make
make install
wget http://ftp.gnome.org/pub/gnome/sources/glib/2.56/glib-2.56.1.tar.xz
xz -d glib-2.56.1.tar.xz
tar xvf glib-2.56.1.tar
cd glib-2.56.1
./configure --enable-libmount=no
make
make install
If a plain ./configure fails, run ./configure --enable-libmount=no as shown above.
wget http://pkgconfig.freedesktop.org/releases/pkg-config-0.29.2.tar.gz
tar -zvxf pkg-config-0.29.2.tar.gz
cd pkg-config-0.29.2
./configure --with-internal-glib
make
make install
If a plain ./configure fails, run ./configure --with-internal-glib as shown above.
Reference articles:
https://linux.cn/thread-16838-1-1.html
https://blog.csdn.net/windeal3203/article/details/76608248
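Deploy the model
Download the model file for the language you need from https://github.com/tesseract-ocr/tessdata
Since only phone numbers and English need to be recognized for now, eng.traineddata alone is enough.
Move the model file to /usr/local/share/tessdata
Recognition then works.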
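Before running OCR it can help to confirm that the model files really are where tesseract will look. The sketch below assumes Python 3; the function name missing_models is made up for illustration, and the directory is the one used in this article.

```python
from pathlib import Path


def missing_models(tessdata_dir, langs):
    """Return the language codes whose <lang>.traineddata file is absent."""
    data_dir = Path(tessdata_dir)
    return [
        lang for lang in langs
        if not (data_dir / "{}.traineddata".format(lang)).is_file()
    ]
```

Calling missing_models("/usr/local/share/tessdata", ["eng"]) returns an empty list once eng.traineddata is in place.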
Install the required Python libraries
pip install pytesseract
pip install tesseract
pip install tesseract-ocr
python -m pip install --upgrade pip setuptools
python -m pip install "django<2"
pip install image
On Ubuntu/Debian, install tesseract and the build dependencies with apt-get instead:
sudo apt-get install tesseract-ocr
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install automake
1. Install tesseract-ocr
sudo apt-get install tesseract-ocr
2. Install pytesseract
sudo pip install pytesseract
3. Install Pillow
sudo pip install pillow
Example:
import pytesseract
from PIL import Image
image = Image.open('bb.png')
code = pytesseract.image_to_string(image)
print(code)
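Since the goal here is to read phone numbers, the raw OCR text usually needs a clean-up pass. The following is a minimal sketch, assuming Python 3 and assuming that a phone number means an 11-digit mobile number starting with 1 (that format is my assumption, not something the OCR engine knows). The names extract_phone_numbers and read_phone_numbers are hypothetical helpers.

```python
import re

# Assumption: 11 digits starting with 1, possibly split by spaces or hyphens
# that the OCR step inserted between digit groups.
PHONE_RE = re.compile(r"(?<!\d)1(?:[ -]?\d){10}(?!\d)")


def extract_phone_numbers(ocr_text):
    """Return phone-number-like digit runs found in raw OCR output."""
    return [re.sub(r"[ -]", "", match) for match in PHONE_RE.findall(ocr_text)]


def read_phone_numbers(image_path):
    """OCR an image with pytesseract, then extract phone numbers from the text."""
    # Imported lazily so extract_phone_numbers works without OCR installed.
    import pytesseract
    from PIL import Image

    text = pytesseract.image_to_string(Image.open(image_path), lang="eng")
    return extract_phone_numbers(text)
```

Keeping the regex step separate from the OCR call makes it easy to tune the pattern against saved OCR output without re-running tesseract.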
On CentOS, error 1 looks like this:
pytesseract.pytesseract.TesseractError: (127, u'tesseract: error while loading shared libraries: libtesseract.so.3: cannot open shared object file: No such file or directory')
This kind of error means the system does not know which directory holds xxx.so, so the directory containing it has to be added to /etc/ld.so.conf.
Many .so files usually live under /usr/local/lib, and looking there does turn up the required .so file.
So add the line /usr/local/lib to /etc/ld.so.conf, save it, and run /sbin/ldconfig -v to refresh the linker cache:
sudo ldconfig
Reference: http://www.eefocus.com/winter1988/blog/13-03/292209_03d5b.html
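To check whether the directory is already listed before editing the file, something like the sketch below can be used. It assumes Python 3, the function name lib_dir_configured is hypothetical, and it only looks at the text it is given: it does not follow include lines such as include ld.so.conf.d/*.conf.

```python
def lib_dir_configured(conf_text, lib_dir):
    """Return True if lib_dir appears as an entry in ld.so.conf-style text."""
    wanted = lib_dir.rstrip("/")
    for line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry and entry.rstrip("/") == wanted:
            return True
    return False
```

Feed it the contents of /etc/ld.so.conf and "/usr/local/lib"; if it returns False, append the line and run ldconfig as described above.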
Error 2:
Running automake --add-missing --copy
src/api/Makefile.am:17: error: Libtool library used but 'LIBTOOL' is undefined
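This happens because the path to aclocal's libtool macros (libtool.m4) is not configured correctly.
Fix:
First check the aclocal path:
aclocal --print-ac-dir
Then run autoreconf with the directory that holds the libtool macros: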
autoreconf -I /usr/local/share/aclocal
Reference: https://blog.csdn.net/sky_qing/article/details/9707647
Error 3:
While installing tesseract, ./configure prints:
./configure: line 4193: syntax error near unexpected token `-mavx,'
./configure: line 4193: `AX_CHECK_COMPILE_FLAG(-mavx, avx=true, avx=false)'
Issue #647 covers this: https://github.com/tesseract-ocr/tesseract/issues/647
The answer there is "You should install autoconf-archive." So autoconf-archive looks like the cause, but it was already installed.
Its files sit in /local/share/aclocal, a pile of files with the .m4 suffix. m4 is a macro processor: it copies input to output while expanding macros, which can be built in or user defined. Besides expanding macros, m4 has built-in functions for including files, running Unix commands, integer arithmetic, text manipulation, loops and so on. It can serve as a compiler front end or as a standalone macro processor.
The question is how to make the build pick up these files. ./configure --help has nothing on it. After a long search I found this article on automated builds: https://jin-yang.github.io/post/linux-package.html
It says: "aclocal generates aclocal.m4 in the same directory as configure.ac; while scanning configure.ac it copies in third-party extensions and the developer's own macro definitions, so that autoconf looks in aclocal.m4 when it meets a macro it does not recognize." The build did generate aclocal.m4, so the m4 files in /local/share/aclocal have to be pulled in somehow.
Opening autogen.sh, line 81 has echo "Running aclocal" followed by aclocal -I config. That is presumably where the macros are pulled in, but I could not tell from it how to add my own macro directory.
aclocal --help lists:
  -I DIR                add directory to search list for .m4 files
  --install             copy third-party files to the first -I directory
  --system-acdir=DIR    directory holding third-party system-wide files
These looked like the right options, so I tried aclocal -I m4 --install --system-acdir=$HOME/local/share/aclocal, then ./autogen.sh and ./configure, and it miraculously ran. The fix, then, is to add this line after line 81 of autogen.sh:
aclocal -I m4 --install --system-acdir=$HOME/local/share/aclocal
Save and exit, then run the installation again.
Here $HOME/local/share/aclocal is the aclocal path; on most systems it is /usr/local/share/aclocal.
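Editing autogen.sh by hand works, but the line number may differ between tesseract versions. As a rough sketch (assuming Python 3; insert_after_match is a hypothetical helper, and the needle text "aclocal -I config" is taken from the line quoted above), the extra aclocal call can be inserted by matching on content instead of a fixed line number:

```python
# The extra line from the fix above; adjust the directory to your aclocal path.
EXTRA_LINE = "aclocal -I m4 --install --system-acdir=$HOME/local/share/aclocal"


def insert_after_match(script_text, needle, new_line):
    """Insert new_line after the first line of script_text that contains needle."""
    lines = script_text.splitlines()
    for index, line in enumerate(lines):
        if needle in line:
            lines.insert(index + 1, new_line)
            return "\n".join(lines) + "\n"
    raise ValueError("no line contains %r" % needle)
```

Read autogen.sh, pass its text through insert_after_match(text, "aclocal -I config", EXTRA_LINE), and write the result back (after keeping a backup copy).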
Reference links:
https://www.ggbond.cc/index.php/%E7%BC%96%E8%AF%91%E5%AE%89%E8%A3%85tesseract/
https://blog.csdn.net/dcrmg/article/details/78128026
Further references:
https://my.oschina.net/u/2328100/blog/889034
https://www.cnblogs.com/arachis/p/OCR.html
https://www.cnblogs.com/blazer/p/7131202.html
https://blog.csdn.net/strugglerookie/article/details/71606540
https://blog.csdn.net/wu_yuanyi/article/details/50254413
https://blog.csdn.net/diandianxiyu_geek/article/details/50522582
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#linux
Appendix: the complete installation steps, in order.
If tesseract 3.0 was installed before, remove the old version first:
rpm -e tesseract
rpm -e leptonica
Then install tesseract 4.0. If running tesseract reports that the command cannot be found,
copy /root/local/bin/tesseract to /usr/bin/.
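From Python, an alternative to copying the binary is to locate it and tell pytesseract where it is. A minimal sketch, assuming Python 3; find_tesseract and its base_path parameter are hypothetical names for illustration:

```python
import os
import shutil


def find_tesseract(extra_dirs=(), base_path=None):
    """Look for the tesseract executable on PATH, then in extra_dirs."""
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    dirs = [d for d in base_path.split(os.pathsep) if d] + list(extra_dirs)
    # shutil.which returns None when nothing executable is found.
    return shutil.which("tesseract", path=os.pathsep.join(dirs))
```

With the --prefix=$HOME/local/ build used here, find_tesseract([os.path.expanduser("~/local/bin")]) should return the installed binary, and the result can be assigned to pytesseract.pytesseract.tesseract_cmd.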
# Install system dependencies
yum install -y automake autoconf libtool gcc gcc-c++
yum install -y libpng-devel libjpeg-devel libtiff-devel
yum -y install python-devel
yum -y install openssl-devel
yum -y install opencv
yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
yum install -y libffi libffi-devel
# Install leptonica-1.74.4
# Download page: http://www.leptonica.org/download.html
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/leptonica-1.74.4.tar.gz
tar xvzf leptonica-1.74.4.tar.gz
cd leptonica-1.74.4/
./configure
make && make install
cd ..
# Install the auto* dependencies so that autogen.sh can call autoreconf to generate the configure script
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/autoconf-2.69.tar.gz
tar -zxvf autoconf-2.69.tar.gz
cd autoconf-2.69
./configure
make && make install
cd ..
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/automake-1.14.tar.gz
tar -zxvf automake-1.14.tar.gz
cd automake-1.14
./bootstrap.sh
./configure
make && make install
cd ..
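wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/autoconf-archive-2018.03.13.tar
# The mirror URL above ends in .tar; the next line is only needed if your download is a .tar.xz archive
# xz -d autoconf-archive-2018.03.13.tar.xz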
tar xvf autoconf-archive-2018.03.13.tar
cd autoconf-archive-2018.03.13
./configure
make && make install
cd ..
# Install glib and related dependencies
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/pcre-8.40.tar.gz
tar -zvxf pcre-8.40.tar.gz
cd pcre-8.40
./configure
make
make install
cd ..
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/glib-2.56.1.tar
tar xvf glib-2.56.1.tar
cd glib-2.56.1
./configure --enable-libmount=no
make
make install
cd ..
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/pkg-config-0.29.2.tar.gz
tar -zvxf pkg-config-0.29.2.tar.gz
cd pkg-config-0.29.2
./configure --with-internal-glib
make
make install
cd ..
# Install tesseract-ocr 4.0
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/4.0.0-beta.1.tar.gz
tar zxvf 4.0.0-beta.1.tar.gz
cd tesseract-4.0.0-beta.1
autoreconf -I /usr/local/share/aclocal
./autogen.sh
./configure --prefix=$HOME/local/
make
make install
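For reasons I have not figured out, the two commands
autoreconf -I /usr/local/share/aclocal
./autogen.sh
have to be run several times; otherwise the build fails complaining about missing pieces.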
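The retrying can be scripted. The sketch below assumes Python 3; run_until_success is a hypothetical helper, and the limit of three rounds is an arbitrary choice, not a number I have verified to be enough.

```python
import subprocess


def run_until_success(commands, max_rounds=3):
    """Run the commands in order, repeating the whole sequence until all succeed.

    Returns the number of the round that succeeded.
    """
    for round_no in range(1, max_rounds + 1):
        # all() stops at the first failing command, then the round restarts.
        if all(subprocess.call(cmd) == 0 for cmd in commands):
            return round_no
    raise RuntimeError("commands still failing after %d rounds" % max_rounds)
```

Run from the tesseract source directory, the call would be run_until_success([["autoreconf", "-I", "/usr/local/share/aclocal"], ["./autogen.sh"]]).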
Download the trained models and move them into the tessdata directory:
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/chi_sim.traineddata
wget http://linux-1251121573.cosgz.myqcloud.com/soft/tesseract/eng.traineddata
mv chi_sim.traineddata eng.traineddata /usr/local/share/tessdata
If tesseract cannot find the models there, check the data directory under the install prefix: a build configured with --prefix=$HOME/local/ normally looks in $HOME/local/share/tessdata.