https://github.com/tesseract-ocr/tesseract
http://www.leptonica.org/
https://github.com/tesseract-ocr/tesseract/wiki
https://github.com/tesseract-ocr/tesseract/wiki/Compiling#linux
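The official wiki describes the installation steps for Ubuntu; CentOS differs in a few places. The base components must be installed in either case, for example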
sudo apt-get install libpng-dev
On CentOS the equivalent command is
sudo yum install libpng-devel
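Download exactly the Leptonica version that the official site specifies. Leptonica must be installed first, then Tesseract.
I originally chose the 3.05.01 Release together with leptonica-1.74.4.tar.gz. Even after confirming repeatedly that leptonica had installed correctly, configure failed with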
configure: error: Leptonica 1.7.4 or higher is required. Try to install libleptonica-dev package.
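After switching to tesseract-3.0.5.Release and leptonica-1.7.0, image recognition succeeded.
If the three dependency packages libpng, libjpeg, and libtiff are not installed, Tesseract also fails at run time with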
Tesseract Open Source OCR Engine v3.05.00 with Leptonica
Error in pixReadStreamJpeg: function not present
Error in pixReadStream: jpeg: no pix returned
Error in pixRead: pix not read
Error during processing.
After installing the three dependency packages, leptonica must be recompiled so that it picks them up.
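Whether the rebuilt leptonica actually picked up the three libraries can usually be read from the banner printed by `tesseract --version`, which normally lists the image libraries leptonica was linked against. The Java sketch below assumes that banner has already been captured as a string. The class name `LeptonicaSupportCheck`, the method `missingLibraries`, and the sample text are hypothetical, and the exact banner layout is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: reports which image libraries are absent from a version banner. */
final class LeptonicaSupportCheck {

    // The three libraries named in the text above.
    private static final List<String> REQUIRED = Arrays.asList("libpng", "libjpeg", "libtiff");

    /** Returns the required libraries that are not mentioned anywhere in versionOutput. */
    static List<String> missingLibraries(String versionOutput) {
        String text = versionOutput.toLowerCase(Locale.ROOT);
        List<String> missing = new ArrayList<>();
        for (String lib : REQUIRED) {
            if (!text.contains(lib)) {
                missing.add(lib);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample text only; the layout of the real banner is an assumption.
        String sample = "tesseract 3.05.00\n leptonica\n  libpng 1.5.13 : zlib 1.2.7\n";
        System.out.println("Missing: " + missingLibraries(sample));
    }
}
```

If the returned list is not empty, install the missing development packages and rebuild leptonica before rebuilding Tesseract. When capturing the real banner, read both stdout and stderr, since some releases may print it on stderr.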
https://github.com/tesseract-ocr/tesseract/wiki/Compiling
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows
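Run the Windows installer from the link above directly; be sure to select the Simplified and Traditional Chinese language packs.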
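To confirm that the Chinese language packs were installed, the output of `tesseract --list-langs` can be inspected. The following is a minimal Java sketch that parses such output once it has been captured as a string. The class `LanguagePackCheck` and the method `parseLanguages` are hypothetical names, and the header format in the sample is an assumption; `chi_sim` and `chi_tra` are the language codes for Simplified and Traditional Chinese.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: extracts language codes from `tesseract --list-langs` output. */
final class LanguagePackCheck {

    /** Collects one language code per non-blank line, skipping the header line. */
    static Set<String> parseLanguages(String listLangsOutput) {
        Set<String> langs = new LinkedHashSet<>();
        for (String line : listLangsOutput.split("\\r?\\n")) {
            String code = line.trim();
            // Skip blank lines and the header, which is assumed to end with a colon.
            if (code.isEmpty() || code.endsWith(":")) {
                continue;
            }
            langs.add(code);
        }
        return langs;
    }

    public static void main(String[] args) {
        // Sample text only; real output depends on the installed traineddata files.
        String sample = "List of available languages (3):\nchi_sim\nchi_tra\neng\n";
        Set<String> langs = parseLanguages(sample);
        System.out.println("chi_sim installed: " + langs.contains("chi_sim"));
        System.out.println("chi_tra installed: " + langs.contains("chi_tra"));
    }
}
```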
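/**
 * Determines whether the current operating system is Linux or Windows.
 * @author eko.zhan at 2017-12-19 18:20:54
 * @return true if the os.name system property contains OS_LINUX
 */
private boolean isLinux(){
    // OS_LINUX is a class constant defined elsewhere; it must be upper case
    // (presumably "LINUX") because os.name is upper-cased before the comparison.
    String defaultOS = System.getProperty("os.name").toUpperCase();
    return defaultOS.contains(OS_LINUX);
}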
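/**
 * Runs tesseract to recognize the text in an image.
 * Requires java.io.IOException, java.io.InputStream, java.util.List and
 * org.apache.commons.io.IOUtils; TESSERACT_PATH, UTF8 and logger are class members defined elsewhere.
 * @author eko.zhan at 2017-12-19 18:22:49
 * @param inputPath path of the image to recognize
 * @param outputPath output base path; tesseract appends ".txt" to it
 * @throws IOException
 * @throws InterruptedException
 */
private void runCmd(String inputPath, String outputPath) throws IOException, InterruptedException{
    // On Linux tesseract is expected on the PATH; otherwise use the install directory.
    String executable = isLinux() ? "tesseract" : TESSERACT_PATH + "/tesseract";
    // Passing the arguments as a list avoids quoting problems with paths that contain spaces.
    ProcessBuilder builder = new ProcessBuilder(executable, inputPath, outputPath, "-l", "chi_sim");
    // Merge stderr into stdout so that reading one stream cannot block on the other.
    builder.redirectErrorStream(true);
    Process process = builder.start();
    List<String> lines;
    try (InputStream inputStream = process.getInputStream()) {
        lines = IOUtils.readLines(inputStream, UTF8);
    }
    StringBuilder result = new StringBuilder();
    for (String s : lines) {
        result.append(s).append('\n');
    }
    logger.debug(result.toString());
    process.waitFor();
}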
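Tesseract writes the recognized text to a file named after the output base with `.txt` appended, so the caller still has to read that file once `runCmd` returns. A minimal Java sketch of that step follows. The class `OcrResultReader` and its methods are hypothetical names that are not part of the original code, and the sketch assumes the result file is UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reads the text file produced for a given output base. */
final class OcrResultReader {

    /** Resolves the file tesseract writes for outputPath, i.e. outputPath + ".txt". */
    static Path resultFile(String outputPath) {
        return Paths.get(outputPath + ".txt");
    }

    /** Reads the whole result file as UTF-8 text (encoding is an assumption). */
    static String readResult(String outputPath) throws IOException {
        byte[] bytes = Files.readAllBytes(resultFile(outputPath));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a result file so the sketch runs without tesseract installed.
        Path base = Files.createTempFile("ocr-demo", "");
        Files.write(resultFile(base.toString()), "sample text\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readResult(base.toString()));
    }
}
```

In the original class this would be called as `readResult(outputPath)` right after `runCmd(inputPath, outputPath)`.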