To use the Tesseract library, you first need to install it on your machine.
On macOS, install Tesseract with Homebrew:
brew install tesseract
Homebrew downloads from servers hosted outside China, so switching to a mirror is recommended to speed things up:
# Replace brew.git:
$ cd "$(brew --repo)"
# Tsinghua University mirror:
$ git remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git
# Replace homebrew-core.git:
$ cd "$(brew --repo)/Library/Taps/homebrew/homebrew-core"
# Tsinghua University mirror:
$ git remote set-url origin https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-core.git
# Replace homebrew-bottles:
# Tsinghua University mirror:
$ echo 'export HOMEBREW_BOTTLE_DOMAIN=https://mirrors.tuna.tsinghua.edu.cn/homebrew-bottles' >> ~/.bash_profile
$ source ~/.bash_profile
# Apply the changes:
$ brew update
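To confirm that the switch took effect, it can help to print the remote each repository now points at. The sketch below is one way to do that. The helper name show_origin is hypothetical (it is not part of Homebrew), and it only assumes that git is installed.

```shell
# Print the URL of the "origin" remote for the git repository at $1.
# show_origin is a hypothetical helper name, not a Homebrew command.
show_origin() {
    git -C "$1" remote get-url origin
}

# Example usage (requires Homebrew):
#   show_origin "$(brew --repo)"
#   show_origin "$(brew --repo)/Library/Taps/homebrew/homebrew-core"
#   echo "$HOMEBREW_BOTTLE_DOMAIN"
```

If the printed URLs are the Tsinghua mirror addresses set above, the remotes were switched.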
Some errors are almost certain to come up along the way. Here are the ones I ran into:
==> Downloading https://storage.googleapis.com/downloads.webmproject.org/release
curl: (35) Server aborted the SSL handshake
Error: An exception occurred within a child process:
DownloadError: Failed to download resource "webp"
Download failed: https://storage.googleapis.com/downloads.webmproject.org/releases/webp/libwebp-1.1.0.tar.gz
The error above means that the webp download host is blocked.
Fix:
brew edit webp
Change the download address for webp (only the url line needs to change):
class Webp < Formula
desc "Image format providing lossless and lossy compression for web images"
homepage "https://developers.google.com/speed/webp/"
url "http://downloads.webmproject.org/releases/webp/libwebp-1.1.0.tar.gz"
sha256 "98a052268cc4d5ece27f76572a7f50293f439c17a98e67c4ea0c7ed6f50ef043"
Then:
brew upgrade
brew cleanup
Run the install again:
brew install tesseract
But at the very last step the download still failed:
curl: (7) Failed to connect to raw.githubusercontent.com port 443: Connection refused
Error: An exception occurred within a child process:
DownloadError: Failed to download resource "tesseract--snum"
Download failed: https://github.com/USCDataScience/counterfeit-electronics-tesseract/raw/319a6eeacff181dad5c02f3e7a3aff804eaadeca/Training%20Tesseract/snum.traineddata
When this error appeared, my first instinct was to open the URL myself: https://github.com/USCDataScience/counterfeit-electronics-tesseract/raw/319a6eeacff181dad5c02f3e7a3aff804eaadeca/Training%20Tesseract/snum.traineddata
It returned a 404, so I looked for an alternative. Using the same approach as for the webp error, I replaced the dead URL with: https://github.com/USCDataScience/counterfeit-electronics-tesseract/tree/master/Training%20Tesseract/snum.traineddata
Open the formula for editing:
brew edit tesseract
Then find the resource "snum" do block:
resource "snum" do
url "https://github.com/USCDataScience/counterfeit-electronics-tesseract/tree/master/Training%20Tesseract/snum.traineddata"
sha256 "36f772980ff17c66a767f584a0d80bf2302a1afa585c01a226c1863afcea1392"
end
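As before, only the URL needs to change. I forgot to mention how to save: the formula opens in a vim-style editor, so press Esc and type :wq! to force-save.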
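One thing worth checking before saving: a github.com URL containing /tree/ or /blob/ normally returns the repository's web page rather than the file itself, while the /raw/ form returns the file contents. The sketch below rewrites one form into the other. The helper name to_raw_url is hypothetical, and the sketch does not verify that the file still exists at that ref. Note also that /raw/ URLs redirect to raw.githubusercontent.com, so they only help where that host is reachable.

```shell
# Rewrite a github.com "/tree/<ref>/" or "/blob/<ref>/" page URL into the
# matching "/raw/<ref>/" URL, which serves the file contents.
# to_raw_url is a hypothetical helper name. Other URLs pass through unchanged.
to_raw_url() {
    printf '%s\n' "$1" | sed -E 's#^(https://github\.com/[^/]+/[^/]+)/(tree|blob)/#\1/raw/#'
}

# Example usage:
#   to_raw_url "https://github.com/USCDataScience/counterfeit-electronics-tesseract/tree/master/Training%20Tesseract/snum.traineddata"
```

If a different URL is used in the formula, its sha256 value has to be the checksum of the file that URL actually serves.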
Then run the install again. Sure enough, another error:
Error: An exception occurred within a child process:
ChecksumMismatchError: SHA256 mismatch
Expected: 36f772980ff17c66a767f584a0d80bf2302a1afa585c01a226c1863afcea1392
Actual: 372b93aca56b0e7145c7975ab023f6c7796b199b42d5901c8baf151d515b2ce6
Archive: /Users/************/Library/Caches/Homebrew/downloads/a8e28ef5d96dadeb9c9a01d9f8f34e10aae2e401d88e1fea343b12d13a49885c--snum.traineddata
To retry an incomplete download, remove the file above.
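This error means that when I changed the URL in the previous step, I did not update the sha256 value to match.
In the error output, Expected is the checksum currently written in the formula, and Actual is the checksum of the file that was just downloaded from the new URL. Keep in mind that Actual only describes whatever that URL returned, so copying it into the formula makes the check pass without proving the file is the right one.
So run brew edit tesseract again and change the resource block to: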
resource "snum" do
url "https://github.com/USCDataScience/counterfeit-electronics-tesseract/tree/master/Training%20Tesseract/snum.traineddata"
sha256 "372b93aca56b0e7145c7975ab023f6c7796b199b42d5901c8baf151d515b2ce6"
end
At this point the download finally succeeded.
Run the following command to verify that Tesseract is installed:
$ tesseract -v
tesseract 3.05.00
leptonica-1.74.1
libjpeg 8d : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.8
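If you see the Tesseract version followed by the versions of the libraries it depends on, the installation succeeded.
If the installation failed, you will see:
-bash: tesseract: command not found
This means the shell cannot find Tesseract on your machine. Go back to the first step and start over, or update your PATH environment variable if Tesseract is installed but not on it.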
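To tell a missing install apart from a PATH problem, a quick check like the following can help. It is only a sketch, and on_path is a hypothetical helper name.

```shell
# Report whether a command can be found on the current PATH.
# on_path is a hypothetical helper name.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Example usage:
#   on_path tesseract
#   echo "$PATH"          # directories the shell searches
#   brew --prefix         # Homebrew's install prefix; its bin directory should be on PATH
```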
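Now test recognition. Replace the path with your own image (PNG is used here, but other formats such as GIF also work as long as your build supports them):
tesseract your-folder/your-image.png stdout
If recognized text is printed, Tesseract works on the Mac, and a Java project on the same machine can call it as well.
When using Tesseract, here is what I suggest for two warnings you may run into: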
①Warning: Parameter not found: enable_new_segsearch
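When this appears on the Mac, copy the language data files into the directory configured in your Java code. The cause is that this directory does not contain the Simplified Chinese language pack.
import net.sourceforge.tess4j.ITesseract; // assuming the Tess4J library
import net.sourceforge.tess4j.Tesseract;
ITesseract iTesseract = new Tesseract();
iTesseract.setDatapath("/absolute/path/to/your/language/data");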
②Warning: Invalid resolution 0 dpi. Using 70 instead.
ITesseract iTesseract = new Tesseract();
iTesseract.setDatapath("/absolute/path/to/your/language/data");
iTesseract.setTessVariable("user_defined_dpi", "300");
Setting the DPI explicitly is enough to remove the warning, and 300 is a good default value.
That's all. If you run into problems, leave a comment and we can work through them.