Tesseract usage notes

PS C:\Users\xyw\Desktop> tesseract --help-extra

1.Usage:

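Option                                                       Description
--help | --help-extra | --help-psm | --help-oem | --version  Show help
--list-langs [--tessdata-dir PATH]                           List the supported languages
--print-parameters [options...] [configfile...]              Print the configurable parameters
imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]  Run OCR on the input and write the result to outputbase or stdout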

Usage:
D:\program\Tesseract-OCR\tesseract.exe --help | --help-extra | --help-psm | --help-oem | --version
D:\program\Tesseract-OCR\tesseract.exe --list-langs [--tessdata-dir PATH]
D:\program\Tesseract-OCR\tesseract.exe --print-parameters [options...] [configfile...]
D:\program\Tesseract-OCR\tesseract.exe imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]
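The general form above can also be driven from a script. Below is a minimal Python 3.9+ sketch, assuming tesseract is on PATH (otherwise pass the full path, for example D:\program\Tesseract-OCR\tesseract.exe, as exe). The helper names tesseract_argv and run_tesseract are hypothetical; only the argument order comes from the usage line: input, then outputbase, then options, then config files.

```python
import subprocess
from typing import Sequence


def tesseract_argv(image: str, outputbase: str = "stdout",
                   options: Sequence[str] = (),
                   configfiles: Sequence[str] = (),
                   exe: str = "tesseract") -> list[str]:
    """Build: tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]"""
    # Options go before any configfile name, as the help text requires.
    return [exe, image, outputbase, *options, *configfiles]


def run_tesseract(image: str, **kwargs) -> str:
    """Run tesseract and return its standard output (the text, when outputbase is "stdout")."""
    result = subprocess.run(tesseract_argv(image, **kwargs),
                            capture_output=True, encoding="utf-8", check=True)
    return result.stdout
```

For example, run_tesseract("test.png") returns the recognized text of a hypothetical test.png, while run_tesseract("test.png", outputbase="out") makes tesseract write the text to out.txt instead of printing it.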

2.OCR options

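Option                Description
--tessdata-dir PATH   Specify the location of the tessdata directory
--user-words PATH     Specify a user words file
--user-patterns PATH  Specify a user patterns file
--dpi VALUE           Specify the DPI of the input image
-l LANG[+LANG]        Specify the OCR language(s); join several with +
-c VAR=VALUE          Set a config variable (may be given more than once)
--psm NUM             Set the page segmentation mode
--oem NUM             Set the OCR engine mode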

OCR options:
--tessdata-dir PATH Specify the location of tessdata path.
--user-words PATH Specify the location of user words file.
--user-patterns PATH Specify the location of user patterns file.
--dpi VALUE Specify DPI for input image.
-l LANG[+LANG] Specify language(s) used for OCR.
-c VAR=VALUE Set value for config variables.
Multiple -c arguments are allowed.
--psm NUM Specify page segmentation mode.
--oem NUM Specify OCR Engine mode.
NOTE: These options must occur before any configfile.
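These options map directly onto an argument list. A minimal Python 3.9+ sketch follows; the function name ocr_options is hypothetical, and tessedit_char_whitelist is used only as one example of a variable that -c can set. It joins several languages with + for -l and emits one -c pair per variable; the caller appends any configfile name (here tsv) after the options, as the NOTE requires.

```python
from typing import Mapping, Optional, Sequence


def ocr_options(langs: Sequence[str] = (), psm: Optional[int] = None,
                oem: Optional[int] = None, dpi: Optional[int] = None,
                config_vars: Optional[Mapping[str, str]] = None) -> list[str]:
    """Translate keyword arguments into tesseract OCR options."""
    args: list[str] = []
    if langs:
        args += ["-l", "+".join(langs)]      # -l LANG[+LANG]
    if psm is not None:
        args += ["--psm", str(psm)]
    if oem is not None:
        args += ["--oem", str(oem)]
    if dpi is not None:
        args += ["--dpi", str(dpi)]
    for var, value in (config_vars or {}).items():
        args += ["-c", f"{var}={value}"]     # one -c per variable
    return args


# Options first, configfile names ("tsv" here) last.
argv = ["tesseract", "img.png", "stdout",
        *ocr_options(langs=["eng"], psm=7,
                     config_vars={"tessedit_char_whitelist": "0123456789"}),
        "tsv"]
```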

3.Page segmentation modes (--psm)

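Value  Description
0      Orientation and script detection (OSD) only
1      Automatic page segmentation with OSD
2      Automatic page segmentation, but no OSD or OCR
3      Fully automatic page segmentation, but no OSD (default)
4      Assume a single column of text of variable sizes
5      Assume a single uniform block of vertically aligned text
6      Assume a single uniform block of text
7      Treat the image as a single text line
8      Treat the image as a single word
9      Treat the image as a single word in a circle
10     Treat the image as a single character
11     Sparse text: find as much text as possible, in no particular order
12     Sparse text with OSD
13     Raw line: treat the image as a single text line, bypassing Tesseract-specific hacks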

Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
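In practice the mode is usually picked from what the image contains. A minimal Python 3.9+ sketch; the layout labels ("page", "line", and so on) and the name psm_args are hypothetical shorthands, and only the numbers come from the list above.

```python
# Hypothetical shorthand labels (not Tesseract names) mapped to --psm values.
LAYOUT_TO_PSM = {
    "page": 3,     # fully automatic page segmentation, no OSD (default)
    "block": 6,    # a single uniform block of text
    "line": 7,     # a single text line
    "word": 8,     # a single word
    "char": 10,    # a single character
    "sparse": 11,  # sparse text, in no particular order
}


def psm_args(layout: str) -> list[str]:
    """Return the --psm arguments for a hypothetical layout label."""
    if layout not in LAYOUT_TO_PSM:
        raise ValueError(f"unknown layout {layout!r}; expected one of {sorted(LAYOUT_TO_PSM)}")
    return ["--psm", str(LAYOUT_TO_PSM[layout])]
```

A cropped image holding one line of text, for instance, is often read more reliably with psm_args("line"), i.e. --psm 7, than with the default mode 3.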

4.OCR engine modes (--oem)

Value  Description
0      Legacy engine only
1      LSTM neural-net engine only
2      Legacy + LSTM engines
3      Default, based on what is available

OCR Engine modes:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
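Passing an out-of-range value is an easy mistake, so a script may want to validate the mode before calling tesseract. A minimal Python 3.9+ sketch; the names OEM_MODES and oem_args are hypothetical. Keep in mind that modes 0 and 2 typically need traineddata files that include the legacy engine's data; LSTM-only models (such as those in the tessdata_fast and tessdata_best repositories) generally work only with 1 or 3.

```python
# The four engine modes from the list above.
OEM_MODES = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def oem_args(oem: int) -> list[str]:
    """Return the --oem arguments, rejecting values tesseract does not list."""
    if oem not in OEM_MODES:
        raise ValueError(f"--oem must be one of {sorted(OEM_MODES)}, got {oem}")
    return ["--oem", str(oem)]
```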

5.Single options

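Option              Description
-h, --help          Show brief help
--help-extra        Show extra help for advanced users
--help-psm          Show the page segmentation modes
--help-oem          Show the OCR engine modes
-v, --version       Show version information
--list-langs        List the available languages
--print-parameters  Print the Tesseract parameters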

Single options:
-h, --help Show minimal help message.
--help-extra Show extra help for advanced users.
--help-psm Show page segmentation modes.
--help-oem Show OCR Engine modes.
-v, --version Show version information.
--list-langs List available languages for tesseract engine.
--print-parameters Print tesseract parameters.
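A script can check that a language is installed before passing it to -l. A minimal Python 3.9+ sketch; the names parse_list_langs and available_langs are hypothetical, and it assumes --list-langs prints a header line beginning with "List of available languages" followed by one code per line on stdout, as recent releases do.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the output of tesseract --list-langs."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header (assumed to start with this phrase).
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def available_langs(exe: str = "tesseract") -> list[str]:
    """Run tesseract --list-langs and return the codes (assumes the list goes to stdout)."""
    result = subprocess.run([exe, "--list-langs"],
                            capture_output=True, encoding="utf-8", check=True)
    return parse_list_langs(result.stdout)
```

For example, "chi_sim" in available_langs() tells you whether -l chi_sim will find its traineddata.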

