Tesseract 4.0 Training Scripts (Part 4)

lstmtraining

NAME
       lstmtraining - Training program for LSTM-based networks.
       # LSTM-based training program

SYNOPSIS
       lstmtraining 
       --continue_from train_output_dir/continue_from_lang.lstm 
       --old_traineddata bestdata_dir/continue_from_lang.traineddata 
       --traineddata train_output_dir/lang/lang.traineddata 
       --max_iterations NNN 
       --debug_interval 0|-1
       --train_listfile train_output_dir/lang.training_files.txt 
       --model_output train_output_dir/newlstmmodel

DESCRIPTION
       lstmtraining(1) trains LSTM-based networks using a list of lstmf files and a starter
       traineddata file as the main input. Training from scratch is not recommended for
       users. Fine-tuning (example command shown in the synopsis above) or replacing a layer
       can be used instead. Different options apply to different types of training. Read the
       [Training Wiki page](https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00)
       for details.
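       # lstmtraining trains an LSTM network from a set of lstmf files and a starter traineddata file.
       # Training from scratch is not recommended.
       # Fine-tuning or replacing network layers can be used instead.
       # Different option combinations correspond to different kinds of training.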
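
The fine-tuning call from the synopsis can be assembled programmatically. The following is a minimal sketch, not an official wrapper: the helper name `finetune_command` is hypothetical, the paths are the placeholder names from the synopsis, and `400` merely stands in for `NNN`.

```python
import shlex


def finetune_command(lang, base_lang, out_dir, best_dir, max_iterations):
    """Build the argument list for the fine-tuning call shown in the synopsis."""
    return [
        "lstmtraining",
        "--continue_from", f"{out_dir}/{base_lang}.lstm",
        "--old_traineddata", f"{best_dir}/{base_lang}.traineddata",
        "--traineddata", f"{out_dir}/{lang}/{lang}.traineddata",
        "--max_iterations", str(max_iterations),
        "--train_listfile", f"{out_dir}/{lang}.training_files.txt",
        "--model_output", f"{out_dir}/newlstmmodel",
    ]


# Placeholder values taken from the synopsis; 400 is an arbitrary stand-in for NNN.
cmd = finetune_command("lang", "continue_from_lang",
                       "train_output_dir", "bestdata_dir", 400)
print(shlex.join(cmd))
# To actually start training (requires lstmtraining on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.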

OPTIONS
       '--debug_interval '
           How often to display the alignment. (type:int default:0)
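           # How often the alignment is displayed. Per the training wiki: 0 prints only
           # periodic progress reports, -1 prints verbose text debug for every iteration,
           # and a positive value opens visual debug windows at that interval.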

       '--net_mode '
           Controls network behavior. (type:int default:192)
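           # Network control mode, a set of bit flags. Per the training wiki:
           # 128 selects Adam optimization instead of plain momentum,
           # 64 lets each layer have its own learning rate (default 192 = 128 + 64).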
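
Since `--net_mode` is a bit field, the default of 192 can be decoded by testing individual bits. This is a small sketch under the assumption that the two flag values are 64 and 128, as listed on the training wiki; the constant and function names here are illustrative only.

```python
# Assumed flag values (from the training wiki's description of net_mode).
LAYER_SPECIFIC_LR = 64   # each layer may get its own learning rate
ADAM = 128               # use Adam instead of plain momentum


def describe_net_mode(net_mode):
    """Return the names of the known flags that are set in net_mode."""
    names = []
    if net_mode & LAYER_SPECIFIC_LR:
        names.append("layer-specific learning rates")
    if net_mode & ADAM:
        names.append("Adam")
    return names


default_flags = describe_net_mode(192)
print(default_flags)
```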

       '--perfect_sample_delay '
           How many imperfect samples between perfect ones. (type:int default:0)
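           # How many imperfect samples must be seen between perfect ones.
           # Per the training wiki: once the network gets good, a perfect sample is only
           # backpropagated after this many imperfect samples have been seen since the
           # last perfect sample that was let through.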
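
One way to picture `--perfect_sample_delay` is as a gate on which samples reach backpropagation. The sketch below only illustrates the option's description; it is not Tesseract's code, and the function name and the exact counting rule are assumptions.

```python
def samples_used_for_backprop(is_perfect, perfect_sample_delay):
    """Return indices of samples that pass the (assumed) perfect-sample gate.

    is_perfect: list of booleans, True where the network already got the
    sample completely right.
    """
    used = []
    imperfect_seen = 0  # imperfect samples since the last perfect one let through
    for index, perfect in enumerate(is_perfect):
        if not perfect:
            imperfect_seen += 1
            used.append(index)
        elif imperfect_seen >= perfect_sample_delay:
            used.append(index)
            imperfect_seen = 0
        # otherwise the perfect sample is skipped
    return used


stream = [True, False, True, True, False, False, True]
with_delay = samples_used_for_backprop(stream, 2)
no_delay = samples_used_for_backprop(stream, 0)
print(with_delay, no_delay)
```

With the default of 0 every sample is used; a larger value keeps the trainer from spending most of its updates on samples it already recognizes perfectly.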

       '--max_image_MB '
           Max memory to use for images. (type:int default:6000)
           # Maximum memory (in MB) to use for caching images

       '--append_index '
           Index in continue_from Network at which to attach the new network defined by net_spec
           (type:int default:-1)
           # Index in the continue_from network at which the net_spec network is attached

       '--max_iterations '
           If set, exit after this many iterations (type:int default:0)
           # Exit once this many iterations have been completed

       '--target_error_rate '
           Final error rate in percent. (type:double default:0.01)
           # Target error rate; training exits once it is reached
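
Taken together, `--max_iterations` and `--target_error_rate` define when training ends. The following sketch expresses that stopping rule as described by the two options; the function name is hypothetical, and treating `0` as "no iteration limit" follows the "If set" wording above.

```python
def should_continue(best_error_rate, iteration,
                    target_error_rate=0.01, max_iterations=0):
    """Return True while training should keep going.

    best_error_rate and target_error_rate are in percent.
    max_iterations == 0 is treated as "no iteration limit".
    """
    under_limit = max_iterations == 0 or iteration < max_iterations
    return best_error_rate > target_error_rate and under_limit


print(should_continue(5.0, 100))                       # no limit, error still high
print(should_continue(5.0, 400, max_iterations=400))   # iteration limit reached
print(should_continue(0.005, 10))                      # target error rate reached
```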

       '--weight_range '
           Range of initial random weights. (type:double default:0.1)
           # Range of the initial random weights

       '--learning_rate '
           Weight factor for new deltas. (type:double default:0.001)
           # Learning rate

       '--momentum '
           Decay factor for repeating deltas. (type:double default:0.5)
           # Momentum factor (for the momentum-based gradient descent update)

       '--adam_beta '
           Decay factor for repeating deltas. (type:double default:0.999)
           # Decay factor for the squared gradients in the Adam algorithm
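
To show where `--learning_rate`, `--momentum` and `--adam_beta` would enter a weight update, here are the textbook momentum and Adam steps for a single scalar weight. This is a generic sketch, not Tesseract's implementation; in particular, mapping `--momentum` to Adam's first decay factor and `--adam_beta` to the second is an assumption, as are the function names and `eps`.

```python
import math


def momentum_step(w, grad, velocity, learning_rate=0.001, momentum=0.5):
    """Classic momentum update: previous deltas decay by `momentum`."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity


def adam_step(w, grad, m, v, t, learning_rate=0.001,
              momentum=0.5, adam_beta=0.999, eps=1e-8):
    """Textbook Adam update at step t (t starts at 1)."""
    m = momentum * m + (1 - momentum) * grad          # smoothed gradient
    v = adam_beta * v + (1 - adam_beta) * grad ** 2   # smoothed squared gradient
    m_hat = m / (1 - momentum ** t)                   # bias correction
    v_hat = v / (1 - adam_beta ** t)
    return w - learning_rate * m_hat / (math.sqrt(v_hat) + eps), m, v


w_mom, vel = momentum_step(1.0, 2.0, 0.0)
w_adam, m1, v1 = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
print(w_mom, w_adam)
```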

       '--stop_training '
           Just convert the training model to a runtime model. (type:bool default:false)
           # Converts the checkpoint given by --continue_from into a recognition model (traineddata file)

       '--convert_to_int '
           Convert the recognition model to an integer model. (type:bool default:false)
           # Converts the recognition model to an 8-bit integer model,
           # trading some accuracy for higher speed
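
A typical use of `--stop_training`, optionally combined with `--convert_to_int`, is to turn a training checkpoint into a traineddata file. The sketch below builds such a call; the helper name is hypothetical, and the checkpoint and output paths are assumptions based on the placeholder names in the synopsis.

```python
def export_command(checkpoint, starter_traineddata, output, convert_to_int=False):
    """Build the argument list that converts a checkpoint to a traineddata file."""
    cmd = [
        "lstmtraining",
        "--stop_training",
        "--continue_from", checkpoint,
        "--traineddata", starter_traineddata,
        "--model_output", output,
    ]
    if convert_to_int:
        cmd.append("--convert_to_int")  # smaller, faster, slightly less accurate
    return cmd


# Assumed paths: a checkpoint written under the synopsis' --model_output basename.
float_cmd = export_command("train_output_dir/newlstmmodel_checkpoint",
                           "train_output_dir/lang/lang.traineddata",
                           "train_output_dir/lang.traineddata")
int_cmd = export_command("train_output_dir/newlstmmodel_checkpoint",
                         "train_output_dir/lang/lang.traineddata",
                         "train_output_dir/lang_int.traineddata",
                         convert_to_int=True)
print(" ".join(int_cmd))
```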

       '--sequential_training '
           Use the training files sequentially instead of round-robin. (type:bool default:false)
           # Whether the training files are used round-robin or sequentially
           # false (default): round-robin
           # true: sequential
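
The difference between the two modes is the order in which samples from several training files are visited. The sketch below illustrates only that ordering idea with made-up sample names; how the trainer handles files of unequal length is not described here, so skipping exhausted files is an assumption.

```python
from itertools import chain, zip_longest


def sequential(files):
    """All samples of the first file, then all of the second, and so on."""
    return list(chain.from_iterable(files))


def round_robin(files):
    """One sample from each file in turn; exhausted files are skipped."""
    missing = object()
    return [sample
            for group in zip_longest(*files, fillvalue=missing)
            for sample in group
            if sample is not missing]


# Hypothetical sample names: three training files with 3, 1 and 2 samples.
files = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]
seq_order = sequential(files)
rr_order = round_robin(files)
print(seq_order)
print(rr_order)
```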

       '--debug_network '
           Get info on distribution of weight values (type:bool default:false)

       '--randomly_rotate '
           Train OSD and randomly turn training samples upside-down (type:bool default:false)

       '--net_spec '
           Network specification (type:string default:)
           # Specifies the topology of the network

       '--continue_from '
           Existing model to extend (type:string default:)
           # The existing model to extend
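
`--append_index`, `--net_spec` and `--continue_from` work together when a layer is replaced: the existing network is cut at the given index and the new layers from the specification are attached there. The sketch below builds such a call. The helper name is hypothetical, and the default values `5` and `[Lfx256 O1c111]` are illustrative, modeled on the layer-replacement example in the training wiki; suitable values depend on the network being extended.

```python
def replace_layers_command(continue_from, traineddata, train_listfile,
                           model_output, append_index=5,
                           net_spec="[Lfx256 O1c111]"):
    """Build the argument list for cutting a network and appending new layers."""
    return [
        "lstmtraining",
        "--continue_from", continue_from,
        "--traineddata", traineddata,
        "--append_index", str(append_index),
        "--net_spec", net_spec,  # passed as one argument, so no shell quoting needed
        "--train_listfile", train_listfile,
        "--model_output", model_output,
    ]


# Placeholder paths in the style of the synopsis.
layer_cmd = replace_layers_command(
    "train_output_dir/continue_from_lang.lstm",
    "train_output_dir/lang/lang.traineddata",
    "train_output_dir/lang.training_files.txt",
    "train_output_dir/newlstmmodel",
)
print(layer_cmd)
```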

       '--model_output '
           Basename for output models (type:string default:lstmtrain)
           # Base name of the output models

       '--train_listfile '
           File listing training files in lstmf training format. (type:string default:)
           # Text file listing the training files (<font>.lstmf)

       '--eval_listfile '
           File listing eval files in lstmf training format. (type:string default:)
           # Text file listing the evaluation files (<font>.lstmf)
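
The list files passed to `--train_listfile` and `--eval_listfile` can be generated from a directory of lstmf files. This sketch assumes the list format is one file path per line; the helper name and the file names in the demo are made up.

```python
import tempfile
from pathlib import Path


def write_listfile(lstmf_dir, listfile):
    """Write the paths of all *.lstmf files in lstmf_dir, one per line."""
    paths = sorted(str(p) for p in Path(lstmf_dir).glob("*.lstmf"))
    Path(listfile).write_text("".join(f"{p}\n" for p in paths))
    return paths


# Demo in a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("font_b.exp0.lstmf", "font_a.exp0.lstmf", "notes.txt"):
        (Path(tmp) / name).touch()
    listfile = Path(tmp) / "lang.training_files.txt"
    listed = write_listfile(tmp, listfile)
    names = [Path(p).name for p in listed]
    line_count = len(listfile.read_text().splitlines())
    print(names)
```

Holding back part of the files for a separate evaluation list lets `--eval_listfile` measure the error rate on data the network was not trained on.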

       '--traineddata '
           Starter traineddata with combined Dawgs/Unicharset/Recoder for language model
           (type:string default:)
           # The traineddata file produced by combine_lang_model, which combines the
           # language model's Dawgs/unicharset/recoder

       '--old_traineddata '
           When changing the character set, this specifies the traineddata with the old character
           set that is to be replaced (type:string default:)
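           # When the character set changes (e.g. through a different language.training_text),
           # this specifies the traineddata file with the old character set that is to be replaced

HISTORY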
       lstmtraining(1) was first made available for tesseract 4.00.00alpha.

RESOURCES
       Main web site: https://github.com/tesseract-ocr
       Information on training tesseract LSTM:
       https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

SEE ALSO
       tesseract(1)

COPYING
       Copyright (C) 2012 Google, Inc. Licensed under the Apache License, Version 2.0

AUTHOR
       The Tesseract OCR engine was written by Ray Smith and his research groups at Hewlett
       Packard (1985-1995) and Google (2006-present).
