Tesseract Training

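1. Download jTessBoxEditor (the JRE used here is JRE 7). In TIFF/Box Generator, add the commonly used Chinese characters in the SimSun (宋体) font, set the output name to zhong.chi_sim.exp0.tif, and click Generate. This produces two files: zhong.chi_sim.exp0.tif and zhong.chi_sim.exp0.box.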
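The generated names follow the Tesseract 3.x training convention `[lang].[fontname].exp[num]`, so `zhong` is the language name (later passed to `-l`) and `chi_sim` is the font name that must appear in font_properties. A minimal Python sketch that splits such a name into its parts; the helper `parse_training_name` is hypothetical and only illustrates the convention.

```python
import re
from typing import NamedTuple


class TrainingName(NamedTuple):
    lang: str   # language name, e.g. "zhong"
    font: str   # font name, e.g. "chi_sim"
    exp: int    # experiment/page-set number, e.g. 0
    ext: str    # file extension, e.g. "tif" or "box"


# [lang].[fontname].exp[num].[ext]
_NAME_RE = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<exp>\d+)\.(?P<ext>\w+)$"
)


def parse_training_name(filename: str) -> TrainingName:
    """Split a training file name into language, font, exp number and extension."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a [lang].[fontname].exp[num] name: {filename!r}")
    return TrainingName(
        match.group("lang"),
        match.group("font"),
        int(match.group("exp")),
        match.group("ext"),
    )
```

For the files above, `parse_training_name("zhong.chi_sim.exp0.box").font` gives `chi_sim`, which is the name to list in font_properties.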

2. Create a file named font_properties with the content: chi_sim 0 0 0 0 0
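Each line of font_properties has the form `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, where each flag is 0 or 1. The sketch below writes such a file from Python; the function names `font_properties_line` and `write_font_properties` are made up for illustration and are not part of Tesseract.

```python
from pathlib import Path
from typing import Iterable


def font_properties_line(font: str, italic: bool = False, bold: bool = False,
                         fixed: bool = False, serif: bool = False,
                         fraktur: bool = False) -> str:
    """Build one font_properties entry: the font name followed by five 0/1 flags."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([font] + [str(int(flag)) for flag in flags])


def write_font_properties(directory: str, fonts: Iterable[str]) -> Path:
    """Write a font_properties file with all flags set to 0 for each font."""
    path = Path(directory) / "font_properties"
    path.write_text(
        "".join(font_properties_line(font) + "\n" for font in fonts),
        encoding="utf-8",
    )
    return path
```

Calling `write_font_properties(".", ["chi_sim"])` produces the single line used in this step.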

3. Create a batch file named start.bat with the following content:

rem Create the font_properties file in this directory before running this batch file

echo Run Tesseract for Training..
D:\app\Tesseract-OCR\tesseract.exe zhong.chi_sim.exp0.tif zhong.chi_sim.exp0 nobatch box.train

echo Compute the Character Set..
D:\app\Tesseract-OCR\unicharset_extractor.exe zhong.chi_sim.exp0.box
D:\app\Tesseract-OCR\mftraining.exe -F font_properties -U unicharset -O zhong.unicharset zhong.chi_sim.exp0.tr

echo Clustering..
D:\app\Tesseract-OCR\cntraining.exe zhong.chi_sim.exp0.tr

echo Rename Files..
rename normproto zhong.normproto
rename inttemp zhong.inttemp
rename pffmtable zhong.pffmtable
rename shapetable zhong.shapetable

echo Create Tessdata..
D:\app\Tesseract-OCR\combine_tessdata.exe zhong.

pause
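The batch file keeps going even if one of the tools fails. As an alternative, here is a rough Python sketch of the same sequence that stops at the first failing step. It assumes the same install directory and file names as the batch file; the function names are hypothetical, and `run_training` only works on a machine where those Tesseract 3.x tools are installed.

```python
import subprocess
from pathlib import Path, PureWindowsPath
from typing import List


def training_commands(tess_dir: str, lang: str, font: str, exp: int = 0) -> List[List[str]]:
    """Commands that run before the rename step, in the order used by start.bat."""
    base = f"{lang}.{font}.exp{exp}"

    def tool(name: str) -> str:
        # PureWindowsPath keeps backslash separators regardless of the host OS.
        return str(PureWindowsPath(tess_dir) / name)

    return [
        [tool("tesseract.exe"), f"{base}.tif", base, "nobatch", "box.train"],
        [tool("unicharset_extractor.exe"), f"{base}.box"],
        [tool("mftraining.exe"), "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        [tool("cntraining.exe"), f"{base}.tr"],
    ]


def combine_command(tess_dir: str, lang: str) -> List[str]:
    """Final step; note the trailing dot after the language name."""
    return [str(PureWindowsPath(tess_dir) / "combine_tessdata.exe"), f"{lang}."]


def rename_outputs(workdir: str, lang: str) -> None:
    """Prefix the four generated files with the language name."""
    for name in ("normproto", "inttemp", "pffmtable", "shapetable"):
        (Path(workdir) / name).rename(Path(workdir) / f"{lang}.{name}")


def run_training(tess_dir: str, workdir: str, lang: str, font: str, exp: int = 0) -> None:
    """Run every step in workdir; check=True raises as soon as a tool fails."""
    for cmd in training_commands(tess_dir, lang, font, exp):
        subprocess.run(cmd, cwd=workdir, check=True)
    rename_outputs(workdir, lang)
    subprocess.run(combine_command(tess_dir, lang), cwd=workdir, check=True)
```

With the values from this post, the call would be `run_training(r"D:\app\Tesseract-OCR", ".", "zhong", "chi_sim")`.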

 

4. Run start.bat and wait for the command-line output. If the offsets for types 1, 3, 4, 5 and 13 are not -1, the training succeeded:

TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 140
Offset for type 2 is -1
Offset for type 3 is 509098
Offset for type 4 is 42657207
Offset for type 5 is 42726936
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
Offset for type 13 is 43579530
Offset for type 14 is -1
Offset for type 15 is -1
Offset for type 16 is -1
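Types 1, 3, 4, 5 and 13 appear to correspond to the unicharset, inttemp, pffmtable, normproto and shapetable files prepared by the batch file, which is why those five offsets must not be -1. The check can be automated with a small Python sketch that parses the captured output; the function names are hypothetical, and the line format is taken from the output shown above.

```python
import re
from typing import Dict, Iterable

# Component types that must be present after training (from step 4).
REQUIRED_TYPES = (1, 3, 4, 5, 13)

_OFFSET_RE = re.compile(r"Offset for type\s+(\d+)\s+is\s+(-?\d+)")


def parse_offsets(log: str) -> Dict[int, int]:
    """Map each component type number to its reported offset."""
    return {int(kind): int(offset) for kind, offset in _OFFSET_RE.findall(log)}


def training_succeeded(log: str, required: Iterable[int] = REQUIRED_TYPES) -> bool:
    """True if every required type is listed with an offset other than -1."""
    offsets = parse_offsets(log)
    return all(offsets.get(kind, -1) != -1 for kind in required)
```

A type that is missing from the log is treated the same as an offset of -1, so truncated output counts as a failure.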



5. The run generates zhong.traineddata; copy it into Tesseract's tessdata folder.
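The copy can also be scripted. The sketch below is one possible way to do it in Python; the helper name `install_traineddata` is hypothetical, and the tessdata location is an assumption (for the install path used in the batch file it would presumably be D:\app\Tesseract-OCR\tessdata).

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata: str, tessdata_dir: str) -> Path:
    """Copy a .traineddata file into the tessdata folder and return the new path."""
    source = Path(traineddata)
    if source.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {source.name!r}")
    target_dir = Path(tessdata_dir)
    if not target_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {target_dir}")
    # copy2 keeps the file's timestamps along with its content.
    return Path(shutil.copy2(str(source), str(target_dir / source.name)))
```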



6. Run the command tesseract.exe E:\temp\image\y.jpg E:\temp\image\y -l zhong; the recognition result is written to y.txt.
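Tesseract appends .txt to the output base name, which is why the result ends up in y.txt. A minimal Python sketch that runs the same command and reads the result back; the function names are hypothetical, it assumes the output file is UTF-8 text, and `run_ocr` needs Tesseract installed with zhong.traineddata in place.

```python
import subprocess
from pathlib import Path
from typing import List


def ocr_command(image: str, output_base: str, lang: str,
                tesseract: str = "tesseract.exe") -> List[str]:
    """Build the command from step 6: tesseract <image> <output base> -l <lang>."""
    return [tesseract, image, output_base, "-l", lang]


def run_ocr(image: str, output_base: str, lang: str,
            tesseract: str = "tesseract.exe") -> str:
    """Run Tesseract and return the recognized text from <output base>.txt."""
    subprocess.run(ocr_command(image, output_base, lang, tesseract), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

For the example in this step, the call would be `run_ocr(r"E:\temp\image\y.jpg", r"E:\temp\image\y", "zhong")`.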

 
