Fixing "Failed loading language 'chi_sim' Tesseract couldn't load any languages" in a conda environment

1 Problem description

The following code uses Tesseract to recognize Chinese characters in an image:

import pytesseract
from PIL import Image

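# Path to the test image containing Chinese text
image_path = '../data/ocr_chinest.jpg'
# lang='chi_sim' selects the Simplified Chinese model (chi_sim.traineddata)
result = pytesseract.image_to_string(Image.open(image_path), lang='chi_sim')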
print(result)

Running the program produces the following error:

Traceback (most recent call last):
  File "D:\code\ptcontainer\ocr\tesseract_test.py", line 14, in <module>
    result = pytesseract.image_to_string(Image.open(image_path), lang='chi_sim')
  File "C:\Users\lishu\anaconda3\envs\pt2\lib\site-packages\pytesseract\pytesseract.py", line 423, in image_to_string
    return {
  File "C:\Users\lishu\anaconda3\envs\pt2\lib\site-packages\pytesseract\pytesseract.py", line 426, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\lishu\anaconda3\envs\pt2\lib\site-packages\pytesseract\pytesseract.py", line 288, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\lishu\anaconda3\envs\pt2\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file D:\\Tesseract-OCR/tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

2 Problem analysis

The error message shows that Tesseract cannot find the chi_sim.traineddata file in the tessdata directory under its installation directory:

Error opening data file D:\\Tesseract-OCR/tessdata/chi_sim.traineddata

Browsing to that directory confirms that the file is indeed missing.
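The same check can be done from Python instead of a file browser. The following is a minimal sketch, not part of the original post: the helper names `find_traineddata` and `list_languages` are hypothetical, and the directory you pass in is assumed to be the tessdata folder named in the error message.

```python
from pathlib import Path


def find_traineddata(lang, tessdata_dir):
    """Return the path to <lang>.traineddata if it exists in tessdata_dir, else None."""
    candidate = Path(tessdata_dir) / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


def list_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory simply means no languages are available.
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))
```

For the installation in this post, `find_traineddata("chi_sim", r"D:\Tesseract-OCR\tessdata")` would return `None` before the fix, and `list_languages` shows which languages are actually installed.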

3 Solution

Download the model from: https://github.com/tesseract-ocr/tessdata/blob/main/chi_sim.traineddata

Save it to the tessdata directory and run the program again; it now executes successfully.
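The download and copy step can also be scripted. This is only a sketch under stated assumptions: the function name `download_traineddata` is hypothetical, and `BASE_URL` assumes that the repository linked above serves the raw file under `/raw/main/` rather than the `/blob/main/` page; adjust it if that does not hold for your setup.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumption: raw-file base URL for the tessdata repository linked above.
BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/main"


def download_traineddata(lang, tessdata_dir, base_url=BASE_URL):
    """Download <lang>.traineddata into tessdata_dir and return the saved path."""
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    url = f"{base_url}/{lang}.traineddata"
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated .traineddata file in place.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

For the installation in this post, the call would be `download_traineddata("chi_sim", r"D:\Tesseract-OCR\tessdata")`. Writing into the installation directory may require sufficient file permissions.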
