Extracting text from images with Python: how to extract text or numbers from an image using Python

I want to extract text (mainly numbers) from images like this one.

I tried this code:

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

img = Image.open('1.jpg')
text = pytesseract.image_to_string(img, lang='eng')
print(text)

But all I get is this:

(hE PPAR)

Solution

When performing OCR, it is important to preprocess the image so that the text you want to detect is black on a white background. A simple way to do this is to apply Otsu's threshold with OpenCV, which produces a binary image. Here's the image after preprocessing:

We use the --psm 6 configuration setting to treat the image as a uniform block of text. There are other page segmentation modes you can try if this one doesn't suit your image.

Result from Pytesseract:

01153521976
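Since the expected output here is digits only, it may also be worth trying a few page segmentation modes and restricting the character set. The sketch below is only an illustration: build_tesseract_config and ocr_with_modes are hypothetical helper names, not part of pytesseract, and whether tessedit_char_whitelist is honored depends on the Tesseract version and engine mode, so treat it as something to experiment with rather than a guaranteed fix.

```python
def build_tesseract_config(psm=6, digits_only=False):
    """Build a config string for pytesseract (hypothetical helper)."""
    parts = [f"--psm {psm}"]
    if digits_only:
        # Ask Tesseract to consider only the characters 0-9
        parts.append("-c tessedit_char_whitelist=0123456789")
    return " ".join(parts)


def ocr_with_modes(image, psm_modes=(6, 7, 13), digits_only=True):
    """Run OCR once per page segmentation mode and collect the results.

    6 = uniform block of text, 7 = single text line, 13 = raw line.
    `image` is assumed to be an already preprocessed (binary) image.
    """
    # Imported here so the config helper can be used without Tesseract installed
    import pytesseract

    results = {}
    for psm in psm_modes:
        config = build_tesseract_config(psm, digits_only)
        text = pytesseract.image_to_string(image, lang="eng", config=config)
        results[psm] = text.strip()
    return results


if __name__ == "__main__":
    # Print the config strings that would be passed to Tesseract
    for mode in (6, 7, 13):
        print(build_tesseract_config(mode, digits_only=True))
```

Comparing the strings returned for each mode on your own preprocessed image is usually the quickest way to see which setting fits.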

Code

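import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale (flag 0), then apply an inverted Otsu threshold
image = cv2.imread('1.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# --psm 6: treat the image as a single uniform block of text
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

# Show the thresholded image; press any key to close the window
cv2.imshow('thresh', thresh)
cv2.waitKey()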
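For intuition about what the cv2.THRESH_OTSU flag in the code above does, here is a rough pure-Python sketch of Otsu's method on a flat list of 8-bit pixel values. It is not OpenCV's implementation (tie-breaking and speed will differ), and otsu_threshold and binarize_inverted are hypothetical names used only for illustration. The idea is to try every cutoff and keep the one that best separates the dark and bright pixels, measured by between-class variance.

```python
def otsu_threshold(pixels):
    """Return the cutoff t (0-255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count in the dark class so far
    sum0 = 0  # intensity sum in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, nothing to separate
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize_inverted(pixels, t):
    """Mimic THRESH_BINARY_INV: values above t become 0, the rest 255."""
    return [0 if p > t else 255 for p in pixels]


if __name__ == "__main__":
    # Toy "image": a few dark pixels and a few bright pixels
    sample = [20, 25, 30, 22, 200, 210, 220, 205]
    t = otsu_threshold(sample)
    print(t, binarize_inverted(sample, t))
```

The inverted variant swaps the two classes, which is useful when the source image has light text on a dark background and you want black text on white for Tesseract.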
