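This uses the third-party library pytesseract together with PIL (Pillow):
pip install pytesseract
Recognition works reasonably well on simple images without noise or interference.
pytesseract is only a wrapper, so the Tesseract-OCR engine (tesseract-ocr) must be installed as well; recognizing Chinese also needs its Chinese language data (for example chi_sim, used further down).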
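from io import BytesIO
from PIL import Image
import pytesseract

# s is an existing requests session and imgurl is the URL of the captcha image
imgbuf = s.get(imgurl).content
# Wrap the downloaded bytes in an in-memory file so PIL can open them
img = Image.open(BytesIO(imgbuf))
img.show()
vercode = pytesseract.image_to_string(img)
print("Verification Code:", vercode)
# Fallback: type the code in by hand
# vercode = input("Verification Code:")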
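The string returned by image_to_string usually carries trailing whitespace or newlines, and OCR may insert stray spaces between characters. A minimal sketch of cleaning it up, assuming the verification code consists only of ASCII letters and digits (the helper name `clean_vercode` is hypothetical and not part of pytesseract):

```python
import re


def clean_vercode(raw):
    """Keep only ASCII letters and digits from raw OCR output."""
    # Assumption: the captcha contains nothing but A-Z, a-z and 0-9.
    return re.sub(r"[^0-9A-Za-z]", "", raw)


print(clean_vercode(" aB3 d\n\x0c"))
```

If the site's codes can contain other characters, the pattern has to be widened accordingly.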
Quickstart: https://pypi.org/project/pytesseract/
**Quickstart**
.. code-block:: python

    try:
        import Image
    except ImportError:
        from PIL import Image
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
    # Include the above line, if you don't have tesseract executable in your PATH
    # Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'

    # Simple image to string
    print(pytesseract.image_to_string(Image.open('test.png')))

    # French text image to string
    print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))

    # Get bounding box estimates
    print(pytesseract.image_to_boxes(Image.open('test.png')))

    # Get verbose data including boxes, confidences, line and page numbers
    print(pytesseract.image_to_data(Image.open('test.png')))

    # Get information about orientation and script detection
    print(pytesseract.image_to_osd(Image.open('test.png')))

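By default image_to_boxes returns plain text with one recognized character per line. A minimal sketch of turning that text into tuples, assuming the usual Tesseract box-file layout `char left bottom right top page`; the function name `parse_boxes` and the sample string are illustrative, not real OCR output:

```python
def parse_boxes(box_text):
    """Parse box-file text into (char, left, bottom, right, top, page) tuples."""
    boxes = []
    for line in box_text.splitlines():
        # Split from the right so the character itself is never split.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, *nums = parts
        left, bottom, right, top, page = (int(n) for n in nums)
        boxes.append((char, left, bottom, right, top, page))
    return boxes


# Hand-written sample in the assumed layout
sample = "A 10 20 30 40 0\nB 35 20 55 40 0\n"
for box in parse_boxes(sample):
    print(box)
```

In this layout the coordinates are measured from the bottom-left corner of the image, which matters when drawing the boxes with a library whose origin is the top-left corner.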
Support for OpenCV image/NumPy array objects
.. code-block:: python

    import cv2

    # Note: cv2.imread returns the image as a NumPy array in BGR channel order
    img = cv2.imread('/**path_to_image**/digits.png')
    print(pytesseract.image_to_string(img))
    # OR explicit beforehand converting
    print(pytesseract.image_to_string(Image.fromarray(img)))

Add the following config if you get a tessdata error such as "Error opening data file...":
.. code-block:: python

    tessdata_dir_config = '--tessdata-dir "<replace_with_your_tessdata_dir_path>"'
    # Example config: '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    # It's important to add double quotes around the dir path.
    pytesseract.image_to_string(image, lang='chi_sim', config=tessdata_dir_config)

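Because the double quotes around the directory are easy to forget, the config string can be built by a small helper. This is only a sketch: `tessdata_config` is a hypothetical name, and the path is the example path from the comment above.

```python
def tessdata_config(tessdata_dir):
    """Build a --tessdata-dir config string with the path in double quotes."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)


config = tessdata_config(r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
print(config)
```

The result can then be passed as the `config` argument of `image_to_string`.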
**Functions**
* **get_tesseract_version** Returns the Tesseract version installed in the system.
* **image_to_string** Returns the result of a Tesseract OCR run on the image as a string
* **image_to_boxes** Returns result containing recognized characters and their box boundaries
* **image_to_data** Returns result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the Tesseract TSV documentation.
* **image_to_osd** Returns result containing information about orientation and script detection.
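With the default string output, image_to_data returns tab-separated text with a header row. A hedged sketch of reading it with the standard csv module, assuming the usual Tesseract TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text); `parse_tsv` and the sample text are illustrative, not real OCR output:

```python
import csv
import io


def parse_tsv(tsv_text, min_conf=0.0):
    """Return (text, conf) pairs for non-empty words with conf >= min_conf."""
    # QUOTE_NONE: recognized text may contain quote characters.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        try:
            conf = float(row.get("conf", "-1"))
        except (TypeError, ValueError):
            continue  # skip rows without a usable confidence
        if text and conf >= min_conf:
            words.append((text, conf))
    return words


# Hand-written sample: one page-level row (conf -1, no text) and two words
sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t200\t50\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t12\t60\t20\t96\tHello\n"
    "5\t1\t1\t1\t1\t2\t80\t12\t70\t20\t41\tw0rld\n"
)
print(parse_tsv(sample, min_conf=60))
```

Filtering on the confidence column is a simple way to drop the low-quality guesses that noisy images tend to produce.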
**Parameters**
``image_to_data(image, lang=None, config='', nice=0, output_type=Output.STRING)``
* **image** Object, PIL Image/NumPy array of the image to be processed by Tesseract
* **lang** String, Tesseract language code string
* **config** String, Any additional configurations as a string, ex: ``config='--psm 6'``
* **nice** Integer, modifies the processor priority for the Tesseract run. Not supported on Windows. Nice adjusts the niceness of unix-like processes.
* **output_type** Class attribute, specifies the type of the output, defaults to ``string``. For the full list of all supported types, please check the definition of the ``pytesseract.Output`` class.
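Putting the parameters together, the sketch below requests a dictionary instead of a string and keeps only confident words. It assumes that `Output.DICT` yields one list per TSV column (for example `data["text"]` and `data["conf"]`) and that an image such as `test.png` exists; `confident_words` and `ocr_words` are hypothetical helper names, and the threshold of 60 is an arbitrary choice.

```python
def confident_words(data, min_conf=60.0):
    """Pick words with confidence >= min_conf from a dict of column lists."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # conf may arrive as int, float or str, so normalise it first
        if str(text).strip() and float(conf) >= min_conf:
            words.append(str(text).strip())
    return words


def ocr_words(image_path, min_conf=60.0):
    """Run Tesseract on image_path and return the confident words."""
    # Imported here so confident_words stays usable without Tesseract installed.
    from PIL import Image
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(
        Image.open(image_path),
        config="--psm 6",
        output_type=Output.DICT,
    )
    return confident_words(data, min_conf)


# Usage (requires Tesseract and an image file):
# print(ocr_words("test.png"))
```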