Further reading: [Tesseract configuration notes 1](http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version)
Further reading: [Tesseract configuration notes 2](https://stackoverflow.com/questions/13007245/how-to-find-parameters-supported-in-tesseract-ocr-config-file)
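This article covers two points:
1. Converting the downloaded image into stream data, so it can be opened directly in memory: `image = BytesIO(response.content)`
2. Passing Tesseract options through the `config` argument: `config="--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789"`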
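The `config` string is simply Tesseract command-line options joined by spaces. The sketch below assembles it from its parts to make that structure explicit. `build_tesseract_config` is a hypothetical helper, not part of pytesseract; the mode meanings in the comments follow the lists printed by `tesseract --help-psm` and `tesseract --help-oem` in Tesseract 4.x.

```python
def build_tesseract_config(psm=6, oem=3, whitelist=None):
    """Hypothetical helper: build a Tesseract config string.

    psm: page segmentation mode (6 = assume a single uniform block of text)
    oem: OCR engine mode (3 = default, based on what is available)
    whitelist: optional string of the only characters Tesseract may output
    """
    parts = ["--psm {}".format(psm), "--oem {}".format(oem)]
    if whitelist:
        # "-c NAME=VALUE" sets a Tesseract config variable for this run
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)


config = build_tesseract_config(psm=6, oem=3, whitelist="0123456789")
print(config)  # --psm 6 --oem 3 -c tessedit_char_whitelist=0123456789
```

The result is the same string the article passes to `image_to_string`. A whitelist containing spaces would need quoting, since the config string is split into separate arguments.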
Here is the complete code:
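```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from io import BytesIO

import requests
import pytesseract
from PIL import Image

# Browser-like User-Agent so the request looks like it comes from Chrome
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36"
}
url = "http://static8.ziroom.com/phoenix/pc/images/price/e72ac241b410eac63a652dc1349521fd.png"

response = requests.get(url=url, headers=headers)
response.raise_for_status()  # stop on HTTP errors instead of OCR-ing an error page

# Optional: keep a copy of the image on disk for inspection
with open("test.png", "wb") as f:
    f.write(response.content)

# Wrap the raw bytes in an in-memory stream; Image.open accepts file-like objects
image = BytesIO(response.content)
im = Image.open(image)

# --psm 6: single uniform block of text; --oem 3: default engine;
# -c tessedit_char_whitelist: only digits may be recognized
text = pytesseract.image_to_string(
    im,
    lang="eng",
    config="--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789",
)
print(text)
```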
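The raw string returned by `image_to_string` may still carry whitespace, and with recent Tesseract versions often a trailing form-feed page separator (`\x0c`). A minimal post-processing sketch, assuming only the digits are wanted; `extract_digits` is a hypothetical helper, not part of pytesseract:

```python
import re


def extract_digits(ocr_text):
    """Hypothetical helper: keep only ASCII digits from raw OCR output."""
    # [0-9] rather than \d, because \d also matches non-ASCII Unicode digits;
    # this drops spaces, newlines and the trailing form feed, if any
    return "".join(re.findall(r"[0-9]", ocr_text))


print(extract_digits("1 2 3 4\n\x0c"))  # 1234
```

In the script above, this would be used as `print(extract_digits(text))`. The whitelist already limits recognized characters to digits, so the filter mainly removes layout whitespace.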