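Out of curiosity and for academic research, I tried to crack the CAPTCHA used by JD ("某东"). I read through a lot of other people's blog posts and material, and in the end it more or less worked.
Looking at the page source, we find that the images are embedded in the page as base64-encoded data. Other sites may serve the images as plain links instead. In the base64 case we first have to decode the data and then write the result to a file.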
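To make the decoding step concrete, here is a minimal sketch that uses only the standard library. The helper name `decode_data_uri` and the sample string are made up for illustration; it assumes the `src` attribute has the usual `data:<mime>;base64,<payload>` shape.

```python
import base64


def decode_data_uri(data_uri):
    """Split a base64 data URI into its MIME type and the decoded bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):].split(";")[0]
    return mime_type, base64.b64decode(payload)


# Hypothetical sample: only the 8-byte PNG signature, encoded as a data URI.
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
mime_type, raw = decode_data_uri(sample)
print(mime_type, len(raw))
```

Checking the header first gives a clear error when a site serves an ordinary image link instead of inline data.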
Code for downloading the images:
def pic_download(url, name):
    """Save an image to ../img_db/<name>.png and return the file path."""
    root = "../img_db/"
    # path = root + str(time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())) + '.png'
    path = root + name + '.png'
    try:
        if not os.path.exists(root):
            os.mkdir(root)
        if os.path.exists(path):
            os.remove(path)
        # If the image is given as a URL:
        # r = requests.get(url)
        # r.raise_for_status()
        # If the image is base64-encoded (a data URI):
        data = url.split(',')[1]
        img = base64.b64decode(data)
        # The with statement closes the file for us
        with open(path, "wb") as f:  # "wb" = write binary
            f.write(img)
        print(path)
        print("Download complete")
        return path
    except Exception as e:
        print("Download failed! " + str(e))
def get_distance(small_url, big_url):
    # Download the slider piece with pic_download above
    piece_path = pic_download(small_url, 'small')
    time.sleep(2)
    # Download the background image with pic_download above
    background_path = pic_download(big_url, 'big')
    # Compute the distance the piece has to travel
    piece = cv2.imread(piece_path, 0)
    background = cv2.imread(background_path, 0)
    w, h = piece.shape[::-1]
    # Invert the grayscale piece before matching
    piece = 255 - piece
    # matchTemplate takes the larger image first, then the template
    result = cv2.matchTemplate(background, piece, cv2.TM_CCOEFF_NORMED)
    # unravel_index returns (row, column), i.e. (top, left)
    top, left = np.unravel_index(result.argmax(), result.shape)
    # Gap position as (left, top, right, bottom)
    print((left, top, left + w, top + h))
    # Crop the matched region with PIL as a visual check
    image = Im.open(background_path)
    box = (left + 20, top + 20, left + w - 20, top + h - 20)
    imagecrop = image.crop(box)
    # Save the cropped gap
    imagecrop.save("../img_db/new_image.png")
    return left
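One detail in the function above is easy to trip over: the score map returned by `cv2.matchTemplate` is indexed as (row, column), so the horizontal offset is the second value that `np.unravel_index` returns, not the first. A small sketch with a hand-made score map (the helper name `best_match_position` is hypothetical) shows the ordering:

```python
import numpy as np


def best_match_position(scores):
    """Return (left, top) of the highest value in a 2-D score map."""
    top, left = np.unravel_index(scores.argmax(), scores.shape)
    return int(left), int(top)


# A 4-row by 6-column score map whose peak sits at row 1, column 5.
scores = np.zeros((4, 6))
scores[1, 5] = 1.0
left, top = best_match_position(scores)
print(left, top)  # prints "5 1"
```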
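One thing to note here: we have computed the position of the gap, but the image shown on the page is sized by the CSS layout, so its size differs from the size of the image we downloaded and wrote to disk. When moving the slider we therefore have to scale the distance by the ratio between the two.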
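The scaling itself is one line of arithmetic. The sketch below is an illustration only: `scale_distance` and its parameter names are made up, and the widths are example numbers. In practice the displayed width would come from the page (for example the `size["width"]` of the image's Selenium element) and the natural width from the saved file.

```python
def scale_distance(gap_x, natural_width, displayed_width):
    """Convert an offset measured on the saved image into on-page pixels."""
    if natural_width <= 0:
        raise ValueError("natural_width must be positive")
    return round(gap_x * displayed_width / natural_width)


# Example numbers: the saved image is 360 px wide, the page shows it at 278 px.
print(scale_distance(180, 360, 278))  # prints 139
```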
def move_mouse(browser, distance, element):
    remaining_dist = distance
    # distance += randint(-10, 10)
    # Press and hold the left mouse button on the slider
    ActionChains(browser).click_and_hold(element).perform()
    time.sleep(0.5)
    while remaining_dist > 0:
        ratio = remaining_dist / distance
        if ratio < 0.1:
            # Final stretch (less than 10% left): move slowly
            span = random.randint(3, 5)
        elif ratio > 0.9:
            # Start (more than 90% left): move slowly
            span = random.randint(5, 8)
        else:
            # Middle: move quickly
            span = random.randint(15, 20)
        ActionChains(browser).move_by_offset(span, random.randint(-5, 5)).perform()
        remaining_dist -= span
        time.sleep(random.randint(5, 20) / 100)
    # remaining_dist is now zero or negative; move back by any overshoot
    ActionChains(browser).move_by_offset(remaining_dist, random.randint(-5, 5)).perform()
    ActionChains(browser).release(on_element=element).perform()
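Geetest's CAPTCHA watches the drag itself and analyzes the movement trajectory. Our trajectory imitates a human: slow at first, then faster, then slowing down at the end. Based on my own testing, though, that alone is not enough, and several different trajectories need to be designed.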
import os
import random
import time
import base64
# import requests
import cv2
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import numpy as np
from PIL import Image as Im
Full code: https://github.com/onlyonedaniel/onlyone/blob/master/jd_test.py
Please credit the source when reposting. Comments and discussion are welcome.