Python爬虫-Cloudflare五秒盾-绕过TLS指纹

什么是TLS指纹

TLS指纹是一种用于识别和验证TLS(传输层安全)通信的技术。

TLS指纹可以通过检查TLS握手过程中使用的密码套件、协议版本和加密算法等信息来确定TLS通信的特征。由于每个TLS实现使用的密码套件、协议版本和加密算法不同,因此可以通过比较TLS指纹来判断通信是否来自预期的源或目标。

TLS指纹可以用于检测网络欺骗、中间人攻击、间谍活动等安全威胁,也可以用于识别和管理设备和应用程序。

简单来说,就是伪装ja3_text值,让其不被拦截即可,以修改支持的加密算法为主。

使用Python的curl_cffi库,主打的就是模拟各种指纹

pip install --upgrade curl_cffi

支持的模拟版本,由curl-impersonate支持

edge99 = "edge99"
edge101 = "edge101"
chrome99 = "chrome99"
chrome100 = "chrome100"
chrome101 = "chrome101"
chrome104 = "chrome104"
chrome107 = "chrome107"
chrome110 = "chrome110"
chrome99_android = "chrome99_android"
safari15_3 = "safari15_3"
safari15_5 = "safari15_5"

curl_cffi 测试请求

from curl_cffi import requests

# Notice the impersonate parameter
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110")

print(r.json())
# output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
# the js3n fingerprint should be the same as target browser

# http/socks proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)

proxies = {"https": "socks://localhost:3128"}
r = requests.get("https://tls.browserleaks.com/json", impersonate="chrome110", proxies=proxies)

以下由某文献网站爬取测试为例: https://onlinelibrary.wiley.com/

from curl_cffi import requests

url = "https://onlinelibrary.wiley.com/action/doSearch?AllField=ADC&sortBy=Earliest&startPage=0&pageSize=10"
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh,zh-CN;q=0.9",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
    "Referer": "https://onlinelibrary.wiley.com/",
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": "\"macOS\"",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36"
}
s = requests.Session()
response = s.get(url, impersonate="chrome110", headers=headers, verify=False)
if response.status_code == 200:
    result = response.content.decode()
    print(result)
else:
    print(response.status_code)

可以完整请求原始网页,成功打印绕过指纹验证。

分享来源:

GitHub - yifeikong/curl_cffi: Python binding for curl-impersonate via cffi. A http client that can impersonate browser tls/ja3/http2 fingerprints.

你可能感兴趣的:(爬虫,TLS指纹,python)