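Get the request URL ----> check the request method ----> a POST request always carries form data ----> copy and paste the form, leaving any values you don't understand blank for now,
for example salt and sign below.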
def send_request(self):
    form_data = {
        # 'i': '啊哈',
        'i': '',
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        # 'salt': '15932458043921',
        'salt': '',
        # 'sign': '24d1ac950b72ae268b1704034a5c172c',
        'sign': '',
        # 'ts': '1593245804392',  (timestamp)
        'ts': self.ts,
        # 'bv': '02a6ad4308a3443b3732d855273259bf',
        'bv': '',
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_CLICKBUTTION',
    }
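We still have to complete this form, so the blank values need to be worked out.
Look through the page for scripts that might be related, and copy them into PyCharm.
A small trick: when the code in PyCharm is messy, the shortcut Ctrl + Alt + L reformats it and makes it much tidier.
Even after copying it out, the code is essentially unreadable. We only want to find the parts we care about, so use ==Ctrl + L== to search for the key parameters in PyCharm and see whether the corresponding values can be worked out.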
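The same keyword search can also be done from Python once the script has been saved. The following is only a minimal sketch, not part of the original workflow: the helper name `find_keyword_lines` and the sample JavaScript text are made up for illustration.

```python
def find_keyword_lines(js_source, keywords):
    """Return (line_number, keyword, line) for every line that mentions a keyword."""
    hits = []
    for number, line in enumerate(js_source.splitlines(), start=1):
        for keyword in keywords:
            if keyword in line:
                hits.append((number, keyword, line.strip()))
    return hits


# Hypothetical snippet standing in for the script copied from the page.
sample_js = """
var r = "" + (new Date).getTime();
var i = r + parseInt(10 * Math.random(), 10);
return {ts: r, salt: i, sign: n.md5("client" + e + i + "secret")};
"""

for number, keyword, line in find_keyword_lines(sample_js, ["salt", "sign"]):
    print(number, keyword, line)
```

Printing the line number together with the matching line makes it easy to jump to the same spot in the editor and read the surrounding code.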
With salt worked out, we can fill in that part of the code first.
Once that is done, carry on analyzing the remaining parameters.
import time
import requests
import random
import hashlib


class YouDaoSpider():
    def __init__(self):
        self.url = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        self.headers = {
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            'Connection': 'keep-alive',
            # Content-Length is left out: requests computes it from the form body
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'Cookie':'[email protected]; _ntes_nnid=b117014d95a4c5cf6832f8c92a045dcc,1589801960374; OUTFOX_SEARCH_USER_ID_NCOO=598000807.9036449; JSESSIONID=aaa3wgVaNvg1XdFRZ10lx; ___rl__test__cookies=1593256681647',
            'Host': 'fanyi.youdao.com',
            'Origin': 'http://fanyi.youdao.com',
            'Referer': 'http://fanyi.youdao.com/',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest',
        }
        # String that get_bv() hashes to produce the bv parameter
        self.appversion = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        self.kw = input('Enter the word to translate: ')
        # Order matters: salt depends on ts, and sign depends on kw and salt
        self.ts = self.get_ts()
        self.salt = self.get_salt()
        self.bv = self.get_bv()
        self.sign = self.get_sign()

    def send_request(self):
        form_data = {
            # 'i': '啊哈',
            'i': self.kw,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            # 'salt': '15932458043921',
            'salt': self.salt,
            # 'sign': '24d1ac950b72ae268b1704034a5c172c',
            'sign': self.sign,
            # 'ts': '1593245804392',  (timestamp)
            'ts': self.ts,
            # 'bv': '02a6ad4308a3443b3732d855273259bf',
            'bv': self.bv,
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_CLICKBUTTION',
        }
        response = requests.post(url=self.url, data=form_data, headers=self.headers)
        print(response.text)

    def get_ts(self):
        # The site's timestamp has 13 digits (milliseconds), while Python's
        # time.time() returns seconds (10 digits before the decimal point)
        return str(int(time.time() * 1000))

    def get_salt(self):
        # salt is the timestamp followed by one random digit (0-9)
        return self.ts + str(random.randint(0, 9))

    def get_bv(self):
        md5 = hashlib.md5()
        md5.update(self.appversion.encode())
        return md5.hexdigest()

    def get_sign(self):
        md5 = hashlib.md5()
        data = "fanyideskweb" + self.kw + self.salt + "mmbP%A-r6U3Nw(n]BjuEU"
        md5.update(data.encode())
        return md5.hexdigest()


if __name__ == '__main__':
    yd = YouDaoSpider()
    yd.send_request()
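To check the parameter logic without typing a word or sending a request, the three derived values can be pulled out into plain functions. This is only a sketch under assumptions: the function names `make_ts`, `make_salt` and `make_sign` are made up here, and the secret string is simply the one used in the code above, which may not match other versions of the site's script.

```python
import hashlib
import random
import time


def make_ts():
    # time.time() is seconds as a float; multiply before truncating
    # so the result is a 13-digit millisecond timestamp.
    return str(int(time.time() * 1000))


def make_salt(ts):
    # The timestamp followed by a single random digit (0-9).
    return ts + str(random.randint(0, 9))


def make_sign(client, word, salt, secret):
    # MD5 over client + word + salt + secret, as a hex string.
    raw = client + word + salt + secret
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


ts = make_ts()
salt = make_salt(ts)
sign = make_sign("fanyideskweb", "hello", salt, "mmbP%A-r6U3Nw(n]BjuEU")
print(ts, salt, sign)
```

Because the functions take their inputs as arguments, a ts, salt and word captured from the browser can be fed in and the resulting sign compared with the one the browser actually sent.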
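With Youdao above, the parameters could still be guessed. With something like the product catalog site, though, when we request the page and save the response,
the saved result contains none of the data shown on the web page. All we can tell is that it is JavaScript; the rest (the functions, for example) is hard to follow.
In that case, we can simply run the JS directly.
Create a JS file and copy the useful parts into it.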
pip install PyExecJS
Install from a mirror: pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple PyExecJS
import execjs
execjs.eval("Date.now()")
# returns, for example: 1522847001080
ctx = execjs.compile("""
    function add(x, y) {
        return x + y;
    }
""")
ctx.call("add", 1, 2)
# returns: 3
node = execjs.get()  # get the JavaScript runtime that execjs uses to run JS code from Python
file = 'product.js'
ctx = node.compile(open(file).read())
data = ctx.eval("data")  # evaluate a variable defined in the JS file
verify_data = ctx.eval("verify")
import requests
import execjs
# url ='http://www.300600900.cn/'
#
# headers = {
# 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
# }
# response = requests.get(url=url,headers=headers)
# with open('prodct.html','w') as f:
#     f.write(response.text)
#
# Run the copied JS and read the two values it builds
ej = execjs.get()
js_name = 'product.js'
node = ej.compile(open(js_name).read())
cookie_date = node.eval('cookie_date')
security_verify_data = node.eval('security_verify_data')
print(cookie_date)
print(security_verify_data)
url ='http://www.300600900.cn/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
}
# cookie_date has the form "srcurl=<hex>"; split it into cookie name and value
key, value = cookie_date.split('=', 1)
session = requests.session()
# First request, then set the cookie the JS would have set in the browser
session.get(url=url, headers=headers)
session.cookies.set(key, value)
# Second request carries the security_verify_data query string
full_url = url + security_verify_data
session.get(url=full_url, headers=headers)
# Third request to the original URL, using the same session
response = session.get(url, headers=headers)
with open('product11.html', 'w', encoding='utf-8') as f:
    f.write(response.content.decode())
// product.js: the useful parts copied out of the saved response
function stringToHex(str) {
    var val = "";
    for (var i = 0; i < str.length; i++) {
        if (val == "") val = str.charCodeAt(i).toString(16); else val += str.charCodeAt(i).toString(16);
    }
    return val;
}
var width = 1400;
var height = 900;
var screendate = width + "," + height;
cookie_date = "srcurl=" + stringToHex('http://www.300600900.cn/');
security_verify_data = "/?security_verify_data=" + stringToHex(screendate);
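Since this particular script only hex-encodes two strings, the same values can also be produced in pure Python, which is a handy cross-check on what execjs returns. This is a sketch rather than the approach taken above: `string_to_hex` is a hypothetical port of `stringToHex`, and it only matches the JS for characters whose code point fits in a single UTF-16 unit, which is the case for the ASCII strings used here.

```python
def string_to_hex(text):
    # Mirrors charCodeAt(i).toString(16): lowercase hex per character,
    # with no zero padding.
    return "".join(format(ord(ch), "x") for ch in text)


width, height = 1400, 900
screen = f"{width},{height}"
cookie_date = "srcurl=" + string_to_hex("http://www.300600900.cn/")
security_verify_data = "/?security_verify_data=" + string_to_hex(screen)
print(cookie_date)
print(security_verify_data)
```

For scripts with more involved logic, running the original JS through execjs remains the safer route, since a hand-written port can quietly drift from what the site really does.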