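Often the data you need is simply not available through the API.
Take risk-control data as an example.
The data-scraping tutorial starts here: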
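1. Open Fiddler and configure it as a proxy to capture the Wind client's traffic
Open the risk-control screen in the Wind client, then look at Fiddler: the
wind.risk.platform/risknews/get_news endpoint returns the content displayed on the risk-control screen.
Copy the request parameters of that endpoint into Python code.
Testing shows that the wind.sessionid header carries the authentication session.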
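One way to run that kind of test is to resend the captured request with one header removed at a time and see which omission makes the server reject it. The sketch below only generates the header variants. `header_variants` is a hypothetical helper name, the header set is an illustrative subset of the captured one, and the session value is a placeholder. Sending each variant and judging the response is left out, because how the server signals a rejected request is not shown here.

```python
def header_variants(headers):
    """Yield (removed_name, remaining_headers) with one header dropped each time."""
    for name in headers:
        remaining = {k: v for k, v in headers.items() if k != name}
        yield name, remaining


# Illustrative subset of the captured headers; the session value is a placeholder.
captured = {
    "Content-Type": "application/json;charset=UTF-8",
    "wind-language": "zh-CN",
    "wind.sessionid": "<session>",
}

for removed, remaining in header_variants(captured):
    print("without", removed, "->", sorted(remaining))
```

If the request only fails when wind.sessionid is missing, that header is the one that authenticates the call.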
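2. Get the session
Open CE (Cheat Engine) and attach it to the Wind process.
At the bottom of the list there is an entry holding the session,
so the session can be read from a memory address.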
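CE displays a static address as a module name plus a hexadecimal offset, and a DLL can be loaded at a different base address on each run, so it is the offset that is worth recording. The sketch below turns such a string into a module name and an integer offset. The string "CSector.DLL+139088" is inferred from the module and offset used in the code of this post, `parse_ce_address` and `absolute_address` are hypothetical helper names, and the base address is made up for illustration.

```python
def parse_ce_address(text):
    """Split a CE-style 'module+hexoffset' string into (module_name, offset)."""
    module, sep, offset = text.rpartition("+")
    if not sep or not module:
        raise ValueError(f"expected 'module+offset', got {text!r}")
    return module.strip(), int(offset, 16)


def absolute_address(module_base, offset):
    """Absolute address of the value once the module's load address is known."""
    return module_base + offset


module, offset = parse_ce_address("CSector.DLL+139088")
print(module, hex(offset))
# The base below is a made-up load address used only for illustration.
print(hex(absolute_address(0x10000000, offset)))
```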
The complete code is as follows:
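import pymem

Game = pymem.Pymem("wmain.exe")  # attach to the Wind process


def Get_moduladdr(dll):
    """Return the base address of the named DLL loaded in the Wind process."""
    for module in Game.list_modules():  # iterate over the DLL modules loaded by the exe
        if module.name.lower() == dll.lower():  # Windows module names are case-insensitive
            return module.lpBaseOfDll
    raise RuntimeError(f"module {dll} is not loaded in wmain.exe")


Char_Modlue = Get_moduladdr("CSector.DLL")  # base address of CSector.DLL
# Read 32 bytes at the offset located with CE and decode them as the session string.
session = Game.read_bytes(Char_Modlue + 0x139088, 32).decode("utf8")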
import json

import requests

url = "https://114.80.154.45/wind.risk.platform/risknews/get_news"
headers = {
    "Host": "114.80.154.45",
    "Connection": "keep-alive",
    # Content-Length is not hard-coded: requests computes it from the body.
    "Accept": "*/*",
    "Content-Type": "application/json;charset=UTF-8",
    "Origin": "https://114.80.154.45",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36",
    "wind-language": "zh-CN",
    "wind.sessionid": session,  # session string read from the Wind process memory
    "Referer": "https://114.80.154.45/wind.risk.platform/index.html?lan=cn",
    # "br" is left out: decoding Brotli responses needs the optional brotli package.
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,en-US;q=0.9",
}
# Request body copied from the captured call; change timeFrom/timeTo to the day you need.
body = {
    "pageSize": 30, "tagCode": [], "areaCode": [], "industryCode": [],
    "emotionId": ["7012000001"], "companyNature": [], "companyCode": [],
    "keywords": [],
    "timeFrom": "2023-05-25T00:00:00Z", "timeTo": "2023-05-25T23:59:59Z",
    "sector": ["a001010c00000000"], "importanceId": [], "filterType": "1",
    "windcodeEnable": True, "pageNo": 1,
}
r = requests.post(url, headers=headers, data=json.dumps(body))
print(r.text)
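The captured body is fixed to a single day and a single page. A minimal sketch of building the same body for other days or pages follows. `BODY_TEMPLATE` and `build_body` are hypothetical names, and the sketch assumes the endpoint accepts other values for timeFrom, timeTo, pageNo and pageSize in the same format as the captured request, which has not been verified here.

```python
import copy
from datetime import date

# Template copied from the captured request body; only the time range and paging vary.
BODY_TEMPLATE = {
    "pageSize": 30, "tagCode": [], "areaCode": [], "industryCode": [],
    "emotionId": ["7012000001"], "companyNature": [], "companyCode": [],
    "keywords": [],
    "timeFrom": "2023-05-25T00:00:00Z", "timeTo": "2023-05-25T23:59:59Z",
    "sector": ["a001010c00000000"], "importanceId": [], "filterType": "1",
    "windcodeEnable": True, "pageNo": 1,
}


def build_body(day, page_no=1, page_size=30):
    """Return a copy of the template covering the whole of `day` (assumed format)."""
    body = copy.deepcopy(BODY_TEMPLATE)  # never mutate the shared template
    body["timeFrom"] = f"{day.isoformat()}T00:00:00Z"
    body["timeTo"] = f"{day.isoformat()}T23:59:59Z"
    body["pageNo"] = page_no
    body["pageSize"] = page_size
    return body


print(build_body(date(2023, 5, 26), page_no=2))
```

The returned dictionary can be passed to json.dumps in place of the hard-coded body above.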