Scraping the Business News Flash Feed on the Qichacha (企查查) Home Page
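Open the browser's developer tools and inspect the requests.
Each request returns 10 items.
Pagination relies on the lastRankIndex and lastRankTime parameters, and the two always carry the same value
(for the first request: firstRankIndex=1, lastRankIndex=0, lastRankTime=None).
The last item in each response carries the value to use as lastRankIndex in the next request (its rankIndex field).
The code is as follows: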
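import time

import requests

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'}

# Parameters for the first request; requests omits keys whose value is None
params = {
    'firstRankIndex': 1,
    'lastRankIndex': 0,
    'lastRankTime': None,
    'pageSize': 10,
}


def main():
    url = 'https://www.qcc.com/api/home/getNewsFlash'
    while True:
        # A GET request carries its arguments in the query string, so pass params= rather than data=
        response = requests.get(url, headers=headers, params=params, timeout=10)
        response.raise_for_status()
        items = response.json()
        if not items:
            break  # an empty page means there is nothing left to fetch
        for item in items:
            print(item)
            print()
        # The last item of the page supplies the cursor for the next request
        params['firstRankIndex'] = None
        params['lastRankIndex'] = items[-1]['rankIndex']
        params['lastRankTime'] = items[-1]['rankIndex']
        time.sleep(2)  # pause between requests


if __name__ == '__main__':
    main()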
The endpoint returns JSON (a list of 10 result items), so the results can be inspected directly
in the output of the final print.
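The paging rule described above can also be separated from the network call, which makes it easy to try out offline. The following is only a minimal sketch under that assumption: first_params, next_params, iter_news and fake_fetch are hypothetical names introduced for illustration, and fake_fetch simulates the endpoint with made-up ascending rankIndex values instead of contacting qcc.com.

```python
from typing import Callable, Dict, Iterator, List, Optional

Params = Dict[str, Optional[int]]


def first_params(page_size: int = 10) -> Params:
    """Parameters for the very first request, as listed in the post."""
    return {'firstRankIndex': 1, 'lastRankIndex': 0,
            'lastRankTime': None, 'pageSize': page_size}


def next_params(last_item: dict, page_size: int = 10) -> Params:
    """Build the next request's parameters from the last item of a page."""
    cursor = last_item['rankIndex']
    # lastRankIndex and lastRankTime carry the same value
    return {'firstRankIndex': None, 'lastRankIndex': cursor,
            'lastRankTime': cursor, 'pageSize': page_size}


def iter_news(fetch_page: Callable[[Params], List[dict]],
              max_pages: int = 5) -> Iterator[dict]:
    """Yield items page by page until a page is empty or max_pages is reached."""
    params = first_params()
    for _ in range(max_pages):
        items = fetch_page(params)
        if not items:
            break
        yield from items
        params = next_params(items[-1])


def fake_fetch(params: Params) -> List[dict]:
    """Hypothetical stand-in for the real endpoint: 25 items in total."""
    start = params['lastRankIndex']
    return [{'rankIndex': i} for i in range(start + 1, min(start + 11, 26))]


items = list(iter_news(fake_fetch))
print(len(items))
```

To use it against the real endpoint, fetch_page could be a small function that wraps requests.get(url, headers=headers, params=params).json(); the max_pages limit is there so the loop cannot run indefinitely.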