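When scraping web data, you will run into the anti-scraping mechanisms of portal sites. The technique covered here is UA spoofing, that is, disguising the User-Agent request header so that the request looks like it comes from a regular browser. The code is as follows: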
import requests

if __name__ == '__main__':
    url = 'https://www.sogou.com/web'
    # Counter the anti-scraping check with UA spoofing:
    # send a browser User-Agent instead of the default one.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
    }
    kw = input("enter a keyword: ")
    # Query-string parameters; requests appends them to the URL as ?query=<kw>
    param = {
        'query': kw
    }
    response = requests.get(url=url, params=param, headers=headers)
    page_text = response.text
    # Save the returned page as <keyword>.html
    fileName = kw + '.html'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'saved successfully')
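
To check what actually gets sent without touching the network, the request can be built and inspected before sending. The sketch below is not the script above: it swaps requests for the standard library's urllib so that it runs with no extra dependencies, and the helper name build_search_request is hypothetical, introduced only for illustration. It constructs the same URL and User-Agent header and prints them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same browser User-Agent string as in the script above.
BROWSER_UA = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
)


def build_search_request(keyword):
    """Hypothetical helper: build (but do not send) the search request."""
    # urlencode escapes the keyword so it is safe to place in the query string.
    query = urlencode({'query': keyword})
    url = 'https://www.sogou.com/web?' + query
    # The headers dict plays the same role as headers=... in requests.get.
    return Request(url, headers={'User-Agent': BROWSER_UA})


req = build_search_request('python')
print(req.full_url)
# urllib stores header names capitalized as 'User-agent'.
print(req.get_header('User-agent'))
```

If the header were left out, the client would identify itself with its own default User-Agent rather than a browser's, which is the kind of request a site's anti-scraping check can recognize.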