Flexible Use of requests Crawlers (2)

In the previous installment we built a basic requests crawler. Today we'll add two features: persisting a login (cookie) and the timeout parameter (timeout).

1. Persisting a login (cookie)

Video walkthrough: https://live.csdn.net/v/264823

#!/usr/bin/python
# coding:gb18030
import requests
url = input("Enter the URL to crawl: ")
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54","Cookie": "BIDUPSID=944535ADC40720D676F9890D772306DC; PSTM=1666509825; BAIDUID=944535ADC40720D6541F38754AC61A88:FG=1; BD_UPN=12314753; BDUSS=FhRG8tMDM5dnRJeXJ2bzJGdHMxWE1seVhFSjhPVkxteHQyZi1ENFlqNGNRYWhqRVFBQUFBJCQAAAAAAAAAAAEAAAAB42Kjal96aGlsaXNtaWxlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABy0gGMctIBjWG; BDUSS_BFESS=FhRG8tMDM5dnRJeXJ2bzJGdHMxWE1seVhFSjhPVkxteHQyZi1ENFlqNGNRYWhqRVFBQUFBJCQAAAAAAAAAAAEAAAAB42Kjal96aGlsaXNtaWxlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABy0gGMctIBjWG; MCITY=-%3A; BDORZ=FFFB88E999055A3F8A630C64834BD6D0; BA_HECTOR=25al248ka004212ka10006em1hq5gj51j; delPer=0; BD_CK_SAM=1; PSINO=1; channel=baidusearch; baikeVisitId=ef8bdec1-feb9-4add-b8a0-5248fa0dc257; ZFY=5TmyfYUXghYhckoDa6stICDRMzzCd7lUQiuJP:Bn0ICs:C; BAIDUID_BFESS=944535ADC40720D6541F38754AC61A88:FG=1; COOKIE_SESSION=154379_0_4_4_1_10_0_0_4_2_0_0_0_0_14_0_1671610994_0_1671610980%7C4%230_0_1671610980%7C1; BD_HOME=1; H_PS_PSSID=36552_37975_37646_37521_37691_37909_37623_37799_37929_37903_26350_37788_37881"}
file_name = input("Enter the filename to save to: ")
response = requests.get(url=url, headers=headers)  # fetch the page, sending our cookie with the request
with open(file_name, mode='w', encoding='gb18030') as f:  # open the output file
    f.write(response.content.decode('gb18030'))  # decoding assumes the page is GB18030-encoded
    print("Saved!")

Note: every account has its own cookie. To get yours: right-click the page > Inspect > Network > refresh > click the first request > copy the Cookie value from its request headers.
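
If you'd rather not copy a cookie string out of the browser by hand, requests can also keep a login alive with a Session object, which remembers any cookies the server sets and sends them back on later requests. Below is a minimal sketch of that approach; the login URL and form field names here are hypothetical placeholders, so check the site's actual login request in the Network panel before using it.

#!/usr/bin/python
import requests

session = requests.Session()  # a Session remembers cookies across requests

# Hypothetical login endpoint and form fields -- replace them with the
# real values you see in the browser's Network panel when you log in.
login_url = "https://example.com/login"
session.post(login_url, data={"username": "me", "password": "secret"})

# This request automatically carries the cookies set by the login response,
# so the server treats it as coming from a logged-in user.
response = session.get("https://example.com/profile")
print(response.status_code)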

2. Timeout parameter (timeout)

#!/usr/bin/python
# coding:gb18030
import requests
url = input("Enter the URL to crawl: ")
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54","Cookie": "BIDUPSID=944535ADC40720D676F9890D772306DC; PSTM=1666509825; BAIDUID=944535ADC40720D6541F38754AC61A88:FG=1; BD_UPN=12314753; BDUSS=FhRG8tMDM5dnRJeXJ2bzJGdHMxWE1seVhFSjhPVkxteHQyZi1ENFlqNGNRYWhqRVFBQUFBJCQAAAAAAAAAAAEAAAAB42Kjal96aGlsaXNtaWxlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABy0gGMctIBjWG; BDUSS_BFESS=FhRG8tMDM5dnRJeXJ2bzJGdHMxWE1seVhFSjhPVkxteHQyZi1ENFlqNGNRYWhqRVFBQUFBJCQAAAAAAAAAAAEAAAAB42Kjal96aGlsaXNtaWxlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABy0gGMctIBjWG; MCITY=-%3A; BDORZ=FFFB88E999055A3F8A630C64834BD6D0; BA_HECTOR=25al248ka004212ka10006em1hq5gj51j; delPer=0; BD_CK_SAM=1; PSINO=1; channel=baidusearch; baikeVisitId=ef8bdec1-feb9-4add-b8a0-5248fa0dc257; ZFY=5TmyfYUXghYhckoDa6stICDRMzzCd7lUQiuJP:Bn0ICs:C; BAIDUID_BFESS=944535ADC40720D6541F38754AC61A88:FG=1; COOKIE_SESSION=154379_0_4_4_1_10_0_0_4_2_0_0_0_0_14_0_1671610994_0_1671610980%7C4%230_0_1671610980%7C1; BD_HOME=1; H_PS_PSSID=36552_37975_37646_37521_37691_37909_37623_37799_37929_37903_26350_37788_37881"}
file_name = input("Enter the filename to save to: ")
timeout = float(input("Enter the timeout (seconds): "))  # float() is safer than eval() for numeric input
response = requests.get(url=url, headers=headers, timeout=timeout)  # give up if no response within `timeout` seconds
with open(file_name, mode='w', encoding='gb18030') as f:  # open the output file
    f.write(response.content.decode('gb18030'))  # decoding assumes the page is GB18030-encoded
    print("Saved!")
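
Note that when the server doesn't respond within the timeout, requests doesn't return a response at all: it raises requests.exceptions.Timeout. In a real crawler you would usually catch that exception, roughly like this (the URL is just a placeholder):

import requests

try:
    # Give up if the server takes more than 3 seconds to respond.
    response = requests.get("https://example.com", timeout=3)
    print(response.status_code)
except requests.exceptions.Timeout:
    print("The request timed out; try again or increase the timeout.")

timeout can also be given as a (connect, read) tuple if you want separate limits for establishing the connection and for receiving the data.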

That's all for today. Bye!
