Learning Web Scraping with Python: What to Do When a Cookie Copied from a Web Page Contains Quotes

For example:

Cookie:
_zap=3719d565-9bca-44de-9b02-ed714258e599; d_c0="AMCf599-hhGPTltrHaZ91mg1vjF3HaLikx4=|1593863623"; _ga=GA1.2.1482983231.1593863637; _xsrf=47112a4a-9751-4136-85d4-4cd0ff3e57fc; _gid=GA1.2.1072146028.1596031364; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1595839169,1595921642,1595932793,1596031375; capsion_ticket="2|1:0|10:1596031378|14:capsion_ticket|44:NTcwOGNkOGNhNTY2NDc0NTgwMzgwNjQ2NGY0YTg2MWQ=|02d6f4cf78555e932a8cb7f79c159935ce9373a7e2546e0be63971e288846bfc"; SESSIONID=DtgOvBScEJCX7GFsKvFtxNzXrnYijTajpqfaPx7kkMk; JOID=Vl4QA0oG8CXIhPNMIALX_xtO4NkwQIdHhvezHEZGuGWVsrEIUJN3AJOD8kokJZhEeW5Qs7PiMlIcjk_5S3UHixE=; osd=UlwcBkIC8inNjPdOLAff-xlC5dE0QotCjvOxEENOvGeZt7kMUp9yCJeB_k8sIZpIfGZUsb_nOlYegkrxT3cLjhk=; l_n_c=1; r_cap_id="N2FlYjU3Y2Q3YjQ1NDEzYTk5MmJkMWE4NmU2ZTVjNjA=|1596031386|902835602a8c71f45ed7564e15947880c55b5a26"; cap_id="MTgzMDNiYjU0NzZhNGQ0ODg5OWY2ZjBmNTlkZDc2Njg=|1596031386|456383a04b6c02f6d2318a0b387af57b17cc77b5"; l_cap_id="ZDA5ZjNiMmYyN2NhNDg1ZmFmZmU5NTZmZjk4MTgwM2M=|1596031386|5395222721c6cd25d5feb6651ed585079e59f6bc"; n_c=1; z_c0=Mi4xeVJrX0JBQUFBQUFBd0pfbjMzNkdFUmNBQUFCaEFsVk5zTThPWUFDTkc5R2R5QmRQRDBBWUZnNEpLQlVNSTdYS1NR|1596031408|a2b2c9b741d3e389e979e5ce8d34068430a9d105; tst=r; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1596031411; KLBRSID=4843ceb2c0de43091e0ff7c22eadca8c|1596031433|1596031372

This value contains many double quotes, so using it as-is causes problems when it is parsed.
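One plausible way the problem shows up, assuming the cookie is pasted into a Python string literal delimited by double quotes, is that the embedded quotes end the literal early. The sketch below checks this with `ast.parse`. The shortened value in `raw` and the helper `is_valid_source` are made up for illustration and are not part of the original program.

```python
import ast

# Hypothetical, shortened cookie value that contains double quotes
raw = 'a=1; token="abc|123"; b=2'


def is_valid_source(source):
    """Return True if `source` parses as Python code, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Pasting the value between double quotes: the inner quotes close the literal early
double_quoted = 'cookie = "' + raw + '"'
# Pasting the value between single quotes: the inner double quotes are ordinary characters
single_quoted = "cookie = '" + raw + "'"

print(is_valid_source(double_quoted))
print(is_valid_source(single_quoted))
```

Under this assumption, wrapping the copied value in single quotes is enough to get it into the program; whether the quotes should then be kept or removed is a separate decision.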

One way to handle this is:

string = '_zap=3719d565-9bca-44de-9b02-ed714258e599; d_c0="AMCf599-hhGPTltrHaZ91mg1vjF3HaLikx4=|1593863623"; _ga=GA1.2.1482983231.1593863637; _xsrf=47112a4a-9751-4136-85d4-4cd0ff3e57fc; _gid=GA1.2.1072146028.1596031364; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1595839169,1595921642,1595932793,1596031375; capsion_ticket="2|1:0|10:1596031378|14:capsion_ticket|44:NTcwOGNkOGNhNTY2NDc0NTgwMzgwNjQ2NGY0YTg2MWQ=|02d6f4cf78555e932a8cb7f79c159935ce9373a7e2546e0be63971e288846bfc"; SESSIONID=DtgOvBScEJCX7GFsKvFtxNzXrnYijTajpqfaPx7kkMk; JOID=Vl4QA0oG8CXIhPNMIALX_xtO4NkwQIdHhvezHEZGuGWVsrEIUJN3AJOD8kokJZhEeW5Qs7PiMlIcjk_5S3UHixE=; osd=UlwcBkIC8inNjPdOLAff-xlC5dE0QotCjvOxEENOvGeZt7kMUp9yCJeB_k8sIZpIfGZUsb_nOlYegkrxT3cLjhk=; l_n_c=1; r_cap_id="N2FlYjU3Y2Q3YjQ1NDEzYTk5MmJkMWE4NmU2ZTVjNjA=|1596031386|902835602a8c71f45ed7564e15947880c55b5a26"; cap_id="MTgzMDNiYjU0NzZhNGQ0ODg5OWY2ZjBmNTlkZDc2Njg=|1596031386|456383a04b6c02f6d2318a0b387af57b17cc77b5"; l_cap_id="ZDA5ZjNiMmYyN2NhNDg1ZmFmZmU5NTZmZjk4MTgwM2M=|1596031386|5395222721c6cd25d5feb6651ed585079e59f6bc"; n_c=1; z_c0=Mi4xeVJrX0JBQUFBQUFBd0pfbjMzNkdFUmNBQUFCaEFsVk5zTThPWUFDTkc5R2R5QmRQRDBBWUZnNEpLQlVNSTdYS1NR|1596031408|a2b2c9b741d3e389e979e5ce8d34068430a9d105; tst=r; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1596031411; KLBRSID=4843ceb2c0de43091e0ff7c22eadca8c|1596031433|1596031372'
# str.replace returns a new string, so assign the result back; remove the double quotes
string = string.replace('"', '')

This removes the quotes, after which the string can be parsed.
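As a minimal sketch of the parsing step, the cookie string can be split into name/value pairs and collected into a dictionary. The function name `parse_cookie_string` and the shortened `sample` value are assumptions made for illustration. Instead of deleting every double quote, this version strips only the quotes that surround each value.

```python
def parse_cookie_string(cookie_string):
    """Split a 'name=value; name=value' cookie string into a dict."""
    cookies = {}
    for pair in cookie_string.split(';'):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, because values may themselves contain '='
        name, _, value = pair.partition('=')
        # Drop the double quotes that wrap some values
        cookies[name] = value.strip('"')
    return cookies


# Hypothetical, shortened cookie value
sample = 'a=1; token="abc=|123"; b=2'
cookies = parse_cookie_string(sample)
print(cookies)
```

A dictionary in this form can also be passed to `requests.get` through its `cookies` parameter, as an alternative to setting the `Cookie` header by hand.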

The complete program is as follows:

import requests
string = '_zap=3719d565-9bca-44de-9b02-ed714258e599; d_c0="AMCf599-hhGPTltrHaZ91mg1vjF3HaLikx4=|1593863623"; _ga=GA1.2.1482983231.1593863637; _xsrf=47112a4a-9751-4136-85d4-4cd0ff3e57fc; _gid=GA1.2.1072146028.1596031364; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1595839169,1595921642,1595932793,1596031375; capsion_ticket="2|1:0|10:1596031378|14:capsion_ticket|44:NTcwOGNkOGNhNTY2NDc0NTgwMzgwNjQ2NGY0YTg2MWQ=|02d6f4cf78555e932a8cb7f79c159935ce9373a7e2546e0be63971e288846bfc"; SESSIONID=DtgOvBScEJCX7GFsKvFtxNzXrnYijTajpqfaPx7kkMk; JOID=Vl4QA0oG8CXIhPNMIALX_xtO4NkwQIdHhvezHEZGuGWVsrEIUJN3AJOD8kokJZhEeW5Qs7PiMlIcjk_5S3UHixE=; osd=UlwcBkIC8inNjPdOLAff-xlC5dE0QotCjvOxEENOvGeZt7kMUp9yCJeB_k8sIZpIfGZUsb_nOlYegkrxT3cLjhk=; l_n_c=1; r_cap_id="N2FlYjU3Y2Q3YjQ1NDEzYTk5MmJkMWE4NmU2ZTVjNjA=|1596031386|902835602a8c71f45ed7564e15947880c55b5a26"; cap_id="MTgzMDNiYjU0NzZhNGQ0ODg5OWY2ZjBmNTlkZDc2Njg=|1596031386|456383a04b6c02f6d2318a0b387af57b17cc77b5"; l_cap_id="ZDA5ZjNiMmYyN2NhNDg1ZmFmZmU5NTZmZjk4MTgwM2M=|1596031386|5395222721c6cd25d5feb6651ed585079e59f6bc"; n_c=1; z_c0=Mi4xeVJrX0JBQUFBQUFBd0pfbjMzNkdFUmNBQUFCaEFsVk5zTThPWUFDTkc5R2R5QmRQRDBBWUZnNEpLQlVNSTdYS1NR|1596031408|a2b2c9b741d3e389e979e5ce8d34068430a9d105; tst=r; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1596031411; KLBRSID=4843ceb2c0de43091e0ff7c22eadca8c|1596031433|1596031372'
# Remove every double quote from the copied cookie value
string = string.replace('"', '')
print(string)
# Send the cleaned value as the Cookie request header
headers = {
    'Host': 'www.zhihu.com',
    'Cookie': string,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}
r = requests.get('http://www.zhihu.com', headers=headers)
print(r.text)
