https://blog.csdn.net/weixin_42081389/article/details/89521297
When the crawler raises the error requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
You can pass allow_redirects=False in the requests call (the default is allow_redirects=True); that gets rid of the error:
resp = requests.get(url=url, headers=headers, allow_redirects=False)
The error is gone, but the response no longer contains any data.
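Why does the data disappear? With allow_redirects=False, requests returns the 3xx redirect response itself rather than the final page, so the body is essentially empty. A minimal sketch to see this, using a placeholder URL and headers of my own rather than anything from the original post:

import requests

url = 'https://example.com/some-page'   # placeholder target; use the page you are crawling
headers = {'User-Agent': 'Mozilla/5.0'}

resp = requests.get(url=url, headers=headers, allow_redirects=False)
print(resp.is_redirect)   # True: this response is the redirect, not the content
print(len(resp.text))     # usually 0 or a tiny stub -- hence "no data"

The second approach below builds on exactly this: the Location header of the redirect response points at the real page.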
Another approach:
# coding:utf-8
import requests

kv = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0"}
r = requests.get(url, headers=kv, allow_redirects=False)   # url: the page being crawled
print(r.status_code)                    # a 3xx code when the server redirects
new_url = r.headers["Location"]         # the address the redirect points to
https://www.jianshu.com/p/19f631cbb21a
Even with the new URL in hand, the data still could not be retrieved.
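For reference, here is a minimal sketch of where that manual route leads: follow the Location header yourself, re-sending the same headers on every hop, with a cap so a redirect loop cannot run forever. This is my own continuation, not code from either post:

import requests

def fetch_following_redirects(url, headers, max_hops=10):
    # Follow Location headers by hand, re-sending the same headers on each hop.
    for _ in range(max_hops):
        r = requests.get(url, headers=headers, allow_redirects=False)
        if r.status_code in (301, 302, 303, 307, 308) and 'Location' in r.headers:
            # A relative Location is resolved against the current URL.
            url = requests.compat.urljoin(url, r.headers['Location'])
            continue
        return r
    raise requests.exceptions.TooManyRedirects('Exceeded %s redirects.' % max_hops)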
At first, plain requests.get(url) raised the error right away. After some searching, several sources said to add allow_redirects=False; unfortunately, with that option the request simply returned 304 and the actual content could not be fetched. Other sources claimed the problem was missing headers, but setting headers did not help either.
Traceback (most recent call last):
  File "E:/my_project/project/测试/简单分布式爬虫(cs)/爬虫节点/HTMLdown.py", line 18, in <module>
    t = h.download(url)
  File "E:/my_project/project/测试/简单分布式爬虫(cs)/爬虫节点/HTMLdown.py", line 8, in download
    r = requests.get(url, headers=headers, allow_redirects=True)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 640, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 640, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 140, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
Then it occurred to me that the redirects might be the reason the headers were not being kept, so I made the GET request through a session instead:
import requests


class HtmlDownloader(object):
    def download(self, url):
        if url is None:
            return None
        user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
        headers = {'User-Agent': user_agent}
        sessions = requests.session()
        sessions.headers = headers      # headers set on the session are re-sent on every redirect hop
        r = sessions.get(url, allow_redirects=True)
        print(r.url)
        if r.status_code == 200:
            r.encoding = 'utf-8'
            return r.text
        return None


if __name__ == '__main__':
    h = HtmlDownloader()
    url = 'https://baike.baidu.com/view/284853.htm'
    t = h.download(url)
    print(t)
With this, the page can be accessed normally.
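As a quick sanity check (my addition, not part of the original post), r.history on the session's response lists the intermediate 3xx hops that were followed, which confirms the redirect chain terminated instead of looping:

import requests

sessions = requests.session()
sessions.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                                       '(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'})
r = sessions.get('https://baike.baidu.com/view/284853.htm', allow_redirects=True)

for hop in r.history:            # the intermediate 3xx responses, if any
    print(hop.status_code, hop.url)
print(r.status_code, r.url)      # the final response after all redirects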
Reposted from author: 咸鱼功阀术
Link: https://www.jianshu.com/p/bcf8bf2b3152