When a large number of crawlers are launched against the same page, errors start to appear in bulk after the script has been running for a while.

The page is fetched with Python's requests library:
import requests
s = requests.get(url)
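For context, a minimal sketch of the kind of loop that triggers the problem; the URL and request count here are placeholders, not from the original setup:

import requests

url = 'http://example.com/page'  # hypothetical target page
for i in range(100000):
    s = requests.get(url)  # every call opens a brand-new TCP connection
    # ... parse s.text ...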
The error log looks like this:
HTTPConnectionPool(host=XX): Max retries exceeded with url: ... Failed to establish a new connection: [Errno 99] Cannot assign requested address
Analysis showed that the TCP connections default to keep-alive, so they are not terminated and returned to the connection pool, and eventually no new connections can be created ([Errno 99] means the OS can no longer allocate a local port for a new socket).
The Connection entry in s.headers is indeed keep-alive.
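This is easy to confirm on both sides of the exchange (s is the Response object returned above; requests sends Connection: keep-alive by default):

print(s.headers.get('Connection'))          # response header from the server, e.g. 'keep-alive'
print(s.request.headers.get('Connection'))  # request header sent by requests, 'keep-alive' by default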
Solution:
Set the Connection entry in the header to close. Note that the header dict must be passed via the headers keyword argument; passing it as the second positional argument makes requests treat it as params:
newheader = {'Connection': 'close'}
s = requests.get(url, headers=newheader)
With this change the problem went away.
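Putting it together, a minimal sketch of the crawl loop with the fix applied (again, the URL and count are placeholders):

import requests

url = 'http://example.com/page'  # hypothetical target page
headers = {'Connection': 'close'}  # the connection is torn down after each response
for i in range(100000):
    s = requests.get(url, headers=headers)
    # ... process s.text ...

An alternative worth considering is reusing a single requests.Session for the whole loop, so that one pooled keep-alive connection is shared instead of a new one being opened for every request.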
