Need to download images from 50,000+ URLs, using Python's requests:
# for loop over the URLs omitted
>>> requests.get(url, stream=True)
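The omitted loop looks roughly like this; the urls list, the images/ output directory, and the index-based file names are assumptions for illustration, not the original code:

import os
import requests

urls = []  # placeholder: fill with the 50,000+ image URLs (source omitted in the original)

os.makedirs('images', exist_ok=True)
for i, url in enumerate(urls):
    r = requests.get(url, stream=True)                 # stream=True: don't load the whole body at once
    with open(os.path.join('images', f'{i}.jpg'), 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):  # write the image to disk in chunks
            f.write(chunk)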
A few thousand images in, the call started blocking indefinitely with no response, so a timeout was added:
>>> requests.get(url, stream=True, timeout=5)
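As an aside, requests also accepts a (connect, read) tuple when the two timeouts need to be tuned separately; the values below are only examples:
>>> requests.get(url, stream=True, timeout=(3.05, 10))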
Next, wanted to add automatic retries on failure:
from urllib3.util.retry import Retry
import requests
from requests.adapters import HTTPAdapter

retries = Retry(
    total=3,                                # at most 3 retries per request
    backoff_factor=0.1,                     # short back-off between attempts
    status_forcelist=[500, 502, 503, 504]   # also retry on these HTTP status codes
)
s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))
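Note that the adapter-level retries only apply to requests made through the mounted session, so downloads have to go through s rather than the module-level requests.get:
>>> s.get(url, stream=True, timeout=5)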
But timeouts still occurred and the program was interrupted: the adapter only retries a limited number of times, and once that budget is exhausted the exception still propagates and, uncaught, kills the script.
So use try-except to catch the error manually and retry:
import time
from urllib3.util.retry import Retry
import requests
from requests.adapters import HTTPAdapter

retries = Retry(
    total=3,
    backoff_factor=0.1,
    status_forcelist=[500, 502, 503, 504]
)
s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))

# The return statement implies this runs inside a function; the name
# download() is added here only so the snippet stands alone.
def download(url):
    count = 0
    while True:
        if count < 5:
            try:
                file = s.get(url, stream=True, timeout=5)
                return file.content
            except Exception as e:
                print('Retry!')
                time.sleep(5)
                count += 1
                exception = e    # keep the last error so it can be re-raised later
        else:
            raise exception      # was `raise e`: e is unbound outside the except block
On a timeout, the code waits 5 seconds and tries again; after 5 failed attempts the saved exception is re-raised and the program exits.
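For completeness, a sketch of how the helper might be driven from the loop over all URLs; the urls list, the images/ directory, and the index-based file names are again assumptions:

import os

os.makedirs('images', exist_ok=True)
for i, url in enumerate(urls):   # urls as in the earlier sketch (source omitted in the original)
    content = download(url)      # raises after 5 failed attempts, ending the run
    with open(os.path.join('images', f'{i}.jpg'), 'wb') as f:
        f.write(content)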
With the code above, the reliability of the download script has improved considerably.