1. User-Agent
By default, Scrapy's built-in UserAgentMiddleware sets the header to "User-Agent": "Scrapy/1.5.1 (+https://scrapy.org)".
Method 1: set USER_AGENT in settings.py
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36'
Method 2: write a custom downloader middleware that picks a random User-Agent, then enable it in settings.py
from random import choice


class RandomMiddlewares(object):
    def __init__(self):
        # pool of User-Agent strings to rotate through
        self.user_agent = [
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
            'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16',
            'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
            'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
            'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
        ]

    def process_request(self, request, spider):
        # overwrite the User-Agent header with a random pick for every request
        request.headers['User-Agent'] = choice(self.user_agent)
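To actually activate the custom middleware, it must be registered in DOWNLOADER_MIDDLEWARES in settings.py. A minimal sketch follows; the module path "myproject.middlewares" is an assumption — adjust it to wherever RandomMiddlewares is defined in your project.

```python
# settings.py (sketch — "myproject.middlewares" is a placeholder path)
DOWNLOADER_MIDDLEWARES = {
    # disable Scrapy's built-in UserAgentMiddleware so it does not
    # overwrite the header our middleware sets
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # register the custom middleware; 543 is an arbitrary priority
    'myproject.middlewares.RandomMiddlewares': 543,
}
```

Setting the built-in middleware's value to None removes it from the chain; otherwise, depending on priority, it could replace the random User-Agent with the default one.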