Setting headers, the Referer field, and proxies in Scrapy

  • You only need to add your own middleware class to middlewares.py, for example:
import random

# MY_USER_AGENT: a list of user-agent strings defined elsewhere, e.g. in settings.py (a sketch follows below)

class MyUseragent(object):
    def process_request(self, request, spider):
        # Use the request's own URL as the Referer header
        referer = request.url
        if referer:
            request.headers['Referer'] = referer
        # Pick a random user agent from the pool for each request
        request.headers['User-Agent'] = random.choice(MY_USER_AGENT)
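The MY_USER_AGENT pool is not defined above. A minimal sketch, assuming it lives in settings.py (the user-agent strings here are only illustrative):

# settings.py -- a pool of user-agent strings to rotate through
MY_USER_AGENT = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
]

Instead of importing the list directly, the middleware can also read it from settings through Scrapy's standard from_crawler hook; this is an alternative sketch, not the original author's code:

import random

class MyUseragent(object):
    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy passes in the running crawler, so the pool is read from settings
        return cls(crawler.settings.getlist('MY_USER_AGENT'))

    def process_request(self, request, spider):
        request.headers['Referer'] = request.url
        request.headers['User-Agent'] = random.choice(self.user_agents)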

Then enable it in the downloader middlewares section of settings.py:

DOWNLOADER_MIDDLEWARES = {
   'kt.middlewares.MyUseragent': 543,
}

To add other functionality, you just write another middleware and override the same process_request method. For example, a proxy middleware:

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # Option 1: pick a proxy at random from a proxy pool (proxies: a list of proxy URLs)
        proxy = random.choice(proxies)
        # Option 2: use a fixed proxy service address; configure the account and
        # password in settings (see the sketch below)
        # proxy = "proxy-service-address"
        request.meta['proxy'] = proxy
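For the second option, one way to supply the account and password mentioned above is a Proxy-Authorization header built from values in settings.py. This is only a sketch; the setting names PROXY_URL, PROXY_USER and PROXY_PASS are assumptions, not part of the original project:

import base64

class AuthProxyMiddleware(object):
    def __init__(self, proxy_url, user, password):
        self.proxy_url = proxy_url
        # Basic-auth credentials, encoded once at start-up
        self.auth = 'Basic ' + base64.b64encode(
            ('%s:%s' % (user, password)).encode()).decode()

    @classmethod
    def from_crawler(cls, crawler):
        s = crawler.settings
        # PROXY_URL / PROXY_USER / PROXY_PASS are assumed to be defined in settings.py
        return cls(s.get('PROXY_URL'), s.get('PROXY_USER'), s.get('PROXY_PASS'))

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_url
        request.headers['Proxy-Authorization'] = self.auth

Whichever variant you use, remember to register it in DOWNLOADER_MIDDLEWARES alongside MyUseragent, e.g. 'kt.middlewares.ProxyMiddleware': 544.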
