Adding proxy IPs when crawling Zhihu with Scrapy

All of this is pseudocode; do not copy it verbatim. It is for reference only.

None of the IPs are usable; they only illustrate the format I used.

 

 

zhihu.py

import random

from scrapy import Request

# meta['proxy'] must be a URL string like 'http://host:port', not a dict
proxy_pool = ['http://182.253.112.43:8080']

    def start_requests(self):
        # pick a random proxy from the pool and attach it to the request
        proxy_addr = random.choice(proxy_pool)
        yield Request('.........', meta={'proxy': proxy_addr})
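For context, a minimal end-to-end sketch of the same idea inside a full spider; the spider name, start URL, and parse callback are placeholders I have assumed, not part of the original:

import random

import scrapy
from scrapy import Request

proxy_pool = ['http://182.253.112.43:8080']  # dead sample address


class ZhihuSpider(scrapy.Spider):
    name = 'zhihu'  # hypothetical spider name

    def start_requests(self):
        # attach a randomly chosen proxy to the outgoing request
        proxy_addr = random.choice(proxy_pool)
        yield Request('https://www.zhihu.com',  # placeholder start URL
                      meta={'proxy': proxy_addr},
                      callback=self.parse)

    def parse(self, response):
        self.logger.info('fetched %s', response.url)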

 

settings.py

DOWNLOADER_MIDDLEWARES = {
    'zhihuuser.middlewares.MyproxiesSpiderMiddleware': 543,
}



IPPOOL = [
    {'ipaddr': '61.19.154.12:8080'},
    {'ipaddr': '185.99.64.75:50167'},
    {'ipaddr': '109.251.185.20:44410'},
    {'ipaddr': '118.174.220.133:50616'},
    {'ipaddr': '182.253.112.43:8080'},
]
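Since the sample addresses are dead, it may be worth filtering the pool before crawling. A minimal standalone sketch, assuming the IPPOOL format above; httpbin.org/ip is used here only as a convenient echo endpoint:

import urllib.request

def proxy_is_alive(ipaddr, timeout=5):
    # route a test request through the proxy; any failure counts as dead
    handler = urllib.request.ProxyHandler({'http': 'http://' + ipaddr})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open('http://httpbin.org/ip', timeout=timeout)
        return True
    except Exception:
        return False

live_pool = [p for p in IPPOOL if proxy_is_alive(p['ipaddr'])]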

 

middlewares.py
import random

from zhihuuser.settings import IPPOOL


class MyproxiesSpiderMiddleware(object):

    def __init__(self, ip=''):
        self.ip = ip

    def process_request(self, request, spider):
        # rotate proxies: choose a random entry from IPPOOL on every request
        thisip = random.choice(IPPOOL)
        print("this is ip:" + thisip["ipaddr"])
        request.meta["proxy"] = "http://" + thisip["ipaddr"]
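A variant sketch of the same middleware that reads IPPOOL from settings.py through Scrapy's from_crawler hook instead of importing it directly, which avoids the hard-coded settings import:

import random


class MyproxiesSpiderMiddleware(object):

    def __init__(self, ippool):
        self.ippool = ippool

    @classmethod
    def from_crawler(cls, crawler):
        # pull IPPOOL out of the project settings at startup
        return cls(crawler.settings.getlist('IPPOOL'))

    def process_request(self, request, spider):
        thisip = random.choice(self.ippool)
        spider.logger.debug('using proxy %s', thisip['ipaddr'])
        request.meta['proxy'] = 'http://' + thisip['ipaddr']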

 
