[Web Scraping] Scrapy in Practice

Scrapy Middleware

Examples

1. Bypassing the Cloudflare 5-second shield

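Install and configure the Node.js environment. The cfscrape library used below needs Node.js to evaluate Cloudflare's JavaScript challenge.

Open https://nodejs.org/en/ to check the current version number (the commands below use the 10.x setup script).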

curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
sudo apt-get install -y nodejs
nodejs -v
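
Because cfscrape relies on an external Node.js runtime, it can help to confirm from Python that a Node.js binary is on the PATH before starting a crawl. The following is a minimal sketch. The helper name find_node is hypothetical and is not part of cfscrape or Scrapy, and the candidate binary names are an assumption based on the commands above.

```python
import shutil


def find_node(candidates=("node", "nodejs")):
    """Return the path of the first Node.js binary found on PATH, or None.

    Hypothetical helper for illustration; the candidate names are assumptions.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    node_path = find_node()
    if node_path is None:
        print("Node.js not found on PATH; install it before running the spider.")
    else:
        print("Using Node.js at", node_path)
```

Running this check once at startup fails fast with a clear message, rather than with an error raised in the middle of a crawl.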

settings.py

DOWNLOADER_MIDDLEWARES = {
    # The class name must match the middleware class defined in middlewares.py.
    'NegativeMonitor.middlewares.NegativemonitorDownloaderMiddleware': 543,
}
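
The number 543 is the middleware's order. Scrapy merges DOWNLOADER_MIDDLEWARES with its built-in defaults and calls each middleware's process_request in ascending order. The sketch below imitates that sorting in plain Python. The built-in entries are only a small subset of Scrapy's defaults, the custom entry uses the class name from middlewares.py, and the helper name request_order is hypothetical.

```python
# Illustrative subset of Scrapy's built-in downloader middleware orders.
BUILTIN = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

# The custom middleware entry (class name taken from middlewares.py).
CUSTOM = {
    "NegativeMonitor.middlewares.NegativemonitorDownloaderMiddleware": 543,
}


def request_order(builtin, custom):
    """Return middleware paths in the order process_request would be called.

    Entries set to None are treated as disabled, mirroring Scrapy's convention.
    """
    merged = {**builtin, **custom}
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


if __name__ == "__main__":
    for path in request_order(BUILTIN, CUSTOM):
        print(path)
```

With an order of 543, the custom middleware's process_request runs after UserAgentMiddleware (500) and before RetryMiddleware (550).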

middlewares.py

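from scrapy.http import HtmlResponse
import cfscrape


class NegativemonitorDownloaderMiddleware(object):

    def process_request(self, request, spider):
        # Only intercept requests made by the target spider to the protected host.
        if "sp_spidername" in spider.name and "sp_host" in request.url:
            spider.logger.info("Middleware invoked, bypassing the 5-second shield: %s", spider.name)
            # delay=10 makes cfscrape wait 10 seconds before answering the challenge.
            scraper = cfscrape.create_scraper(delay=10)
            resp = scraper.get(request.url)
            resp.encoding = "utf-8"
            # Returning a response here means Scrapy's own downloader is not used for this request.
            # The real status code is passed through so failed bypasses are not reported as 200.
            return HtmlResponse(url=request.url, body=resp.text, request=request, encoding='utf-8',
                                status=resp.status_code)
        # Returning None lets Scrapy handle every other request as usual.
        return None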
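
The condition `"sp_host" in request.url` is a substring test, so it also matches URLs that merely mention the host in a path or query string. A stricter variant could compare the parsed hostname instead. This is a minimal sketch under stated assumptions: should_bypass, TARGET_SPIDERS, and TARGET_HOSTS are hypothetical names, and their values are placeholders rather than anything from the original project. Unlike the original, it also matches the spider name exactly instead of by substring.

```python
from urllib.parse import urlparse

# Placeholder values for illustration; substitute the real spider and host names.
TARGET_SPIDERS = {"sp_spidername"}
TARGET_HOSTS = {"example.com"}


def should_bypass(spider_name, url):
    """Return True when the request should be fetched through cfscrape."""
    if spider_name not in TARGET_SPIDERS:
        return False
    host = urlparse(url).hostname or ""
    # Match the host itself or any of its subdomains.
    return any(host == h or host.endswith("." + h) for h in TARGET_HOSTS)


if __name__ == "__main__":
    print(should_bypass("sp_spidername", "https://www.example.com/page"))
    print(should_bypass("sp_spidername", "https://other.org/?ref=example.com"))
```

Inside process_request, the condition would then read `if should_bypass(spider.name, request.url):`.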
