How to close the browser after Scrapy finishes working with Selenium

Reference: https://blog.csdn.net/Hepburn_li/article/details/91039747

The browser object is usually created in a DownloaderMiddleware. For example:

from logging import getLogger

from scrapy import signals
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait


class NewscrawlerDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    def __init__(self, timeout=None, service_args=None):
        self.logger = getLogger(__name__)
        self.timeout = timeout
        # Avoid a mutable default argument; fall back to an empty list here.
        self.browser = webdriver.Chrome(service_args=service_args or [])
        self.browser.set_window_size(1400, 700)
        self.browser.set_page_load_timeout(self.timeout)
        self.wait = WebDriverWait(self.browser, self.timeout)

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create the middleware instance.
        s = cls(timeout=crawler.settings.get('SELENIUM_TIMEOUT'),
                service_args=crawler.settings.get('CHROME_SERVICE_ARGS'))
        crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
        return s
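The two settings read in `from_crawler` would live in the project's settings.py. A minimal sketch follows; the concrete values and the `newscrawler.middlewares` module path are illustrative assumptions, not part of the original post:

```python
# settings.py (sketch; values and module path are assumptions)

# Maximum seconds Selenium waits for page loads and explicit waits
SELENIUM_TIMEOUT = 20

# Extra arguments passed through to the chromedriver service
CHROME_SERVICE_ARGS = ['--log-path=chromedriver.log']

# Enable the middleware so Scrapy routes requests through it
DOWNLOADER_MIDDLEWARES = {
    'newscrawler.middlewares.NewscrawlerDownloaderMiddleware': 543,
}
```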

In a newly created middleware, the signal must be connected to a handler function before the two are wired together; the mechanism is similar to Qt's signal/slot system.
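The comparison with Qt's signal/slot mechanism can be illustrated in a few lines of plain Python. This is not Scrapy's actual implementation (which is built on pydispatcher), only a sketch of the idea:

```python
# Minimal signal/slot sketch: a signal is just a key, and connecting
# registers a callable (the "slot") to be invoked when the signal fires.
class SignalManager:
    def __init__(self):
        self._receivers = {}  # signal -> list of connected callables

    def connect(self, receiver, signal):
        self._receivers.setdefault(signal, []).append(receiver)

    def send(self, signal, **kwargs):
        # Call every slot connected to this signal.
        for receiver in self._receivers.get(signal, []):
            receiver(**kwargs)


spider_closed = "spider_closed"  # the signal itself is only an identifier

closed = []
manager = SignalManager()
manager.connect(lambda spider: closed.append(spider), signal=spider_closed)

# Firing the signal invokes the connected handler, just as Scrapy does
# when a spider finishes.
manager.send(spider_closed, spider="news_spider")
print(closed)  # -> ['news_spider']
```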

Then all that remains is to define the `spider_closed` method:

    def spider_closed(self, spider):
        # The spider_closed signal passes the spider as an argument.
        self.browser.quit()

`browser.quit()` shuts the browser down completely: unlike `browser.close()`, which only closes the current window, `quit()` ends the WebDriver session and terminates the driver process.
