Reference: the blog post at https://blog.csdn.net/Hepburn_li/article/details/91039747.
The browser object is usually created in a DownloaderMiddleware. For example:
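from logging import getLogger

from scrapy import signals
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait


class NewscrawlerDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    def __init__(self, timeout=None, service_args=None):
        # service_args defaults to None rather than [] to avoid a shared
        # mutable default argument.
        self.logger = getLogger(__name__)
        self.timeout = timeout
        # The service_args keyword works with Selenium 3; newer Selenium
        # releases may expect a Service object instead.
        self.browser = webdriver.Chrome(service_args=service_args or [])
        self.browser.set_window_size(1400, 700)
        self.browser.set_page_load_timeout(self.timeout)
        self.wait = WebDriverWait(self.browser, self.timeout)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this method to create the middleware instance.
        s = cls(timeout=crawler.settings.get('SELENIUM_TIMEOUT'),
                service_args=crawler.settings.get('CHROME_SERVICE_ARGS'))
        # Run s.spider_closed when the spider_closed signal fires.
        crawler.signals.connect(s.spider_closed, signal=signals.spider_closed)
        return s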
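The two settings read in from_crawler have to be defined somewhere, normally in the project's settings.py. The fragment below is a minimal sketch: SELENIUM_TIMEOUT and CHROME_SERVICE_ARGS are the names used above, but the values, the dotted module path, and the priority 543 are assumptions to adapt to your own project.

```python
# settings.py (sketch; the values and the module path are assumptions)

# Seconds; the middleware uses it for the page-load timeout and WebDriverWait.
SELENIUM_TIMEOUT = 20

# Extra command-line arguments handed to the chromedriver process.
CHROME_SERVICE_ARGS = ["--verbose"]

# Enable the middleware. The dotted path is hypothetical: replace
# "newscrawler" with the actual package name of your Scrapy project.
DOWNLOADER_MIDDLEWARES = {
    "newscrawler.middlewares.NewscrawlerDownloaderMiddleware": 543,
}
```

If SELENIUM_TIMEOUT is missing, crawler.settings.get returns None, so it is worth setting it explicitly.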
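A newly created middleware has to connect the signal itself; only then is the signal wired to a handler function. This is somewhat like Qt's signal/slot mechanism.
After that, all that remains is to define the self.spider_closed method.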
    def spider_closed(self):
        # Handler connected to signals.spider_closed in from_crawler.
        self.browser.quit()
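browser.quit() shuts the browser down completely: it closes every window and ends the WebDriver session, whereas browser.close() only closes the current window.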
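To make the connect-then-fire flow concrete, here is a toy sketch that runs without Scrapy or Selenium. ToySignalManager, FakeBrowser, ToyMiddleware, and from_signals are hypothetical stand-ins invented for this illustration; they only mimic the shape of crawler.signals and webdriver.Chrome and are not how Scrapy implements signal dispatch.

```python
class ToySignalManager:
    """Hypothetical stand-in for crawler.signals (not Scrapy's implementation)."""

    def __init__(self):
        self._receivers = {}  # signal marker -> list of connected callables

    def connect(self, receiver, signal):
        self._receivers.setdefault(signal, []).append(receiver)

    def send(self, signal, **kwargs):
        # Call every handler that was connected to this signal.
        for receiver in self._receivers.get(signal, []):
            receiver(**kwargs)


class FakeBrowser:
    """Stand-in for webdriver.Chrome so the sketch runs without Selenium."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


# In this toy, a signal is just a unique marker object.
spider_closed = object()


class ToyMiddleware:
    def __init__(self):
        self.browser = FakeBrowser()

    @classmethod
    def from_signals(cls, signal_manager):
        s = cls()
        # Same wiring step as in from_crawler: register the handler.
        signal_manager.connect(s.spider_closed, signal=spider_closed)
        return s

    def spider_closed(self, spider=None):
        self.browser.quit()


manager = ToySignalManager()
middleware = ToyMiddleware.from_signals(manager)
browser_open_before = not middleware.browser.closed

# Roughly what the framework does when the spider finishes.
manager.send(spider_closed, spider="news")
browser_closed_after = middleware.browser.closed
```

Without the connect call the handler is never invoked, and the browser process would be left running after the crawl ends.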