While following an online tutorial, I wrote a Scrapy spider, but running `scrapy crawl sean` produced the following error:
E:\工作\python\scrapy\lagou\lagou>scrapy crawl sean
2018-12-21 12:04:51 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: lagou)
2018-12-21 12:04:51 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Windows-10-10.0.10240-SP0
Traceback (most recent call last):
File "e:\python\python37\lib\site-packages\scrapy\spiderloader.py", line 69, in load
return self._spiders[spider_name]
KeyError: 'sean'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:\python\python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "e:\python\python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "E:\Python\Python37\Scripts\scrapy.exe\__main__.py", line 9, in <module>
File "e:\python\python37\lib\site-packages\scrapy\cmdline.py", line 150, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "e:\python\python37\lib\site-packages\scrapy\cmdline.py", line 90, in _run_print_help
func(*a, **kw)
File "e:\python\python37\lib\site-packages\scrapy\cmdline.py", line 157, in _run_command
cmd.run(args, opts)
File "e:\python\python37\lib\site-packages\scrapy\commands\crawl.py", line 57, in run
self.crawler_process.crawl(spname, **opts.spargs)
File "e:\python\python37\lib\site-packages\scrapy\crawler.py", line 170, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "e:\python\python37\lib\site-packages\scrapy\crawler.py", line 198, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "e:\python\python37\lib\site-packages\scrapy\crawler.py", line 202, in _create_crawler
spidercls = self.spider_loader.load(spidercls)
File "e:\python\python37\lib\site-packages\scrapy\spiderloader.py", line 71, in load
raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: sean'
As the code below shows, the spider's `name` attribute is `spider_lagou`, not `sean`. The name passed to `scrapy crawl` must match the `name` defined in the spider class, so the command should be `scrapy crawl spider_lagou`. After this change the crawl runs successfully. (Tip: running `scrapy list` inside the project directory prints all registered spider names.)
import scrapy

class SpiderLagouSpider(scrapy.Spider):
    name = 'spider_lagou'
    allowed_domains = ['lagou.com']
    start_urls = ['http://www.lagou.com/']
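The failure itself is easy to reproduce in isolation: as the traceback shows, Scrapy's `SpiderLoader` keeps a registry keyed by each spider's `name` attribute, and `scrapy crawl <name>` simply looks the name up in it. A minimal sketch of that lookup (simplified from the traceback above; the registry contents here are hypothetical, not Scrapy's real implementation):

```python
# Simplified sketch of SpiderLoader.load() from the traceback above.
class SpiderLoader:
    def __init__(self):
        # Maps each spider's `name` attribute to its class
        # (hypothetical one-entry registry for illustration).
        self._spiders = {'spider_lagou': object}

    def load(self, spider_name):
        try:
            return self._spiders[spider_name]
        except KeyError:
            # Mirrors the second exception in the traceback.
            raise KeyError("Spider not found: {}".format(spider_name))

loader = SpiderLoader()
loader.load('spider_lagou')   # succeeds: name matches a registered spider
try:
    loader.load('sean')       # fails, reproducing the error above
except KeyError as e:
    print(e)
```

This is why the error is a `KeyError` rather than, say, a file-not-found error: the spider file exists, but no spider in the project registered the name `sean`.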