1. Edit Scrapy's settings.py
Hook the crawler into the Django environment by importing and initialising Django at the top of settings.py:
import os
import django

# Point Django at the project settings; the value is '<django_project_name>.settings'
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'Soufan_crawl.settings')
django.setup()  # initialise the Django environment
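For clarity, the two steps above can be wrapped in a small helper. This is a sketch, not part of the original post: `django_project_root` and `bootstrap_django` are names introduced here, and the path arithmetic assumes the layout `Soufan_crawl/data_acquisition/data_acquisition/settings.py`, i.e. the Scrapy project lives inside the Django project.

```python
import os
import sys


def django_project_root(settings_py_path):
    """Return the directory containing the Django project package,
    assuming settings.py sits two directories below it
    (Soufan_crawl/data_acquisition/data_acquisition/settings.py)."""
    return os.path.dirname(
        os.path.dirname(os.path.dirname(os.path.abspath(settings_py_path)))
    )


def bootstrap_django(project_root, settings_module='Soufan_crawl.settings'):
    """Make the Django project importable, then initialise Django.

    Call this at the top of Scrapy's settings.py, before any module
    imports Django models.
    """
    # Without this, running `scrapy` from the Scrapy project directory
    # may raise ModuleNotFoundError for the Django project package.
    if project_root not in sys.path:
        sys.path.insert(0, project_root)
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', settings_module)
    import django  # imported lazily so sys.path is fixed first
    django.setup()
```

In settings.py you would then call `bootstrap_django(django_project_root(__file__))`.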
2. Edit Scrapy's pipelines.py
Add a new pipeline:
from apps.spider_manage.models import Data

class DjangoORMPipeline(object):
    def process_item(self, item, spider):
        for data in item['data']:
            title = data['title']
            number = data['number']
            Data.objects.create(title=title, number=number)
        return item  # pass the item on to the next pipeline
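When an item carries many rows, creating them one at a time issues one INSERT per row. A sketch of a bulk variant, assuming the same item shape as above (`item['data']` is a list of dicts with `'title'` and `'number'` keys); `rows_from_item` and `DjangoORMBulkPipeline` are names introduced here for illustration:

```python
def rows_from_item(item):
    """Extract (title, number) pairs from the item's 'data' list."""
    return [(d['title'], d['number']) for d in item['data']]


class DjangoORMBulkPipeline:
    def process_item(self, item, spider):
        # Deferred import so this module can be loaded without Django set up.
        from apps.spider_manage.models import Data

        # One INSERT for the whole batch instead of one per row.
        Data.objects.bulk_create(
            [Data(title=t, number=n) for t, n in rows_from_item(item)]
        )
        return item
```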
Register it in settings.py:

ITEM_PIPELINES = {
    'data_acquisition.pipelines.DjangoORMPipeline': 2,
}
For reference, the Data model:

from django.db import models
from django.utils.timezone import now

class Data(models.Model):
    title = models.CharField(max_length=128)
    number = models.IntegerField()
    create_time = models.DateTimeField(default=now)
Although this approach does make saving data somewhat more convenient, it comes with real drawbacks: Scrapy's normal command-line tooling stops working. For example:
(Py3_spider) Soufan_crawl\data_acquisition> scrapy
Traceback (most recent call last):
File "c:\users\onefine\appdata\local\programs\python\python37\Lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\onefine\appdata\local\programs\python\python37\Lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\ONEFINE\Envs\Py3_spider\Scripts\scrapy.exe\__main__.py", line 9, in <module>
File "c:\users\onefine\envs\py3_spider\lib\site-packages\scrapy\cmdline.py", line 110, in execute
settings = get_project_settings()
File "c:\users\onefine\envs\py3_spider\lib\site-packages\scrapy\utils\project.py", line 68, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "c:\users\onefine\envs\py3_spider\lib\site-packages\scrapy\settings\__init__.py", line 292, in setmodule
module = import_module(module)
File "c:\users\onefine\envs\py3_spider\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "D:\PythonProject\DjangoProject\Soufan_crawl\data_acquisition\data_acquisition\settings.py", line 15, in <module>
django.setup()  # initialise the Django environment
File "c:\users\onefine\envs\py3_spider\lib\site-packages\django\__init__.py", line 19, in setup
configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
File "c:\users\onefine\envs\py3_spider\lib\site-packages\django\conf\__init__.py", line 57, in __getattr__
self._setup(name)
File "c:\users\onefine\envs\py3_spider\lib\site-packages\django\conf\__init__.py", line 44, in _setup
self._wrapped = Settings(settings_module)
File "c:\users\onefine\envs\py3_spider\lib\site-packages\django\conf\__init__.py", line 107, in __init__
mod = importlib.import_module(self.SETTINGS_MODULE)
File "c:\users\onefine\envs\py3_spider\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'Soufan_crawl'
(Py3_spider) Soufan_crawl\data_acquisition>
The failure happens because, when the scrapy command is run from the Scrapy project directory, the Django project package (Soufan_crawl) is not on sys.path, so django.setup() cannot import it. For that reason this approach is not recommended. If you want to use Django's ORM from Scrapy, use scrapy-djangoitem instead, as below.
The GitHub README documents it thoroughly, so only the link is given here.
Reference:
GitHub: https://github.com/scrapy-plugins/scrapy-djangoitem
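For orientation, here is a minimal sketch of what using scrapy-djangoitem looks like with the Data model above. This is only an outline of the pattern described in the README, not code from the original post; it assumes the package is installed and the Django environment is initialised:

```python
# items.py in the Scrapy project
from scrapy_djangoitem import DjangoItem
from apps.spider_manage.models import Data

class DataItem(DjangoItem):
    django_model = Data  # fields are derived from the Django model

# pipelines.py
class DjangoItemPipeline:
    def process_item(self, item, spider):
        item.save()  # persists the item through the Django ORM
        return item
```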