Scrapy custom_settings

Per-spider configuration
custom_settings = {
    'SOME_SETTING': 'some value',
}
Different pipeline settings for different spiders
custom_settings = {
    'ITEM_PIPELINES': {
        'video.pipelines.VideoPipeline': 301,
    }
}
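
For context, each dotted path in ITEM_PIPELINES points at an ordinary Python class with a process_item method. The original does not show VideoPipeline, so the body below is only a hypothetical sketch: the 'title' field and the whitespace cleanup are assumptions made for illustration.

```python
class VideoPipeline:
    """Hypothetical stand-in for video.pipelines.VideoPipeline."""

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; `spider` is the spider that produced it.
        # Assumed cleanup step: strip whitespace from an assumed 'title' field.
        title = item.get('title')
        if isinstance(title, str):
            item['title'] = title.strip()
        # Return the item so that any later pipeline can keep processing it.
        return item
```

The value 301 is the pipeline's order: Scrapy runs the enabled pipelines from the lowest number to the highest, and the values are conventionally kept in the 0-1000 range.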

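Cookie settings

custom_settings = {
    # Even if COOKIES_ENABLED is set to False in the project's settings.py,
    # setting it here turns cookies on for this spider only.
    # Any other setting can be overridden the same way.
    'COOKIES_ENABLED': True,
}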
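
The override works because Scrapy assigns each settings source a priority, and a new value replaces an existing one only if its priority is at least as high. The sketch below is a simplified, stdlib-only model of that rule, not Scrapy's own implementation. The class name PrioritizedSettings is made up for illustration, while the numeric levels mirror Scrapy's documented 'default', 'project', 'spider' and 'cmdline' priorities.

```python
# Priority levels, mirroring Scrapy's documented values for these sources.
PRIORITIES = {'default': 0, 'project': 20, 'spider': 30, 'cmdline': 40}


class PrioritizedSettings:
    """Toy model: each key remembers the priority of the source that set it."""

    def __init__(self):
        self._values = {}  # name -> (value, priority)

    def set(self, name, value, source):
        priority = PRIORITIES[source]
        current = self._values.get(name)
        # Replace only when the new source ranks at least as high as the old one.
        if current is None or priority >= current[1]:
            self._values[name] = (value, priority)

    def get(self, name):
        return self._values[name][0]


settings = PrioritizedSettings()
settings.set('COOKIES_ENABLED', False, 'project')  # value from settings.py
settings.set('COOKIES_ENABLED', True, 'spider')    # value from custom_settings
cookies_for_spider = settings.get('COOKIES_ENABLED')

settings.set('COOKIES_ENABLED', False, 'cmdline')  # e.g. -s COOKIES_ENABLED=False
settings.set('COOKIES_ENABLED', True, 'spider')    # ignored: lower priority
cookies_after_cli = settings.get('COOKIES_ENABLED')
```

In this model custom_settings beats settings.py, but a value passed on the command line with -s still wins over both.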

 

Create a new file custom_settings.py in the same directory as settings.py
 

# -*- coding: utf-8 -*-
custom_settings_for_spider1 = {
    'LOG_LEVEL': 'INFO',
    'DOWNLOAD_DELAY': 0,
    'COOKIES_ENABLED': False,  # Scrapy enables cookies by default; turn them off here
    'DOWNLOADER_MIDDLEWARES': {
        'video_spider.middlewares.ProxiesMiddleware': 400,
        'video_spider.middlewares.SeleniumMiddleware': 543,
        # Disable Scrapy's built-in User-Agent middleware
        'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    },
}
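
When custom_settings.py holds dicts for several spiders, most entries tend to repeat. One possible way to avoid copy-pasting is to derive each dict from a shared base. The names BASE_SETTINGS, derive_settings and custom_settings_for_spider2 are hypothetical and belong neither to the original file nor to Scrapy; this is a sketch that assumes nested dicts such as DOWNLOADER_MIDDLEWARES only need merging one level deep.

```python
import copy

# Assumed shared defaults for all spiders in the project.
BASE_SETTINGS = {
    'LOG_LEVEL': 'INFO',
    'DOWNLOAD_DELAY': 0,
    'DOWNLOADER_MIDDLEWARES': {
        'video_spider.middlewares.ProxiesMiddleware': 400,
    },
}


def derive_settings(base, overrides):
    """Return a new dict: `base` updated with `overrides`, merging nested dicts one level deep."""
    merged = copy.deepcopy(base)  # never mutate the shared base
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


# Hypothetical second spider: slower crawl, plus the Selenium middleware.
custom_settings_for_spider2 = derive_settings(BASE_SETTINGS, {
    'DOWNLOAD_DELAY': 2,
    'DOWNLOADER_MIDDLEWARES': {
        'video_spider.middlewares.SeleniumMiddleware': 543,
    },
})
```

Because the base is deep-copied, each spider gets its own dict and a change made for one spider cannot leak into another.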

Import custom_settings in the spider file
 

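import scrapy
from scrapy import Request
from scrapy.utils.project import get_project_settings
from scrapy import signals
from pydispatch import dispatcher

# Per-spider settings. The module path assumes custom_settings.py sits next to
# settings.py inside the video_spider package.
from video_spider.custom_settings import custom_settings_for_spider1


class ShanbaySpider(scrapy.Spider):
    name = 'shanbay'
    allowed_domains = ['shanbay.com']
    start_urls = ['http://shanbay.com/']
    # Must be a class attribute: Scrapy reads it before the spider is instantiated
    custom_settings = custom_settings_for_spider1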

 
