Custom download pipelines in Scrapy

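When Scrapy's standard download pipelines do not meet your needs, you can write a custom pipeline.
Only the file download and image download pipelines are covered here.
To extend FilesPipeline (or ImagesPipeline), you only need to override the following two methods: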

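get_media_requests(self, item, info)   # returns a Request for each file (image) to download
 # called back once the Requests above have finished; this is where the files or images field gets filled in
item_completed(self, results, item, info)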
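
To make the shape of the results argument concrete: Scrapy passes item_completed a list of 2-tuples, (success, info), where info is a dict with keys such as url, path and checksum for a successful download, and a failure object otherwise. The sketch below imitates that structure in plain Python, without Scrapy; the URLs, paths, checksums and the FakeFailure class are made-up stand-ins for illustration only.

```python
# Stand-in for a failed download. In a real pipeline this slot holds
# a Twisted Failure object; FakeFailure is purely hypothetical.
class FakeFailure(Exception):
    pass


# Hand-written sample in the shape item_completed receives:
# one (success, info) tuple per request issued for the item.
results = [
    (True, {"url": "https://example.com/a.jpg", "path": "full/a.jpg", "checksum": "abc123"}),
    (False, FakeFailure("download failed")),
    (True, {"url": "https://example.com/b.jpg", "path": "full/b.jpg", "checksum": "def456"}),
]


def successful_paths(results):
    """Return the storage paths of the downloads that succeeded."""
    return [info["path"] for ok, info in results if ok]


print(successful_paths(results))  # ['full/a.jpg', 'full/b.jpg']
```

Filtering on the success flag first matters, because a failed entry carries no path key to read.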
Example:

pipelines.py

import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

class MyImagePipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Issue one download request per URL listed on the item.
        for image_url in item['image_urls']:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # results is a list of (success, info) tuples; keep the paths of the successful downloads.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("item contains no images")
        # Store the saved paths on the item (the item needs an image_paths field).
        item['image_paths'] = image_paths
        return item
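
The pipeline only runs once it is enabled in the project settings. The fragment below is a minimal sketch: ITEM_PIPELINES and IMAGES_STORE are standard Scrapy settings, but the package name myproject and the storage directory are assumptions made for illustration, so adjust them to your own project. ImagesPipeline also needs the Pillow library installed.

```python
# settings.py

# Register the custom pipeline. "myproject" is a hypothetical package name;
# the number sets the order relative to other pipelines (lower runs first).
ITEM_PIPELINES = {
    "myproject.pipelines.MyImagePipeline": 300,
}

# Directory where downloaded images are stored (placeholder path).
IMAGES_STORE = "/path/to/images"
```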
