Scraping GIF images with Scrapy

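This post is largely the same as the previous one. The main difference is pipelines.py: ImagesPipeline does not support the GIF format (it converts every downloaded image to JPEG), so we need to rewrite the method that saves the images.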


1. items.py


import scrapy


class HupuGifItem(scrapy.Item):
    # URLs of the GIFs found on the page (filled in by the spider)
    hupu_image_url = scrapy.Field()
    # Local file paths of the saved GIFs (filled in by the pipeline)
    images = scrapy.Field()

2. pipelines.py


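# -*- coding: utf-8 -*-

import os
from urllib.parse import urlparse

import requests
from scrapy.pipelines.images import ImagesPipeline


class HupuGifPipeline(ImagesPipeline):
    # Override process_item so each file is written byte-for-byte with requests,
    # bypassing ImagesPipeline's conversion of every image to JPEG.
    def process_item(self, item, spider):
        # Items without image URLs pass through unchanged.
        if 'hupu_image_url' not in item:
            return item

        images = []
        # One sub-directory per spider under IMAGES_STORE.
        dir_path = os.path.join(spider.settings.get('IMAGES_STORE'), spider.name)
        os.makedirs(dir_path, exist_ok=True)

        for image_url in item['hupu_image_url']:
            # The scraped URLs are protocol-relative ("//host/path"), so add a scheme.
            if image_url.startswith('//'):
                image_url = 'http:' + image_url
            # Use the last path segment as the file name, ignoring any query string.
            file_name = os.path.basename(urlparse(image_url).path)
            file_path = os.path.join(dir_path, file_name)

            if not os.path.exists(file_path):
                # Request before opening the file, so a failed download
                # does not leave an empty file that would be skipped next run.
                response = requests.get(image_url, stream=True, timeout=30)
                if response.status_code != 200:
                    spider.logger.warning('Failed to download %s (HTTP %s)',
                                          image_url, response.status_code)
                    continue
                with open(file_path, 'wb') as handle:
                    for block in response.iter_content(1024):
                        if block:  # skip empty keep-alive chunks
                            handle.write(block)

            images.append(file_path)

        item['images'] = images
        return item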
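For the pipeline to run, it has to be registered in settings.py, and IMAGES_STORE must be set, because both the pipeline code and ImagesPipeline itself read it. A minimal sketch, assuming the project package is hupu_gif (as the settings import in the original code suggests); the order value 300 and the directory 'images' are placeholder values:

```python
# settings.py (fragment)

# Enable the custom pipeline. The number is the pipeline's run order;
# Scrapy customarily uses values in the 0-1000 range, lower runs first.
ITEM_PIPELINES = {
    'hupu_gif.pipelines.HupuGifPipeline': 300,
}

# Root directory for saved files ('images' is a placeholder path).
IMAGES_STORE = 'images'
```

Only the custom pipeline is listed here, so Scrapy's stock ImagesPipeline does not also process the items and produce JPEG copies.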


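PS: With the rewritten method, the downloaded GIFs are saved correctly. If you save the images with Scrapy's built-in ImagesPipeline instead, every saved image ends up in JPEG format, so keep this in mind!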
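One way to confirm that the files on disk really are GIF data (and not JPEGs saved under a .gif name) is to check their first six bytes, which for a GIF file are always GIF87a or GIF89a. This is a small sketch; the helper names is_gif and find_non_gifs are hypothetical:

```python
import os

# Every GIF file starts with one of these 6-byte signatures.
GIF_SIGNATURES = (b'GIF87a', b'GIF89a')


def is_gif(path):
    """Return True if the file at `path` begins with a GIF signature."""
    with open(path, 'rb') as f:
        return f.read(6) in GIF_SIGNATURES


def find_non_gifs(dir_path):
    """Return the sorted names of regular files in dir_path that are not GIF data."""
    bad = []
    for name in sorted(os.listdir(dir_path)):
        full = os.path.join(dir_path, name)
        if os.path.isfile(full) and not is_gif(full):
            bad.append(name)
    return bad
```

For example, running find_non_gifs on the IMAGES_STORE/<spider name> directory should return an empty list when every download kept its GIF format.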
