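This post is largely the same as the previous one. The main difference is pipelines.py: because ImagesPipeline does not support the GIF format, we need to rewrite the method that saves the images.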
1. items.py
import scrapy


class HupuGifItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    hupu_image_url = scrapy.Field()
    images = scrapy.Field()
2. pipelines.py
# -*- coding: utf-8 -*-
import os

import requests
from scrapy.pipelines.images import ImagesPipeline

from hupu_gif import settings


class HupuGifPipeline(ImagesPipeline):
    def process_item(self, item, spider):
        # Download each image ourselves so the original GIF bytes are kept.
        if 'hupu_image_url' in item:
            images = []
            dir_path = '%s/%s' % (settings.IMAGES_STORE, spider.name)
            if not os.path.exists(dir_path):
                os.makedirs(dir_path)
            for image_url in item['hupu_image_url']:
                # Use the last URL path segment as the local file name.
                us = image_url.split('/')[-1]
                file_path = '%s/%s' % (dir_path, us)
                images.append(file_path)
                # Skip files that were already downloaded.
                if os.path.exists(file_path):
                    continue
                # Prepend the scheme; this assumes the scraped URL starts with '//'.
                # Request first so a failed request does not leave an empty file.
                response = requests.get('http:' + image_url, stream=True)
                with open(file_path, 'wb') as handle:
                    for block in response.iter_content(1024):
                        if not block:
                            break
                        handle.write(block)
            item['images'] = images
        return item
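The pipeline above builds the request URL by prepending 'http:' and takes the file name from the last '/'-separated segment. As a rough sketch of how those two steps could be made a little more tolerant (URLs that already carry a scheme, or that end in a query string), the helpers below use only the standard library. The names normalize_url and file_name_from_url are hypothetical and are not part of the original project or of Scrapy.

```python
import posixpath
from urllib.parse import urlsplit


def normalize_url(image_url, default_scheme='http'):
    """Return an absolute URL, adding a scheme to protocol-relative URLs."""
    # Hypothetical helper: only URLs starting with '//' get a scheme added.
    if image_url.startswith('//'):
        return '%s:%s' % (default_scheme, image_url)
    return image_url


def file_name_from_url(image_url):
    """Return the last path segment of the URL, ignoring any query string."""
    path = urlsplit(normalize_url(image_url)).path
    return posixpath.basename(path)
```

Inside process_item, these could stand in for the 'http:' + image_url concatenation and the image_url.split('/')[-1] call, assuming the scraped URLs look like '//host/path/name.gif'.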
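PS: With the rewritten method, the downloaded GIFs are saved correctly. If you save the images with Scrapy's built-in method instead, they are all saved in JPEG format, so keep that in mind.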
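One way to check which format actually ended up on disk is to look at the first few bytes of a saved file: GIF files start with the signature GIF87a or GIF89a, while JPEG files start with the bytes FF D8 FF. The following is a minimal sketch of such a check. The function names are hypothetical and are not part of the original project.

```python
def detect_image_format(data):
    """Guess the image format from the leading magic bytes."""
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if data[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    return 'unknown'


def file_image_format(file_path):
    """Read only the first bytes of a file and guess its image format."""
    with open(file_path, 'rb') as handle:
        return detect_image_format(handle.read(6))
```

Calling file_image_format on one of the paths stored in item['images'] should return 'gif' for a file saved by the pipeline above, provided the server really returned a GIF.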