Python crawler: scraping street-snap (街拍) beauty photos from Toutiao

Development environment: Windows 7

IDE: PyCharm

Python version: Python 3.7

Libraries used: os, urllib, requests, hashlib

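Key steps:

  1. Use the browser's developer tools to find the request endpoint
  2. Analyze the content and data format that the endpoint returns
  3. Extract the image links
  4. Save the images locally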
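
Steps 2 and 3 can be sketched without touching the network. The sample payload below is hand-made for illustration, modeled only on the fields the implementation reads (`data`, `image_list`, `url`, `title`); the real response may carry other fields, and `extract_image_urls` is a hypothetical helper name rather than part of the script.

```python
# Hand-made sample mimicking the assumed response shape; not real API output.
SAMPLE_RESPONSE = {
    'data': [
        {
            'title': 'street snap A',
            'image_list': [
                {'url': '//example.com/img/a1'},
                {'url': '//example.com/img/a2'},
            ],
        },
        # Entries without an image list are skipped.
        {'title': 'no images here'},
    ]
}


def extract_image_urls(payload):
    """Return (title, url) pairs for every image in the payload."""
    pairs = []
    # Treat a missing payload or a missing 'data' field as an empty result.
    for item in (payload or {}).get('data') or []:
        title = item.get('title')
        for image in item.get('image_list') or []:
            url = image.get('url')
            if url:
                pairs.append((title, url))
    return pairs


if __name__ == '__main__':
    for title, url in extract_image_urls(SAMPLE_RESPONSE):
        print(title, url)
```

Returning an empty list for a missing payload keeps the caller's loop simple: a failed request just produces nothing to download.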

Implementation:

# coding: utf-8
# author: Alvin

import os
import requests
from urllib.parse import urlencode
from hashlib import md5

def get_page(offset):
    """Fetch one page of search results as parsed JSON, or None on failure."""
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': '街拍',
        'autoload': 'true',
        'count': '20',
        'cur_tab': '1'
    }

    url = 'https://www.toutiao.com/search_content/?' + urlencode(params)
    user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'
    # The HTTP header name is 'User-Agent' (hyphen, not underscore)
    headers = {
        'User-Agent': user_agent
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.json()
    except (requests.RequestException, ValueError):
        # Network error, or a body that is not valid JSON
        pass
    return None

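def get_image(json):
    """Yield one {'image', 'title'} dict per image found in the response."""
    # get_page returns None when the request fails
    if not json:
        return
    data = json.get('data')
    if data:
        for item in data:
            image_list = item.get('image_list')
            title = item.get('title')

            if image_list:
                for image in image_list:
                    yield {
                        'image': image.get('url'),
                        'title': title
                    }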


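def save_image(item):
    """Download one image into a folder named after the article title."""
    title = item.get('title')
    image_url = item.get('image')
    if not title or not image_url:
        return

    # Create a folder named after the title
    if not os.path.exists(title):
        os.mkdir(title)

    # Image URLs are expected to be protocol-relative ('//host/path'); add a scheme only if missing
    if image_url.startswith('//'):
        image_url = 'http:' + image_url
    response = requests.get(image_url)
    if response.status_code == 200:
        # Name the file by the MD5 of its content so duplicates are skipped
        file_path = '{0}/{1}.{2}'.format(title, md5(response.content).hexdigest(), 'jpg')
        if not os.path.exists(file_path):
            with open(file_path, 'wb') as f:
                f.write(response.content)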

def main(offset):
    json = get_page(offset)
    for item in get_image(json):
        print(item)
        save_image(item)

if __name__ == '__main__':
    main(5)
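
`save_image` uses the article title as a folder name, but a title may contain characters that Windows does not allow in file names (such as `?`, `:` or `/`), in which case `os.mkdir` fails. Below is a minimal sketch of one way to handle this. `safe_dirname` and `image_path` are hypothetical helpers introduced here for illustration and are not part of the script above.

```python
import os
import re
from hashlib import md5

# Characters Windows rejects in file and folder names, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_dirname(title, fallback='untitled', max_length=80):
    """Turn an article title into a name that is safe to use as a folder."""
    cleaned = _INVALID_CHARS.sub('_', title or '')
    # Truncate first, then drop trailing dots and spaces,
    # which Windows also rejects at the end of a name.
    cleaned = cleaned[:max_length].strip().rstrip('. ')
    return cleaned or fallback


def image_path(title, content, extension='jpg'):
    """Build '<folder>/<md5 of content>.<extension>' for downloaded bytes."""
    file_name = '{0}.{1}'.format(md5(content).hexdigest(), extension)
    return os.path.join(safe_dirname(title), file_name)


if __name__ == '__main__':
    print(safe_dirname('街拍: what? a/b'))
    print(image_path('街拍', b'fake image bytes'))
```

To use it, `save_image` could call `image_path(title, response.content)` and create the parent folder with `os.makedirs(..., exist_ok=True)` before writing. The sketch does not handle reserved device names such as `CON` or `NUL`.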

 
