【Python】+【selenium】从百度图片上爬取图片

【Python】+【selenium】从百度图片上爬取图片

利用selenium从百度图片下载图片,程序很简单,在网页上找到 img 标签,获取其中的 src 属性值,利用**从GAN学习指南:从原理入门到制作生成Demo**中在第一步搜集数据的代码,改的简单些方便以后的使用。

因为喜欢玉桂狗所以是就用来下玉桂狗的图片啦~

图片详细查看页: 【Python】+【selenium】从百度图片上爬取图片_第1张图片

代码如下

import requests
import os
import traceback
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# 定义下载函数
def download(url, filename):
    if os.path.exists(filename):
        print('file exists!')
        return
    try:
        r = requests.get(url, stream=True, timeout=60)
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    f.flush()
        return filename
    except KeyboardInterrupt:
        if os.path.exists(filename):
            os.remove(filename)
        raise KeyboardInterrupt
    except Exception:
        traceback.print_exc()
        if os.path.exists(filename):
            os.remove(filename)

# 创建下载目录
if os.path.exists('imgs') is False:
    os.makedirs('imgs')

# 打开谷歌浏览器Chrome 
browser = webdriver.Chrome()

# 进入百度图片详细查看页
url = 'https://image.baidu.com/search/detail?ct=503316480&z=0&ipn=d&word=%E7%8E%89%E6%A1%82%E7%8B%97&step_word=&hs=2&pn=0&spn=0&di=122032263430&pi=0&rn=1&tn=baiduimagedetail&is=0%2C0&istype=0&ie=utf-8&oe=utf-8&in=&cl=2&lm=-1&st=undefined&cs=4287920973%2C3127191504&os=1474905083%2C20535971&simid=3427141834%2C56145588&adpicid=0&lpn=0&ln=1059&fr=&fmq=1540558077513_R&fm=&ic=undefined&s=undefined&se=&sme=&tab=0&width=undefined&height=undefined&face=undefined&ist=&jit=&cg=&bdtype=0&oriquery=&objurl=http%3A%2F%2Fcdn.duitang.com%2Fuploads%2Fitem%2F201602%2F14%2F20160214211446_izjX4.thumb.700_0.jpeg&fromurl=ippr_z2C%24qAzdH3FAzdH3Fooo_z%26e3B17tpwg2_z%26e3Bv54AzdH3Fks52AzdH3F%3Ft1%3Dcnldal90c&gsm=0&rpstart=0&rpnum=0&islist=&querylist='
browser.get(url)

# 设置下载的图片数量及进行下载
start = 1
end = 800
for i in range(start,end + 1):
#     # 获取图片位置
    img = browser.find_elements_by_xpath("//img[@class='currentImg']")
    for ele in img:
        #   获取图片链接
        target_url = ele.get_attribute("src")
        #   设置图片名称。以图片链接中的名字为基础选取最后25个字节为图片名称。
        img_name = target_url.split('/')[-1]
        filename = os.path.join('imgs', img_name[-25:])
        download(target_url, filename)
#     # 下一页
    next_page = browser.find_element_by_class_name("img-next").click()
    time.sleep(3)
#     # 显示进度
    print('%d / %d' % (i, end))

# 关闭浏览器
browser.quit()

希望能帮到你~~

你可能感兴趣的:(Demo记录)