Building an Image Downloader with Python and Beautiful Soup (Improved Version)

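Preface
This is an image download program written with Python and beautifulsoup4. It mainly supports the VCG (视觉中国) website: it retrieves the list of images on the page at the current URL and offers three download modes: download all, download a single image, and download an arbitrary run of images.
This version builds on an earlier blog post and improves it: the interface is tidier, and in-window web browsing has been added.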

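The actual window:
[Figure 1: the application window]
As the figure shows, you can type a URL directly into the "图片获取网址" (image source URL) field and click the "获取列表" (get list) button. Alternatively, type a URL such as the VCG home page into the "网址" (URL) field, browse the site in the embedded browser to find the page with the images you want, and the URL fields update automatically.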

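The download buttons are on the right. Images are saved to the desktop by default; the save location can also be changed manually.

The program therefore has roughly three parts: in-window web browsing, which uses QWebEngineView; fetching and parsing page data, which uses requests and beautifulsoup4; and image downloading, which uses urllib.

Each part is described below.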

1. In-window web browsing
Browsing web pages inside the window relies on Qt's QWebEngineView widget.
Normally, creating an instance is enough:

self.webview=QWebEngineView(self)

In this example, however, links that would open a new window should navigate within the same window instead, so QWebEngineView's createWindow method is overridden:

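class MyWebEngineView(QWebEngineView):
    """QWebEngineView subclass: links that request a new window are opened in this view instead."""

    def createWindow(self, QWebEnginePage_WebWindowType):
        # Qt calls this when the page asks for a new window. Hand back a helper
        # view and mirror whatever URL it is sent to into this view.
        page = MyWebEngineView(self)
        page.urlChanged.connect(self.on_url_changed)
        return page

    def on_url_changed(self, url):
        self.setUrl(url)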

The main window then creates an instance of this class:

self.webview = MyWebEngineView(self)
self.webview.setGeometry(20, 120, 480, 600)
self.webview.setZoomFactor(0.4)
self.webview.loadFinished.connect(self.urlchange_f)
self.webview.urlChanged.connect(self.urlchange_f)

With this in place, links on a page opened in the window navigate within the window.
[Figure 2: a page opened in the embedded browser]
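
One practical detail when loading a typed address into the embedded browser: a bare host name such as www.vcg.com has no scheme, and a QUrl built from it is typically treated as a relative URL that will not load as a web page. PyQt5 offers QUrl.fromUserInput for this case; the sketch below shows the same idea with only the standard library. The helper name normalize_url and its default_scheme parameter are illustrative and not part of the original program.

```python
from urllib.parse import urlsplit


def normalize_url(text, default_scheme="https"):
    """Return `text` as an absolute http(s) URL, adding a scheme if it is missing."""
    text = text.strip()
    if not text:
        raise ValueError("empty address")
    if "://" not in text:
        # "www.vcg.com" or "//www.vcg.com/x" -> "https://www.vcg.com..."
        text = f"{default_scheme}://{text.lstrip('/')}"
    parts = urlsplit(text)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a web address: {text!r}")
    return text
```

In the program, the result could be passed to QUrl before calling self.webview.load.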
2. Fetching page data
First, the raw data for the given URL is fetched with the get function of requests.

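def get_url_text(self, url):
    # Fetch the raw page content for the given URL
    url_info = requests.get(url)
    # Let requests guess the encoding from the body so Chinese text decodes correctly
    url_info.encoding = url_info.apparent_encoding
    url_text = url_info.text
    return url_text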
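
A slightly more defensive variant of this fetch step might look like the sketch below. It assumes that a timeout and an explicit status check are wanted, and that the site may respond differently to the default requests User-Agent (an assumption, not something this post states). The function name fetch_html, the header value, and the injectable get parameter are all hypothetical; get exists only so the function can be exercised without network access.

```python
import requests

# Hypothetical header value; any browser-like User-Agent string would do.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (image-downloader sketch)"}


def fetch_html(url, timeout=10, get=requests.get):
    """Return the decoded HTML of `url`; raise on HTTP errors or timeouts."""
    response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # 4xx/5xx responses raise instead of being parsed
    # Same encoding step as get_url_text: guess the charset from the body
    response.encoding = response.apparent_encoding
    return response.text
```

Without a timeout, requests.get can wait indefinitely on a stalled connection, which would freeze the window because the call runs in the GUI thread.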

The function returns the page text. The raw page data looks roughly like this (taken from a VCG page):

<!DOCTYPE html>
<html class="no-js" lang="zh-CN"> 
  <head>
    <meta charset="utf-8"> 
    <meta http-equiv="x-ua-compatible" content="IE=edge,Chrome=1"> 
    <title>科技图片_科技高清图片素材库</title>
    <meta name="description" content="视觉中国旗下网站(VCG.COM)通过科技图片搜索页面分享:科技高清图片,优质科技图片素材,方便用户下载与购买正版科技图片,国内独家优质图片,100%正版保障,免除侵权烦恼,一次授权全球永久可商用。">
    <meta name="keywords" content="科技图片,科技高清图片,科技图片素材,科技图片素材库">
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=0"> 
    <meta http-equiv="x-dns-prefetch-control" content="on"> 
    <link rel="dns-prefetch" href="https://res.cfp.cn"> 
    <link rel="dns-prefetch" href="https://alifei00.cfp.cn"> 
    <link rel="dns-prefetch" href="https://alifei01.cfp.cn"> 
    <link rel="dns-prefetch" href="https://alifei02.cfp.cn"> 
    <link rel="dns-prefetch" href="https://alifei03.cfp.cn"> 
    <link rel="dns-prefetch" href="https://alifei04.cfp.cn"> 
    <link rel="dns-prefetch" href="https://tenfei01.cfp.cn"> 
    <link rel="dns-prefetch" href="https://tenfei02.cfp.cn"> 
    <link rel="dns-prefetch" href="https://tenfei03.cfp.cn"> 
    <link rel="dns-prefetch" href="https://tenfei04.cfp.cn"> 
    <link rel="dns-prefetch" href="https://tenfei05.cfp.cn"> 
    <link rel="dns-prefetch" href="https://gossv.cfp.cn"> 
    <link rel="icon" href="//alifei00.cfp.cn/static/favicon.ico"> 
    <link rel="preload" href="https://res.cfp.cn/res/2_1702966625307/css/index.css" as="style"> 
    <link rel="stylesheet" href="https://res.cfp.cn/res/2_1702966625307/css/index.css"> 
    <link rel="preload" href="https://res.cfp.cn/vci/vci7b0b497/vcgicon.cdn.css" as="style"> 
    <link rel="stylesheet" href="https://res.cfp.cn/vci/vci7b0b497/vcgicon.cdn.css"> 
    <link rel="preload" href="/creative-image/keji/?cssId=csscreativesearchjnkeprh0re5bt6s1g0&css=precss.css" as="style"> 
    <link rel="stylesheet" href="/creative-image/keji/?cssId=csscreativesearchjnkeprh0re5bt6s1g0&css=precss.css"> 
    <link rel="canonical" href="https://www.vcg.com/creative-image/keji/"> 
    <base href="/"> 
  <script src="//aeu.alicdn.com/waf/antidomxss_v640.js"></script><script src="//aeu.alicdn.com/waf/interfaceacting220819.js"></script></head>
  <body>
    <div id="root"><section class="jss5"><div class="jss7"><header class="wd-header"><div class="wd-navigate"><div class="wd-logo"><a class="wd-logo-i" href="/" title="网站首页"><img class="site-logo" src="//alifei00.cfp.cn/static/img/logo.svg" alt="logo"/></a></div><button type="button" class="wd-nav-menu"><i class="vcico vcico-menu"></i></button><nav class="wd-nav wd-nav-web"><div class="wd-navlink-w"><a class="wd-navlink" href="/creative/" title="图片">图片<i class="vcico vcico-down wd-icon"></i></a><div class="wd-navpop"><div class="wd-navpop-cell"><a class="wd-navpop-link" href="/creative/" title="照片">照片</a><a class="wd-navpop-link" href="/illustration/" title="插画">插画</a></div></div></div><div class="wd-navlink-w"><a class="wd-navlink" href="/creative-video/" title="视频">视频</a></div><div class="wd-navlink-w"><a class="wd-navlink" href="/music/" title="音频">音频<i class="vcico vcico-down wd-icon"></i></a><div class="wd-navpop"><div class="wd-navpop-cell"><a class="wd-navpop-link" href="/music/" title="音乐">音乐</a><a class="wd-navpop-link" href="/music/sound-effects/" title="音效">音效</a></div></div></div><div class="wd-navlink-w"><a class="wd-navlink" href="/font-search/" title="字体">字体</a></div><div class="wd-navlink-w"><a class="wd-navlink" href="javascript:void(0)" title="更多服务">更多服务<i class="vcico vcico-down wd-icon"></i></a><div class="wd-navpop"><div class="wd-navpop-cell"><a class="wd-navpop-link" href="https://www.vcgapi.com" title="开放平台">开放平台</a><a class="wd-navpop-link" href="/dam/" title="视觉云库">视觉云库</a><a class="wd-navpop-link" href="/aigc/" title="灵感绘图">灵感绘图</a><a class="wd-navpop-link" href="/ailab/" title="AI Lab">AI Lab</a></div><div class="wd-navpop-cell"><a class="wd-navpop-link" href="/addservice/" title="整合营销">整合营销</a><a class="wd-navpop-link" href="/creative-insight/topSearches" title="创意洞察">创意洞察</a><a class="wd-navpop-link" href="/public/" title="公益">公益</a><a class="wd-navpop-link" href="/aigcHelp/" title="AIGC">AIGC</a></div></div></div></nav><div class="wd-usermenu"><div class="wd-um-i hook-login-btn"><button type="button"
class="wd-button" title="登录 / 注册"><span>登录 / 注册</span></button></div></div></div><div class="wd-sticky stricky-header wd-st-none"><div class="wd-search"><form><div class="wd-s-content"><div class="wd-s-input-w"><input type="text" class="wd-s-input" autoComplete="off" placeholder="搜索图片素材..." value="科技"/><div class="wd-suggest"></div></div><button class="wd-s-btn" type="button" title="搜索"><i class="vcico vcico-search"></i></button></div></form></div><div class="wd-sourcebar"><div class="wd-sa-tabs"><div class="wd-sa-tabs-inner"><a class="wd-sa-link active" href="/creative-image/keji/"><i class="vcico vcico-creative wd-ico"></i>图片</a><a class="wd-sa-link" href="/creative-video-search/keji/"><i class="vcico vcico-video wd-ico"></i>视频</a><a class="wd-sa-link" href="/music-search/keji/" target="_blank"><i class="vcico vcico-music wd-ico"></i>音乐</a><div class="wd-sa-tabs-sort"><span class="wd-sa-select-tit"><i class="vcico vcico-creative wd-ico"></i>图片<i class="vcico vcico-down wd-icon"></i></span><span class="wd-sa-link active">最佳</span><span class="wd-sa-link">最新</span></div></div></div><div class="wd-sa-mobile wd-md-flex-show"><div class="wd-pl-title" title="全部选项"><div class="wd-pl-showmore"><i></i><i></i><i></i></div></div></div><div class="wd-sa-control wd-md-hide"><div class="rec-label-wapper"><label
class="wd-checkbox-label wd-pl-title"> <span class="wd-checkbox"><input type="checkbox" class="wd-checkbox-input"/><span class="wd-checkbox-inner"></span></span><span>竖图</span></label><label class="wd-checkbox-label wd-pl-title"><span class="wd-checkbox"><input type="checkbox" class="wd-checkbox-input"/><span class="wd-checkbox-inner"></span></span><span>本土内容</span></label></div><div class="wd-pl-title"><span>构图</span> <i class="vcico vcico-down wd-icon"></i></div><div class="wd-pl-title"><span>色彩</span> <i class="vcico vcico-down wd-icon"></i></div><div class="wd-pl-title"><span>格式</span> <i class="vcico vcico-down wd-icon"></i></div><div class="wd-pl-title" title="全部选项"><div class="wd-pl-showmore"><i></i><i></i><i></i></div></div></div><button tabindex="0" class="jss31 jss16 toggle-color-btn" type="button"><span class="jss17"><i class="colorsearchico"></i>色彩搜索</span><span class="jss34"></span></button></div></div></header><div class="search-image-tabs"><span class="search-image-tab search-image-active">全部</span><span class="search-image-tab">照片</span><span class="search-image-tab">插画</span><span class="search-image-tab">模板</span><span class="search-image-tab">元素</span><span class="search-image-tab">图标</span></div><div class="view_content"><section class="view_section"><div class="onebox-wapper fullKeyword"><div class="scroll-keywords"><a class="scroll-keywords-item" href="/creative-image/153798" title="科技 展览">科技 展览</a><a class="scroll-keywords-item" href="/creative-image/694844" title="科技 交易">科技 交易</a><a class="scroll-keywords-item" href="/creative-image/2835420" title="科技 大道">科技 大道</a><a class="scroll-keywords-item" href="/creative-image/3886842" title="科技 新城">科技 新城</a>
<a class="scroll-keywords-item" href="/creative-image/6959212" title="科技 倒影">科技 倒影</a><a class="scroll-keywords-item" href="/creative-image/33229" title="科技 创意">科技 创意</a><a class="scroll-keywords-item" href="/creative-image/248621" title="科技 线稿">科技 线稿</a><a class="scroll-keywords-item" href="/creative-image/6817189" title="科技 晚上">科技 晚上</a><a class="scroll-keywords-item" href="/creative-image/shipinhuiyi" title="视频会议">视频会议</a><a class="scroll-keywords-item" href="/creative-image/hulianwang" title="互联网">互联网</a><a class="scroll-keywords-item" href="/creative-image/sanweituxing" title="三维图形">三维图形</a><a class="scroll-keywords-item" href="/creative-image/chengshi" title="城市">城市</a></div></div><div class="secend-tab-wapper"><div class="wd-cross-bar"><div title="最佳" class="secend-tab active">最佳</div><div title="最新" class="secend-tab">最新</div><div class="wd-cross-indicator"></div></div></div><div class="source-box"><div class="filter_lit_waper"><div class="result_count"><span class="count"><strong class="number">3,313,018</strong> 个结果</span></div></div><div class="batch_control isDisplayFlex"><div class="batch-tab"><div class="batch-tab-inner"><span class="ctl_item batch">

Once the raw data is available, it can be parsed to extract the image information:

def get_img_url_list(self, url):
    # Parse the fetched page and extract the image link addresses
    url_text = self.get_url_text(url)
    soup = BeautifulSoup(url_text, 'html.parser')
    url_list = soup.find('div', class_='gallery_inner')
    img_url_list = url_list.find_all('a')
    # Drop the last <a> (presumably not an image entry)
    lli = img_url_list[:-1]

    img_num = str(len(lli))
    self.lbl_imgnumber.setText('共找到: ' + img_num + ' 张图片')  # "Found N images"

    self.te.clear()
    # Start from an empty list so a second fetch does not append duplicates
    self.img_url_list = []
    # Store each image's name and full URL in the list, in page order
    for img_url_list_s in lli:
        li = img_url_list_s.find('img')
        name1 = li['alt']
        img_url = li['data-src']
        # The code assumes data-src is protocol-relative ("//host/path")
        img_src = 'https:' + img_url
        self.img_url_list.append({'name': name1, 'img-src': img_src})
    # Display numbers are the list index plus one, so the number typed for a
    # download maps directly back to a list entry
    for i, itm in enumerate(self.img_url_list):
        imgname = itm['name']
        imgurl = itm['img-src']
        print(i)
        print(imgname + imgurl)
        self.te.append(str(i + 1) + ' ' + imgname + imgurl + '\r\n')

The extracted image data is stored in a list and also displayed in the text box.
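
To make the extraction step easier to try outside the GUI, here is a minimal standalone sketch. The HTML in SAMPLE_HTML is invented to imitate the structure the code above expects (a div with class gallery_inner containing a elements that wrap an img with alt and data-src); it is not copied from the real site, and the host img.example.com is a placeholder. The function name extract_images is also illustrative. Instead of slicing off the last anchor, this variant skips any anchor that does not wrap an image, and it uses urljoin rather than string concatenation to complete the URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup, shaped like what the article's parser looks for.
SAMPLE_HTML = """
<div class="gallery_inner">
  <a href="/creative/1001"><img alt="circuit board" data-src="//img.example.com/1001.jpg"></a>
  <a href="/creative/1002"><img alt="robot arm" data-src="//img.example.com/1002.jpg"></a>
  <a href="/creative-image/keji/?page=2">next page</a>
</div>
"""


def extract_images(html, page_url):
    """Return [{'name': ..., 'img-src': ...}] for every image in the gallery."""
    soup = BeautifulSoup(html, "html.parser")
    gallery = soup.find("div", class_="gallery_inner")
    if gallery is None:  # layout changed, or the page was not what we expected
        return []
    images = []
    for link in gallery.find_all("a"):
        img = link.find("img")
        if img is None or not img.get("data-src"):
            continue  # anchor without a lazy-loaded image: skip it
        images.append({
            "name": img.get("alt", ""),
            # urljoin completes "//host/path" with the scheme of the page URL
            "img-src": urljoin(page_url, img["data-src"]),
        })
    return images
```

Returning an empty list when the gallery container is missing avoids the AttributeError that find_all would raise on None.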

3. Image download
Downloading an image simply means saving the file at its URL to local disk, using the request.urlretrieve function from urllib:

request.urlretrieve(urlpath, filename=fl)
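
The image name comes from the alt text on the page, so it may contain characters that are not allowed in file names (on Windows, for example, \ / : * ? " < > |). Below is a minimal sketch of a download helper that cleans the name first. It keeps the .jpg extension used in this article; safe_filename and this standalone save_image are illustrative names, not functions from the original program.

```python
import os
import re
from urllib import request


def safe_filename(name, default="image"):
    """Replace characters that are invalid in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]+', "_", name).strip(" ._")
    return cleaned or default  # fall back if nothing usable is left


def save_image(url, directory, name):
    """Download `url` into `directory` as '<name>.jpg' and return the path."""
    os.makedirs(directory, exist_ok=True)  # create the folder if needed
    path = os.path.join(directory, safe_filename(name) + ".jpg")
    request.urlretrieve(url, filename=path)
    return path
```

Note that two images with the same alt text would still map to the same file, and the later download would overwrite the earlier one.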

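Because the page's image list has already been fetched and stored in a list, downloading everything, a single image, or a run of images only requires operating on the list's elements.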
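
The three download modes reduce to three ways of slicing that list. A small sketch, assuming the 1-based numbers shown in the image list are what the user types; the function name select_images and its parameters are hypothetical:

```python
def select_images(images, start=None, count=None):
    """Pick the entries to download, using 1-based display numbers.

    start=None        -> every image
    start=n           -> only image n
    start=n, count=k  -> up to k images beginning at image n
    """
    if start is None:
        return list(images)
    if count is None:
        count = 1
    if start < 1 or count < 1 or start > len(images):
        raise ValueError(
            f"expected 1 <= start <= {len(images)} and count >= 1"
        )
    # Display number n corresponds to list index n - 1
    return images[start - 1:start - 1 + count]
```

Validating the numbers before slicing gives a clear error for input such as 0, where a plain index of start - 1 would silently wrap around to the last element.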

Complete code:

#!/usr/bin/env python 
# -*- coding: utf-8 -*-
import webbrowser
import requests
import time
import sys
from urllib import request
from bs4 import BeautifulSoup
from PyQt5.QtCore import *
from PyQt5.QtWidgets import *
from PyQt5.QtGui import *
from PyQt5.QtWebEngineWidgets import QWebEngineView,QWebEnginePage
from common_help import CommonHelper  # the author's own helper module (not shown here), used to read the QSS file


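class MyWebEngineView(QWebEngineView):
    """QWebEngineView subclass: links that request a new window are opened in this view instead."""

    def createWindow(self, QWebEnginePage_WebWindowType):
        # Qt calls this when the page asks for a new window. Hand back a helper
        # view and mirror whatever URL it is sent to into this view.
        page = MyWebEngineView(self)
        page.urlChanged.connect(self.on_url_changed)
        return page

    def on_url_changed(self, url):
        self.setUrl(url)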



class example(QWidget):
    def __init__(self) -> None:
        super().__init__()
        self.initUI()

    def initUI(self):
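        # Load the QSS stylesheet file (raw strings keep the backslashes literal)
        self.stylefile = r'python-paichong\qss_main_style.qss'

        self.qssstyle = CommonHelper.readQSS(self.stylefile)

        self.img_url_list = []

        # Default save directory (the desktop)
        self.dic = r"C:\Users\rongjv\Desktop"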
        
        #self.webview=QWebEngineView(self)
        self.webview=MyWebEngineView(self)
        self.webview.setGeometry(20,120,480,600)
        self.webview.setZoomFactor(0.4)
        self.webview.loadFinished.connect(self.urlchange_f)
        self.webview.urlChanged.connect(self.urlchange_f)
        
        self.lbl_webview=QLabel(self)
        self.lbl_webview.setText("网址:")
        self.lbl_webview.setGeometry(20,100,40,20)
        
        self.le_website=QLineEdit(self)
        self.le_website.setGeometry(60,100,300,20)
        self.le_website.setText("https://www.vcg.com/")
        
        self.btn_website=QPushButton('浏览',self)
        self.btn_website.setGeometry(380,100,60,20)
        self.btn_website.clicked.connect(self.photo_link2)
        
        
        self.lbl_url = QLabel(self)
        self.lbl_url.setText('图片获取网址:')
        self.lbl_url.setGeometry(20, 60, 80, 20)
        #self.lbl_url.adjustSize()

        # URL input box
        self.le_url = QLineEdit(self)
        self.le_url.setGeometry(100, 60, 400, 20)
        self.le_url.setText('https://www.vcg.com/creative-image/keji/')  # VCG site, technology-themed images
        # self.le_url.adjustSize()

        self.btn_back=QPushButton('返回',self)
        self.btn_back.setGeometry(460,100,40,20)
        self.btn_back.clicked.connect(self.webpageback)
        
        
        self.btn_get_url = QPushButton('获取列表', self)
        self.btn_get_url.setGeometry(520, 60, 80, 30)
        #self.btn_get_url.adjustSize()
        self.btn_get_url.clicked.connect(self.get_image_list)
        
        # Image list
        self.te = QTextEdit(self)
        self.te.setGeometry(520, 120, 320, 600)

        self.lbl_list_img = QLabel(self)
        self.lbl_list_img.setText('图片列表:')
        self.lbl_list_img.setGeometry(520, 100, 100, 20)
        #self.lbl_list_img.adjustSize()
        
        # Image count
        self.lbl_imgnumber = QLabel(self)
        self.lbl_imgnumber.setText('0张图片')
        self.lbl_imgnumber.setGeometry(620, 100, 160, 20)
        #self.lbl_imgnumber.adjustSize()
        
        
        self.btn_save_document = QPushButton('选择保存地址', self)
        self.btn_save_document.setGeometry(900, 130, 100, 30)
        #self.btn_save_document.adjustSize()
        self.btn_save_document.clicked.connect(self.save_document)

        # Shows the selected save folder
        self.lbl_save_document1=QLabel(self)
        self.lbl_save_document1.setGeometry(900,160,60,20)
        self.lbl_save_document1.setText("保存路径:")
        
        self.lbl_save_document2 = QLabel(self)
        self.lbl_save_document2.setGeometry(900, 180, 300, 20)
        self.lbl_save_document2.setText(self.dic)
        #self.lbl_save_document2.adjustSize()

        self.btn_download_all = QPushButton('全部下载', self)
        self.btn_download_all.setGeometry(900, 240, 100, 30)
        #self.btn_download_all.adjustSize()
        self.btn_download_all.clicked.connect(self.download_img_all)

        self.btn_download_single = QPushButton('单张下载', self)
        self.btn_download_single.setGeometry(900, 320, 100, 30)
        #self.btn_download_single.adjustSize()
        self.btn_download_single.clicked.connect(self.download_img_single)
        
        # Number of the single image to download
        self.le_img_num_single = QLineEdit(self)
        self.le_img_num_single.setGeometry(1100, 320, 60, 20)

        self.lbl_img_num_single = QLabel(self)
        self.lbl_img_num_single.setText('下载编号')
        self.lbl_img_num_single.setGeometry(1030, 320, 60, 20)

        self.btn_download_x = QPushButton('任意张下载', self)
        self.btn_download_x.setGeometry(900, 400, 100, 30)
        #self.btn_download_x.adjustSize()
        self.btn_download_x.clicked.connect(self.download_img_x)
        
        # Start number for the multi-image download
        self.le_img_start_num_x = QLineEdit(self)
        self.le_img_start_num_x.setGeometry(1030, 430, 60, 20)

        # Label for the start number
        self.lbl_img_start_num_x = QLabel(self)
        self.lbl_img_start_num_x.setText('起始编号')
        self.lbl_img_start_num_x.setGeometry(1030, 400, 60, 20)

        # How many images to download
        self.le_img_num_x = QLineEdit(self)
        self.le_img_num_x.setGeometry(1100, 430, 60, 20)

        # Label for the image count
        self.lbl_img_num_x = QLabel(self)
        self.lbl_img_num_x.setText('下载张数')
        self.lbl_img_num_x.setGeometry(1100, 400, 60, 20)


        # Title with a hyperlink
        self.lbl_head_title = QLabel(self)
        self.lbl_head_title.setOpenExternalLinks(True)
        # self.lbl_head_title.linkHovered.connect()
        # self.lbl_head_title.linkActivated.connect()
        self.lbl_head_title.setText("支持网站:视觉中国网https://www.vcg.com/")
        self.lbl_head_title.setGeometry(20, 20, 400, 20)
        #self.lbl_head_title.adjustSize()

        # Clickable image link (Baidu)
        self.lbl_link_1 = QLabel(self)
        self.lbl_link_1.setPixmap(QPixmap(r'python-paichong\img\百度logo.png'))  # a pixmap replaces any text set with setText
        self.lbl_link_1.mousePressEvent = self.photo_link  # handle clicks on the image
        self.lbl_link_1.setGeometry(320, 10, 80, 40)

        self.lbl_link_2 = QLabel(self)
        self.lbl_link_2.setPixmap(QPixmap(r'python-paichong\img\sjzglogo.png'))
        self.lbl_link_2.mousePressEvent = self.photo_link2
        self.lbl_link_2.setGeometry(400, 10, 100, 40)
        
        

        # Window
        self.setGeometry(100, 40, 1000, 600)
        self.setWindowTitle('图片下载器')
        self.setWindowIcon(QIcon(r'python-paichong\img\img1.png'))
        self.setStyleSheet(self.qssstyle)
        self.showMaximized()            # maximize the window
        self.show()

    def photo_link(self, test):
        pass
        #webbrowser.open('https://www.baidu.com/')
        #self.webview.load(QUrl("https://www.baidu.com/"))

    def photo_link2(self, test):
        #webbrowser.open('https://www.vcg.com/')
        self.urltemp=self.le_website.text()
        self.webview.load(QUrl(self.urltemp))
        
    def urlchange_f(self):
        """When the page URL changes, read it and show it in both address fields."""
        self.webview.setZoomFactor(0.5)
        self.urltemp=self.webview.url().url()
        #self.dic=self.webview.url().url()
        self.le_url.setText(self.urltemp)
        #print(self.webview.url().url())
        self.le_website.setText(self.urltemp)
    def webpageback(self):
        self.history=self.webview.history()
        self.history.back()

    def get_image_list(self):

        # Fetch the image list

        try:
            self.url = self.le_url.text()
            # print(self.url)
            print("已经开始执行,请等待!")
            begin_time = int(time.time())
            self.get_img_url_list(self.url)
            # get_page_info()
            end_time = int(time.time())
            print(f"持续时间:{end_time - begin_time}秒")
            print("执行结束")
        except Exception as e:
            QMessageBox.information(self, '输入提示', '异常: '+str(e), QMessageBox.Yes | QMessageBox.No,
                                    defaultButton=QMessageBox.No)
            print('异常:',e)

    def save_document(self):
        # Choose a folder and show its path
        chosen = QFileDialog.getExistingDirectory(self, "保存地址", 'C:/')
        if not chosen:
            return  # dialog cancelled: keep the current save directory
        self.dic = chosen
        self.lbl_save_document2.setText(self.dic)
        self.lbl_save_document2.adjustSize()

    def download_img_all(self):

        # Download every image in the list

        for self.img_url_list_1 in self.img_url_list:
            # print(self.img_url_list_1)
            self.img_src = self.img_url_list_1['img-src']
            self.name = self.img_url_list_1['name']
            # print('当前下载图片:'+self.name+'\n'+self.img_src)
            self.save_image(self.img_src, self.name)

    def download_img_single(self):

        # Download the single image with the entered number

        self.img_num_single = int(self.le_img_num_single.text())
        self.img_url_single = self.img_url_list[self.img_num_single - 1]
        self.img_url_single_name = self.img_url_single['name']
        self.img_url_single_src = self.img_url_single['img-src']
        print('当前下载图片:' + '\n' + '图片名称:' + self.img_url_single_name +
              '\n' + '图片链接:' + self.img_url_single_src)
        self.save_image(self.img_url_single_src, self.img_url_single_name)

    def download_img_x(self):

        # Download a run of images from the entered start number and count
        if self.le_img_num_x.text() == '' or self.le_img_start_num_x.text() == '':
            QMessageBox.information(self, '输入提示', '请输入正确的起始编号和张数!', QMessageBox.Yes | QMessageBox.No,
                                    defaultButton=QMessageBox.No)
            # Nothing to download without both values
            return
        else:

            self.img_num_x_startnum = int(self.le_img_start_num_x.text())
            self.img_num_x_num = int(self.le_img_num_x.text())
            self.img_x_list = self.img_url_list[
                              (self.img_num_x_startnum - 1):(self.img_num_x_startnum - 1 + self.img_num_x_num)]

            for self.img_x_list_1 in self.img_x_list:
                self.img_x_src = self.img_x_list_1['img-src']
                self.name_x = self.img_x_list_1['name']
                # print('当前下载图片:'+self.name+'\n'+self.img_src)
                print('当前下载图片:' + '\n' + '图片名称:' + self.name_x +
                      '\n' + '图片链接:' + self.img_x_src)
                self.save_image(self.img_x_src, self.name_x)

            print(self.img_num_x_num)

    def get_url_text(self, url):

        # Fetch the raw page content for the given URL
        url_info = requests.get(url)
        url_info.encoding = url_info.apparent_encoding
        url_text = url_info.text
        return url_text

    def get_img_url_list(self, url):
        # Parse the fetched page and extract the image link addresses
        url_text = self.get_url_text(url)
        soup = BeautifulSoup(url_text, 'html.parser')
        url_list = soup.find('div', class_='gallery_inner')
        img_url_list = url_list.find_all('a')
        # Drop the last <a> (presumably not an image entry)
        lli = img_url_list[:-1]

        img_num = str(len(lli))
        self.lbl_imgnumber.setText('共找到: ' + img_num + ' 张图片')  # "Found N images"

        self.te.clear()
        # Start from an empty list so a second fetch does not append duplicates
        self.img_url_list = []
        # Store each image's name and full URL in the list, in page order
        for img_url_list_s in lli:
            li = img_url_list_s.find('img')
            name1 = li['alt']
            img_url = li['data-src']
            # The code assumes data-src is protocol-relative ("//host/path")
            img_src = 'https:' + img_url
            self.img_url_list.append({'name': name1, 'img-src': img_src})
        # Display numbers are the list index plus one, so the number typed for a
        # download maps directly back to a list entry
        for i, itm in enumerate(self.img_url_list):
            imgname = itm['name']
            imgurl = itm['img-src']
            print(i)
            print(imgname + imgurl)
            self.te.append(str(i + 1) + ' ' + imgname + imgurl + '\r\n')

    def save_image(self, urlpath, name):

        # Save the image at the given full URL into the local folder

        img_dir = self.dic
        # Option 1: fetch with requests and write the bytes with open()
        # with open(f"{img_dir}/{name}.jpg", mode="wb") as add:
        #     add.write(requests.get(urlpath).content)
        # Option 2 (used here): urllib's urlretrieve
        fl = f"{img_dir}/{name}.jpg"
        request.urlretrieve(urlpath, filename=fl)
        QMessageBox.information(self, '提示', '下载完成', QMessageBox.Ok,
                                    defaultButton=QMessageBox.Ok)

if __name__ == '__main__':
    app = QApplication(sys.argv)
    ex = example()
    sys.exit(app.exec_())


Demo:
