Scraping Images from a WeChat Official Account Article with Python (Source Code Included)

This post walks through the code for scraping the images in a WeChat Official Account article.

1. Import the modules (if anything in this step is unclear, see my article 《Python第三方库安装详细教程(图文结合)》)

import requests                  # send HTTP requests
from bs4 import BeautifulSoup    # parse the returned HTML
import time                      # pause between downloads
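requests and BeautifulSoup (installed as the bs4 package) are third-party libraries, while time ships with Python. If the first two are missing, they can normally be installed with:

pip install requests beautifulsoup4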

2. Get the site's response and print it as text

url = 'https://mp.weixin.qq.com/s/J7y6TLECYyl2FmVe6XKpww'
headers = {
    # 'referer': 'https://mp.weixin.qq.com',
    'cookie':'pgv_pvid=6670082751; RK=WMxp6iw+3H; ptcz=831e2d5114bbf9b46ee7956fedb62717ee910417ecd992f3e0027f034213caf1; o_cookie=2925851543; pac_uid=1_2925851543; iip=0; tvfe_boss_uuid=94828b35f56c4131; LW_uid=01d6E8a1d0T8Y6S87134I123O2; eas_sid=J116c8t1G078b6f8N1u4m24059; LW_sid=6166y891k1d2s4h7v9M5A8K6e8; rewardsn=; wxtokenkey=777; wwapp.vid=; wwapp.cst=; wwapp.deviceid=',
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48'
}
response = requests.get(url, headers=headers)
# print(response.status_code)  # print the response status code
# grab the page's HTML
html = response.text
# print(html)
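Before parsing, it can be worth confirming that the request actually succeeded. A minimal sketch (my addition, not part of the original post) built on the status code printed above:

# Optional sanity check: stop early if the page did not return HTTP 200.
if response.status_code != 200:
    raise RuntimeError(f'Request failed with status code {response.status_code}')
# requests can also raise the error for us:
# response.raise_for_status()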

3. Parse the returned HTML, find all the img tags, and store them in a list

soup = BeautifulSoup(html, 'html.parser')
img_list = soup.find_all('img')
# print(img_list)
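As an aside (an assumption on my part, not the original author's approach), BeautifulSoup can filter on the attribute directly, so only tags that actually carry a data-src link are collected and the None check in the next step becomes unnecessary:

# Alternative: only keep <img> tags that have a data-src attribute.
img_with_src = soup.find_all('img', attrs={'data-src': True})
# print(len(img_with_src))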

4. Loop over img_list and extract the image link inside each tag

for index, img_url in enumerate(img_list):
    name = str(index)
    # print(img_url)
    img_link = img_url.get('data-src')
    # print(img_link)
    if img_link is not None:
        # print(img_link)
        response2 = requests.get(img_link)
        # images are binary data, so read them with .content (.text is for text responses)
        img_content = response2.content
        # pause between downloads so we don't get blocked for requesting too fast
        time.sleep(5)

5. Save the images (the saving happens inside the if statement within the for loop)

        with open('D:\\图片\\' + name + '.jpeg', 'wb') as f:
            f.write(img_content)
        print(f'Image {name} downloaded successfully')
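Two optional refinements, both assumptions of mine rather than part of the original code: create the target folder if it does not exist yet, and take the file extension from the tag's data-type attribute (WeChat image tags usually carry one, e.g. 'png' or 'gif') instead of hard-coding .jpeg:

import os

save_dir = 'D:\\图片'
os.makedirs(save_dir, exist_ok=True)        # create the folder if it is missing

# inside the if block of the loop above:
ext = img_url.get('data-type') or 'jpeg'    # fall back to jpeg when the attribute is absent
with open(os.path.join(save_dir, name + '.' + ext), 'wb') as f:
    f.write(img_content)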

6. Results

After the script finishes, every image in the article is saved to the D:\图片 folder under its index (for example 0.jpeg, 1.jpeg), and a success message is printed for each download.
If this article helped you, please give it a like!
