Python | Scraping Wallhaven Wallpapers with Python and Uploading Them to Baidu Netdisk

For more details, see Honker.


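Here is a wallpaper site I highly recommend: wallhaven.

The first time I came across this site, I was amazed; it felt like striking gold. Browsing is silky smooth, and every wallpaper is free to download. Compared with similar wallpaper sites in China, it is a real class act.

With this many wallpapers, of course we download them with Python.

Where to store them? When local disk space runs short, a cloud drive fills the gap.

How to keep the scraper running? Deploy it on a server.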

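Writing the program

See the blog post.

Uploading to Baidu Netdisk

Since the wallpapers need to be uploaded to Baidu Netdisk, add the following code: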

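from bypy import ByPy


class Adapter:
    """
    Adapter around bypy.
    Prerequisite: bypy info has been run and the login succeeded.
    """
    def __init__(self):
        self._bp = ByPy()

    def upload(self, localpath, remotepath, **kwargs):
        """
        Upload a file.
        :param localpath: path of the local file
        :param remotepath: remote path, e.g. /videos (actual path: /bypy/videos)
        :param kwargs: extra keyword arguments passed through to ByPy.upload
        :return: None
        """
        self._bp.upload(localpath=localpath, remotepath=remotepath, **kwargs)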

!!! Note: this code only works if bypy info has already been run successfully.
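As a rough pre-flight check, the script could confirm that the bypy command runs before it starts downloading. The sketch below is only an illustration: the helper names bypy_ready and require_bypy are hypothetical, and it assumes that a non-zero exit status, a missing executable, or a timeout means authorization has not been completed. It has not been verified against every bypy version.

```python
import subprocess
import sys


def bypy_ready(cmd=("bypy", "info"), timeout=60):
    """Return True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,    # never block on an interactive prompt
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The command is not installed, or it hung (assumed: waiting for login)
        return False
    return result.returncode == 0


def require_bypy():
    """Stop the script early with a hint when the check fails."""
    if not bypy_ready():
        sys.exit("Run 'bypy info' in a terminal and finish the login first.")
```

Calling require_bypy() once at the top of the scraper would stop it before any wallpaper is downloaded to a machine that cannot upload.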

Then modify the function down_pic(image_url):

def down_pic(image_url):
    # image_title and adapter are module-level globals set by the main script
    try:
        # Local file name: wallpaper title followed by the image file name
        path = 'temporary data/{}'.format((image_title.split('/')[-1]) + (image_url.split('/')[-1]))
        print(path)
        # Send a browser User-Agent with the download request
        opener = request.build_opener()
        opener.addheaders = [('User-Agent',
                            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36')]
        request.install_opener(opener)
        request.urlretrieve(image_url, path)
        # Upload to Baidu Netdisk, then delete the local copy
        adapter.upload(localpath=path, remotepath='image/wallhaven/')
        os.remove(path)
    except Exception as m:
        print(m)
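  • !!! Note: create the folder temporary data in the program's working directory beforehand
  • !!! Note: create the folder image/wallhaven/ in Baidu Netdisk beforehand
  • os.remove(): once a wallpaper has been uploaded, the local copy can be deleted (if you have enough local storage, you can drop this line)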
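The first note can also be handled by the script itself. The following is a minimal sketch, not the author's code: build_local_path is a hypothetical helper that creates the temporary data folder when it is missing and builds the same title-plus-file-name path as down_pic. Replacing characters that are awkward in file names with an underscore is an added assumption that the original does not make.

```python
import os
import re

TEMP_DIR = "temporary data"  # same folder name the article uses


def build_local_path(image_title, image_url, folder=TEMP_DIR):
    """Create the folder if needed and return the local path for one image."""
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    # Same naming rule as down_pic: title tail + image file name
    filename = image_title.split("/")[-1] + image_url.split("/")[-1]
    # Assumption: collapse whitespace and reserved characters into "_"
    filename = re.sub(r'[\\/:*?"<>|\s]+', "_", filename)
    return os.path.join(folder, filename)
```

Inside down_pic, the line that builds path could then become path = build_local_path(image_title, image_url).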

The final code

from requests_html import HTMLSession   # requests pages and extracts data; more concise than the alternatives
from urllib import request              # only used here to download and save the images
import os
from bypy import ByPy
 
 
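class Adapter:
    """
    Adapter around bypy.
    Prerequisite: bypy info has been run and the login succeeded.
    """
    def __init__(self):
        self._bp = ByPy()

    def upload(self, localpath, remotepath, **kwargs):
        """
        Upload a file.
        :param localpath: path of the local file
        :param remotepath: remote path, e.g. /videos (actual path: /bypy/videos)
        :param kwargs: extra keyword arguments passed through to ByPy.upload
        :return: None
        """
        self._bp.upload(localpath=localpath, remotepath=remotepath, **kwargs)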

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36'}  # request headers, used to get past anti-scraping checks

session = HTMLSession()

urls = []

num_int = 2  # range(1, num_int) covers pages 1 .. num_int - 1
for i in range(1, num_int):
    # r = session.get('https://wallhaven.cc/toplist?page={}'.format(i))
    try:
        r = session.get('https://wallhaven.cc/search?categories=110&purity=100&topRange=1y&sorting=toplist&order=desc&page={}'.format(i), headers=headers)
        urls.extend(list(r.html.links))
        print(i, len(list(r.html.links)))
    except Exception as m:
        print(m)
print(len(urls))


adapter = Adapter()
def down_pic(image_url):
    # image_title and adapter are module-level globals set by the main script
    try:
        # Local file name: wallpaper title followed by the image file name
        path = 'temporary data/{}'.format((image_title.split('/')[-1]) + (image_url.split('/')[-1]))
        print(path)
        # Send a browser User-Agent with the download request
        opener = request.build_opener()
        opener.addheaders = [('User-Agent',
                            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36')]
        request.install_opener(opener)
        request.urlretrieve(image_url, path)
        # Upload to Baidu Netdisk, then delete the local copy
        adapter.upload(localpath=path, remotepath='image/wallhaven/')
        os.remove(path)
    except Exception as m:
        print(m)


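for url in urls:
    try:
        # Reuse the existing session instead of creating a new one per URL
        r1 = session.get(url, headers=headers)
        sr = r1.html.find("img#wallpaper", first=True)
        if sr is None:
            # Not a wallpaper page: urls holds every link found on the listing pages
            continue
        image_url = sr.attrs['src']
        image_title = sr.attrs['alt']
        print(image_url)
        print(image_title)
        down_pic(image_url)
    except Exception as e:
        print(e)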
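Because r.html.links returns every link on a listing page, most of the requests in the final loop go to pages that contain no wallpaper. One possible refinement is to filter the links first. This is a sketch under an assumption: wallpaper_pages is a hypothetical helper, and it assumes that wallpaper detail pages live under https://wallhaven.cc/w/, which the article itself does not state.

```python
from urllib.parse import urlparse


def wallpaper_pages(links):
    """Keep only links that look like wallpaper detail pages, without duplicates."""
    pages = set()
    for link in links:
        parts = urlparse(link)
        # Assumption: detail pages are https://wallhaven.cc/w/<id>
        if parts.netloc == "wallhaven.cc" and parts.path.startswith("/w/"):
            pages.add(link)
    return sorted(pages)
```

With this helper, urls.extend(list(r.html.links)) could be written as urls.extend(wallpaper_pages(r.html.links)), so the main loop only visits pages that are expected to hold an img#wallpaper element.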

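Deploying to a server

  • Log in to the server
  • Upload the program
  • Create the folder temporary data
  • Run the command "nohup python3 your_script.py &"
  • Go to bed in style and let the wallpapers fill up your cloud drive while you sleep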
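A job left running under nohup may be interrupted and restarted, and the script as written would then download and upload everything again. A simple way to make restarts cheaper is to record finished URLs in a text file. The sketch below is illustrative only: the file name downloaded.txt and the helpers load_done and mark_done are hypothetical and are not part of the original program.

```python
import os

DONE_FILE = "downloaded.txt"  # hypothetical record of finished page URLs


def load_done(path=DONE_FILE):
    """Return the set of URLs recorded so far (empty if the file is missing)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def mark_done(url, path=DONE_FILE):
    """Append one finished URL to the record file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(url + "\n")
```

In the main loop, the script could call done = load_done() once, skip any url that is already in done, and call mark_done(url) after down_pic has uploaded the image.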

Results

Download the scraped wallpapers

Link (extraction code: 7p8q)

More than two thousand wallpapers were scraped in a single night, and the scraper is still running.
