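Python mini project --- web crawler --- batch scraping video information

Requirement:
Download a large batch of videos from a specified website.

Technical approach: the os, requests, and re libraries

The crawling logic is as follows: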

import os
import requests
import re


def GetHTML(wanzhengurl, headers):  # Fetch the HTML text of a listing page
    r = requests.get(wanzhengurl, headers=headers)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    return r.text

def GetVideoURL(libs, wanzhengurl, headers):  # Collect the URL of every video on one listing page
    shipin = 'https://www.zuirebo.com'  # Prefix shared by all video links
    # Each video link on the page has the form /v/v/<digits>
    houzhuis = re.findall(r'/v/v/\d+', GetHTML(wanzhengurl, headers))
    for i in houzhuis:
        libs.append(shipin + i)
    return libs

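def DownloadVideo(libs):  # Download each video with you-get
    for lib in libs:
        # os.system returns the command's exit status; it does not raise when you-get fails
        if os.system("you-get -o D:/shipin " + lib) == 0:
            print("Successful")
        else:
            print("Download failed: {}".format(lib))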

def main():  # The pagination logic lives in main
    url = 'https://www.zuirebo.com/v/t/%E6%9E%81%E5%AE%A2%E5%85%AC%E5%9B%AD.html'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'}
    libs = []
    for page in range(1,3):
        print(page)
        wanzhengurl = url + '?page' + str(page)
        GetVideoURL(libs, wanzhengurl, headers)  # Fetches the page itself, so no separate GetHTML call is needed
        quchonglibs = list(set(libs))  # Deduplicate the URLs collected so far
        print(quchonglibs)
        print(len(quchonglibs))

main()
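
The extraction and deduplication steps can be tried offline, without touching the site. The following is a minimal sketch, not the original script: the helper name extract_video_urls and the sample HTML are made up for illustration, while the prefix and the /v/v/<digits> pattern come from the code above. It uses dict.fromkeys instead of set() so that the URLs keep the order in which they appear on the page.

```python
import re

# Prefix taken from the script above; everything else here is illustrative.
BASE = 'https://www.zuirebo.com'


def extract_video_urls(html, base=BASE):
    """Return absolute video URLs found in html, deduplicated, in first-seen order."""
    suffixes = re.findall(r'/v/v/\d+', html)
    # dict.fromkeys keeps insertion order (Python 3.7+), whereas set() does not.
    return [base + s for s in dict.fromkeys(suffixes)]


# Hypothetical page fragment containing one repeated link.
sample_html = '''
<a href="/v/v/824602310">one</a>
<a href="/v/v/78645861">two</a>
<a href="/v/v/824602310">one again</a>
'''

urls = extract_video_urls(sample_html)
print(urls)
```

Keeping the page order can make it easier to resume a long download run from where it stopped.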

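The crawler script above produces a list containing the URLs of all the videos to be scraped from the site. The code below then downloads them to a local folder.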


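Python's os library opens the command line and runs you-get automatically. you-get is a third-party tool that can download a video given only its URL. Install it from the command line with pip install you-get, then invoke it as [you-get -o output-path url].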

import os

# There are a great many videos, so only part of the list is shown here
def DownloadVideo():  # Download the videos
    libs = [
        'https://www.zuirebo.com/v/v/824602310',
        'https://www.zuirebo.com/v/v/78645861',
        'https://www.zuirebo.com/v/v/1158412181',
        'https://www.zuirebo.com/v/v/317432139',
        'https://www.zuirebo.com/v/v/588746200',
        'https://www.zuirebo.com/v/v/218844300',
        'https://www.zuirebo.com/v/v/737120805',
        'https://www.zuirebo.com/v/v/341441759',
        'https://www.zuirebo.com/v/v/82679694',
        'https://www.zuirebo.com/v/v/934194926',
        'https://www.zuirebo.com/v/v/1180510241',
    ]
    remaining = list(libs)  # Copy, so the list being iterated is never modified
    for count, lib in enumerate(libs, start=1):
        print('Starting download of video {}'.format(count))
        # os.system returns the exit status; it does not raise when you-get fails
        if os.system("you-get -o D:/shipin " + lib) == 0:
            remaining.remove(lib)
            print('Video {} downloaded successfully'.format(count))
        else:
            print("Download failed, video {}: {}".format(count, lib))
        print('Videos not yet downloaded: {}'.format(remaining))

def main():
    DownloadVideo()

main()
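
The download loop above shells out through os.system. As an alternative sketch, the same you-get -o invocation can go through subprocess.run with an argument list, which avoids shell quoting problems and exposes the exit code directly. The names build_command and download_all, and the run parameter (there only so the function can be exercised without you-get installed), are hypothetical and not part of the original script.

```python
import subprocess


def build_command(url, out_dir):
    # Argument-list form: no shell is involved, so the URL needs no quoting.
    return ['you-get', '-o', out_dir, url]


def download_all(urls, out_dir, run=subprocess.run):
    """Attempt every URL and return the list of URLs whose download failed."""
    failed = []
    for index, url in enumerate(urls, start=1):
        print('Downloading video {} of {}'.format(index, len(urls)))
        try:
            result = run(build_command(url, out_dir))
            ok = result.returncode == 0
        except OSError:
            # Raised when the you-get executable cannot be started at all.
            ok = False
        if not ok:
            failed.append(url)
    return failed
```

A call such as failed = download_all(urls, 'D:/shipin') would leave the URLs that still need a retry in failed, so one bad link does not stop the rest of the batch.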

The run output is shown below:
[Image 1: program output]
The videos saved in the folder are shown below:
[Image 2: folder contents]
