Downloading videos with Python (using a handy tool: you-get)

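What is you-get

What the web says:
Download audio and video from popular websites such as YouTube, Youku, Niconico, and many more.
Watch online videos in your favorite media player, away from the browser and its ads.
Download the images on a web page you like.
Download any non-HTML content, such as binary files.

My own take:
Who teaches excavator operation best? Lanxiang in Shandong, China!
Who downloads videos best? In Python land, it's yg (you-get)!

Put simply, you-get is a command-line tool for downloading videos.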

Installation

There are several ways to install it (a quick web search will turn them up); only one is shown here.

pip3 install you-get
pip3 install --upgrade you-get  # upgrade an existing installation
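
To confirm the installation from Python rather than from the shell, one option is to look the package up with the standard library. This is only a minimal sketch: the helper name `you_get_version` is made up for illustration, and it assumes Python 3.8+ and that the installed distribution is named `you-get`, as in the pip command above.

```python
from importlib import metadata


def you_get_version():
    """Return the installed you-get version string, or None if it is missing."""
    try:
        # Assumption: the distribution name matches the name used with pip.
        return metadata.version("you-get")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = you_get_version()
    if version is None:
        print("you-get is not installed; run: pip3 install you-get")
    else:
        print("you-get", version, "is installed")
```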

Usage

you-get: version 0.4.1456, a tiny downloader that scrapes the web.
usage: you-get [OPTION]... URL...

A tiny downloader that scrapes the web

optional arguments:
  -V, --version         Print version and exit
  -h, --help            Print this help message and exit

Dry-run options:
  (no actual downloading)

  -i, --info            Print extracted information
  -u, --url             Print extracted information with URLs
  --json                Print extracted URLs in JSON format

Download options:
  -n, --no-merge        Do not merge video parts
  --no-caption          Do not download captions (subtitles, lyrics, danmaku,
                        ...)
  -f, --force           Force overwriting existing files
  --skip-existing-file-size-check
                        Skip existing file without checking file size
  -F STREAM_ID, --format STREAM_ID
                        Set video format to STREAM_ID
  -O FILE, --output-filename FILE
                        Set output filename
  -o DIR, --output-dir DIR
                        Set output directory
  -p PLAYER, --player PLAYER
                        Stream extracted URL to a PLAYER
  -c COOKIES_FILE, --cookies COOKIES_FILE
                        Load cookies.txt or cookies.sqlite
  -t SECONDS, --timeout SECONDS
                        Set socket timeout
  -d, --debug           Show traceback and other debug info
  -I FILE, --input-file FILE
                        Read non-playlist URLs from FILE
  -P PASSWORD, --password PASSWORD
                        Set video visit password to PASSWORD
  -l, --playlist        Prefer to download a playlist (several videos, e.g. the episodes of a TV series)
  -a, --auto-rename     Auto rename same name different files
  -k, --insecure        ignore ssl errors

Proxy options:
  -x HOST:PORT, --http-proxy HOST:PORT
                        Use an HTTP proxy for downloading
  -y HOST:PORT, --extractor-proxy HOST:PORT
                        Use an HTTP proxy for extracting only
  --no-proxy            Never use a proxy
  -s HOST:PORT, --socks-proxy HOST:PORT
                        Use an SOCKS5 proxy for downloading
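
The options above can be combined freely on one command line. As a rough sketch of how that might look when driven from a script, the helper below assembles an argument list that could be passed to `subprocess.run`. The function name `build_command` and its parameters are hypothetical; only the flags themselves come from the help text, and the URL is a placeholder.

```python
import subprocess


def build_command(url, output_dir=None, filename=None, info_only=False,
                  playlist=False, http_proxy=None, cookies=None):
    """Assemble a you-get argument list from flags listed in the help text."""
    cmd = ["you-get"]
    if info_only:
        cmd.append("-i")            # dry run: print info, download nothing
    if playlist:
        cmd.append("-l")            # prefer to download a playlist
    if output_dir:
        cmd += ["-o", output_dir]   # output directory
    if filename:
        cmd += ["-O", filename]     # output filename
    if http_proxy:
        cmd += ["-x", http_proxy]   # HOST:PORT
    if cookies:
        cmd += ["-c", cookies]      # cookies.txt or cookies.sqlite
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    cmd = build_command("https://example.com/video", output_dir="videos",
                        info_only=True)
    print(" ".join(cmd))
    # Uncomment to actually run it (requires you-get on PATH):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when a URL contains characters such as `&`.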

A simple demo (proxies and cookies are not covered; add them yourself if needed)

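import re
import ssl
import sys

import you_get
from you_get.extractors import *  # exposes the per-site extractors

# Disable HTTPS certificate verification globally (convenient for a demo, but insecure).
ssl._create_default_https_context = ssl._create_unverified_context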

if __name__ == '__main__':
    print("\033[37;41m Enter 0 to quit \033[0m")
    print('1. Show the version')
    print('2. Show the help message')
    print('3. Show information about the video on a page')
    print('4. Download a video')
    print('5. Show the video URLs')
    print('6. Show the video information as JSON')
    print('7. Download several videos (e.g. a TV series)')
    choice = input('Enter your choice (number): ').strip()
    if not re.fullmatch('[0-7]', choice):
        print("\033[37;41m Please enter a digit from 0 to 7 \033[0m")
        sys.exit(1)
    if choice == '0':
        print("\033[37;41m Bye~ \033[0m")
        sys.exit(0)  # quitting on request is not an error
    # you_get.main() reads its options from sys.argv, so build the argument list here.
    if choice == '1':
        sys.argv = ['you_get', '-V']
    elif choice == '2':
        sys.argv = ['you_get', '-h']
    elif choice == '3':
        url = input('Enter the URL: ').strip()
        sys.argv = ['you_get', '-i', url]
    elif choice == '4':
        url = input('Enter the URL: ').strip()
        path = input('Enter the directory to save the video in: ')
        sys.argv = ['you_get', '-o', path, url]
    elif choice == '5':
        url = input('Enter the URL: ').strip()
        sys.argv = ['you_get', '-u', url]
    elif choice == '6':
        url = input('Enter the URL: ').strip()
        sys.argv = ['you_get', '--json', url]
    elif choice == '7':
        url = input('Enter the URL: ').strip()
        path = input('Enter the directory to save the videos in: ')
        sys.argv = ['you_get', '-o', path, '-l', url]
    you_get.main()
    print("\033[37;41m Done!\033[0m")
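
The chain of branches in the demo can also be expressed as a lookup table, which keeps the menu and the flags in one place. The following is a minimal sketch of that idea, not part of you-get itself: the names `MENU` and `build_argv` are made up for illustration, and the resulting lists are meant to match what the demo assigns to `sys.argv`.

```python
# Each menu choice maps to (flags, needs_url, needs_output_dir).
MENU = {
    '1': (['-V'], False, False),
    '2': (['-h'], False, False),
    '3': (['-i'], True, False),
    '4': ([], True, True),
    '5': (['-u'], True, False),
    '6': (['--json'], True, False),
    '7': (['-l'], True, True),
}


def build_argv(choice, url=None, path=None):
    """Return the argument list the demo would build for a menu choice."""
    flags, needs_url, needs_dir = MENU[choice]
    argv = ['you_get']
    if needs_dir:
        argv += ['-o', path]      # output directory comes first, as in the demo
    argv += flags
    if needs_url:
        argv.append(url.strip())  # drop stray whitespace around a pasted URL
    return argv


if __name__ == '__main__':
    print(build_argv('3', ' https://example.com/video '))
    print(build_argv('7', 'https://example.com/list', 'videos'))
```

With this layout, supporting another option means adding one entry to `MENU` instead of another branch.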
