Copyright notice: this is the author's original article; reposting without permission is not allowed.
When reposting, please credit the author and source: https://blog.csdn.net/weixin_38798634/article/details/80028899
Platform: Windows
Python version: Python 3.x
I've recently been getting familiar with Python web scraping. I read a few other blog posts about parsing VIP videos, but all of them had stopped working with time. Over the past two days I tried writing my own code and happened to get it working, so I'm sharing it here. I'm new to Python, so please go easy on the rough code ^_^.
I also scrape through a video-parsing site: VIP video parser: http://www.vipjiexi.com/tong.php?url=[playback URL or video id]. I tried one of iQiyi's latest VIP movies and it worked.
The first step, as always, is packet capture: the Network panel in Chrome and similar browsers is sufficient; Fiddler also works if you prefer it.
This post uses the Network panel. Try to control the capture manually: stop recording as soon as the video starts playing. As shown below:
First, find the POST request and its form data. Since the parsing site's redesign, you have to submit a key and a time; on success the response contains success: 1 and a url. (If this step is unclear, study the capture; this screenshot is cropped, see the next one.)
Next, skip the GET requests that load the player and find the GET request tied to the url above, namely index.m3u8. Its request URL is exactly the earlier link, http://vs1.baduziyuan.com/20180419/H4cMgqAU/index.m3u8, and this GET returns a document:
The document's single line is clearly a relative URL that needs to be joined onto the host, and on close inspection the joined URL matches the next GET request:
Find that GET request and you are basically done; look at the file it returns:
Sure enough, nothing but download links, one per 4-second segment seen in the capture. At this point the video can be downloaded. The code:
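The joining step can also be done with urllib.parse.urljoin instead of manual string concatenation; the relative path below is a hypothetical stand-in for the single line returned by the first index.m3u8:

```python
from urllib.parse import urljoin

base = 'http://vs1.baduziyuan.com/20180419/H4cMgqAU/index.m3u8'
# hypothetical single line read from the first index.m3u8
relative = '/20180419/H4cMgqAU/800kb/hls/index.m3u8'
# urljoin keeps the scheme and host from base and applies the new path
print(urljoin(base, relative))
# http://vs1.baduziyuan.com/20180419/H4cMgqAU/800kb/hls/index.m3u8
```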
# -*- coding:UTF-8 -*-
from urllib import request
from urllib import parse
from http import cookiejar
import re

# VIP video parser: http://www.vipjiexi.com/tong.php?url=[playback URL or video id]
# Wmxz: http://www.wmxz.wang/video.php?url=[playback URL or video id]
def zhunbei(string):
    url1 = string
    filename = 'E:/vipjx.txt'
    # request headers copied from the browser capture
    head = {}
    head['Accept'] = 'application/json, text/javascript, */*; q=0.01'
    head['Accept-Language'] = 'zh-CN,zh;q=0.8'
    head['Connection'] = 'keep-alive'
    head['Content-Length'] = '138'
    head['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8'
    head['Cookie'] = 'BAEID=8DA95EE2B61A87AA16FEF407EF61D37E; UM_distinctid=162e1d4fc390-057ee1b84-4349052c-13c680-162e1d4fc3a30; CNZZDATA1264591021=237830061-1524204208-https%253A%252F%252Fblog.csdn.net%252F%7C1524209609'
    head['Host'] = 'www.a305.org'
    head['Origin'] = 'http://www.a305.org'
    head['Referer'] = 'http://www.a305.org/x1/tong.php?url=http://vs1.baduziyuan.com/20180419/H4cMgqAU/index.m3u8'
    head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
    head['X-Requested-With'] = 'XMLHttpRequest'
    # form data captured from the POST request
    formdata = {}
    formdata['referer'] = ''
    formdata['time'] = '1524212472'
    formdata['key'] = '76e0d26641a24cea7330216dbcedc1b7'
    formdata['url'] = 'http://vs1.baduziyuan.com/20180419/H4cMgqAU/index.m3u8'
    formdata['type'] = ''
    data = parse.urlencode(formdata).encode('utf-8')
    cookie = cookiejar.MozillaCookieJar(filename)
    cookie_support = request.HTTPCookieProcessor(cookie)
    opener = request.build_opener(cookie_support)
    req = request.Request(url1, data, headers=head)
    resp = opener.open(req)
    sta1 = resp.read().decode('utf-8')
    # pull the percent-encoded url field out of the JSON response and decode it
    s = sta1.split('url":', 2)
    s1 = s[1].split('"', 2)
    s2 = re.sub('%3A%2F%2F', '://', s1[1])
    s3 = re.sub('%2F', '/', s2)
    print(s3)  # URL extracted from the first response
    cookie.save(ignore_discard=True, ignore_expires=True)
    url2 = s3
    request.urlretrieve(url2, 'E:/index.txt')
    file = open('E:/index.txt', 'r')
    lines = file.readlines()
    file.close()
    url3 = 'http://vs1.baduziyuan.com' + lines[-1].strip()  # URL from the second response
    print(url3)
    path = 'E:/DownList.txt'
    request.urlretrieve(url3, path)
    return path
def geturllist(_path):
    file = open(_path, 'r')
    lines = file.readlines()
    file.close()
    movies_url = []
    for line in lines:
        if '.ts' in line:
            # strip the trailing newline, then keep only the part before any '#'
            s = line.strip().split('#', 2)
            movies_url.append('http://vs1.baduziyuan.com' + s[0])
    return movies_url
def download_movie(movie_url, _path):
    # one pass over the list downloads every segment exactly once
    i = 1
    for url in movie_url:
        newpath = _path + '/movie' + str(i) + '.ts'
        print(newpath)
        print(url)
        print('>>> downloading segment ' + str(i) + '...')
        request.urlretrieve(url, newpath)
        i += 1
if __name__ == '__main__':
    url = 'http://www.a305.org/x1/api.php'
    path = zhunbei(url)
    movie_url = geturllist(path)
    download_movie(movie_url, 'E:/vipmovie')
A few notes: I hard-coded absolute paths, so adjust them for your own machine. The code is rough; if you're interested, you could add a routine that checks and creates the output directories. One small open question: I converted '%2F' to '/' with plain replacements — can this be done directly through URL decoding? The scrape yields over a thousand .ts clips of a few seconds each; join them with the DOS copy command or a dedicated tool into one complete .ts file and you can watch the movie.
I'm a beginner who has just started learning Python scraping — comments and exchanges welcome!
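On the '%2F' question: urllib.parse.unquote decodes percent-encoded strings in one call, so the two re.sub replacements in zhunbei could be replaced with something like this (the encoded string below is just the URL from the capture):

```python
from urllib.parse import unquote

s = 'http%3A%2F%2Fvs1.baduziyuan.com%2F20180419%2FH4cMgqAU%2Findex.m3u8'
# unquote turns %3A into ':', %2F into '/', and so on for all escapes
print(unquote(s))
# http://vs1.baduziyuan.com/20180419/H4cMgqAU/index.m3u8
```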
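For anyone who would rather join the .ts segments in Python than with the DOS copy command, here is a minimal sketch, assuming the movieN.ts naming used in download_movie (the folder and output names are just examples):

```python
import glob
import os
import re

def merge_ts(folder, out_file):
    # collect movie1.ts, movie2.ts, ... and sort them numerically,
    # so movie10.ts comes after movie9.ts rather than after movie1.ts
    parts = glob.glob(os.path.join(folder, 'movie*.ts'))
    parts.sort(key=lambda p: int(re.search(r'movie(\d+)\.ts', p).group(1)))
    with open(out_file, 'wb') as merged:
        for p in parts:
            with open(p, 'rb') as seg:
                # MPEG-TS segments can be concatenated byte-for-byte
                merged.write(seg.read())

# example: merge_ts('E:/vipmovie', 'E:/vipmovie/full.ts')
```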