Parsing and Downloading Xigua Video (Toutiao)

Open any Xigua Video URL, for example https://www.ixigua.com/6903716672067076612
Then view the page source.


As you can see, nearly all of the information and parameters we need are included in the page source.

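    url = 'https://www.ixigua.com/6903716672067076612'
    response = requests.get(url, verify=False, headers=headers).text
    # Match everything assigned to window._SSR_HYDRATED_DATA, up to the closing </script> tag
    pattern = re.compile(r'(?<=window\._SSR_HYDRATED_DATA=).*?(?=</script>)', re.S)
    jsonResult = pattern.findall(response)[0]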

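Here we look for the _SSR_HYDRATED_DATA variable directly and use a regular expression to pull out the contents of its script tag.
The result is a chunk of JSON data, though it has one small problem.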
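The extraction step can be tried offline before pointing it at the live site. The following is a minimal sketch, assuming the page assigns the object inside a script tag as described above; sample_html and extract_hydrated_data are hypothetical names made up for illustration, and the sample markup is a heavily shortened stand-in for the real page.

```python
import re

# Hypothetical, heavily shortened stand-in for the real page source.
sample_html = (
    '<html><body>'
    '<script>window._SSR_HYDRATED_DATA={"anyVideo":{"gidInformation":{}}}</script>'
    '<script>console.log("other")</script>'
    '</body></html>'
)

# Lazy match from the assignment up to the first closing </script> tag.
# re.S lets "." cross newlines in case the object spans several lines.
pattern = re.compile(r'window\._SSR_HYDRATED_DATA=(.*?)</script>', re.S)


def extract_hydrated_data(html):
    """Return the raw text assigned to window._SSR_HYDRATED_DATA, or None."""
    match = pattern.search(html)
    return match.group(1).strip() if match else None


raw = extract_hydrated_data(sample_html)
print(raw)
```

Using search with a capture group instead of findall(...)[0] means a page without the variable yields None rather than an IndexError, which is easier to handle when the cookie has expired or the page layout changes.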



Some of the values are the bare token undefined, which is not valid JSON.
So we replace them, simply wrapping each one in double quotes.

    jsonResult = jsonResult.replace(':undefined', ':"undefined"')
    jsonData = json.loads(jsonResult)
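To see why the replacement is needed, here is a small self-contained sketch. The raw string is made-up sample data, not real output from the site; it only shows that json.loads rejects a bare undefined and accepts the quoted form.

```python
import json

# Made-up sample: one value is the bare JavaScript token undefined.
raw = '{"title":"demo","poster":undefined,"duration":12}'

try:
    json.loads(raw)
    parsed_without_fix = True
except json.JSONDecodeError:
    # undefined is a JavaScript value, not part of the JSON grammar.
    parsed_without_fix = False

# Same replacement as in the post: turn the bare token into a string.
fixed = raw.replace(':undefined', ':"undefined"')
data = json.loads(fixed)

print(parsed_without_fix)
print(data['poster'])
```

Note that a plain string replace also rewrites any string value that happens to contain the text :undefined, and it does not cover undefined inside arrays. Replacing with :null instead would give None in Python, which may be easier to test for; that is a variation, not what the post does.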

The information we need is all in here.

    infor=jsonData['anyVideo']['gidInformation']['packerData']['video']
    dash=infor['videoResource']['dash']
    if 'dynamic_video' in dash.keys():
        audioUrl=dash['dynamic_video']['dynamic_audio_list'][0]['main_url']
        videoUrl=dash['dynamic_video']['dynamic_video_list'][0]['main_url']
    else:
        print('Source URL not found')
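The lookup above assumes every key on the path exists. As a hedged alternative, a small helper can walk the path and return a default when any step is missing. The helper name dig and the sample dictionary are hypothetical; the sample only mirrors the key path used in this post, with placeholder values.

```python
def dig(data, *keys, default=None):
    """Follow keys and indexes through nested dicts and lists.

    Returns default as soon as any step is missing.
    """
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Shortened stand-in for the parsed page data; values are placeholders.
sample = {
    'anyVideo': {'gidInformation': {'packerData': {'video': {
        'videoResource': {'dash': {'dynamic_video': {
            'dynamic_audio_list': [{'main_url': 'YXVkaW8='}],
            'dynamic_video_list': [{'main_url': 'dmlkZW8='}],
        }}}
    }}}}
}

base = ('anyVideo', 'gidInformation', 'packerData', 'video',
        'videoResource', 'dash', 'dynamic_video')
audioUrl = dig(sample, *base, 'dynamic_audio_list', 0, 'main_url')
videoUrl = dig(sample, *base, 'dynamic_video_list', 0, 'main_url')
print(audioUrl, videoUrl)
```

With this shape, a missing dynamic_video entry shows up as None, so the caller can decide whether to skip the video or raise an error.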

Here we read the audio and video source addresses directly.
However, main_url is still Base64-encoded rather than a plain URL.


    audio_url = base64.b64decode(audioUrl).decode("utf-8")
    video_url = base64.b64decode(videoUrl).decode("utf-8")

Decode them with Base64 and we have the source URLs for the audio and the video.
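The decoding step is a plain Base64 round trip, not decryption. The sketch below demonstrates it on a made-up URL. The padding line is an assumption added for robustness: it handles the case where a value arrives without its trailing = characters, which the post does not mention. decode_main_url is a hypothetical helper name.

```python
import base64


def decode_main_url(encoded):
    """Decode a Base64-encoded URL string into plain text."""
    # Assumption: pad to a multiple of 4 in case trailing '=' was dropped.
    padded = encoded + '=' * (-len(encoded) % 4)
    return base64.b64decode(padded).decode('utf-8')


# Made-up URL, encoded here only to have something to decode.
original = 'https://example.com/video.mp4?sig=abc'
encoded = base64.b64encode(original.encode('utf-8')).decode('ascii')

print(decode_main_url(encoded) == original)
print(decode_main_url(encoded.rstrip('=')) == original)
```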



Full code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2021/2/23 12:18
# @Author : pp
# @Software: PyCharm


import base64
import json
import re

import requests
import urllib3

# Suppress the InsecureRequestWarning triggered by verify=False below
urllib3.disable_warnings()


cookie='your cookie'
headers={
    "user-agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
    "cookie":cookie
}

def getRealUrl(url):
    # Fetch the page HTML; verify=False skips TLS certificate verification
    response = requests.get(url, verify=False, headers=headers).text
    # Match everything assigned to window._SSR_HYDRATED_DATA, up to the closing </script> tag
    pattern = re.compile(r'(?<=window\._SSR_HYDRATED_DATA=).*?(?=</script>)', re.S)
    jsonResult = pattern.findall(response)[0]
    # undefined is not valid JSON, so quote it before parsing
    jsonResult = jsonResult.replace(':undefined', ':"undefined"')
    jsonData = json.loads(jsonResult)
    infor = jsonData['anyVideo']['gidInformation']['packerData']['video']
    dash = infor['videoResource']['dash']
    if 'dynamic_video' not in dash:
        # Fail here instead of hitting an undefined variable further down
        raise ValueError('Source URL not found')
    audioUrl = dash['dynamic_video']['dynamic_audio_list'][0]['main_url']
    videoUrl = dash['dynamic_video']['dynamic_video_list'][0]['main_url']
    # main_url is Base64-encoded; decode it to get the plain URL
    audio_url = base64.b64decode(audioUrl).decode("utf-8")
    video_url = base64.b64decode(videoUrl).decode("utf-8")

    return audio_url, video_url

baseUrl='https://www.ixigua.com/6903716672067076612'
audio_url,video_url=getRealUrl(baseUrl)
print(audio_url)
print(video_url)
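The script above stops at printing the two URLs. As a hedged follow-up, here is a minimal sketch of saving one stream to disk. It uses the standard library's urllib.request instead of requests so that it stays self-contained; the download function, the file names, and the idea of reusing the same headers are assumptions, not part of the original code.

```python
import os
import shutil
import urllib.request


def download(url, path, headers=None):
    """Stream the resource at url into the file at path.

    Returns the number of bytes written.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response, open(path, 'wb') as out:
        # Copy in chunks so a large video is never held in memory at once.
        shutil.copyfileobj(response, out)
    return os.path.getsize(path)


# Hypothetical usage with the URLs returned by getRealUrl:
# download(video_url, 'video.mp4', headers=headers)
# download(audio_url, 'audio.mp4', headers=headers)
```

Because the audio and video arrive as separate streams, the two files would still need to be combined afterwards, for example with a tool such as ffmpeg.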
