NetEase Cloud Music Crawler Code Update

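Yesterday I was going to scrape NetEase Cloud Music one last time, and then run through the comments again to feed them into jieba.
But it turns out NetEase has changed the src value inside video.media.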
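The link has moved into the div above it, so the approach is the same as last time: we only need to read the data in that div's data-flashvars attribute.
There is one small catch, though. The attribute value looks like this: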

hurl=http%3A%2F%2Fvodkgeyttp9c.vod.126.net%2Fvodkgeyttp8%2FL5m9SZQf_2612577204_shd.mp4%3FwsSecret%3D83d84656160b8b970dad8232246b6dcf%26wsTime%3D1564313288%26ext%3DNnR5gMvHcZNcbCz592mDGTZBJVKicrNLK2xxVzG94eRjA3zlQBK8R%252FqLFCI9YJ2yG6Ss%252B24gRQWSBqxVFVGWTYRk1vEFx2QVMdGwu%252B5nutFufPu9gPOA8LL8ZOtjMn2utMKbD9Jqpo9EIKHXGpLxYxY3opQ5UjughMIbBdl9BGjSpizFd0IpE%252B6PdkyqTLDfFTq%252F73AYKvb13cEjVzJd5TK%252FUEQwWspFEXHj8VybugteGcVpgpVsdWqttl2BuKEu1Jv89n7psanwJE%252FHcMGMs0i8sM5DaMNV%252BEDjD9aY8x58myxOcw11cF6cJEKRiXcTWx2pVjwaF8Wi%252Bl%252B43rgkAj46gGpiwdXFMS1K3VFeb5hIIG75zHWweDRbQY36BQUy5Gd7BS0lj5hlCLdPkNQyUFXVzRpulblc%252ByWPBwXqIxN44bASNhTwEQ0zJP3Lb38r%252FuYuMBLEy%252B5AMTH7VIsr97mc1Fvpfr4bVobJjXvXbDWhVS3cvXyFQlwJkW6%252FST9q2IOTWI%252BuBFMUHY37uyF4QWqe3%252BK81zGK832bLL%252B4z34RoAlnL8iPDhT7Comdwn3x&murl=http%3A%2F%2Fvodkgeyttp9.vod.126.net%2Fvodkgeyttp8%2FL5m9SZQf_2612577204_shd.mp4%3FwsSecret%3Ddd74bbfc61511fd2564d981acd1b05c2%26wsTime%3D1564313264%26ext%3DNnR5gMvHcZNcbCz592mDGTZBJVKicrNLK2xxVzG94eRjA3zlQBK8R%252FqLFCI9YJ2yG6Ss%252B24gRQWSBqxVFVGWTYRk1vEFx2QVMdGwu%252B5nutFufPu9gPOA8LL8ZOtjMn2utMKbD9Jqpo9EIKHXGpLxYxY3opQ5UjughMIbBdl9BGjSpizFd0IpE%252B6PdkyqTLDfFTq%252F73AYKvb13cEjVzJd5TK%252FUEQwWspFEXHj8VybugteGcVpgpVsdWqttl2BuKEu1Jv89n7psanwJE%252FHcMGMs0i8sM5DaMNV%252BEDjD9aY8x58myxOcw11cF6cJEKRiXcTWx2pVjwaF8Wi%252Bl%252B43rgkAj46gGpiwdXFMS1K3VFeb5hIIG75zHWweDRbQY36BQUy5Gd7BS0lj5hlCLdPkNQyUFXVzRpulblc%252ByWPBwXqIxN44bASNhTwEQ0zJP3Lb38r%252FuYuMBLEy%252B5AMTH7VIsr97mc1Fvpfr4bVobJjXvXbDWhVS3cvXyFQlwJkW6%252FST9q2IOTWI%252BuBFMUHY37uyF4QWqe3%252BK81zGK832bLL%252B4z352RB%252FJd1AJz9kH4RS%252BrBSs&autoPlay=true&trackName=没有字幕怎么办&artistName=绝世的陈逗逗_关注动态视频作品&resourceId=6CB2F4D1D8DFFF49DC3535C38FC8F899&coverImg=http://p1.music.126.net/0STIbDi7p5Boe-26Ns-_nA==/109951164244854623.jpg&restrict=false&width=640&height=400

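The data extracted here, including the hurl inside it, is not in the form we normally see in HTML: it is still percent-encoded (URL-encoded) and has not been decoded yet.
The other fields that follow it are useful too, and they can all be extracted and used.
As they say, show me the code. I'm still a beginner, so there are surely plenty of shortcomings, and pointers from more experienced people are very welcome.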
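To make the decoding step concrete, here is a minimal sketch of how `urllib.parse.parse_qs` handles a string of this shape. The `sample` value is a shortened, made-up stand-in (the `example.com` URLs and field values are assumptions, not real NetEase data). The point is only that `parse_qs` splits on `&` first and then percent-decodes each value exactly once, so the nested query string inside `hurl` stays intact.

```python
from urllib.parse import parse_qs

# Shortened, made-up stand-in for a real data-flashvars value (not real data).
sample = (
    "hurl=http%3A%2F%2Fexample.com%2Fvideo.mp4%3FwsTime%3D1%26ext%3Da%252Fb"
    "&murl=http%3A%2F%2Fexample.com%2Fvideo.mp4"
    "&autoPlay=true&trackName=demo&artistName=someone"
)

fields = parse_qs(sample)        # maps each key to a list of decoded strings
video_url = fields["hurl"][0]    # percent-decoded exactly once
print(video_url)
print(fields["trackName"][0], fields["artistName"][0])
```

Note that the doubly encoded `%252F` comes out as `%2F`, which is what the download URL needs, so no further decoding should be required.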

import time
import urllib.parse

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()

page_url = input("Enter the URL of the activity (动态) page you want to crawl: ")
print("Requesting the video page, please wait...")
print("Video page: " + page_url)
browser.get(page_url)
time.sleep(2)  # crude wait for the page to finish loading
# The page content lives inside the "contentFrame" iframe
browser.switch_to.frame("contentFrame")

# Read data-flashvars from the div.mv element and parse it only once
url_data = browser.find_elements(By.CSS_SELECTOR, "div.mv")
flashvars = urllib.parse.parse_qs(url_data[0].get_attribute("data-flashvars"))
download_url = flashvars["hurl"][0]  # video download address
print(download_url)
track_name = flashvars["trackName"][0]  # video title
artist_name = flashvars["artistName"][0]  # artist name
names = track_name + "--" + artist_name
print(names)
print("Request succeeded, downloading the video, please wait...")
time.sleep(2)
video = requests.get(download_url).content
with open(names + ".mp4", "wb") as file:
    file.write(video)
print("Download complete.")
browser.quit()
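One thing the script above does not handle: `names` is built from text taken from the page, so it could contain characters that are not valid in a file name (a `/` or `?`, for example). Below is a small sketch of one way to clean it up before calling `open`. The helper `safe_filename` and its fallback name `"video"` are hypothetical additions of mine, not part of the original script.

```python
import re


def safe_filename(name, replacement="_"):
    """Replace characters that are commonly forbidden in file names."""
    # Backslash, slash, colon, asterisk, question mark, quote, angle
    # brackets, pipe and control whitespace are swapped for `replacement`.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, name)
    # Leading/trailing spaces and dots cause trouble on some systems.
    cleaned = cleaned.strip(" .")
    return cleaned or "video"  # fall back if nothing usable is left


print(safe_filename("a/b: c?") + ".mp4")
```

With this in place, the file could be opened as `open(safe_filename(names) + ".mp4", "wb")` instead of using `names` directly.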

