A Hands-On Guide to Scraping a Site with Anti-Scraping Measures in Python

A day without scraping and my fingers start to itch

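  • What Is a Web Crawler
  • Software You Will Need
  • Scraping Steps
    • Inspecting the Page
    • Debugging the Page
    • Finding Where the Cookie Comes From
    • Getting the X-Client-Data Parameter
    • Getting the Form Data Parameters
    • Getting the Parameters with execjs
    • Complete Code
  • Closing Remarks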

What Is a Web Crawler

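As the name suggests, a web crawler is a program or script that automatically fetches content from the web according to a set of rules. Nothing here involves data analysis or filtering, so what we build is just an ordinary general-purpose crawler.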

Software You Will Need

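  • Google Chrome (or any other browser with developer tools)
  • Python 3 (the newer the better)
  • Standard-library or third-party modules (or others with the same functionality):
    • html
    • urllib
    • ctypes
    • random
    • requests
    • html, urllib, ctypes and random are not needed if you use the JavaScript execution library execjs (install it with pip install PyExecJS)
  • PyCharm (or any other Python editor; Notepad works too if you are up to it)
  • Fiddler (or any other packet-capture tool, for debugging requests)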

Scraping Steps

Inspecting the Page

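Open the browser and visit today's unlucky victim, https://bilibili.iiilab.com . The page resolves Bilibili video addresses and is very clean: one input box and one button, simple enough for a child to use. At first I assumed this would be an easy scrape.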

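Press F12 or Ctrl + Shift + I to open the browser's developer tools. Enter the sample Bilibili page https://www.bilibili.com/video/BV1Xt41157R4/?spm_id_from=autoNext in the input box and click the parse button. The page does not reload, which means the content is updated through XHR. Look at the request headers in the panel on the right (see image 1):
[Image 1]
The boxed items are probably required, which raises a few questions: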

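  • Where does the Cookie come from when we are not logged in, and does the request work without it?
  • The Cookie holds quite a few values; which of them are actually required?
  • X-Client-Data looks random; if it is required, how is it generated?
  • In Form Data, link is clearly the value from the input box, but there are two more parameters that look random

So there are four places where the values may be random:

  • Cookie
  • X-Client-Data
  • r (in Form Data)
  • s (in Form Data)

Checking whether these are random is easy: just send the request a few more times. So we…

Hold on! When I clicked the button again, this line suddenly popped up: [image]

That caught me off guard. Apparently some parameters are short-lived and expire after a minute or two.

So I refreshed the page and quickly sent several requests, checking each parameter in turn. The pattern was clear: the Cookie changes after a refresh, while the other three parameters change on every click even without a refresh. The changes follow no visible pattern, so they do not simply track the request time and most likely involve random numbers. From this we can conclude that the Cookie is set earlier, before the parse request, and that the other three values depend on the input and on a random number.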
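
The comparison described above can also be done mechanically. Below is a minimal sketch, assuming the relevant header and form values from a few captured requests have been copied into dictionaries by hand; the sample values and the helper name changing_fields are made up for illustration.

```python
def changing_fields(samples):
    """Return the sorted names of fields whose value differs between captures."""
    keys = set().union(*(s.keys() for s in samples))
    return sorted(k for k in keys if len({s.get(k) for s in samples}) > 1)


# Hypothetical captures: placeholder values, not real traffic from the site.
captures = [
    {"link": "https://example.com/v/1", "r": "4821", "s": "1111", "X-Client-Data": "aa"},
    {"link": "https://example.com/v/1", "r": "9077", "s": "2222", "X-Client-Data": "bb"},
    {"link": "https://example.com/v/1", "r": "1359", "s": "3333", "X-Client-Data": "cc"},
]
print(changing_fields(captures))  # ['X-Client-Data', 'r', 's']
```

Fields that never show up in the result (link here) are constant across requests and can simply be copied.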

Debugging the Page

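Open Fiddler (I am using Fiddler Everywhere on its one-month trial) and click the parse button again to send one more request. Fiddler captures the following traffic:
[Image 2]
The /bilibili request we saw earlier stands out immediately. Opening it confirms that it is the endpoint we want:
[Image 3]
Right-click the URL to open the request editor:
[image]
Now start trimming parameters to drop any unnecessary random values, beginning with the ones that look useless:
[Image 4]
Good, the request still succeeds:
[Image 5]
Next, remove the remaining ones one at a time. It turns out they are all required, apart from Content-Length, which only shows up after the request is sent: r, s, X-Client-Data and the 7 values in the Cookie. Trimming the Cookie values one by one shows that only the first 4 are required and the rest can be dropped:
[Image 6]
That is still quite a few, but better than 7. Now select All in the browser's developer tools and look back through the earlier requests for a Set-Cookie response header, since a Cookie does not appear out of nowhere.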
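
The one-at-a-time elimination used above is easy to express as a loop. This is only a sketch: fake_works stands in for "replay the request and check whether it still succeeds", and the cookie names are hypothetical placeholders rather than the site's real ones.

```python
def required_keys(params, works):
    """Drop keys one at a time; keep a key only if the request fails without it."""
    needed = dict(params)
    for key in list(params):
        trial = {k: v for k, v in needed.items() if k != key}
        if works(trial):
            needed = trial  # still succeeds, so this key was unnecessary
    return sorted(needed)


# Stand-in for replaying the request in Fiddler or with requests.
def fake_works(cookies):
    return {"c1", "c2"}.issubset(cookies)


print(required_keys({"c1": "a", "c2": "b", "c3": "c", "c4": "d"}, fake_works))
# ['c1', 'c2']
```

With a real check function each call costs one request, so the number of requests grows linearly with the number of parameters.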

Finding Where the Cookie Comes From

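After a long search I found two requests whose responses carry Set-Cookie:
[Image 7]
[Image 8]

Together with zzz0821=1, which presumably never changes, these should make up the complete Cookie. Let's check with some Python code whether the Cookie is complete: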

import requests

headers = {
    "Origin": "https://bilibili.iiilab.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36"
}

session = requests.session()
session.headers = headers
session.get('https://bilibili.iiilab.com/')
session.post('https://service0.iiilab.com/sponsor/getByPage', data=dict(page='bilibili'))
print(session.cookies)

Output:

<RequestsCookieJar[<Cookie ...>, <Cookie ...>, <Cookie ...>]>

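Now add zzz0821=1. (Note that the cookies on a requests session are a requests.cookies.RequestsCookieJar, a container of name/value pairs rather than a plain string, so you cannot append to it as if it were a string.)
Here we use requests.sessions.merge_cookies() to add the Cookie: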

import requests

headers = {
    "Origin": "https://bilibili.iiilab.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36"
}

session = requests.session()
session.headers = headers
session.get('https://bilibili.iiilab.com/')
session.post('https://service0.iiilab.com/sponsor/getByPage', data=dict(page='bilibili'))
session.cookies = requests.sessions.merge_cookies(session.cookies, dict(zzz0821='1'))
print(session.cookies)

Output:

<RequestsCookieJar[<Cookie ...>, <Cookie ...>, <Cookie ...>, <Cookie ...>]>

All 4 Cookie values are now in place, so let's turn to the other parameters.
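
For intuition about why the jar is not a string: what finally goes on the wire is a single Cookie header assembled from the name/value pairs. Here is a rough sketch of that assembly with plain dictionaries. Apart from zzz0821, the cookie names are placeholders, and the helper cookie_header is invented for illustration; requests builds the real header from the session's jar.

```python
def cookie_header(*sources):
    """Merge name/value mappings into one Cookie header value."""
    merged = {}
    for source in sources:
        for name, value in source.items():
            merged.setdefault(name, value)  # names already present are kept
    return "; ".join(f"{name}={value}" for name, value in merged.items())


from_server = {"placeholder_a": "1", "placeholder_b": "2"}  # hypothetical names
print(cookie_header(from_server, {"zzz0821": "1"}))
# placeholder_a=1; placeholder_b=2; zzz0821=1
```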

Getting the X-Client-Data Parameter

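Click the /bilibili request from before and open the Initiator tab on the right to see what sent the request:
[Image 9]
jQuery is only a JavaScript library, not the code that decides to send the request, so skip it. Going from top to bottom, the first entry that is not jQuery is vue-resource…; the name to its left, parseVideo, already suggests it belongs to the site's own functionality: it roughly means extracting the video from a page.
Clicking it jumps to the Sources panel, which shows the code as a single line. That is obviously not meant for human eyes, so click the blue button or the pair of curly braces in the lower-left corner (if the blue prompt does not appear, trigger the pretty-print manually):
[Image 10]
[Image 11]
There happens to be an ajax request right here. Select u(t, site), right-click it and add it to the Watch list to follow how its value changes:
[Image 12]
Then click the line number 456 on the left of that line to set a breakpoint, and click the button again:
[Image 13]
The Watch list in the lower right now shows that the expression evaluates to a string. Click the blue right-pointing arrow at the top to resume execution, then go back to the Network tab and look at the latest /bilibili request:
[Image 14]
Sure enough, it is exactly the X-Client-Data value. Now select u, right-click it and add it to the Watch list, then click the button again: it turns out to be a function, and right-clicking it lets you jump to its definition:
[Image 15]
From here, add a watch for every variable and function inside it and jump to each definition, until you have found all the related functions.
Because quite a few functions are involved, I won't walk through the whole search here. (PS: create a text file, change its extension to .html, put a pair of <script> tags in it and paste the related functions between them. The md5 function is my name for the original e(t, e, n) function, so wherever I write md5(…) the original has e(…).) Here is the JavaScript code (there were two functions named u, so I renamed the main one to uu):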

function d(t, e) {
        var n = (65535 & t) + (65535 & e);
        return (t >> 16) + (e >> 16) + (n >> 16) << 16 | 65535 & n
    }
    function s(t, e, n, r, i, o) {
        return d(function a(t, e) {
            return t << e | t >>> 32 - e
        }(d(d(e, t), d(r, o)), i), n)
    }
    function h(t, e, n, r, i, o, a) {
        return s(e & n | ~e & r, t, e, i, o, a)
    }
    function f(t, e, n, r, i, o, a) {
        return s(e & r | n & ~r, t, e, i, o, a)
    }
    function g(t, e, n, r, i, o, a) {
        return s(e ^ n ^ r, t, e, i, o, a)
    }
	function p(t, e, n, r, i, o, a) {
		return s(n ^ (e | ~r), t, e, i, o, a)
	}
    function n(t) {
        return unescape(encodeURIComponent(t))
    }
    function c(t) {
        var e, n = "", r = 32 * t.length;
        for (e = 0; e < r; e += 8)
            n += String.fromCharCode(t[e >> 5] >>> e % 32 & 255);
        return n
    }
    function l(t) {
        var e, n = [];
        for (n[(t.length >> 2) - 1] = void 0,
        e = 0; e < n.length; e += 1)
            n[e] = 0;
        var r = 8 * t.length;
        for (e = 0; e < r; e += 8)
            n[e >> 5] |= (255 & t.charCodeAt(e / 8)) << e % 32;
        return n
    }
	function u(t, e) {
		t[e >> 5] |= 128 << e % 32,
		t[14 + (e + 64 >>> 9 << 4)] = e;
		var n, r, i, o, a, s = 1732584193, u = -271733879, c = -1732584194, l = 271733878;
		// console.log(h(s, u, c, l, t[0], 7, -680876936))
		for (n = 0; n < t.length; n += 16)
			u = p(u = p(u = p(u = p(u = g(u = g(u = g(u = g(u = f(u = f(u = f(u = f(u = h(u = h(u = h(u = h(i = u, c = h(o = c, l = h(a = l, s = h(r = s, u, c, l, t[n], 7, -680876936), u, c, t[n + 1], 12, -389564586), s, u, t[n + 2], 17, 606105819), l, s, t[n + 3], 22, -1044525330), c = h(c, l = h(l, s = h(s, u, c, l, t[n + 4], 7, -176418897), u, c, t[n + 5], 12, 1200080426), s, u, t[n + 6], 17, -1473231341), l, s, t[n + 7], 22, -45705983), c = h(c, l = h(l, s = h(s, u, c, l, t[n + 8], 7, 1770035416), u, c, t[n + 9], 12, -1958414417), s, u, t[n + 10], 17, -42063), l, s, t[n + 11], 22, -1990404162), c = h(c, l = h(l, s = h(s, u, c, l, t[n + 12], 7, 1804603682), u, c, t[n + 13], 12, -40341101), s, u, t[n + 14], 17, -1502002290), l, s, t[n + 15], 22, 1236535329), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 1], 5, -165796510), u, c, t[n + 6], 9, -1069501632), s, u, t[n + 11], 14, 643717713), l, s, t[n], 20, -373897302), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 5], 5, -701558691), u, c, t[n + 10], 9, 38016083), s, u, t[n + 15], 14, -660478335), l, s, t[n + 4], 20, -405537848), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 9], 5, 568446438), u, c, t[n + 14], 9, -1019803690), s, u, t[n + 3], 14, -187363961), l, s, t[n + 8], 20, 1163531501), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 13], 5, -1444681467), u, c, t[n + 2], 9, -51403784), s, u, t[n + 7], 14, 1735328473), l, s, t[n + 12], 20, -1926607734), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 5], 4, -378558), u, c, t[n + 8], 11, -2022574463), s, u, t[n + 11], 16, 1839030562), l, s, t[n + 14], 23, -35309556), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 1], 4, -1530992060), u, c, t[n + 4], 11, 1272893353), s, u, t[n + 7], 16, -155497632), l, s, t[n + 10], 23, -1094730640), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 13], 4, 681279174), u, c, t[n], 11, -358537222), s, u, t[n + 3], 16, -722521979), l, s, t[n + 6], 23, 76029189), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 9], 4, -640364487), u, c, t[n + 12], 11, -421815835), s, u, t[n + 15], 16, 
530742520), l, s, t[n + 2], 23, -995338651), c = p(c, l = p(l, s = p(s, u, c, l, t[n], 6, -198630844), u, c, t[n + 7], 10, 1126891415), s, u, t[n + 14], 15, -1416354905), l, s, t[n + 5], 21, -57434055), c = p(c, l = p(l, s = p(s, u, c, l, t[n + 12], 6, 1700485571), u, c, t[n + 3], 10, -1894986606), s, u, t[n + 10], 15, -1051523), l, s, t[n + 1], 21, -2054922799), c = p(c, l = p(l, s = p(s, u, c, l, t[n + 8], 6, 1873313359), u, c, t[n + 15], 10, -30611744), s, u, t[n + 6], 15, -1560198380), l, s, t[n + 13], 21, 1309151649), c = p(c, l = p(l, s = p(s, u, c, l, t[n + 4], 6, -145523070), u, c, t[n + 11], 10, -1120210379), s, u, t[n + 2], 15, 718787259), l, s, t[n + 9], 21, -343485551),
			s = d(s, r),
			u = d(u, i),
			c = d(c, o),
			l = d(l, a);
		return [s, u, c, l]
	}
	function a(t) {
		return function e(t) {
			return c(u(l(t), 8 * t.length))
		}(n(t))
	}
	function o(t) {
		var e, n, r = "";
		for (n = 0; n < t.length; n += 1)
			e = t.charCodeAt(n),
			r += "0123456789abcdef".charAt(e >>> 2 & 15) + "0123456789abcdef".charAt(15 & e);
		return r
	}
	function m(t, e) {
		return function s(t, e) {
			var n, r, i = l(t), o = [], a = [];
			for (o[15] = a[15] = void 0,
			16 < i.length && (i = u(i, 8 * t.length)),
			n = 0; n < 16; n += 1)
				o[n] = 909522486 ^ i[n],
				a[n] = 1549556828 ^ i[n];
			return r = u(o.concat(l(e)), 512 + 8 * e.length),
			c(u(a.concat(r), 640))
		}(n(t), n(e))
	}
	function md5(t, e, n) {
		return e ? n ? m(e, t) : function r(t, e) {
			return o(m(t, e))
		}(e, t) : n ? a(t) : function i(t) {
			return o(a(t))
		}(t)
	}
	function uu(t, e) {
		if (!0 === window.navigator.webdriver || window.document.documentElement.getAttribute("webdriver") || window
			.callPhantom || window._phantom)
			return md5(o + t + o);
		var n = e.charAt(t.charCodeAt(0) % e.length),
			r = e.charAt(t.charCodeAt(t.length - 1) % e.length);
		return md5(n + t + r)
	}
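
Unpacked, the long for-loop body in u(t, e) above is 64 applications of the helpers h, f, g and p, sixteen of each, with the four state words swapping roles after every step. The initial values, constants and shift amounts in the listing match those of standard MD5, so the same structure can be written as a table-driven loop. The sketch below is an illustration of that structure, not the site's code: the constants are generated with the usual sine formula, and the names md5_digest, SHIFTS and K are invented here.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-step left-rotation amounts: four rounds of sixteen steps each.
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Additive constants: floor(abs(sin(i + 1)) * 2**32).
K = [int(abs(math.sin(i + 1)) * 2 ** 32) & MASK for i in range(64)]


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_digest(data: bytes) -> bytes:
    # Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length.
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (8 * len(data)) & 0xFFFFFFFFFFFFFFFF)
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for offset in range(0, len(msg), 64):
        words = struct.unpack("<16I", msg[offset:offset + 64])
        a, b, c, d = state
        for i in range(64):
            if i < 16:    # round 1, the listing's h
                mix, g = (b & c) | (~b & d), i
            elif i < 32:  # round 2, the listing's f
                mix, g = (b & d) | (c & ~d), (5 * i + 1) % 16
            elif i < 48:  # round 3, the listing's g
                mix, g = b ^ c ^ d, (3 * i + 5) % 16
            else:         # round 4, the listing's p
                mix, g = c ^ (b | ~d), (7 * i) % 16
            total = (mix + a + K[i] + words[g]) & MASK
            a, d, c = d, c, b
            b = (b + rotl(total, SHIFTS[i])) & MASK
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)


print(md5_digest(b"abc") == hashlib.md5(b"abc").digest())  # True
print(K[0] == -680876936 & MASK)  # True: the first constant in the listing
```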

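The JavaScript is long enough to make your eyes glaze over. The first line inside the for loop of u(t, e) is especially long, nearly 2,700 characters on a single line. After I formatted it by hand, it looks like this: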

u = p(
	u = p(
		u = p(
			u = p(
				u = g(
					u = g(
						u = g(
							u = g(
								u = f(
									u = f(
										u = f(
											u = f(
												u = h(
													u = h(
														u = h(
															u = h(
																i = u, c = h(
																	o = c, l = h(
																		a = l, s = h(
																		r = s, u, c, l, t[n], 7, -680876936), u, c, t[n + 1], 12, -389564586)
																	, s, u, t[n + 2], 17, 606105819)
																, l, s, t[n + 3], 22, -1044525330)
															, c = h(
																c, l = h(
																l, s = h(s, u, c, l, t[n + 4], 7, -176418897), u, c, t[n + 5], 12, 1200080426)
															, s, u, t[n + 6], 17, -1473231341), l, s, t[n + 7], 22, -45705983)
														, c = h(
															c, l = h(
															l, s = h(s, u, c, l, t[n + 8], 7, 1770035416), u, c, t[n + 9], 12, -1958414417)
														, s, u, t[n + 10], 17, -42063)
														, l, s, t[n + 11], 22, -1990404162)
													, c = h(
														c, l = h(l, s = h(s, u, c, l, t[n + 12], 7, 1804603682), u, c, t[n + 13], 12, -40341101)
														, s, u, t[n + 14], 17, -1502002290)
													, l, s, t[n + 15], 22, 1236535329)
												, c = f(
													c, l = f(
													l, s = f(s, u, c, l, t[n + 1], 5, -165796510), u, c, t[n + 6], 9, -1069501632)
												, s, u, t[n + 11], 14, 643717713)
												, l, s, t[n], 20, -373897302)
											, c = f(
												c, l = f(
												l, s = f(s, u, c, l, t[n + 5], 5, -701558691), u, c, t[n + 10], 9, 38016083)
											, s, u, t[n + 15], 14, -660478335)
											, l, s, t[n + 4], 20, -405537848)
										, c = f(
											c, l = f(
											l, s = f(s, u, c, l, t[n + 9], 5, 568446438), u, c, t[n + 14], 9, -1019803690)
										, s, u, t[n + 3], 14, -187363961)
										, l, s, t[n + 8], 20, 1163531501)
									, c = f(
										c, l = f(
										l, s = f(s, u, c, l, t[n + 13], 5, -1444681467), u, c, t[n + 2], 9, -51403784)
									, s, u, t[n + 7], 14, 1735328473)
									, l, s, t[n + 12], 20, -1926607734)
								, c = g(
									c, l = g(
									l, s = g(s, u, c, l, t[n + 5], 4, -378558), u, c, t[n + 8], 11, -2022574463)
								, s, u, t[n + 11], 16, 1839030562)
								, l, s, t[n + 14], 23, -35309556)
							, c = g(
								c, l = g(
								l, s = g(s, u, c, l, t[n + 1], 4, -1530992060), u, c, t[n + 4], 11, 1272893353)
							, s, u, t[n + 7], 16, -155497632)
							, l, s, t[n + 10], 23, -1094730640)
						, c = g(
							c, l = g(
							l, s = g(s, u, c, l, t[n + 13], 4, 681279174), u, c, t[n], 11, -358537222)
						, s, u, t[n + 3], 16, -722521979)
						, l, s, t[n + 6], 23, 76029189)
					, c = g(
						c, l = g(
						l, s = g(s, u, c, l, t[n + 9], 4, -640364487), u, c, t[n + 12], 11, -421815835)
					, s, u, t[n + 15], 16, 530742520)
					, l, s, t[n + 2], 23, -995338651)
				, c = p(
					c, l = p(
					l, s = p(s, u, c, l, t[n], 6, -198630844), u, c, t[n + 7], 10, 1126891415)
				, s, u, t[n + 14], 15, -1416354905)
				, l, s, t[n + 5], 21, -57434055)
			, c = p(
				c, l = p(
				l, s = p(s, u, c, l, t[n + 12], 6, 1700485571), u, c, t[n + 3], 10, -1894986606)
			, s, u, t[n + 10], 15, -1051523)
			, l, s, t[n + 1], 21, -2054922799)
		, c = p(
			c, l = p(
			l, s = p(s, u, c, l, t[n + 8], 6, 1873313359), u, c, t[n + 15], 10, -30611744)
		, s, u, t[n + 6], 15, -1560198380)
		, l, s, t[n + 13], 21, 1309151649)
	, c = p(
		c, l = p(
		l, s = p(s, u, c, l, t[n + 4], 6, -145523070), u, c, t[n + 11], 10, -1120210379)
	, s, u, t[n + 2], 15, 718787259)
	, l, s, t[n + 9], 21, -343485551),

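Looks great, doesn't it? So great that I closed the window on the spot and lost the nerve to read it closely… But I did read it in the end. After the better part of an hour with my eyes glued to the screen, I finally translated the whole thing into a Python function. It may not be pretty, and I have no plans to optimize it; the only point was to avoid depending on the execjs library. Speaking of which, JavaScript has a 32-bit unsigned right shift operator, >>>, which Python lacks. To spare my brain cells, I found a workaround online: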

gt3 = lambda _, __: (_ % (1 << 32)) >> __

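That takes care of the >>> operation, and it looks almost too simple to work. It works because the modulo first maps the value into the unsigned 32-bit range, after which an ordinary right shift gives the same result as >>>. In HTML, > is written gt, and there are three of them, so I named the function gt3.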
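
A few sample values make the difference from Python's own >> visible. The sketch below restates the one-liner under a longer, invented name (urshift32) and adds a hypothetical helper to_int32 for the opposite direction, wrapping a Python int into the signed 32-bit range that JavaScript's << and | produce. One caveat: JavaScript also reduces the shift count modulo 32, which this helper does not; the shift counts used in this article stay between 0 and 31.

```python
def urshift32(value, bits):
    """Emulate JavaScript's value >>> bits for Python ints (0 <= bits <= 31)."""
    return (value % (1 << 32)) >> bits


def to_int32(value):
    """Wrap a Python int into the signed 32-bit range used by JavaScript bit ops."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= 0x80000000 else value


print(urshift32(-1, 28))   # 15, same as -1 >>> 28 in JavaScript
print(-1 >> 28)            # -1: Python's own >> keeps the sign
print(urshift32(-8, 1))    # 2147483644
print(to_int32(1 << 31))   # -2147483648, same as 1 << 31 in JavaScript
```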
The translated Python function is as follows:

def x_client_data(t, e='bilibili'):
    def d(t, e):
        n_ = (65535 & t) + (65535 & e)
        return (t >> 16) + (e >> 16) + (n_ >> 16) << 16 | 65535 & n_

    def s(t, e, n_, r, i, o):
        t = d(d(e, t), d(r, o))
        e = i
        return d(ctypes.c_int32(t << i).value | gt3(t, 32 - e), n_)

    def h(t, e, n_, r, i, o, a):
        return s(e & n_ | ~e & r, t, e, i, o, a)

    def f(t, e, n_, r, i, o, a):
        return s(e & r | n_ & ~r, t, e, i, o, a)

    def g(t, e, n_, r, i, o, a):
        return s(e ^ n_ ^ r, t, e, i, o, a)

    def p(t, e, n_, r, i, o, a):
        return s(n_ ^ (e | ~r), t, e, i, o, a)

    def n(t):
        # JS unescape(encodeURIComponent(t)) yields one character per UTF-8 byte
        return t.encode('utf-8').decode('latin-1')

    def c(t):
        n = ''
        for e in range(0, 32 * len(t), 8):
            n += chr(gt3(t[e >> 5], e % 32) & 255)
        return n

    def l(t):
        n_ = []
        for e in range(len(t) >> 2):
            n_.append(0)
        for e in range(0, 8 * len(t), 8):
            if (e >> 5) < len(n_):
                n_[e >> 5] |= (255 & ord(t[e // 8])) << e % 32
            else:
                n_.append(0 | (255 & ord(t[e // 8])) << e % 32)
        return n_

    def u(t, e):
        while 14 + (gt3((e + 64), 9) << 4) >= len(t) - 1:
            t.append(0)
        t[e >> 5] |= 128 << e % 32
        t[14 + (gt3((e + 64), 9) << 4)] = e
        s = 1732584193
        u = -271733879
        c = -1732584194
        l = 271733878
        # print(h(s, u, c, l, t[0], 7, -680876936))
        for n_ in range(0, len(t), 16):
            r = s
            s = h(r, u, c, l, t[n_], 7, -680876936)
            a = l
            l = h(a, s, u, c, t[n_ + 1], 12, -389564586)
            o = c
            c = h(o, l, s, u, t[n_ + 2], 17, 606105819)
            i = u
            u = h(i, c, l, s, t[n_ + 3], 22, -1044525330)
            s = h(s, u, c, l, t[n_ + 4], 7, -176418897)
            l = h(l, s, u, c, t[n_ + 5], 12, 1200080426)
            c = h(c, l, s, u, t[n_ + 6], 17, -1473231341)
            u = h(u, c, l, s, t[n_ + 7], 22, -45705983)
            s = h(s, u, c, l, t[n_ + 8], 7, 1770035416)
            l = h(l, s, u, c, t[n_ + 9], 12, -1958414417)
            c = h(c, l, s, u, t[n_ + 10], 17, -42063)
            u = h(u, c, l, s, t[n_ + 11], 22, -1990404162)
            s = h(s, u, c, l, t[n_ + 12], 7, 1804603682)
            l = h(l, s, u, c, t[n_ + 13], 12, -40341101)
            c = h(c, l, s, u, t[n_ + 14], 17, -1502002290)
            u = h(u, c, l, s, t[n_ + 15], 22, 1236535329)
            s = f(s, u, c, l, t[n_ + 1], 5, -165796510)
            l = f(l, s, u, c, t[n_ + 6], 9, -1069501632)
            c = f(c, l, s, u, t[n_ + 11], 14, 643717713)
            u = f(u, c, l, s, t[n_], 20, -373897302)
            s = f(s, u, c, l, t[n_ + 5], 5, -701558691)
            l = f(l, s, u, c, t[n_ + 10], 9, 38016083)
            c = f(c, l, s, u, t[n_ + 15], 14, -660478335)
            u = f(u, c, l, s, t[n_ + 4], 20, -405537848)
            s = f(s, u, c, l, t[n_ + 9], 5, 568446438)
            l = f(l, s, u, c, t[n_ + 14], 9, -1019803690)
            c = f(c, l, s, u, t[n_ + 3], 14, -187363961)
            u = f(u, c, l, s, t[n_ + 8], 20, 1163531501)
            s = f(s, u, c, l, t[n_ + 13], 5, -1444681467)
            l = f(l, s, u, c, t[n_ + 2], 9, -51403784)
            c = f(c, l, s, u, t[n_ + 7], 14, 1735328473)
            u = f(u, c, l, s, t[n_ + 12], 20, -1926607734)
            s = g(s, u, c, l, t[n_ + 5], 4, -378558)
            l = g(l, s, u, c, t[n_ + 8], 11, -2022574463)
            c = g(c, l, s, u, t[n_ + 11], 16, 1839030562)
            u = g(u, c, l, s, t[n_ + 14], 23, -35309556)
            s = g(s, u, c, l, t[n_ + 1], 4, -1530992060)
            l = g(l, s, u, c, t[n_ + 4], 11, 1272893353)
            c = g(c, l, s, u, t[n_ + 7], 16, -155497632)
            u = g(u, c, l, s, t[n_ + 10], 23, -1094730640)
            s = g(s, u, c, l, t[n_ + 13], 4, 681279174)
            l = g(l, s, u, c, t[n_], 11, -358537222)
            c = g(c, l, s, u, t[n_ + 3], 16, -722521979)
            u = g(u, c, l, s, t[n_ + 6], 23, 76029189)
            s = g(s, u, c, l, t[n_ + 9], 4, -640364487)
            l = g(l, s, u, c, t[n_ + 12], 11, -421815835)
            c = g(c, l, s, u, t[n_ + 15], 16, 530742520)
            u = g(u, c, l, s, t[n_ + 2], 23, -995338651)
            s = p(s, u, c, l, t[n_], 6, -198630844)
            l = p(l, s, u, c, t[n_ + 7], 10, 1126891415)
            c = p(c, l, s, u, t[n_ + 14], 15, -1416354905)
            u = p(u, c, l, s, t[n_ + 5], 21, -57434055)
            s = p(s, u, c, l, t[n_ + 12], 6, 1700485571)
            l = p(l, s, u, c, t[n_ + 3], 10, -1894986606)
            c = p(c, l, s, u, t[n_ + 10], 15, -1051523)
            u = p(u, c, l, s, t[n_ + 1], 21, -2054922799)
            s = p(s, u, c, l, t[n_ + 8], 6, 1873313359)
            l = p(l, s, u, c, t[n_ + 15], 10, -30611744)
            c = p(c, l, s, u, t[n_ + 6], 15, -1560198380)
            u = p(u, c, l, s, t[n_ + 13], 21, 1309151649)
            s = p(s, u, c, l, t[n_ + 4], 6, -145523070)
            l = p(l, s, u, c, t[n_ + 11], 10, -1120210379)
            c = p(c, l, s, u, t[n_ + 2], 15, 718787259)
            u = p(u, c, l, s, t[n_ + 9], 21, -343485551)
            s = d(s, r)
            u = d(u, i)
            c = d(c, o)
            l = d(l, a)
        return [s, u, c, l]

    def a(t):
        t = n(t)  # the JS measures the length after the UTF-8 conversion
        return c(u(l(t), 8 * len(t)))

    def o(t):
        r = ''
        for n_ in range(len(t)):
            e = ord(t[n_])
            r += "0123456789abcdef"[gt3(e, 2) & 15] + "0123456789abcdef"[15 & e]
        return r

    def m(t, e):
        t = n(t)
        e = n(e)
        i = l(t)
        o = []
        a = []
        if 16 < len(i):
            i = u(i, 8 * len(t))
        i += [0] * (16 - len(i))  # JS treats the missing words as 0 in the XOR
        for n_ in range(16):
            o.append(909522486 ^ i[n_])
            a.append(1549556828 ^ i[n_])
        r = u(o + l(e), 512 + 8 * len(e))
        return c(u(a + r, 640))

    def md5(t, e=None, n_=None):
        if e:
            if n_:
                return m(e, t)
            else:
                return o(m(e, t))
        else:
            if n_:
                return a(t)
            else:
                return o(a(t))
    n_ = e[ord(t[0]) % len(e)]
    r = e[ord(t[-1]) % len(e)]
    return md5(n_ + t + r)

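PyCharm still shows plenty of wavy underlines in this code. Being a bit obsessive, I prefer to follow the PEP 8 style guide: no underlined warnings, and it looks better overall. If any of you can bring this code in line with the style guide, leave a comment or send me a message.
The default e='bilibili' is a fixed value that I worked out by watching the Watch list over and over; the argument t passed in is the s parameter from Form Data.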
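
Read side by side with a reference MD5, the helper functions in the listing appear to follow the standard algorithm step for step. The one place that differs is o, which builds the high hex digit of each byte with a shift of 2 instead of the usual 4. If that reading is right, and assuming s consists of plain ASCII digits, the same value could be produced with hashlib. The sketch below is a reconstruction under those assumptions, not code from the site; the names odd_hex and x_client_data_sketch are invented, and the output should be compared with x_client_data before relying on it.

```python
import hashlib

HEX = "0123456789abcdef"


def odd_hex(raw: bytes, shift: int = 2) -> str:
    """Hex-like encoding from the listing: the high digit uses >> shift (2, not 4)."""
    return "".join(HEX[(b >> shift) & 15] + HEX[b & 15] for b in raw)


def x_client_data_sketch(s: str, site: str = "bilibili") -> str:
    prefix = site[ord(s[0]) % len(site)]    # picked by the first character of s
    suffix = site[ord(s[-1]) % len(site)]   # picked by the last character of s
    digest = hashlib.md5((prefix + s + suffix).encode("utf-8")).digest()
    return odd_hex(digest)


# With shift=4 the encoding is ordinary hex, which is a quick sanity check.
digest = hashlib.md5(b"abc").digest()
print(odd_hex(digest, shift=4) == hashlib.md5(b"abc").hexdigest())  # True
print(x_client_data_sketch("123"))  # hashes "i123i", then applies the odd encoding
```

For s = "123", both the first and the last character select the letter i from bilibili, so the string that gets hashed is i123i.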

Getting the Form Data Parameters

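Watching the variables makes it clear that s is produced on line 451, n = this.generateStr(this.link + "@" + e).toString(10); , and that the random number e comes from line 450 of the same Sources view, e = Math.random().toString(10).substring(2) . That random number is also the r parameter in Form Data (presumably short for random). The random number is easy, so only the s parameter in Form Data is left. According to the Watch list, the string passed in has the form <Bilibili video URL>@<random number>. After some searching, the generateStr function it calls turned out to be the following JavaScript code: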

generateStr = function(t) {
		var a = function() {
			for (var t = 0, e = new Array(256), n = 0; 256 != n; ++n)
				t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = n) ? -306674912 ^ t >>>
							1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 :
						t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -
					306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>>
				1,
				e[n] = t;
			return "undefined" != typeof Int32Array ? new Int32Array(e) : e
		}();
		return function(t) {
			for (var e, n, r = -1, i = 0, o = t.length; i < o;)
				r = (e = t.charCodeAt(i++)) < 128 ? r >>> 8 ^ a[255 & (r ^ e)] : e < 2048 ? (r = r >>> 8 ^ a[255 &
					(r ^ (192 | e >> 6 & 31))]) >>> 8 ^ a[255 & (r ^ (128 | 63 & e))] : 55296 <= e && e < 57344 ? (
					e = 64 + (1023 & e),
					n = 1023 & t.charCodeAt(i++),
					(r = (r = (r = r >>> 8 ^ a[255 & (r ^ (240 | e >> 8 & 7))]) >>> 8 ^ a[255 & (r ^ (128 | e >>
						2 & 63))]) >>> 8 ^ a[255 & (r ^ (128 | n >> 6 & 15 | (3 & e) << 4))]) >>> 8 ^ a[255 & (r ^
						(128 | 63 & n))]) : (r = (r = r >>> 8 ^ a[255 & (r ^ (224 | e >> 12 & 15))]) >>> 8 ^ a[
					255 & (r ^ (128 | e >> 6 & 63))]) >>> 8 ^ a[255 & (r ^ (128 | 63 & e))];
			return -1 ^ r
		}(t) >>> 0
	}

The expression in the first for loop is interesting. Expanded, it looks like this:

t = 1 & (
	t = 1 & (
		t = 1 & (
			t = 1 & (
				t = 1 & (
					t = 1 & (
						t = 1 & (
							t = 1 & (
								t = n) ? 
								-306674912 ^ t >>> 1 : 
							t >>> 1) ? 
							-306674912 ^ t >>> 1 : 
						t >>> 1) ? 
						-306674912 ^ t >>> 1 : 
					t >>> 1) ? 
					-306674912 ^ t >>> 1 : 
				t >>> 1) ? 
				-306674912 ^ t >>> 1 : 
			t >>> 1) ? 
			- 306674912 ^ t >>> 1 : 
		t >>> 1) ? 
		-306674912 ^ t >>> 1 : 
	t >>> 1) ?
	-306674912 ^ t >>> 1 :	
t >>> 1

Truly soothing for my inner perfectionist. It is much simpler than the previous one, though. Converted to Python, it looks like this:

def generate_str(t):
    a = []
    for n in range(256):
        for _ in range(8):
            if 1 & n:
                n = -306674912 ^ gt3(n, 1)
            else:
                n = gt3(n, 1)
        a.append(n)
    r = -1
    # The JS branches on charCodeAt() only re-encode the string as UTF-8,
    # updating r after every byte, so iterate over the UTF-8 bytes directly.
    for e in t.encode('utf-8'):
        r = gt3(r, 8) ^ a[255 & (r ^ e)]
    return str(gt3(-1 ^ r, 0))
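
The table built from -306674912 (0xEDB88320 as a signed 32-bit value), together with the r >>> 8 ^ a[...] update, has the shape of the standard CRC-32 checksum, computed here over the UTF-8 bytes of the string. Assuming that reading is correct, zlib.crc32 from the standard library should give the same number. The sketch below rebuilds a table-driven version next to it for comparison; the function names and the sample input are made up.

```python
import zlib


def crc32_table():
    table = []
    for n in range(256):
        for _ in range(8):
            # Reflected CRC-32 polynomial 0xEDB88320, applied when the low bit is set
            n = (n >> 1) ^ 0xEDB88320 if n & 1 else n >> 1
        table.append(n)
    return table


def crc32_by_table(text: str) -> str:
    table = crc32_table()
    r = 0xFFFFFFFF                      # same bits as r = -1 in the JavaScript
    for byte in text.encode("utf-8"):
        r = (r >> 8) ^ table[(r ^ byte) & 255]
    return str(r ^ 0xFFFFFFFF)          # -1 ^ r, then >>> 0


def crc32_by_zlib(text: str) -> str:
    return str(zlib.crc32(text.encode("utf-8")))


sample = "https://example.com/video@12345"   # placeholder, shaped like link@r
print(crc32_by_table(sample) == crc32_by_zlib(sample))  # True
```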

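From the JavaScript code, the random number is obtained by taking a decimal between 0 and 1 and stripping the leading "0." characters; in other words, it is a random positive integer written as a string of digits. In Python, ran = str(random.random())[2:] does the job.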
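
One edge case with the slice: for values below 0.0001, str() switches to exponent notation (for example 1e-05), and the result is then no longer a plain string of digits. A sketch that sidesteps this by formatting with a fixed number of decimals could look like the following. The helper name random_digits and the choice of 17 digits are assumptions, not something taken from the site, and whether the server cares about the exact length is untested.

```python
import random


def random_digits(places: int = 17) -> str:
    """Digits after the decimal point of a random float in [0, 1)."""
    return f"{random.random():.{places}f}"[2:]


r = random_digits()
print(r, r.isdigit())

print(str(1e-05)[2:])   # '-05': what the plain slice gives for a tiny value
```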

Getting the Parameters with execjs

The same parameters r, s and x_client_data can also be obtained with the JavaScript execution library. The code is as follows:

import execjs

link = 'https://www.bilibili.com/video/BV1Xt41157R4/?spm_id_from=autoNext'
r = execjs.eval("Math.random().toString(10).substring(2)")
s = str(execjs.compile("""
generateStr = function(t) {
    var a = function() {
        for (var t = 0, e = new Array(256), n = 0; 256 != n; ++n)
            t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = 1 & (t = n) ? -306674912 ^ t >>>
                        1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 :
                    t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -
                306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>> 1) ? -306674912 ^ t >>> 1 : t >>>
            1,
            e[n] = t;
        return "undefined" != typeof Int32Array ? new Int32Array(e) : e
    }();
    return function(t) {
        for (var e, n, r = -1, i = 0, o = t.length; i < o;)
            r = (e = t.charCodeAt(i++)) < 128 ? r >>> 8 ^ a[255 & (r ^ e)] : e < 2048 ? (r = r >>> 8 ^ a[255 &
                (r ^ (192 | e >> 6 & 31))]) >>> 8 ^ a[255 & (r ^ (128 | 63 & e))] : 55296 <= e && e < 57344 ? (
                e = 64 + (1023 & e),
                n = 1023 & t.charCodeAt(i++),
                (r = (r = (r = r >>> 8 ^ a[255 & (r ^ (240 | e >> 8 & 7))]) >>> 8 ^ a[255 & (r ^ (128 | e >>
                    2 & 63))]) >>> 8 ^ a[255 & (r ^ (128 | n >> 6 & 15 | (3 & e) << 4))]) >>> 8 ^ a[255 & (r ^
                    (128 | 63 & n))]) : (r = (r = r >>> 8 ^ a[255 & (r ^ (224 | e >> 12 & 15))]) >>> 8 ^ a[
                255 & (r ^ (128 | e >> 6 & 63))]) >>> 8 ^ a[255 & (r ^ (128 | 63 & e))];
        return -1 ^ r
    }(t) >>> 0
}
""").call("generateStr", f"{link}@{r}"))
x_client_data = execjs.compile("""
function d(t, e) {
    var n = (65535 & t) + (65535 & e);
    return (t >> 16) + (e >> 16) + (n >> 16) << 16 | 65535 & n
}
function s(t, e, n, r, i, o) {
    return d(function a(t, e) {
        return t << e | t >>> 32 - e
    }(d(d(e, t), d(r, o)), i), n)
}
function h(t, e, n, r, i, o, a) {
    return s(e & n | ~e & r, t, e, i, o, a)
}
function f(t, e, n, r, i, o, a) {
    return s(e & r | n & ~r, t, e, i, o, a)
}
function g(t, e, n, r, i, o, a) {
    return s(e ^ n ^ r, t, e, i, o, a)
}
function p(t, e, n, r, i, o, a) {
    return s(n ^ (e | ~r), t, e, i, o, a)
}
function n(t) {
    return unescape(encodeURIComponent(t))
}
function c(t) {
    var e, n = "", r = 32 * t.length;
    for (e = 0; e < r; e += 8)
        n += String.fromCharCode(t[e >> 5] >>> e % 32 & 255);
    return n
}
function l(t) {
    var e, n = [];
    for (n[(t.length >> 2) - 1] = void 0,
    e = 0; e < n.length; e += 1)
        n[e] = 0;
    var r = 8 * t.length;
    for (e = 0; e < r; e += 8)
        n[e >> 5] |= (255 & t.charCodeAt(e / 8)) << e % 32;
    return n
}
function u(t, e) {
    t[e >> 5] |= 128 << e % 32,
    t[14 + (e + 64 >>> 9 << 4)] = e;
    var n, r, i, o, a, s = 1732584193, u = -271733879, c = -1732584194, l = 271733878;
    // console.log(h(s, u, c, l, t[0], 7, -680876936))
    for (n = 0; n < t.length; n += 16)
        u = p(u = p(u = p(u = p(u = g(u = g(u = g(u = g(u = f(u = f(u = f(u = f(u = h(u = h(u = h(u = h(i = u, c = h(o = c, l = h(a = l, s = h(r = s, u, c, l, t[n], 7, -680876936), u, c, t[n + 1], 12, -389564586), s, u, t[n + 2], 17, 606105819), l, s, t[n + 3], 22, -1044525330), c = h(c, l = h(l, s = h(s, u, c, l, t[n + 4], 7, -176418897), u, c, t[n + 5], 12, 1200080426), s, u, t[n + 6], 17, -1473231341), l, s, t[n + 7], 22, -45705983), c = h(c, l = h(l, s = h(s, u, c, l, t[n + 8], 7, 1770035416), u, c, t[n + 9], 12, -1958414417), s, u, t[n + 10], 17, -42063), l, s, t[n + 11], 22, -1990404162), c = h(c, l = h(l, s = h(s, u, c, l, t[n + 12], 7, 1804603682), u, c, t[n + 13], 12, -40341101), s, u, t[n + 14], 17, -1502002290), l, s, t[n + 15], 22, 1236535329), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 1], 5, -165796510), u, c, t[n + 6], 9, -1069501632), s, u, t[n + 11], 14, 643717713), l, s, t[n], 20, -373897302), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 5], 5, -701558691), u, c, t[n + 10], 9, 38016083), s, u, t[n + 15], 14, -660478335), l, s, t[n + 4], 20, -405537848), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 9], 5, 568446438), u, c, t[n + 14], 9, -1019803690), s, u, t[n + 3], 14, -187363961), l, s, t[n + 8], 20, 1163531501), c = f(c, l = f(l, s = f(s, u, c, l, t[n + 13], 5, -1444681467), u, c, t[n + 2], 9, -51403784), s, u, t[n + 7], 14, 1735328473), l, s, t[n + 12], 20, -1926607734), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 5], 4, -378558), u, c, t[n + 8], 11, -2022574463), s, u, t[n + 11], 16, 1839030562), l, s, t[n + 14], 23, -35309556), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 1], 4, -1530992060), u, c, t[n + 4], 11, 1272893353), s, u, t[n + 7], 16, -155497632), l, s, t[n + 10], 23, -1094730640), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 13], 4, 681279174), u, c, t[n], 11, -358537222), s, u, t[n + 3], 16, -722521979), l, s, t[n + 6], 23, 76029189), c = g(c, l = g(l, s = g(s, u, c, l, t[n + 9], 4, -640364487), u, c, t[n + 12], 11, -421815835), s, u, t[n + 15], 
16, 530742520), l, s, t[n + 2], 23, -995338651), c = p(c, l = p(l, s = p(s, u, c, l, t[n], 6, -198630844), u, c, t[n + 7], 10, 1126891415), s, u, t[n + 14], 15, -1416354905), l, s, t[n + 5], 21, -57434055), c = p(c, l = p(l, s = p(s, u, c, l, t[n + 12], 6, 1700485571), u, c, t[n + 3], 10, -1894986606), s, u, t[n + 10], 15, -1051523), l, s, t[n + 1], 21, -2054922799), c = p(c, l = p(l, s = p(s, u, c, l, t[n + 8], 6, 1873313359), u, c, t[n + 15], 10, -30611744), s, u, t[n + 6], 15, -1560198380), l, s, t[n + 13], 21, 1309151649), c = p(c, l = p(l, s = p(s, u, c, l, t[n + 4], 6, -145523070), u, c, t[n + 11], 10, -1120210379), s, u, t[n + 2], 15, 718787259), l, s, t[n + 9], 21, -343485551),
        s = d(s, r),
        u = d(u, i),
        c = d(c, o),
        l = d(l, a);
    return [s, u, c, l]
}
function a(t) {
    return function e(t) {
        return c(u(l(t), 8 * t.length))
    }(n(t))
}
function o(t) {
    var e, n, r = "";
    for (n = 0; n < t.length; n += 1)
        e = t.charCodeAt(n),
        r += "0123456789abcdef".charAt(e >>> 2 & 15) + "0123456789abcdef".charAt(15 & e);
    return r
}
function m(t, e) {
    return function s(t, e) {
        var n, r, i = l(t), o = [], a = [];
        for (o[15] = a[15] = void 0,
        16 < i.length && (i = u(i, 8 * t.length)),
        n = 0; n < 16; n += 1)
            o[n] = 909522486 ^ i[n],
            a[n] = 1549556828 ^ i[n];
        return r = u(o.concat(l(e)), 512 + 8 * e.length),
        c(u(a.concat(r), 640))
    }(n(t), n(e))
}
function md5(t, e, n) {
    return e ? n ? m(e, t) : function r(t, e) {
        return o(m(t, e))
    }(e, t) : n ? a(t) : function i(t) {
        return o(a(t))
    }(t)
}
function uu(t, e) {
    var n = e.charAt(t.charCodeAt(0) % e.length),
        r = e.charAt(t.charCodeAt(t.length - 1) % e.length);
    return md5(n + t + r)
}
""").call("uu", s, 'bilibili')
print(r)
print(s)
print(x_client_data)

Complete Code

import html
import ctypes
import random
import requests
from urllib import parse

# 32-bit unsigned right shift
gt3 = lambda _, __: (_ % (1 << 32)) >> __


def x_client_data(t, e='bilibili'):
    def d(t, e):
        n_ = (65535 & t) + (65535 & e)
        return (t >> 16) + (e >> 16) + (n_ >> 16) << 16 | 65535 & n_

    def s(t, e, n_, r, i, o):
        t = d(d(e, t), d(r, o))
        e = i
        return d(ctypes.c_int32(t << i).value | gt3(t, 32 - e), n_)

    def h(t, e, n_, r, i, o, a):
        return s(e & n_ | ~e & r, t, e, i, o, a)

    def f(t, e, n_, r, i, o, a):
        return s(e & r | n_ & ~r, t, e, i, o, a)

    def g(t, e, n_, r, i, o, a):
        return s(e ^ n_ ^ r, t, e, i, o, a)

    def p(t, e, n_, r, i, o, a):
        return s(n_ ^ (e | ~r), t, e, i, o, a)

    def n(t):
        # JS unescape(encodeURIComponent(t)) yields one character per UTF-8 byte
        return t.encode('utf-8').decode('latin-1')

    def c(t):
        n = ''
        for e in range(0, 32 * len(t), 8):
            n += chr(gt3(t[e >> 5], e % 32) & 255)
        return n

    def l(t):
        n_ = []
        for e in range(len(t) >> 2):
            n_.append(0)
        for e in range(0, 8 * len(t), 8):
            if (e >> 5) < len(n_):
                n_[e >> 5] |= (255 & ord(t[e // 8])) << e % 32
            else:
                n_.append(0 | (255 & ord(t[e // 8])) << e % 32)
        return n_

    def u(t, e):
        while 14 + (gt3((e + 64), 9) << 4) >= len(t) - 1:
            t.append(0)
        t[e >> 5] |= 128 << e % 32
        t[14 + (gt3((e + 64), 9) << 4)] = e
        s = 1732584193
        u = -271733879
        c = -1732584194
        l = 271733878
        # print(h(s, u, c, l, t[0], 7, -680876936))
        for n_ in range(0, len(t), 16):
            r = s
            s = h(r, u, c, l, t[n_], 7, -680876936)
            a = l
            l = h(a, s, u, c, t[n_ + 1], 12, -389564586)
            o = c
            c = h(o, l, s, u, t[n_ + 2], 17, 606105819)
            i = u
            u = h(i, c, l, s, t[n_ + 3], 22, -1044525330)
            s = h(s, u, c, l, t[n_ + 4], 7, -176418897)
            l = h(l, s, u, c, t[n_ + 5], 12, 1200080426)
            c = h(c, l, s, u, t[n_ + 6], 17, -1473231341)
            u = h(u, c, l, s, t[n_ + 7], 22, -45705983)
            s = h(s, u, c, l, t[n_ + 8], 7, 1770035416)
            l = h(l, s, u, c, t[n_ + 9], 12, -1958414417)
            c = h(c, l, s, u, t[n_ + 10], 17, -42063)
            u = h(u, c, l, s, t[n_ + 11], 22, -1990404162)
            s = h(s, u, c, l, t[n_ + 12], 7, 1804603682)
            l = h(l, s, u, c, t[n_ + 13], 12, -40341101)
            c = h(c, l, s, u, t[n_ + 14], 17, -1502002290)
            u = h(u, c, l, s, t[n_ + 15], 22, 1236535329)
            s = f(s, u, c, l, t[n_ + 1], 5, -165796510)
            l = f(l, s, u, c, t[n_ + 6], 9, -1069501632)
            c = f(c, l, s, u, t[n_ + 11], 14, 643717713)
            u = f(u, c, l, s, t[n_], 20, -373897302)
            s = f(s, u, c, l, t[n_ + 5], 5, -701558691)
            l = f(l, s, u, c, t[n_ + 10], 9, 38016083)
            c = f(c, l, s, u, t[n_ + 15], 14, -660478335)
            u = f(u, c, l, s, t[n_ + 4], 20, -405537848)
            s = f(s, u, c, l, t[n_ + 9], 5, 568446438)
            l = f(l, s, u, c, t[n_ + 14], 9, -1019803690)
            c = f(c, l, s, u, t[n_ + 3], 14, -187363961)
            u = f(u, c, l, s, t[n_ + 8], 20, 1163531501)
            s = f(s, u, c, l, t[n_ + 13], 5, -1444681467)
            l = f(l, s, u, c, t[n_ + 2], 9, -51403784)
            c = f(c, l, s, u, t[n_ + 7], 14, 1735328473)
            u = f(u, c, l, s, t[n_ + 12], 20, -1926607734)
            s = g(s, u, c, l, t[n_ + 5], 4, -378558)
            l = g(l, s, u, c, t[n_ + 8], 11, -2022574463)
            c = g(c, l, s, u, t[n_ + 11], 16, 1839030562)
            u = g(u, c, l, s, t[n_ + 14], 23, -35309556)
            s = g(s, u, c, l, t[n_ + 1], 4, -1530992060)
            l = g(l, s, u, c, t[n_ + 4], 11, 1272893353)
            c = g(c, l, s, u, t[n_ + 7], 16, -155497632)
            u = g(u, c, l, s, t[n_ + 10], 23, -1094730640)
            s = g(s, u, c, l, t[n_ + 13], 4, 681279174)
            l = g(l, s, u, c, t[n_], 11, -358537222)
            c = g(c, l, s, u, t[n_ + 3], 16, -722521979)
            u = g(u, c, l, s, t[n_ + 6], 23, 76029189)
            s = g(s, u, c, l, t[n_ + 9], 4, -640364487)
            l = g(l, s, u, c, t[n_ + 12], 11, -421815835)
            c = g(c, l, s, u, t[n_ + 15], 16, 530742520)
            u = g(u, c, l, s, t[n_ + 2], 23, -995338651)
            s = p(s, u, c, l, t[n_], 6, -198630844)
            l = p(l, s, u, c, t[n_ + 7], 10, 1126891415)
            c = p(c, l, s, u, t[n_ + 14], 15, -1416354905)
            u = p(u, c, l, s, t[n_ + 5], 21, -57434055)
            s = p(s, u, c, l, t[n_ + 12], 6, 1700485571)
            l = p(l, s, u, c, t[n_ + 3], 10, -1894986606)
            c = p(c, l, s, u, t[n_ + 10], 15, -1051523)
            u = p(u, c, l, s, t[n_ + 1], 21, -2054922799)
            s = p(s, u, c, l, t[n_ + 8], 6, 1873313359)
            l = p(l, s, u, c, t[n_ + 15], 10, -30611744)
            c = p(c, l, s, u, t[n_ + 6], 15, -1560198380)
            u = p(u, c, l, s, t[n_ + 13], 21, 1309151649)
            s = p(s, u, c, l, t[n_ + 4], 6, -145523070)
            l = p(l, s, u, c, t[n_ + 11], 10, -1120210379)
            c = p(c, l, s, u, t[n_ + 2], 15, 718787259)
            u = p(u, c, l, s, t[n_ + 9], 21, -343485551)
            s = d(s, r)
            u = d(u, i)
            c = d(c, o)
            l = d(l, a)
        return [s, u, c, l]

    def a(t):
        return c(u(l(n(t)), 8 * len(t)))

    def o(t):
        r = ''
        for n_ in range(len(t)):
            e = ord(t[n_])
            r += "0123456789abcdef"[gt3(e, 2) & 15] + "0123456789abcdef"[15 & e]
        return r

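    def m(t, e):
        # HMAC variant with key t and message e (not used for X-Client-Data)
        t = n(t)
        e = n(e)
        i = l(t)
        o = []
        a = []
        if 16 < len(i):
            i = u(i, 8 * len(t))
        # JavaScript treats missing words as 0 when XORing, so pad to 16 words
        i += [0] * (16 - len(i))
        for n_ in range(16):
            o.append(909522486 ^ i[n_])
            a.append(1549556828 ^ i[n_])
        # As in the JavaScript: the message is converted with l() before hashing
        r = u(o + l(e), 512 + 8 * len(e))
        return c(u(a + r, 640))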

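    def md5(t, e=None, n_=None):
        # Mirrors the JavaScript ternary: a key e selects the HMAC variant,
        # a truthy n_ selects raw (non-hex) output
        if e:
            if n_:
                return m(e, t)
            else:
                return o(m(e, t))
        else:
            if n_:
                return a(t)
            else:
                return o(a(t))

    # Same as uu() in the JavaScript: wrap t with two characters of e, then hash
    n_ = e[ord(t[0]) % len(e)]
    r = e[ord(t[-1]) % len(e)]
    return md5(n_ + t + r)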


def generate_str(t):
    a = []
    for n in range(256):
        for _ in range(8):
            if 1 & n:
                n = -306674912 ^ gt3(n, 1)
            else:
                n = gt3(n, 1)
        a.append(n)
    r = -1
    i = 0
    while i < len(t):
        e = ord(t[i])
        i += 1
        if e < 128:
            r = gt3(r, 8) ^ a[255 & (r ^ e)]
        else:
            if e < 2048:
                r = gt3(gt3(r, 8) ^ a[255 & (r ^ (192 | e >> 6 & 31))], 8) ^ a[255 & (r ^ (128 | 63 & e))]
            else:
                if 55296 <= e < 57344:
                    e = 64 + (1023 & e)
                    n = 1023 & ord(t[i])
                    i += 1
                    r = gt3(gt3(gt3(gt3(r, 8) ^ a[255 & (r ^ (240 | e >> 8 & 7))], 8) ^ a[255 & (r ^ (128 | e >> 2 & 63))], 8) ^ a[255 & (r ^ (128 | n >> 6 & 15 | (3 & e) << 4))], 8) ^ a[255 & (r ^ (128 | 63 & n))]
                else:
                    r = gt3(gt3(gt3(r, 8) ^ a[255 & (r ^ (224 | e >> 12 & 15))], 8) ^ a[255 & (r ^ (128 | e >> 6 & 63))], 8) ^ a[255 & (r ^ (128 | 63 & e))]
    return str(gt3(-1 ^ r, 0))


headers = {
    "Origin": "https://bilibili.iiilab.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36"
}

# A session keeps the cookies returned by the first two requests
session = requests.session()
session.headers = headers
# Both of these responses carry Set-Cookie headers that the API call needs
session.get('https://bilibili.iiilab.com/')
session.post('https://service0.iiilab.com/sponsor/getByPage', data=dict(page='bilibili'))
session.headers["Referer"] = "https://bilibili.iiilab.com/"
session.headers["Content-Type"] = "application/x-www-form-urlencoded; charset=UTF-8"
# cookies = requests.utils.dict_from_cookiejar(session.cookies)
# cookies.update(dict(zzz0821='1'))
# session.cookies = requests.utils.cookiejar_from_dict(cookies)
# Add the zzz0821 cookie by hand on top of the cookies set by the server
session.cookies = requests.sessions.merge_cookies(session.cookies, dict(zzz0821='1'))
# session.headers.update(dict(Cookie=';'.join([(lambda _: f'{_}={cookies[_]}')(_) for _ in cookies])))
link = 'https://www.bilibili.com/video/BV1Xt41157R4/?spm_id_from=autoNext'
# r: the digits after "0." of a random float; s: checksum of "link@r"
ran = str(random.random())[2:]
s = generate_str(f"{link}@{ran}")
# X-Client-Data is derived from s, so it has to be recomputed for every request
session.headers["X-Client-Data"] = x_client_data(s)
res = session.post('https://service0.iiilab.com/video/web/bilibili', data=dict(link=link, r=ran, s=s))
if res.ok:
    print(res.json())

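That is the complete Python script, and I hope you did not scroll straight down here just to copy it. Most people probably see no point in scraping this site, and whatever you think is fine: for me it was just an impromptu exercise. (Reading the JavaScript did make my head spin and my eyes nearly pop out, though. If it had not been an impromptu exercise, I would still have gone with a JavaScript execution library...)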
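
One observation that may save some porting effort: the table constant `-306674912` (0xEDB88320 when read as an unsigned value), the initial value `-1` and the final `-1 ^ r` in `generate_str` are those of the standard CRC-32. Assuming the input is plain ASCII, as the link and the random digits here are, the sketch below should give the same string using only the standard library. `crc32_str` is a made-up name, and this has not been checked against the live site.

```python
import zlib


def crc32_str(text):
    """CRC-32 of the UTF-8 bytes of text, as an unsigned decimal string."""
    return str(zlib.crc32(text.encode('utf-8')))


print(crc32_str('123456789'))  # 3421780262, the standard CRC-32 check value
```

If that assumption holds, `s = crc32_str(f"{link}@{ran}")` could stand in for the call to `generate_str`.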

Closing remarks

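You may feel that this post starts out very detailed and then skims through the later parts. That is probably because I got hungry and was on my way to eat~
If you like my articles, feel free to like, follow, and bookmark them. Don't miss out as you pass by, since I have no idea when the idea for the next post will pop up ( ﹡ˆoˆ﹡ )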
