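June 2021: Scraping Zhihu Search Results for a Given Question & Analysis of Cracking the x-zse-96 2.0 Encryption (an Approach to Getting Past Anti-Scraping Measures)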

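I. Preface

This article is for research and study only.

Zhihu has upgraded the encryption of its x-zse parameter to x-zse-96, version 2.0.
Anyone reading this post probably already knows that this parameter is dynamic and unique to each request, and that without it the API returns no data.
The articles I found online only cover x-zse-86 version 2.0, and that method no longer works. Since one of my earlier articles referred to that method, I am writing this post for everyone to study.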

II. Cracking Approach

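Open a browser, go to Zhihu, search for any question, then open the developer tools and look at the page's JS files.

The steps are:

  • Open the developer tools and click the Sources tab;
  • In the left pane of Sources, select page --> top --> static.zhihu.com --> heifetz;
  • Finally, select the file whose name starts with main.app and click Pretty-print.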

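After these steps you can see the source JS file and start looking for the information you need.
Press Ctrl + F and search for x-zse-96: its value is "2.0_" + j. Looking for j next, a few lines up you find j = _.signature.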

So press Ctrl + F again and search for signature.

It is easy to see that signature is a function whose job is to transform the value d, that is, to encrypt d. Set a breakpoint and debug.

Copy out the value of d and analyze it:

101_3_2.0+/api/v4/search_v3?t=general&q=%E6%80%8E%E4%B9%88%E8%BF%BD%E5%A5%B3%E4%BA%BA&correction=1&offset=0&limit=20&lc_idx=0&show_all_topics=0+"AdBdAfpQMBOPTr9MynJdVGoxDuWyXhbMZ_A=|1622439967"

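It is easy to see that the plaintext is the x-zse-93 header + the request URL (path and query string, without the host) + the d_c0 cookie, joined with "+".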
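The composition of the plaintext can be sketched in Python as follows. The helper name `build_plaintext` and its parameter names are illustrative and do not come from Zhihu's code. The sample values are taken from the captured d above, and the quotes around the d_c0 value are assumed to be plain ASCII double quotes.

```python
from urllib import parse


def build_plaintext(zse93: str, path: str, d_c0: str) -> str:
    """Join the three observed parts with '+' (hypothetical helper)."""
    return "+".join([zse93, path, d_c0])


# Rebuild the request path from the search keyword used in the capture.
query = parse.quote("怎么追女人")
path = ("/api/v4/search_v3?t=general&q=" + query +
        "&correction=1&offset=0&limit=20&lc_idx=0&show_all_topics=0")
# Raw cookie value, kept exactly as captured (quotes included).
d_c0 = '"AdBdAfpQMBOPTr9MynJdVGoxDuWyXhbMZ_A=|1622439967"'

plaintext = build_plaintext("101_3_2.0", path, d_c0)
print(plaintext)
```

The printed string should have the same shape as the value of d copied from the debugger, which is a quick way to check that nothing is missing before hashing it.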
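Zhihu seems to have always relied on MD5 for this (strictly speaking a hash, not encryption), so just test that first.
In the DevTools console, enter: r.default(d)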

Then hash the same string with an online MD5 tool and compare the result with the console output.
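That settles it: Zhihu uses the signature function together with MD5 to encrypt the user's request (the plaintext is hashed with MD5, the digest is then encrypted) and returns the value of x-zse-96.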
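The order of operations can be sketched like this. The sketch only illustrates the flow: `make_x_zse_96` and `fake_encrypt` are hypothetical names, and `fake_encrypt` is a stand-in for the function copied from Zhihu's JS, not the real algorithm.

```python
import hashlib
from typing import Callable


def make_x_zse_96(plaintext: str, encrypt: Callable[[str], str]) -> str:
    """Return '2.0_' + encrypt(md5_hex(plaintext))."""
    digest = hashlib.md5(plaintext.encode("utf-8")).hexdigest()
    return "2.0_" + encrypt(digest)


def fake_encrypt(digest: str) -> str:
    # Placeholder only: the real step is the encryption function from the JS file.
    return digest[::-1]


value = make_x_zse_96("example plaintext", fake_encrypt)
print(value)
```

Passing the encryption step in as a callable keeps the hashing part easy to verify on its own, for example against the output of r.default(d) in the console.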

We copy out the signature function and adapt it slightly. The code is long, so I put it at the end of the article; let's cover the Python side first.

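Running this JS code from Python requires jsdom: the script below imports execjs to run the JS under Node.js, and the JS file itself requires jsdom.
Roughly: ① download and install Node.js from the official site; ② run npm install jsdom; ③ check that the node_modules folder contains a jsdom folder. If it does, the installation succeeded; copy that path for use in the code.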
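Step ③ can also be checked from Python. This is a minimal sketch, assuming jsdom was installed locally into a node_modules folder under the current directory; `check_environment` is a hypothetical helper name.

```python
import os
import shutil


def check_environment(node_modules_dir: str = "node_modules") -> dict:
    """Report whether Node.js is on PATH and whether jsdom is installed locally."""
    return {
        # shutil.which returns None when the executable cannot be found.
        "node": shutil.which("node") is not None,
        "jsdom": os.path.isdir(os.path.join(node_modules_dir, "jsdom")),
    }


print(check_environment())
```

If either entry is False, the JS file will most likely fail to load before any signature is computed.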
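Below is the code I wrote for scraping the search results of any question. The data parsing is not very polished, so extend it as you need.
The cookie here is obtained dynamically and saved to a local file; see my earlier article, which covers this in detail. I leave it out here to keep this post short.
Direct link: https://blog.csdn.net/qq_26394845/article/details/118028822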
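Since the signature needs the raw d_c0 value, one way to pull it out of a saved cookie string is sketched below. `extract_cookie_value` is a hypothetical helper, and the sample cookie is made up apart from the d_c0 value captured earlier. The value is returned untouched, surrounding quotes included, because the captured plaintext contains them.

```python
def extract_cookie_value(cookie: str, name: str) -> str:
    """Return the raw value of one cookie from a 'k=v; k2=v2' header string."""
    for part in cookie.split(";"):
        # Split at the first '=' only, since values may contain '=' themselves.
        key, sep, value = part.strip().partition("=")
        if sep and key == name:
            return value
    raise KeyError(name)


# Made-up cookie string; only the d_c0 value comes from the capture above.
sample = '_xsrf=abc123; d_c0="AdBdAfpQMBOPTr9MynJdVGoxDuWyXhbMZ_A=|1622439967"; other=1'
print(extract_cookie_value(sample, "d_c0"))
```

Unlike slicing with index(), this also works when d_c0 is the last cookie in the string and has no trailing semicolon.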

import json
from urllib import parse
import execjs
import hashlib
import requests

class zhihu_v2():
    def __init__(self):
        self.question = input('Enter the question to search for, then press Enter!\n')
        # URL-encode the keyword as the q parameter, e.g. q=%E6%80%8E...
        self.question = parse.urlencode({'q': self.question})
        print(self.question)
        # Path and query string only: this is the part that gets signed.
        self.parse_url = "/api/v4/search_v3?t=general&" + self.question + "&correction=1&offset=0&limit=20&lc_idx=0&show_all_topics=0"
        # Full URL that is actually requested.
        self.use_url = 'https://www.zhihu.com' + self.parse_url
        # Cookie string saved beforehand (see the article linked above).
        with open('zhihucookie.txt', 'r', encoding='utf-8') as c:
            self.cookie = c.read().strip()

    def get_headers(self):
        # Extract the raw d_c0 value from the cookie string (everything up to the next ';').
        cookie_mes = self.cookie.split('d_c0=', 1)[1].split(';', 1)[0]
        # Plaintext: x-zse-93 + path with query string + d_c0, joined by '+'.
        plain = "+".join(["101_3_2.0", self.parse_url, cookie_mes])
        fmd5 = hashlib.md5(plain.encode()).hexdigest()
        # Run function b from g_encrypt.js on the MD5 digest.
        with open('g_encrypt.js', 'r', encoding='utf-8') as js_file:
            ctx1 = execjs.compile(js_file.read(), cwd='node_modules')
        encrypt_str = "2.0_%s" % ctx1.call('b', fmd5)
        print(encrypt_str)
        headers = {
            "x-api-version": "3.0.91",
            'x-app-za': 'OS=Web',
            "x-zse-93": "101_3_2.0",
            "x-zse-96": encrypt_str,
            "User-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
            "Cookie": self.cookie,
        }
        self.zh_ask(headers)

    @staticmethod
    def clean(text):
        # The two tag literals were lost when this post was extracted; the search
        # highlight tags <em> and </em> are assumed here.
        return str(text).replace('<em>', '').replace('</em>', '')

    @staticmethod
    def web_url(url):
        # Convert an API URL into the corresponding web page URL.
        return str(url).replace('api', 'www').replace('questions', 'question').replace('answers', 'answer')

    @staticmethod
    def counts(obj):
        # The counts sit on the object itself or on its nested answer or question.
        for holder in (obj, obj.get('answer') or {}, obj.get('question') or {}):
            if 'voteup_count' in holder and 'comment_count' in holder:
                return holder['voteup_count'], holder['comment_count']
        # Last resort: follower count only; no comment count is available.
        return obj['follower_count'], 'null'

    def parse_item(self, item):
        # Return (title, voteup_count, comment_count, url, excerpt) for one search result.
        obj = item['object']
        try:
            if len(str(obj['id'])) < 15:
                title = obj['title'] if 'title' in obj else item['highlight']['title']
                excerpt = self.clean(obj.get('excerpt', 'No description for this question!'))
                if 'url' in obj:
                    url = obj['url']
                elif 'url' in (obj.get('answer') or {}):
                    url = obj['answer']['url']
                else:
                    url = obj['question']['url']
                voteup_count, comment_count = self.counts(obj)
                return self.clean(title), voteup_count, comment_count, self.web_url(url), excerpt
            # Long ids mark what the original calls "协会问题" (association items),
            # which carry no link, description, or counts.
            title = obj['content_list'][0]['title']
            return (self.clean(title), 'null', 'null',
                    'Association item: no link available!',
                    'Association item: no description available!')
        except KeyError:
            try:
                answer = obj['answer_obj']
                title = obj['body']['title']
                fields = (answer['voteup_count'], answer['comment_count'], answer['url'], answer['excerpt'])
            except KeyError:
                answer = obj['answers'][0]
                title = item['highlight']['title']
                # The original left the counts unset in this branch; report them as unknown.
                fields = ('null', 'null', answer['url'], answer['excerpt'])
            voteup_count, comment_count, url, excerpt = fields
            return self.clean(title), voteup_count, comment_count, self.web_url(url), self.clean(excerpt)

    def zh_ask(self, headers):
        resp = requests.get(url=self.use_url, headers=headers)
        json_mes = json.loads(resp.text)
        with open('json_mes1.txt', 'a', encoding='utf-8') as f:
            for item in json_mes.get('data', []):
                try:
                    row = self.parse_item(item)
                except (KeyError, IndexError, TypeError):
                    # Skip results whose layout matches none of the known shapes.
                    continue
                line = '{}\t {}\t {}\t {}\n{}\n\n'.format(*row)
                print(line)
                f.write(line)


def start():
    # Build the crawler (prompts for the question) and run one search.
    op = zhihu_v2()
    op.get_headers()


if __name__ == '__main__':
    start()

    

Finally

Here is the g_encrypt.js code:

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`
); window = dom.window; document = window.document; XMLHttpRequest = window.XMLHttpRequest; var exports = {} function t(e) { return (t = "function" == typeof Symbol && "symbol" == typeof Symbol.A ? function(e) { return typeof e; } : function(e) { return e && "function" == typeof Symbol && e.constructor === Symbol && e !== Symbol.prototype ? "symbol" : typeof e } )(e) } Object.defineProperty(exports, "__esModule", { value: !0 }); var A = "2.0" , __g = {}; function s() {} function i(e) { this.t = (2048 & e) >> 11, this.s = (1536 & e) >> 9, this.i = 511 & e, this.h = 511 & e } function h(e) { this.s = (3072 & e) >> 10, this.h = 1023 & e } function a(e) { this.a = (3072 & e) >> 10, this.c = (768 & e) >> 8, this.n = (192 & e) >> 6, this.t = 63 & e } function c(e) { this.s = e >> 10 & 3, this.i = 1023 & e } function n() {} function e(e) { this.a = (3072 & e) >> 10, this.c = (768 & e) >> 8, this.n = (192 & e) >> 6, this.t = 63 & e } function o(e) { this.h = (4095 & e) >> 2, this.t = 3 & e } function r(e) { this.s = e >> 10 & 3, this.i = e >> 2 & 255, this.t = 3 & e } s.prototype.e = function(e) { e.o = !1 } , i.prototype.e = function(e) { switch (this.t) { case 0: e.r[this.s] = this.i; break; case 1: e.r[this.s] = e.k[this.h] } } , h.prototype.e = function(e) { e.k[this.h] = e.r[this.s] } , a.prototype.e = function(e) { switch (this.t) { case 0: e.r[this.a] = e.r[this.c] + e.r[this.n]; break; case 1: e.r[this.a] = e.r[this.c] - e.r[this.n]; break; case 2: e.r[this.a] = e.r[this.c] * e.r[this.n]; break; case 3: e.r[this.a] = e.r[this.c] / e.r[this.n]; break; case 4: e.r[this.a] = e.r[this.c] % e.r[this.n]; break; case 5: e.r[this.a] = e.r[this.c] == e.r[this.n]; break; case 6: e.r[this.a] = e.r[this.c] >= e.r[this.n]; break; case 7: e.r[this.a] = e.r[this.c] || e.r[this.n]; break; case 8: e.r[this.a] = e.r[this.c] && e.r[this.n]; break; case 9: e.r[this.a] = e.r[this.c] !== e.r[this.n]; break; case 10: e.r[this.a] = t(e.r[this.c]); break; case 11: e.r[this.a] = 
e.r[this.c]in e.r[this.n]; break; case 12: e.r[this.a] = e.r[this.c] > e.r[this.n]; break; case 13: e.r[this.a] = -e.r[this.c]; break; case 14: e.r[this.a] = e.r[this.c] < e.r[this.n]; break; case 15: e.r[this.a] = e.r[this.c] & e.r[this.n]; break; case 16: e.r[this.a] = e.r[this.c] ^ e.r[this.n]; break; case 17: e.r[this.a] = e.r[this.c] << e.r[this.n]; break; case 18: e.r[this.a] = e.r[this.c] >>> e.r[this.n]; break; case 19: e.r[this.a] = e.r[this.c] | e.r[this.n]; break; case 20: e.r[this.a] = !e.r[this.c] } } , c.prototype.e = function(e) { e.Q.push(e.C), e.B.push(e.k), e.C = e.r[this.s], e.k = []; for (var t = 0; t < this.i; t++) e.k.unshift(e.f.pop()); e.g.push(e.f), e.f = [] } , n.prototype.e = function(e) { e.C = e.Q.pop(), e.k = e.B.pop(), e.f = e.g.pop() } , e.prototype.e = function(e) { switch (this.t) { case 0: e.u = e.r[this.a] >= e.r[this.c]; break; case 1: e.u = e.r[this.a] <= e.r[this.c]; break; case 2: e.u = e.r[this.a] > e.r[this.c]; break; case 3: e.u = e.r[this.a] < e.r[this.c]; break; case 4: e.u = e.r[this.a] == e.r[this.c]; break; case 5: e.u = e.r[this.a] != e.r[this.c]; break; case 6: e.u = e.r[this.a]; break; case 7: e.u = !e.r[this.a] } } , o.prototype.e = function(e) { switch (this.t) { case 0: e.C = this.h; break; case 1: e.u && (e.C = this.h); break; case 2: e.u || (e.C = this.h); break; case 3: e.C = this.h, e.w = null } e.u = !1 } , r.prototype.e = function(e) { switch (this.t) { case 0: for (var t = [], n = 0; n < this.i; n++) t.unshift(e.f.pop()); e.r[3] = e.r[this.s](t[0], t[1]); break; case 1: for (var r = e.f.pop(), o = [], i = 0; i < this.i; i++) o.unshift(e.f.pop()); e.r[3] = e.r[this.s][r](o[0], o[1]); break; case 2: for (var a = [], c = 0; c < this.i; c++) a.unshift(e.f.pop()); e.r[3] = new e.r[this.s](a[0],a[1]) } } ; var k = function(e) { for (var t = 66, n = [], r = 0; r < e.length; r++) { var o = 24 ^ e.charCodeAt(r) ^ t; n.push(String.fromCharCode(o)), t = o } return n.join("") }; function Q(e) { this.t = (4095 & e) >> 
10, this.s = (1023 & e) >> 8, this.i = 1023 & e, this.h = 63 & e } function C(e) { this.t = (4095 & e) >> 10, this.a = (1023 & e) >> 8, this.c = (255 & e) >> 6 } function B(e) { this.s = (3072 & e) >> 10, this.h = 1023 & e } function f(e) { this.h = 4095 & e } function g(e) { this.s = (3072 & e) >> 10 } function u(e) { this.h = 4095 & e } function w(e) { this.t = (3840 & e) >> 8, this.s = (192 & e) >> 6, this.i = 63 & e } function G() { this.r = [0, 0, 0, 0], this.C = 0, this.Q = [], this.k = [], this.B = [], this.f = [], this.g = [], this.u = !1, this.G = [], this.b = [], this.o = !1, this.w = null, this.U = null, this.F = [], this.R = 0, this.J = { 0: s, 1: i, 2: h, 3: a, 4: c, 5: n, 6: e, 7: o, 8: r, 9: Q, 10: C, 11: B, 12: f, 13: g, 14: u, 15: w } } Q.prototype.e = function(e) { switch (this.t) { case 0: e.f.push(e.r[this.s]); break; case 1: e.f.push(this.i); break; case 2: e.f.push(e.k[this.h]); break; case 3: e.f.push(k(e.b[this.h])) } } , C.prototype.e = function(A) { switch (this.t) { case 0: var t = A.f.pop(); A.r[this.a] = A.r[this.c][t]; break; case 1: var s = A.f.pop() , i = A.f.pop(); A.r[this.c][s] = i; break; case 2: var h = A.f.pop(); A.r[this.a] = eval(h) } } , B.prototype.e = function(e) { e.r[this.s] = k(e.b[this.h]) } , f.prototype.e = function(e) { e.w = this.h } , g.prototype.e = function(e) { throw e.r[this.s] } , u.prototype.e = function(e) { var t = this , n = [0]; e.k.forEach(function(e) { n.push(e) }); var r = function(r) { var o = new G; return o.k = n, o.k[0] = r, o.v(e.G, t.h, e.b, e.F), o.r[3] }; r.toString = function() { return "() { [native code] }" } , e.r[3] = r } , w.prototype.e = function(e) { switch (this.t) { case 0: for (var t = {}, n = 0; n < this.i; n++) { var r = e.f.pop(); t[e.f.pop()] = r } e.r[this.s] = t; break; case 1: for (var o = [], i = 0; i < this.i; i++) o.unshift(e.f.pop()); e.r[this.s] = o } } , G.prototype.D = function(e) { console.log(window.atob(e)); for (var t = window.atob(e), n = t.charCodeAt(0) << 8 | 
t.charCodeAt(1), r = [], o = 2; o < n + 2; o += 2) r.push(t.charCodeAt(o) << 8 | t.charCodeAt(o + 1)); this.G = r; for (var i = [], a = n + 2; a < t.length; ) { var c = t.charCodeAt(a) << 8 | t.charCodeAt(a + 1) , s = t.slice(a + 2, a + 2 + c); i.push(s), a += c + 2 } this.b = i } , G.prototype.v = function(e, t, n) { for (t = t || 0, n = n || [], this.C = t, "string" == typeof e ? this.D(e) : (this.G = e, this.b = n), this.o = !0, this.R = Date.now(); this.o; ) { var r = this.G[this.C++]; if ("number" != typeof r) break; var o = Date.now(); if (500 < o - this.R) return; this.R = o; try { this.e(r) } catch (e) { this.U = e, this.w && (this.C = this.w) } } } , G.prototype.e = function(e) { var t = (61440 & e) >> 12; new this.J[t](e).e(this) } , "undefined" != typeof window && (new G).v("AxjgB5MAnACoAJwBpAAAABAAIAKcAqgAMAq0AzRJZAZwUpwCqACQACACGAKcBKAAIAOcBagAIAQYAjAUGgKcBqFAuAc5hTSHZAZwqrAIGgA0QJEAJAAYAzAUGgOcCaFANRQ0R2QGcOKwChoANECRACQAsAuQABgDnAmgAJwMgAGcDYwFEAAzBmAGcSqwDhoANECRACQAGAKcD6AAGgKcEKFANEcYApwRoAAxB2AGcXKwEhoANECRACQAGAKcE6AAGgKcFKFANEdkBnGqsBUaADRAkQAkABgCnBagAGAGcdKwFxoANECRACQAGAKcGKAAYAZx+rAZGgA0QJEAJAAYA5waoABgBnIisBsaADRAkQAkABgCnBygABoCnB2hQDRHZAZyWrAeGgA0QJEAJAAYBJwfoAAwFGAGcoawIBoANECRACQAGAOQALAJkAAYBJwfgAlsBnK+sCEaADRAkQAkABgDkACwGpAAGAScH4AJbAZy9rAiGgA0QJEAJACwI5AAGAScH6AAkACcJKgAnCWgAJwmoACcJ4AFnA2MBRAAMw5gBnNasCgaADRAkQAkABgBEio0R5EAJAGwKSAFGACcKqAAEgM0RCQGGAYSATRFZAZzshgAtCs0QCQAGAYSAjRFZAZz1hgAtCw0QCQAEAAgB7AtIAgYAJwqoAASATRBJAkYCRIANEZkBnYqEAgaBxQBOYAoBxQEOYQ0giQKGAmQABgAnC6ABRgBGgo0UhD/MQ8zECALEAgaBxQBOYAoBxQEOYQ0gpEAJAoYARoKNFIQ/zEPkAAgChgLGgkUATmBkgAaAJwuhAUaCjdQFAg5kTSTJAsQCBoHFAE5gCgHFAQ5hDSCkQAkChgBGgo0UhD/MQ+QACAKGAsaCRQCOYGSABoAnC6EBRoKN1AUEDmRNJMkCxgFGgsUPzmPkgAaCJwvhAU0wCQFGAUaCxQGOZISPzZPkQAaCJwvhAU0wCQFGAUaCxQMOZISPzZPkQAaCJwvhAU0wCQFGAUaCxQSOZISPzZPkQAaCJwvhAU0wCQFGAkSAzRBJAlz/B4FUAAAAwUYIAAIBSITFQkTERwABi0GHxITAAAJLwMSGRsXHxMZAAk0Fw8HFh4NAwUABhU1EBceDwAENBcUEAAGNBkTGRcBAAFKAAkvHg4PKz4aEwIAAUsACDIVHB0QEQ4YAAsuAzs7AAoPKToKDgA
HMx8SGQUvMQABSAALORoVGCQgERcCAxoACAU3ABEXAgMaAAsFGDcAERcCAxoUCgABSQAGOA8LGBsPAAYYLwsYGw8AAU4ABD8QHAUAAU8ABSkbCQ4BAAFMAAktCh8eDgMHCw8AAU0ADT4TGjQsGQMaFA0FHhkAFz4TGjQsGQMaFA0FHhk1NBkCHgUbGBEPAAFCABg9GgkjIAEmOgUHDQ8eFSU5DggJAwEcAwUAAUMAAUAAAUEADQEtFw0FBwtdWxQTGSAACBwrAxUPBR4ZAAkqGgUDAwMVEQ0ACC4DJD8eAx8RAAQ5GhUYAAFGAAAABjYRExELBAACWhgAAVoAQAg/PTw0NxcQPCQ5C3JZEBs9fkcnDRcUAXZia0Q4EhQgXHojMBY3MWVCNT0uDhMXcGQ7AUFPHigkQUwQFkhaAkEACjkTEQspNBMZPC0ABjkTEQsrLQ=="); var b = function(e) { return __g._encrypt(encodeURIComponent(e)) }; exports.ENCRYPT_VERSION = A, exports.default = b function b(e) { console.log(e); console.log(encodeURIComponent(e)); return __g._encrypt(encodeURIComponent(e)) };
