Scraping Toutiao article and video data with Python 3: solving the as, cp, and _signature encryption (2020-6-29 edition)

Preface

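I won't walk through every step here, because they were all covered in detail when scraping the old version of Toutiao. This post only describes how the key parts are implemented. For the full procedure, please first read my earlier Toutiao article, which explains in detail how to locate the encrypted JS code and the API endpoints.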

Scraping Toutiao article and video data with Python 3: solving the as, cp, and _signature encryption

Cracking the as and cp parameters

Link to a Toutiao user's page: https://www.toutiao.com/c/user/token/MS4wLjABAAAAaezOXkHVr0_i2JvWXprb4zLGpRInnKStptFm5WsXHKU/#mid=50044041847

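Search directly for the keyword getHoney (prefer longer, more specific search terms). It turns out that as and cp are still generated by the getHoney function, so without further ado we take it and reverse it.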
JS code:

e.getHoney = function() {
    var i = Math.floor((new Date).getTime() / 1e3)
      , e = i.toString(16).toUpperCase()
      , t = md5(i).toString().toUpperCase();
    if (8 != e.length)
        return {
            as: "479BB4B7254C150",
            cp: "7E0AC8874BB0985"
        };
    for (var o = t.slice(0, 5), n = t.slice(-5), a = "", s = 0; 5 > s; s++)
        a += o[s] + e[s];
    for (var r = "", c = 0; 5 > c; c++)
        r += e[c + 3] + n[c];
    return {
        as: "A1" + a + e.slice(-3),
        cp: e.slice(0, 3) + r + "E1"
    }
}

Python code:

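import time
import hashlib

def getHoney():
    # Port of e.getHoney from the JS above.
    i = int(time.time())              # Unix timestamp in seconds
    e = '%X' % i                      # uppercase hex, like i.toString(16).toUpperCase()
    # Assumes the page's md5(i) hashes the decimal string form of the timestamp.
    t = hashlib.md5(str(i).encode('utf-8')).hexdigest().upper()
    if 8 != len(e):                   # the JS checks the hex string, not the digest
        return {
            'as': '479BB4B7254C150',
            'cp': '7E0AC8874BB0985'
        }
    o = t[0:5]                        # first five digest characters
    n = t[-5:]                        # last five digest characters (t.slice(-5))
    a = ''
    r = ''
    for s in range(5):                # separate loop variable so i is not overwritten
        a += o[s] + e[s]
        r += e[s+3] + n[s]
    return {
        'as': 'A1' + a + e[-3:],      # e.slice(-3) is the last three hex digits
        'cp': e[0:3] + r + 'E1'
    }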
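
To make the layout of the two values easier to see, the interleaving can be undone. The sketch below is not part of the original scraper: recover_hex_timestamp is a hypothetical helper that pulls the hex timestamp back out of an as/cp pair, assuming the non-fallback layout produced by getHoney. The sample values are the ones from the request URL quoted in the _signature section.

```python
def recover_hex_timestamp(as_value, cp_value):
    """Undo the getHoney interleaving and return the 8-digit hex timestamp.

    Hypothetical helper for illustration only. Assumes the non-fallback layout:
      as = "A1" + (o[s] + e[s] for s in 0..4) + e[-3:]
      cp = e[:3] + (e[s+3] + n[s] for s in 0..4) + "E1"
    """
    if not (as_value.startswith("A1") and cp_value.endswith("E1")):
        raise ValueError("not in the A1.../...E1 layout")
    if len(as_value) != 15 or len(cp_value) != 15:
        raise ValueError("as and cp must both be 15 characters long")
    # cp carries e[0:3] verbatim, then e[3:8] at the even offsets of its middle part.
    from_cp = cp_value[:3] + cp_value[3:13][0::2]
    # as carries e[0:5] at the odd offsets of its middle part, then e[5:8] verbatim.
    from_as = as_value[2:12][1::2] + as_value[12:]
    if from_cp != from_as:
        raise ValueError("as and cp encode different timestamps")
    return from_cp


hex_ts = recover_hex_timestamp("A1251EFFB42ADCE", "5EF4AA0D2C4E8E1")
print(hex_ts, int(hex_ts, 16))
```

Both values embed the same eight hex digits, so a pair that fails this check was not produced by the same call. The hard-coded fallback pair does not follow the layout and is rejected.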

Cracking the _signature parameter

Search for the keyword _signature.
This shows that _signature is generated by utils.tacSign(o.url, o.data), so first locate the tacSign function.
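Debugging shows that the return value i comes from window.byted_acrawler.sign(o), where o = {url: e}, e = o.url, and o.url is the request URL without the _signature parameter (for example, https://www.toutiao.com/c/user/article/?page_type=1&user_id=4492956276&max_behot_time=0&count=20&as=A1251EFFB42ADCE&cp=5EF4AA0D2C4E8E1 ). So all we need is to find window.byted_acrawler.sign(). Print it in the console, then click the result to jump to where the code lives.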
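
The unsigned URL that gets passed to sign() can be assembled from its query parameters. The following is a minimal sketch using the parameter names visible in the example URL above; build_unsigned_url and BASE_URL are illustrative names, not part of the original code.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the text.
BASE_URL = "https://www.toutiao.com/c/user/article/"


def build_unsigned_url(user_id, as_value, cp_value, max_behot_time=0, count=20):
    """Return the request URL without the _signature parameter (hypothetical helper)."""
    params = {
        "page_type": 1,
        "user_id": user_id,
        "max_behot_time": max_behot_time,
        "count": count,
        "as": as_value,
        "cp": cp_value,
    }
    # dict preserves insertion order, so the query string keeps this parameter order.
    return BASE_URL + "?" + urlencode(params)


url = build_unsigned_url(4492956276, "A1251EFFB42ADCE", "5EF4AA0D2C4E8E1")
print(url)
```

With the sample values this reproduces the example URL exactly, which is the string that sign() receives as o.url.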
It turns out that window.byted_acrawler.sign() is generated by the code here, so we lift it out as-is.
sign.js

var _typeof = "function" == typeof Symbol && "symbol" == typeof Symbol.iterator ? function(f) {
    return typeof f
}
: function(f) {
    return f && "function" == typeof Symbol && f.constructor === Symbol && f !== Symbol.prototype ? "symbol" : typeof f
}
;
TAC = function() {
    function f(f, a, b, d, c, r) {
        null == r && (r = this);
        var n, i, o = {}, l = o.d = c ? c.d + 1 : 0;
        for (o["$" + l] = o,
        i = 0; i < l; i++)
            o[n = "$" + i] = c[n];
        for (i = 0,
        l = o.length = d.length; i < l; i++)
            o[i] = d[i];
        return e(f, a, b, o, r)[1]
    }
    function e(r, o, l, t, v, y) {
        function h(f) {
            S[++A] = f
        }
        function k() {
            return S[A--]
        }
        function m(f, e) {
            for (var a = b, d = "", c = 0; c < f.length; c++) {
                var r = f.charCodeAt(c);
                d += String.fromCharCode(a ^ r),
                a = (a << 1) + c + e + 1 + (a >> 1) & 255
            }
            return d
        }
        null == v && (v = this);
        var g, C, x, I, S = [], A = 0;
        y && (g = y);
        for (var w = o + 2 * l; o < w; ) {
            var z = 13 * i(r, o) % 241;
            if (o += 2,
            0 == (3 & z))
                if (0 == (3 & (z >>= 2))) {
                    if (0 == (z >>= 2))
                        return [1, S[A--]];
                    if (2 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        S[++A] = +I;
                    else if (4 == z)
                        g = S[A--],
                        S[A] = S[A] * g;
                    else if (6 == z)
                        g = S[A--],
                        S[A] = S[A] != g;
                    else if (13 == z)
                        C = S[A--],
                        x = S[A--],
                        (I = S[A--]).x === e ? S[++A] = f(r, I.pc, I.len, C, I.z, x) : S[++A] = I.apply(x, C);
                    else {
                        if (15 != z)
                            break;
                        oprand = n(r, o),
                        I = oprand[1],
                        S[A] = function(a, b) {
                            var d = function e() {
                                var a = arguments;
                                return f(r, e.pc, e.len, a, e.z, this)
                            };
                            return d.pc = a,
                            d.len = b,
                            d.x = e,
                            d.z = t,
                            d
                        }(o + 6, I - 4),
                        o += 2 * I - 2
                    }
                } else if (1 == (3 & z))
                    if (3 == (z >>= 2))
                        g = S[--A],
                        S[A] = g(S[A + 1]);
                    else if (5 == z)
                        S[A -= 1] = S[A][S[A + 1]];
                    else if (7 == z)
                        S[A] = --S[A];
                    else {
                        if (9 != z)
                            break;
                        g = S[A--],
                        S[A] = typeof g
                    }
                else if (2 == (3 & z))
                    if (6 == (z >>= 2))
                        S[A] = u(S[A]);
                    else if (8 == z)
                        g = S[A--],
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        S[A--][m(a[oprand[1]], oprand[1])] = g;
                    else {
                        if (10 != z) {
                            if (12 == z)
                                throw S[A--];
                            break
                        }
                        S[A] = ~S[A]
                    }
                else if (0 == (z >>= 2))
                    S[++A] = null;
                else if (2 == z)
                    g = S[A--],
                    S[A] = S[A] >= g;
                else if (9 == z)
                    g = k(),
                    C = k(),
                    t[0] = 65599 * t[0] + t[g].charCodeAt(C) >>> 0;
                else if (11 == z)
                    S[++A] = void 0;
                else {
                    if (13 != z)
                        break;
                    g = S[A--],
                    S[A] = S[A] && g
                }
            else if (1 == (3 & z))
                if (0 == (3 & (z >>= 2))) {
                    if (4 == (z >>= 2)) {
                        oprand = n(r, o),
                        I = oprand[1];
                        try {
                            if (d[c][2] = 1,
                            1 == (g = e(r, o + 6, I - 4, t, v))[0])
                                return g
                        } catch (y) {
                            if (d[c] && d[c][1] && 1 == (g = e(r, d[c][1][0], d[c][1][1], t, v, y))[0])
                                return g
                        } finally {
                            if (d[c] && d[c][0] && 1 == (g = e(r, d[c][0][0], d[c][0][1], t, v))[0])
                                return g;
                            d[c] = 0,
                            c--
                        }
                        o += 2 * I - 2
                    } else if (6 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        S[A -= I] = p("x,y", "return new x[y](" + Array(I + 1).join(",x[++y]").substr(1) + ")")(S, A);
                    else if (8 == z)
                        g = S[A--],
                        S[A] = S[A] & g;
                    else if (10 != z)
                        break
                } else if (1 == (3 & z))
                    if (0 == (z >>= 2))
                        S[A] = !S[A];
                    else if (7 == z)
                        C = S[A--],
                        g = delete S[A--][C];
                    else if (9 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        S[A] = S[A][m(a[oprand[1]], oprand[1])];
                    else {
                        if (11 != z)
                            break;
                        g = S[A--],
                        S[A] = S[A] << g
                    }
                else if (2 == (3 & z))
                    if (1 == (z >>= 2))
                        S[++A] = g;
                    else if (3 == z)
                        g = S[A--],
                        S[A] = S[A] <= g;
                    else if (10 == z)
                        g = S[A -= 2][S[A + 1]] = S[A + 2],
                        A--;
                    else if (12 == z)
                        g = S[A],
                        S[++A] = g;
                    else {
                        if (14 != z)
                            break;
                        g = S[A--],
                        S[A] = S[A] || g
                    }
                else if (0 == (z >>= 2))
                    S[A] = !S[A];
                else if (2 == z)
                    oprand = n(r, o),
                    o += 2 * (I = oprand[1]) - 2;
                else if (4 == z)
                    g = S[A--],
                    S[A] = S[A] / g;
                else if (6 == z)
                    g = S[A--],
                    S[A] = S[A] !== g;
                else {
                    if (13 != z)
                        break;
                    S[++A] = v
                }
            else if (2 == (3 & z))
                if (0 == (3 & (z >>= 2)))
                    if (1 == (z >>= 2))
                        g = S[A--],
                        S[A] = S[A] > g;
                    else if (8 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        C = A + 1,
                        S[A -= I - 1] = I ? S.slice(A, C) : [];
                    else if (10 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        g = S[A--],
                        t[I] = g;
                    else {
                        if (12 != z)
                            break;
                        g = S[A--],
                        S[A] = S[A] >> g
                    }
                else if (1 == (3 & z))
                    if (0 == (z >>= 2))
                        S[++A] = s;
                    else if (2 == z)
                        g = S[A--],
                        S[A] = S[A] + g;
                    else if (4 == z)
                        g = S[A--],
                        S[A] = S[A] == g;
                    else if (11 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        S[--A] = p("x,y", "return x " + m(a[I], I) + " y")(S[A], S[A + 1]);
                    else {
                        if (13 != z)
                            break;
                        g = S[A - 1],
                        C = S[A],
                        S[++A] = g,
                        S[++A] = C
                    }
                else if (2 == (3 & z))
                    if (1 == (z >>= 2))
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        S[++A] = m(a[oprand[1]], oprand[1]);
                    else if (3 == z)
                        S[A--] ? o += 6 : (oprand = n(r, o),
                        o += 2 * (I = oprand[1]) - 2);
                    else if (5 == z)
                        g = S[A--],
                        S[A] = S[A] % g;
                    else if (7 == z)
                        g = S[A--],
                        S[A] = S[A]instanceof g;
                    else {
                        if (14 != z)
                            break;
                        S[++A] = !1
                    }
                else if (4 == (z >>= 2))
                    oprand = n(r, o),
                    I = oprand[1],
                    d[c][0] && !d[c][2] ? d[c][1] = [o + 6, I - 4] : d[c++] = [0, [o + 6, I - 4], 0],
                    o += 2 * I - 2;
                else if (6 == z)
                    oprand = n(r, o),
                    o += 2 * oprand[0],
                    I = oprand[1],
                    S[++A] = t["$" + I];
                else {
                    if (8 != z)
                        break;
                    g = S[A--],
                    S[A] = S[A] | g
                }
            else if (0 == (3 & (z >>= 2)))
                if (1 == (z >>= 2))
                    oprand = n(r, o),
                    o += 2 * oprand[0],
                    I = oprand[1],
                    S[++A] = +m(a[I], I);
                else if (3 == z)
                    g = S[A--],
                    S[A] = S[A] - g;
                else if (5 == z)
                    g = S[A--],
                    S[A] = S[A] === g;
                else if (12 == z)
                    C = S[A--],
                    x = S[A--],
                    (I = S[A--]).x === e ? S[++A] = f(r, I.pc, I.len, C, I.z, x) : S[++A] = I.apply(x, C);
                else {
                    if (14 != z)
                        break;
                    g = S[A],
                    S[A] = S[A - 1],
                    S[A - 1] = g
                }
            else if (1 == (3 & z))
                if (2 == (z >>= 2))
                    h(function(f) {
                        var e = 0
                          , a = f.length;
                        return function() {
                            var b = e < a;
                            b && h(f[e++]),
                            h(b)
                        }
                    }(S[A]));
                else if (4 == z)
                    oprand = n(r, o),
                    o += 2 * oprand[0],
                    I = oprand[1],
                    g = t[I],
                    S[++A] = g;
                else if (6 == z)
                    S[A] = ++S[A];
                else {
                    if (8 != z)
                        break;
                    g = S[A--],
                    S[A] = S[A]in g
                }
            else if (2 == (3 & z))
                if (5 == (z >>= 2))
                    ;
                else if (7 == z)
                    g = S[A--];
                else if (9 == z)
                    g = S[A--],
                    S[A] = S[A] ^ g;
                else {
                    if (11 != z)
                        break;
                    oprand = n(r, o),
                    I = oprand[1],
                    d[++c] = [[o + 6, I - 4], 0, 0],
                    o += 2 * I - 2
                }
            else if (1 == (z >>= 2))
                g = S[A--],
                S[A] = S[A] < g;
            else if (8 == z)
                oprand = n(r, o),
                o += 2 * oprand[0],
                I = oprand[1],
                S[A] = S[A][I];
            else if (10 == z)
                S[++A] = !0;
            else {
                if (12 != z)
                    break;
                g = S[A--],
                S[A] = S[A] >>> g
            }
        }
        return [0, null]
    }
    var a = []
      , b = 0
      , d = []
      , c = 0
      , r = function(f, e) {
        var a = "" + f[e++] + f[e];
        return parseInt(a, 16)
    }
      , n = function(f, e) {
        var a = f[e++]
          , b = f[e]
          , d = parseInt("" + a + b, 16);
        if (d >> 7 == 0)
            return d >> 6 != 0 && (d = -64 | 63 & d),
            [1, d];
        if (d >> 6 == 2) {
            var c = parseInt("" + f[++e] + f[++e], 16);
            return 0 != (32 & d) ? d = -32 | 31 & d : d &= 31,
            d <<= 8,
            c = d + c,
            [2, c]
        }
        if (d >> 6 == 3) {
            var r = parseInt("" + f[++e] + f[++e], 16)
              , n = parseInt("" + f[++e] + f[++e], 16);
            return 0 != (32 & d) ? d = -32 | 31 & d : d &= 31,
            d <<= 16,
            r <<= 8,
            n = d + r + n,
            [3, n]
        }
    }
      , i = function(f, e) {
        var a = f[e++]
          , b = f[e];
        return parseInt("" + a + b, 16)
    }
      , o = function(f, e) {
        var a = "" + f[e++] + f[e];
        return a = parseInt(a, 16),
        String.fromCharCode(a)
    }
      , l = function(f, e, a) {
        for (var b = "", d = 0; d < a; d++)
            b += o(f, e),
            e += 2;
        return b
    }
      , t = function(f, e, b) {
        for (var d = 0; d < b; d++) {
            var c = n(f, e);
            e += 2 * c[0];
            var r = l(f, e, c[1]);
            a.push(r),
            e += 2 * c[1]
        }
    }
      , s = this
      , p = s.Function
      , u = Object.keys || function(f) {
        var e = {}
          , a = 0;
        for (var b in f)
            e[a++] = b;
        return e.length = a,
        e
    }
    ;
    return function(e) {
        e.length;
        for (var d = 0, c = "", i = d; i < d + 16; )
            c += o(e, i),
            i += 2;
        if ("HNOJ@?RC" != c)
            throw new Error("error magic number " + c);
        n(e, d += 16);
        d += 8,
        b = 0;
        for (var l = 0; l < 4; l++) {
            var s = r(e, d + 2 * l);
            b += (3 & s) << 2 * l
        }
        d += 16;
        var p = n(e, d += 16)
          , u = p[1]
          , v = d += 2 * p[0];
        d += p[1];
        var y = n(e, d);
        y[1];
        d += 2 * y[0],
        a = [],
        t(e, d, y[1]),
        f(e, v, u, [])
    }
}(),
TAC("", []);

Load it in a browser to check whether it works.
index.html

<script src="sign.js"></script>

It works without problems.
Next, let's implement it directly. First run it with Node.js and see whether a value comes back.
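
Since the rest of the scraper is written in Python, one way to do this run is through a subprocess, which also captures whatever error Node.js prints. This is only a sketch: run_script is a hypothetical helper, and the commented usage assumes that node is on PATH and that sign.js is in the working directory.

```python
import subprocess


def run_script(command, timeout=30):
    """Run a command and return (exit_code, stdout, stderr) as stripped text."""
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        timeout=timeout,
    )
    return completed.returncode, completed.stdout.strip(), completed.stderr.strip()


# Assumed usage (requires Node.js to be installed):
# code, out, err = run_script(["node", "sign.js"])
# print(code, out, err)
```

A non-zero exit code together with the text in stderr is what tells you which browser object the script is missing.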
It reports that there is no window object, so we give it one: window = global
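Now it complains that href is missing, so we also need to define a document object. These requirements have to be worked out one by one from the browser, the editor's hints, and debugging. That process is omitted here and the code is given directly.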

window = global;
var document = new Object();
var params = {
    location:{
        hash: "",
        host: "localhost:63342",
        hostname: "localhost",
        href: "http://localhost:63342/SpiderTest/index.html?_ijt=cbm25vhb9cva9uad3qdo901n7u",
        origin: "http://localhost:63342",
        pathname: "/SpiderTest/index.html",
        port: "63342",
        protocol: "http:",
        search: "?_ijt=cbm25vhb9cva9uad3qdo901n7u"
    },
    navigator:{
        appCodeName: "Mozilla",
        appName: "Netscape",
        appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
        cookieEnabled: true,
        deviceMemory: 8,
        doNotTrack: null,
        hardwareConcurrency: 4,
        language: "zh-CN",
        languages: ["zh-CN", "zh"],
        maxTouchPoints: 0,
        onLine: true,
        platform: "Win32",
        product: "Gecko",
        productSub: "20030107",
        userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
        vendor: "Google Inc.",
        vendorSub: "",
    }
};

Object.assign(window,params);
Object.assign(document,params);
document.cookie = "";

var _typeof = "function" == typeof Symbol && "symbol" == typeof Symbol.iterator ? function(f) {
    return typeof f
}
: function(f) {
    return f && "function" == typeof Symbol && f.constructor === Symbol && f !== Symbol.prototype ? "symbol" : typeof f
}
;
TAC = function() {
    function f(f, a, b, d, c, r) {
        null == r && (r = this);
        var n, i, o = {}, l = o.d = c ? c.d + 1 : 0;
        for (o["$" + l] = o,
        i = 0; i < l; i++)
            o[n = "$" + i] = c[n];
        for (i = 0,
        l = o.length = d.length; i < l; i++)
            o[i] = d[i];
        return e(f, a, b, o, r)[1]
    }
    function e(r, o, l, t, v, y) {
        function h(f) {
            S[++A] = f
        }
        function k() {
            return S[A--]
        }
        function m(f, e) {
            for (var a = b, d = "", c = 0; c < f.length; c++) {
                var r = f.charCodeAt(c);
                d += String.fromCharCode(a ^ r),
                a = (a << 1) + c + e + 1 + (a >> 1) & 255
            }
            return d
        }
        null == v && (v = this);
        var g, C, x, I, S = [], A = 0;
        y && (g = y);
        for (var w = o + 2 * l; o < w; ) {
            var z = 13 * i(r, o) % 241;
            if (o += 2,
            0 == (3 & z))
                if (0 == (3 & (z >>= 2))) {
                    if (0 == (z >>= 2))
                        return [1, S[A--]];
                    if (2 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        S[++A] = +I;
                    else if (4 == z)
                        g = S[A--],
                        S[A] = S[A] * g;
                    else if (6 == z)
                        g = S[A--],
                        S[A] = S[A] != g;
                    else if (13 == z)
                        C = S[A--],
                        x = S[A--],
                        (I = S[A--]).x === e ? S[++A] = f(r, I.pc, I.len, C, I.z, x) : S[++A] = I.apply(x, C);
                    else {
                        if (15 != z)
                            break;
                        oprand = n(r, o),
                        I = oprand[1],
                        S[A] = function(a, b) {
                            var d = function e() {
                                var a = arguments;
                                return f(r, e.pc, e.len, a, e.z, this)
                            };
                            return d.pc = a,
                            d.len = b,
                            d.x = e,
                            d.z = t,
                            d
                        }(o + 6, I - 4),
                        o += 2 * I - 2
                    }
                } else if (1 == (3 & z))
                    if (3 == (z >>= 2))
                        g = S[--A],
                        S[A] = g(S[A + 1]);
                    else if (5 == z)
                        S[A -= 1] = S[A][S[A + 1]];
                    else if (7 == z)
                        S[A] = --S[A];
                    else {
                        if (9 != z)
                            break;
                        g = S[A--],
                        S[A] = typeof g
                    }
                else if (2 == (3 & z))
                    if (6 == (z >>= 2))
                        S[A] = u(S[A]);
                    else if (8 == z)
                        g = S[A--],
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        S[A--][m(a[oprand[1]], oprand[1])] = g;
                    else {
                        if (10 != z) {
                            if (12 == z)
                                throw S[A--];
                            break
                        }
                        S[A] = ~S[A]
                    }
                else if (0 == (z >>= 2))
                    S[++A] = null;
                else if (2 == z)
                    g = S[A--],
                    S[A] = S[A] >= g;
                else if (9 == z)
                    g = k(),
                    C = k(),
                    t[0] = 65599 * t[0] + t[g].charCodeAt(C) >>> 0;
                else if (11 == z)
                    S[++A] = void 0;
                else {
                    if (13 != z)
                        break;
                    g = S[A--],
                    S[A] = S[A] && g
                }
            else if (1 == (3 & z))
                if (0 == (3 & (z >>= 2))) {
                    if (4 == (z >>= 2)) {
                        oprand = n(r, o),
                        I = oprand[1];
                        try {
                            if (d[c][2] = 1,
                            1 == (g = e(r, o + 6, I - 4, t, v))[0])
                                return g
                        } catch (y) {
                            if (d[c] && d[c][1] && 1 == (g = e(r, d[c][1][0], d[c][1][1], t, v, y))[0])
                                return g
                        } finally {
                            if (d[c] && d[c][0] && 1 == (g = e(r, d[c][0][0], d[c][0][1], t, v))[0])
                                return g;
                            d[c] = 0,
                            c--
                        }
                        o += 2 * I - 2
                    } else if (6 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        S[A -= I] = p("x,y", "return new x[y](" + Array(I + 1).join(",x[++y]").substr(1) + ")")(S, A);
                    else if (8 == z)
                        g = S[A--],
                        S[A] = S[A] & g;
                    else if (10 != z)
                        break
                } else if (1 == (3 & z))
                    if (0 == (z >>= 2))
                        S[A] = !S[A];
                    else if (7 == z)
                        C = S[A--],
                        g = delete S[A--][C];
                    else if (9 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        S[A] = S[A][m(a[oprand[1]], oprand[1])];
                    else {
                        if (11 != z)
                            break;
                        g = S[A--],
                        S[A] = S[A] << g
                    }
                else if (2 == (3 & z))
                    if (1 == (z >>= 2))
                        S[++A] = g;
                    else if (3 == z)
                        g = S[A--],
                        S[A] = S[A] <= g;
                    else if (10 == z)
                        g = S[A -= 2][S[A + 1]] = S[A + 2],
                        A--;
                    else if (12 == z)
                        g = S[A],
                        S[++A] = g;
                    else {
                        if (14 != z)
                            break;
                        g = S[A--],
                        S[A] = S[A] || g
                    }
                else if (0 == (z >>= 2))
                    S[A] = !S[A];
                else if (2 == z)
                    oprand = n(r, o),
                    o += 2 * (I = oprand[1]) - 2;
                else if (4 == z)
                    g = S[A--],
                    S[A] = S[A] / g;
                else if (6 == z)
                    g = S[A--],
                    S[A] = S[A] !== g;
                else {
                    if (13 != z)
                        break;
                    S[++A] = v
                }
            else if (2 == (3 & z))
                if (0 == (3 & (z >>= 2)))
                    if (1 == (z >>= 2))
                        g = S[A--],
                        S[A] = S[A] > g;
                    else if (8 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        C = A + 1,
                        S[A -= I - 1] = I ? S.slice(A, C) : [];
                    else if (10 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        g = S[A--],
                        t[I] = g;
                    else {
                        if (12 != z)
                            break;
                        g = S[A--],
                        S[A] = S[A] >> g
                    }
                else if (1 == (3 & z))
                    if (0 == (z >>= 2))
                        S[++A] = s;
                    else if (2 == z)
                        g = S[A--],
                        S[A] = S[A] + g;
                    else if (4 == z)
                        g = S[A--],
                        S[A] = S[A] == g;
                    else if (11 == z)
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        I = oprand[1],
                        S[--A] = p("x,y", "return x " + m(a[I], I) + " y")(S[A], S[A + 1]);
                    else {
                        if (13 != z)
                            break;
                        g = S[A - 1],
                        C = S[A],
                        S[++A] = g,
                        S[++A] = C
                    }
                else if (2 == (3 & z))
                    if (1 == (z >>= 2))
                        oprand = n(r, o),
                        o += 2 * oprand[0],
                        S[++A] = m(a[oprand[1]], oprand[1]);
                    else if (3 == z)
                        S[A--] ? o += 6 : (oprand = n(r, o),
                        o += 2 * (I = oprand[1]) - 2);
                    else if (5 == z)
                        g = S[A--],
                        S[A] = S[A] % g;
                    else if (7 == z)
                        g = S[A--],
                        S[A] = S[A]instanceof g;
                    else {
                        if (14 != z)
                            break;
                        S[++A] = !1
                    }
                else if (4 == (z >>= 2))
                    oprand = n(r, o),
                    I = oprand[1],
                    d[c][0] && !d[c][2] ? d[c][1] = [o + 6, I - 4] : d[c++] = [0, [o + 6, I - 4], 0],
                    o += 2 * I - 2;
                else if (6 == z)
                    oprand = n(r, o),
                    o += 2 * oprand[0],
                    I = oprand[1],
                    S[++A] = t["$" + I];
                else {
                    if (8 != z)
                        break;
                    g = S[A--],
                    S[A] = S[A] | g
                }
            else if (0 == (3 & (z >>= 2)))
                if (1 == (z >>= 2))
                    oprand = n(r, o),
                    o += 2 * oprand[0],
                    I = oprand[1],
                    S[++A] = +m(a[I], I);
                else if (3 == z)
                    g = S[A--],
                    S[A] = S[A] - g;
                else if (5 == z)
                    g = S[A--],
                    S[A] = S[A] === g;
                else if (12 == z)
                    C = S[A--],
                    x = S[A--],
                    (I = S[A--]).x === e ? S[++A] = f(r, I.pc, I.len, C, I.z, x) : S[++A] = I.apply(x, C);
                else {
                    if (14 != z)
                        break;
                    g = S[A],
                    S[A] = S[A - 1],
                    S[A - 1] = g
                }
            else if (1 == (3 & z))
                if (2 == (z >>= 2))
                    h(function(f) {
                        var e = 0
                          , a = f.length;
                        return function() {
                            var b = e < a;
                            b && h(f[e++]),
                            h(b)
                        }
                    }(S[A]));
                else if (4 == z)
                    oprand = n(r, o),
                    o += 2 * oprand[0],
                    I = oprand[1],
                    g = t[I],
                    S[++A] = g;
                else if (6 == z)
                    S[A] = ++S[A];
                else {
                    if (8 != z)
                        break;
                    g = S[A--],
                    S[A] = S[A]in g
                }
            else if (2 == (3 & z))
                if (5 == (z >>= 2))
                    ;
                else if (7 == z)
                    g = S[A--];
                else if (9 == z)
                    g = S[A--],
                    S[A] = S[A] ^ g;
                else {
                    if (11 != z)
                        break;
                    oprand = n(r, o),
                    I = oprand[1],
                    d[++c] = [[o + 6, I - 4], 0, 0],
                    o += 2 * I - 2
                }
            else if (1 == (z >>= 2))
                g = S[A--],
                S[A] = S[A] < g;
            else if (8 == z)
                oprand = n(r, o),
                o += 2 * oprand[0],
                I = oprand[1],
                S[A] = S[A][I];
            else if (10 == z)
                S[++A] = !0;
            else {
                if (12 != z)
                    break;
                g = S[A--],
                S[A] = S[A] >>> g
            }
        }
        return [0, null]
    }
    var a = []
      , b = 0
      , d = []
      , c = 0
      , r = function(f, e) {
        var a = "" + f[e++] + f[e];
        return parseInt(a, 16)
    }
      , n = function(f, e) {
        var a = f[e++]
          , b = f[e]
          , d = parseInt("" + a + b, 16);
        if (d >> 7 == 0)
            return d >> 6 != 0 && (d = -64 | 63 & d),
            [1, d];
        if (d >> 6 == 2) {
            var c = parseInt("" + f[++e] + f[++e], 16);
            return 0 != (32 & d) ? d = -32 | 31 & d : d &= 31,
            d <<= 8,
            c = d + c,
            [2, c]
        }
        if (d >> 6 == 3) {
            var r = parseInt("" + f[++e] + f[++e], 16)
              , n = parseInt("" + f[++e] + f[++e], 16);
            return 0 != (32 & d) ? d = -32 | 31 & d : d &= 31,
            d <<= 16,
            r <<= 8,
            n = d + r + n,
            [3, n]
        }
    }
      , i = function(f, e) {
        var a = f[e++]
          , b = f[e];
        return parseInt("" + a + b, 16)
    }
      , o = function(f, e) {
        var a = "" + f[e++] + f[e];
        return a = parseInt(a, 16),
        String.fromCharCode(a)
    }
      , l = function(f, e, a) {
        for (var b = "", d = 0; d < a; d++)
            b += o(f, e),
            e += 2;
        return b
    }
      , t = function(f, e, b) {
        for (var d = 0; d < b; d++) {
            var c = n(f, e);
            e += 2 * c[0];
            var r = l(f, e, c[1]);
            a.push(r),
            e += 2 * c[1]
        }
    }
      , s = this
      , p = s.Function
      , u = Object.keys || function(f) {
        var e = {}
          , a = 0;
        for (var b in f)
            e[a++] = b;
        return e.length = a,
        e
    }
    ;
    return function(e) {
        e.length;
        for (var d = 0, c = "", i = d; i < d + 16; )
            c += o(e, i),
            i += 2;
        if ("HNOJ@?RC" != c)
            throw new Error("error magic number " + c);
        n(e, d += 16);
        d += 8,
        b = 0;
        for (var l = 0; l < 4; l++) {
            var s = r(e, d + 2 * l);
            b += (3 & s) << 2 * l
        }
        d += 16;
        var p = n(e, d += 16)
          , u = p[1]
          , v = d += 2 * p[0];
        d += p[1];
        var y = n(e, d);
        y[1];
        d += 2 * y[0],
        a = [],
        t(e, d, y[1]),
        f(e, v, u, [])
    }
}(),
TAC("", []);

//sign = window.byted_acrawler.sign({url:process.argv[2]});		// Use this line when calling from the Python code, and remove the line below, which is only for testing
sign = window.byted_acrawler.sign({url:"https://www.toutiao.com/c/user/article/?page_type=1&user_id=50025817786&max_behot_time=0&count=20&as=A115EEAF9937AD9&cp=5EF937BAFDE96E1"});
console.log(sign);
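
For readers tracing the interpreter above: the helper `n` reads a variable-length signed operand from the hex-encoded bytecode and returns the number of bytes consumed together with the value. The Python port below is only an illustrative sketch for studying that encoding. The name `decode_operand` is made up here, and the function is not needed to produce the signature.

```python
def decode_operand(code: str, pos: int):
    """Port of the JS helper `n`: return (bytes_consumed, value).

    `code` is the hex string and `pos` the index of the first hex digit.
    """
    d = int(code[pos:pos + 2], 16)
    if d >> 7 == 0:
        # One-byte form: bit 6 is the sign flag, low 6 bits are the payload.
        if d >> 6 != 0:
            d = -64 | (63 & d)
        return 1, d
    if d >> 6 == 2:
        # Two-byte form: bit 5 is the sign flag, low 5 bits are the high part.
        low = int(code[pos + 2:pos + 4], 16)
        d = (-32 | (31 & d)) if (32 & d) else (d & 31)
        return 2, (d << 8) + low
    if d >> 6 == 3:
        # Three-byte form: same sign rule, followed by two more payload bytes.
        mid = int(code[pos + 2:pos + 4], 16)
        low = int(code[pos + 4:pos + 6], 16)
        d = (-32 | (31 & d)) if (32 & d) else (d & 31)
        return 3, (d << 16) + (mid << 8) + low
    raise ValueError("unreachable for a valid byte")


print(decode_operand("05", 0))    # (1, 5)
print(decode_operand("8102", 0))  # (2, 258)
```

The interpreter advances its program counter by `2 * bytes_consumed` because every byte occupies two hex characters in the string.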

The value has been obtained successfully; all that is left is to append it to the URL. Here is the Python code:

import os
import time
import math
import hashlib

def getHoney():
    i = math.floor(time.time())
    e = str('%X' % i)
    md5 = hashlib.md5()
    md5.update(str(i).encode('utf-8'))
    t = str(md5.hexdigest()).upper()
    if 8 != len(e):
        return {
            'as':"479BB4B7254C150",
            'cp':"7E0AC8874BB0985"
        }
    o = t[0:5]
    n = t[-5:]
    a = ''
    r = ''
    for i in range(5):
        a += o[i] + e[i]
        r += e[i + 3] + n[i]
    return {
        'as':"A1" + a + e[-3:],
        'cp':e[0:3] + r + "E1"
    }

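def get_signature(url):
    # sign.js must read the URL from process.argv[2]; strip the newline that console.log appends
    sign = os.popen('node sign.js {url}'.format(url='"'+url+'"')).read().strip()
    return url + "&_signature=" + sign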

if __name__ == '__main__':
    url = 'https://www.toutiao.com/toutiao/c/user/article/?page_type=1&user_id=50025817786&max_behot_time=0&count=20&as={as}&cp={cp}'.format(**getHoney())
    url = get_signature(url)
    print(url)
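
Because `getHoney` reads the current time, its output changes on every call and is awkward to compare against a value seen in the browser. The sketch below applies the same interleaving to an explicit timestamp so the result is reproducible. The name `honey_for` and the sample timestamp are assumptions made for illustration only.

```python
import hashlib


def honey_for(timestamp: int) -> dict:
    """Same derivation as getHoney, but for a caller-supplied Unix timestamp."""
    e = "%X" % timestamp                      # upper-case hex of the timestamp
    t = hashlib.md5(str(timestamp).encode("utf-8")).hexdigest().upper()
    if len(e) != 8:
        # Fallback constants used by the original JS when the hex is not 8 digits.
        return {"as": "479BB4B7254C150", "cp": "7E0AC8874BB0985"}
    head, tail = t[:5], t[-5:]
    a = "".join(head[k] + e[k] for k in range(5))      # md5 head interleaved with e[0:5]
    r = "".join(e[k + 3] + tail[k] for k in range(5))  # e[3:8] interleaved with md5 tail
    return {"as": "A1" + a + e[-3:], "cp": e[:3] + r + "E1"}


if __name__ == "__main__":
    print(honey_for(1593499051))
```

Both values are always 15 characters long: a 2-character marker, 10 interleaved characters, and 3 hex digits of the timestamp.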

Check the result in the browser.
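The short _signature still works, but comparing it with the site's own request shows that the official _signature is long while ours is short. The reason is that the browser request carries a Cookie, whereas our Node.js simulation has none, so we have to add a Cookie ourselves.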
JS code after setting the Cookie. It requires jsdom, which can be installed with: npm i -g jsdom

const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const dom = new JSDOM(`

Hello world

`
);
window = global;
var document = dom.window.document;
var params = {
    location: {
        hash: "#mid=5954781019",
        host: "www.toutiao.com",
        hostname: "www.toutiao.com",
        href: "https://www.toutiao.com/c/user/token/MS4wLjABAAAAvazHMceCo3MeM9IJbll231AC8GkJDcrd__iZFw2hi4o/#mid=5954781019",
        origin: "https://www.toutiao.com",
        pathname: "/c/user/token/MS4wLjABAAAAvazHMceCo3MeM9IJbll231AC8GkJDcrd__iZFw2hi4o/",
        port: "",
        protocol: "https:",
        search: "",
    },
    navigator: {
        appCodeName: "Mozilla",
        appName: "Netscape",
        appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
        cookieEnabled: true,
        deviceMemory: 8,
        doNotTrack: null,
        hardwareConcurrency: 4,
        language: "zh-CN",
        languages: ["zh-CN", "zh"],
        maxTouchPoints: 0,
        onLine: true,
        platform: "Win32",
        product: "Gecko",
        productSub: "20030107",
        userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
        vendor: "Google Inc.",
        vendorSub: "",
    },
    "screen": {
        availHeight: 1040,
        availLeft: 0,
        availTop: 0,
        availWidth: 1920,
        colorDepth: 24,
        height: 1080,
        pixelDepth: 24,
        width: 1920,
    }
};
Object.assign(window, params);

// Write one cookie into the jsdom document; seconds is the lifetime (0 means a session cookie)
function setCookie(name, value, seconds) {
    seconds = seconds || 0;
    var expires = "";
    if (seconds != 0) {
        var date = new Date();
        date.setTime(date.getTime() + (seconds * 1000));
        expires = "; expires=" + date.toGMTString();
    }
    document.cookie = name + "=" + escape(value) + expires + "; path=/";
}

cookies = "ttcid=905d4227745d4bc5b1012b141030a03424;WEATHER_CITY=%E5%8C%97%E4%BA%AC;SLARDAR_WEB_ID=2ad88390-74f4-4d16-88c3-adec28511963;csrftoken=a36cc765af3273df681c6f2f75562aa6;tt_webid=6844026241293796877;s_v_web_id=verify_kc1k6wcz_8JdlB4ZI_DPn1_432W_Aeul_xMWqTd1IsSTX;__tasessionId=pj1440jfa1593499051248;tt_webid=6844026241293796877;tt_scid=RM8rNVkofFuscNNX1nWIA4R-3D37E1ppsjECemd.JlDHff9foJSV1v0TdLcHwOV2c364";
for (let cookie of cookies.split(";")) {
    tmp = cookie.split("=");
    setCookie(tmp[0], tmp[1], 1800);
}
window.document = document;

var _typeof = "function" == typeof Symbol && "symbol" == typeof Symbol.iterator ? function(f) {
    return typeof f
}
: function(f) {
    return f && "function" == typeof Symbol && f.constructor === Symbol && f !== Symbol.prototype ? "symbol" : typeof f
}
;
TAC = function() {
    ... // Omitted because of the post length limit. Copy the earlier JS code, or extract the code yourself the same way I did.
}
window.byted_acrawler && window.byted_acrawler.init({
    aid: 24,
    dfp: true
});

//sign = window.byted_acrawler.sign({url:process.argv[2]}); // Use this line when calling from the Python code, and remove the line below, which is only for testing
sign = window.byted_acrawler.sign({url:"https://www.toutiao.com/toutiao/c/user/article/?page_type=1&user_id=50025817786&max_behot_time=0&count=20&as=A105BEFFDA2F4FB&cp=5EFA3F44EF5B2E1"});
console.log(sign);
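
The JS above splits the cookie string on ";" and then on "=", keeping only `tmp[1]`, so a value that itself contains "=" would be cut short. If the same cookie string is also needed on the Python side, it can be parsed along the following lines. This is a minimal sketch; `parse_cookie_string` is a hypothetical helper, not part of the original code.

```python
def parse_cookie_string(raw: str) -> dict:
    """Turn 'name=value; name2=value2' into a dict, splitting on the first '=' only."""
    cookies = {}
    for part in raw.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:          # skip empty segments and segments without '='
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    print(parse_cookie_string("WEATHER_CITY=%E5%8C%97%E4%BA%AC; token=abc=="))
```

The resulting dict can be passed to `requests.get(..., cookies=...)` instead of hard-coding a `cookie` header.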

Running it produces an error message. That is fine; just comment out the line that raises it.
Now the long _signature comes out. Only the JS file needs to change; the Python file stays the same. Let's look at the result.
Python code:

import os
import time
import math
import hashlib
import requests

def getHoney():
    i = math.floor(time.time())
    e = str('%X' % i)
    md5 = hashlib.md5()
    md5.update(str(i).encode('utf-8'))
    t = str(md5.hexdigest()).upper()
    if 8 != len(e):
        return {
            'as':"479BB4B7254C150",
            'cp':"7E0AC8874BB0985"
        }
    o = t[0:5]
    n = t[-5:]
    a = ''
    r = ''
    for i in range(5):
        a += o[i] + e[i]
        r += e[i + 3] + n[i]
    return {
        'as':"A1" + a + e[-3:],
        'cp':e[0:3] + r + "E1"
    }

def get_signature(url):
    sign = os.popen('node sign.js {url}'.format(url='"'+url+'"')).read()
    return "&_signature=" + sign

if __name__ == '__main__':
    headers = {
        'Referer':'https://www.toutiao.com/',
        'authority': 'www.toutiao.com',
        'method': 'GET',
        'path': '/c/user/article/?page_type=1&user_id=50025817786&max_behot_time=0&count=20&as=A1353EDF6CD73B8&cp=5EFCE7435B08EE1&_signature=_02B4Z6wo00f01Uqk1TwAAIBCtVsqwfqSjaVKptGAAAxWvpbZsM2s5i5SDopBRRN1gepc-oLFZ8U7Sg2NxBIeNQxLFHV3oh7OToF-gGDJmcow5ga1WH.fUY8D-3hDrs2d48np9GmwK93teS-03c',
        'scheme': 'https',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cache-control': 'max-age=0',
        'cookie': 'WEATHER_CITY=%E5%8C%97%E4%BA%AC; csrftoken=c4160cf9a6a4d887adb7f784849a0c5e; ttcid=864839082b36468fa1d79a09b162fb7b11; SLARDAR_WEB_ID=bf374b0c-5f61-48cd-8140-77596a21cbb3; tt_webid=6842259247737308686; tt_webid=6842259247737308686; tt_scid=CTNNhi4a9y5GvhqNIYxZxtHrnN7ojIrl4QTuVtVX-VhtAhpBJ4.OsRum6BSf9d7N5de9',
        'sec-fetch-dest': 'document',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-site': 'none',
        'sec-fetch-user': '?1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
    }
    base_url = 'https://www.toutiao.com/toutiao'
    param = '/c/user/article/?page_type=1&user_id=50025817786&max_behot_time=0&count=20&as={as}&cp={cp}'.format(**getHoney())
    base_url += param
    signature = get_signature(base_url).replace('\n','')
    path = param + signature
    headers['path'] = path
    url = base_url + signature
    response = requests.get(url=url,headers=headers)
    print(response.text)
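
The `os.popen` call above builds a shell command string by hand-quoting the URL. As an alternative sketch, `subprocess.run` with an argument list passes the URL to Node without involving the shell and raises an error if the script fails. The name `run_signer` and its `command` parameter are assumptions introduced here; `sign.js` is expected to read the URL from `process.argv[2]`, as in the commented-out line of the JS file.

```python
import subprocess


def run_signer(url: str, command=("node", "sign.js")) -> str:
    """Run the signing script with the URL as its only argument and return its stdout."""
    result = subprocess.run(
        [*command, url],       # list form: no shell, so '&' in the URL needs no quoting
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()   # drop the newline added by console.log
```

With this helper, the signed URL would be built as `base_url + "&_signature=" + run_signer(base_url)`.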


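Disclaimer: this article is for learning and discussion only. Do not use it for commercial purposes; anyone who does bears the consequences.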
