Analyzing behavior collection in Bot Challenge

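I haven't written a blog post in a long time, mainly because I recently changed jobs and started researching anti-automation techniques. These are study notes; comments and discussion are welcome~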

Background and site overview

Bot Challenge is a site dedicated to web bot detection: https://bot.incolumitas.com/#botChallenge

Its detection of user behavior is quite thorough and well worth studying.

User behavior data

Events collected overall

this.recordedEvents = ["mousemove", "mousedown", "mouseup", "dblclick", "contextmenu", "scroll", "resize", "keydown", "keyup", "touchstart", "touchmove", "touchcancel", "touchend", "load", "DOMContentLoaded", "visibilitychange", "pagehide", "beforeunload", "unload"],

this.newRecordedEvents = ["copy", "paste", "deviceorientation", "devicemotion"]

this.onlyWindowEvent = ["scroll", "keydown", "keyup", "resize", "copy", "paste", "deviceorientation", "devicemotion", "visibilitychange", "load", "DOMContentLoaded", "pagehide", "beforeunload", "unload"],

this.recordNewEvents && (this.recordedEvents = this.recordedEvents.concat(this.newRecordedEvents))

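The collected behavior falls into the following categories. For each one, the notes below cover the data that is recorded and the events that trigger the recording.

Mouse actions (MouseEvent)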

getMetaKeysBitstring: function(e) {
    var t = "";
    return t += !0 === e.ctrlKey ? "1" : "0",
    t += !0 === e.shiftKey ? "1" : "0",
    t += !0 === e.altKey ? "1" : "0",
    t += !0 === e.metaKey ? "1" : "0"
}
getMouseFrame: function(e, t) {
    return [t, e.clientX, e.clientY, e.screenX, e.screenY, e.button, this.getMetaKeysBitstring(e)]
},
mousemoveListener: function(e, t) {
    return e.getMouseFrame(t, "m")
},
mousedownListener: function(e, t) {
    return e.getMouseFrame(t, "md")
},
mouseupListener: function(e, t) {
    return e.getMouseFrame(t, "mu")
},
dblclickListener: function(e, t) {
    return e.getMouseFrame(t, "dc")
},
contextmenuListener: function(e, t) {
    return e.getMouseFrame(t, "cm")
},

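Data (each frame is an array that starts with the abbreviated event name, followed by the collected fields; the same applies to the sections below):

clientX: double (formerly long); horizontal coordinate of the pointer within the browser viewport when the event fired
clientY: double (formerly long); vertical coordinate of the pointer within the browser viewport when the event fired
screenX: double (formerly long); horizontal coordinate of the pointer in global (screen) coordinates when the event fired
screenY: double (formerly long); vertical coordinate of the pointer in global (screen) coordinates when the event fired
button: number; the mouse button that was pressed when the event fired:
- 0: main button, usually the left button; also the default value (for example, when the click is triggered via document.getElementById('a').click())
- 1: auxiliary button, usually the wheel or middle button
- 2: secondary button, usually the right button
- 3: fourth button, usually the browser Back button
- 4: fifth button, usually the browser Forward button
MetaKey: String; a string of '0' and '1' characters recording whether each modifier key was held when the event fired, in the order ctrlKey, shiftKey, altKey, metaKey

Events:

mousemove: fired when the mouse moves
mousedown: fired when a mouse button is pressed
mouseup: fired when a mouse button is released
dblclick: fired on a double click
contextmenu: fired when the context menu is opened, for example by right-clicking on the page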
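To make the frame layout concrete, here is a minimal sketch that decodes one recorded mouse frame back into a readable object. The helper names (decodeMouseFrame, decodeMetaKeys) and the sample values are hypothetical; the field order follows getMouseFrame, and the two trailing fields (isTrusted flag and timestamp) are the ones that getFrameHandler appends to every frame.

```javascript
// Hypothetical helpers for reading a recorded mouse frame; not part of the site's script.
const MOUSE_EVENT_NAMES = { m: "mousemove", md: "mousedown", mu: "mouseup", dc: "dblclick", cm: "contextmenu" };
const BUTTON_NAMES = ["main", "auxiliary", "secondary", "back", "forward"];

function decodeMetaKeys(bits) {
  // Bit order follows getMetaKeysBitstring: ctrl, shift, alt, meta.
  const [ctrl, shift, alt, meta] = Array.from(bits, (bit) => bit === "1");
  return { ctrl, shift, alt, meta };
}

function decodeMouseFrame(frame) {
  const [type, clientX, clientY, screenX, screenY, button, metaBits, trusted, timestamp] = frame;
  return {
    event: MOUSE_EVENT_NAMES[type],
    client: { x: clientX, y: clientY },
    screen: { x: screenX, y: screenY },
    button: BUTTON_NAMES[button],
    modifiers: decodeMetaKeys(metaBits),
    trusted: trusted === 1,
    timestamp,
  };
}

// Sample frame (made-up values): a right-button mousedown with Shift held.
const decoded = decodeMouseFrame(["md", 120, 80, 1400, 300, 2, "0100", 1, 1532.125]);
console.log(decoded.event, decoded.button, decoded.modifiers);
```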

Keyboard actions (KeyboardEvent)

getKeyFrame: function(e, t) {
    return [t, e.code, e.key, e.location, e.repeat, this.getMetaKeysBitstring(e)]
},
keydownListener: function(e, t) {
    return e.getKeyFrame(t, "kd")
},
keyupListener: function(e, t) {
    return e.getKeyFrame(t, "ku")
},

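Data:

code: String; the physical key on the keyboard (as opposed to the character the key produces). In other words, the value is not changed by the keyboard layout or by the state of the modifier keys. For example, the "q" key on a QWERTY keyboard returns the code "KeyQ"
key: String; the logical input the user actually produced. It depends on the state of modifiers such as shiftKey and on the keyboard locale and layout.
location: unsigned long; where the key sits on the keyboard or other input device. It mainly matters for keys that exist more than once, such as ctrl/shift, and for the digit and enter keys:
- 0: the key has no left/right variant, or the variant cannot be determined
- 1: the left-hand ctrl/shift/alt...
- 2: the right-hand key
- 3: a key on the numeric keypad
- other values are deprecated
repeat: Boolean; true if the key is being held down
MetaKey: same as for mouse events

Events:

keydown: fired when a key is pressed
keyup: fired when a key is released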

Touch actions (TouchEvent)

getTouchFrame: function(e, t) {
    for (var n = [], i = 0; i < e.touches.length; i++) {
        var a = e.touches[i]
          , o = [this.round2(a.clientX), this.round2(a.clientY), this.round2(a.screenX), this.round2(a.screenY), a.identifier];
        this.mobileExperimental && (o = o.concat([this.round2(a.radiusX), this.round2(a.radiusY), a.rotationAngle, a.force])),
        n.push(o)
    }
    return [t, n, this.getMetaKeysBitstring(e)]
},
touchstartListener: function(e, t) {
    return e.getTouchFrame(t, "ts")
},
touchmoveListener: function(e, t) {
    return e.getTouchFrame(t, "tm")
},
touchcancelListener: function(e, t) {
    return e.getTouchFrame(t, "tc")
},
touchendListener: function(e, t) {
    return e.getTouchFrame(t, "te")
},

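Data:

touches: List; a TouchList of all points currently in contact with the touch surface. For example, if the user touches the screen (or trackpad) with three fingers, the TouchList contains one [Touch](https://developer.mozilla.org/zh-CN/docs/Web/API/Touch) object per finger, three in total
- touch.clientX/touch.clientY/touch.screenX/touch.screenY: double (formerly long); same meaning as the mouse event properties of the same name
- touch.identifier: long; a value that uniquely identifies a point of contact with the touch surface. It stays the same across all events generated by that finger (or stylus, etc.) until it leaves the surface; this matters mainly for touchmove

The following properties are experimental and are only collected when mobileExperimental is set:

touch.radiusX: float; horizontal radius of the ellipse where the finger contacts the screen
touch.radiusY: float; vertical radius of the ellipse where the finger contacts the screen
touch.rotationAngle: float; rotation angle in degrees. It is the clockwise angle by which the ellipse described by radiusX and radiusY must be rotated to most closely cover the contact area between the user and the touch surface. The value ranges from 0 to 90
touch.force: float; how hard the finger presses on the touch surface, from 0.0 (no pressure) to 1.0 (maximum pressure)

Events:

touchstart: fired when the user places a touch point on the touch surface
touchmove: fired when the user moves a touch point along the touch surface; also fired when the radius, rotation angle, or force of a touch point changes
touchcancel: fired when a touch point is interrupted for some reason. Possible reasons include the following (the details vary by device and browser):
- Some event cancelled the touch, for example a dialog interrupting the touch.
- The touch point left the document window and moved into the browser UI, a plugin, or another external area.
- The user created more touch points than the device supports, so the earliest Touch object in the [TouchList](https://developer.mozilla.org/zh-CN/docs/Web/API/TouchList) is cancelled.
touchend: fired when a touch point is removed from the touch surface (the user's finger or stylus leaves the surface). It also fires when the touch point moves past the edge of the surface, for example when the user swipes off the edge of the screen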
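As a rough illustration of what a touch frame looks like, the sketch below re-implements getTouchFrame as a standalone function and feeds it plain objects in place of real TouchEvents. round2 is not shown in the excerpt, so rounding to two decimal places is an assumption here, and the sample coordinates are made up.

```javascript
// Assumption: round2 rounds to two decimal places (the excerpt does not show it).
function round2(value) {
  return Math.round(value * 100) / 100;
}

function getMetaKeysBitstring(event) {
  return [event.ctrlKey, event.shiftKey, event.altKey, event.metaKey]
    .map((pressed) => (pressed === true ? "1" : "0"))
    .join("");
}

function getTouchFrame(event, type, mobileExperimental = false) {
  const points = Array.from(event.touches, (touch) => {
    const point = [round2(touch.clientX), round2(touch.clientY),
                   round2(touch.screenX), round2(touch.screenY), touch.identifier];
    return mobileExperimental
      ? point.concat([round2(touch.radiusX), round2(touch.radiusY), touch.rotationAngle, touch.force])
      : point;
  });
  return [type, points, getMetaKeysBitstring(event)];
}

// Plain objects standing in for TouchEvents (made-up values).
const twoFingers = {
  touches: [
    { clientX: 10.126, clientY: 20, screenX: 200.5, screenY: 300, identifier: 0 },
    { clientX: 50, clientY: 60, screenX: 240, screenY: 340, identifier: 1 },
  ],
};
const moveFrame = getTouchFrame(twoFingers, "tm");
const endFrame = getTouchFrame({ touches: [] }, "te");
console.log(JSON.stringify(moveFrame));
console.log(JSON.stringify(endFrame));
```

Because the listener reads touches rather than changedTouches, the touchend frame for the last finger leaving the surface carries an empty point list, as the second call shows.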

Element movement (scroll and resize)

scrollListener: function(e, t) {
    return ["s", e.round2(document.scrollingElement.scrollLeft), e.round2(document.scrollingElement.scrollTop)]
}
resizeListener: function(e, t) {
    return ["r", window.innerWidth, window.innerHeight]
},

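scroll event: fired when the document view or an element is scrolled; what gets collected is the scrollbar position
- scrollingElement.scrollLeft: integer (may be a float on systems with display scaling); distance scrolled from the left edge
- scrollingElement.scrollTop: integer (may be a float on systems with display scaling); distance scrolled from the top edge
resize event: fired when the viewport is resized
- window.innerWidth: integer; interior width of the window in pixels, including the width of the vertical scrollbar if one is present.
- window.innerHeight: integer; interior height of the window in pixels, including the height of the horizontal scrollbar if one is present.

Page-related events

Mainly page loading, tab switching, and so on: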

loadListener: function(e, t) {
    return ["lo"]
},
DOMContentLoadedListener: function(e, t) {
    return ["dcl"]
},
visibilitychangeListener: function(e, t) {
    return ["vc", document.visibilityState]
},
pagehideListener: function(e, t) {
    return ["ph", t.persisted]
},
beforeunloadListener: function(e, t) {
    return ["bu"]
},
unloadListener: function(e, t) {
    return ["ul"]
},

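load: fired when the whole page has finished loading, including all dependent resources such as stylesheets and images
DOMContentLoaded: fired when the HTML document has been completely loaded and parsed, without waiting for stylesheets, images, or subframes to finish loading
visibilitychange: fired on the document when the content of its tab becomes visible or hidden
- document.visibilityState: String; the visibility of the document, which tells whether the page is in the foreground, in a hidden background tab, or being prerendered. Possible values:
  - 'visible': the page content is at least partially visible, i.e. the page is in the foreground tab and the window is not minimized.
  - 'hidden': the page is not visible to the user, i.e. the document is in a background tab, the window is minimized, or the operating system is showing the lock screen.
  - 'prerender': the page is being prerendered and is therefore not visible (considered hidden for purposes of document.hidden). A document can start in this state but can never switch to it from another value. Note: browser support is optional.
pagehide: sent to a window when the browser hides the current page. For example, when the user clicks the browser's Back button, the current page receives a pagehide event before the previous page is shown.
- persisted: indicates whether the page is being cached for possible reuse, so that special handling can be applied when the hidden page may be restored later
beforeunload: fired when the window, the document, and their resources are about to be unloaded; for example, it can be used to show a dialog confirming that the tab should be closed
unload: fired while the window, the document, and their resources are being unloaded

User operations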

Copy & paste

copyListener: function(e, t) {
    var n = document.getSelection()
      , i = ["co"];
    return n && i.push(Math.abs(n.anchorOffset, n.focusOffset)),
    i
},

pasteListener: function(e, t) {
    return ["pa", (t.clipboardData || window.clipboardData).getData("text").length]
},

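getSelection: returns a Selection object describing the current selection
- selection.anchorOffset: integer; offset within the DOM node at which the selection starts (where the mouse was pressed)
- selection.focusOffset: integer; offset within the DOM node at which the selection ends (where the mouse was released)

Example:

abcdefg

If "bcd" is selected from left to right inside this text node, anchorOffset = 1 and focusOffset = 4. Note that Math.abs only uses its first argument, so Math.abs(n.anchorOffset, n.focusOffset) in the listener above evaluates to the absolute value of anchorOffset alone; the length of the selection would be Math.abs(n.focusOffset - n.anchorOffset).

clipboardData.getData("text").length: integer; length of the string on the clipboard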
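The offsets in the copy listener are easy to check with a tiny sketch. The plain object below stands in for the Selection over "abcdefg" with "bcd" selected from left to right; the variable names are only illustrative.

```javascript
// Plain object standing in for a Selection over the text node "abcdefg" with "bcd" selected.
const text = "abcdefg";
const selection = { anchorOffset: 1, focusOffset: 4 };

// What the copy listener computes: Math.abs ignores every argument after the first.
const asWritten = Math.abs(selection.anchorOffset, selection.focusOffset);

// The length of the selection is the distance between the two offsets.
const selectedLength = Math.abs(selection.focusOffset - selection.anchorOffset);
const selectedText = text.slice(
  Math.min(selection.anchorOffset, selection.focusOffset),
  Math.max(selection.anchorOffset, selection.focusOffset)
);

console.log(asWritten, selectedLength, selectedText);
```

So the "co" frame effectively records where the selection started inside its node, not how many characters were copied.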
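Deviceorientation: information about the physical orientation of the device (phone, tablet, or other mobile device) while the page is being viewed; note that support in Safari is limited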

deviceorientationListener: function(e, t) {
    if (!(Math.abs(e.rotateDegrees - t.alpha) < 2 || Math.abs(e.leftToRight - t.gamma) < 1 || Math.abs(e.frontToBack - t.beta) < 1)) {
        e.rotateDegrees = t.alpha,
        e.frontToBack = t.beta,
        e.leftToRight = t.gamma;
        t = t.absolute;
        return null !== e.rotateDegrees && null !== e.frontToBack && null !== e.leftToRight ? ["do", e.round2(e.rotateDegrees), e.round2(e.frontToBack), e.round2(e.leftToRight), t] : void 0
    }
},

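The listener records a frame only when all three angles have changed by at least their threshold since the last recorded frame: 2 degrees for alpha and 1 degree for beta and gamma. If any one of them has changed by less than its threshold, the event is not recorded.

alpha: double; rotation of the device around the z axis, in the range 0 to 360
beta: double; rotation of the device around the x axis, in the range -180 to 180, with front-to-back as the positive direction
gamma: double; rotation of the device around the y axis, in the range -90 to 90, with left-to-right as the positive direction.
absolute: boolean; whether the device provides orientation data absolutely (relative to the Earth's coordinate frame) or uses an arbitrary coordinate frame determined by the device.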
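The threshold logic is easier to follow when isolated. The sketch below mirrors the condition in deviceorientationListener with plain objects as readings; it skips the rounding, and it assumes that the stored angles are undefined before the first event, which the excerpt does not show. createOrientationFilter is a hypothetical name.

```javascript
// Hypothetical standalone version of the deviceorientation filter.
function createOrientationFilter() {
  // Assumption: no angles are stored before the first event.
  const last = { alpha: undefined, beta: undefined, gamma: undefined };
  return function onOrientation(reading) {
    // Same condition as the listener: drop the event if ANY angle moved too little.
    const changedTooLittle =
      Math.abs(last.alpha - reading.alpha) < 2 ||
      Math.abs(last.gamma - reading.gamma) < 1 ||
      Math.abs(last.beta - reading.beta) < 1;
    if (changedTooLittle) return undefined;
    last.alpha = reading.alpha;
    last.beta = reading.beta;
    last.gamma = reading.gamma;
    if (last.alpha === null || last.beta === null || last.gamma === null) return undefined;
    return ["do", last.alpha, last.beta, last.gamma, reading.absolute];
  };
}

const onOrientation = createOrientationFilter();
const first = onOrientation({ alpha: 10, beta: 5, gamma: 5, absolute: false });
// alpha and gamma moved a lot, but beta moved by only 0.5, so the event is dropped.
const second = onOrientation({ alpha: 30, beta: 5.5, gamma: 20, absolute: false });
const third = onOrientation({ alpha: 30, beta: 8, gamma: 20, absolute: false });
console.log(first, second, third);
```

With undefined stored angles the first comparison yields NaN, which is not less than any threshold, so the first reading goes through under this assumption.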
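devicemotion: information about how fast the position and orientation of the device are changing while the page is being viewed; support in Safari is likewise limited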

devicemotionListener: function(e, t) {
    var n = e.round2(t.acceleration.x)
      , i = e.round2(t.acceleration.y)
      , e = e.round2(t.acceleration.z)
      , t = (t.rotationRate,
    t.interval);
    if (null !== n && null !== i && null !== e && (1 < Math.abs(n) || 1 < Math.abs(i) || 1 < Math.abs(e)))
        return ["dm", n, i, e, t]
}

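acceleration.x/acceleration.y/acceleration.z: double; acceleration along the x, y, and z axes. A frame is recorded only when the absolute value on at least one axis is greater than 1
rotationRate.alpha/rotationRate.beta/rotationRate.gamma: double; rate of rotation around the three axes. The listener reads rotationRate but does not put it into the frame
interval: integer; the interval, in milliseconds, at which data is obtained from the underlying hardware. It can be used to determine the granularity of the motion events

Other common information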

getTimestamp: function() {
    return "performance"in window && "now"in window.performance ? this.round(performance.now(), 3) : (new Date).getTime() - 1e3 * this.startedAt
},

getPassiveSupported: function() {
    let t = !1;
    try {
        var e = {
            get passive() {
                return !(t = !0)
            }
        };
        window.addEventListener("test", null, e),
        window.removeEventListener("test", null, e)
    } catch (e) {
        t = !1
    }
    return t
},

Timestamp: the time at which the event fired; the code prefers window.performance.now() and falls back to Date when it is not available
PassiveSupported: checks whether addEventListener supports the passive option. Setting it to true can improve performance when collecting scroll events; see https://developer.mozilla.org/zh-CN/docs/Web/API/EventTarget/addEventListener#%E4%BD%BF%E7%94%A8_passive_%E6%94%B9%E5%96%84%E7%9A%84%E6%BB%9A%E5%B1%8F%E6%80%A7%E8%83%BD
event.isTrusted: boolean; true when the event was generated by a user action, and false when the event was created or modified by a script or dispatched through [EventTarget.dispatchEvent()](https://developer.mozilla.org/zh-CN/docs/Web/API/EventTarget/dispatchEvent).
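A small sketch of the two values appended to every frame. createTimestamper is a hypothetical name, rounding to three decimals is an assumption about this.round(x, 3), and a bare EventTarget stands in for window so the snippet also runs outside a browser.

```javascript
// Hypothetical standalone version of getTimestamp.
function createTimestamper(startedAtSeconds) {
  return function getTimestamp() {
    if (typeof performance !== "undefined" && typeof performance.now === "function") {
      // Assumption: this.round(x, 3) keeps three decimal places.
      return Math.round(performance.now() * 1000) / 1000;
    }
    // Fallback: milliseconds elapsed since recording started.
    return Date.now() - 1000 * startedAtSeconds;
  };
}

const getTimestamp = createTimestamper(Date.now() / 1000);
const stamp = getTimestamp();

// Events created and dispatched by script are not trusted.
const target = new EventTarget();
let trustedFlag = null;
target.addEventListener("mousemove", (event) => {
  trustedFlag = event.isTrusted === true ? 1 : 0;
});
target.dispatchEvent(new Event("mousemove"));

console.log(stamp, trustedFlag);
```

Any frame whose trusted flag is 0 therefore points at script-generated input, which is presumably why the flag is attached to every frame.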

Collection

Starting the recording:

The record method starts the collection of behavior records

getFrameHandler: function(n, i) {
    return function(e) {
        var t = i(n, e)
          , e = 1 == e.isTrusted ? 1 : 0
          , t = t.concat([e, n.getTimestamp()]);
        n.frames.push(t),
        n.pdFlag && n.frames.length >= n.push_after && (e = new Event("musPushData"),
        window.dispatchEvent(e),
        n.pdFlag = !1),
        n.onFrame && n.onFrame instanceof Function && n.onFrame(t)
    }
},
record: function() {
    if (!this.recording) {
        0 == this.startedAt && (this.startedAt = (new Date).getTime() / 1e3),
        document.scrollingElement && this.frames.push(["s", this.round2(document.scrollingElement.scrollLeft), this.round2(document.scrollingElement.scrollTop), this.getTimestamp()]);
        for (var e = 0; e < this.recordedEvents.length; e++) {
            var t = this.recordedEvents[e]
              , n = "scroll" === t
              , i = null
              , i = this.onlyWindowEvent.includes(t) && this.listenNode !== window ? window : this.listenNode;
            "visibilitychange" === t && (i = document);
            var a = this.passiveSupported ? {
                passive: !0,
                capture: n
            } : n
              , n = this.getFrameHandler(this, this[t + "Listener"]);
            this.eventListenerParams[t] = [i, t, n, a],
            i.addEventListener(t, n, a)
        }
        this.recording = !0
    }
},

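This code mainly registers a listener for each event in turn (the getFrameHandler call and the addEventListener call at the end of the loop):

The start time is recorded in startedAt
When recording starts, the current scroll position is recorded once as an initial "s" frame
capture is set to true only for scroll. Setting capture does not stop the event from bubbling; it makes the listener run during the capture phase. Scroll events fired on an element do not bubble, so a capturing listener on window (or on listenNode) is what allows scrolling inside nested elements to be observed
onlyWindowEvent lists the events that are listened for on window only. Because the script can be configured to listen on a particular DOM node (listenNode), these events are attached to window whenever listenNode is not window, so i == window by the time addEventListener is called; visibilitychange is attached to document
Passive mode is preferred whenever it is supported (the options object with passive: true)
All registered listeners are saved in **eventListenerParams** so that stop can remove them later; this pattern is worth borrowing
In getFrameHandler, isTrusted and the timestamp are appended to every frame as common information
push_after controls how many frames are collected before a report is triggered. The frames are not separated by type; they all go into the frames list. The report is triggered by dispatching an event with dispatchEvent, and the handler of that event sends the report; the moments at which reports are sent are described later
recording is set to true to indicate that data collection has started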
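The push_after mechanism can be sketched on its own. In the snippet below a bare EventTarget stands in for window, createRecorder is a hypothetical name, and the handler that drains the frames is an assumption, since the real upload function is not part of the excerpt. The sketch lowers pdFlag before dispatching so that the handler's re-arm is not overwritten; whether the original relies on a different ordering is not visible from the excerpt.

```javascript
// Hypothetical recorder that raises a "musPushData" event every pushAfter frames.
function createRecorder(bus, pushAfter) {
  const recorder = { frames: [], pdFlag: true, pushAfter };
  recorder.addFrame = function addFrame(frame) {
    recorder.frames.push(frame);
    if (recorder.pdFlag && recorder.frames.length >= recorder.pushAfter) {
      recorder.pdFlag = false; // no second push while one is in flight
      bus.dispatchEvent(new Event("musPushData"));
    }
  };
  return recorder;
}

const bus = new EventTarget(); // stands in for window
const recorder = createRecorder(bus, 3);
const uploadedBatches = [];

bus.addEventListener("musPushData", () => {
  // Assumption: the upload drains the collected frames before re-arming the flag.
  uploadedBatches.push(recorder.frames.splice(0));
  recorder.pdFlag = true;
});

for (let i = 0; i < 7; i++) {
  recorder.addFrame(["m", i, i, i, i, 0, "0000", 1, i * 16]);
}
console.log(uploadedBatches.length, recorder.frames.length);
```

Seven frames with a threshold of three give two uploaded batches and one frame still waiting for the next trigger.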

Stop

stop: function() {
    for (var e in this.finishedAt = (new Date).getTime() / 1e3,
    this.eventListenerParams) {
        var t = this.eventListenerParams[e];
        t[0].removeEventListener(t[1], t[2], t[3])
    }
    this.recording = !1
},

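After the stop time is recorded in finishedAt, every listener registered by record is removed, and recording is set to false to indicate that no data is currently being collected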
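The bookkeeping behind record and stop is a pattern that can be reused elsewhere: removeEventListener only removes a listener when the type, the function, and the capture flag all match, so keeping the exact arguments around makes teardown reliable. A minimal sketch, with a bare EventTarget in place of window and hypothetical names:

```javascript
// Hypothetical registry that remembers the exact arguments passed to addEventListener.
function createListenerRegistry() {
  const params = {};
  return {
    record(target, eventNames, handler) {
      for (const name of eventNames) {
        const options = { passive: true, capture: name === "scroll" };
        params[name] = [target, name, handler, options];
        target.addEventListener(name, handler, options);
      }
    },
    stop() {
      for (const name of Object.keys(params)) {
        const [target, type, handler, options] = params[name];
        target.removeEventListener(type, handler, options);
        delete params[name];
      }
    },
  };
}

const fakeWindow = new EventTarget();
const registry = createListenerRegistry();
let seenEvents = 0;

registry.record(fakeWindow, ["keydown", "mousemove"], () => { seenEvents += 1; });
fakeWindow.dispatchEvent(new Event("keydown"));
fakeWindow.dispatchEvent(new Event("mousemove"));
registry.stop();
fakeWindow.dispatchEvent(new Event("keydown")); // ignored: the listener is gone

console.log(seenEvents);
```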

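When reports are sent

A report is sent when any of the following events fires; the "musPushData" event is the one described above, which triggers a report once a configured number of frames has been collected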

document.addEventListener("visibilitychange", function(e) {
    "hidden" === document.visibilityState && (t = !0,
    i("vc"))
}),
window.addEventListener("pagehide", function(e) {
    !1 === t && (t = !0,
    i("ph"))
}),
window.addEventListener("beforeunload", function(e) {
    !1 === t && (t = !0,
    i("bu"))
}),
window.addEventListener("unload", function(e) {
    !1 === t && (t = !0,
    i("un"))
}),
window.addEventListener("musPushData", function(e) {
    i("pd"),
    mus.pdFlag = !0
})
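The shared flag t in these handlers means that, once the page starts going away, only the first of pagehide, beforeunload, and unload actually sends a report, and none of them does if a hidden visibilitychange has already sent one. A sketch of that logic, with EventTarget objects standing in for window and document and a hypothetical send callback in place of the real upload function i:

```javascript
// Hypothetical re-creation of the "send once on exit" logic.
function installReportTriggers(win, doc, send) {
  let sent = false;
  doc.addEventListener("visibilitychange", () => {
    if (doc.visibilityState === "hidden") {
      sent = true;
      send("vc");
    }
  });
  for (const [name, tag] of [["pagehide", "ph"], ["beforeunload", "bu"], ["unload", "un"]]) {
    win.addEventListener(name, () => {
      if (sent === false) {
        sent = true;
        send(tag);
      }
    });
  }
}

// Scenario 1: the tab is closed while visible.
const closeWin = new EventTarget();
const closeDoc = Object.assign(new EventTarget(), { visibilityState: "visible" });
const closeReports = [];
installReportTriggers(closeWin, closeDoc, (tag) => closeReports.push(tag));
for (const name of ["beforeunload", "pagehide", "unload"]) {
  closeWin.dispatchEvent(new Event(name));
}

// Scenario 2: the tab is hidden first, then unloaded.
const hideWin = new EventTarget();
const hideDoc = Object.assign(new EventTarget(), { visibilityState: "hidden" });
const hideReports = [];
installReportTriggers(hideWin, hideDoc, (tag) => hideReports.push(tag));
hideDoc.dispatchEvent(new Event("visibilitychange"));
hideWin.dispatchEvent(new Event("pagehide"));

console.log(closeReports, hideReports);
```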

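DeviceData collection

The script also collects information about the current browser; only the parts worth studying are listed here

Sayswho

Identifies the current browser and its version. It is not a standard interface; the snippet assigns the result to a custom property on navigator. Reference code: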

navigator.sayswho= (function(){
    var ua= navigator.userAgent, tem, 
    M= ua.match(/(opera|chrome|safari|firefox|msie|trident(?=\/))\/?\s*(\d+)/i) || [];
    if(/trident/i.test(M[1])){
        tem=  /\brv[ :]+(\d+)/g.exec(ua) || [];
        return 'IE '+(tem[1] || '');
    }
    if(M[1]=== 'Chrome'){
        tem= ua.match(/\b(OPR|Edge)\/(\d+)/);
        if(tem!= null) return tem.slice(1).join(' ').replace('OPR', 'Opera');
    }
    M= M[2]? [M[1], M[2]]: [navigator.appName, navigator.appVersion, '-?'];
    if((tem= ua.match(/version\/(\d+)/i))!= null) M.splice(1, 1, tem[1]);
    return M.join(' ');
})();

console.log(navigator.sayswho);

This snippet is a quick way to get the browser name and major version out of the UA string
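To see what the snippet returns for concrete UA strings, here is a sketch of the same logic as a pure function that takes the UA as an argument. parseBrowser is a hypothetical name, the navigator.appName fallback is replaced with "unknown" so that it runs outside a browser, and the UA strings are typical samples rather than captured data.

```javascript
// Hypothetical pure-function variant of the sayswho snippet.
function parseBrowser(ua) {
  const match = ua.match(/(opera|chrome|safari|firefox|msie|trident(?=\/))\/?\s*(\d+)/i) || [];
  if (/trident/i.test(match[1])) {
    const rv = /\brv[ :]+(\d+)/.exec(ua) || [];
    return "IE " + (rv[1] || "");
  }
  if (match[1] === "Chrome") {
    const other = ua.match(/\b(OPR|Edge)\/(\d+)/);
    if (other !== null) return other.slice(1).join(" ").replace("OPR", "Opera");
  }
  const result = match[2] ? [match[1], match[2]] : ["unknown", ""];
  const version = ua.match(/version\/(\d+)/i);
  if (version !== null) result.splice(1, 1, version[1]);
  return result.join(" ");
}

const chromeUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";
const firefoxUA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0";
const safariUA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15";
const edgeUA = chromeUA + " Edg/120.0.0.0";

console.log(parseBrowser(chromeUA), "|", parseBrowser(firefoxUA), "|", parseBrowser(safariUA), "|", parseBrowser(edgeUA));
```

One caveat: the pattern looks for "Edge/", while Chromium-based Edge identifies itself with "Edg/", so such a UA is reported as Chrome.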

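String.prototype.toSource exception check

toSource is a non-standard method that historically existed only in Firefox, so calling it throws in current mainstream browsers; only very old browser versions return normally. This makes it a quick way to spot old browsers. Reference code: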

getErrorFF: function() {
    try {
        throw "a"
    } catch (e) {
        try {
            return e.toSource(),
            !0
        } catch (e) {
            return !1
        }
    }
},

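Audio/Video decoding capability test

Uses the canPlayType interface, which returns "probably" if the type can most likely be played, "maybe" if it might be playable, and an empty string if it definitely cannot be played. Mainstream browsers and versions differ noticeably in what they report, and very old browsers return an empty string for everything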

audioCodecs: function() {
    var e = document.createElement("audio")
      , t = {}
      , n = {
        ogg: 'audio/ogg; codecs="vorbis"',
        mp3: "audio/mpeg;",
        wav: 'audio/wav; codecs="1"',
        m4a: "audio/x-m4a;",
        aac: "audio/aac;"
    };
    if (e.canPlayType)
        for (var i in n)
            t[i] = e.canPlayType(n[i]);
    return t
},
videoCodecs: function() {
    var e = document.createElement("video")
      , t = {}
      , n = {
        ogg: 'video/ogg; codecs="theora"',
        h264: 'video/mp4; codecs="avc1.42E01E"',
        webm: 'video/webm; codecs="vp8, vorbis"',
        mpeg4v: 'video/mp4; codecs="mp4v.20.8, mp4a.40.2"',
        mpeg4a: 'video/mp4; codecs="mp4v.20.240, mp4a.40.2"',
        theora: 'video/x-matroska; codecs="theora, vorbis"'
    };
    if (e.canPlayType)
        for (var i in n)
            t[i] = e.canPlayType(n[i]);
    return t
},

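window.eval hook detection

The length differs between browsers: 37 in Firefox and 33 in Chrome-based browsers. The string returned for the native eval also contains the 'native code' keyword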

u.deviceData.emptyEvalLength = eval.toString().length
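A rough sketch of how such a value could be evaluated. looksNative and fakeEval are hypothetical names, and the check is only a weak signal: a careful hook can also override toString, so it is best combined with other signals.

```javascript
// Hypothetical helper: a native function stringifies with a "[native code]" body.
function looksNative(fn) {
  return Function.prototype.toString.call(fn).includes("[native code]");
}

// A naive hook: a plain JavaScript function put in place of eval.
const fakeEval = function (code) {
  return code;
};

const nativeLength = eval.toString().length; // 33 in V8-based engines, 37 in Firefox
console.log(looksNative(eval), looksNative(fakeEval), nativeLength);
```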

Network-related checks

Supported mainly by Chromium-based browsers; retrieves information about the network environment

navigator && navigator.connection && (r = navigator.connection,
u.deviceData.connection = {
    effectiveType: r.effectiveType,
    rtt: r.rtt,
    downlink: r.downlink
})

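WebAssembly capability check

While looking things up I found another open-source project with a similar purpose, friendly-challenge (GitHub: FriendlyCaptcha/friendly-challenge), and this is one of the checks it contains; other points about that project can be summarized later

The check itself is fairly simple: a base64 string that decodes to a compilable module is passed to WebAssembly.compile, and any error is caught; if an error is caught, the check fails: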

const A = WebAssembly.compile(function(A) {
    const C = A.length;
    let t = 3 * C >>> 2;
    A.charCodeAt(C - 1) === I && t--, A.charCodeAt(C - 2) === I && t--;
    const B = new Uint8Array(t);
    for (let I = 0, t = 0; I < C; I += 4) {
        const C = g[A.charCodeAt(I + 0)],
            Q = g[A.charCodeAt(I + 1)],
            e = g[A.charCodeAt(I + 2)],
            r = g[A.charCodeAt(I + 3)];
        B[t++] = C << 2 | Q >> 4, B[t++] = (15 & Q) << 4 | e >> 2, B[t++] = (3 & e) << 6 | 63 & r
    }
    return B
}("a base64-encoded, compilable WebAssembly binary"))
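A self-contained sketch of the same idea, using the smallest valid module (just the 8-byte header: magic number plus version) instead of the project's real payload. The function names are hypothetical, and atob is used for decoding where the original uses a hand-written base64 decoder.

```javascript
// The 8-byte WebAssembly header (\0asm + version 1), base64-encoded.
const MINIMAL_WASM_BASE64 = "AGFzbQEAAAA=";

function base64ToBytes(base64) {
  const binary = atob(base64);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// Synchronous variant based on WebAssembly.validate.
function wasmSupportedSync() {
  try {
    return typeof WebAssembly === "object" &&
      WebAssembly.validate(base64ToBytes(MINIMAL_WASM_BASE64));
  } catch (err) {
    return false;
  }
}

// Asynchronous variant, closer to the original: compile and treat a rejection as failure.
async function wasmSupported() {
  try {
    await WebAssembly.compile(base64ToBytes(MINIMAL_WASM_BASE64));
    return true;
  } catch (err) {
    return false;
  }
}

const wasmBytes = base64ToBytes(MINIMAL_WASM_BASE64);
console.log(wasmBytes.length, wasmSupportedSync());
wasmSupported().then((ok) => console.log("compile check:", ok));
```

Since WebAssembly.compile returns a promise, a failure shows up as a rejection rather than a synchronous exception, which is why the asynchronous variant awaits inside the try block.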