Building an In-House Packet Capture Service (1): Processing Mitmproxy Capture Data


What is mitmproxy?
  • Official description: mitmproxy is a set of tools that provide an interactive, SSL/TLS-capable intercepting proxy for HTTP/1, HTTP/2, and WebSockets.
  • In short: mitmproxy can intercept HTTP and HTTPS requests.
How do you use mitmproxy?

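mitmproxy is a third-party Python library, so it has to be installed first; installing it with pip is recommended.
On macOS, either of these two commands works:

python3 -m pip install mitmproxy
pip3 install mitmproxy

Alternatively, install it with Homebrew, the macOS package manager, if you have it: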

brew install mitmproxy

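mitmproxy ships three commands that differ in how they present the proxy:

  • mitmproxy provides an interactive console interface
  • mitmdump provides plain terminal output
  • mitmweb provides a browser-based interface

Environment setup and startup options for these commands are easy to find online. Apart from one quick check with mitmdump, this article does not rely on them; instead it imports mitmproxy as a Python package and builds on its classes for custom development.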

mitmproxy exposes many APIs that can be used to change its behavior.
The simplest use is to print the flow and see how it is structured.

class Counter:
    def request(self,flow):
        print(flow)

addons=[Counter()]

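Note that this script cannot be started with python directly yet: it only defines an addon, with no options configured and no proxy master to run it, so it has to be loaded by mitmproxy.
Run the command mitmdump -s addons.py

Why does the hook take a flow argument? It is a mitmproxy object of type http.HTTPFlow, playing a role similar to Django's request. Printing it shows how a flow is structured: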


  server_conn = >

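The printed output does not expose the request and response details directly, but it does tell us that a flow carries two attributes we care about:
request and response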
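
To make these two attributes concrete, here is a minimal sketch of an addon that prints a one-line summary per flow instead of the whole object. It only reads attributes that also appear later in this article (request.method, request.url, response.status_code). The names summarize and FlowSummary are illustrative and not part of mitmproxy.

```python
def summarize(flow):
    """Return a one-line summary of a flow; the response may not exist yet."""
    request = flow.request
    line = f"{request.method} {request.url}"
    response = getattr(flow, "response", None)
    if response is not None:
        line += f" -> {response.status_code}"
    return line


class FlowSummary:
    """Hypothetical addon: prints a summary each time a hook fires."""

    def request(self, flow):
        # Fires once the full request has been read; no response exists yet.
        print("request: ", summarize(flow))

    def response(self, flow):
        # Fires once the full response has been read.
        print("response:", summarize(flow))


addons = [FlowSummary()]
```

Saved under a file name of your choice, it can be loaded the same way as the script above, with mitmdump -s followed by that file name.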

Now that we know what a flow looks like, we can move on to building on top of it.

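How do we build on it?
  1. First, we want to start mitmproxy from Python. The reason is simply to lay the groundwork for wrapping it behind an API later.
    The code below follows an example from another developer; I still have not worked out why each step is needed.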
from mitmproxy import proxy, options
from mitmproxy.tools.dump import DumpMaster

class Counter:
    def request(self, flow):
        print(flow)

def start():
    myaddon = Counter()
    # Listen on port 8090 instead of mitmproxy's default 8080
    opts = options.Options(listen_port=8090)
    # proxy.config / proxy.server belong to older mitmproxy releases
    # (this style predates the 7.x proxy rewrite)
    pconf = proxy.config.ProxyConfig(opts)
    m = DumpMaster(opts)
    m.server = proxy.server.ProxyServer(pconf)
    m.addons.add(myaddon)
    try:
        m.run()
    except KeyboardInterrupt:
        m.shutdown()

if __name__ == '__main__':
    start()

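The script can now be started with python addons.py.

On a successful start, the terminal prints

Proxy server listening at http://*:8090

which means the proxy is running on port 8090, the port set through listen_port (mitmproxy's own default is 8080). Once a client is pointed at the proxy, HTTP traffic is captured normally.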
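
For a quick check without configuring a phone or browser, a client can be pointed at the proxy from Python itself. The following is a minimal sketch using only the standard library; the helper names build_proxy_opener and fetch_status are made up for illustration, and the host and port assume the proxy started above is running locally on 8090.

```python
import urllib.request


def build_proxy_opener(host="127.0.0.1", port=8090):
    """Build a urllib opener that routes HTTP and HTTPS traffic through the proxy."""
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(url, opener):
    """Fetch a URL through the given opener and return the HTTP status code."""
    with opener.open(url, timeout=10) as resp:
        return resp.status
```

With the proxy running, calling fetch_status("http://example.com/", build_proxy_opener()) should make the request appear in the proxy's terminal output. HTTPS URLs are expected to fail certificate verification until the mitmproxy certificate is trusted.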

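  2. Next, we want to capture HTTPS traffic as well.
  • This is simple: mitmproxy serves its CA certificate at http://mitm.it/ (the page is reachable only from a device that is already going through the proxy).
  • The download is a .cer file that can be installed and trusted directly, which avoids the problem with Charles's .pem file not being supported on systems such as OPPO and vivo phones.
  3. Then, we want to store the flow data in MongoDB. MongoDB stores records as schemaless documents, so capture entries with differing fields can be written and read back quickly, which helps avoid lag and stutter when the captured data is displayed.
    1) Connecting to MongoDB from Python requires the pymongo package
    2) For how to start MongoDB, a web search will do; the suggested command is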

sudo ./mongod --dbpath ~/data

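  4. Finally, we need to store each flow using only the fields we care about. The flow cannot be stored directly: insert_one expects a dict-like document, and an HTTPFlow object is not one.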
from mitmproxy import proxy, options
from mitmproxy.tools.dump import DumpMaster
import time
import pymongo

class Counter:
    def __init__(self):
        client = pymongo.MongoClient('mongodb://localhost:27017/')
        # Database (MongoDB creates it lazily on the first insert)
        mydb = client["mitmproxy"]
        # Collection that holds every captured flow
        self.mycol = mydb["all_capture"]

    def response(self, flow):
        flow_request = flow.request
        flow_response = flow.response
        # Both timestamps come from the request, so duration is the time taken
        # to read the request, not the full round trip
        start_ms = int(round(flow_request.timestamp_start * 1000))
        end_ms = int(round(flow_request.timestamp_end * 1000))
        # Keep only the fields we need
        response_data = {
            # Headers and query are multi-dict objects; store them as plain dicts
            'request_headers': dict(flow_request.headers),
            'host': flow_request.host,
            'url': flow_request.url,
            'path': flow_request.path,
            'body': flow_request.text,
            'query': dict(flow_request.query),
            'method': flow_request.method,
            'protocol': flow_request.scheme,
            'timestamp_start': start_ms,
            'time_start': time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(flow_request.timestamp_start))),
            'timestamp_end': end_ms,
            'duration': str(end_ms - start_ms) + ' ms',
            'response_headers': dict(flow_response.headers),
            'status': flow_response.status_code,
            'response': flow_response.text,
            # A response without a body has raw_content set to None
            'size': str(len(flow_response.raw_content or b'')) + ' B'
        }
        # Insert the filtered record for each flow into MongoDB
        self.mycol.insert_one(response_data.copy())

def start():
    myaddon = Counter()
    # Configure the listening port
    opts = options.Options(listen_port=8090)
    # proxy.config / proxy.server belong to older mitmproxy releases
    pconf = proxy.config.ProxyConfig(opts)
    m = DumpMaster(opts)
    m.server = proxy.server.ProxyServer(pconf)
    m.addons.add(myaddon)
    try:
        m.run()
    except KeyboardInterrupt:
        m.shutdown()

if __name__ == '__main__':
    start()
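
The derived fields in the addon above (timestamp_start, time_start, timestamp_end, duration, size) are plain arithmetic on epoch timestamps, so they can be pulled out and checked without a running proxy. The sketch below shows one way to do that; the function names timing_fields and size_field are hypothetical helpers, not part of mitmproxy.

```python
import time


def timing_fields(timestamp_start, timestamp_end):
    """Derive the display fields from two epoch timestamps given in seconds."""
    start_ms = int(round(timestamp_start * 1000))
    end_ms = int(round(timestamp_end * 1000))
    return {
        "timestamp_start": start_ms,
        # Human-readable start time in the local time zone
        "time_start": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(int(timestamp_start))),
        "timestamp_end": end_ms,
        "duration": f"{end_ms - start_ms} ms",
    }


def size_field(raw_content):
    """Format a body size in bytes; a missing body counts as 0 bytes."""
    return f"{len(raw_content or b'')} B"
```

Passing the request's own start and end timestamps reproduces what the addon stores. Passing the response's timestamp_end as the second argument instead would, under the assumption that a full round-trip time is what you want, measure the whole exchange.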

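1) mitmproxy exposes HTTP, WebSocket, and TCP events. This round of development only deals with the HTTP events.
2) The HTTP events are exposed as hooks; for details see https://docs.mitmproxy.org/stable/addons-events/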

"""HTTP-specific events."""
import mitmproxy.http
class Events:
    def http_connect(self, flow: mitmproxy.http.HTTPFlow):
        """
            An HTTP CONNECT request was received. Setting a non 2xx response on
            the flow will return the response to the client and abort the
            connection. CONNECT requests and responses do not generate the usual
            HTTP handler events. CONNECT requests are only valid in regular and
            upstream proxy modes.
        """
        
    def requestheaders(self, flow: mitmproxy.http.HTTPFlow):
        """
            HTTP request headers were successfully read. At this point, the body
            is empty.
        """

    def request(self, flow: mitmproxy.http.HTTPFlow):
        """
            The full HTTP request has been read.
        """

    def responseheaders(self, flow: mitmproxy.http.HTTPFlow):
        """
            HTTP response headers were successfully read. At this point, the body
            is empty.
        """

    def response(self, flow: mitmproxy.http.HTTPFlow):
        """
            The full HTTP response has been read.
        """

    def error(self, flow: mitmproxy.http.HTTPFlow):
        """
            An HTTP error has occurred, e.g. invalid server responses, or
            interrupted connections. This is distinct from a valid server HTTP
            error response, which is simply a response with an HTTP error code.
        """

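3) The response hook is used mainly because it fires once the response body has been read, so both the request and the response are available on the flow. The request hook, which fires at the request stage, will be used later.
4) Which fields a flow has is best checked in the mitmproxy documentation; here they were taken from types.py: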

valid_prefixes = [
        "request.method",
        "request.scheme",
        "request.host",
        "request.http_version",
        "request.port",
        "request.path",
        "request.url",
        "request.text",
        "request.content",
        "request.raw_content",
        "request.timestamp_start",
        "request.timestamp_end",
        "request.header[",

        "response.status_code",
        "response.reason",
        "response.text",
        "response.content",
        "response.timestamp_start",
        "response.timestamp_end",
        "response.raw_content",
        "response.header[",
    ]

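Pick the fields you need. Here timestamp_start, time_start, timestamp_end, duration, and size are derived values, added to make each captured request easier to read at a glance.
5) The MongoDB calls, such as insert_one and delete_one, should be wrapped in helper methods as groundwork for reading the data back later.
6) Why copy()? insert_one adds an _id key to the dict it is given when none is present. If the same dict object were inserted a second time, it would still carry the old _id and the insert would fail with a duplicate key error. copy() makes a shallow copy, so a fresh dict is passed to insert_one and receives its own _id while the original stays untouched.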
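
Points 5) and 6) can be sketched together as a thin wrapper. This is only one possible shape, not the implementation used by the service: the class name CaptureStore and its method names are hypothetical, and the wrapper assumes it is handed an object with pymongo's Collection methods insert_one, find, and delete_one.

```python
class CaptureStore:
    """Hypothetical thin wrapper around a pymongo-style collection."""

    def __init__(self, collection):
        # `collection` is expected to expose insert_one, find and delete_one,
        # as pymongo's Collection does.
        self._collection = collection

    def save(self, record):
        # Insert a shallow copy so the caller's dict is not given an `_id` key.
        return self._collection.insert_one(dict(record)).inserted_id

    def find_by_host(self, host):
        # Materialize the cursor so callers get a plain list.
        return list(self._collection.find({"host": host}))

    def delete_by_id(self, record_id):
        # Returns how many documents were removed (0 or 1).
        return self._collection.delete_one({"_id": record_id}).deleted_count
```

In the addon, the store could be built once in __init__ from the all_capture collection and then called from the response hook in place of the direct insert_one call.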

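At this point the basic processing of mitmproxy capture data is done, and the stored records are much easier to read than the raw flow.
Capture processing alone is not enough to make a service, though. We would like it to be as feature-rich as Charles, especially Map Remote, Map Local, and Rewrite, so the next iterations continue from here.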
