Hands-on: Reverse-Engineering Microsoft New Bing (ChatGPT) with a Crawler

Introducing gospider

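gospider is a Go crawling toolkit with several built-in modules for getting past anti-crawler measures, which makes it a handy package for writing crawlers in Go.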

Installation

go get -u gitee.com/baixudong/gospider

Gitee repository

https://gitee.com/baixudong/gospider

GitHub repository

https://github.com/baixudong007/gospider

Reverse-engineering New Bing

Find the WebSocket address by capturing the traffic

[Figure 1: packet capture showing the WebSocket address]

Analyze the data sent over the WebSocket

[Figure 2: WebSocket messages sent by the client]

As soon as the connection opens, the client sends three text messages in a row. The first is

{"protocol":"json","version":1}

The second is

{"type":6}

The third is

{"arguments":[{"source":"cib","optionsSets":["nlu_direct_response_filter","deepleo","disable_emoji_spoken_text","responsible_ai_policy_235","enablemm","dlislog","dloffstream","dv3sugg","harmonyv3"],"allowedMessageTypes":["Chat","InternalSearchQuery","InternalSearchResult","InternalLoaderMessage","RenderCardRequest","AdsQuery","SemanticSerp"],"sliceIds":["0113dllog","216dloffstream"],"traceId":"63f8b16700104c9db4609875735e3f12","isStartOfSession":true,"message":{"locale":"ru-RU","market":"ru-RU","region":"US","location":"lat:47.639557;long:-122.128159;re=1000m;","locationHints":[{"country":"United States","state":"Pennsylvania","city":"Pittsburgh","zipcode":"15211","timezoneoffset":-5,"dma":508,"countryConfidence":9,"cityConfidence":5,"Center":{"Latitude":40.4393,"Longitude":-80.0213},"RegionType":2,"SourceType":1}],"timestamp":"2023-02-24T20:45:32+08:00","author":"user","inputMethod":"Keyboard","text":"你好","messageType":"Chat"},"conversationSignature":"Qf1G4jxAl5c2h1frcuYraT8+4L4f9IfjvARi+xT+IoI=","participant":{"id":"844428589040485"},"conversationId":"51D|BingProd|36AD9D4018B442F39EC6E4055099E8E728B019942E0F02A1C99016EA127A3E0E"}],"invocationId":"0","target":"chat","type":4}

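The first two messages are simple: they appear to be the SignalR JSON hub protocol handshake and a ping message (type 6). The third carries many parameters that are not needed; trimmed down, it becomes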

{"arguments":[{"conversationId":conversationId,"source":"cib","isStartOfSession":isStartOfSession,"message":{"text":text,"messageType":"Chat"},"conversationSignature":conversationSignature,"participant":{"id":clientId}}],"invocationId":"1","target":"chat","type":4}

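The third message contains five variables:

  1. conversationId: obtained from /turing/conversation/create
  2. conversationSignature: obtained from /turing/conversation/create
  3. clientId: obtained from /turing/conversation/create
  4. isStartOfSession: whether this is the first question of the conversation
  5. text: the question to ask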

Next, obtain conversationId, conversationSignature and clientId. The packet capture shows where they come from:

[Figure 3: packet capture of the /turing/conversation/create request]
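Send this request with your login credentials and the response contains all three values; there is nothing tricky here. The login cookie is _U, and that cookie alone is enough.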
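For readers who prefer the standard library over gospider, the same step can be sketched with net/http. This assumes, as stated above, that the _U cookie alone is sufficient and that the JSON response carries the conversationId, clientId and conversationSignature fields; the helper names are hypothetical, and Bing may check other headers that this sketch does not send.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// createURL is the endpoint found in the packet capture.
const createURL = "https://www.bing.com/turing/conversation/create"

// createResponse keeps only the three fields this step needs.
type createResponse struct {
	ConversationID        string `json:"conversationId"`
	ClientID              string `json:"clientId"`
	ConversationSignature string `json:"conversationSignature"`
}

// newCreateRequest builds the GET request, attaching only the _U login cookie.
func newCreateRequest(uCookie string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, createURL, nil)
	if err != nil {
		return nil, err
	}
	req.AddCookie(&http.Cookie{Name: "_U", Value: uCookie})
	return req, nil
}

// parseCreateResponse decodes the body and rejects responses missing any field.
func parseCreateResponse(body []byte) (createResponse, error) {
	var r createResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return r, err
	}
	if r.ConversationID == "" || r.ClientID == "" || r.ConversationSignature == "" {
		return r, fmt.Errorf("incomplete create response: %s", body)
	}
	return r, nil
}

// fetchConversation performs the request over the network and parses the result.
func fetchConversation(client *http.Client, uCookie string) (createResponse, error) {
	req, err := newCreateRequest(uCookie)
	if err != nil {
		return createResponse{}, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return createResponse{}, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return createResponse{}, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return createResponse{}, err
	}
	return parseCreateResponse(body)
}

func main() {
	// Offline demo: show the Cookie header that would be sent.
	req, err := newCreateRequest("your-_U-cookie")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Cookie"))
	// Real use (needs a valid cookie and network access):
	// conv, err := fetchConversation(http.DefaultClient, "your-_U-cookie")
}
```

Splitting request construction and parsing out of fetchConversation lets both be checked without touching the network.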

All the parameters are now accounted for, so let's walk through it in code.

First, get conversationId, conversationSignature and clientId:

	// Create a gospider client (":=" because reqCli and err are declared here)
	reqCli, err := requests.NewClient(nil)
	if err != nil {
		log.Panic(err)
	}
	// Request a new conversation; only the _U login cookie is required
	response, err := reqCli.Request(nil, "get", "https://www.bing.com/turing/conversation/create", requests.RequestOption{
		Cookies: "_U=<your _U cookie>",
	})
	if err != nil {
		log.Panic(err)
	}
	// Read the three values from the JSON response
	jsonData := response.Json()
	conversationId := jsonData.Get("conversationId").String()
	clientId := jsonData.Get("clientId").String()
	conversationSignature := jsonData.Get("conversationSignature").String()

Send the WebSocket messages

	// Open the ChatHub WebSocket
	wsResp, err := reqCli.Request(nil, "get", "wss://sydney.bing.com/sydney/ChatHub")
	if err != nil {
		log.Panic(err)
	}
	wsCli := wsResp.WebSocket()
	// Message 1: protocol handshake; every message ends with 0x1e
	if err = wsCli.Send(context.TODO(), websocket.MessageText, append(tools.StringToBytes(`{"protocol":"json","version":1}`), 0x1e)); err != nil {
		log.Panic(err)
	}
	// Message 2: {"type":6}
	if err = wsCli.Send(context.TODO(), websocket.MessageText, append(tools.StringToBytes(`{"type":6}`), 0x1e)); err != nil {
		log.Panic(err)
	}
	// Message 3: the trimmed chat request; isStartOfSession and text are
	// the variables described above
	data := map[string]any{
		"arguments": []map[string]any{
			{
				"source":           "cib",
				"isStartOfSession": isStartOfSession,
				"message": map[string]any{
					"text":        text,
					"messageType": "Chat",
				},
				"conversationSignature": conversationSignature,
				"participant": map[string]any{
					"id": clientId,
				},
				"conversationId": conversationId,
			},
		},
		"invocationId": "1",
		"target":       "chat",
		"type":         4,
	}
	if err = wsCli.Send(context.TODO(), websocket.MessageText, append(tools.StringToBytes(tools.Any2json(data).Raw), 0x1e)); err != nil {
		log.Panic(err)
	}

Receive Microsoft's answer

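	// The answer arrives as cumulative updates; offset counts how many
	// runes have already been printed
	var offset int
	run := true
	for run {
		msgType, msgCon, err := wsCli.Recv(context.TODO())
		if err != nil {
			log.Panic(err)
		}
		if msgType != websocket.MessageText {
			continue
		}
		// One frame may hold several JSON records, each ending in 0x1e
		// (needs import "bytes")
		for _, record := range bytes.Split(msgCon, []byte{0x1e}) {
			if len(record) == 0 {
				continue
			}
			msgData := tools.Any2json(record)
			switch msgData.Get("type").Int() {
			case 1: // partial update: print only the new suffix
				lls := []rune(msgData.Get("arguments.0.messages.0.text").String())
				if offset < len(lls) { // guard against empty or shorter updates
					fmt.Print(string(lls[offset:]))
					offset = len(lls)
				}
			case 2: // final message: the answer is complete
				log.Print(msgData)
				run = false
			}
		}
	}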
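The "print only the new suffix" logic of the receive loop can be isolated into a small helper that is easy to test. This is a standard-library sketch: it decodes records with encoding/json instead of gospider's tools.Any2json but reads the same arguments[0].messages[0].text path, the type names are hypothetical, and the sample records in main are made up to show the shape (real messages carry many more fields).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hubMessage keeps only the fields the receive loop reads.
type hubMessage struct {
	Type      int `json:"type"`
	Arguments []struct {
		Messages []struct {
			Text string `json:"text"`
		} `json:"messages"`
	} `json:"arguments"`
}

// deltaPrinter remembers how many runes of the answer were already printed.
type deltaPrinter struct {
	printed int
}

// next returns the part of the cumulative text that has not been printed yet.
// Offsets count runes, so multi-byte characters are never split. If the text
// is not longer than what was already printed, it returns "".
func (d *deltaPrinter) next(cumulative string) string {
	runes := []rune(cumulative)
	if len(runes) <= d.printed {
		return ""
	}
	delta := string(runes[d.printed:])
	d.printed = len(runes)
	return delta
}

// handleRecord decodes one JSON record and returns the new text to print
// and whether the answer is complete (type 2).
func (d *deltaPrinter) handleRecord(record []byte) (string, bool, error) {
	var m hubMessage
	if err := json.Unmarshal(record, &m); err != nil {
		return "", false, err
	}
	switch m.Type {
	case 1:
		if len(m.Arguments) > 0 && len(m.Arguments[0].Messages) > 0 {
			return d.next(m.Arguments[0].Messages[0].Text), false, nil
		}
	case 2:
		return "", true, nil
	}
	return "", false, nil
}

func main() {
	// Made-up records shaped like the fields the loop reads.
	records := []string{
		`{"type":1,"arguments":[{"messages":[{"text":"你"}]}]}`,
		`{"type":1,"arguments":[{"messages":[{"text":"你好"}]}]}`,
		`{"type":1,"arguments":[{}]}`,
		`{"type":1,"arguments":[{"messages":[{"text":"你好!"}]}]}`,
		`{"type":2}`,
	}
	var d deltaPrinter
	for _, r := range records {
		out, done, err := d.handleRecord([]byte(r))
		if err != nil {
			panic(err)
		}
		fmt.Print(out)
		if done {
			fmt.Println()
			break
		}
	}
}
```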

With that, you can chat with New Bing from your own code.

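Notes

  1. isStartOfSession is true for the first question of a conversation and false for every question after it.
  2. Every text message you send must end with the character 0x1e (the ASCII record separator).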
