Scraping Pinduoduo's Daily Keywords with CefSharp

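I have recently been spending my spare time learning to build a local desktop client with CefSharp, and found that CefSharp also makes it easy to scrape information from some websites. This post records part of that scraping process.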

Analyzing the Pinduoduo product search request

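The link for Pinduoduo product search is: Pinduoduo. When CefSharp issues the requests, we add logging that prints information about each one. Analyzing the log shows that the keyword data arrives in a single HTTP resource request whose response MimeType is application/json. The code snippet and the matching log line are shown below.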

protected override IResponseFilter GetResourceResponseFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
{
    logger.Debug(" request_url=" + request.Url + ";request_id=" + request.Identifier + ";response_MimeType=" + response.MimeType + ";response_charset=" + response.Charset + ";response_status=" + response.StatusText);

    return base.GetResourceResponseFilter(chromiumWebBrowser, browser, frame, request, response);
}

2022-07-09 09:46:18.6335 DEBUG 20076-12 Chrome.MyChrome.CefHandlers.MyResourceRequestHandler.GetResourceResponseFilter  request_url=https://mobile.yangkeduo.com/proxy/api/search_hotquery?pdduid=0&plat=h5&source=index;request_id=759816;response_MimeType=application/json;response_charset=utf-8;response_status=
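The log line above suggests one way to recognize the keyword response: match the URL path and the MIME type. Below is a minimal sketch of such a check. The class name `HotQueryMatcher` and the method `IsHotQueryResponse` are hypothetical helpers introduced here for illustration, and the path is copied from the logged URL, so it is an assumption that may stop holding if the site changes its API.

```csharp
using System;

public static class HotQueryMatcher
{
    // Path copied from the logged request URL; assumed stable for this sketch.
    private const string HotQueryPath = "/proxy/api/search_hotquery";

    // Returns true when the URL path and MIME type look like the keyword response.
    public static bool IsHotQueryResponse(string url, string mimeType)
    {
        if (string.IsNullOrEmpty(url) || string.IsNullOrEmpty(mimeType))
        {
            return false;
        }

        // Parse the URL so the query string (pdduid, plat, source) is ignored.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
        {
            return false;
        }

        return uri.AbsolutePath.Equals(HotQueryPath, StringComparison.OrdinalIgnoreCase)
            && mimeType.StartsWith("application/json", StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside a handler, this could be called as `HotQueryMatcher.IsHotQueryResponse(request.Url, response.MimeType)` so that only the one interesting response is buffered and parsed.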
             

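The resource downloader I uploaded can also load the corresponding JSON file: "A simple URL resource downloader built with CefSharp and Vue 3" (C# documentation resources, CSDN download).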


Parsing the product keywords

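Once we know which HTTP request carries the keywords, we can intercept that resource request in C# and analyze its response. The key steps are as follows.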

Override the GetResourceRequestHandler method of the CefSharp.Handler.RequestHandler class so that it returns a custom resource request handler.

protected override IResourceRequestHandler GetResourceRequestHandler(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, bool isNavigation, bool isDownload, string requestInitiator, ref bool disableDefaultHandling)
{
    chrome.Logger.Debug("request_url=" + request.Url + ";request_id=" + request.Identifier + ";TransitionType=" + request.TransitionType + ";ReferrerUrl=" + request.ReferrerUrl + ";Method=" + request.Method + ";IsReadOnly=" + request.IsReadOnly + ";isNavigation=" + isNavigation + ";isDownload=" + isDownload + ";requestInitiator=" + requestInitiator);
    return new MyResourceRequestHandler(webRequest);
}

Override the GetResourceResponseFilter method of the CefSharp.Handler.ResourceRequestHandler class and pass in a custom Stream.

// Field on the handler: receives a copy of the response body as it streams in.
Stream DataStream = new MemoryStream();

protected override IResponseFilter GetResourceResponseFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
{
    logger.Debug(" request_url=" + request.Url + ";request_id=" + request.Identifier + ";response_MimeType=" + response.MimeType + ";response_charset=" + response.Charset + ";response_status=" + response.StatusText);
    // StreamResponseFilter writes the response data into DataStream while passing it through unchanged.
    return new CefSharp.ResponseFilter.StreamResponseFilter(DataStream);
}

Override the OnResourceLoadComplete method of the CefSharp.Handler.ResourceRequestHandler class to parse the data stored in the Stream.

protected override void OnResourceLoadComplete(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response, UrlRequestStatus status, long receivedContentLength)
{
    logger.Debug("request_url=" + request.Url + ";request_id=" + request.Identifier + ";request_status=" + status + ";recv_length=" + receivedContentLength);

    // Read back everything the response filter wrote for this request.
    var ms = DataStream as MemoryStream;
    if (ms == null)
    {
        return;
    }
    var bytes = ms.ToArray();

    // Decode using the charset reported by the response; anything else falls back to UTF-8.
    string charset = response.Charset ?? "";
    string Response2String = "";
    if (charset.IndexOf("utf-8", System.StringComparison.OrdinalIgnoreCase) >= 0)
    {
        Response2String = System.Text.Encoding.UTF8.GetString(bytes);
    }
    else if (charset.IndexOf("gbk", System.StringComparison.OrdinalIgnoreCase) >= 0)
    {
        Response2String = System.Text.Encoding.GetEncoding("GB2312").GetString(bytes);
    }
    else
    {
        Response2String = System.Text.Encoding.UTF8.GetString(bytes);
        logger.Error("unknown_charset Charset=" + charset);
    }

    // Only parse JSON responses; HTML, scripts, and images would make JObject.Parse throw.
    if (string.Equals(response.MimeType, "application/json", System.StringComparison.OrdinalIgnoreCase))
    {
        JObject jsonObj = JObject.Parse(Response2String);
        logger.Debug("parse_json_success json_str=" + jsonObj["items"]?.ToString());
    }
}
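The charset branching in the handler above could also be expressed as a small standalone helper. The following is only a sketch of that idea, not the code used in this project: `BodyDecoder` and `Decode` are hypothetical names, and it asks .NET to resolve the charset name directly instead of matching substrings. It assumes UTF-8 is an acceptable fallback when the charset is missing or unrecognized. On .NET Core and .NET 5+, legacy encodings such as GBK resolve only after `CodePagesEncodingProvider.Instance` has been registered; without that registration this sketch simply falls back to UTF-8.

```csharp
using System;
using System.Text;

public static class BodyDecoder
{
    // Decodes a response body using the reported charset name.
    // Falls back to UTF-8 when the name is empty or not known to the runtime.
    public static string Decode(byte[] bytes, string charset)
    {
        if (bytes == null || bytes.Length == 0)
        {
            return string.Empty;
        }

        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrWhiteSpace(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset.Trim());
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the UTF-8 default (an assumption of this sketch).
            }
        }

        return encoding.GetString(bytes);
    }
}
```

With such a helper, the body of OnResourceLoadComplete would reduce to something like `BodyDecoder.Decode(ms.ToArray(), response.Charset)` followed by the JSON parsing step.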

For JSON handling I chose JObject from Newtonsoft.Json.Linq, which lets the response be parsed dynamically without defining classes for it.
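As a rough illustration of this kind of dynamic parsing, the sketch below pulls each element out of the top-level "items" array as compact JSON text. It requires the Newtonsoft.Json package. `HotQueryParser` and `ExtractItems` are hypothetical names, and since the structure of each item is not shown in this post, the sketch deliberately does not assume any field names inside an item.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class HotQueryParser
{
    // Returns every element of the top-level "items" array as compact JSON text.
    // Returns an empty list when the body is empty, is not a JSON object,
    // or has no "items" array.
    public static List<string> ExtractItems(string json)
    {
        var result = new List<string>();
        if (string.IsNullOrWhiteSpace(json))
        {
            return result;
        }

        JObject root;
        try
        {
            root = JObject.Parse(json);
        }
        catch (JsonReaderException)
        {
            return result;
        }

        if (root["items"] is JArray items)
        {
            foreach (JToken item in items)
            {
                result.Add(item.ToString(Formatting.None));
            }
        }

        return result;
    }
}
```

For example, calling `HotQueryParser.ExtractItems(Response2String)` on the decoded body would give one string per keyword entry, which can then be logged or inspected to find the fields of interest.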

 

 
