Capturing WeChat Official Account Data with Fiddler

 

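This post records the full process I followed: using the Fiddler packet-capture tool to capture the requests made by a WeChat official account, and then parsing the captured data.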

 

Preparation:

Download: Fiddler_5.0.20173.49666_Setup.exe

Official site: https://www.telerik.com/download/fiddler

 

 

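1. Install Fiddler_5.0.20173.49666_Setup.exe. Installation is straightforward; once started, Fiddler looks like the figure below:

[Figure 1: Fiddler main window after installation]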

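2. Generate the certificate file FiddlerRoot.cer

     In the menu bar, choose [Tools] -> [Options] -> [HTTPS] and tick the options shown in the figure below.

[Figure 2: options to tick on the HTTPS tab]

    Then click [Actions] and choose to export the certificate to the desktop.

[Figure 3: exporting the root certificate from the Actions menu]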

3. Install the certificate manually

   The Fiddler directory contains makecert.exe. Create myTest.bat in that directory with the following content, then run it:

makecert.exe -r -ss my -n "CN=DO_NOT_TRUST_FiddlerRoot, O=DO_NOT_TRUST, OU=Created by http://www.fiddler2.com" -sky signature -eku 1.3.6.1.5.5.7.3.1 -h 1 -cy authority -a sha1 -m 120 -b 09/05/2012

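4. Capture the official account data I want
  a. How it works: Fiddler provides two script hooks, one that runs before a request is sent and one that runs before a response is returned to the client:

OnBeforeRequest(), OnBeforeResponse()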

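  b. Configure the capture rules

     Choose [Rules] -> [Customize Rules] from the menu; after a restart you reach the screen shown in the figure below.

[Figure 4: the Fiddler rules script editor]

     Modify OnBeforeRequest()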

 if (oSession.fullUrl.Contains("mp.weixin.qq.com"))
 {
     var fso;
     var file;
     fso = new ActiveXObject("Scripting.FileSystemObject");
     // Output file path; change as needed
     file = fso.OpenTextFile("c:\\Sessions.txt",8 ,true, true);
     file.writeLine("Request url: " + oSession.url);
     file.writeLine("Request header:" + "\n" + oSession.oRequest.headers);
     file.writeLine("Request body: " + oSession.GetRequestBodyAsString());
     file.writeLine("\n");
     file.close();
 }

   Modify OnBeforeResponse()

 if (oSession.fullUrl.Contains("weixin/searchShiFu.php"))
 {
     // Decode (unzip/unchunk) the response first so the saved body is not garbled
     oSession.utilDecodeResponse();
     var fso;
     var file;
     fso = new ActiveXObject("Scripting.FileSystemObject");
     // Output file path; change as needed
     file = fso.OpenTextFile("d:\\Response.txt", 8, true, true);
     //file.writeLine("Response code: " + oSession.responseCode);
     file.writeLine("Response body: " + oSession.GetResponseBodyAsString());
     file.writeLine("\n");
     file.close();
 }

   Save and exit, then restart Fiddler for the rules to take effect.

5. Parse the captured content

   a. The captured response data is written to d:\\Response.txt. Its content looks like this:

Response body: {"nickname":"秦人","totalTimes":33,"todayTimes":2,"total":598,"thisNum":8,"yuanjin":1,"oneSF":[{"juli":"2.1 公里","name":"马帅军","phone":"15529016011","address":"陕西西安未央区建章路","longitude_S":"108.848384","latitude_S":"34.318548","jianjie":"工龄4年。     施工20~25元/卷","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"12527","weixin":"0","headimgurl":""},{"juli":"2.4 公里","name":"张帅","phone":"13571547952","address":"陕西西安施工范围。全。","longitude_S":"108.893215","latitude_S":"34.332735","jianjie":"无妨壁纸25元一卷。长纤,蚕丝等30元一卷。壁画20元一平。壁布10元一平。","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"14235","weixin":"1","headimgurl":"http://wx.qlogo.cn/mmopen/PiajxSqBRaEJkMDHthV4HGnCWtEk7TCTvDOQUId5uvHaOZkzxN8nRJv8C7YicFia8KibNhvyjW0NL6WiboPhw6X6VqA/64"},{"juli":"2.9 公里","name":"黄师傅","phone":"13289380958","address":"西安西安市","longitude_S":"108.894098","latitude_S":"34.305800","jianjie":"工龄:6年,本人从事墙纸粘贴行业已有6年,积累了丰富的墙纸施工方面的经验和技术能对各种高中","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"2248","weixin":"0","headimgurl":""},{"juli":"3 公里","name":"尚俊","phone":"18802920027","address":"陕西西安未央区汉城街办西查村","longitude_S":"108.900890","latitude_S":"34.331328","jianjie":"6年工龄,无纺纸20其他25。团队6人,随时准备24小时为您服务。","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"13814","weixin":"0","headimgurl":""},{"juli":"3.2 公里","name":"刘小虎","phone":"18710629117","address":"陕西咸阳武功县普集镇令新村","longitude_S":"108.904032","latitude_S":"34.316132","jianjie":"","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"11609","weixin":"0","headimgurl":""},{"juli":"3.6 公里","name":"张跃武","phone":"13772014639","address":"陕西西安莲湖区莲湖区邓家村小学","longitude_S":"108.880513","latitude_S":"34.292046","jianjie":"贴了6年壁纸,工费是根据纸的材料而定,合作搭档两人,","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"5794","weixin":"0","headimgurl":""},{"juli":"3.6 
公里","name":"小何","phone":"15291480050","address":"陕西西安凤城四路","longitude_S":"108.909152","latitude_S":"34.324399","jianjie":"","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"13726","weixin":"0","headimgurl":""},{"juli":"3.6 公里","name":"魏师傅","phone":"15291814440","address":"西安西安市","longitude_S":"108.904098","latitude_S":"34.306800","jianjie":"工龄:7年,专业","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"2241","weixin":"0","headimgurl":""}]}

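   My goal is to extract the name, phone and address fields from the response data above.

   Note: the generated Response.txt is encoded as UCS-2 little endian by default (the fourth argument of OpenTextFile requests Unicode output); the corresponding charset name in Java is UTF-16LE.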
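
One detail worth knowing about this encoding: a Unicode file written this way typically starts with a byte order mark (FF FE), and Java's UTF-16LE decoder does not remove it but returns it as the character U+FEFF. The following is a minimal sketch of decoding such bytes and dropping the mark; the class name Utf16LeDecoder and the method decode are illustrative names, not part of the original code.

```java
import java.nio.charset.StandardCharsets;

public class Utf16LeDecoder {

    /** Decodes UTF-16LE bytes and drops a leading byte order mark, if present. */
    public static String decode(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_16LE);
        // The UTF-16LE charset keeps the BOM as U+FEFF, so strip it by hand.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // "hi" in UTF-16LE, preceded by the FF FE byte order mark.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(decode(sample)); // prints hi
    }
}
```

Without this step the first line read from the file begins with an invisible extra character, which matters whenever a fixed-length prefix is cut from that line.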

 

   b. Parse the content of Response.txt and write the name, phone and address fields to D:\\Handle_Response.txt

     I implemented the parsing in Java; the code is as follows:

package com.wang.readText;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class ReadTextUtils {

    private static List<UserInfo> resultList = new ArrayList<>();
    private static String SRC_PATH = "D:/Response.txt";
    private static String OUT_PATH = "D:/Handle_Response.txt";

    public static void main(String[] args) {
        // args[0]: input file written by Fiddler, args[1]: output file
        String srcPath = args[0];
        String outPath = args[1];
//      String srcPath = SRC_PATH;
//      String outPath = OUT_PATH;
        readTxtContent(srcPath);
        writeTxtContent(outPath);
    }

    public static void readTxtContent(String srcPath){

        /* Read the data */
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(new File(srcPath)),"UTF-16LE"));
            String lineTxt = null;
            while ((lineTxt = br.readLine()) != null) {
                if(!"".equals(lineTxt)) {
                    // Cut off the 15-character "Response body: " prefix, leaving the JSON
                    lineTxt = lineTxt.substring(15);
                    JSONObject object = (JSONObject) JSONObject.parse(lineTxt);
                    if(!object.get("oneSF").equals(0)) {
                        JSONArray jsonArray = (JSONArray)object.get("oneSF");
                        String jsonarrayString = jsonArray.toJSONString();
                        List<UserInfo> userList = JSONArray.parseArray(jsonarrayString, UserInfo.class);
                        resultList.addAll(userList);
                        System.out.println("--read line data count---"+userList.size());
                    }
                }
            }
            br.close();
        } catch (Exception e) {
            System.err.println("read errors :" + e);
        }
    }

    // Remove duplicates: entries with the same phone number are kept only once
    public static List<UserInfo> quchongfu() {
        HashMap<String, UserInfo> userMap = new HashMap<>();
        List<UserInfo> userInfoList = new ArrayList<>();
        for(UserInfo userInfo : resultList) {
            userMap.put(userInfo.getPhone(), userInfo);
        }
        Set<String> keySet = userMap.keySet();
        for(String str : keySet) {
            userInfoList.add(userMap.get(str));
        }
        return userInfoList;
    }
    public static void writeTxtContent(String outPath){
        /* Write the data */
        try {
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(outPath)),"UTF-8"));

            bw.write("name\t\t    phone\t\t\t    address");
            bw.newLine();
            for(UserInfo userInfo :quchongfu()){
                bw.write(userInfo.getName()+"\t\t "+userInfo.getPhone()+"\t\t "+userInfo.getAddress());
                bw.newLine();
            }
            bw.close();
        } catch (Exception e) {
            System.err.println("write errors :" + e);
        }
    }


}
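
The listing above depends on a UserInfo class that the post does not show. The following is a minimal sketch of what such a class could look like, assuming it only needs the three fields used for output (name, phone, address); the no-argument constructor and setters are what a JSON mapper such as fastjson would normally rely on. It also shows a variant of the de-duplication step that keeps the first record per phone number and preserves input order, which the HashMap-based version does not guarantee. The names UserInfoSketch and dedupeByPhone are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UserInfoSketch {

    /** Hypothetical POJO; field names mirror the JSON keys in the response. */
    public static class UserInfo {
        private String name;
        private String phone;
        private String address;

        public UserInfo() { }

        public UserInfo(String name, String phone, String address) {
            this.name = name;
            this.phone = phone;
            this.address = address;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getPhone() { return phone; }
        public void setPhone(String phone) { this.phone = phone; }
        public String getAddress() { return address; }
        public void setAddress(String address) { this.address = address; }
    }

    /** Keeps the first record seen for each phone number, in input order. */
    public static List<UserInfo> dedupeByPhone(List<UserInfo> users) {
        Map<String, UserInfo> byPhone = new LinkedHashMap<>();
        for (UserInfo user : users) {
            byPhone.putIfAbsent(user.getPhone(), user);
        }
        return new ArrayList<>(byPhone.values());
    }
}
```

Note the behavioral difference: the original quchongfu() keeps the last record for a repeated phone number, while this sketch keeps the first. Either choice is fine as long as it is made deliberately.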

   The output text looks like the figure below:

[Figure 5: contents of Handle_Response.txt]

  6. One-click data processing:

     a. Package the ReadTextUtils class as an executable jar;

[Figure 6: exporting the runnable jar]

    b. Write a simple runReadTxt.bat file with the following content:

   

@echo off
echo -----------read infos-----------  
java -jar "%cd%\readTxt.jar" "%cd%\Response.txt" "%cd%\Handle_Response.txt"
echo ---------------finish!!!-----------------------------  
PAUSE

   c. That completes the one-click tool. Note: running the jar requires a Java runtime environment to be installed.

[Figure 7: running the tool]
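
The batch file passes the input and output paths as the two program arguments, and main() reads args[0] and args[1] without checking them, so launching the jar with no arguments fails with an ArrayIndexOutOfBoundsException. Below is a small sketch of a more forgiving way to resolve the two paths, falling back to file names in the current directory. The class PathArgs and its defaults are assumptions for illustration, not part of the original tool.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathArgs {

    // Assumed defaults: the same file names the batch file uses.
    static final String DEFAULT_SRC = "Response.txt";
    static final String DEFAULT_OUT = "Handle_Response.txt";

    /** Returns {source, output}; missing arguments fall back to the defaults. */
    public static Path[] resolve(String[] args) {
        String src = args.length > 0 ? args[0] : DEFAULT_SRC;
        String out = args.length > 1 ? args[1] : DEFAULT_OUT;
        return new Path[] {
            Paths.get(src).toAbsolutePath(),
            Paths.get(out).toAbsolutePath()
        };
    }

    public static void main(String[] args) {
        Path[] paths = resolve(args);
        System.out.println("source: " + paths[0]);
        System.out.println("output: " + paths[1]);
    }
}
```

With something like this in place, the jar can also be started by hand from the folder that contains Response.txt, without typing any paths.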

     I hope this helps. Corrections are welcome!

     Reference article: http://blog.csdn.net/lhorse003/article/details/72473212

 
