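The main purpose of this post is to record the full process of using the Fiddler packet-capture tool to capture the requests of a WeChat official account and to parse the captured data.
Preparation:
Download: Fiddler_5.0.20173.49666_Setup.exe
Official site: https://www.telerik.com/download/fiddler
1. Install Fiddler_5.0.20173.49666_Setup.exe. This is straightforward; once opened it looks like the figure below:
2. Generate the certificate file FiddlerRoot.cer
In the menu bar choose 【Tools】->【Options】->【HTTPS】 and tick the options shown in the figure below
Then click 【Actions】 and choose to export the certificate to the desktop
3. Install the certificate manually
The Fiddler directory contains makecert.exe. Create myTest.bat with the following content:
makecert.exe -r -ss my -n "CN=DO_NOT_TRUST_FiddlerRoot, O=DO_NOT_TRUST, OU=Created by http://www.fiddler2.com" -sky signature -eku 1.3.6.1.5.5.7.3.1 -h 1 -cy authority -a sha1 -m 120 -b 09/05/2012
4. Capture the WeChat official account data I want
a. Principle: Fiddler provides one hook that runs before a request is sent and one that runs once the response has arrived:
OnBeforeRequest(), OnBeforeResponse()
b. Configure the capture rules
Choose 【Rules】->【Customize Rules】 in the menu, then restart to reach the interface shown in the figure
Modify OnBeforeRequest()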
if (oSession.fullUrl.Contains("mp.weixin.qq.com"))
{
    var fso;
    var file;
    fso = new ActiveXObject("Scripting.FileSystemObject");
    // Output file path; change it as needed
    // 8 = open for appending, true = create if missing, true = write as Unicode
    file = fso.OpenTextFile("c:\\Sessions.txt", 8, true, true);
    file.writeLine("Request url: " + oSession.url);
    file.writeLine("Request header:" + "\n" + oSession.oRequest.headers);
    file.writeLine("Request body: " + oSession.GetRequestBodyAsString());
    file.writeLine("\n");
    file.close();
}
Modify OnBeforeResponse()
if (oSession.fullUrl.Contains("weixin/searchShiFu.php"))
{
    // Decode the response first so the saved body is not garbled
    oSession.utilDecodeResponse();
    var fso;
    var file;
    fso = new ActiveXObject("Scripting.FileSystemObject");
    // Output file path; change it as needed
    file = fso.OpenTextFile("d:\\Response.txt", 8, true, true);
    //file.writeLine("Response code: " + oSession.responseCode);
    file.writeLine("Response body: " + oSession.GetResponseBodyAsString());
    file.writeLine("\n");
    file.close();
}
Save and exit, then restart Fiddler and the rules take effect.
5. Parse the captured content
a. The captured response data is stored in d:\Response.txt and looks like this:
Response body: {"nickname":"秦人","totalTimes":33,"todayTimes":2,"total":598,"thisNum":8,"yuanjin":1,"oneSF":[{"juli":"2.1 公里","name":"马帅军","phone":"15529016011","address":"陕西西安未央区建章路","longitude_S":"108.848384","latitude_S":"34.318548","jianjie":"工龄4年。 施工20~25元/卷","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"12527","weixin":"0","headimgurl":""},{"juli":"2.4 公里","name":"张帅","phone":"13571547952","address":"陕西西安施工范围。全。","longitude_S":"108.893215","latitude_S":"34.332735","jianjie":"无妨壁纸25元一卷。长纤,蚕丝等30元一卷。壁画20元一平。壁布10元一平。","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"14235","weixin":"1","headimgurl":"http://wx.qlogo.cn/mmopen/PiajxSqBRaEJkMDHthV4HGnCWtEk7TCTvDOQUId5uvHaOZkzxN8nRJv8C7YicFia8KibNhvyjW0NL6WiboPhw6X6VqA/64"},{"juli":"2.9 公里","name":"黄师傅","phone":"13289380958","address":"西安西安市","longitude_S":"108.894098","latitude_S":"34.305800","jianjie":"工龄:6年,本人从事墙纸粘贴行业已有6年,积累了丰富的墙纸施工方面的经验和技术能对各种高中","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"2248","weixin":"0","headimgurl":""},{"juli":"3 公里","name":"尚俊","phone":"18802920027","address":"陕西西安未央区汉城街办西查村","longitude_S":"108.900890","latitude_S":"34.331328","jianjie":"6年工龄,无纺纸20其他25。团队6人,随时准备24小时为您服务。","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"13814","weixin":"0","headimgurl":""},{"juli":"3.2 公里","name":"刘小虎","phone":"18710629117","address":"陕西咸阳武功县普集镇令新村","longitude_S":"108.904032","latitude_S":"34.316132","jianjie":"","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"11609","weixin":"0","headimgurl":""},{"juli":"3.6 公里","name":"张跃武","phone":"13772014639","address":"陕西西安莲湖区莲湖区邓家村小学","longitude_S":"108.880513","latitude_S":"34.292046","jianjie":"贴了6年壁纸,工费是根据纸的材料而定,合作搭档两人,","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"5794","weixin":"0","headimgurl":""},{"juli":"3.6 
公里","name":"小何","phone":"15291480050","address":"陕西西安凤城四路","longitude_S":"108.909152","latitude_S":"34.324399","jianjie":"","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"13726","weixin":"0","headimgurl":""},{"juli":"3.6 公里","name":"魏师傅","phone":"15291814440","address":"西安西安市","longitude_S":"108.904098","latitude_S":"34.306800","jianjie":"工龄:7年,专业","s1":"","s2":"","s3":"","sa":"","p1":"","p2":"","p3":"","pa":"","uid":"2241","weixin":"0","headimgurl":""}]}
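My goal is to extract the name, phone and address fields from the response data above.
Note: the generated Response.txt is encoded as UCS-2 little endian by default; the corresponding charset name in Java is UTF-16LE.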
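The encoding note above can be illustrated with a small, dependency-free sketch. It assumes the file may start with a byte-order mark, which Java's UTF-16LE decoder passes through as the character U+FEFF rather than removing it, and that each payload line starts with the "Response body: " prefix written by the Fiddler rule. The class and method names (ResponseFileReader, stripBom, readBodies) are hypothetical and not part of the original tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original post.
final class ResponseFileReader {
    // Prefix written by the OnBeforeResponse() rule before each JSON body
    private static final String PREFIX = "Response body: ";

    /** Removes a leading byte-order mark (U+FEFF) if present. */
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    /** Returns the payload of every "Response body: " line in a UTF-16LE file. */
    static List<String> readBodies(Path file) throws IOException {
        List<String> bodies = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_16LE)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Strip the BOM first, then surrounding whitespace
                line = stripBom(line).trim();
                if (line.startsWith(PREFIX)) {
                    bodies.add(line.substring(PREFIX.length()));
                }
            }
        }
        return bodies;
    }
}
```

Matching on the prefix instead of cutting a fixed number of characters keeps the result the same whether or not the first line carries a BOM, and blank separator lines are skipped automatically.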
b. Parse the contents of Response.txt and write the name, phone and address fields to D:\Handle_Response.txt
I implemented the text parsing in Java; the code is as follows:
package com.wang.readText;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

// UserInfo is not shown in this post; the code relies on its getName(), getPhone() and getAddress() getters.
public class ReadTextUtils {
    private static List<UserInfo> resultList = new ArrayList<>();
    private static String SRC_PATH = "D:/Response.txt";
    private static String OUT_PATH = "D:/Handle_Response.txt";

    public static void main(String[] args) {
        // Fall back to the default paths when no arguments are given
        String srcPath = args.length > 0 ? args[0] : SRC_PATH;
        String outPath = args.length > 1 ? args[1] : OUT_PATH;
        readTxtContent(srcPath);
        writeTxtContent(outPath);
    }

    public static void readTxtContent(String srcPath) {
        /* Read the data */
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(srcPath), "UTF-16LE"))) {
            String lineTxt = null;
            while ((lineTxt = br.readLine()) != null) {
                // Skip the "Response body: " prefix (and a possible BOM) by starting at the first '{'
                int start = lineTxt.indexOf('{');
                if (start >= 0) {
                    JSONObject object = (JSONObject) JSONObject.parse(lineTxt.substring(start));
                    Object oneSF = object.get("oneSF");
                    // Only handle responses whose oneSF field is an array
                    if (oneSF instanceof JSONArray) {
                        String jsonarrayString = ((JSONArray) oneSF).toJSONString();
                        List<UserInfo> userList = JSONArray.parseArray(jsonarrayString, UserInfo.class);
                        resultList.addAll(userList);
                        System.out.println("--read line data count---" + userList.size());
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("read errors :" + e);
        }
    }

    /* Remove duplicates: entries with the same phone number are kept only once */
    public static List<UserInfo> quchongfu() {
        Map<String, UserInfo> userMap = new HashMap<>();
        for (UserInfo userInfo : resultList) {
            userMap.put(userInfo.getPhone(), userInfo);
        }
        return new ArrayList<>(userMap.values());
    }

    public static void writeTxtContent(String outPath) {
        /* Write the data */
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(outPath), "UTF-8"))) {
            bw.write("name\t\t phone\t\t\t address");
            bw.newLine();
            for (UserInfo userInfo : quchongfu()) {
                bw.write(userInfo.getName() + "\t\t " + userInfo.getPhone() + "\t\t " + userInfo.getAddress());
                bw.newLine();
            }
        } catch (Exception e) {
            System.err.println("write errors :" + e);
        }
    }
}
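The listing above refers to a UserInfo class that the post does not show. The following is a minimal sketch of what such a bean might look like, assuming it only needs the three fields the tool prints, together with a standalone version of the de-duplication step. Both the field set and the UserInfoDedup helper are assumptions made for illustration. Unlike the HashMap in the listing, this sketch uses LinkedHashMap so that the output keeps the order in which phone numbers were first seen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bean: property names mirror the JSON keys name, phone and address.
class UserInfo {
    private String name;
    private String phone;
    private String address;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getPhone() { return phone; }
    public void setPhone(String phone) { this.phone = phone; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
}

// Hypothetical helper showing the de-duplication idea in isolation.
final class UserInfoDedup {
    /**
     * Keeps one entry per phone number. As with the put() call in the listing,
     * a later entry replaces an earlier one with the same phone, but the key
     * stays at the position where it was first inserted.
     */
    static List<UserInfo> dedupByPhone(List<UserInfo> users) {
        Map<String, UserInfo> byPhone = new LinkedHashMap<>();
        for (UserInfo u : users) {
            byPhone.put(u.getPhone(), u);
        }
        return new ArrayList<>(byPhone.values());
    }
}
```

If the first occurrence should win instead of the last, putIfAbsent can be used in place of put.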
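The output text looks like the figure below:
6. One-click data processing:
a. Package the ReadTextUtils class as an executable jar;
b. Write a simple runReadTxt.bat file with the following content:
@echo off
echo -----------read infos-----------
java -jar "%cd%\readTxt.jar" "%cd%\Response.txt" "%cd%\Handle_Response.txt"
echo ---------------finish!!!-----------------------------
PAUSE
c. That completes the one-click tool. Note: running the jar requires a Java runtime environment to be installed.
I hope this helps. Corrections are welcome!
Reference: http://blog.csdn.net/lhorse003/article/details/72473212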