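Many crawlers need data that is only available after logging in. This post records how to simulate a login to Jiangxi Mobile (江西移动) and query the balance of your own phone account.
First, capture the traffic to find the home page URL: http://service.jx.10086.cn/service/resources/indexNew.html
Then capture the login page URL: https://jx.ac.10086.cn/Login
Capture the login parameters (the phone number and service password have been redacted; fill in your own when testing):
With the preparation done, here is the code: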
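// Request the home page (index_url) to obtain the initial cookies and the hidden form fields.
Connection.Response response = JsoupUtil.get(index_url, null, 10000)
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
        .execute();
// Keep the cookies for the requests that follow.
Map<String, String> cookies = response.cookies();
update(cookieMap, cookies);
Document document = Jsoup.parse(response.body());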
String type = document.getElementById("type").val();
String loginStatus = document.getElementById("loginStatus").val();
String loginFlag = document.getElementById("loginFlag").val();
String menuid = document.getElementById("menuid").val();
String spid = document.select("[name=spid]").val();
String backurl = document.getElementById("backurl").val();
String errorurl = document.select("[name=errorurl]").val();
String sessionToken = document.getElementById("sessionToken").val();
String login_backurl = document.getElementById("_login_backurl").val();
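// Placeholders: replace with your own phone number and service password.
String mobileNum = "xxxxxxxxx";
String servicePassword = "xxx";
// The SMS code is left empty; the image captcha value is submitted instead.
String smsValidCode = "";
String validCode = getImageCode();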
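The `update(cookieMap, cookies)` call above is a helper that this post does not show. A minimal sketch of what it might look like, assuming its only job is to merge the newest cookies into one shared map so that later requests send the full session state. The class name `CookieMerge` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CookieMerge {
    /**
     * Copies every cookie from fresh into jar, overwriting entries that
     * share a name. A null or empty fresh map leaves jar unchanged.
     */
    static void update(Map<String, String> jar, Map<String, String> fresh) {
        if (fresh == null || fresh.isEmpty()) {
            return;
        }
        jar.putAll(fresh);
    }

    public static void main(String[] args) {
        // Cookies collected so far (values are made up for the demo).
        Map<String, String> cookieMap = new LinkedHashMap<>();
        cookieMap.put("JSESSIONID", "old");

        // Cookies returned by the latest response.
        Map<String, String> fromServer = new LinkedHashMap<>();
        fromServer.put("JSESSIONID", "new");
        fromServer.put("route", "a1");

        update(cookieMap, fromServer);
        System.out.println(cookieMap);
    }
}
```

Overwriting by name matters here because the server may reissue a session cookie after login, and the newest value is the one that should be sent.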
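The captcha is generated dynamically. tess4j is an open-source OCR tool that can recognize captchas automatically, and usage instructions are easy to find online. When I tried it, it did not seem to recognize my captcha, although it handles digit-only ones reasonably well. I use the Ruokuai (若快) captcha-solving service here; the utility class can be downloaded from http://wiki.ruokuai.com/. The captcha image is fetched from: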
String url = "https://jx.ac.10086.cn/common/image.jsp?l=" + Math.random();
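The body of `getImageCode()` is not shown in this post. A rough sketch of its download step follows, under several assumptions: it uses the JDK's `HttpURLConnection` instead of Jsoup so that it has no dependencies, the class name `CaptchaFetch` and both method names are hypothetical, and the call to the captcha-solving service is left out because its API is not described here. The captcha presumably has to be requested with the same session cookies as the login, otherwise the server cannot match the code to the session.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;
import java.util.StringJoiner;

final class CaptchaFetch {
    /** Joins cookies into one Cookie request header value: "k1=v1; k2=v2". */
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> e : cookies.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** Downloads the captcha image bytes, sending the session cookies along. */
    static byte[] fetchCaptcha(String url, Map<String, String> cookies) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10000);
        conn.setReadTimeout(10000);
        if (!cookies.isEmpty()) {
            conn.setRequestProperty("Cookie", toCookieHeader(cookies));
        }
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes(); // requires Java 9 or later
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned bytes would then be handed to whichever recognizer is in use (tess4j or a solving service), and the recognized text becomes `validCode`.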
Once the captcha value has been obtained, perform the login:
String login_url = "https://jx.ac.10086.cn/Login";
Connection.Response responses = JsoupUtil.post(login_url, cookieMap, 10000)
        .header("Host", "jx.ac.10086.cn")
        .header("Upgrade-Insecure-Requests", "1")
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
        .header("Content-Type", "application/x-www-form-urlencoded")
        .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36")
        .header("Referer", "http://service.jx.10086.cn/service/resources/indexNew.html")
        .data("type", type)
        .data("loginStatus", loginStatus)
        .data("loginFlag", loginFlag)
        .data("menuid", menuid)
        .data("spid", spid)
        .data("backurl", backurl)
        .data("errorurl", errorurl)
        .data("sessionToken", sessionToken)
        .data("login_backurl", login_backurl)
        .data("mobileNum", mobileNum)
        .data("servicePassword", servicePassword)
        .data("smsValidCode", smsValidCode)
        .data("validCode", validCode)
        .execute();
String responstr = responses.body();
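After a successful login, the session has to be redirected to the home page address, and the balance information is read from that page: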
// A login response containing a location.replace('...') script is treated as success.
if (!StringUtil.isBlank(responstr) && responstr.contains("location.replace")) {
    // Save the cookies set by the login response.
    update(cookieMap, responses.cookies());
    System.out.println("Login succeeded");
    // Extract the redirect URL from between (' and ').
    String url = substring("('", "')", responstr);
    System.out.println("url: " + url);
    response = JsoupUtil.get(url, cookieMap, 10000)
            .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
            .header("Host", "login.10086.cn")
            .execute();
    String body = response.body();
    if (body.contains("location.href")) {
        update(cookieMap, response.cookies());
        url = "http://service.jx.10086.cn/service/common/indexNew.jsp";
        response = JsoupUtil.get(url, cookieMap, 10000)
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8")
                .header("Host", "service.jx.10086.cn")
                .execute();
        update(cookieMap, response.cookies());
        body = response.body();
        System.out.println("body: " + body);
        // Parse the page once and read the phone number and balance from it.
        Document page = Jsoup.parse(body);
        String mobile = page.select("div[class=lc_text]").select("p").get(0).text();
        String amount = page.getElementsByClass("tt_blink").text();
        System.out.println("Phone number: " + mobile + ", balance: " + amount);
        return;
    }
}
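The `substring("('", "')", responstr)` call used to pull the redirect URL out of the login response is another helper that the post does not show. A minimal sketch, assuming the argument order is (start marker, end marker, source text) as the call suggests, and assuming an empty string is an acceptable result when either marker is missing. The class name `Markers` and the sample URL are made up for illustration.

```java
final class Markers {
    /**
     * Returns the text between the first occurrence of begin and the next
     * occurrence of end after it, or "" if either marker is not found.
     */
    static String substring(String begin, String end, String source) {
        int start = source.indexOf(begin);
        if (start < 0) {
            return "";
        }
        start += begin.length();
        int stop = source.indexOf(end, start);
        if (stop < 0) {
            return "";
        }
        return source.substring(start, stop);
    }

    public static void main(String[] args) {
        // A made-up response body shaped like a JavaScript redirect.
        String body = "<script>location.replace('http://example.com/next?x=1');</script>";
        System.out.println(substring("('", "')", body));
    }
}
```

Returning an empty string rather than throwing lets the caller decide how to handle an unexpected response, for example by checking `url.isEmpty()` before issuing the next request.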