WebClient crawler bug

A bug when crawling web pages with WebClient


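Take Fujian Mobile as an example. When sending the SMS verification code (ignoring the requests that come before it):

Cookies obtained when simulating the flow with WebClient: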


[JSESSIONID=120ef2fc6a7833f237fa37ff04cb;domain=fj.ac.10086.cn;path=/,
[email protected];domain=10086.cn;path=/;secure,
SSO_SID=612d9f4625f24250b4c90c1f7a38e05e;domain=fj.ac.10086.cn;path=/,
JSESSIONID=121200ad90f0adc8c5df6a51c249;domain=www.fj.10086.cn;path=/my,
BIGipServerpool_wy_hexing_7007=wC3NflQF8FggU2eZFRw0Joie03QFPtkhaQlBZW/IILU13sAz5K2eSJ8lp1PAkWSoPxllnllTCdOy;domain=www.fj.10086.cn;path=/;httpOnly,
fj_areaCode=591;domain=.10086.cn;path=/;expires=Sun May 01 17:22:06 CST 2016,
CmLocation=591|591;domain=10086.cn;path=/;expires=Thu Jun 30 17:23:38 CST 2016,
CmProvid=fj;domain=10086.cn;path=/;expires=Thu Jun 30 17:23:38 CST 2016,
WT_FPC=id=20db0e1d2615e9d36a21459502619326:lv=1459502619326:ss=1459502619326;domain=.10086.cn;path=/;expires=Mon Mar 30 17:23:39 CST 2026,
WEBTRENDS_ID=183.12.152.103-1459505460.721265;domain=112.5.185.63;path=/;expires=Mon Mar 30 18:11:00 CST 2026,
wtDingBuTongLan=1;domain=www.fj.10086.cn;path=/my;expires=Sat Apr 02 00:00:00 CST 2016,
cdnweb=web_2403;domain=www.fj.10086.cn;path=/,
CmWebtokenid="15806075051,fj";domain=10086.cn;path=/]


Note that these cookies belong to different domains and to different paths (for example /my and /).
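To make the domain and path scoping concrete, here is a minimal sketch in plain Java of how a client decides which cookies go with which request. The class name CookieScope, the JAR table, and the helper methods are hypothetical names made up for this illustration. The matching rules are a simplified version of the usual domain-suffix and path-prefix checks, not WebClient's actual implementation. The entries in JAR are a subset of the cookies listed above, with values left out.

```java
import java.util.ArrayList;
import java.util.List;

class CookieScope {
    // {name, domain, path}: a subset of the cookie list above (values omitted).
    static final String[][] JAR = {
        {"JSESSIONID", "fj.ac.10086.cn", "/"},
        {"SSO_SID", "fj.ac.10086.cn", "/"},
        {"JSESSIONID", "www.fj.10086.cn", "/my"},
        {"fj_areaCode", ".10086.cn", "/"},
        {"wtDingBuTongLan", "www.fj.10086.cn", "/my"},
        {"cdnweb", "www.fj.10086.cn", "/"},
    };

    /** Simplified domain match: the host equals the cookie domain or is a subdomain of it. */
    static boolean domainMatches(String cookieDomain, String host) {
        String d = cookieDomain.startsWith(".") ? cookieDomain.substring(1) : cookieDomain;
        d = d.toLowerCase();
        String h = host.toLowerCase();
        return h.equals(d) || h.endsWith("." + d);
    }

    /** Simplified path match: the cookie path is a whole-segment prefix of the request path. */
    static boolean pathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (!requestPath.startsWith(cookiePath)) {
            return false;
        }
        return cookiePath.endsWith("/") || requestPath.charAt(cookiePath.length()) == '/';
    }

    /** Names of the cookies in JAR that would be sent to the given host and path. */
    static List<String> cookiesFor(String host, String requestPath) {
        List<String> names = new ArrayList<>();
        for (String[] c : JAR) {
            if (domainMatches(c[1], host) && pathMatches(c[2], requestPath)) {
                names.add(c[0]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // First request: the SMS endpoint on fj.ac.10086.cn.
        System.out.println(cookiesFor("fj.ac.10086.cn", "/SMSCodeSend"));
        // Second request: the redirect target under /my on www.fj.10086.cn.
        System.out.println(cookiesFor("www.fj.10086.cn", "/my/login/send.jsp"));
    }
}
```

Under these rules SSO_SID goes only to fj.ac.10086.cn, while wtDingBuTongLan and cdnweb go only to www.fj.10086.cn, which is consistent with the two browser cookie sets recorded in this post.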


Request made by the official site: https://fj.ac.10086.cn/SMSCodeSend?mobileNum=15806075051&validCode=0000&errorurl=http://www.fj.10086.cn:80/my/login/send.jsp

Cookies:

fj_areaCode=591; [email protected]; CmWebtokenid="15806075051,fj"; SSO_SID=6323d5a9cde944edbe2c0f47dbf86328; CmLocation=591|591; CmProvid=fj; WT_FPC=id=2667a2beafc2a68de481459489941975:lv=1459497645045:ss=1459497076571; JSESSIONID=110ed071e0e8798c24528a4a5fbd

Request headers:

Host: fj.ac.10086.cn
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://www.fj.10086.cn/my/index.jsp?id_type=YANZHENGMA
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8


The response is a 302 with Location: http://www.fj.10086.cn:80/my/login/send.jsp?code=0000&displayPic=null


Cookies sent when requesting that address:

wtDingBuTongLan=1; JSESSIONID=110edbd38cbe38bd4f864cf9c702; BIGipServerpool_wy_hexing_7007=MCtQticHukuKaJq6dKe0Q7vuubYK9M+eSyqSgVF/Aiu/LxOoLCDc48o3wY36HdxA3RiaVd2O8n7Y; fj_areaCode=591; CmWebtokenid="15806075051,fj"; CmLocation=591|591; CmProvid=fj; WT_FPC=id=2667a2beafc2a68de481459489941975:lv=1459497645045:ss=1459497076571; cdnweb=web_2408

Request headers:

Host: www.fj.10086.cn
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36
Referer: http://www.fj.10086.cn/my/index.jsp?id_type=YANZHENGMA
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

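Note that the Host header has changed.

On the official site this works without any problem.

With WebClient, however, when setRedirectEnabled is set to true so that redirects are followed automatically, the request ends with a 504.

The only option is to set it to false, read the Location header from the response, and request that address separately.

The headers change between the two requests, and the cookies sent change along with the request path.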
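The workaround can be sketched as follows. This is an illustration only: it uses the JDK's HttpURLConnection rather than WebClient, and the class ManualRedirect, the methods nextHop and fetch, and the cookieHeaderFor hook are hypothetical names introduced here. The idea is the same as in the text: turn off automatic redirects, read the Location header, and issue a fresh request to that address so that the Host header and the Cookie header are rebuilt for the new URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.function.Function;

class ManualRedirect {
    /** Returns the redirect target, or null when the response is not a redirect. */
    static URI nextHop(URI current, int status, String location) {
        boolean redirect = status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
        if (!redirect || location == null) {
            return null;
        }
        // Location may be relative, so resolve it against the URL just requested.
        return current.resolve(location.trim());
    }

    /**
     * Follows redirects by hand and returns the final status code.
     * cookieHeaderFor is a hypothetical hook that builds the Cookie header
     * value for each URL, so every hop gets the cookies scoped to its own
     * domain and path. Set-Cookie handling is left out of this sketch.
     */
    static int fetch(URI start, int maxHops, Function<URI, String> cookieHeaderFor)
            throws IOException {
        URI url = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // do not follow the 302 automatically
            String cookies = cookieHeaderFor.apply(url);
            if (cookies != null && !cookies.isEmpty()) {
                conn.setRequestProperty("Cookie", cookies);
            }
            // The Host header is derived from the URL, so it changes with each hop.
            int status = conn.getResponseCode();
            URI next = nextHop(url, status, conn.getHeaderField("Location"));
            conn.disconnect();
            if (next == null) {
                return status;
            }
            url = next;
        }
        throw new IOException("Too many redirects");
    }
}
```

For the 302 shown above, nextHop resolves the Location value to a URL on www.fj.10086.cn, so the second request is built for that host rather than reusing what was sent to fj.ac.10086.cn.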
