Web Crawling: Using Packet Capture and Sending Requests with Form Parameters

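Recently, someone copied posts from my blog and uploaded them directly to Baidu Wenku and similar platforms.
This post is original and is intended for technical study only. Copying it and uploading it to Baidu Wenku or similar platforms without permission is prohibited. If you repost it, please cite the address (link) of this post.
If you need the source code, please contact me.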

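On some sites, packet capture reveals the real address that serves the data, yet requesting that address with httpclient returns an empty result. What should you do? The following site is used as an example.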

Site URL: https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5

The page displays the following data:
[Screenshot: data displayed on the page]

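Packet capture shows that the data is returned as JSON, and it also reveals the real request address, as in the screenshot below:
[Screenshot: captured request showing the real request address]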

The real request address is: https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?

Requesting this address on its own returns empty data, as in the screenshot below:
[Screenshot: empty response]
The returned data is:

{"pageCount":0,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":0,"asstId":null,"addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":0,"statementPrefix":"getPageKeyBranch","totalSize":0,"labFeatureList":null,"keyNum":null,"data":[],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
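
A crawler can recognise this kind of empty result by checking the `totalSize` field before doing any further work. The sketch below is only an illustration: the class and method names are hypothetical, the two sample strings are shortened versions of the responses shown in this post, and a regular expression stands in for a real JSON parser to keep the example free of dependencies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
class EmptyResponseCheck {

    // Matches the numeric "totalSize" field in the response text.
    private static final Pattern TOTAL_SIZE =
            Pattern.compile("\"totalSize\"\\s*:\\s*(\\d+)");

    /** Returns the totalSize value, or -1 if the field is not present. */
    static int totalSize(String json) {
        Matcher m = TOTAL_SIZE.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Treats a response as empty when totalSize is 0 or missing. */
    static boolean isEmpty(String json) {
        return totalSize(json) <= 0;
    }

    public static void main(String[] args) {
        // Shortened samples; the real responses carry many more fields.
        String withoutForm = "{\"pageCount\":0,\"totalSize\":0,\"data\":[]}";
        String withForm = "{\"pageCount\":1,\"totalSize\":1,\"data\":[{\"keyNum\":1}]}";
        System.out.println("without form parameter, empty: " + isEmpty(withoutForm));
        System.out.println("with form parameter, empty: " + isEmpty(withForm));
    }
}
```

A check like this makes it obvious when a request reached the server but lacked something the server needs, which is the situation here.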

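To solve this, go back to the packet-capture page: the request also sends a form parameter. Based on this analysis, the program can be written as follows: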

package navi.main;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
/**
 * @author Qian Yang, School of Management, Hefei University of Technology
 * @email [email protected]
 */
public class Test {

    public static void main(String[] args) throws Exception {
        String newUrl = "https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?";
        HttpPost post = new HttpPost(newUrl);
        // Request headers: optional, not the critical part
        post.addHeader(new BasicHeader("Cookie",
                "JSESSIONID=0000qty6OnqsYHgBdc3VKzr4zbI:1a5s8ura0"));
        post.addHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        post.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        post.addHeader("Host", "las.cnas.org.cn");
        post.addHeader("Accept", "*/*");
        post.addHeader("Accept-Language", "zh-CN,zh;q=0.8");
        post.addHeader("X-Requested-With", "XMLHttpRequest");
        post.addHeader("Referer", "https://las.cnas.org.cn/LAS/publish/lab/keyBranchListView.jsp?baseInfoId=3ee5aa672cbf44d0a2d9906b2bae70c5");
        post.addHeader("Origin", "https://las.cnas.org.cn");
        // Form parameter: this is the critical part and cannot be omitted
        List<NameValuePair> list = new ArrayList<>();
        list.add(new BasicNameValuePair("asstId", "3ee5aa672cbf44d0a2d9906b2bae70c5"));
        post.setEntity(new UrlEncodedFormEntity(list, StandardCharsets.UTF_8));
        // HttpClients.createDefault() replaces DefaultHttpClient, which is deprecated since HttpClient 4.3;
        // try-with-resources closes both the client and the response
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = client.execute(post)) {
            // Decode explicitly as UTF-8 so the Chinese text in the JSON is read correctly
            String responseString = EntityUtils.toString(httpResponse.getEntity(), StandardCharsets.UTF_8);
            System.out.println(responseString);
        }
    }
}
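
For readers who prefer not to add the Apache dependency, the same form request can be sketched with the JDK's built-in `java.net.http` client (Java 11 or later). This is a substitute technique, not the approach used in the program above. The class name `FormPostSketch`, its two helper methods, and the `--send` switch are hypothetical names introduced for illustration. Only two headers are set, because the JDK client rejects some header names such as `Host`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch using the JDK client instead of Apache HttpClient.
class FormPostSketch {

    /** Encodes fields as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) a POST request carrying the form body. */
    static HttpRequest buildRequest(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
                .header("X-Requested-With", "XMLHttpRequest")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        // The form parameter found in the captured request
        fields.put("asstId", "3ee5aa672cbf44d0a2d9906b2bae70c5");
        System.out.println("form body: " + encodeForm(fields));

        HttpRequest request = buildRequest(
                "https://las.cnas.org.cn/LAS/publish/queryPublishKeyBranch.action?", fields);

        // Only contact the site when explicitly asked to (hypothetical switch).
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
            System.out.println(response.body());
        }
    }
}
```

Printing the encoded body first makes it easy to compare what the program sends with the form data shown in the packet capture.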

The program returns the following data:

{"pageCount":1,"remark":null,"addpost":null,"isModify":null,"mainActivityOther":null,"mainactivity":null,"labfeature":null,"remarkEn":null,"sizePerPage":1,"asstId":"3ee5aa672cbf44d0a2d9906b2bae70c5","addcode":null,"startIndex":0,"mainActivityOtherEn":null,"nameCn":null,"primaryRecommend":null,"branchId":null,"currPage":1,"statementPrefix":"getPageKeyBranch","totalSize":1,"labFeatureList":null,"keyNum":null,"data":[{"remark":null,"addpost":null,"isModify":null,"keyNum":1,"labFeatureList":[{"baseInfoId":null,"branchId":null,"createBy":null,"createTs":null,"feature":"101001","id":"1974ed78b9a8409ba1ddd9dbc349098c","isModify":null,"labfeatureId":"1974ed78b9a8409ba1ddd9dbc349098c","other":null,"otherEn":null,"sourceId":null,"sqlUpdateType":null,"updateBy":null,"updateTs":null}],"mainactivity":"177001, 177003, 177004, 177005","mainActivityOther":null,"remarkEn":null,"labfeature":null,"addEn":"Bioassay and Safety Assessment Building, No.1500, Zhangheng Road, Zhangjiang Hi-Tech Park, Pudong New District, Shanghai, China","addCn":"上海市浦东新区张江高科技园区张衡路1500号生物与安全检测楼","postCode":"201203","addcode":null,"asstId":null,"mainActivityOtherEn":null,"labFeatureJson":"[{\"feature\":\"101001\"}]","nameCn":"上海市检测中心生物与安全检测实验室","primaryRecommend":null,"branchId":"3b00ef1f777247e1a2abd6e4b51ea1a8"}],"addEn":null,"addCn":null,"postCode":null,"labFeatureJson":null,"provider":[],"limit":0}
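
Once the response contains data, the next step is usually to pull out individual fields. The sketch below is a minimal illustration that uses a regular expression instead of a JSON library. The class `BranchFieldSketch` and the method `firstString` are hypothetical names, and the sample string is a heavily shortened version of the response above. For real crawling, a proper JSON parser would be more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading string fields out of the response text.
class BranchFieldSketch {

    /**
     * Returns the first quoted string value stored under the given key.
     * Occurrences whose value is null are skipped, because the pattern
     * requires an opening quote after the colon.
     */
    static Optional<String> firstString(String json, String key) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened sample: top-level fields are null, the record sits in "data".
        String json = "{\"branchId\":null,\"totalSize\":1,\"data\":[{"
                + "\"postCode\":\"201203\","
                + "\"branchId\":\"3b00ef1f777247e1a2abd6e4b51ea1a8\"}],"
                + "\"postCode\":null}";

        System.out.println("branchId: " + firstString(json, "branchId").orElse("(none)"));
        System.out.println("postCode: " + firstString(json, "postCode").orElse("(none)"));
        System.out.println("remark:   " + firstString(json, "remark").orElse("(none)"));
    }
}
```

Skipping null values matters for this response, because keys such as `branchId` and `postCode` appear both at the top level (as null) and inside the `data` record (with real values).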
