[Writing a Crawler with HttpClient] POSTing JSON Data and Ordinary Key-Value Pairs

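Preface

When developing a crawler, you will inevitably need to submit data via POST.
This article compares two ways of submitting data and uses HttpClient to simulate both kinds of POST request to a website.

I have come across two POST submission styles:

  1. ordinary key-value pairs;
  2. JSON data.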
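
To make the difference concrete, the sketch below builds the request body that each style would send for the same two fields. It uses only the JDK; the class name `PostBodyFormats` and its methods are hypothetical names chosen for illustration and are not part of HttpClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of HttpClient.
final class PostBodyFormats {

    // Style 1: application/x-www-form-urlencoded, i.e. name=value pairs joined by '&'.
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return sb.toString();
    }

    // Percent-encodes one name or value as UTF-8 (a space becomes '+').
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "vip");
        fields.put("password", "secret");

        // Style 1 body, sent with Content-Type: application/x-www-form-urlencoded
        System.out.println(formBody(fields));
        // Style 2 body, sent with Content-Type: application/json
        System.out.println("{\"username\":\"vip\",\"password\":\"secret\"}");
    }
}
```

Running `main` prints `username=vip&password=secret` followed by the JSON text. The field values are the same in both cases; only the body format and the Content-Type header differ.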

The HttpClient version I use

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

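Submitting ordinary key-value pairs

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import java.util.ArrayList;
import java.util.List;

CloseableHttpClient httpclient = HttpClients.createDefault();

HttpPost httpPost = new HttpPost("http://targethost/login");
// Form fields, sent as application/x-www-form-urlencoded
List<NameValuePair> nvps = new ArrayList<NameValuePair>();
nvps.add(new BasicNameValuePair("username", "vip"));
nvps.add(new BasicNameValuePair("password", "secret"));
// Encode the fields as UTF-8 explicitly
httpPost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));
CloseableHttpResponse response2 = httpclient.execute(httpPost);

try {
    System.out.println(response2.getStatusLine());
    HttpEntity entity2 = response2.getEntity();
    // do something useful with the response body
    // and ensure it is fully consumed
    EntityUtils.consume(entity2);
} finally {
    response2.close();
}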

Submitting JSON data

Data to submit

{
    "username" : "vip",
    "password" : "secret"
}

Code

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

/**
 * Created by CarlZhang on 2017/1/1.
 */
public class PostJsonTest {
    public static void main(String[] args) {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {

            HttpPost httpPost = new HttpPost("http://targethost/login");

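            // JSON payload: {"username":"vip","password":"secret"}
            String jsonStr = "{\"username\":\"vip\",\"password\":\"secret\"}";

            // The body is encoded as UTF-8. Content-Encoding is deliberately not set:
            // that header describes compression (e.g. gzip), not the character set.
            StringEntity se = new StringEntity(jsonStr, Consts.UTF_8);
            se.setContentType("application/json; charset=UTF-8");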

            httpPost.setEntity(se);
            CloseableHttpResponse response2 = httpclient.execute(httpPost);

            try {
                System.out.println(response2.getStatusLine());
                HttpEntity entity2 = response2.getEntity();
                // Read the whole response body; this also fully consumes the entity.
                // UTF-8 is used when the response does not declare a charset.
                String res = EntityUtils.toString(entity2, Consts.UTF_8);
                System.out.println(res);
            } finally {
                response2.close();
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                httpclient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
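
Hand-escaping the quotes in `jsonStr` works for fixed values, but it breaks as soon as a value itself contains a quote or a backslash. As a minimal sketch (JDK only; `JsonStrings`, `quote`, and `loginJson` are hypothetical names, and a JSON library would normally be the better choice), the payload could be assembled like this:

```java
// Hypothetical helper for illustration only; a JSON library is usually preferable.
final class JsonStrings {

    // Wraps a Java string in double quotes and escapes the characters
    // that JSON does not allow to appear raw inside a string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                case '\t':
                    sb.append("\\t");
                    break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the login payload used in this article from arbitrary values.
    static String loginJson(String username, String password) {
        return "{\"username\":" + quote(username) + ",\"password\":" + quote(password) + "}";
    }

    public static void main(String[] args) {
        System.out.println(loginJson("vip", "secret"));
    }
}
```

The returned string can be passed to `new StringEntity(..., Consts.UTF_8)` in place of the hard-coded `jsonStr`.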

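Example: JSON submission

Below is what Chrome's developer tools (opened with F12) showed when I clicked to publish a post on a certain website; the POST request I submitted is visible there. The Form Data shown there is the JSON data submitted by the POST.
[Image 1: screenshot of the POST request in Chrome's developer tools]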

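How does the back end obtain this data?
The site uses the Requests class from the open-source library latke.
As you can see, it reads the request body as a stream through a Reader object.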

    /**
     * Gets the request json object with the specified request.
     *
     * @param request the specified request
     * @param response the specified response, sets its content type with "application/json"
     * @return a json object
     * @throws ServletException servlet exception
     * @throws IOException io exception
     */
    public static JSONObject parseRequestJSONObject(final HttpServletRequest request, final HttpServletResponse response)
        throws ServletException, IOException {
        response.setContentType("application/json");

        final StringBuilder sb = new StringBuilder();
        BufferedReader reader;

        final String errMsg = "Can not parse request[requestURI=" + request.getRequestURI() + ", method=" + request.getMethod()
            + "], returns an empty json object";

        try {
            try {
                reader = request.getReader();
            } catch (final IllegalStateException illegalStateException) {
                reader = new BufferedReader(new InputStreamReader(request.getInputStream()));
            }

            String line = reader.readLine();

            while (null != line) {
                sb.append(line);
                line = reader.readLine();
            }
            reader.close();

            String tmp = sb.toString();

            if (Strings.isEmptyOrNull(tmp)) {
                tmp = "{}";
            }

            return new JSONObject(tmp);
        } catch (final Exception ex) {
            LOGGER.log(Level.ERROR, errMsg, ex);

            return new JSONObject();
        }
    }
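
The reading loop can be tried out without a servlet container. The sketch below is a simplified, JDK-only stand-in rather than latke's code: `RequestBodyReader` and `readBody` are hypothetical names, and a `StringReader` plays the role of `request.getReader()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the servlet-side reading logic; not latke's actual code.
final class RequestBodyReader {

    // Reads the whole body line by line and falls back to "{}" when it is empty.
    static String readBody(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so lines are joined directly.
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String body = sb.toString();
        return body.isEmpty() ? "{}" : body;
    }

    public static void main(String[] args) {
        // A StringReader simulates the reader a servlet request would provide.
        String posted = "{\"username\":\"vip\",\n\"password\":\"secret\"}";
        System.out.println(readBody(new StringReader(posted)));
    }
}
```

Dropping the line breaks is harmless here because JSON ignores whitespace between tokens. The key point is that the JSON text is taken from the raw request body.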

In addition, the front-end JavaScript code submits the JSON data through the jQuery library.
[Image 2: screenshot of the front-end jQuery code that submits the JSON data]

Example: ordinary POST submission

[Image 3: screenshot of an ordinary POST submission]
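
With an ordinary form POST, the body arrives as `name=value` pairs joined by `&`, which servlet containers decode into request parameters. A rough JDK-only sketch of that decoding step follows; `FormBodyParser` and `parse` are hypothetical names, and real containers handle more cases (repeated names, other charsets).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; containers do this work internally.
final class FormBodyParser {

    // Splits an application/x-www-form-urlencoded body into decoded name/value pairs.
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as an empty body or "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Reverses percent-encoding as UTF-8 ('+' becomes a space).
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("username=vip&password=secret"));
    }
}
```

This is the main contrast with the JSON case: form fields can be looked up by name once decoded, while a JSON body has to be read from the request stream and parsed.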

If you are interested, see my article "[JS library AngularJS] Solving the POST submission problem with Angular + Spring MVC".
