HttpClient Usage in Detail

In everyday development we often use the open-source project HttpClient. It needs a base jar, httpcore-4.1.2.jar, plus the application-layer jar httpclient-4.1.2.jar and commons-logging-1.1.1.jar.


Making requests with the different clients

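/************** Creating the different clients (pick one) *******************/
CloseableHttpClient httpclient = HttpClientBuilder.create().build();  // HttpClient 4.3 and later
DefaultHttpClient httpclient = new DefaultHttpClient();  // older API, deprecated since 4.3

/************* Creating a GET request and configuring its headers ***********/
// Option 1: target host and path are given separately; execute the request with
// HttpResponse response = httpclient.execute(targetHost, httpget);
HttpHost targetHost = new HttpHost("www.baidu.com");
HttpGet httpget = new HttpGet("/");
// Option 2: a single absolute URI, which must include the scheme (http://)
HttpGet httpget = new HttpGet("http://www.baidu.com");

// Request headers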
httpget.setHeader("Host", "www.baidu.com");
httpget.setHeader("Accept-Language", "en-US,en;q=0.8");
httpget.setHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/60.0.3112.113 Chrome/60.0.3112.113 Safari/537.36");

/****************** Creating a POST request (headers are set the same way as for GET) ********************/
HttpPost httppost = new HttpPost("http://www.baidu.com/login");


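/************ Response handling shared by both clients ***************/
HttpResponse response = httpclient.execute(httpget);  // send the request
// Status code of the response
int code = response.getStatusLine().getStatusCode();
// Page content
HttpEntity entity = response.getEntity();             // entity wrapping the content stream
String html = EntityUtils.toString(entity, "utf-8");  // read the whole body as a string
// Make sure the content stream is fully consumed and closed
EntityUtils.consume(entity);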


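/************* Releasing connections for each client ****************/
// CloseableHttpClient has its own method for this: close()
httpclient.close();  // before closing, EntityUtils.consumeQuietly(entity) is recommended to make sure the entity was fully consumed

// DefaultHttpClient releases connections through its connection manager
httpclient.getConnectionManager().shutdown();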
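A request target needs both a scheme and a host, given either as one absolute URI or as an HttpHost plus a path. A bare "www.baidu.com" has neither, so HttpClient has no host to connect to. The sketch below uses only java.net.URI from the JDK (not HttpClient) to show how such strings are parsed; the class UriCheck and its method isAbsoluteHttpUri are hypothetical names.

```java
import java.net.URI;

// JDK-only illustration with java.net.URI (not Apache HttpClient).
// UriCheck and isAbsoluteHttpUri are hypothetical names.
final class UriCheck {

    // True if the target has an http/https scheme and a host, i.e. it can be
    // sent on its own without a separate HttpHost
    static boolean isAbsoluteHttpUri(String target) {
        URI uri = URI.create(target);
        String scheme = uri.getScheme();
        boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        return http && uri.getHost() != null;
    }

    public static void main(String[] args) {
        // Without a scheme the whole string is parsed as a relative path
        URI bare = URI.create("www.baidu.com");
        System.out.println(bare.getScheme() + " " + bare.getHost() + " " + bare.getPath());
        // prints: null null www.baidu.com

        System.out.println(isAbsoluteHttpUri("www.baidu.com"));              // false
        System.out.println(isAbsoluteHttpUri("http://www.baidu.com/login")); // true

        // Host and path supplied separately, then combined into one URI
        System.out.println(URI.create("http://www.baidu.com").resolve("/"));
        // prints: http://www.baidu.com/
    }
}
```

The last line mirrors the HttpHost-plus-path style: the host part and the path "/" end up in one absolute URI.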

Ways to set request headers

Method 1: set the header on an individual request

HttpGet httpget = new HttpGet("/");
httpget.setHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/60.0.3112.113 Chrome/60.0.3112.113 Safari/537.36");

Method 2: give the client default headers, which it adds to every request

public static List<Header> getHeaders(){
    // Header values
    String userAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/60.0.3112.113 Chrome/60.0.3112.113 Safari/537.36";
    List<Header> headers = new ArrayList<Header>();
    headers.add(new BasicHeader("Accept-Language", "en-US,en;q=0.8"));
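    headers.add(new BasicHeader("Accept-Encoding", "gzip, deflate"));  // "br" dropped: the builder's default decompression handles gzip and deflate, so a Brotli body could come back undecoded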
    headers.add(new BasicHeader("User-Agent", userAgent));
    return headers;
}
List<Header> headers = getHeaders();
CloseableHttpClient httpclient = HttpClientBuilder.create().setDefaultHeaders(headers).build();
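setDefaultHeaders attaches one header list to every request the client sends, so the headers are defined once instead of per request. As a JDK-only comparison (java.net.HttpURLConnection, not Apache HttpClient), the sketch below copies one shared header table onto each connection it creates. The class name, the helper open and the shortened User-Agent value are illustrative assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// JDK-only analogue of "default headers": one shared table applied to every
// connection. Uses java.net.HttpURLConnection, not Apache HttpClient.
final class DefaultHeadersSketch {

    // Hypothetical shared defaults, mirroring getHeaders() in the text
    static final Map<String, String> DEFAULT_HEADERS = new LinkedHashMap<>();
    static {
        DEFAULT_HEADERS.put("Accept-Language", "en-US,en;q=0.8");
        DEFAULT_HEADERS.put("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)");
    }

    // Creates a connection object and applies every default header.
    // openConnection() does not contact the server; that only happens on
    // connect() or getInputStream().
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        for (Map.Entry<String, String> header : DEFAULT_HEADERS.entrySet()) {
            conn.setRequestProperty(header.getKey(), header.getValue());
        }
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("http://www.baidu.com/");
        System.out.println(conn.getRequestProperty("User-Agent"));
        conn.disconnect();
    }
}
```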

Downloading a web page with HTTP GET (including detection of the page's character encoding)

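import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HtmlDownloader {  // illustrative class name

    public static String downHtml(String url) throws IOException {

        // Create a client, much like opening a browser
        DefaultHttpClient httpclient = new DefaultHttpClient();
        try {
            // Create a GET request, like typing an address into the browser's address bar
            HttpGet httpget = new HttpGet(url);
            // Execute it, like pressing Enter in the address bar, and get the response
            HttpResponse response = httpclient.execute(httpget);

            // Detect the charset from the Content-Type header, e.g. "text/html; charset=gbk"
            Pattern pattern = Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)");
            Header[] arr = response.getHeaders("Content-Type");
            String charset = "utf-8";  // fallback when the header names no charset
            if (arr != null && arr.length != 0) {
                String content = arr[0].getValue().toLowerCase();
                Matcher m = pattern.matcher(content);
                if (m.find()) {
                    charset = m.group(1);
                }
            }

            // Read the returned page, like viewing the page source in a browser
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                InputStream instream = entity.getContent();
                BufferedReader reader = new BufferedReader(new InputStreamReader(instream, charset));
                try {
                    StringBuilder builder = new StringBuilder();
                    char[] chars = new char[4096];
                    int length;
                    while ((length = reader.read(chars)) != -1) {
                        builder.append(chars, 0, length);
                    }
                    return builder.toString();
                } finally {
                    reader.close();  // also closes the entity's content stream
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            httpclient.getConnectionManager().shutdown();  // release the connection
        }
        return null;
    }
}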
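The charset detection above can be isolated and tested without a network. This is a minimal sketch that assumes JDK 7+ and plain java.util.regex. It pulls charset=... out of a Content-Type value (quoted or not, in any letter case), falls back when the name is missing or unsupported, and decodes a stream with the chosen charset. CharsetSketch, charsetOf and readAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: charset detection and decoding, JDK only.
final class CharsetSketch {

    // charset=name anywhere in the header value, optionally quoted, any case
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the Content-Type value, or the fallback if the
    // value is null, names no charset, or names one the JVM cannot use
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (m.find()) {
            String name = m.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return name;
                }
            } catch (IllegalArgumentException e) {
                // illegal charset name: use the fallback
            }
        }
        return fallback;
    }

    // Reads the whole stream as text in the given charset and closes it
    static String readAll(InputStream in, String charset) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] chars = new char[4096];
            int length;
            while ((length = reader.read(chars)) != -1) {
                builder.append(chars, 0, length);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        String charset = charsetOf("text/html; charset=UTF-8", "utf-8");
        InputStream body = new ByteArrayInputStream("\u4f60\u597d".getBytes(StandardCharsets.UTF_8));
        System.out.println(charset + ": " + readAll(body, charset));
    }
}
```

Checking Charset.isSupported guards against a server naming a charset the JVM does not know, which would otherwise make new InputStreamReader(in, name) throw UnsupportedEncodingException.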

Downloading a web page with HttpClient using POST

Example: scraping hotel information by city, http://xxx.aspx

The relevant part of the page's search form:

<form name="aspnetForm" method="post" action="SearchHotel.aspx" id="aspnetFrom">
<input type="hidden" name="cityId" value="" />
<input type="hidden" name="checkIn" value="" />
<input type="hidden" name="checkOut" value="" />

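The POST request has to submit these three parameters, each a key paired with a value. NameValuePair is an interface and BasicNameValuePair is its implementation, so each key-value pair is wrapped in a BasicNameValuePair. For example, the parameter cityId with the value 1: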

new BasicNameValuePair("cityId", "1");

The complete POST request:

HttpClient httpclient = new DefaultHttpClient();

// Send the POST request with HttpPost
HttpPost httppost = new HttpPost("http://xxx.aspx");
// POST data
List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(3);  // 3 parameters
nameValuePairs.add(new BasicNameValuePair("checkIn", "2011-4-15"));    // check-in date
nameValuePairs.add(new BasicNameValuePair("checkOut", "2011-4-25"));   // check-out date
nameValuePairs.add(new BasicNameValuePair("cityId", "1"));             // city code
httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs, "UTF-8"));  // encode the form body as UTF-8

// Execute the HTTP POST request
HttpResponse response = httpclient.execute(httppost);

// Get the content stream
HttpEntity entity = response.getEntity();
InputStream is = entity.getContent();
BufferedInputStream bis = new BufferedInputStream(is);
ByteArrayBuffer baf = new ByteArrayBuffer(20);  // initial capacity; the buffer grows as needed

// Read the content stream byte by byte into the byte-array buffer
int current = 0;
while((current=bis.read()) != -1){
    baf.append((byte) current);
}
bis.close();  // also closes the entity's content stream
String text = new String(baf.toByteArray(), "utf-8");  // decode with an explicit charset
System.out.println(text);

// When the HttpClient instance is no longer needed, shut down the connection manager so all system resources are released immediately
httpclient.getConnectionManager().shutdown();
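UrlEncodedFormEntity turns the name-value pairs into an application/x-www-form-urlencoded request body. Below is a JDK-only sketch of that format, not HttpClient's own implementation. It uses java.net.URLEncoder (the Charset overload needs JDK 10+) and Map.Entry in place of NameValuePair: each key and value is percent-encoded and the pairs are joined with &. FormEncodingSketch and encode are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical JDK-only sketch of an application/x-www-form-urlencoded body;
// Map.Entry stands in for NameValuePair.
final class FormEncodingSketch {

    // Percent-encodes each key and value and joins the pairs with '&'
    static String encode(List<Map.Entry<String, String>> pairs, Charset charset) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> pair : pairs) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(pair.getKey(), charset))
                .append('=')
                .append(URLEncoder.encode(pair.getValue(), charset));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("checkIn", "2011-4-15"));   // check-in date
        pairs.add(new SimpleEntry<>("checkOut", "2011-4-25"));  // check-out date
        pairs.add(new SimpleEntry<>("cityId", "1"));            // city code
        System.out.println(encode(pairs, StandardCharsets.UTF_8));
        // prints: checkIn=2011-4-15&checkOut=2011-4-25&cityId=1
    }
}
```

The charset matters once values leave ASCII: with UTF-8, 北京 is sent as %E5%8C%97%E4%BA%AC, while a space becomes + and & becomes %26.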
