Basic Usage of the HttpClient Library

Web Crawling: Basic Usage of the HttpClient Library (Part 1)

Preface

Role: a client-side HTTP transport library

Built on: HttpCore

Purpose: sending and receiving HTTP messages

Characteristics: classic (blocking) I/O

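1. Usage Overview

1.1 GET and POST:

The main job of the HttpClient library is to send requests and receive responses, which it does through the HttpGet and HttpPost request classes.

1.2 Using HttpGet takes seven steps: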

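A. Create an HttpClient object. HttpClient is an interface, so the concrete DefaultHttpClient class is normally used.

B. Create an HttpGet object, usually initialized by passing the URL to its constructor.

C. Create an HttpResponse object: pass the HttpGet object to the HttpClient object's execute method and use the result to initialize the HttpResponse.

D. Create an HttpEntity object, initialized with the result of the HttpResponse's getEntity method.

E. Work with the HttpEntity (the HTTP message body).

F. Consume the HttpEntity object so that its resources are released.

G. Release the HttpGet connection.

The classic code looks like this: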

DefaultHttpClient httpClient = new DefaultHttpClient();        // (A)

HttpGet httpGet = new HttpGet(url);                            // (B)

HttpResponse httpResponse = httpClient.execute(httpGet);       // (C)

HttpEntity httpEntity = httpResponse.getEntity();              // (D)

// ... do something with httpEntity ...                        // (E)

EntityUtils.consume(httpEntity);                               // (F)

httpGet.releaseConnection();                                   // (G)
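
Step E is left open above. One common thing to do with an entity is to read its body into a string. HttpEntity.getContent() returns a java.io.InputStream, and the sketch below shows the blocking read loop on such a stream using only the JDK. The class name EntityContentSketch, the helper readAll, and the in-memory stream standing in for a real response body are all assumptions made for illustration. In real HttpClient code, EntityUtils.toString(httpEntity) does this job for you.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class EntityContentSketch {

    // Reads the whole stream into memory and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            // read() blocks until data is available and returns -1 at end of stream.
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for httpEntity.getContent(); a real body would come from the network.
        InputStream content = new ByteArrayInputStream(
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(readAll(content, StandardCharsets.UTF_8));
        } finally {
            content.close();
        }
    }
}
```

Reading the stream to its end also matters for step F: a fully read body leaves nothing more for the consume call to drain.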

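1.3 Using HttpPost:

A. Create an HttpClient object. HttpClient is an interface, so the concrete DefaultHttpClient class is normally used.

B. Create an HttpPost object, usually initialized by passing the URL to its constructor.

C. Create a list and add the name/value pairs to be submitted.

D. Convert the list into an entity and attach it to the HttpPost.

E. Create an HttpResponse object, initialized with the result of executing the HttpPost through the HttpClient.

F. Create an HttpEntity object, initialized with the result of the HttpResponse's getEntity method.

G. Work with the HttpEntity (the HTTP message body).

H. Consume the HttpEntity object so that its resources are released.

I. Release the HttpPost connection. (The classic code is omitted here; see Section 2.)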
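
Steps C and D turn a list of name/value pairs into a request body. As a rough illustration of what such a URL-encoded form body looks like on the wire, the sketch below builds one with the JDK's java.net.URLEncoder. It is not HttpClient's own implementation: the class name FormBodySketch, the LinkedHashMap standing in for the list of pairs, and the choice of UTF-8 are assumptions made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HttpClient.
final class FormBodySketch {

    // Joins the fields as name=value pairs separated by '&', percent-encoding each part.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, like the list in step C.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "liudongwei");
        fields.put("password", "nuaa");
        System.out.println(encode(fields)); // prints username=liudongwei&password=nuaa
    }
}
```

Characters that have a special meaning in a form body are escaped, so a value such as "a b&c" becomes "a+b%26c" and cannot be mistaken for a field separator.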

2. Code and Notes

Using the GET and POST methods (based on QuickStart.java, with some added notes; see the end of the listing):

package testhttpclient;

import java.util.ArrayList;

import java.util.List;

import java.util.logging.Level;

import java.util.logging.Logger;

 

import java.io.UnsupportedEncodingException;

import java.io.IOException;

 

import org.apache.http.impl.client.DefaultHttpClient;

import org.apache.http.client.methods.HttpPost;

import org.apache.http.client.methods.HttpGet;

import org.apache.http.NameValuePair; // lives in httpcore; presumably tied to the lower-level HTTP machinery

import org.apache.http.message.BasicNameValuePair; // this one also appears to live in httpcore

import org.apache.http.client.entity.UrlEncodedFormEntity;

import org.apache.http.HttpResponse;

import org.apache.http.client.ClientProtocolException;

import org.apache.http.HttpEntity;

import org.apache.http.util.EntityUtils;

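/*

 * After commons-logging was added to the classpath, the exception was no longer thrown

 */

/*

 * test1: exercise the GET and POST methods

 * Goal: understand how the GET and POST methods are used

 */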

/**

 *

 * @author dongweiliu

 */

public class TestHttpClient {

   

    public static void main(String[] args) throws Exception {

        // TODO code application logic here

        DefaultHttpClient httpClient = new DefaultHttpClient();



        // GET request

        HttpGet httpGet = new HttpGet("http://www.baidu.com");

        HttpResponse httpResponseG = httpClient.execute(httpGet);

        try {

            System.out.println(httpResponseG.getStatusLine());

            HttpEntity httpEntityG = httpResponseG.getEntity();

            /*

             * do something with httpEntityG

             */

            System.out.println(httpEntityG.getContentLength());

            EntityUtils.consume(httpEntityG);

        }

        finally {

            httpGet.releaseConnection();

        }

       

        // POST request with a URL-encoded form body

        HttpPost httpPost = new HttpPost("http://www.baidu.com");

        List<NameValuePair> nvp = new ArrayList<NameValuePair>();

        nvp.add(new BasicNameValuePair("username", "liudongwei"));

        nvp.add(new BasicNameValuePair("password", "nuaa"));

        try {

            httpPost.setEntity(new UrlEncodedFormEntity(nvp));

        } catch (UnsupportedEncodingException ex) {

            Logger.getLogger(TestHttpClient.class.getName()).log(Level.SEVERE, null, ex);

        }

        HttpResponse httpResponseP = httpClient.execute(httpPost);



        try {

            System.out.println(httpResponseP.getStatusLine());

            HttpEntity httpEntityP = httpResponseP.getEntity();

            /*

             * do something with httpEntityP

             */

            EntityUtils.consume(httpEntityP);

        }

        finally {

            httpPost.releaseConnection();

        }

    }

}


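/*

 * Note: jars that need to be on the classpath:

 * htmllexer.jar  (a lexical analyzer; its core algorithm uses the function readch() to read the next input character into the variable peek, and the function scan, after skipping all whitespace, first tries to recognize compound tokens such as "<=" and integer literals; if that fails, it tries to read a single character)

 * httpclient-4.2.3.jar

 * httpclient-cache-4.2.3.

 * httpcore-4.2.2.jar  HttpClient's core is built on HttpCore

 * commons-codec-1.6.jar  a utility package commonly used in projects for encoding tasks such as MD5 digests and Base64

 * commons-logging-1.1.1.jar   used for logging

 */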
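
The program above prints the status line of each response but never checks it. A status line has the form "HTTP/1.1 200 OK": protocol version, status code, reason phrase. The sketch below pulls the code out of such a line and tests whether it is in the 2xx range. The class name StatusLineSketch and both helper methods are made up for this illustration. With HttpClient itself there is no need to parse anything, since httpResponse.getStatusLine().getStatusCode() returns the code directly.

```java
// Hypothetical helper, not part of HttpClient.
final class StatusLineSketch {

    // Extracts the numeric status code from a line such as "HTTP/1.1 200 OK".
    static int statusCode(String statusLine) {
        // Split into at most three parts: version, code, reason phrase.
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // 2xx codes mean the request was received and handled successfully.
    static boolean isSuccess(int code) {
        return code >= 200 && code < 300;
    }

    public static void main(String[] args) {
        int code = statusCode("HTTP/1.1 200 OK");
        System.out.println(code + " success=" + isSuccess(code));
    }
}
```

A crawler would typically run a check like this before spending time on the entity, and skip or retry pages that did not return a 2xx code.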

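Afterword

After an evening of study, I found that I had learned quite a lot, but I also took a number of detours. I hope this article gives some early help to anyone who is about to learn, or is planning to learn, HttpClient.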
