Heritrix 3.1.0 Source Code Analysis (Part 25)

In Heritrix 3.1.0 Source Code Analysis (Part 23) we looked at how Heritrix 3.1.0 extends the HttpClient component's HttpConnection connection object and its management interface, HttpConnectionManager.

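The HttpConnection object creates the socket connection, but at that point nothing has been written to the output stream and nothing has been read from the input stream. How does the HttpClient component implement that part, and how does Heritrix 3.1.0 extend it?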

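When we use the HttpClient component to request a web page, we create a GetMethod or a PostMethod instance depending on whether the request is a GET or a POST (there are other request methods as well, but browsers do not support them for the time being).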

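These request classes implement a common interface, HttpMethod, which declares the methods every request type must provide. The interface declares quite a few methods; to make them easier to follow, they can be divided logically into a request-related part and a response-related part. The important ones are listed below.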

public interface HttpMethod {

    // ---------------------------------------------------------------- Queries

    // Response-related part

    boolean validate();

    int getStatusCode();

    byte[] getResponseBody() throws IOException;

    String getResponseBodyAsString() throws IOException;

    InputStream getResponseBodyAsStream() throws IOException;

    int execute(HttpState state, HttpConnection connection)
        throws HttpException, IOException;

    void releaseConnection();

    boolean getDoAuthentication();

    void setDoAuthentication(boolean doAuthentication);

    public HttpMethodParams getParams();

    public void setParams(final HttpMethodParams params);

    public AuthState getHostAuthState();

    public AuthState getProxyAuthState();

    boolean isRequestSent();
}

When we execute a request, what actually gets called is the execute method of the class that implements this interface.

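The interface is implemented by an abstract class, HttpMethodBase, which provides the methods shared by all of its subclasses (that is, by all request methods), mainly the handling of the socket output and input streams. The most important of these is the execute method.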

    /**
     * Executes this method using the specified <code>HttpConnection</code> and
     * <code>HttpState</code>. 
     *
     * @param state {@link HttpState state} information to associate with this
     *        request. Must be non-null.
     * @param conn the {@link HttpConnection connection} to used to execute
     *        this HTTP method. Must be non-null.
     *
     * @return the integer status code if one was obtained, or <tt>-1</tt>
     *
     * @throws IOException if an I/O (transport) error occurs
     * @throws HttpException  if a protocol exception occurs.
     */
    public int execute(HttpState state, HttpConnection conn)
        throws HttpException, IOException {

        LOG.trace("enter HttpMethodBase.execute(HttpState, HttpConnection)");

        // this is our connection now, assign it to a local variable so 
        // that it can be released later
        this.responseConnection = conn;

        checkExecuteConditions(state, conn);
        this.statusLine = null;
        this.connectionCloseForced = false;

        conn.setLastResponseInputStream(null);

        // determine the effective protocol version
        if (this.effectiveVersion == null) {
            this.effectiveVersion = this.params.getVersion(); 
        }
        // Socket output stream: send the request
        writeRequest(state, conn);
        this.requestSent = true;
        // Socket input stream: read the response
        readResponse(state, conn);
        // the method has successfully executed
        used = true; 

        return statusLine.getStatusCode();
    }

In the method above, writeRequest(state, conn) writes the request to the output stream, and readResponse(state, conn) reads the response from the input stream.
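The write-then-read order can be illustrated with a small JDK-only sketch. It is not the HttpClient implementation: the class ExecuteFlowSketch, the fixed Host header value, and the in-memory streams are all assumptions made for illustration, standing in for HttpMethodBase and the socket streams of HttpConnection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the execute() flow; names are illustrative only.
class ExecuteFlowSketch {
    private boolean requestSent = false;
    private int statusCode = -1;

    // Same order as execute(): write the request first, then read the response.
    int execute(String path, OutputStream out, InputStream in) throws IOException {
        writeRequest(path, out);
        requestSent = true;
        readResponse(in);
        return statusCode;
    }

    // Assembles the request line and headers and pushes them to the output stream.
    void writeRequest(String path, OutputStream out) throws IOException {
        String head = "GET " + path + " HTTP/1.0\r\n"
                + "Host: example.org\r\n"   // assumed host, for illustration
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    // Reads only the status line (e.g. "HTTP/1.0 200 OK") and extracts the code.
    void readResponse(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
        String statusLine = reader.readLine();
        if (statusLine == null) {
            throw new IOException("empty response");
        }
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IOException("malformed status line: " + statusLine);
        }
        statusCode = Integer.parseInt(parts[1]);
    }

    boolean isRequestSent() {
        return requestSent;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new ByteArrayInputStream(
                "HTTP/1.0 200 OK\r\n\r\nhello".getBytes(StandardCharsets.US_ASCII));
        int code = new ExecuteFlowSketch().execute("/index.html", out, in);
        System.out.println("status = " + code);
        System.out.print(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

In the real class the two streams come from the HttpConnection passed to execute, and readResponse also handles headers and the body; the sketch keeps only the status line to show the sequencing.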

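Writing the request in writeRequest(state, conn) amounts to assembling the data to send. This is the entry point Heritrix 3.1.0 uses: it rewrites the HttpMethodBase class to add its own logic, including writing cookies and form parameters (I will come back to this part when analyzing Heritrix 3.1.0's custom cookie and form handling).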

Besides the shared logic above, this method also calls boolean writeRequestBody(HttpState state, HttpConnection conn), which is usually implemented by subclasses.
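The relationship between the shared writeRequest logic and the subclass-specific writeRequestBody hook can be sketched with plain JDK classes. Everything here is hypothetical (BodyHookSketch, Method, Get, Post, and the bodyLength helper are names invented for the example); it only shows the shape of the hook and does not reproduce HttpClient's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a base class that delegates the body to a hook method.
class BodyHookSketch {

    static class Method {
        private final String verb;
        private final String path;

        Method(String verb, String path) {
            this.verb = verb;
            this.path = path;
        }

        // Shared logic: request line and headers, then the subclass hook.
        final void writeRequest(OutputStream out) throws IOException {
            StringBuilder head = new StringBuilder();
            head.append(verb).append(' ').append(path).append(" HTTP/1.0\r\n");
            if (bodyLength() > 0) {
                head.append("Content-Length: ").append(bodyLength()).append("\r\n");
            }
            head.append("\r\n");
            out.write(head.toString().getBytes(StandardCharsets.US_ASCII));
            writeRequestBody(out);
            out.flush();
        }

        // Assumed helper so the base class can emit Content-Length.
        protected int bodyLength() {
            return 0;
        }

        // Default hook: no body to write, report success.
        protected boolean writeRequestBody(OutputStream out) throws IOException {
            return true;
        }
    }

    // A GET carries no body, so it keeps the default hook.
    static class Get extends Method {
        Get(String path) {
            super("GET", path);
        }
    }

    // A POST overrides the hook to send its form data.
    static class Post extends Method {
        private final byte[] body;

        Post(String path, String form) {
            super("POST", path);
            this.body = form.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        protected int bodyLength() {
            return body.length;
        }

        @Override
        protected boolean writeRequestBody(OutputStream out) throws IOException {
            out.write(body);
            return true;
        }
    }

    // Renders the bytes a method would send, for inspection.
    static String render(Method m) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        m.writeRequest(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(render(new Get("/index.html")));
        System.out.println(render(new Post("/login", "user=a&pass=b")));
    }
}
```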

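The subclasses of the abstract class HttpMethodBase provide the implementation specific to each request method. Here I only analyze the two classes that Heritrix 3.1.0 defines itself, HttpRecorderGetMethod and HttpRecorderPostMethod.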

public class HttpRecorderGetMethod extends GetMethod {

    protected static Logger logger =
        Logger.getLogger(HttpRecorderGetMethod.class.getName());

    /**
     * Instance of http recorder method.
     */
    protected HttpRecorderMethod httpRecorderMethod = null;

    public HttpRecorderGetMethod(String uri, Recorder recorder) {
        super(uri);
        this.httpRecorderMethod = new HttpRecorderMethod(recorder);
    }

    protected void readResponseBody(HttpState state, HttpConnection connection)
    throws IOException, HttpException {
        // We're about to read the body.  Mark transition in http recorder.
        this.httpRecorderMethod.markContentBegin(connection);
        super.readResponseBody(state, connection);
    }

    protected boolean shouldCloseConnection(HttpConnection conn) {
        // Always close connection after each request. As best I can tell, this
        // is superfluous -- we've set our client to be HTTP/1.0.  Doing this
        // out of paranoia.
        return true;
    }

    public int execute(HttpState state, HttpConnection conn)
    throws HttpException, IOException {
        // Save off the connection so we can close it on our way out in case
        // httpclient fails to (We're not supposed to have access to the
        // underlying connection object; am only violating contract because
        // see cases where httpclient is skipping out w/o cleaning up
        // after itself).
        this.httpRecorderMethod.setConnection(conn);
        return super.execute(state, conn);
    }

    protected void addProxyConnectionHeader(HttpState state, HttpConnection conn)
            throws IOException, HttpException {
        super.addProxyConnectionHeader(state, conn);
        this.httpRecorderMethod.handleAddProxyConnectionHeader(this);
    }
}

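Besides the URL string, the constructor of this class takes a Recorder recorder object, which is used to initialize the member HttpRecorderMethod httpRecorderMethod. That object holds two members of its own: a Recorder httpRecorder and an HttpConnection connection. Each overriding method in HttpRecorderGetMethod calls the parent-class method of the same name and, in addition, the corresponding method of httpRecorderMethod, which either sets its HttpConnection connection member or calls back into the Recorder httpRecorder object (to prepare for reading the input stream).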
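The delegation pattern described above can be reduced to a short JDK-only sketch. All of the types are hypothetical stand-ins (PlainGet for GetMethod, RecordingGet for HttpRecorderGetMethod, RecorderHelper for HttpRecorderMethod, and a Recorder that only logs); the connection is a plain Object, and the shared trace list exists only so the call order can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "call the helper, then call super"; not Heritrix code.
class RecorderHookSketch {

    // Stand-in for the recorder: it only notes that the body is about to start.
    static class Recorder {
        private final List<String> trace;

        Recorder(List<String> trace) {
            this.trace = trace;
        }

        void markContentBegin() {
            trace.add("recorder.markContentBegin");
        }
    }

    // Stand-in for the helper object holding the recorder and the connection.
    static class RecorderHelper {
        private final Recorder recorder;
        private Object connection;

        RecorderHelper(Recorder recorder) {
            this.recorder = recorder;
        }

        void setConnection(Object connection) {
            this.connection = connection;
        }

        Object getConnection() {
            return connection;
        }

        void markContentBegin() {
            recorder.markContentBegin();
        }
    }

    // Stand-in for the library's GET method.
    static class PlainGet {
        protected final List<String> trace;

        PlainGet(List<String> trace) {
            this.trace = trace;
        }

        int execute(Object connection) {
            trace.add("writeRequest");
            readResponseBody(connection);
            return 200; // fixed status, for illustration only
        }

        protected void readResponseBody(Object connection) {
            trace.add("readResponseBody");
        }
    }

    // Stand-in for the recording subclass: hook first, then the parent method.
    static class RecordingGet extends PlainGet {
        final RecorderHelper helper;

        RecordingGet(List<String> trace, Recorder recorder) {
            super(trace);
            this.helper = new RecorderHelper(recorder);
        }

        @Override
        int execute(Object connection) {
            helper.setConnection(connection); // remember the connection for cleanup
            return super.execute(connection);
        }

        @Override
        protected void readResponseBody(Object connection) {
            helper.markContentBegin();        // mark the header/body boundary
            super.readResponseBody(connection);
        }
    }

    // Runs one request and returns the observed call order.
    static List<String> run() {
        List<String> trace = new ArrayList<>();
        RecordingGet get = new RecordingGet(trace, new Recorder(trace));
        get.execute(new Object());
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the design is that the subclass never reimplements request handling: it only inserts recorder calls at the moments the parent class is about to act, so the recorder learns where the response body begins without changing how it is read.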

HttpRecorderPostMethod extends PostMethod, and its basic logic closely mirrors that of HttpRecorderGetMethod, so I will not analyze it separately.

---------------------------------------------------------------------------

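This Heritrix 3.1.0 Source Code Analysis series is my own original work.

If you repost it, please credit the source: 博客园 (cnblogs), 刺猬的温驯

Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/28/3048387.html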
