Heritrix 3.1.0 Source Code Analysis (23)

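The previous article looked at how Heritrix 3.1.0 extends HttpClient's ProtocolSocketFactory interface to create the socket objects for HTTP and HTTPS connections.

Next we look at how Heritrix 3.1.0 extends HttpClient's HttpConnection object, which is what actually creates the socket connection.

First, the member variables of the HttpConnection class: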

    // ----------------------------------------------------- Instance Variables

    /** My host. */
    private String hostName = null;

    /** My port. */
    private int portNumber = -1;

    /** My proxy host. */
    private String proxyHostName = null;

    /** My proxy port. */
    private int proxyPortNumber = -1;

    /** My client Socket. */
    private Socket socket = null;

    /** My InputStream. */
    private InputStream inputStream = null;

    /** My OutputStream. */
    private OutputStream outputStream = null;

    /** An {@link InputStream} for the response to an individual request. */
    private InputStream lastResponseInputStream = null;

    /** Whether or not the connection is connected. */
    protected boolean isOpen = false;

    /** the protocol being used */
    private Protocol protocolInUse;

    /** Collection of HTTP parameters associated with this HTTP connection*/
    private HttpConnectionParams params = new HttpConnectionParams();

    /** flag to indicate if this connection can be released, if locked the connection cannot be
     * released */
    private boolean locked = false;

    /** Whether or not the socket is a secure one. */
    private boolean usingSecureSocket = false;

    /** Whether the connection is open via a secure tunnel or not */
    private boolean tunnelEstablished = false;

    /** the connection manager that created this connection or null */
    private HttpConnectionManager httpConnectionManager;

    /** The local interface on which the connection is created, or null for the default */
    private InetAddress localAddress;

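These members are the parameters and objects needed to create the socket, together with the socket's input and output streams. So how does Heritrix 3.1.0 create an HttpConnection object?

It does so in the getConnectionWithTimeout(HostConfiguration hostConfiguration, long timeout) method of the SingleHttpConnectionManager class: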

    public HttpConnection getConnectionWithTimeout(
        HostConfiguration hostConfiguration, long timeout) {

        HttpConnection conn = new HttpConnection(hostConfiguration);
        conn.setHttpConnectionManager(this);
        conn.getParams().setDefaults(this.getParams());
        return conn;
    }
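The call conn.getParams().setDefaults(this.getParams()) suggests a layered lookup: a value that is not set on the connection falls back to the value held by the manager. Below is a minimal Java sketch of that idea. The Params class and the "so.timeout" key are hypothetical stand-ins made up for illustration; they are not the HttpClient classes or their real parameter names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of defaults-chained parameters; not HttpClient code.
class LayeredParamsSketch {

    static final class Params {
        private final Map<String, Object> values = new HashMap<>();
        private Params defaults; // consulted when a name is not set locally

        void setDefaults(Params defaults) {
            this.defaults = defaults;
        }

        void set(String name, Object value) {
            values.put(name, value);
        }

        Object get(String name) {
            if (values.containsKey(name)) {
                return values.get(name);
            }
            return defaults != null ? defaults.get(name) : null;
        }
    }

    public static void main(String[] args) {
        Params managerParams = new Params();
        managerParams.set("so.timeout", 20000); // hypothetical key

        Params connParams = new Params();
        connParams.setDefaults(managerParams);

        // Not set on the connection, so the manager's value is used.
        System.out.println(connParams.get("so.timeout"));

        // A local value takes precedence over the default.
        connParams.set("so.timeout", 5000);
        System.out.println(connParams.get("so.timeout"));
    }
}
```

With this arrangement, settings made once on the manager apply to every connection it hands out, while an individual connection can still override them.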

Now let's look at the constructors of the HttpConnection class:

    /**
     * Creates a new HTTP connection for the given host configuration.
     *
     * @param hostConfiguration the host/proxy/protocol to use
     */
    public HttpConnection(HostConfiguration hostConfiguration) {
        this(hostConfiguration.getProxyHost(),
             hostConfiguration.getProxyPort(),
             hostConfiguration.getHost(),
             hostConfiguration.getPort(),
             hostConfiguration.getProtocol());
        this.localAddress = hostConfiguration.getLocalAddress();
    }

    /**
     * Creates a new HTTP connection for the given host with the virtual
     * alias and port via the given proxy host and port using the given
     * protocol.
     *
     * @param proxyHost the host to proxy via
     * @param proxyPort the port to proxy via
     * @param host the host to connect to. Parameter value must be non-null.
     * @param port the port to connect to
     * @param protocol The protocol to use. Parameter value must be non-null.
     */
    public HttpConnection(
        String proxyHost,
        int proxyPort,
        String host,
        int port,
        Protocol protocol) {

        if (host == null) {
            throw new IllegalArgumentException("host parameter is null");
        }
        if (protocol == null) {
            throw new IllegalArgumentException("protocol is null");
        }

        proxyHostName = proxyHost;
        proxyPortNumber = proxyPort;
        hostName = host;
        portNumber = protocol.resolvePort(port);
        protocolInUse = protocol;
    }

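The constructors do little more than initialize the member variables. Note the initialization of the Protocol protocolInUse member: the socket factory used later is obtained through it (as the previous article mentioned, the Protocol class is where the HTTP and HTTPS socket factories are registered).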
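As a rough illustration of what the constructor does with the port, here is a small Java sketch. It assumes that resolvePort falls back to the protocol's default port when the supplied port is not positive; that behaviour is an assumption, since the body of resolvePort is not shown in the excerpt. SimpleProtocol is a hypothetical stand-in, not the HttpClient Protocol class.

```java
// Hypothetical, simplified stand-in for a protocol with a default port.
class PortResolutionSketch {

    static final class SimpleProtocol {
        final String scheme;
        final int defaultPort;

        SimpleProtocol(String scheme, int defaultPort) {
            if (scheme == null) {
                throw new IllegalArgumentException("scheme is null");
            }
            this.scheme = scheme;
            this.defaultPort = defaultPort;
        }

        /** Assumed rule: a non-positive port means "use the default port". */
        int resolvePort(int port) {
            return port <= 0 ? defaultPort : port;
        }
    }

    public static void main(String[] args) {
        SimpleProtocol https = new SimpleProtocol("https", 443);
        System.out.println(https.resolvePort(-1));   // prints 443
        System.out.println(https.resolvePort(8443)); // prints 8443
    }
}
```

Under that assumption, a HostConfiguration that leaves the port unset (-1) would still yield a usable portNumber.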

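The void open() method below first obtains the socket through the Protocol protocolInUse member, then applies the relevant socket parameters, and finally obtains the socket's InputStream and OutputStream: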

    /**
     * Establishes a connection to the specified host and port
     * (via a proxy if specified).
     * The underlying socket is created from the {@link ProtocolSocketFactory}.
     *
     * @throws IOException if an attempt to establish the connection results in an
     *   I/O error.
     */
    public void open() throws IOException {
        LOG.trace("enter HttpConnection.open()");

        final String host = (proxyHostName == null) ? hostName : proxyHostName;
        final int port = (proxyHostName == null) ? portNumber : proxyPortNumber;
        assertNotOpen();

        if (LOG.isDebugEnabled()) {
            LOG.debug("Open connection to " + host + ":" + port);
        }

        try {
            if (this.socket == null) {
                usingSecureSocket = isSecure() && !isProxied();
                // use the protocol's socket factory unless this is a secure
                // proxied connection
                ProtocolSocketFactory socketFactory = null;
                if (isSecure() && isProxied()) {
                    Protocol defaultprotocol = Protocol.getProtocol("http");
                    socketFactory = defaultprotocol.getSocketFactory();
                } else {
                    socketFactory = this.protocolInUse.getSocketFactory();
                }
                this.socket = socketFactory.createSocket(
                            host, port,
                            localAddress, 0,
                            this.params);
            }

            /*
            "Nagling has been broadly implemented across networks,
            including the Internet, and is generally performed by default
            - although it is sometimes considered to be undesirable in
            highly interactive environments, such as some client/server
            situations. In such cases, nagling may be turned off through
            use of the TCP_NODELAY sockets option." */

            socket.setTcpNoDelay(this.params.getTcpNoDelay());
            socket.setSoTimeout(this.params.getSoTimeout());

            int linger = this.params.getLinger();
            if (linger >= 0) {
                socket.setSoLinger(linger > 0, linger);
            }

            int sndBufSize = this.params.getSendBufferSize();
            if (sndBufSize >= 0) {
                socket.setSendBufferSize(sndBufSize);
            }
            int rcvBufSize = this.params.getReceiveBufferSize();
            if (rcvBufSize >= 0) {
                socket.setReceiveBufferSize(rcvBufSize);
            }
            int outbuffersize = socket.getSendBufferSize();
            if ((outbuffersize > 2048) || (outbuffersize <= 0)) {
                outbuffersize = 2048;
            }
            int inbuffersize = socket.getReceiveBufferSize();
            if ((inbuffersize > 2048) || (inbuffersize <= 0)) {
                inbuffersize = 2048;
            }

            // START IA/HERITRIX change
            Recorder httpRecorder = Recorder.getHttpRecorder();
            if (httpRecorder == null || (isSecure() && isProxied())) {
                // no recorder, OR defer recording for pre-tunnel leg
                inputStream = new BufferedInputStream(
                    socket.getInputStream(), inbuffersize);
                outputStream = new BufferedOutputStream(
                    socket.getOutputStream(), outbuffersize);
            } else {
                inputStream = httpRecorder.inputWrap((InputStream)
                        (new BufferedInputStream(socket.getInputStream(),
                        inbuffersize)));
                outputStream = httpRecorder.outputWrap((OutputStream)
                        (new BufferedOutputStream(socket.getOutputStream(),
                        outbuffersize)));
            }
            // END IA/HERITRIX change

            isOpen = true;
        } catch (IOException e) {
            // Connection wasn't opened properly
            // so close everything out
            closeSocketAndStreams();
            throw e;
        }
    }
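The decisions open() makes before and after creating the socket can be restated as a few small pure functions. The Java sketch below only mirrors the conditions visible in the method above; the class and method names are made up for illustration, and the secure and proxied flags are passed in directly rather than computed the way HttpConnection does it.

```java
// Hypothetical helper names; the conditions mirror those in open().
class OpenDecisionsSketch {

    static final int MAX_STREAM_BUFFER = 2048;

    /** Host to dial: the proxy if one is configured, otherwise the origin host. */
    static String targetHost(String hostName, String proxyHostName) {
        return proxyHostName == null ? hostName : proxyHostName;
    }

    /** Port to dial, chosen by the same rule as the host. */
    static int targetPort(int portNumber, String proxyHostName, int proxyPortNumber) {
        return proxyHostName == null ? portNumber : proxyPortNumber;
    }

    /**
     * A secure connection through a proxy starts on the plain "http" socket
     * factory; every other case uses the factory of the protocol in use.
     */
    static boolean usePlainHttpFactory(boolean secure, boolean proxied) {
        return secure && proxied;
    }

    /** Stream buffers are capped at 2048 bytes; non-positive sizes also become 2048. */
    static int streamBufferSize(int socketBufferSize) {
        if (socketBufferSize > MAX_STREAM_BUFFER || socketBufferSize <= 0) {
            return MAX_STREAM_BUFFER;
        }
        return socketBufferSize;
    }

    public static void main(String[] args) {
        System.out.println(targetHost("example.org", null));        // prints example.org
        System.out.println(targetHost("example.org", "proxy.lan")); // prints proxy.lan
        System.out.println(usePlainHttpFactory(true, true));        // prints true
        System.out.println(streamBufferSize(65536));                // prints 2048
        System.out.println(streamBufferSize(1024));                 // prints 1024
    }
}
```

Note that the cap applies only to the BufferedInputStream and BufferedOutputStream wrappers; the socket's own send and receive buffer sizes are set separately from the connection parameters.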

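Note that Heritrix 3.1.0 wraps the socket's input and output streams with the object returned by Recorder.getHttpRecorder(), so the system can later get at the connection's input and output through that Recorder. The wrapping is skipped when no recorder is present, or for the pre-tunnel leg of a secure proxied connection.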
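To make the idea of a recording wrapper concrete, here is a minimal Java sketch of an input stream that keeps a copy of every byte read through it. It is an illustration of the general technique only: RecordingInputStream is a hypothetical class that buffers in memory, and it makes no claim about how Heritrix's Recorder, inputWrap, or outputWrap are actually implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory recorder; not the Heritrix Recorder class.
class RecordingStreamSketch {

    static final class RecordingInputStream extends FilterInputStream {
        private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

        RecordingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                recorded.write(b); // copy the single byte that was read
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                recorded.write(buf, off, n); // copy exactly the bytes delivered
            }
            return n;
        }

        byte[] recordedBytes() {
            return recorded.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "HTTP/1.1 200 OK\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        RecordingInputStream in =
            new RecordingInputStream(new ByteArrayInputStream(wire));

        byte[] buf = new byte[8];
        while (in.read(buf, 0, buf.length) != -1) {
            // The caller consumes the response as usual.
        }
        System.out.println(in.recordedBytes().length == wire.length); // prints true
    }
}
```

The caller reads from the wrapper exactly as it would from the raw socket stream, which is why such a wrapper can be swapped in inside open() without changing the rest of the connection code.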

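Heritrix 3.1.0 also wraps the construction of the HttpConnection object described above in a class of its own, implemented by extending HttpClient's SimpleHttpConnectionManager class.

SimpleHttpConnectionManager itself implements the HttpConnectionManager interface, which declares the methods for building HttpConnection objects: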

public interface HttpConnectionManager {

    HttpConnection getConnection(HostConfiguration hostConfiguration);

    HttpConnection getConnection(HostConfiguration hostConfiguration, long timeout)
        throws HttpException;

    HttpConnection getConnectionWithTimeout(HostConfiguration hostConfiguration, long timeout)
        throws ConnectionPoolTimeoutException;

    void releaseConnection(HttpConnection conn);

    void closeIdleConnections(long idleTimeout);

    HttpConnectionManagerParams getParams();

    void setParams(final HttpConnectionManagerParams params);
}

The SingleHttpConnectionManager class that Heritrix 3.1.0 adds on top of it looks like this:

public class SingleHttpConnectionManager extends SimpleHttpConnectionManager {
    public SingleHttpConnectionManager() {
        super();
    }
    @Override
    public HttpConnection getConnectionWithTimeout(
        HostConfiguration hostConfiguration, long timeout) {

        HttpConnection conn = new HttpConnection(hostConfiguration);
        conn.setHttpConnectionManager(this);
        conn.getParams().setDefaults(this.getParams());
        return conn;
    }
    @Override
    public void releaseConnection(HttpConnection conn) {
        // ensure connection is closed
        conn.close();
        finishLast(conn);
    }

    static void finishLast(HttpConnection conn) {
        // copied from superclass because it wasn't made available to subclasses
        InputStream lastResponse = conn.getLastResponseInputStream();
        if (lastResponse != null) {
            conn.setLastResponseInputStream(null);
            try {
                lastResponse.close();
            } catch (IOException ioe) {
                //FIXME: badness - close to force reconnect.
                conn.close();
            }
        }
    }

}

It overrides two things: how an HttpConnection object is obtained, and how the resources of an HttpConnection object are released.
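The lifecycle this implies (a new connection object on every request, always closed on release) can be sketched with a toy manager in Java. Connection and Manager here are hypothetical miniature classes for illustration; they only imitate the shape of the two overridden methods above.

```java
// Hypothetical miniature classes; they imitate the get/release lifecycle only.
class FreshConnectionManagerSketch {

    static final class Connection {
        final String host;
        private boolean open;

        Connection(String host) {
            this.host = host;
        }

        void open() {
            open = true;
        }

        void close() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    static final class Manager {
        /** Always builds a new connection instead of handing back a cached one. */
        Connection getConnection(String host) {
            return new Connection(host);
        }

        /** Release means close: nothing is kept around for reuse. */
        void releaseConnection(Connection conn) {
            conn.close();
        }
    }

    public static void main(String[] args) {
        Manager manager = new Manager();
        Connection first = manager.getConnection("example.org");
        Connection second = manager.getConnection("example.org");
        System.out.println(first != second); // prints true

        first.open();
        manager.releaseConnection(first);
        System.out.println(first.isOpen()); // prints false
    }
}
```

The trade-off of this design is simplicity over reuse: no connection state is shared between requests, at the cost of opening a new socket each time.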

---------------------------------------------------------------------------

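This Heritrix 3.1.0 Source Code Analysis series is my own original work.

If you repost it, please credit the source: 博客园 (cnblogs), 刺猬的温驯

Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/27/3047510.html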
