Heritrix 3.1.0 Source Code Analysis (Part 22)

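This article continues the analysis of the Heritrix 3.1.0 source code. The topic ahead cannot be covered in one or two articles, and I do not want the pressure to explain it quickly to disrupt the logical order of the subject itself. The analysis of how Heritrix 3.1.0 wraps the HttpClient component will therefore probably be split across several articles.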

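As we know, Heritrix 3.1.0 talks to servers through a wrapped HttpClient component, which in turn wraps a Socket. Data is sent by writing to the socket's output stream and received by reading from its input stream.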
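As a minimal illustration of that write/read cycle, here is a sketch using only java.net, with no HttpClient or Heritrix classes involved. SocketStreamsDemo, exchange and startOneShotServer are hypothetical names, and a one-shot loopback server stands in for a real web server so the example runs without network access.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class SocketStreamsDemo {

    // Writes one line to host:port and returns the first line of the reply.
    static String exchange(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);    // sending = writing to the socket's output stream
            return in.readLine(); // receiving = reading from the socket's input stream
        }
    }

    // Starts a loopback server that answers exactly one line, then exits.
    static ServerSocket startOneShotServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println("echo: " + in.readLine());
            } catch (IOException ignored) {
                // Server closed before a client connected; nothing to do in a demo.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startOneShotServer()) {
            String host = InetAddress.getLoopbackAddress().getHostAddress();
            // prints "echo: GET / HTTP/1.0"
            System.out.println(exchange(host, server.getLocalPort(), "GET / HTTP/1.0"));
        }
    }
}
```

HttpClient hides exactly this pattern: it writes the request line and headers to the output stream and parses the status line, headers and body from the input stream.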

So how does Heritrix 3.1.0 wrap HttpClient? (Heritrix 3.1.0 uses an older Apache release of HttpClient, whose classes live under org.apache.commons.httpclient.)

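The FetchHTTP processor contains a static initializer block that registers socket factories, one for HTTP and one for HTTPS. (Both protocols run over TCP; the relationship between the two is not covered here, and readers unfamiliar with it can consult a networking tutorial.)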

    /**
     * Register the http and https protocols.
     */
    static {
        Protocol.registerProtocol("http", new Protocol("http",
                new HeritrixProtocolSocketFactory(), 80));
        try {
            ProtocolSocketFactory psf = new HeritrixSSLProtocolSocketFactory();
            Protocol p = new Protocol("https", psf, 443);
            Protocol.registerProtocol("https", p);
        } catch (KeyManagementException e) {
            e.printStackTrace();
        } catch (KeyStoreException e) {
            e.printStackTrace();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
    }

The two classes registered above, HeritrixProtocolSocketFactory and HeritrixSSLProtocolSocketFactory, both implement HttpClient's ProtocolSocketFactory interface and are used to create client-side Socket objects (HeritrixSSLProtocolSocketFactory implements it indirectly).
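The registration in the static block follows a simple registry pattern: a scheme name maps to a socket factory plus a default port, and any later lookup of that scheme gets the registered factory. The sketch below shows the pattern in isolation. SimpleSocketFactory, SchemeEntry and SchemeRegistry are hypothetical stand-ins for ProtocolSocketFactory and Protocol, not HttpClient's actual API.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ProtocolSocketFactory (only the simplest overload).
interface SimpleSocketFactory {
    Socket createSocket(String host, int port) throws IOException;
}

// Hypothetical stand-in for Protocol: scheme name, socket factory and default port.
final class SchemeEntry {
    final String scheme;
    final SimpleSocketFactory factory;
    final int defaultPort;

    SchemeEntry(String scheme, SimpleSocketFactory factory, int defaultPort) {
        this.scheme = scheme;
        this.factory = factory;
        this.defaultPort = defaultPort;
    }

    // Use the scheme's default port when the URL does not specify one (port <= 0).
    int resolvePort(int port) {
        return port > 0 ? port : defaultPort;
    }
}

// Hypothetical process-wide registry, analogous to Protocol.registerProtocol/getProtocol.
final class SchemeRegistry {
    private static final Map<String, SchemeEntry> SCHEMES = new ConcurrentHashMap<>();

    // Registered once when the class is loaded, like the static block in FetchHTTP.
    static {
        register(new SchemeEntry("http", Socket::new, 80));
    }

    private SchemeRegistry() { }

    static void register(SchemeEntry entry) {
        SCHEMES.put(entry.scheme, entry); // a later registration replaces an earlier one
    }

    static SchemeEntry lookup(String scheme) {
        SchemeEntry entry = SCHEMES.get(scheme);
        if (entry == null) {
            throw new IllegalStateException("unsupported protocol: " + scheme);
        }
        return entry;
    }
}
```

With this registry, SchemeRegistry.lookup("http").resolvePort(-1) returns 80, and registering another "http" entry replaces the first. As far as I can tell, Protocol.registerProtocol likewise keeps its entries in a JVM-wide static table, so registering "http" and "https" once in FetchHTTP's static block is enough for HttpClient to pick up the Heritrix factories.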

The ProtocolSocketFactory interface (package org.apache.commons.httpclient.protocol) defines the methods for creating Socket objects:

/**
 * A factory for creating Sockets.
 *
 * <p>Both {@link java.lang.Object#equals(java.lang.Object) Object.equals()} and
 * {@link java.lang.Object#hashCode() Object.hashCode()} should be overridden appropriately.
 * Protocol socket factories are used to uniquely identify <code>Protocol</code>s and
 * <code>HostConfiguration</code>s, and <code>equals()</code> and <code>hashCode()</code> are
 * required for the correct operation of some connection managers.</p>
 *
 * @see Protocol
 *
 * @author Michael Becke
 * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
 *
 * @since 2.0
 */
public interface ProtocolSocketFactory {

    /**
     * Gets a new socket connection to the given host.
     *
     * @param host the host name/IP
     * @param port the port on the host
     * @param localAddress the local host name/IP to bind the socket to
     * @param localPort the port on the local machine
     *
     * @return Socket a new socket
     *
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     */
    Socket createSocket(
        String host,
        int port,
        InetAddress localAddress,
        int localPort
    ) throws IOException, UnknownHostException;

    /**
     * Gets a new socket connection to the given host.
     *
     * @param host the host name/IP
     * @param port the port on the host
     * @param localAddress the local host name/IP to bind the socket to
     * @param localPort the port on the local machine
     * @param params {@link HttpConnectionParams Http connection parameters}
     *
     * @return Socket a new socket
     *
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     * @throws ConnectTimeoutException if socket cannot be connected within the
     *  given time limit
     *
     * @since 3.0
     */
    Socket createSocket(
        String host,
        int port,
        InetAddress localAddress,
        int localPort,
        HttpConnectionParams params
    ) throws IOException, UnknownHostException, ConnectTimeoutException;

    /**
     * Gets a new socket connection to the given host.
     *
     * @param host the host name/IP
     * @param port the port on the host
     *
     * @return Socket a new socket
     *
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     */
    Socket createSocket(
        String host,
        int port
    ) throws IOException, UnknownHostException;

}
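The Javadoc above asks implementations to override equals() and hashCode() because socket factories take part in identifying Protocol and HostConfiguration objects. For a stateless factory the natural rule, and the one HeritrixProtocolSocketFactory adopts, is "all instances of the class are equal". The sketch below uses hypothetical classes to show what that rule changes when factories end up as hash keys.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stateless factory using the same equality rule as HeritrixProtocolSocketFactory.
class StatelessSocketFactory {
    @Override
    public boolean equals(Object obj) {
        // Equal to any object of exactly this class (subclasses are not equal).
        return obj != null && obj.getClass().equals(StatelessSocketFactory.class);
    }

    @Override
    public int hashCode() {
        // Every instance shares one hash code, consistent with equals().
        return StatelessSocketFactory.class.hashCode();
    }
}

// Hypothetical factory that keeps Object's identity-based equals/hashCode.
class IdentitySocketFactory { }

class FactoryEqualityDemo {
    // Number of distinct keys after adding both objects to a HashSet.
    static int distinct(Object a, Object b) {
        Set<Object> keys = new HashSet<>();
        keys.add(a);
        keys.add(b);
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(distinct(new StatelessSocketFactory(), new StatelessSocketFactory())); // prints 1
        System.out.println(distinct(new IdentitySocketFactory(), new IdentitySocketFactory()));   // prints 2
    }
}
```

With identity equality, two host configurations that differ only in which (equivalent) factory instance they hold would presumably be treated as different keys by a connection manager, so pooled connections would not be shared between them.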

HeritrixProtocolSocketFactory implements the ProtocolSocketFactory interface above and is used for HTTP:

public class HeritrixProtocolSocketFactory implements ProtocolSocketFactory {
    /**
     * Constructor.
     */
    public HeritrixProtocolSocketFactory() {
        super();
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localAddress,
            int localPort) throws IOException, UnknownHostException {
        // TODO Auto-generated method stub
        return new Socket(host, port, localAddress, localPort);
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localAddress,
            int localPort, HttpConnectionParams params) throws IOException,
            UnknownHostException, ConnectTimeoutException {
        // TODO Auto-generated method stub
        // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
        // method only it has workarounds to deal with pre-1.4 JVMs.  I've
        // cut these out.
        if (params == null) {
            throw new IllegalArgumentException("Parameters may not be null");
        }
        Socket socket = null;
        int timeout = params.getConnectionTimeout();
        if (timeout == 0) {
            socket = createSocket(host, port, localAddress, localPort);
        } else {
            socket = new Socket();

            InetAddress hostAddress;
            Thread current = Thread.currentThread();
            if (current instanceof HostResolver) {
                HostResolver resolver = (HostResolver)current;
                hostAddress = resolver.resolve(host);
            } else {
                hostAddress = null;
            }
            InetSocketAddress address = (hostAddress != null)?
                    new InetSocketAddress(hostAddress, port):
                    new InetSocketAddress(host, port);
            socket.bind(new InetSocketAddress(localAddress, localPort));
            try {
                socket.connect(address, timeout);
            } catch (SocketTimeoutException e) {
                // Add timeout info. to the exception.
                throw new SocketTimeoutException(e.getMessage() +
                    ": timeout set at " + Integer.toString(timeout) + "ms.");
            }
            assert socket.isConnected(): "Socket not connected " + host;
        }
        return socket;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException,
            UnknownHostException {
        // TODO Auto-generated method stub
        return new Socket(host, port);
    }

    /**
     * All instances of DefaultProtocolSocketFactory are the same.
     * @param obj Object to compare.
     * @return True if equal
     */
    public boolean equals(Object obj) {
        return ((obj != null) &&
            obj.getClass().equals(HeritrixProtocolSocketFactory.class));
    }

    /**
     * All instances of DefaultProtocolSocketFactory have the same hash code.
     * @return Hash code for this object.
     */
    public int hashCode() {
        return HeritrixProtocolSocketFactory.class.hashCode();
    }

}
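The interesting part of HeritrixProtocolSocketFactory is the timeout branch of createSocket(..., params). It creates an unconnected Socket so that connect(address, timeout) can enforce a connect timeout, and before connecting it asks the current thread for an IP address if that thread implements HostResolver, which lets the crawler supply an address it has already resolved itself instead of triggering a fresh DNS lookup. The standalone sketch below reproduces that logic with the JDK only. CachingResolver, ResolvingThread and TimeoutConnector are hypothetical names, the local-address bind is omitted, and, unlike the original, the socket is closed when the connect fails.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Map;

// Hypothetical stand-in for Heritrix's HostResolver interface.
interface CachingResolver {
    InetAddress resolve(String host); // null if the host is not known
}

// Hypothetical worker thread that resolves hosts from a prepared map.
class ResolvingThread extends Thread implements CachingResolver {
    private final Map<String, InetAddress> cache;

    ResolvingThread(Map<String, InetAddress> cache, Runnable task) {
        super(task);
        this.cache = cache;
    }

    @Override
    public InetAddress resolve(String host) {
        return cache.get(host);
    }
}

class TimeoutConnector {
    static Socket connect(String host, int port, int timeoutMs) throws IOException {
        // Ask the current thread first, as HeritrixProtocolSocketFactory does.
        InetAddress known = null;
        Thread current = Thread.currentThread();
        if (current instanceof CachingResolver) {
            known = ((CachingResolver) current).resolve(host);
        }
        InetSocketAddress address = (known != null)
                ? new InetSocketAddress(known, port)
                : new InetSocketAddress(host, port); // regular DNS lookup
        Socket socket = new Socket(); // unconnected, so connect() can take a timeout
        try {
            socket.connect(address, timeoutMs);
            return socket;
        } catch (SocketTimeoutException e) {
            socket.close();
            // Add the configured timeout to the message, as the original does.
            SocketTimeoutException withInfo = new SocketTimeoutException(
                    e.getMessage() + ": timeout set at " + timeoutMs + "ms.");
            withInfo.initCause(e);
            throw withInfo;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

Run on a ResolvingThread whose map contains a host name, connect() never performs a DNS query for that name; run on an ordinary thread, it falls back to the normal resolution done by InetSocketAddress.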

HeritrixSSLProtocolSocketFactory, used for HTTPS, implements the SecureProtocolSocketFactory interface, which extends ProtocolSocketFactory; this is how it implements ProtocolSocketFactory indirectly.

The SecureProtocolSocketFactory interface looks like this:

/**
 * A ProtocolSocketFactory that is secure.
 *
 * @see org.apache.commons.httpclient.protocol.ProtocolSocketFactory
 *
 * @author Michael Becke
 * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
 * @since 2.0
 */
public interface SecureProtocolSocketFactory extends ProtocolSocketFactory {

    /**
     * Returns a socket connected to the given host that is layered over an
     * existing socket.  Used primarily for creating secure sockets through
     * proxies.
     *
     * @param socket the existing socket
     * @param host the host name/IP
     * @param port the port on the host
     * @param autoClose a flag for closing the underling socket when the created
     * socket is closed
     *
     * @return Socket a new socket
     *
     * @throws IOException if an I/O error occurs while creating the socket
     * @throws UnknownHostException if the IP address of the host cannot be
     * determined
     */
    Socket createSocket(
        Socket socket,
        String host,
        int port,
        boolean autoClose
    ) throws IOException, UnknownHostException;

}
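The extra createSocket(socket, host, port, autoClose) method exists mainly for proxies: the client first opens a plain TCP connection (for example to an HTTP proxy that has accepted a CONNECT request) and then layers TLS on top of it. In plain JSSE this is SSLSocketFactory.createSocket(Socket, String, int, boolean), which is also what HeritrixSSLProtocolSocketFactory delegates to. In the sketch below, LayeredTls and layer are hypothetical names, a loopback ServerSocket merely stands in for a proxy, and no TLS handshake takes place; the handshake would start on the first read or write, or on startHandshake().

```java
import java.io.IOException;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

class LayeredTls {
    // Layers TLS over an already-connected plain socket. host/port name the real
    // target server, not necessarily the address the plain socket is connected to
    // (with a proxy, the plain socket points at the proxy).
    static SSLSocket layer(Socket plain, String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        // autoClose = true: closing the TLS socket also closes the plain socket.
        return (SSLSocket) factory.createSocket(plain, host, port, true);
    }
}
```

The plain socket must already be connected; JSSE rejects an unconnected one when layering.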

HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface above:

/**
 * Implementation of the commons-httpclient SSLProtocolSocketFactory so we
 * can return SSLSockets whose trust manager is
 * {@link org.archive.httpclient.ConfigurableX509TrustManager}.
 *
 * We also go to the heritrix cache to get IPs to use making connection.
 * To this, we have dependency on {@link HeritrixProtocolSocketFactory};
 * its assumed this class and it are used together.
 * See {@link HeritrixProtocolSocketFactory#getHostAddress(ServerCache,String)}.
 *
 * @author stack
 * @version $Id: HeritrixSSLProtocolSocketFactory.java 6637 2009-11-10 21:03:27Z gojomo $
 * @see org.archive.httpclient.ConfigurableX509TrustManager
 */
public class HeritrixSSLProtocolSocketFactory implements SecureProtocolSocketFactory {
    // static final String SERVER_CACHE_KEY = "heritrix.server.cache";
    static final String SSL_FACTORY_KEY = "heritrix.ssl.factory";
    /***
     * Socket factory with default trust manager installed.
     */
    private SSLSocketFactory sslDefaultFactory = null;

    /**
     * Shutdown constructor.
     * @throws KeyManagementException
     * @throws KeyStoreException
     * @throws NoSuchAlgorithmException
     */
    public HeritrixSSLProtocolSocketFactory()
    throws KeyManagementException, KeyStoreException, NoSuchAlgorithmException{
        // Get an SSL context and initialize it.
        SSLContext context = SSLContext.getInstance("SSL");

        // I tried to get the default KeyManagers but doesn't work unless you
        // point at a physical keystore. Passing null seems to do the right
        // thing so we'll go w/ that.
        context.init(null, new TrustManager[] {
            new ConfigurableX509TrustManager(
                ConfigurableX509TrustManager.DEFAULT)}, null);
        this.sslDefaultFactory = context.getSocketFactory();
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress clientHost,
        int clientPort)
    throws IOException, UnknownHostException {
        return this.sslDefaultFactory.createSocket(host, port,
            clientHost, clientPort);
    }

    @Override
    public Socket createSocket(String host, int port)
    throws IOException, UnknownHostException {
        return this.sslDefaultFactory.createSocket(host, port);
    }

    @Override
    public synchronized Socket createSocket(String host, int port,
        InetAddress localAddress, int localPort, HttpConnectionParams params)
    throws IOException, UnknownHostException {
        // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
        // method only it has workarounds to deal with pre-1.4 JVMs.  I've
        // cut these out.
        if (params == null) {
            throw new IllegalArgumentException("Parameters may not be null");
        }
        Socket socket = null;
        int timeout = params.getConnectionTimeout();
        if (timeout == 0) {
            socket = createSocket(host, port, localAddress, localPort);
        } else {
            SSLSocketFactory factory = (SSLSocketFactory)params.
                getParameter(SSL_FACTORY_KEY);//SSL_FACTORY_KEY
            SSLSocketFactory f = (factory != null)? factory: this.sslDefaultFactory;
            socket = f.createSocket();

            Thread current = Thread.currentThread();
            InetAddress hostAddress;
            if (current instanceof HostResolver) {
                HostResolver resolver = (HostResolver)current;
                hostAddress = resolver.resolve(host);
            } else {
                hostAddress = null;
            }
            InetSocketAddress address = (hostAddress != null)?
                    new InetSocketAddress(hostAddress, port):
                    new InetSocketAddress(host, port);
            socket.bind(new InetSocketAddress(localAddress, localPort));
            try {
                socket.connect(address, timeout);
            } catch (SocketTimeoutException e) {
                // Add timeout info. to the exception.
                throw new SocketTimeoutException(e.getMessage() +
                    ": timeout set at " + Integer.toString(timeout) + "ms.");
            }
            assert socket.isConnected(): "Socket not connected " + host;
        }
        return socket;
    }

    @Override
    public Socket createSocket(Socket socket, String host, int port,
        boolean autoClose)
    throws IOException, UnknownHostException {
        return this.sslDefaultFactory.createSocket(socket, host,
            port, autoClose);
    }

    public boolean equals(Object obj) {
        return ((obj != null) && obj.getClass().
            equals(HeritrixSSLProtocolSocketFactory.class));
    }

    public int hashCode() {
        return HeritrixSSLProtocolSocketFactory.class.hashCode();
    }
}
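The constructor of HeritrixSSLProtocolSocketFactory and the SSL_FACTORY_KEY lookup in createSocket(..., params) can be reproduced with plain JSSE calls. In this sketch, SslFactorySetup, TRUST_ALL, buildFactory and choose are hypothetical names, and a Map stands in for HttpConnectionParams. The sketch requests "TLS" from SSLContext.getInstance, whereas the original passes "SSL".

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import java.util.Map;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

class SslFactorySetup {

    // Trusts every certificate: the effect of ConfigurableX509TrustManager at level OPEN.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    // Same steps as the HeritrixSSLProtocolSocketFactory constructor.
    static SSLSocketFactory buildFactory(X509TrustManager trustManager)
            throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        // No KeyManagers (no client certificate) and the default SecureRandom.
        context.init(null, new TrustManager[] {trustManager}, null);
        return context.getSocketFactory();
    }

    // Mirrors the SSL_FACTORY_KEY lookup: a per-connection factory wins over the default.
    static SSLSocketFactory choose(Map<String, ?> params, String key, SSLSocketFactory fallback) {
        Object override = params.get(key);
        return (override instanceof SSLSocketFactory) ? (SSLSocketFactory) override : fallback;
    }
}
```

Trusting every certificate turns off server authentication. That suits a crawler whose job is to fetch and archive pages, but it is not appropriate where the identity of the server matters.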

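Sockets for HTTPS are created by the SSLSocketFactory held in the sslDefaultFactory field. To build sslDefaultFactory, Heritrix 3.1.0 defines ConfigurableX509TrustManager, an implementation of the X509TrustManager interface used during SSL communication; at its default trust level (OPEN) it accepts any certificate automatically: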

/**
 * A configurable trust manager built on X509TrustManager.
 *
 * If set to 'open' trust, the default, will get us into sites for whom we do
 * not have the CA or any of intermediary CAs that go to make up the cert chain
 * of trust.  Will also get us past selfsigned and expired certs.  'loose'
 * trust will get us into sites w/ valid certs even if they are just
 * selfsigned.  'normal' is any valid cert not including selfsigned.  'strict'
 * means cert must be valid and the cert DN must match server name.
 *
 * <p>Based on pointers in
 * <a href="http://jakarta.apache.org/commons/httpclient/sslguide.html">SSL
 * Guide</a>,
 * and readings done in <a
 * href="http://java.sun.com/j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction">JSSE
 * Guide</a>.
 *
 * <p>TODO: Move to an ssl subpackage when we have other classes other than
 * just this one.
 *
 * @author stack
 * @version $Id: ConfigurableX509TrustManager.java 6637 2009-11-10 21:03:27Z gojomo $
 */
public class ConfigurableX509TrustManager implements X509TrustManager
{
    /**
     * Logging instance.
     */
    protected static Logger logger = Logger.getLogger(
        "org.archive.httpclient.ConfigurableX509TrustManager");

    public static enum TrustLevel {
        /**
         * Trust anything given us.
         *
         * Default setting.
         *
         * <p>See <a href="http://javaalmanac.com/egs/javax.net.ssl/TrustAll.html">
         *  e502. Disabling Certificate Validation in an HTTPS Connection</a> from
         * the java almanac for how to trust all.
         */
        OPEN,

        /**
         * Trust any valid cert including self-signed certificates.
         */
        LOOSE,

        /**
         * Normal jsse behavior.
         *
         * Seemingly any certificate that supplies valid chain of trust.
         */
        NORMAL,

        /**
         * Strict trust.
         *
         * Ensure server has same name as cert DN.
         */
        STRICT,
    }

    /**
     * Default setting for trust level.
     */
    public final static TrustLevel DEFAULT = TrustLevel.OPEN;

    /**
     * Trust level.
     */
    private TrustLevel trustLevel = DEFAULT;

    /**
     * An instance of the SUNX509TrustManager that we adapt variously
     * depending upon passed configuration.
     *
     * We have it do all the work we don't want to.
     */
    private X509TrustManager standardTrustManager = null;

    public ConfigurableX509TrustManager()
    throws NoSuchAlgorithmException, KeyStoreException {
        this(DEFAULT);
    }

    /**
     * Constructor.
     *
     * @param level Level of trust to effect.
     *
     * @throws NoSuchAlgorithmException
     * @throws KeyStoreException
     */
    public ConfigurableX509TrustManager(TrustLevel level)
    throws NoSuchAlgorithmException, KeyStoreException {
        super();
        TrustManagerFactory factory = TrustManagerFactory.
            getInstance(TrustManagerFactory.getDefaultAlgorithm());

        // Pass in a null (Trust) KeyStore.  Null says use the 'default'
        // 'trust' keystore (KeyStore class is used to hold keys and to hold
        // 'trusts' (certs)). See 'X509TrustManager Interface' in this doc:
        // http://java.sun.com
        // /j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction
        factory.init((KeyStore)null);
        TrustManager[] trustmanagers = factory.getTrustManagers();
        if (trustmanagers.length == 0) {
            throw new NoSuchAlgorithmException(TrustManagerFactory.
                getDefaultAlgorithm() + " trust manager not supported");
        }
        this.standardTrustManager = (X509TrustManager)trustmanagers[0];

        this.trustLevel = level;
    }

    @Override
    public void checkClientTrusted(X509Certificate[] certificates, String type)
    throws CertificateException {
        if (this.trustLevel.equals(TrustLevel.OPEN)) {
            return;
        }

        this.standardTrustManager.checkClientTrusted(certificates, type);
    }

    @Override
    public void checkServerTrusted(X509Certificate[] certificates, String type)
    throws CertificateException {
        if (this.trustLevel.equals(TrustLevel.OPEN)) {
            return;
        }

        try {
            this.standardTrustManager.checkServerTrusted(certificates, type);
            if (this.trustLevel.equals(TrustLevel.STRICT)) {
                logger.severe(TrustLevel.STRICT + " not implemented.");
            }
        } catch (CertificateException e) {
            if (this.trustLevel.equals(TrustLevel.LOOSE) &&
                certificates != null && certificates.length == 1)
            {
                    // If only one cert and its valid and it caused a
                    // CertificateException, assume its selfsigned.
                    X509Certificate certificate = certificates[0];
                    certificate.checkValidity();
            } else {
                // If we got to here, then we're probably NORMAL. Rethrow.
                throw e;
            }
        }
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return this.standardTrustManager.getAcceptedIssuers();
    }
}
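The trust levels in ConfigurableX509TrustManager boil down to this: OPEN returns immediately, NORMAL defers to the JDK's standard trust manager, and LOOSE additionally accepts a lone certificate that the standard check rejected, provided it is within its validity dates. STRICT only logs "not implemented" after the standard check, so in practice it behaves like NORMAL. The sketch below reproduces that decision logic in a hypothetical LeveledTrustManager (STRICT omitted); unlike the original, it searches the TrustManagerFactory output for an X509TrustManager instead of casting the first element.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.NoSuchAlgorithmException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

class LeveledTrustManager implements X509TrustManager {

    enum Level { OPEN, LOOSE, NORMAL }

    private final Level level;
    private final X509TrustManager standard;

    LeveledTrustManager(Level level) throws GeneralSecurityException {
        this.level = level;
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null); // null: use the JDK's default trust store
        X509TrustManager found = null;
        for (TrustManager tm : factory.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                found = (X509TrustManager) tm;
                break;
            }
        }
        if (found == null) {
            throw new NoSuchAlgorithmException("no X509TrustManager available");
        }
        this.standard = found;
    }

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        if (level != Level.OPEN) {
            standard.checkClientTrusted(chain, authType);
        }
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType)
            throws CertificateException {
        if (level == Level.OPEN) {
            return; // trust anything
        }
        try {
            standard.checkServerTrusted(chain, authType); // normal JSSE validation
        } catch (CertificateException e) {
            // LOOSE: a single certificate that fails path validation is assumed to be
            // self-signed and is accepted if it is inside its validity period.
            if (level == Level.LOOSE && chain != null && chain.length == 1) {
                chain[0].checkValidity();
            } else {
                throw e; // NORMAL: keep the JSSE verdict
            }
        }
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return standard.getAcceptedIssuers();
    }
}
```

Note that the LOOSE rule only checks dates: it does not verify the self-signature, so any single unexpired certificate passes.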

---------------------------------------------------------------------------

This Heritrix 3.1.0 source code analysis series is my original work.

Please credit the source when reposting: cnblogs (博客园), 刺猬的温驯

Article link: http://www.cnblogs.com/chenying99/archive/2013/04/25/3042207.html
