httpclient -- HttpClientBuilder (continuously updated...)

1. HttpClientBuilder fields
1.1 HttpRequestExecutor requestExec

1.2 HostnameVerifier hostnameVerifier

1.3 LayeredConnectionSocketFactory sslSocketFactory

1.4 HttpClientConnectionManager connManager

1.5 boolean connManagerShared

1.6 SchemePortResolver schemePortResolver

1.7 ConnectionReuseStrategy reuseStrategy

1.8 ConnectionKeepAliveStrategy keepAliveStrategy

1.9 AuthenticationStrategy targetAuthStrategy

1.10 AuthenticationStrategy proxyAuthStrategy

1.11 UserTokenHandler userTokenHandler

1.12 HttpProcessor httpprocessor

RequestContent implements HttpRequestInterceptor

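This request interceptor mainly deals with the body-related headers of the request message. If the Transfer-Encoding and Content-Length headers already present on the request should be overwritten, overwrite is set to true.
If the request body is empty, the header Content-Length: 0 is added, and no Transfer-Encoding header is needed.
If the body is not empty and is chunked, the HTTP version of the request is checked: for HTTP/1.0 an exception is thrown because chunked encoding is not supported there; otherwise the header Transfer-Encoding: chunked is added.
If the body is not empty but is not chunked, the header Content-Length is added with the length of the body.
In addition, the Content-Type and Content-Encoding of the body can be added to the request headers. Content-Encoding states how the body is compressed, for example gzip.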
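The branching described above can be sketched as a small stand-alone model. This is not the library class: RequestContentSketch and bodyHeaders are hypothetical names, the request is reduced to four plain parameters, and an IllegalStateException stands in for the exception the library throws.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the header decision described above (hypothetical class, not library code).
class RequestContentSketch {

    // hasBody:       whether the request carries a message body
    // chunked:       whether the body is sent in chunks
    // contentLength: body length in bytes, only used when the body is not chunked
    // http10:        whether the request line says HTTP/1.0
    static Map<String, String> bodyHeaders(boolean hasBody, boolean chunked,
                                           long contentLength, boolean http10) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (!hasBody) {
            // Empty body: announce a length of zero, no Transfer-Encoding.
            headers.put("Content-Length", "0");
        } else if (chunked) {
            if (http10) {
                // HTTP/1.0 has no chunked transfer encoding.
                throw new IllegalStateException("Chunked transfer encoding requires HTTP/1.1");
            }
            headers.put("Transfer-Encoding", "chunked");
        } else {
            // Non-empty body of known length.
            headers.put("Content-Length", Long.toString(contentLength));
        }
        return headers;
    }
}
```

Exactly one of the two headers is produced per request, which is why the interceptor clears or rejects existing values first.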

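RequestTargetHost: adds the target host header (Host).
This information is read from the HttpContext, which leaves an open question: how does the target host information get into the HttpContext in the first place?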

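RequestClientConnControl: this interceptor adds the Connection and Proxy-Connection headers, which matter for persistent connections. This information also has to be read from the HttpContext. Open question: the connection route involved here is not clear to me yet.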

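RequestUserAgent: this interceptor adds the User-Agent header.

RequestExpectContinue: this interceptor adds the Expect header.

RequestAddCookies: adds the Cookie header. It gets the CookieStore from the HttpContext and adds the cookies it holds to the request headers. It gets the cookieSpecRegistry from the HttpContext (which contains several CookieSpec implementations). It also gets the connection route and the RequestConfig from the HttpContext; the RequestConfig names the concrete CookieSpec to use.

RequestAcceptEncoding: adds the Accept-Encoding request header,
which lists the encodings accepted for the response body, for example gzip.

RequestAuthCache: related to the authentication cache.
ResponseProcessCookies: processes the Set-Cookie headers of the response message.
ResponseContentEncoding: handles the Content-Encoding of the response message.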
1.10 automaticRetriesDisabled
Whether requests are retried automatically.
1.11 HttpRoutePlanner routePlannerCopy
Related to connection routing.
1.12 ServiceUnavailableRetryStrategy serviceUnavailStrategyCopy
1.13 redirectHandlingDisabled
Redirect handling.
1.14 backoffManager connectionBackoffStrategy
1.15 Lookup<AuthSchemeProvider> authSchemeRegistryCopy
1.13 DnsResolver dnsResolver

1.14 LinkedList<HttpRequestInterceptor> requestFirst

1.15 LinkedList<HttpRequestInterceptor> requestLast

1.16 LinkedList<HttpResponseInterceptor> responseFirst

1.17 LinkedList<HttpResponseInterceptor> responseLast

1.18 HttpRequestRetryHandler retryHandler

1.19 HttpRoutePlanner routePlanner

1.20 RedirectStrategy redirectStrategy

1.21 ConnectionBackoffStrategy connectionBackoffStrategy

1.22 BackoffManager backoffManager

1.23 ServiceUnavailableRetryStrategy serviceUnavailStrategy

1.24 Lookup<AuthSchemeProvider> authSchemeRegistry

1.25 Lookup<CookieSpecProvider> cookieSpecRegistry

1.26 Map<String, InputStreamFactory> contentDecoderMap

1.27 CookieStore cookieStore

1.28 CredentialsProvider credentialsProvider

1.28 String userAgent

1.29 HttpHost proxy

1.30 Collection<? extends Header> defaultHeaders

1.31 SocketConfig defaultSocketConfig

1.32 ConnectionConfig defaultConnectionConfig

1.33 RequestConfig defaultRequestConfig

1.35 boolean evictExpiredConnections

1.36 boolean evictIdleConnections

1.37 long maxIdleTime

1.38 TimeUnit maxIdleTimeUnit

1.39 boolean systemProperties

1.40 boolean redirectHandlingDisabled

1.41 boolean automaticRetriesDisabled

1.42 boolean contentCompressionDisabled

1.43 boolean cookieManagementDisabled

1.44 boolean authCachingDisabled

1.45 boolean connectionStateDisabled

1.46 int maxConnTotal = 0

1.47 int maxConnPerRoute = 0

1.48 long connTimeToLive = -1

1.49 TimeUnit connTimeToLiveTimeUnit = TimeUnit.MILLISECONDS

1.50 List<Closeable> closeables

1.50 PublicSuffixMatcher publicSuffixMatcher
Used to check whether a given domain name matches one of the suffixes in the public suffix list.

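2. The HttpClientContext used in the ClientExecChain
The build method of HttpClientBuilder produces an InternalHttpClient object, but note that HttpClientContext is not involved at all during build. HttpClientContext is the execution context: it only comes into play when execute is called on the InternalHttpClient.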

final HttpClientContext localcontext = HttpClientContext.adapt(
                    context != null ? context : new BasicHttpContext());

First the RequestConfig is taken out and put into localcontext.
If the following context attributes have not been set, InternalHttpClient sets them:
HttpClientContext.TARGET_AUTH_STATE
HttpClientContext.PROXY_AUTH_STATE
The following attributes are determined in the build method of HttpClientBuilder (see the code snippet):
HttpClientContext.AUTHSCHEME_REGISTRY
HttpClientContext.COOKIESPEC_REGISTRY
HttpClientContext.COOKIE_STORE
HttpClientContext.CREDS_PROVIDER
HttpClientContext.REQUEST_CONFIG
If the user has not set a RequestConfig, the default RequestConfig is used; it is supplied by the return statement of HttpClientBuilder's build method, as the following snippet shows.

return new InternalHttpClient(
                execChain,
                connManagerCopy,
                routePlannerCopy,
                cookieSpecRegistryCopy,
                authSchemeRegistryCopy,
                defaultCookieStore,
                defaultCredentialsProvider,
                defaultRequestConfig != null ? defaultRequestConfig : RequestConfig.DEFAULT,
                closeablesCopy);
// RequestConfig.DEFAULT is created through Builder, the static nested class of RequestConfig, whose constructor sets the default values
 Builder() {
            super();
            this.staleConnectionCheckEnabled = false;
            this.redirectsEnabled = true;
            this.maxRedirects = 50;
            this.relativeRedirectsAllowed = true;
            this.authenticationEnabled = true;
            this.connectionRequestTimeout = -1;
            this.connectTimeout = -1;
            this.socketTimeout = -1;
            this.contentCompressionEnabled = true;
        }
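
The rule in this section (an attribute already present in the execution context is kept; otherwise the client's default is filled in) can be sketched with plain maps. This is only a model under stated assumptions: ContextDefaultsSketch, setupContext and the attribute keys "request-config" and "cookie-store" are hypothetical names chosen for illustration, not the library's constants.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of "fill in a default only when the context has no value yet".
class ContextDefaultsSketch {

    // context:        attributes supplied by the caller of execute (may be empty)
    // clientDefaults: values fixed when the client was built
    static void setupContext(Map<String, Object> context, Map<String, Object> clientDefaults) {
        for (Map.Entry<String, Object> entry : clientDefaults.entrySet()) {
            // A value set by the caller always wins over the client's default.
            context.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("request-config", "client default config"); // hypothetical key
        defaults.put("cookie-store", "client cookie store");     // hypothetical key

        Map<String, Object> context = new HashMap<>();
        context.put("request-config", "per-request config");

        setupContext(context, defaults);
        System.out.println(context.get("request-config"));
        System.out.println(context.get("cookie-store"));
    }
}
```

The consequence is that one client can serve requests with different per-request settings, because defaults never override what a caller placed in the context.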


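3. Determining the route
1. The RouteInfo interface describes a route: whether it is tunnelled, whether it is secure, and whether it is layered. It also exposes the nodes along the route, for example the starting node via getLocalAddress(), the target node via getTargetHost(), and the number of hops via getHopCount().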

public interface RouteInfo {

    /**
     * The tunnelling type of a route.
     * Plain routes are established by   connecting to the target or
     * the first proxy.
     * Tunnelled routes are established by connecting to the first proxy
     * and tunnelling through all proxies to the target.
     * Routes without a proxy cannot be tunnelled.
     */
    public enum TunnelType { PLAIN, TUNNELLED }

    /**
     * The layering type of a route.
     * Plain routes are established by connecting or tunnelling.
     * Layered routes are established by layering a protocol such as TLS/SSL
     * over an existing connection.
     * Protocols can only be layered over a tunnel to the target, or
     * or over a direct connection without proxies.
     *
     * Layering a protocol
     * over a direct connection makes little sense, since the connection
     * could be established with the new protocol in the first place.
     * But we don't want to exclude that use case.
     *
     */
    public enum LayerType { PLAIN, LAYERED }

    /**
     * Obtains the target host.
     *
     * @return the target host
     */
    HttpHost getTargetHost();

    /**
     * Obtains the local address to connect from.
     *
     * @return the local address,
     *         or {@code null}
     */
    InetAddress getLocalAddress();

    /**
     * Obtains the number of hops in this route.
     * A direct route has one hop. A route through a proxy has two hops.
     * A route through a chain of n proxies has n+1 hops.
     *
     * @return the number of hops in this route
     */
    int getHopCount();

    /**
     * Obtains the target of a hop in this route.
     * The target of the last hop is the {@link #getTargetHost target host},
     * the target of previous hops is the respective proxy in the chain.
     * For a route through exactly one proxy, target of hop 0 is the proxy
     * and target of hop 1 is the target host.
     *
     * @param hop index of the hop for which to get the target,
     *            0 for first
     *
     * @return the target of the given hop
     *
     * @throws IllegalArgumentException
     *         if the argument is negative or not less than
     *         {@link #getHopCount getHopCount()}
     */
    HttpHost getHopTarget(int hop);

    /**
     * Obtains the first proxy host.
     *
     * @return the first proxy in the proxy chain, or
     *         {@code null} if this route is direct
     */
    HttpHost getProxyHost();

    /**
     * Obtains the tunnel type of this route.
     * If there is a proxy chain, only end-to-end tunnels are considered.
     *
     * @return the tunnelling type
     */
    TunnelType getTunnelType();

    /**
     * Checks whether this route is tunnelled through a proxy.
     * If there is a proxy chain, only end-to-end tunnels are considered.
     *
     * @return {@code true} if tunnelled end-to-end through at least
     *         one proxy,
     *         {@code false} otherwise
     */
    boolean isTunnelled();

    /**
     * Obtains the layering type of this route.
     * In the presence of proxies, only layering over an end-to-end tunnel
     * is considered.
     *
     * @return the layering type
     */
    LayerType getLayerType();

    /**
     * Checks whether this route includes a layered protocol.
     * In the presence of proxies, only layering over an end-to-end tunnel
     * is considered.
     *
     * @return {@code true} if layered,
     *         {@code false} otherwise
     */
    boolean isLayered();

    /**
     * Checks whether this route is secure.
     *
     * @return {@code true} if secure,
     *         {@code false} otherwise
     */
    boolean isSecure();

}

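2. The HttpRoute class implements the RouteInfo interface and offers several ways to construct a route. The intermediate nodes (the proxies) are stored in an ArrayList.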

package org.apache.http.conn.routing;

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.http.HttpHost;
import org.apache.http.annotation.Contract;
import org.apache.http.annotation.ThreadingBehavior;
import org.apache.http.util.Args;
import org.apache.http.util.LangUtils;

/**
 * The route for a request.
 *
 * @since 4.0
 */
@Contract(threading = ThreadingBehavior.IMMUTABLE)
public final class HttpRoute implements RouteInfo, Cloneable {

    /** The target host to connect to. */
    private final HttpHost targetHost;

    /**
     * The local address to connect from.
     * {@code null} indicates that the default should be used.
     */
    private final InetAddress localAddress;

    /** The proxy servers, if any. Never null. */
    private final List<HttpHost> proxyChain;

    /** Whether the route is tunnelled through the proxy. */
    private final TunnelType tunnelled;

    /** Whether the route is layered. */
    private final LayerType layered;

    /** Whether the route is (supposed to be) secure. */
    private final boolean secure;

    private HttpRoute(final HttpHost target, final InetAddress local, final List<HttpHost> proxies,
                     final boolean secure, final TunnelType tunnelled, final LayerType layered) {
        Args.notNull(target, "Target host");
        this.targetHost = normalize(target);
        this.localAddress = local;
        if (proxies != null && !proxies.isEmpty()) {
            this.proxyChain = new ArrayList<HttpHost>(proxies);
        } else {
            this.proxyChain = null;
        }
        if (tunnelled == TunnelType.TUNNELLED) {
            Args.check(this.proxyChain != null, "Proxy required if tunnelled");
        }
        this.secure       = secure;
        this.tunnelled    = tunnelled != null ? tunnelled : TunnelType.PLAIN;
        this.layered      = layered != null ? layered : LayerType.PLAIN;
    }

    //TODO: to be removed in 5.0
    private static int getDefaultPort(final String schemeName) {
        if ("http".equalsIgnoreCase(schemeName)) {
            return 80;
        } else if ("https".equalsIgnoreCase(schemeName)) {
            return 443;
        } else {
            return -1;
        }

    }

    //TODO: to be removed in 5.0
    private static HttpHost normalize(final HttpHost target) {
        if (target.getPort() >= 0 ) {
            return target;
        } else {
            final InetAddress address = target.getAddress();
            final String schemeName = target.getSchemeName();
            if (address != null) {
                return new HttpHost(address, getDefaultPort(schemeName), schemeName);
            } else {
                final String hostName = target.getHostName();
                return new HttpHost(hostName, getDefaultPort(schemeName), schemeName);
            }
        }
    }

    /**
     * Creates a new route with all attributes specified explicitly.
     *
     * @param target    the host to which to route
     * @param local     the local address to route from, or
     *                  {@code null} for the default
     * @param proxies   the proxy chain to use, or
     *                  {@code null} for a direct route
     * @param secure    {@code true} if the route is (to be) secure,
     *                  {@code false} otherwise
     * @param tunnelled the tunnel type of this route
     * @param layered   the layering type of this route
     */
    public HttpRoute(final HttpHost target, final InetAddress local, final HttpHost[] proxies,
                     final boolean secure, final TunnelType tunnelled, final LayerType layered) {
        this(target, local, proxies != null ? Arrays.asList(proxies) : null,
                secure, tunnelled, layered);
    }

    /**
     * Creates a new route with at most one proxy.
     *
     * @param target    the host to which to route
     * @param local     the local address to route from, or
     *                  {@code null} for the default
     * @param proxy     the proxy to use, or
     *                  {@code null} for a direct route
     * @param secure    {@code true} if the route is (to be) secure,
     *                  {@code false} otherwise
     * @param tunnelled {@code true} if the route is (to be) tunnelled
     *                  via the proxy,
     *                  {@code false} otherwise
     * @param layered   {@code true} if the route includes a
     *                  layered protocol,
     *                  {@code false} otherwise
     */
    public HttpRoute(final HttpHost target, final InetAddress local, final HttpHost proxy,
                     final boolean secure, final TunnelType tunnelled, final LayerType layered) {
        this(target, local, proxy != null ? Collections.singletonList(proxy) : null,
                secure, tunnelled, layered);
    }

    /**
     * Creates a new direct route.
     * That is a route without a proxy.
     *
     * @param target    the host to which to route
     * @param local     the local address to route from, or
     *                  {@code null} for the default
     * @param secure    {@code true} if the route is (to be) secure,
     *                  {@code false} otherwise
     */
    public HttpRoute(final HttpHost target, final InetAddress local, final boolean secure) {
        this(target, local, Collections.<HttpHost>emptyList(), secure,
                TunnelType.PLAIN, LayerType.PLAIN);
    }

    /**
     * Creates a new direct insecure route.
     *
     * @param target    the host to which to route
     */
    public HttpRoute(final HttpHost target) {
        this(target, null, Collections.<HttpHost>emptyList(), false,
                TunnelType.PLAIN, LayerType.PLAIN);
    }

    /**
     * Creates a new route through a proxy.
     * When using this constructor, the {@code proxy} MUST be given.
     * For convenience, it is assumed that a secure connection will be
     * layered over a tunnel through the proxy.
     *
     * @param target    the host to which to route
     * @param local     the local address to route from, or
     *                  {@code null} for the default
     * @param proxy     the proxy to use
     * @param secure    {@code true} if the route is (to be) secure,
     *                  {@code false} otherwise
     */
    public HttpRoute(final HttpHost target, final InetAddress local, final HttpHost proxy,
                     final boolean secure) {
        this(target, local, Collections.singletonList(Args.notNull(proxy, "Proxy host")), secure,
             secure ? TunnelType.TUNNELLED : TunnelType.PLAIN,
             secure ? LayerType.LAYERED    : LayerType.PLAIN);
    }

    /**
     * Creates a new plain route through a proxy.
     *
     * @param target    the host to which to route
     * @param proxy     the proxy to use
     *
     * @since 4.3
     */
    public HttpRoute(final HttpHost target, final HttpHost proxy) {
        this(target, null, proxy, false);
    }

    @Override
    public final HttpHost getTargetHost() {
        return this.targetHost;
    }

    @Override
    public final InetAddress getLocalAddress() {
        return this.localAddress;
    }

    public final InetSocketAddress getLocalSocketAddress() {
        return this.localAddress != null ? new InetSocketAddress(this.localAddress, 0) : null;
    }

    @Override
    public final int getHopCount() {
        return proxyChain != null ? proxyChain.size() + 1 : 1;
    }

    @Override
    public final HttpHost getHopTarget(final int hop) {
        Args.notNegative(hop, "Hop index");
        final int hopcount = getHopCount();
        Args.check(hop < hopcount, "Hop index exceeds tracked route length");
        if (hop < hopcount - 1) {
            return this.proxyChain.get(hop);
        } else {
            return this.targetHost;
        }
    }

    @Override
    public final HttpHost getProxyHost() {
        return proxyChain != null && !this.proxyChain.isEmpty() ? this.proxyChain.get(0) : null;
    }

    @Override
    public final TunnelType getTunnelType() {
        return this.tunnelled;
    }

    @Override
    public final boolean isTunnelled() {
        return (this.tunnelled == TunnelType.TUNNELLED);
    }

    @Override
    public final LayerType getLayerType() {
        return this.layered;
    }

    @Override
    public final boolean isLayered() {
        return (this.layered == LayerType.LAYERED);
    }

    @Override
    public final boolean isSecure() {
        return this.secure;
    }

    /**
     * Compares this route to another.
     *
     * @param obj         the object to compare with
     *
     * @return  {@code true} if the argument is the same route,
     *          {@code false}
     */
    @Override
    public final boolean equals(final Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof HttpRoute) {
            final HttpRoute that = (HttpRoute) obj;
            return
                // Do the cheapest tests first
                (this.secure    == that.secure) &&
                (this.tunnelled == that.tunnelled) &&
                (this.layered   == that.layered) &&
                LangUtils.equals(this.targetHost, that.targetHost) &&
                LangUtils.equals(this.localAddress, that.localAddress) &&
                LangUtils.equals(this.proxyChain, that.proxyChain);
        } else {
            return false;
        }
    }


    /**
     * Generates a hash code for this route.
     *
     * @return  the hash code
     */
    @Override
    public final int hashCode() {
        int hash = LangUtils.HASH_SEED;
        hash = LangUtils.hashCode(hash, this.targetHost);
        hash = LangUtils.hashCode(hash, this.localAddress);
        if (this.proxyChain != null) {
            for (final HttpHost element : this.proxyChain) {
                hash = LangUtils.hashCode(hash, element);
            }
        }
        hash = LangUtils.hashCode(hash, this.secure);
        hash = LangUtils.hashCode(hash, this.tunnelled);
        hash = LangUtils.hashCode(hash, this.layered);
        return hash;
    }

    /**
     * Obtains a description of this route.
     *
     * @return  a human-readable representation of this route
     */
    @Override
    public final String toString() {
        final StringBuilder cab = new StringBuilder(50 + getHopCount()*30);
        if (this.localAddress != null) {
            cab.append(this.localAddress);
            cab.append("->");
        }
        cab.append('{');
        if (this.tunnelled == TunnelType.TUNNELLED) {
            cab.append('t');
        }
        if (this.layered == LayerType.LAYERED) {
            cab.append('l');
        }
        if (this.secure) {
            cab.append('s');
        }
        cab.append("}->");
        if (this.proxyChain != null) {
            for (final HttpHost aProxyChain : this.proxyChain) {
                cab.append(aProxyChain);
                cab.append("->");
            }
        }
        cab.append(this.targetHost);
        return cab.toString();
    }

    // default implementation of clone() is sufficient
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone();
    }

}
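
The hop arithmetic of getHopCount() and getHopTarget() in the listing above can be tried out with a stripped-down model. This is a sketch only: RouteHopsSketch is a hypothetical class, and hosts are plain strings instead of HttpHost objects.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a route: a target plus an ordered chain of proxies.
class RouteHopsSketch {

    private final String target;
    private final List<String> proxies;

    RouteHopsSketch(String target, String... proxies) {
        this.target = target;
        this.proxies = Arrays.asList(proxies);
    }

    // A direct route has one hop; every proxy in the chain adds one more.
    int hopCount() {
        return proxies.size() + 1;
    }

    // Hops before the last one end at a proxy; the last hop ends at the target.
    String hopTarget(int hop) {
        if (hop < 0 || hop >= hopCount()) {
            throw new IllegalArgumentException("Hop index out of range: " + hop);
        }
        return hop < hopCount() - 1 ? proxies.get(hop) : target;
    }
}
```

For a route through exactly one proxy, hop 0 ends at the proxy and hop 1 ends at the target host, matching the Javadoc of RouteInfo.getHopTarget.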

3. DefaultRoutePlanner determines how an HttpRoute is generated from the target host, the HTTP request, and the HttpContext. This class only produces routes with at most one proxy.

public class DefaultRoutePlanner implements HttpRoutePlanner {

    private final SchemePortResolver schemePortResolver;

    public DefaultRoutePlanner(final SchemePortResolver schemePortResolver) {
        super();
        this.schemePortResolver = schemePortResolver != null ? schemePortResolver :
            DefaultSchemePortResolver.INSTANCE;
    }

    @Override
    public HttpRoute determineRoute(
            final HttpHost host,
            final HttpRequest request,
            final HttpContext context) throws HttpException {
        Args.notNull(request, "Request");
        if (host == null) {
            throw new ProtocolException("Target host is not specified");
        }
        final HttpClientContext clientContext = HttpClientContext.adapt(context);
        final RequestConfig config = clientContext.getRequestConfig();
        final InetAddress local = config.getLocalAddress();
        HttpHost proxy = config.getProxy();
        if (proxy == null) {
            proxy = determineProxy(host, request, context);
        }

        final HttpHost target;
        if (host.getPort() <= 0) {
            try {
                target = new HttpHost(
                        host.getHostName(),
                        this.schemePortResolver.resolve(host),
                        host.getSchemeName());
            } catch (final UnsupportedSchemeException ex) {
                throw new HttpException(ex.getMessage());
            }
        } else {
            target = host;
        }
        final boolean secure = target.getSchemeName().equalsIgnoreCase("https");
        if (proxy == null) {
            return new HttpRoute(target, local, secure);
        } else {
            return new HttpRoute(target, local, proxy, secure);
        }
    }

    /**
     * This implementation returns null.
     *
     * @throws HttpException may be thrown if overridden
     */
    protected HttpHost determineProxy(
            final HttpHost target,
            final HttpRequest request,
            final HttpContext context) throws HttpException {
        return null;
    }

}
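
Two small decisions inside determineRoute are easy to overlook: a port that is not positive is replaced by the scheme's default, and the route counts as secure only when the scheme is https. The sketch below models just those two steps. TargetResolutionSketch and its methods are hypothetical names, and the hard-coded 80/443 mapping is an assumption that stands in for the SchemePortResolver.

```java
// Simplified model of target normalization in a route planner (hypothetical class).
class TargetResolutionSketch {

    // Returns the explicit port if it is positive, otherwise the scheme's default.
    static int resolvePort(String scheme, int port) {
        if (port > 0) {
            return port;
        }
        if ("http".equalsIgnoreCase(scheme)) {
            return 80;
        }
        if ("https".equalsIgnoreCase(scheme)) {
            return 443;
        }
        // Stand-in for the unsupported-scheme failure of the real resolver.
        throw new IllegalArgumentException(scheme + " protocol is not supported");
    }

    // The secure flag depends only on the scheme name, compared case-insensitively.
    static boolean isSecure(String scheme) {
        return "https".equalsIgnoreCase(scheme);
    }
}
```

Resolving the port up front means two requests to the same host, one with and one without an explicit default port, end up with equal targets.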

4. DefaultProxyRoutePlanner

public class DefaultProxyRoutePlanner extends DefaultRoutePlanner {

    private final HttpHost proxy;

    public DefaultProxyRoutePlanner(final HttpHost proxy, final SchemePortResolver schemePortResolver) {
        super(schemePortResolver);
        this.proxy = Args.notNull(proxy, "Proxy host");
    }

    public DefaultProxyRoutePlanner(final HttpHost proxy) {
        this(proxy, null);
    }

    @Override
    protected HttpHost determineProxy(
        final HttpHost target,
        final HttpRequest request,
        final HttpContext context) throws HttpException {
        return proxy;
    }

}
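
The relationship between the two planners is a template method: the base class asks a protected hook for a proxy, and the subclass overrides only that hook. A minimal model, assuming hypothetical class names and plain strings for hosts, might look like this. It also models the precedence visible in determineRoute, where a proxy set on the request configuration is used before the hook is consulted.

```java
import java.util.Objects;

// Base planner: the hook returns null, so routes are direct unless the request names a proxy.
class PlannerSketch {

    protected String determineProxy(String target) {
        return null;
    }

    // requestProxy models a proxy configured on the request; it takes precedence over the hook.
    final String plan(String target, String requestProxy) {
        String proxy = requestProxy != null ? requestProxy : determineProxy(target);
        return proxy == null ? target : proxy + "->" + target;
    }
}

// Subclass: always answers with the one proxy it was constructed with.
class FixedProxyPlannerSketch extends PlannerSketch {

    private final String proxy;

    FixedProxyPlannerSketch(String proxy) {
        this.proxy = Objects.requireNonNull(proxy, "Proxy host");
    }

    @Override
    protected String determineProxy(String target) {
        return proxy;
    }
}
```

Keeping the route-building logic in the base class means a custom planner only has to answer one question: which proxy, if any, applies to this target.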

5. The HttpRouteDirector interface decides what the next step towards the planned route should be. nextStep can only return one of 7 values: -1, 0, 1, 2, 3, 4, 5.

public interface HttpRouteDirector {

    /** Indicates that the route can not be established at all. */
    public final static int UNREACHABLE = -1;

    /** Indicates that the route is complete. */
    public final static int COMPLETE = 0;

    /** Step: open connection to target. */
    public final static int CONNECT_TARGET = 1;

    /** Step: open connection to proxy. */
    public final static int CONNECT_PROXY = 2;

    /** Step: tunnel through proxy to target. */
    public final static int TUNNEL_TARGET = 3;

    /** Step: tunnel through proxy to other proxy. */
    public final static int TUNNEL_PROXY = 4;

    /** Step: layer protocol (over tunnel). */
    public final static int LAYER_PROTOCOL = 5;


    /**
     * Provides the next step.
     *
     * @param plan      the planned route
     * @param fact      the currently established route, or
     *                  {@code null} if nothing is established
     *
     * @return  one of the constants defined in this interface, indicating
     *          either the next step to perform, or success, or failure.
     *          0 is for success, a negative value for failure.
     */
    public int nextStep(RouteInfo plan, RouteInfo fact);

}
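
To see how a caller might drive nextStep, here is a reduced model that walks a route until COMPLETE is returned. It is a sketch under assumptions, not the library's director: RouteDirectorSketch is a hypothetical class, routes are reduced to boolean flags, at most one proxy is modeled, and the UNREACHABLE and TUNNEL_PROXY outcomes are left out. The constant values are the ones from the interface above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of step-by-step route establishment (hypothetical class).
class RouteDirectorSketch {

    static final int COMPLETE = 0;
    static final int CONNECT_TARGET = 1;
    static final int CONNECT_PROXY = 2;
    static final int TUNNEL_TARGET = 3;
    static final int LAYER_PROTOCOL = 5;

    // Compares the plan (first three flags) with what is established so far (last three flags).
    static int nextStep(boolean viaProxy, boolean wantTunnel, boolean wantLayer,
                        boolean connected, boolean tunnelled, boolean layered) {
        if (!connected) {
            return viaProxy ? CONNECT_PROXY : CONNECT_TARGET;
        }
        if (wantTunnel && !tunnelled) {
            return TUNNEL_TARGET;
        }
        if (wantLayer && !layered) {
            return LAYER_PROTOCOL;
        }
        return COMPLETE;
    }

    // Executes the steps in order and records them, ending with COMPLETE.
    static List<Integer> walk(boolean viaProxy, boolean secure) {
        // As in HttpRoute(target, local, proxy, secure): a secure proxied route
        // is planned as tunnelled and layered; direct routes are planned as plain.
        boolean wantTunnel = viaProxy && secure;
        boolean wantLayer = viaProxy && secure;
        boolean connected = false;
        boolean tunnelled = false;
        boolean layered = false;
        List<Integer> steps = new ArrayList<>();
        int step;
        do {
            step = nextStep(viaProxy, wantTunnel, wantLayer, connected, tunnelled, layered);
            steps.add(step);
            if (step == CONNECT_TARGET || step == CONNECT_PROXY) {
                connected = true;
            } else if (step == TUNNEL_TARGET) {
                tunnelled = true;
            } else if (step == LAYER_PROTOCOL) {
                layered = true;
            }
        } while (step != COMPLETE);
        return steps;
    }
}
```

In this model a secure route through a proxy goes through CONNECT_PROXY, TUNNEL_TARGET and LAYER_PROTOCOL before COMPLETE, while a direct route needs only CONNECT_TARGET.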

Taking the request and
