Jsoup SSLHandshakeException When Accessing an https URL (Solved)

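This post covers a problem encountered while crawling web pages: the target site's certificate is not valid.

While using jsoup to crawl and parse a web page, the following exception occurred.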

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1627)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:204)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:198)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:994)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:142)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:533)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:471)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:904)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1132)
        at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:643)

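The cause turned out to be an invalid SSL certificate. Many sites have recently moved entirely from http to https. The original site may not have deployed SSL properly, which leaves the certificate invalid, or the certificate itself may simply not be trusted. Crawling pages from such a site then fails at certificate validation.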
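To make "unable to find valid certification path" more concrete, here is a minimal sketch that lists the CA certificates the JVM trusts by default. The handshake fails when the server's certificate chain does not lead back to one of these. The class and method names (`DefaultTrustStoreSketch`, `defaultTrustAnchors`) are made up for illustration and are not part of the original post.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper class, for illustration only.
class DefaultTrustStoreSketch {

    /** Returns the CA certificates the JVM trusts by default. */
    static X509Certificate[] defaultTrustAnchors() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            // A null KeyStore means "use the JVM's default trust store".
            tmf.init((KeyStore) null);
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return ((X509TrustManager) tm).getAcceptedIssuers();
                }
            }
            return new X509Certificate[0];
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default trust store", e);
        }
    }

    public static void main(String[] args) {
        X509Certificate[] anchors = defaultTrustAnchors();
        System.out.println("Trusted CA certificates: " + anchors.length);
        for (X509Certificate ca : anchors) {
            System.out.println("  " + ca.getSubjectX500Principal().getName());
        }
    }
}
```

If the issuer of the target site's certificate is not in this list (as is typical for a self-signed certificate), the PKIX path building step cannot succeed.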
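If you download pages through Jsoup's own interface, the latest version at the time of writing, 1.9.2, provides validateTLSCertificates(boolean). Passing false is enough to skip certificate validation. Note that later jsoup releases removed this method in favor of Connection.sslSocketFactory(SSLSocketFactory).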
Jsoup.connect(url).timeout(30000).userAgent(UA).validateTLSCertificates(false).get()
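For jsoup versions that no longer have validateTLSCertificates, roughly the same effect can be sketched with a trust-all socket factory passed to `Connection.sslSocketFactory(...)`. The block below uses only the JDK; the class name `TrustAllSocketFactorySketch` is hypothetical, and the jsoup call appears only as a comment because it assumes a jsoup version that provides `sslSocketFactory`. `url` and `UA` are the same placeholders as in the line above.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper class, for illustration only.
class TrustAllSocketFactorySketch {

    /** Builds a socket factory that accepts ANY server certificate. Insecure by design. */
    static SSLSocketFactory trustAllSocketFactory() {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
            @Override public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: no client certificate checks.
            }
            @Override public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: every server certificate is accepted.
            }
        } };
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, trustAll, new SecureRandom());
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create a trust-all SSLContext", e);
        }
    }
}

// Assumed usage with a jsoup version that provides Connection.sslSocketFactory(...):
// Document doc = Jsoup.connect(url)
//         .timeout(30000)
//         .userAgent(UA)
//         .sslSocketFactory(TrustAllSocketFactorySketch.trustAllSocketFactory())
//         .get();
```

Unlike the global approach, the factory here is attached to a single jsoup connection, so other HTTPS requests in the same JVM keep normal validation.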
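Java's default certificate store does not contain most self-signed certificates. If you make HTTP requests without a third-party library, you can work around this by creating a TrustManager by hand. Use this approach only when you are sure about the site you are connecting to; otherwise it is not recommended.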
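import java.io.InputStream;
import java.net.URL;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public static InputStream getByDisableCertValidation(String url) {
        // Trust manager that accepts every certificate chain without checking it.
        TrustManager[] trustAllCerts = new TrustManager[] {new X509TrustManager() {
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
            public void checkClientTrusted(X509Certificate[] certs, String authType) {
            }
            public void checkServerTrusted(X509Certificate[] certs, String authType) {
            }
        } };

        // Hostname verifier that accepts every host name.
        HostnameVerifier hv = new HostnameVerifier() {
            public boolean verify(String hostname, SSLSession session) {
                return true;
            }
        };

        try {
            SSLContext sc = SSLContext.getInstance("TLS");
            sc.init(null, trustAllCerts, new SecureRandom());
            // These two calls change the defaults for every HttpsURLConnection in the JVM.
            HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
            HttpsURLConnection.setDefaultHostnameVerifier(hv);

            URL uRL = new URL(url);
            HttpsURLConnection urlConnection = (HttpsURLConnection) uRL.openConnection();
            return urlConnection.getInputStream();
        } catch (Exception e) {
            // Report the failure instead of swallowing it silently.
            e.printStackTrace();
        }
        return null;
    }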
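Since the trust-all approach is only advisable for a site you are sure about, a narrower alternative is to trust just that site's certificate and apply it to a single connection. The following is a sketch under stated assumptions, not the original author's method: it assumes you have already exported the site's certificate to a file, and the names `PinnedCertSketch`, `keyStoreWith`, `contextTrusting`, `open`, and the alias "target-site" are all made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper class, for illustration only.
class PinnedCertSketch {

    /** Reads one X.509 certificate (PEM or DER) into a fresh in-memory KeyStore. */
    static KeyStore keyStoreWith(InputStream certStream)
            throws IOException, GeneralSecurityException {
        Certificate cert = CertificateFactory.getInstance("X.509").generateCertificate(certStream);
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null);                       // start with an empty store
        keyStore.setCertificateEntry("target-site", cert);
        return keyStore;
    }

    /**
     * Builds an SSLContext that trusts only the certificates in trustStore.
     * Passing null falls back to the JVM's default trust store.
     */
    static SSLContext contextTrusting(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, tmf.getTrustManagers(), null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build the SSLContext", e);
        }
    }

    /** Opens an https URL using the given context for this connection only. */
    static InputStream open(String url, SSLContext context) throws IOException {
        // The cast assumes an https URL; a plain http URL would fail here.
        HttpsURLConnection conn = (HttpsURLConnection) new URL(url).openConnection();
        conn.setSSLSocketFactory(context.getSocketFactory());
        return conn.getInputStream();
    }
}
```

A call might look like `PinnedCertSketch.open(url, PinnedCertSketch.contextTrusting(PinnedCertSketch.keyStoreWith(certFileStream)))`, where `certFileStream` is a stream over the exported certificate. Because nothing is installed as a JVM-wide default and hostname verification stays on, other connections are unaffected.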


References:

http://snowolf.iteye.com/blog/391931

http://stackoverflow.com/questions/1828775/how-to-handle-invalid-ssl-certificates-with-apache-httpclient

SSLHandshakeException when Jsoup accesses an https URL:
Solution:

Jsoup.connect(url)
.timeout(30000)
.userAgent(UA)
.validateTLSCertificates(false)
.get()

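Original article: http://blog.csdn.net/louxuez/article/details/52814538
Thanks to the original author for sharing. If this repost infringes on any rights, please contact me and I will remove it. QQ:337081267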
