Kotlin爬虫https安全校验问题

在使用Kotlin进行页面分析和爬取数据时,我们需要用到爬虫。但是如果是https协议,可能需要进行安全校验。

我们以某网站(内容保护,不指明)为例,使用Jsoup库进行爬取。

当我们运行时,会报错:

Exception in thread "main" javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1946)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:316)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:310)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1639)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:223)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037)
	at sun.security.ssl.Handshaker.process_record(Handshaker.java:965)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1064)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1367)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1395)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1379)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1570)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
	at java.net.URL.openStream(URL.java:1068)
	at kotlin.io.TextStreamsKt.readBytes(ReadWrite.kt:150)
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:450)
	at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:259)
	at sun.security.validator.Validator.validate(Validator.java:262)
	at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:330)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:237)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:132)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1621)
	... 16 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
	at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
	at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
	at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
	at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:445)
	... 22 more

这就说明我们遇到了https的安全校验问题,下面是解决办法。

我们只需要创建一个方法:

package com.. // 你的包名

import java.security.SecureRandom
import java.security.cert.CertificateException
import java.security.cert.X509Certificate
import javax.net.ssl.HttpsURLConnection
import javax.net.ssl.SSLContext
import javax.net.ssl.X509TrustManager

fun trustAny() {
    try {
        HttpsURLConnection.setDefaultHostnameVerifier { _, _ -> true }
        val context = SSLContext.getInstance("TLS")
        context.init(null, arrayOf<X509TrustManager>(object : X509TrustManager {
            @Throws(CertificateException::class)
            override fun checkClientTrusted(chain: Array<X509Certificate>, authType: String) {
            }

            @Throws(CertificateException::class)
            override fun checkServerTrusted(chain: Array<X509Certificate>, authType: String) {
            }

            override fun getAcceptedIssuers(): Array<X509Certificate?> {
                return arrayOfNulls(0)
            }
        }), SecureRandom())
        HttpsURLConnection.setDefaultSSLSocketFactory(context.socketFactory)
    } catch (e: Exception) {
        // e.printStackTrace()
    }
}

在抓取前使用 trustAny() 方法即可。

你可能感兴趣的:(Demos,https,ssl,kotlin)