Jsoup: Certificate Validation Problems When Fetching HTTPS/HTTP Pages

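Plain HTTP requests need no certificate validation, but my HTTPS requests kept failing the security check. I finally found the method below, and it works.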

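The idea is to make our client trust every site. No extra library is needed for this, because the JDK ships its own SSL certificate validation classes. Without further ado, here is the code.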

package app_info;

import java.io.IOException;
import java.security.SecureRandom;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.Map;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.X509TrustManager;

import org.apache.commons.lang.StringUtils;
import org.jsoup.Connection;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TestJsoup {

	public static void main(String[] args) {
		// "url" is a placeholder; replace it with the address of the page to fetch
		System.out.println(getRedirectUrl("url"));
	}	

	/**
	 * Trusts every site by disabling certificate and hostname checks,
	 * so that HTTPS pages can be fetched normally.
	 */
	public static void trustEveryone() {
		try {
			HttpsURLConnection.setDefaultHostnameVerifier(new HostnameVerifier() {
				public boolean verify(String hostname, SSLSession session) {
					return true;
				}
			});

			SSLContext context = SSLContext.getInstance("TLS");
			context.init(null, new X509TrustManager[] { new X509TrustManager() {
				public void checkClientTrusted(X509Certificate[] chain, String authType) throws CertificateException {
				}

				public void checkServerTrusted(X509Certificate[] chain, String authType) throws CertificateException {
				}

				public X509Certificate[] getAcceptedIssuers() {
					return new X509Certificate[0];
				}
			} }, new SecureRandom());
			HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory());
		} catch (Exception e) {
			// Report the failure instead of silently swallowing it
			e.printStackTrace();
		}
	}

	public static String getRedirectUrl(String url) {
		trustEveryone(); // call this wherever a request is about to be created
		String baseuri = "";
		Document doc = null;
		Connection conn = Jsoup.connect(url);
		conn.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
		conn.header("Accept-Encoding", "gzip, deflate, sdch");
		conn.header("Accept-Language", "zh-CN,zh;q=0.8");
		conn.header("Cache-Control", "max-age=0");
		conn.header("Connection", "keep-alive");
		conn.header("Upgrade-Insecure-Requests", "1");
		conn.header("User-Agent",
				"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36");

		try {
			Map<String, String> cookies = null;
			Response res = Jsoup.connect(url).timeout(30000).execute();
			cookies = res.cookies();
			conn.cookies(cookies);
			doc = conn.timeout(10000).get();
			baseuri = doc.baseUri();
		} catch (IOException e) {
			System.out.println("url :" + url);
			e.printStackTrace();

			if (StringUtils.isEmpty(baseuri)) {

				Map<String, String> cookies = null;
				Response res;
				try {
					// Retry once: fetch cookies again, then reload the page
					res = Jsoup.connect(url).timeout(30000).execute();
					cookies = res.cookies();
					conn.cookies(cookies);
					doc = conn.timeout(10000).get();
					baseuri = doc.baseUri();
				} catch (IOException e1) {
					System.out.println("url :" + url);
					e1.printStackTrace();
					return baseuri;
				}

			}
			System.out.println("http.proxyHost:" + System.getProperty("http.proxyHost") + " http.proxyPort:"
					+ System.getProperty("http.proxyPort"));
			return baseuri;
		}
		return baseuri;
	}
}

Using it is simple.

Just call trustEveryone() before creating the request object, wherever you need it.
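
Because trustEveryone() replaces JVM-wide defaults, every later HTTPS connection in the process also skips validation. One possible way to narrow that, sketched here and not part of my original code, is to install the trust-all settings only while a single task runs and then put the previous defaults back. The names ScopedTrust, trustAllFactory and withTrustEveryone are hypothetical, and the sketch assumes Java 8 or later for the lambdas.

```java
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import java.util.concurrent.Callable;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper class, not part of the original post.
public class ScopedTrust {

    /** Builds a socket factory whose trust manager accepts every certificate chain. */
    public static SSLSocketFactory trustAllFactory() throws Exception {
        TrustManager[] trustAll = { new X509TrustManager() {
            public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        } };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context.getSocketFactory();
    }

    /** Runs the task with validation disabled, then restores the previous defaults. */
    public static <T> T withTrustEveryone(Callable<T> task) throws Exception {
        SSLSocketFactory oldFactory = HttpsURLConnection.getDefaultSSLSocketFactory();
        HostnameVerifier oldVerifier = HttpsURLConnection.getDefaultHostnameVerifier();
        HttpsURLConnection.setDefaultSSLSocketFactory(trustAllFactory());
        HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true);
        try {
            return task.call();
        } finally {
            // Always restore, even if the task throws
            HttpsURLConnection.setDefaultSSLSocketFactory(oldFactory);
            HttpsURLConnection.setDefaultHostnameVerifier(oldVerifier);
        }
    }

    public static void main(String[] args) throws Exception {
        SSLSocketFactory before = HttpsURLConnection.getDefaultSSLSocketFactory();
        boolean replaced = withTrustEveryone(
                () -> HttpsURLConnection.getDefaultSSLSocketFactory() != before);
        System.out.println("replaced inside task: " + replaced);
        System.out.println("restored afterwards: "
                + (HttpsURLConnection.getDefaultSSLSocketFactory() == before));
    }
}
```

With Jsoup on the classpath, a call might look like ScopedTrust.withTrustEveryone(() -> Jsoup.connect(url).get().baseUri()). The swap is still process-wide while the task runs, so other threads opening HTTPS connections at the same moment would see it too.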

 

I have tested this myself and it works; it is the method I am currently using.

Without it, the request throws this exception: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
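
The exception quoted above is a chain: the handshake exception wraps a validator exception, which wraps a certification path failure. As a rough sketch, and not something from my original code, a helper could walk that chain to tell a certificate problem apart from other I/O failures such as timeouts, for example to decide whether trustEveryone() is relevant at all. The names HandshakeDiagnosis and isCertPathFailure are hypothetical, and the sketch assumes the innermost exception is a subclass of java.security.cert.CertPathBuilderException, which is my understanding for SunCertPathBuilderException.

```java
import java.io.IOException;
import java.security.cert.CertPathBuilderException;
import java.security.cert.CertificateException;

import javax.net.ssl.SSLHandshakeException;

// Hypothetical helper class, not part of the original post.
public class HandshakeDiagnosis {

    /** Returns true if any exception in the cause chain is a certification path failure. */
    public static boolean isCertPathFailure(Throwable error) {
        // Bound the walk so an unusual cause chain cannot loop forever
        int depth = 0;
        for (Throwable t = error; t != null && depth < 16; t = t.getCause(), depth++) {
            if (t instanceof CertPathBuilderException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the exception chain quoted in the post
        SSLHandshakeException handshake = new SSLHandshakeException("PKIX path building failed");
        handshake.initCause(new CertificateException("PKIX path building failed",
                new CertPathBuilderException(
                        "unable to find valid certification path to requested target")));

        System.out.println(isCertPathFailure(handshake));                            // prints true
        System.out.println(isCertPathFailure(new IOException("connect timed out"))); // prints false
    }
}
```

In getRedirectUrl, such a check could sit in the catch (IOException e) block, since SSLHandshakeException is itself an IOException.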
