Scraping HTTPS web pages with HttpClient

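1. Download the certificate: open Google Chrome, enter the URL, click the padlock icon in the address bar, and view the certificate.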
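2. Open Internet Explorer and go to Settings -> Internet Options -> Content -> Certificates -> Intermediate Certification Authorities. Find the matching certificate and export it. I saved mine as D:/test.cer.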
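If a browser is not at hand, the certificate chain the server presents can also be captured from Java. The following is a minimal sketch using only the JDK; the class `CertFetcher` and its methods are hypothetical names, not part of any library. It performs one TLS handshake with a trust manager that accepts any chain, purely so the chain can be read, and writes a certificate as a binary (DER) `.cer` file, a format the keytool command in step 3 accepts. That accept-everything trust manager is only for inspection and must never be used for the actual crawling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper: captures the certificate chain a TLS server presents. */
final class CertFetcher {

    /** Accepts any chain. Used ONLY to read the certificates, never to fetch page data. */
    private static final X509TrustManager INSPECT_ONLY = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    /** Performs a TLS handshake with host:port and returns the server's chain (leaf certificate first). */
    static X509Certificate[] fetchChain(String host, int port) throws IOException, GeneralSecurityException {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] { INSPECT_ONLY }, null);
        try (SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket(host, port)) {
            socket.setSoTimeout(10_000);          // do not hang forever on an unresponsive server
            socket.startHandshake();              // the chain is available once the handshake completes
            Certificate[] peer = socket.getSession().getPeerCertificates();
            X509Certificate[] chain = new X509Certificate[peer.length];
            for (int i = 0; i < peer.length; i++) {
                chain[i] = (X509Certificate) peer[i];
            }
            return chain;
        }
    }

    /** Writes one certificate in binary DER encoding, e.g. to D:/test.cer. */
    static void writeDer(X509Certificate cert, Path file) throws IOException, GeneralSecurityException {
        Files.write(file, cert.getEncoded());
    }
}
```

For example, `CertFetcher.fetchChain("example.com", 443)` returns the leaf certificate at index 0; when the server also sends its intermediate CA certificate, that is usually `chain[1]`, which could then be saved with `CertFetcher.writeDer(chain[1], Paths.get("D:/test.cer"))`.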
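3. Convert the certificate into a keystore. From the bin directory of the JDK installation, run the following command and answer yes when keytool asks whether to trust the certificate:

keytool -import -alias Root -file d:/test.cer -keystore "d:/test.keystore" -storepass 123456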
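The keytool import can also be reproduced in code, which can help when the keystore has to be rebuilt on several machines. This is a minimal sketch; `TrustStoreBuilder.importCert` is a hypothetical helper name. It parses the certificate with `CertificateFactory` (DER or PEM input both work), stores it as a trusted-certificate entry under the given alias, and writes a keystore of the JDK's default type, which is the type HttpClient's `loadTrustMaterial(File, char[], ...)` opens by default. Unlike keytool, it always creates a new keystore instead of adding to an existing one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

/** Hypothetical helper: a programmatic stand-in for "keytool -import -alias ... -file ... -keystore ...". */
final class TrustStoreBuilder {

    /**
     * Reads one X.509 certificate from certFile and writes a NEW keystore at keystoreFile
     * containing it as a trusted-certificate entry under alias.
     */
    static void importCert(Path certFile, Path keystoreFile, String alias, char[] storePass)
            throws IOException, GeneralSecurityException {
        Certificate cert;
        try (InputStream in = Files.newInputStream(certFile)) {
            cert = CertificateFactory.getInstance("X.509").generateCertificate(in);
        }
        // JKS on Java 8, PKCS12 on Java 9+ (unless the keystore.type security property overrides it)
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);                  // initialize an empty keystore
        ks.setCertificateEntry(alias, cert);  // trusted-certificate entry, as keytool -import creates
        try (OutputStream out = Files.newOutputStream(keystoreFile)) {
            ks.store(out, storePass);         // the password protects the keystore's integrity
        }
    }
}
```

Calling `TrustStoreBuilder.importCert(Paths.get("d:/test.cer"), Paths.get("d:/test.keystore"), "Root", "123456".toCharArray())` corresponds to the keytool command above.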
4. Implementation (uses Apache HttpClient 4.4 or later and jsoup)

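import java.io.File;

import javax.net.ssl.SSLContext;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHeaders;
import org.apache.http.HttpStatus;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.TrustSelfSignedStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.ssl.SSLContexts;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// "HttpsCrawler" is only an illustrative wrapper class name
public class HttpsCrawler {

    public static Document sendHttp(String url) {
        String html = "";
        try {
            SSLConnectionSocketFactory sslsf = createSSLConnSocketFactory();
            HttpGet httpget = new HttpGet(url);
            httpget.addHeader(HttpHeaders.USER_AGENT,
                    "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0");
            RequestConfig requestConfig = RequestConfig.custom()
                    .setSocketTimeout(10000)    // read (socket) timeout in ms
                    .setConnectTimeout(10000)   // connection timeout in ms
                    .build();
            httpget.setConfig(requestConfig);
            System.out.println("Executing request " + httpget.getRequestLine());
            // try-with-resources closes the response and the client even when an exception is thrown
            try (CloseableHttpClient httpclient = HttpClients.custom().setSSLSocketFactory(sslsf).build();
                 CloseableHttpResponse response = httpclient.execute(httpget)) {
                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                HttpEntity entity = response.getEntity();
                int status = response.getStatusLine().getStatusCode();
                // Only 200 OK counts as success; any other status leaves html empty
                if (status == HttpStatus.SC_OK && entity != null) {
                    // "UTF-8" is applied only if the response does not declare a charset
                    html = EntityUtils.toString(entity, "UTF-8");
                    // Normalize non-breaking spaces (the &nbsp; entity and the U+00A0 character) to plain spaces
                    html = html.replace("&nbsp;", " ").replace('\u00a0', ' ');
                }
                EntityUtils.consume(entity);    // make sure the entity is fully consumed
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return Jsoup.parse(html);
    }

    // Builds the SSL socket factory from the keystore created with keytool in step 3
    private static SSLConnectionSocketFactory createSSLConnSocketFactory() throws Exception {
        SSLContext sslcontext = SSLContexts.custom()
                // The keystore path and password must match the keytool command.
                // TrustSelfSignedStrategy additionally trusts any self-signed certificate.
                .loadTrustMaterial(new File("D:/test.keystore"), "123456".toCharArray(),
                        new TrustSelfSignedStrategy())
                .build();
        return new SSLConnectionSocketFactory(
                sslcontext,
                new String[] { "TLSv1.2" },   // enabled protocols; TLSv1 is disabled in recent JDKs and on many servers
                null,                         // null: no explicit cipher-suite list
                SSLConnectionSocketFactory.getDefaultHostnameVerifier());
    }
}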
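For comparison, the trust setup in `createSSLConnSocketFactory` can be approximated with the JDK alone; this is a different technique from the Apache `SSLContexts` builder used above. The sketch below (hypothetical class `KeystoreTrust`) loads the keystore and hands it to the default (PKIX) `TrustManagerFactory`, so server chains are validated against the keystore's certificates only, not the JDK's bundled CA list. Unlike the code above, it does not add `TrustSelfSignedStrategy`'s acceptance of arbitrary self-signed certificates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: a JDK-only SSLContext that trusts exactly the certificates in one keystore. */
final class KeystoreTrust {

    static SSLContext fromKeystore(Path keystoreFile, char[] storePass)
            throws IOException, GeneralSecurityException {
        // Same default keystore type that keytool and HttpClient's loadTrustMaterial(File, ...) use
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(keystoreFile)) {
            trustStore.load(in, storePass);
        }
        // Every trusted-certificate entry becomes a trust anchor; cacerts is not consulted
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);   // no client certificate, default SecureRandom
        return ctx;
    }
}
```

The resulting context can be passed to `HttpsURLConnection.setSSLSocketFactory(ctx.getSocketFactory())`, or on Java 11+ to `java.net.http.HttpClient.newBuilder().sslContext(ctx)`.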

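5. Run it: the page is fetched over HTTPS and returned as a jsoup Document, from which the required data can be extracted.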
