Java Web Crawler: HelloWorld

This example does not simulate a browser, so some websites may refuse the request.
Create a new Maven project


[Image 1 of the original post]

pom.xml dependency

    
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.6</version>
    </dependency>
    

Demo.java

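package com.open1111.httpclient;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Demo {
    /**
     * HttpClient is a toolkit for efficient client-side programming against the HTTP protocol.
     * It supports the latest versions and recommendations of the HTTP protocol.
     *
     * @param args command-line arguments (unused)
     */
    public static void main(String[] args) {

        // Build a GET request for the target website
        HttpGet httpGet = new HttpGet("http://www.baidu.com");

        // try-with-resources closes the response and the client, even if the request fails
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {

            // Get the entity (the body) from the response; it can be null for body-less responses
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Decode the body as UTF-8 and print it
                System.out.println(EntityUtils.toString(entity, "utf-8"));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}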

Result:

[Image 2 of the original post: result screenshot]

Conclusion:
Websites that do not restrict crawlers can be fetched this way.
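
For sites that do reject non-browser clients, a common first step is to send a browser-like User-Agent header. The sketch below is not part of the original tutorial. It uses only the JDK's HttpURLConnection, so it runs without extra dependencies. With HttpClient the analogous call would be httpGet.setHeader("User-Agent", ...) before executing the request. The class name BrowserLikeRequest and the User-Agent string are assumptions chosen for illustration, and a site may still block the request on other grounds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper (not from the original post): prepares a GET request
// that identifies itself with a browser-like User-Agent header.
final class BrowserLikeRequest {

    // Assumed example value; any current browser's User-Agent string would do.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    // Configures the connection but does not send anything yet.
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setConnectTimeout(5000); // milliseconds
            conn.setReadTimeout(5000);    // milliseconds
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = open("http://www.baidu.com");
        try {
            // getResponseCode() is the call that actually sends the request
            System.out.println("HTTP status: " + conn.getResponseCode());
        } catch (IOException e) {
            System.out.println("Request failed: " + e);
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping the header setup in open() separates configuring the request from sending it, so the same helper can be reused for other URLs.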
Reference: thanks to java1234 for the tutorial
http://www.java1234.com/javapachongxuexiluxiantu.html
