Getting Started with Java Crawlers: The First Crawler Program

First, create a Maven project and add the following dependencies to pom.xml:



<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>crawler</groupId>
    <artifactId>crawler</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- HttpClient, used to send HTTP requests -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.2</version>
        </dependency>
        <!-- SLF4J binding for log4j, used for logging -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
        </dependency>
    </dependencies>
</project>


Create log4j.properties:

### Configure the root logger ###
log4j.rootLogger=debug,stdout

### Output to the console ###
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %5p %c{1}:%L - %m%n
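
With slf4j-log4j12 on the classpath, log statements written against the SLF4J API are routed to log4j and formatted by this configuration. The original example below prints with System.out instead, but as a minimal sketch (the class name LoggingDemo is just for illustration, not part of the original code), a logger can be obtained and used like this:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingDemo {
    //SLF4J facade; the slf4j-log4j12 binding forwards these calls to log4j
    private static final Logger logger = LoggerFactory.getLogger(LoggingDemo.class);

    public static void main(String[] args) {
        logger.info("crawler starting");
        //debug output appears because rootLogger is set to debug above
        logger.debug("debug logging is enabled");
    }
}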

Create the FirstCrawler class:

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;


public class FirstCrawler {
    public static void main(String[] args) {
        //1. Open the "browser": create an HttpClient instance
        CloseableHttpClient httpClient = HttpClients.createDefault();

        //2. Enter the URL: create a GET request for the target page
        HttpGet httpGet = new HttpGet("http://news.baidu.com/");

        //3. Press Enter: execute the request and receive the response
        CloseableHttpResponse response = null;
        try {
            response = httpClient.execute(httpGet);
            //4. Parse the response and extract the data
            //Only read the body when the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                HttpEntity httpEntity = response.getEntity();
                String html = EntityUtils.toString(httpEntity, "UTF-8");
                System.out.println(html);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            //Close the response and the HttpClient to release resources
            try {
                if (response != null) {
                    response.close();
                }
                httpClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
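
CloseableHttpClient and CloseableHttpResponse both implement Closeable, so the manual cleanup in the finally block can also be written with try-with-resources. This is only an alternative sketch of the same request (the class name is made up for illustration), not part of the original example:

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class FirstCrawlerTryWithResources {
    public static void main(String[] args) {
        HttpGet httpGet = new HttpGet("http://news.baidu.com/");
        //try-with-resources closes the response and the client automatically
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            if (response.getStatusLine().getStatusCode() == 200) {
                String html = EntityUtils.toString(response.getEntity(), "UTF-8");
                System.out.println(html);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}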

The crawled page content:

(Screenshot: the raw HTML of the Baidu News page printed to the console)

This is just my first beginner program, so it's still pretty rough ~~
