Java Web Crawler with HttpClient (Fetching the Data)

Contents

1. What a Web Crawler Is

2. Fetching the Data

(1) GET request without parameters

(2) GET request with parameters

(3) POST request without parameters

(4) POST request with parameters


1. What a Web Crawler Is

A web crawler is an automated program that simulates how a person browses the web, automatically fetching, preprocessing, and saving the information it needs from the internet.

A crawler typically works like this: first define the rules (for example, which URLs to visit and what kind of information to extract), then download the HTML source of those pages, parse the source and extract the data according to the rules, and finally process and store the results.

Crawlers are widely used in practice; search engines, big-data analytics, and transaction-data collection all rely on crawling to collect and process targeted information.

Crawling essentially boils down to two steps: first fetch the data, then parse it. This article focuses on fetching; the sketch below shows how the two steps fit together.
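To make the flow concrete, below is a minimal sketch that downloads a page with Apache HttpClient 4.x (the org.apache.httpcomponents:httpclient dependency, which the examples later in this article also assume) and then "parses" it with a simple regular expression that pulls out the page title. The regex is only a stand-in for a real parser, the class name is illustrative, and try-with-resources is used instead of the explicit close() calls shown later.

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrawlerSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: get the data -- download the HTML source of the target page.
        // try-with-resources closes the client and the response automatically.
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(new HttpGet("https://www.lanqiao.cn/"))) {
            String html = EntityUtils.toString(response.getEntity(), "UTF-8");
            // Step 2: parse the data -- here just a regex that extracts the <title> tag as an illustration.
            Matcher m = Pattern.compile("<title>(.*?)</title>").matcher(html);
            if (m.find()) {
                System.out.println(m.group(1));
            }
        }
    }
}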

2. Fetching the Data

(1) GET request without parameters

public static void main(String[] args) throws IOException {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // create the GET request for the target page
        HttpGet httpGet = new HttpGet("https://www.lanqiao.cn/");

        HttpEntity entity = null;
        CloseableHttpResponse response = null;

        try {
            // execute the request and make sure the response is OK (status code 200)
            response = httpClient.execute(httpGet);
            if (response.getStatusLine().getStatusCode() == 200) {
                // get the response entity
                entity = response.getEntity();
                // printing the entity directly only shows the object, not the page content
                System.out.println(entity);
                // convert the response body to a String to see the HTML source
                String html = EntityUtils.toString(entity, "UTF-8");
                System.out.println(html);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // always close the response and the client when done
                if (response != null) response.close();
                if (httpClient != null) httpClient.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

(2) GET request with parameters

For a GET request, the parameters travel in the URL's query string, so the request URI is built with URIBuilder before creating the HttpGet (the parameter name and value used below are placeholders).

public static void main(String[] args) throws Exception {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // build the request URI with query parameters
        // (the parameter name "progid" and value "20" are only illustrative)
        URIBuilder uriBuilder = new URIBuilder("https://www.lanqiao.cn/");
        uriBuilder.setParameter("progid", "20");
        // create the GET request from the built URI, e.g. https://www.lanqiao.cn/?progid=20
        HttpGet httpGet = new HttpGet(uriBuilder.build());

        HttpEntity entity = null;
        CloseableHttpResponse response = null;

        try {
            // execute the request and make sure the response is OK (status code 200)
            response = httpClient.execute(httpGet);
            if (response.getStatusLine().getStatusCode() == 200) {
                // get the response entity
                entity = response.getEntity();
                System.out.println(entity);
                // convert the response body to a String to see the HTML source
                String html = EntityUtils.toString(entity, "UTF-8");
                System.out.println(html);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // always close the response and the client when done
                if (response != null) response.close();
                if (httpClient != null) httpClient.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
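Because the query string above is assembled with URIBuilder, it can be checked on its own what URI the builder produces. A standalone sketch (the parameter names and values are placeholders, not real query parameters of the site):

import org.apache.http.client.utils.URIBuilder;

import java.net.URI;

public class UriBuilderCheck {
    public static void main(String[] args) throws Exception {
        // chain several parameters onto the base URL and build the final URI
        URI uri = new URIBuilder("https://www.lanqiao.cn/")
                .setParameter("progid", "20")
                .setParameter("page", "1")
                .build();
        System.out.println(uri); // prints https://www.lanqiao.cn/?progid=20&page=1
    }
}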

(3) POST request without parameters

public class HttpClientDemo1 {
    public static void main(String[] args) throws IOException {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // create the POST request
        HttpPost httpPost = new HttpPost("https://www.lanqiao.cn/");
        HttpEntity entity = null;
        CloseableHttpResponse response = null;
        try {
            response = httpClient.execute(httpPost);
            if (response.getStatusLine().getStatusCode() == 200) {
                // get the response entity
                entity = response.getEntity();
                System.out.println(entity);
                // HTML source of the page
                String html = EntityUtils.toString(entity, "UTF-8");
                System.out.println(html);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // always close the response and the client when done
                if (response != null) response.close();
                if (httpClient != null) httpClient.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
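Some sites answer differently, or refuse the request altogether, when the client does not send browser-like headers. Request headers can be set on the request object before calling execute(); a small sketch with purely illustrative header values (the same works for HttpGet):

import org.apache.http.client.methods.HttpPost;

import java.util.Arrays;

public class RequestHeaderSketch {
    public static void main(String[] args) {
        HttpPost httpPost = new HttpPost("https://www.lanqiao.cn/");
        // illustrative header values; adjust them to whatever the target site expects
        httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        httpPost.setHeader("Accept", "text/html");
        // the request is then passed to httpClient.execute(httpPost) exactly as in the example above
        System.out.println(Arrays.toString(httpPost.getAllHeaders()));
    }
}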

(4) POST request with parameters

For a POST request, the parameters are sent in the request body as a URL-encoded form entity instead of in the URL (the parameter name and value used below are placeholders).

public static void main(String[] args) {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // create the POST request
        HttpPost httpPost = new HttpPost("https://www.lanqiao.cn/");
        HttpEntity entity = null;
        CloseableHttpResponse response = null;
        try {
            // set up a form parameter (the name "progid" and value "20" are only illustrative)
            BasicNameValuePair basicNameValuePair = new BasicNameValuePair("progid", "20");
            // put the parameters into a list
            List<NameValuePair> params = new ArrayList<>();
            params.add(basicNameValuePair);
            // wrap the parameters in a form entity and attach it to the request body
            UrlEncodedFormEntity urlEncodedFormEntity = new UrlEncodedFormEntity(params, "UTF-8");
            httpPost.setEntity(urlEncodedFormEntity);
            // execute the request
            response = httpClient.execute(httpPost);
            if (response.getStatusLine().getStatusCode() == 200) {
                // get the response entity
                entity = response.getEntity();
                System.out.println(entity);
                // HTML source of the page
                String html = EntityUtils.toString(entity, "UTF-8");
                System.out.println(html);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (response != null) response.close();
                if (httpClient != null) httpClient.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
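To see what UrlEncodedFormEntity actually puts into the request body, the entity can be printed on its own: the parameters are URL-encoded and joined as name=value pairs. A standalone sketch (the parameter names and values are placeholders):

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.util.ArrayList;
import java.util.List;

public class FormBodyPreview {
    public static void main(String[] args) throws Exception {
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("progid", "20"));
        params.add(new BasicNameValuePair("keyword", "java 爬虫"));
        UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(params, "UTF-8");
        // prints the encoded body, e.g. progid=20&keyword=java+%E7%88%AC%E8%99%AB
        System.out.println(EntityUtils.toString(formEntity));
    }
}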
