Java Crawler Notes

Today's goal is to learn to crawl backend data from the OJ. I found an API link: changing the problem ID and the operation code in the URL is enough to download the data, but the request needs the Cookie of a logged-in account.

So here is a note on using HttpClient to send the request and save the response to a file.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public static void doPostWithParam(String postUrl, Map<String, String> params,
        Map<String, String> headers, String saveDir, String fileName) throws Exception {
    CloseableHttpClient httpClient = HttpClients.createDefault();
    HttpPost post = new HttpPost(postUrl);

    // Put the params into the request body as a URL-encoded form
    List<NameValuePair> list = new ArrayList<>();
    params.forEach((key, value) -> list.add(new BasicNameValuePair(key, value)));
    post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));

    // Attach the headers (the login Cookie goes here)
    headers.forEach(post::addHeader);

    CloseableHttpResponse response = httpClient.execute(post);
    String body = EntityUtils.toString(response.getEntity());

    // Create the target directory if needed, then write the response body.
    // FileWriter creates the file itself, so no separate createNewFile() call is needed,
    // and try-with-resources closes the writer even if write() throws.
    File dir = new File(saveDir);
    if (!dir.exists()) {
        dir.mkdirs();
    }
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(new File(dir, fileName)))) {
        bw.write(body);
    }

    response.close();
    httpClient.close();
}
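As an aside, the create-directory-then-write steps in the save block can be done more compactly with java.nio: `createDirectories` is a no-op when the directory already exists, and `writeString` creates or overwrites the file in one call. A sketch (the demo paths are made up):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveDemo {
    // Same save step as in doPostWithParam, rewritten with java.nio
    static Path save(String saveDir, String fileName, String body) throws Exception {
        Path dir = Path.of(saveDir);
        Files.createDirectories(dir);          // no-op if the directory exists
        Path file = dir.resolve(fileName);
        Files.writeString(file, body, StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws Exception {
        // hypothetical demo location; any writable directory works
        Path base = Files.createTempDirectory("spider-demo");
        Path p = save(base.resolve("1001").toString(), "1001.html", "<html>demo</html>");
        System.out.println(Files.readString(p));  // <html>demo</html>
    }
}
```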

The helper above takes the URL, a map of params, a map of headers, the directory to save into, and the file name.
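For reference, the body that UrlEncodedFormEntity builds from the params map is just percent-encoded key=value pairs joined with `&`. A hand-rolled sketch of the same encoding (the keys and values are hypothetical):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormEncodeDemo {
    // Rough equivalent of what UrlEncodedFormEntity puts in the request body
    static String encodeForm(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("pid", "1001");
        p.put("op", "download data");
        System.out.println(encodeForm(p));  // pid=1001&op=download+data
    }
}
```

Note that form encoding turns a space into `+`, which is why a ready-made entity class beats string concatenation for anything user-supplied.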

Map<String, String> params = new HashMap<>();
Map<String, String> headers = new HashMap<>();
headers.put("Cookie", "...");
try {
    doPostWithParam("https://oj.bnuz.edu.cn:8081/problem/" + sources[i][2], params, headers,
            "E:/JAVA/Java_Work_Idea/Spider/src/data/" + id + "/", id + ".html");
} catch (Exception e) {
    e.printStackTrace();
}
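If you would rather avoid the Apache dependency, the built-in java.net.http client (JDK 11+) can express the same POST-with-Cookie request. A sketch that only builds the request; the URL, cookie value, and form body are placeholders:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class RequestBuildDemo {
    // Same request shape as doPostWithParam, using the JDK 11 built-in client
    static HttpRequest build(String url, String cookie, String formBody) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .header("Cookie", cookie)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("https://oj.bnuz.edu.cn:8081/problem/1001",
                "placeholder-cookie", "pid=1001");
        System.out.println(req.method());  // POST
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`, and the body can then be written to disk as before.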
