Fetching the Douban Top 250 Book List with Jsoup

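This program fetches the Douban Top 250 book list, keeping only each book's title and its author and related information.
The output format is as follows:
[Image 1: Fetching the Douban Top 250 book list with Jsoup]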

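import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HttpDemo {
    public static void main(String[] args) throws Exception {
        int cot = 1; // running rank of the current book, 1..250
        byte[] lineBreak = "\r\n".getBytes(StandardCharsets.UTF_8);
        // FileOutputStream creates the file if it is missing. The 'true' flag appends,
        // so delete the old file before re-running to avoid duplicate entries.
        File file = new File("d://豆瓣T250书单.txt");

        // try-with-resources flushes and closes the stream even if a request fails
        try (BufferedOutputStream bufferedOutputStream =
                     new BufferedOutputStream(new FileOutputStream(file, true))) {
            // The list is paginated, 25 books per page: start=0, 25, ..., 225
            for (int i = 0; i <= 225; i += 25) {
                Document document = Jsoup.parse(new URL("https://book.douban.com/top250?start=" + i), 10000);
                // Get the book titles
                List<String> bNameList = document.select("div#content .item .pl2 [title]").eachText();
                // Get the summary line of each book (author and related details)
                List<String> bConList = document.select("div#content .item p.pl").eachText();
                // eachText() skips elements without text, so the two lists may differ in length
                int size = Math.min(bNameList.size(), bConList.size());
                for (int j = 0; j < size; j++) {
                    // Write explicit UTF-8 bytes instead of relying on the platform default charset
                    bufferedOutputStream.write(String.valueOf(cot).getBytes(StandardCharsets.UTF_8));
                    bufferedOutputStream.write(lineBreak);
                    bufferedOutputStream.write(bNameList.get(j).getBytes(StandardCharsets.UTF_8));
                    bufferedOutputStream.write(lineBreak);
                    bufferedOutputStream.write(bConList.get(j).getBytes(StandardCharsets.UTF_8));
                    bufferedOutputStream.write(lineBreak);
                    cot++;
                }
            }
        }
        System.out.println("completed");
    }
}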

The final result is as follows:
[Image 2: Fetching the Douban Top 250 book list with Jsoup]
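
The paging and the three-line record layout used in the program can be looked at on their own, without any network access. The following is only a minimal sketch under stated assumptions: the class name `Top250Pages` and the helpers `pageUrls` and `formatEntry` are hypothetical names introduced for illustration, and the title and info strings in `main` are placeholders, not real data from the site. The base URL, the step of 25, and the CRLF line endings are taken from the program above.

```java
import java.util.ArrayList;
import java.util.List;

class Top250Pages {
    // Base URL and page size as used in the program above
    static final String BASE_URL = "https://book.douban.com/top250?start=";
    static final int PAGE_SIZE = 25;
    static final int TOTAL = 250;

    /** Returns one URL per page: start=0, 25, ..., 225. */
    public static List<String> pageUrls() {
        List<String> urls = new ArrayList<>();
        for (int start = 0; start < TOTAL; start += PAGE_SIZE) {
            urls.add(BASE_URL + start);
        }
        return urls;
    }

    /** Formats one book as three CRLF-terminated lines: rank, title, info. */
    public static String formatEntry(int rank, String title, String info) {
        return rank + "\r\n" + title + "\r\n" + info + "\r\n";
    }

    public static void main(String[] args) {
        List<String> urls = pageUrls();
        System.out.println(urls.size() + " pages, last = " + urls.get(urls.size() - 1));
        // Placeholder values for illustration only
        System.out.print(formatEntry(1, "Example Title", "Example Author / Example Publisher"));
    }
}
```

Keeping the URL generation and the record formatting in small pure functions makes it possible to check the offsets and the file layout before sending any requests to the site.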
