Implementing a Web Crawler in Java (Scraping LoL Champion Skins)

Still frustrated by downloading image resources one at a time? With the crawler below you no longer have to fetch each image by hand.

1. Backend code

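// GrabLoLSkinPageProcessor.java
import java.util.List;

import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.processor.PageProcessor;
import us.codecraft.webmagic.selector.Html;
import us.codecraft.webmagic.selector.Selectable;

public class GrabLoLSkinPageProcessor implements PageProcessor {

    // Retry failed downloads 3 times, sleep 1 s between requests, time out after 10 s
    private Site site = Site.me().setRetryTimes(3).setSleepTime(1000).setTimeOut(10000);

    @Override
    public void process(Page page) {
        Html html = page.getHtml();

        // Links to the skin detail pages; queue them as further crawl targets
        List<String> all = html.css(".listBox").links().all();
        page.addTargetRequests(all);

        // Each entry is a whole <img ...> element serialized as an HTML string
        Selectable xpath = html.xpath("//div[@class='listBox']/dl/dt/a/img");
        List<String> images = xpath.all();
        page.putField("images", images);
    }

    @Override
    public Site getSite() {
        return site;
    }

    public static void main(String[] args) {
        Spider.create(new GrabLoLSkinPageProcessor())
                .addUrl("https://lol.52pk.com/skinlist_5939_4_2.shtml/")
                .addPipeline(new GrabPipe())
                .thread(3).run();
    }
}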
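// GrabPipe.java
// Imports assume Jackson, commons-io, commons-lang3 and Apache HttpClient 4.x on the classpath.
import java.io.File;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import us.codecraft.webmagic.ResultItems;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.pipeline.Pipeline;

public class GrabPipe implements Pipeline {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public void process(ResultItems resultItems, Task task) {
        Map<String, String> data = new HashMap<>();

        List<String> images = resultItems.get("images");
        for (int i = 0; i < images.size(); i++) {
            // Take the sixth quote-delimited token of the <img> string as the URL.
            // This depends on the attribute order used by the target page.
            String imageUrl = StringUtils.split(images.get(i), '"')[5];
            System.out.println(imageUrl);
            // File name = last path segment with its first three characters dropped
            String newName = StringUtils.substringAfterLast(imageUrl, "/").substring(3);
            try {
                this.downloadFile(imageUrl, new File("D:\\axu\\img\\" + newName));
                data.put("image", newName);
                String json = MAPPER.writeValueAsString(data);
                // Append one JSON record per line
                FileUtils.write(new File("D:\\axu\\data.json"), json + "\n", "UTF-8", true);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    /**
     * Downloads a file.
     *
     * @param url  address of the file to download
     * @param dest local file to write the downloaded bytes to
     * @throws Exception if the request or the write fails
     */
    public void downloadFile(String url, File dest) throws Exception {
        HttpGet httpGet = new HttpGet(url);
        // try-with-resources closes both the response and the client
        try (CloseableHttpClient client = HttpClientBuilder.create().build();
             CloseableHttpResponse response = client.execute(httpGet)) {
            FileUtils.writeByteArrayToFile(dest, IOUtils.toByteArray(response.getEntity().getContent()));
        }
    }

}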
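
The pipeline above picks the image URL by position (the sixth quote-delimited token of the `<img>` string), so it can break if the page reorders its attributes. As a rough sketch of a less order-dependent approach, the helper below finds the `src` attribute by name and derives a file name from the URL, using only the JDK. The class `ImgSrcExtractor` and its methods are hypothetical names introduced for illustration; they are not part of WebMagic or of the original code, and a regular expression is only an approximation of real HTML parsing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of WebMagic or the original post.
class ImgSrcExtractor {

    // Finds src="..." or src='...' inside an <img ...> tag, whatever the attribute order.
    // Requiring whitespace before "src" avoids matching attributes such as data-src.
    private static final Pattern SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the value of the src attribute, or empty if the tag has none.
    static Optional<String> extractSrc(String imgTag) {
        if (imgTag == null) {
            return Optional.empty();
        }
        Matcher m = SRC.matcher(imgTag);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    // Returns the last path segment of a URL, ignoring any query string.
    static String fileNameOf(String url) {
        int query = url.indexOf('?');
        String path = query >= 0 ? url.substring(0, query) : url;
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Inside the pipeline loop, one could call `ImgSrcExtractor.extractSrc(images.get(i))` and skip the entry when the result is empty, instead of indexing into the split array. Unlike the original code, `fileNameOf` keeps the whole file name and does not drop its first three characters.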

2. Results (note: my target site has relatively few image resources)

 data.json

{"image":"skinlist_5939_4_1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"skinlist_5939_4_2.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"skinlist_5939_4_3.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"skinlist_5939_4_1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"skinlist_5939_4_2.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml/"}
{"image":"skinlist_5939_4_1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"skinlist_5939_4_2.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"skinlist_5939_4_3.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"skinlist_5939_4_1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"skinlist_5939_4_3.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":".jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"1020_163I392G.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"G6_1H45X304.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"9444_1FP15192.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"1020_13224I605.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"aultpic.gif","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"1020_1G112U41.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"1020_1A20W023.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"1020_1A544E07.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"3X6_13555U345.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"1020_1G52YT8.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"322_162551F33.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"9140_1HH91N9.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"322_162Q3U42.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"9140_1I520Cc.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"9140_1P4259233.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"9140_1Q1594032.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"9140_1K9423307.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"322_1644121Z5.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"322_1HSA441.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"322_1J31I3a.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml/"}
{"image":"322_1G6159137.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"246_00102J4Y.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"246_1A00943Y.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"46_1J4123253.png","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"246_1622404G7.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"246_1530596001.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"246_1J333OP.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"322_16211012Z.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"322_16135V457.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"322_14051R115.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"46_1IR0EO.png","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_150K26213.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_164Z13E1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"G4_1QI95351.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_1G5293235.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_1GUJ135.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_1I522F41.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_1P9546218.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9500_153404W63.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"1020_16300H0a.jpg","url":"https://lol.52pk.com/skinlist_5939_4_1.shtml"}
{"image":"9140_1RQ55Z7.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"3X6_154511X00.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"6134_1R1261P9.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"3X6_1A05L957.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"6134_1R6234956.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"6134_1J01OI1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"6134_1K0004135.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"6134_1K34952N.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"3X6_142053G95.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"3X6_1409361918.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"6134_1Q01b139.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"3X6_13495a3N.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"322_1KAEG6.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"G4_1A141RM.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"322_1G9109620.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"246_23412159A.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"1F_1KTQZ5.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"3X6_1K2115319.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"322_16143LG1.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"G4_1Q32HI6.jpg","url":"https://lol.52pk.com/skinlist_5939_4_3.shtml"}
{"image":"1020_163I392G.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"G6_1H45X304.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9444_1FP15192.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"1020_13224I605.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"aultpic.gif","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"1020_1G112U41.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"1020_1A20W023.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"1020_1A544E07.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"3X6_13555U345.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"1020_1G52YT8.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"322_162551F33.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9140_1HH91N9.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"322_162Q3U42.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9140_1I520Cc.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9140_1P4259233.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9140_1Q1594032.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9140_1K9423307.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"322_1644121Z5.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"322_1HSA441.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"322_1J31I3a.jpg","url":"https://lol.52pk.com/skinlist_5939_4_2.shtml"}
{"image":"9140_1RQ55Z7.jpg"}
{"image":"3X6_154511X00.jpg"}
{"image":"6134_1R1261P9.jpg"}
{"image":"3X6_1A05L957.jpg"}
{"image":"6134_1R6234956.jpg"}
{"image":"6134_1J01OI1.jpg"}
{"image":"6134_1K0004135.jpg"}
{"image":"6134_1K34952N.jpg"}
{"image":"3X6_142053G95.jpg"}
{"image":"3X6_1409361918.jpg"}
{"image":"6134_1Q01b139.jpg"}
{"image":"3X6_13495a3N.jpg"}
{"image":"322_1KAEG6.jpg"}
{"image":"G4_1A141RM.jpg"}
{"image":"322_1G9109620.jpg"}
{"image":"246_23412159A.jpg"}
{"image":"1F_1KTQZ5.jpg"}
{"image":"3X6_1K2115319.jpg"}
{"image":"322_16143LG1.jpg"}
{"image":"G4_1Q32HI6.jpg"}
{"image":"322_1G6159137.jpg"}
{"image":"246_00102J4Y.jpg"}
{"image":"246_1A00943Y.jpg"}
{"image":"46_1J4123253.png"}
{"image":"246_1622404G7.jpg"}
{"image":"246_1530596001.jpg"}
{"image":"246_1J333OP.jpg"}
{"image":"322_16211012Z.jpg"}
{"image":"322_16135V457.jpg"}
{"image":"322_14051R115.jpg"}
{"image":"46_1IR0EO.png"}
{"image":"9140_1RQ55Z7.jpg"}
{"image":"9500_150K26213.jpg"}
{"image":"1020_163I392G.jpg"}
{"image":"3X6_154511X00.jpg"}
{"image":"G6_1H45X304.jpg"}
{"image":"9500_164Z13E1.jpg"}
{"image":"G4_1QI95351.jpg"}
{"image":"6134_1R1261P9.jpg"}
{"image":"9444_1FP15192.jpg"}
{"image":"3X6_1A05L957.jpg"}
{"image":"9500_1G5293235.jpg"}
{"image":"1020_13224I605.jpg"}
{"image":"6134_1R6234956.jpg"}
{"image":"aultpic.gif"}
{"image":"9500_1GUJ135.jpg"}
{"image":"6134_1J01OI1.jpg"}
{"image":"1020_1G112U41.jpg"}
{"image":"9500_1I522F41.jpg"}
{"image":"6134_1K0004135.jpg"}
{"image":"1020_1A20W023.jpg"}
{"image":"9500_1P9546218.jpg"}
{"image":"6134_1K34952N.jpg"}
{"image":"1020_1A544E07.jpg"}
{"image":"9500_153404W63.jpg"}
{"image":"3X6_142053G95.jpg"}
{"image":"3X6_13555U345.jpg"}
{"image":"3X6_1409361918.jpg"}
{"image":"1020_16300H0a.jpg"}
{"image":"1020_1G52YT8.jpg"}
{"image":"6134_1Q01b139.jpg"}
{"image":"322_162551F33.jpg"}
{"image":"3X6_13495a3N.jpg"}
{"image":"322_1KAEG6.jpg"}
{"image":"9140_1HH91N9.jpg"}
{"image":"G4_1A141RM.jpg"}
{"image":"322_162Q3U42.jpg"}
{"image":"322_1G9109620.jpg"}
{"image":"246_23412159A.jpg"}
{"image":"9140_1I520Cc.jpg"}
{"image":"1F_1KTQZ5.jpg"}
{"image":"9140_1P4259233.jpg"}
{"image":"3X6_1K2115319.jpg"}
{"image":"9140_1Q1594032.jpg"}
{"image":"322_16143LG1.jpg"}
{"image":"G4_1Q32HI6.jpg"}
{"image":"9140_1K9423307.jpg"}
{"image":"322_1644121Z5.jpg"}
{"image":"322_1HSA441.jpg"}
{"image":"322_1J31I3a.jpg"}
{"image":"322_1G6159137.jpg"}
{"image":"246_00102J4Y.jpg"}
{"image":"246_1A00943Y.jpg"}
{"image":"46_1J4123253.png"}
{"image":"246_1622404G7.jpg"}
{"image":"246_1530596001.jpg"}
{"image":"246_1J333OP.jpg"}
{"image":"322_16211012Z.jpg"}
{"image":"322_16135V457.jpg"}
{"image":"322_14051R115.jpg"}
{"image":"46_1IR0EO.png"}
{"image":"9500_150K26213.jpg"}
{"image":"9500_164Z13E1.jpg"}
{"image":"G4_1QI95351.jpg"}
{"image":"9500_1G5293235.jpg"}
{"image":"9500_1GUJ135.jpg"}
{"image":"9500_1I522F41.jpg"}
{"image":"9500_1P9546218.jpg"}
{"image":"9500_153404W63.jpg"}
{"image":"1020_16300H0a.jpg"}
{"image":"322_1G6159137.jpg"}
{"image":"246_00102J4Y.jpg"}
{"image":"246_1A00943Y.jpg"}
{"image":"1020_163I392G.jpg"}
{"image":"46_1J4123253.png"}
{"image":"9140_1RQ55Z7.jpg"}
{"image":"3X6_154511X00.jpg"}
{"image":"246_1622404G7.jpg"}
{"image":"G6_1H45X304.jpg"}
{"image":"246_1530596001.jpg"}
{"image":"6134_1R1261P9.jpg"}
{"image":"9444_1FP15192.jpg"}
{"image":"246_1J333OP.jpg"}
{"image":"3X6_1A05L957.jpg"}
{"image":"322_16211012Z.jpg"}
{"image":"1020_13224I605.jpg"}
{"image":"6134_1R6234956.jpg"}
{"image":"322_16135V457.jpg"}
{"image":"aultpic.gif"}
{"image":"322_14051R115.jpg"}
{"image":"6134_1J01OI1.jpg"}
{"image":"1020_1G112U41.jpg"}
{"image":"46_1IR0EO.png"}
{"image":"6134_1K0004135.jpg"}
{"image":"9500_150K26213.jpg"}
{"image":"1020_1A20W023.jpg"}
{"image":"6134_1K34952N.jpg"}
{"image":"9500_164Z13E1.jpg"}
{"image":"1020_1A544E07.jpg"}
{"image":"3X6_142053G95.jpg"}
{"image":"G4_1QI95351.jpg"}
{"image":"3X6_13555U345.jpg"}
{"image":"3X6_1409361918.jpg"}
{"image":"9500_1G5293235.jpg"}
{"image":"1020_1G52YT8.jpg"}
{"image":"6134_1Q01b139.jpg"}
{"image":"322_162551F33.jpg"}
{"image":"9500_1GUJ135.jpg"}
{"image":"3X6_13495a3N.jpg"}
{"image":"322_1KAEG6.jpg"}
{"image":"9140_1HH91N9.jpg"}
{"image":"9500_1I522F41.jpg"}
{"image":"G4_1A141RM.jpg"}
{"image":"322_162Q3U42.jpg"}
{"image":"322_1G9109620.jpg"}
{"image":"9500_1P9546218.jpg"}
{"image":"246_23412159A.jpg"}
{"image":"9140_1I520Cc.jpg"}
{"image":"9500_153404W63.jpg"}
{"image":"9140_1P4259233.jpg"}
{"image":"1F_1KTQZ5.jpg"}
{"image":"1020_16300H0a.jpg"}
{"image":"3X6_1K2115319.jpg"}
{"image":"9140_1Q1594032.jpg"}
{"image":"322_16143LG1.jpg"}
{"image":"G4_1Q32HI6.jpg"}
{"image":"9140_1K9423307.jpg"}
{"image":"322_1644121Z5.jpg"}
{"image":"322_1HSA441.jpg"}
{"image":"322_1J31I3a.jpg"}
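
Because the pipeline opens data.json in append mode, every run adds its records to the same file, so the same entry can appear many times. A minimal sketch for collapsing repeated lines while keeping their first-seen order might look like this. The class `JsonLinesDeduplicator` and its method names are hypothetical, and the sketch treats two records as equal only when their lines are textually identical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original crawler.
class JsonLinesDeduplicator {

    // Returns the non-blank lines with duplicates removed, in first-seen order.
    static List<String> distinctLines(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed);
            }
        }
        return new ArrayList<>(seen);
    }

    // Reads a JSON Lines file and writes the de-duplicated records to another file.
    static void rewriteWithoutDuplicates(Path source, Path target) throws IOException {
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        Files.write(target, distinctLines(lines), StandardCharsets.UTF_8);
    }
}
```

Writing to a separate target file keeps the original output intact in case the result needs to be checked against it.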

3. Conclusion

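This JSON file can also serve as a data set of your own, which is quite convenient. If you are genuinely interested in crawling legally accessible resources, consider learning Python: crawling is one of its strengths, and it is more efficient for this kind of work. Here I have only implemented a simple crawl in Java, and its limitations are considerable.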
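
To illustrate using the file as data, here is a minimal sketch that pulls the image names back out of the JSON Lines records using only the JDK. The class `ImageNameReader` is a hypothetical name, and the regular expression assumes flat records like the ones shown above, with no escaped quotes inside the values. For anything more complex, a real JSON parser such as the Jackson `ObjectMapper` already used in the pipeline would be the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; assumes flat records without escaped quotes.
class ImageNameReader {

    // Captures the value of the "image" field in one JSON record.
    private static final Pattern IMAGE_FIELD =
            Pattern.compile("\"image\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the "image" value of every line that has one, in file order.
    static List<String> imageNames(List<String> jsonLines) {
        List<String> names = new ArrayList<>();
        for (String line : jsonLines) {
            Matcher m = IMAGE_FIELD.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }
}
```

The input lines could come from `Files.readAllLines(Paths.get("D:\\axu\\data.json"), StandardCharsets.UTF_8)`, matching the path the pipeline writes to.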
