Java – How to get all links from a web page?

A jsoup HTML parser example that shows how to parse a web page and get all of its HTML hyperlinks:

pom.xml

    <dependency>
      <groupId>org.jsoup</groupId>
      <artifactId>jsoup</artifactId>
      <version>1.12.1</version>
    </dependency>
  
JsoupFindLinkSample.java
package com.mkyong;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class JsoupFindLinkSample {

    public static void main(String[] args) throws IOException {

        for (String link : findLinks("https://google.com")) {
            System.out.println(link);
        }

    }

    private static Set<String> findLinks(String url) throws IOException {

        // A set stores each distinct href only once
        Set<String> links = new HashSet<>();

        // The request parameter, user agent, and cookie are sample values
        Document doc = Jsoup.connect(url)
                .data("query", "Java")
                .userAgent("Mozilla")
                .cookie("auth", "token")
                .timeout(3000) // milliseconds
                .get();

        // Select every <a> element that has an href attribute
        Elements elements = doc.select("a[href]");
        for (Element element : elements) {
            links.add(element.attr("href"));
        }

        return links;

    }

}

Output

https://play.google.com/?hl=en&tab=w8
https://www.google.com/calendar?tab=wc
/intl/en/about.html
https://photos.google.com/?tab=wq&pageId=none
https://drive.google.com/?tab=wo

//...
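
The output above mixes absolute URLs with a relative one (/intl/en/about.html), because attr("href") returns the attribute value exactly as written in the page. The sketch below shows one way to turn such values into absolute URLs using only the JDK's java.net.URI. The class name LinkResolver and its methods are hypothetical and not part of the original example; jsoup can also do this itself through element.absUrl("href") or element.attr("abs:href").

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the original example.
public class LinkResolver {

    // Resolves one href against the page URL.
    // An href that is already absolute is returned unchanged.
    // URI.create throws IllegalArgumentException for malformed input
    // (for example an href containing spaces), so real pages may need
    // extra validation before calling this.
    public static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    // Resolves every href, keeps first-seen order, and drops duplicates.
    public static Set<String> resolveAll(String baseUrl, Iterable<String> hrefs) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String href : hrefs) {
            resolved.add(resolve(baseUrl, href));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Sample hrefs taken from the output shown above
        Iterable<String> hrefs = Arrays.asList(
                "https://www.google.com/calendar?tab=wc",
                "/intl/en/about.html",
                "https://drive.google.com/?tab=wo");

        for (String link : resolveAll("https://www.google.com/", hrefs)) {
            System.out.println(link);
        }
    }
}
```

The base URL here is an assumption; in practice it would be the address the document was loaded from. A LinkedHashSet is used instead of a HashSet only so that the links print in the order they appear on the page.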

Translated from: https://mkyong.com/java/java-how-to-get-all-links-from-a-web-page/
