Link sharing in Java (fetching a page's title and description)

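I originally wanted to analyze any given link directly: if the page is an article, grab only the article body; otherwise grab only the title and description. I read through a lot of material on this, but my own ability is limited, and many of the algorithms written up by so-called experts turned out to be nonsense. So I settled for simply fetching the title and description. It is simple enough that I did not plan to post it, but I may need it again later, so I am recording it here to make it easy to find: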

package com.jyeba.core.html;

// Simple value object holding the title and description extracted from a page.
public class HtmlInfo {
    private String title;
    private String desc;

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    public void setDesc(String desc) {
        this.desc = desc;
    }

    public String getDesc() {
        return desc;
    }
}

The scraping utility class:
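package com.jyeba.core.html;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class HtmlTools {

    // Fetches the page at the given URL and extracts its <title> text and meta description.
    public static HtmlInfo getHtmlInfo(String url) throws IOException {
        HtmlInfo html = new HtmlInfo();

        Document doc = Jsoup.connect(url)
                // The "query" parameter and "auth" cookie are placeholder values;
                // remove them if the target site does not need them.
                .data("query", "Java")
                .userAgent("Mozilla")
                .cookie("auth", "token")
                .timeout(6000) // timeout in milliseconds
                .get();

        // Page title
        Elements e = doc.select("title");
        if (e.size() > 0) {
            System.out.println(e.text());
            html.setTitle(e.text());
        }

        // Description, taken from the content attribute of the description meta tag
        e = doc.select("meta[name=Description]");
        if (e.size() > 0) {
            System.out.println(e.get(0).attr("content"));
            html.setDesc(e.get(0).attr("content"));
        }

        return html;
    }

    public static void main(String[] args) throws IOException {
        HtmlInfo info = HtmlTools.getHtmlInfo("http://news.qq.com/a/20111017/000091.htm");
    }
}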
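
One practical gap when this is used for link sharing: users often paste a link without a scheme (for example "news.qq.com/..."), and as far as I know Jsoup.connect only accepts http and https URLs. Below is a minimal sketch of a pre-check that could run before getHtmlInfo. The class name ShareUrls and the method normalize are hypothetical and not part of the original code, and defaulting to http:// is an assumption, not a rule.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of the original post: tidies a user-supplied
// link before it is handed to HtmlTools.getHtmlInfo.
public class ShareUrls {

    /**
     * Returns an absolute http/https URL for the given input, or throws
     * IllegalArgumentException if the input cannot be used as one.
     */
    public static String normalize(String input) {
        if (input == null || input.trim().isEmpty()) {
            throw new IllegalArgumentException("URL is empty");
        }
        String candidate = input.trim();

        // Assumption: a link pasted without a scheme is meant to be http.
        if (!candidate.contains("://")) {
            candidate = "http://" + candidate;
        }

        URI uri;
        try {
            uri = new URI(candidate);
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException("Malformed URL: " + input, ex);
        }

        String scheme = uri.getScheme() == null
                ? ""
                : uri.getScheme().toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            throw new IllegalArgumentException("Only http and https are supported: " + input);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host: " + input);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        // A bare host with surrounding whitespace gets trimmed and prefixed.
        System.out.println(normalize(" news.qq.com/a/20111017/000091.htm "));
        // An already absolute http URL is returned unchanged.
        System.out.println(normalize("http://news.qq.com/a/20111017/000091.htm"));
    }
}
```

A call would then look like HtmlTools.getHtmlInfo(ShareUrls.normalize(userInput)), with the IllegalArgumentException caught wherever the share form reports bad input. This is only a syntax check; it does not guarantee that the host exists or that the page can be fetched.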
 

 
