java html串转换成文本串

采用htmlparser 来解决将html串中抽取出文本串。


String str = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">" +
"<HTML><HEAD>" +
"<META http-equiv=Content-Type content=\"text/html; charset=gb2312\">" +
"<META content=\"MSHTML 6.00.6000.17095\" name=GENERATOR><LINK " +
"href=\"BLOCKQUOTE{margin-Top: 0px; margin-Bottom: 0px; margin-Left: 2em}\"" +
"rel=stylesheet></HEAD>" +
"<BODY style=\"FONT-SIZE: 10pt; MARGIN: 10px; FONT-FAMILY: verdana\">" +
"<DIV><FONT face=Verdana size=2>helll,测试邮件</FONT></DIV>" +
"<DIV><FONT face=Verdana size=2></FONT>&nbsp;</DIV>" +
"<DIV align=left><FONT face=Verdana color=#c0c0c0 size=2>2011-03-03 " +
"</FONT></DIV><FONT face=Verdana size=2>"+
"<HR style=\"WIDTH: 122px; HEIGHT: 2px\" align=left SIZE=2>"+

"<DIV><FONT face=Verdana color=#c0c0c0 size=2><SPAN>shopeye7</SPAN> " +
"</FONT></DIV></FONT></BODY></HTML>" ;

System.out.println(StringUtil.html2Str(str));

效果:
helll,测试邮件 2011-03-03 shopeye7


方法:
/**
* @param html
* @return
*/
public static String html2Str(String html) {
try {
html = nvl(html);
Parser parser = Parser.createParser(html, "utf-8");
TextExtractingVisitor visitor = new TextExtractingVisitor();
parser.visitAllNodesWith(visitor);
return visitor.getExtractedText();
} catch (Exception ex) {
return null;
}
}

你可能感兴趣的:(java,html)