Modifying HtmlParser 1.6 to Output Plain Text

Add the following member function to the NodeList class:

    // NOISE is a regex constant that must be defined in NodeList;
    // its value is not shown in this post.
    public StringBuffer getTxt() {
        StringBuffer ret = new StringBuffer();
        NodeList children;
        Node node;
        TextNode txtNode;

        for (int i = 0; i < size; i++) {
            node = nodeData[i];
            // Extract only the text portion
            if (node instanceof TextNode) {
                txtNode = (TextNode) node;
                String txt = txtNode.getText();
                txt = txt.replaceAll(NOISE, "");
                // Skip nodes that contain only whitespace
                if (txt.trim().length() > 0)
                    ret.append(txt).append("\r\n");
            }
            // Recurse into child nodes, if any
            children = node.getChildren();
            if (children != null)
                ret.append(children.getTxt());
        }
        return ret;
    }
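
The recursion above can be tried without patching the library. What follows is only a minimal sketch under stated assumptions: SimpleNode and TextExtractSketch are hypothetical stand-ins, not HtmlParser classes, and the NOISE value ("&nbsp;") is an assumption, since the post does not show the real constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patched NodeList; not part of HtmlParser.
class TextExtractSketch {
    // Assumed noise pattern; the original NOISE constant is not shown in the post.
    static final String NOISE = "&nbsp;";

    /** Minimal node: text is non-null only for text nodes. */
    static class SimpleNode {
        final String text;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String text) {
            this.text = text;
        }

        SimpleNode add(SimpleNode child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first walk that appends each non-blank text node on its own line. */
    static StringBuilder getTxt(List<SimpleNode> nodes) {
        StringBuilder ret = new StringBuilder();
        for (SimpleNode node : nodes) {
            if (node.text != null) {
                String txt = node.text.replaceAll(NOISE, "");
                if (txt.trim().length() > 0) {
                    ret.append(txt).append("\r\n");
                }
            }
            // Recurse into the children, mirroring children.getTxt() above
            ret.append(getTxt(node.children));
        }
        return ret;
    }

    public static void main(String[] args) {
        SimpleNode body = new SimpleNode(null)
                .add(new SimpleNode(null).add(new SimpleNode("Hello&nbsp;")))
                .add(new SimpleNode("   "))
                .add(new SimpleNode("World"));
        List<SimpleNode> roots = new ArrayList<>();
        roots.add(body);
        System.out.print(getTxt(roots));
    }
}
```

In this sketch the whitespace-only node is skipped and the noise string is stripped, so only the "Hello" and "World" lines are emitted, each terminated by a CRLF.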
