HtmlParser Sample1

You first have to know which tags (div, meta, span, etc.) contain the information you want, and which attributes identify those tags. Example:

 <span class="price"> $7.95</span>

If you are looking for this "price", then you are interested in span tags with class "price".

HTML Parser can filter nodes by attribute name and value:

NodeFilter filter = new HasAttributeFilter("class", "price");

When you parse using a filter, you get back a list of Nodes. Use instanceof on each node to determine whether it is of the type you are interested in. For span you'd do something like:

if (node instanceof Span) // or any other supported element.

The supported tag classes are the ones in the org.htmlparser.tags package.
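The two steps above (filter by attribute, then check the node type) can be put together roughly as follows. This is only a sketch: the class name PriceSpanSketch, the helper extractPrices and the inline HTML string are made up for illustration, and the input is parsed from a string with Parser.createParser rather than from a URL so it can be tried offline.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.Span;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

class PriceSpanSketch {

    // Hypothetical helper: returns the trimmed text of every <span class="price">.
    static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        try {
            // Parse from an in-memory string instead of a URL.
            Parser parser = Parser.createParser(html, "UTF-8");
            // Matches ANY tag whose class attribute equals "price".
            NodeList matches = parser.parse(new HasAttributeFilter("class", "price"));
            for (int i = 0; i < matches.size(); i++) {
                Node node = matches.elementAt(i);
                // Keep only span tags; other tags with the same class are skipped.
                if (node instanceof Span) {
                    prices.add(node.toPlainTextString().trim());
                }
            }
        } catch (ParserException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return prices;
    }

    public static void main(String... args) {
        // Illustrative input: a div and a span share the class "price".
        String html = "<html><body>"
                + "<div class=\"price\">n/a</div>"
                + "<span class=\"price\"> $7.95</span>"
                + "</body></html>";
        System.out.println(extractPrices(html));
    }
}
```

The attribute filter alone also matches the div here, which is why the instanceof check is still needed before reading the text.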

Here is an example that uses HTML Parser to grab the meta tag holding a site's description:

Sample tag:

<meta name="description" content="Amazon.com: frankenstein: Books"/>

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.tags.MetaTag;

public class HTMLParserTest {
    public static void main(String... args) {
        Parser parser = new Parser();
        // Target: <meta name="description" content="Some text about the site." />
        HasAttributeFilter filter = new HasAttributeFilter("name", "description");
        try {
            parser.setResource("http://www.youtube.com");
            NodeList list = parser.parse(filter);
            Node node = list.elementAt(0);

            if (node instanceof MetaTag) {
                MetaTag meta = (MetaTag) node;
                String description = meta.getAttribute("content");

                System.out.println(description);
                // Prints: "YouTube is a place to discover, watch, upload and share videos."
            }

        } catch (ParserException e) {
            e.printStackTrace();
        }
    }

}


 
