Authors are sometimes lazy and often fail to close tags that the HTML standard requires them to close. This makes it hard for the parser to tell where an element actually ends.
For this reason, as a heuristic, not all tags are registered as composite tags (composite tags are what create the parent/child nesting relationship). A valid but flatter parse is considered better than a more deeply nested parse that may be wrong.
You are free to register any tags you like as composite tags by using the prototypical node factory paradigm. First, create a class that derives from CompositeTag (copy and modify whichever existing tag is most like the one you want):
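```java
import org.htmlparser.tags.CompositeTag;

public class BoldTag extends CompositeTag {
    // Tag names handled by this class.
    private static final String[] mIds = new String[] {"B"};

    public BoldTag () {
    }

    public String[] getIds () {
        return (mIds);
    }

    // Start tags that implicitly close an open <B>: here, only another <B>.
    public String[] getEnders () {
        return (mIds);
    }

    // End tags (</xxx>) that implicitly close an open <B>: none yet.
    public String[] getEndTagEnders () {
        return (new String[0]);
    }
}
```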
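Then, register an instance of your tag with a PrototypicalNodeFactory:
```java
// parser is an existing org.htmlparser.Parser instance;
// PrototypicalNodeFactory is org.htmlparser.PrototypicalNodeFactory.
PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
factory.registerTag (new BoldTag ()); // use BoldTag as the prototype for <B>
parser.setNodeFactory (factory);
```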
The remaining problem is detecting when a <B> is missing its </B>, so getEnders() and getEndTagEnders() should probably return longer lists of tag names. Enders are the start tags that, when encountered while the tag is still open, force an end tag to be generated; EndTagEnders are the end tags (</xxx>) that force an end tag to be generated.
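To make the difference between the two lists concrete, here is a hypothetical, self-contained model of the idea. It is not htmlparser's implementation, and the ender lists in it are illustrative assumptions rather than recommended values. Given a flat stream of tokens, it inserts an implied </B> whenever an ender start tag or an end-tag-ender end tag arrives while a <B> is still open, and also at end of input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of enders and end tag enders for <B>.
// This is NOT htmlparser's implementation; it only illustrates the idea.
class EnderDemo {
    // Illustrative ender lists (assumptions, not recommended values).
    static final Set<String> ENDERS =
        new HashSet<>(Arrays.asList("B", "P", "DIV", "TABLE"));
    static final Set<String> END_TAG_ENDERS =
        new HashSet<>(Arrays.asList("P", "DIV", "TD", "BODY", "HTML"));

    // Tokens are bare tags such as "<B>" or "</P>" (no attributes) or text.
    // Returns the tokens with an implied "</B>" inserted wherever needed.
    static List<String> closeBold(List<String> tokens) {
        List<String> out = new ArrayList<>();
        boolean open = false; // is a <B> currently unclosed?
        for (String tok : tokens) {
            if (open && tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1).toUpperCase();
                if (name.equals("B")) {
                    open = false;          // explicit </B>: nothing to add
                } else if (END_TAG_ENDERS.contains(name)) {
                    out.add("</B>");       // end tag ender closes the <B>
                    open = false;
                }
            } else if (open && tok.startsWith("<")) {
                String name = tok.substring(1, tok.length() - 1).toUpperCase();
                if (ENDERS.contains(name)) {
                    out.add("</B>");       // ender start tag closes the <B>
                    open = false;
                }
            }
            if (tok.equalsIgnoreCase("<B>")) {
                open = true;               // a new <B> starts here
            }
            out.add(tok);
        }
        if (open) {
            out.add("</B>");               // end of input closes it too
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("<P>", "<B>", "bold", "</P>", "<P>", "plain", "</P>");
        System.out.println(String.join("", closeBold(in)));
    }
}
```

Running main prints <P><B>bold</B></P><P>plain</P>: because </P> is treated as an end tag ender, the unclosed <B> ends inside the first paragraph instead of swallowing the second one. Likewise, since B is its own ender (as in BoldTag above), <B>one<B>two</B> becomes <B>one</B><B>two</B> rather than a nested pair.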