How dom4j's two DocumentFactory implementations affect performance

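   dom4j offers two factories: the default DocumentFactory and IndexedDocumentFactory. According to the book "Java and XML", the latter loads element names into a Map, so looking up elements is faster. My tests show, however, that simply switching to it does not guarantee a speedup; it only helps under certain conditions. Here is the complete test class first, including the method that generates the test data.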

 

package javaxml3;

import org.dom4j.*;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.dom4j.util.IndexedDocumentFactory;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class XMLReadAllNodesExample2 {
    public static final String sampleName = "e:/temp/4.xml";
    public static final int findCount = 1000;

    public static void main(String[] args) throws DocumentException, IOException {
        generateSampleFileWithDiffName(sampleName, 2000);

        System.out.println("default factory: " + testFindByDefaultFactory("25", findCount));
        System.out.println("indexed factory: " + testFindByIndexedFactory("25", findCount));
    }

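    /**
     * Generates a sample XML file whose root element Company contains
     * person1 .. person{amount} as direct children, and saves it to fileName.
     *
     * @param fileName path of the XML file to write
     * @param amount   number of person elements to generate
     * @throws IOException if the file cannot be written
     */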
    private static void generateSampleFileWithDiffName(String fileName, int amount) throws IOException {
        DocumentFactory factory = DocumentFactory.getInstance();
        Document doc = factory.createDocument();

        addElement(factory, doc, "Company");
        Element root = doc.getRootElement();

        for (int i = 0; i < amount; i++) {
            // Element name and text both carry the running number, e.g. <person25>25</person25>
            addElement(factory, root, "person" + (i + 1), i + 1);
        }

        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setSuppressDeclaration(false);
        XMLWriter writer = new XMLWriter(new FileWriter(fileName), format);

        writer.write(doc);

        writer.close();
    }

    private static void addElement(DocumentFactory factory, Document parent, String name) {
        parent.add(factory.createElement(name));
    }

    private static Element addElement(DocumentFactory factory, Element parent, String name, Object value) {
        Element newElem = factory.createElement(name);
        parent.add(newElem.addText(value + ""));
        return newElem;
    }

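    /**
     * Times repeated lookups of the element "person" + id in a document
     * parsed with the default DocumentFactory.
     *
     * @param id    numeric suffix of the element name to look up, e.g. "25"
     * @param count number of lookups to perform
     * @return elapsed time in milliseconds
     * @throws DocumentException if the sample file cannot be parsed
     */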
    private static long testFindByDefaultFactory(String id, int count) throws DocumentException {

        SAXReader reader = new SAXReader(DocumentFactory.getInstance());
        Document doc = reader.read(new File(sampleName));
        Node root = doc.selectSingleNode("Company");
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            // When calling doc.selectSingleNode instead, the XPath becomes "//person" + id
            root.selectSingleNode("person" + id);
        }
        long end = System.currentTimeMillis();
        long elapsed = end - start;

        return elapsed;
    }

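    /**
     * Times repeated lookups of the element "person" + id in a document
     * parsed with IndexedDocumentFactory.
     *
     * @param id    numeric suffix of the element name to look up, e.g. "25"
     * @param count number of lookups to perform
     * @return elapsed time in milliseconds
     * @throws DocumentException if the sample file cannot be parsed
     */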
    private static long testFindByIndexedFactory(String id, int count) throws DocumentException {

        SAXReader reader = new SAXReader(IndexedDocumentFactory.getInstance());
        Document doc = reader.read(new File(sampleName));
        Node root = doc.selectSingleNode("Company");
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            // When calling doc.selectSingleNode instead, the XPath becomes "//person" + id
            root.selectSingleNode("person" + id);
        }

        long end = System.currentTimeMillis();
        long elapsed = end - start;

        return elapsed;
    }


}
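
Both test methods above time a loop with System.currentTimeMillis. As a rough sketch of how the same measurement could be factored into one helper, the class below times any Runnable. The class name LookupTimer, the method timeMillis, and the untimed warm-up pass are my own assumptions for illustration and are not part of the original test or of dom4j.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original test class or of dom4j.
public class LookupTimer {

    /**
     * Runs the task `warmup` times without timing it, then `count` times
     * with timing, and returns the elapsed time of the timed runs in ms.
     */
    static long timeMillis(Runnable task, int warmup, int count) {
        for (int i = 0; i < warmup; i++) {
            task.run();                      // untimed warm-up runs
        }
        long start = System.nanoTime();      // monotonic clock
        for (int i = 0; i < count; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in task; in the test above it would be the selectSingleNode call.
        long elapsed = timeMillis(() -> calls[0]++, 100, 1000);
        System.out.println("calls: " + calls[0] + ", elapsed ms: " + elapsed);
    }
}
```

With dom4j on the classpath, the task passed in would be something like `() -> root.selectSingleNode("person25")`, once with a document read through each factory.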

 

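  • Generating the test data
    In the main method, keep only the generateSampleFileWithDiffName line and change the test-data path defined in sampleName to a suitable location on your machine. Run the program to generate the data, then comment that line out.
  • Testing the performance of the two factories
    With findCount set to 1000, the IndexedDocumentFactory version completes the 1000 lookups in a little over 60 ms, while the default factory needs more than 4 seconds, and the gap widens as the count grows. However, if you change the program so that selectSingleNode is called on the doc object instead of on the Company element, both versions are equally slow: 1000 lookups take a little over 6 seconds, and IndexedDocumentFactory no longer helps. It only takes effect when the element being looked up is a direct child of the node where the search starts. What happens if another level, say employees, is inserted between company and person while the search still starts at the company element? In that case it still has no effect, and raising the JVM's maximum heap size does not help either.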
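
One plausible way to picture these results is a per-element index that maps a child name to the child itself. The sketch below is a simplified model using only the Java standard library; the names ChildIndexSketch and TreeNode are hypothetical, and this is not dom4j's actual code, whose XPath evaluation is more involved. It only illustrates why a name index on each element answers "direct child" lookups with one hash lookup, while a descendant search still has to walk the tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a per-element name index; NOT dom4j's implementation.
public class ChildIndexSketch {

    /** Hypothetical tree node that indexes its direct children by name. */
    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();
        // Maps a child name to the first direct child with that name.
        final Map<String, TreeNode> childIndex = new HashMap<>();

        TreeNode(String name) {
            this.name = name;
        }

        TreeNode add(TreeNode child) {
            children.add(child);
            childIndex.putIfAbsent(child.name, child);
            return child;
        }

        /** Direct-child lookup through the index: a single hash lookup. */
        TreeNode childByIndex(String childName) {
            return childIndex.get(childName);
        }

        /** Direct-child lookup without an index: a linear scan. */
        TreeNode childByScan(String childName) {
            for (TreeNode c : children) {
                if (c.name.equals(childName)) {
                    return c;
                }
            }
            return null;
        }

        /**
         * Descendant search in document order. The per-node index only
         * covers direct children, so nodes are visited one by one;
         * visited[0] counts them.
         */
        TreeNode firstDescendant(String target, int[] visited) {
            for (TreeNode c : children) {
                visited[0]++;
                if (c.name.equals(target)) {
                    return c;
                }
                TreeNode hit = c.firstDescendant(target, visited);
                if (hit != null) {
                    return hit;
                }
            }
            return null;
        }
    }

    /** Builds Company with person1 .. person{amount} as direct children. */
    static TreeNode buildCompany(int amount) {
        TreeNode company = new TreeNode("Company");
        for (int i = 1; i <= amount; i++) {
            company.add(new TreeNode("person" + i));
        }
        return company;
    }

    public static void main(String[] args) {
        TreeNode company = buildCompany(2000);
        TreeNode doc = new TreeNode("#document");
        doc.add(company);

        // Starting at Company, person25 is a direct child: the index applies.
        System.out.println(company.childByIndex("person25").name);

        // Starting at the document, person25 is a grandchild: the index misses.
        System.out.println(doc.childByIndex("person25"));

        int[] visited = {0};
        doc.firstDescendant("person25", visited);
        System.out.println("nodes visited: " + visited[0]);
    }
}
```

In this model, inserting an extra level between the start node and the target has the same consequence as starting from the document: the start node's index no longer contains the target name, so the lookup falls back to traversal.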

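    Conclusion

         IndexedDocumentFactory is no silver bullet. It only pays off under certain conditions: the element being searched for must be a direct child of the node where the search starts. That makes it a good fit for programs that store and look up resources, configuration entries, and similar data in elements that share the same structure but carry different names.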
