Lucene 2.3 Performance Improvements

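Lucene 2.3 was released recently. The biggest change is a new indexing algorithm built on a new in-memory model, which speeds up indexing substantially. Reportedly, simply replacing the Lucene 2.2 jar with the Lucene 2.3 jar gives a 500% speedup in some tests. The Lucene 2.3 changelog is at http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_0/CHANGES.txt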

In my view, the most important changes are:

  1. Object reuse (pooling). Instances of Document, Field, and Token can be reused.
    From CHANGES.txt:
    LUCENE-969: Add new APIs to Token, TokenStream and Analyzer to
    permit re-using of Token and TokenStream instances during
    indexing. Changed Token to use a char[] as the store for the
    termText instead of String. This gives faster tokenization
    performance (~10-15%). (Mike McCandless)
     
  2. Reopening an IndexReader. reopen() loads only the index segments that have changed.
    From CHANGES.txt:
    LUCENE-743: Add IndexReader.reopen() method that re-opens an
    existing IndexReader by only loading those portions of an index
    that have changed since the reader was (re)opened. reopen() can
    be significantly faster than open(), depending on the amount of
    index changes. SegmentReader, MultiSegmentReader, MultiReader,
    and ParallelReader implement reopen(). (Michael Busch)
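
The two ideas above can be sketched in plain Java without depending on the Lucene jar. This is only a toy model under stated assumptions, not Lucene's implementation: ReusableToken, ReusingTokenizer, ToyReader and Lucene23Sketch are hypothetical names invented for illustration, and the model assumes that segments are identified by name and never modified in place. The first part shows one token object with a char[] buffer being refilled for every term; the second part shows a reopen that reloads only segments the previous reader did not already hold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a token that keeps its text in a reusable char[]. */
final class ReusableToken {
    private char[] buffer = new char[16];
    private int length;

    /** Copies text[start, end) into the existing buffer, growing it only when needed. */
    void setTerm(String text, int start, int end) {
        int needed = end - start;
        if (needed > buffer.length) {
            buffer = new char[Math.max(needed, buffer.length * 2)];
        }
        text.getChars(start, end, buffer, 0);
        length = needed;
    }

    String term() {
        return new String(buffer, 0, length);
    }
}

/** Hypothetical whitespace tokenizer that fills a caller-supplied token. */
final class ReusingTokenizer {
    private final String text;
    private int pos;

    ReusingTokenizer(String text) {
        this.text = text;
    }

    /** Returns false at end of input; otherwise overwrites the token with the next term. */
    boolean next(ReusableToken token) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= text.length()) return false;
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
        token.setTerm(text, start, pos);
        return true;
    }
}

/** Hypothetical reader holding one loaded entry per segment name. */
final class ToyReader {
    private final Map<String, String> loaded = new LinkedHashMap<>();
    int freshLoads; // segments this reader had to load itself

    static ToyReader open(List<String> segmentNames) {
        return build(segmentNames, null);
    }

    /** Returns this reader if nothing changed, else a new reader sharing unchanged segments. */
    ToyReader reopen(List<String> currentSegmentNames) {
        if (new ArrayList<>(loaded.keySet()).equals(currentSegmentNames)) return this;
        return build(currentSegmentNames, this);
    }

    private static ToyReader build(List<String> names, ToyReader previous) {
        ToyReader reader = new ToyReader();
        for (String name : names) {
            String data = (previous == null) ? null : previous.loaded.get(name);
            if (data == null) {
                data = "data of " + name; // stands in for expensive disk I/O
                reader.freshLoads++;
            }
            reader.loaded.put(name, data);
        }
        return reader;
    }
}

final class Lucene23Sketch {
    /** Tokenizes with a single token instance for the whole input. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        ReusableToken token = new ReusableToken();
        ReusingTokenizer tokenizer = new ReusingTokenizer(text);
        while (tokenizer.next(token)) terms.add(token.term());
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("faster indexing in Lucene"));
        ToyReader first = ToyReader.open(Arrays.asList("_0", "_1"));
        ToyReader second = first.reopen(Arrays.asList("_0", "_1", "_2"));
        System.out.println("segments loaded on reopen: " + second.freshLoads);
    }
}
```

In Lucene itself, reopen() takes no arguments and reads the current state of the index on its own; the toy passes the segment list in explicitly only to stay self-contained. As in the toy, the real reopen() returns the same instance when the index has not changed, so callers typically compare the result with the old reader and close the old one only when a new instance comes back.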
     

I am still exploring the other changes.
