Learning Lucene NRT (Near-Real-Time) Indexing

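Newer versions of Lucene support near-real-time (NRT) search. In real applications it is common to add, update, or delete documents through an IndexWriter and then run a query immediately afterwards. In earlier versions, an IndexSearcher could only see those changes after a commit had written the index to disk, and that older approach is comparatively slow.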

 

Sample code:

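import junit.framework.TestCase;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class LuceneNrtTest extends TestCase {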

	public static Analyzer analyzer;
	static {
		analyzer = new StandardAnalyzer(Version.LUCENE_31);

	}

	
	public void testNearRealTime() throws Exception {
		Directory dir = new RAMDirectory();
	
		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_31,
				analyzer);
		iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
		IndexWriter writer = new IndexWriter(dir, iwc);
		for (int i = 0; i < 10; i++) {
			Document doc = new Document();
			doc.add(new Field("id", "" + i, Field.Store.NO,
					Field.Index.NOT_ANALYZED_NO_NORMS));
			doc.add(new Field("text", "aaa", Field.Store.NO,
					Field.Index.ANALYZED));
			writer.addDocument(doc);
		}

		// Check that the documents just added are searchable without a commit
		IndexReader reader = IndexReader.open(writer, false);
		IndexSearcher searcher = new IndexSearcher(reader);
		Query query = new TermQuery(new Term("text", "aaa"));
		TopDocs docs = searcher.search(query, 1);
		assertEquals(10, docs.totalHits);

		// Check that a single document can be deleted
		writer.deleteDocuments(new Term("id", "7"));
		
		// Add one more document
		Document doc = new Document();
		doc.add(new Field("id", "11", Field.Store.NO,
				Field.Index.NOT_ANALYZED_NO_NORMS));
		doc.add(new Field("text", "bbb", Field.Store.NO, Field.Index.ANALYZED));
		writer.addDocument(doc);
		
		
		// Open a new NRT reader from the writer; 'true' applies the pending delete
		IndexReader newReader = IndexReader.open(writer, true);
		assertFalse(reader == newReader);
		reader.close();
		searcher = new IndexSearcher(newReader);
		TopDocs hits = searcher.search(query, 10);
		assertEquals(9, hits.totalHits);

		query = new TermQuery(new Term("text", "bbb"));
		hits = searcher.search(query, 1);
		assertEquals(1, hits.totalHits);

		newReader.close();
		writer.close();
	}
}

The code depends on the following Lucene version:

<dependency>
	<groupId>org.apache.lucene</groupId>
	<artifactId>lucene-core</artifactId>
	<version>3.5.0</version>
</dependency>

 

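One caveat: the indexing described here is near-real-time, not truly real-time. A reader is a point-in-time view of the index, so a search issued right after an update will not show the change until a new reader has been obtained from the writer, as the test does with IndexReader.open(writer, true). Opening that new reader takes a short while (perhaps ten-odd milliseconds, as a rough guess). This differs from how a database behaves, but overall it is very convenient to use.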
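
To make the point-in-time behaviour concrete, here is a minimal sketch in plain Java that does not use Lucene at all. ToyWriter and ToyReader are hypothetical stand-ins invented for this illustration. They only model the idea that a reader keeps seeing the index as it was when the reader was opened, which is an assumption about how to picture NRT and not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes exist in Lucene.
class NrtSnapshotDemo {

    /** Hypothetical stand-in for IndexWriter: holds the live id -> text documents. */
    static class ToyWriter {
        private final Map<String, String> docs = new LinkedHashMap<>();

        void addDocument(String id, String text) {
            docs.put(id, text);
        }

        void deleteDocument(String id) {
            docs.remove(id);
        }

        /** Plays the role of opening a reader from the writer: copies the current state. */
        ToyReader openReader() {
            return new ToyReader(new ArrayList<>(docs.values()));
        }
    }

    /** Hypothetical stand-in for IndexReader: a frozen view taken at open time. */
    static class ToyReader {
        private final List<String> snapshot;

        ToyReader(List<String> snapshot) {
            this.snapshot = Collections.unmodifiableList(snapshot);
        }

        /** Counts documents whose text equals the given term. */
        int count(String text) {
            return Collections.frequency(snapshot, text);
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        for (int i = 0; i < 10; i++) {
            writer.addDocument("" + i, "aaa");
        }
        ToyReader oldReader = writer.openReader();

        // Changes made after the reader was opened
        writer.deleteDocument("7");
        writer.addDocument("11", "bbb");

        // The old reader still shows the earlier state
        System.out.println(oldReader.count("aaa")); // prints 10
        System.out.println(oldReader.count("bbb")); // prints 0

        // A freshly opened reader shows the delete and the new document
        ToyReader newReader = writer.openReader();
        System.out.println(newReader.count("aaa")); // prints 9
        System.out.println(newReader.count("bbb")); // prints 1
    }
}
```

The numbers mirror the test above: ten "aaa" documents, one deleted, one "bbb" added. In the toy model the copy in openReader() is where the "near" in near-real-time comes from; in Lucene the corresponding cost is that of opening a new reader from the writer.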

 
