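When indexing with Lucene, a field can be stored (Store.YES) or not stored (Store.NO). Normally a search returns a TopDocs object; you take the ScoreDoc entries from it, load each Document, and read the stored values from that Document. Some sort fields, however, are not stored. For example, the field is added to the Document like this: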
doc.add(new Field("time", "2001", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
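With that, the Document you get back has no value for the time field. Is there really no way to get at it?
Think about how Lucene sorts, though, and it is clear that the time field must be kept in the index in at least one form. Store.NO only skips the stored copy; the value is still indexed as a term. Otherwise, how could one document's time be compared with another's?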
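To make that reasoning concrete, here is a small plain-Java sketch of the idea. It is a toy model, not Lucene code: the class UninvertSketch, the method uninvert, and the sample postings are all made up for illustration. It assumes each document has at most one term in the field, and it turns a term-to-documents mapping (what the index keeps for an indexed field) into a per-document array that a sort could compare. As far as I understand, Lucene's field cache does something similar in spirit for sort fields.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: shows why an indexed, non-stored field can still drive sorting. */
final class UninvertSketch {

    /**
     * Turns a term -> docIDs mapping (the inverted view) into a docID -> value array.
     * Assumes each document has at most one term in the field (hypothetical simplification).
     */
    static int[] uninvert(Map<String, int[]> postings, int maxDoc) {
        int[] valueByDoc = new int[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            int value = Integer.parseInt(entry.getKey());
            for (int docId : entry.getValue()) {
                valueByDoc[docId] = value;
            }
        }
        return valueByDoc;
    }

    public static void main(String[] args) {
        // Hypothetical postings for a "time" field: term -> documents containing it.
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("2001", new int[] {2, 3});
        postings.put("2010", new int[] {0});
        postings.put("2011", new int[] {1});
        postings.put("2171", new int[] {4});

        int[] timeByDoc = uninvert(postings, 5);
        // prints [2010, 2011, 2001, 2001, 2171]
        System.out.println(Arrays.toString(timeByDoc));
    }
}
```

Once such an array exists, comparing two documents is just comparing timeByDoc[a] with timeByDoc[b], and the same array is where a per-hit sort value can come from.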
The code below, written against the older Lucene 2.x API, solves the problem:
package ramindex;

import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

public class SortFieldValueTest {

    private static RAMDirectory ramDirectory = new RAMDirectory();

    public static void buildIndex() throws CorruptIndexException, LockObtainFailedException, IOException {
        IndexWriter writer = new IndexWriter(ramDirectory, new StandardAnalyzer(), true);

        // id is stored but not indexed; text, time and tide are indexed but not stored.
        Document doc = new Document();
        doc.add(new Field("id", "1", Store.YES, Index.NO));
        doc.add(new Field("text", "lucene", Store.NO, Index.ANALYZED));
        doc.add(new Field("time", "2010", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        doc.add(new Field("tide", "149", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("id", "3", Store.YES, Index.NO));
        doc.add(new Field("text", "lucene", Store.NO, Index.ANALYZED));
        doc.add(new Field("time", "2011", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        doc.add(new Field("tide", "14", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("id", "2", Store.YES, Index.NO));
        doc.add(new Field("text", "lucene", Store.NO, Index.ANALYZED));
        doc.add(new Field("time", "2001", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        doc.add(new Field("tide", "13", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("id", "5", Store.YES, Index.NO));
        doc.add(new Field("text", "lucene", Store.NO, Index.ANALYZED));
        doc.add(new Field("time", "2001", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        doc.add(new Field("tide", "19", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("id", "9", Store.YES, Index.NO));
        doc.add(new Field("text", "lucene", Store.NO, Index.ANALYZED));
        doc.add(new Field("time", "2171", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        doc.add(new Field("tide", "19", Store.NO, Index.NOT_ANALYZED_NO_NORMS));
        writer.addDocument(doc);

        writer.commit();
        writer.close();
    }

    public static void searchWithOneSortField() throws CorruptIndexException, IOException {
        IndexSearcher searcher = new IndexSearcher(ramDirectory);
        TermQuery termQuery = new TermQuery(new Term("text", "lucene"));
        // One sort criterion: time, reversed
        TopFieldDocs topFieldDocs = searcher.search(termQuery, null, 10,
                new Sort(new SortField("time", true)));
        ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;
        for (ScoreDoc doc : scoreDocs) {
            // Each hit of a sorted search is a FieldDoc carrying its sort values
            FieldDoc fieldDoc = (FieldDoc) doc;
            System.out.println(searcher.doc(doc.doc).get("id") + " " + Arrays.toString(fieldDoc.fields));
        }
    }

    public static void searchWithTwoSortField() throws CorruptIndexException, IOException {
        IndexSearcher searcher = new IndexSearcher(ramDirectory);
        TermQuery termQuery = new TermQuery(new Term("text", "lucene"));
        // Two sort criteria: time first, then tide, both reversed
        TopFieldDocs topFieldDocs = searcher.search(termQuery, null, 10,
                new Sort(new SortField[] { new SortField("time", true), new SortField("tide", true) }));
        ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;
        for (ScoreDoc doc : scoreDocs) {
            FieldDoc fieldDoc = (FieldDoc) doc;
            System.out.println(searcher.doc(doc.doc).get("id") + " " + Arrays.toString(fieldDoc.fields));
        }
    }

    public static void main(String[] args) throws CorruptIndexException, IOException {
        buildIndex();
        searchWithOneSortField();
        System.out.println("--------");
        searchWithTwoSortField();
    }
}
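First get a TopFieldDocs and take the ScoreDoc entries from it; at this point each ScoreDoc is actually an instance of FieldDoc. Cast the ScoreDoc to FieldDoc and read its fields member. It is an array holding the values of all the fields you passed in when requesting the sort, in the same order you passed them.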
In the code above, searchWithOneSortField and searchWithTwoSortField show how this works with one sort criterion and with two, respectively.
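If you just want to see the shape of the result without a Lucene setup, the following plain-Java sketch models it. It is not Lucene code and its output is not Lucene output: SortValuesSketch, Hit, and sortByTimeThenTide are hypothetical names, and the values are assumed to be integers. Each Hit carries a fields array whose positions follow the order of the sort criteria (time first, then tide), which is the property described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of hits that carry their sort values; not Lucene code. */
final class SortValuesSketch {

    /** Hypothetical stand-in for a hit: a stored id plus one value per sort criterion. */
    static final class Hit {
        final String id;
        final Object[] fields; // fields[0] = time, fields[1] = tide (sort-criterion order)

        Hit(String id, int time, int tide) {
            this.id = id;
            this.fields = new Object[] {time, tide};
        }
    }

    /** Returns a copy sorted by time descending, then by tide descending. */
    static List<Hit> sortByTimeThenTide(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        Comparator<Hit> byTime = Comparator.comparing((Hit h) -> (Integer) h.fields[0]);
        Comparator<Hit> byTide = Comparator.comparing((Hit h) -> (Integer) h.fields[1]);
        sorted.sort(byTime.reversed().thenComparing(byTide.reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        // Same id/time/tide values as the five documents indexed above.
        List<Hit> hits = Arrays.asList(
                new Hit("1", 2010, 149),
                new Hit("3", 2011, 14),
                new Hit("2", 2001, 13),
                new Hit("5", 2001, 19),
                new Hit("9", 2171, 19));

        for (Hit hit : sortByTimeThenTide(hits)) {
            // Read the sort values by position, in the order the criteria were given
            System.out.println(hit.id + " " + Arrays.toString(hit.fields));
        }
    }
}
```

Run as is, this model prints 9 [2171, 19], 3 [2011, 14], 1 [2010, 149], 5 [2001, 19], 2 [2001, 13], one hit per line. The two documents with time 2001 are separated only by the second value, which is why the order of entries in fields matters.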