A recent requirement called for grouped queries in Lucene. The existing API for this is GroupingSearch, and my code looked like this:
    GroupingSearch groupingSearch = new GroupingSearch("compId");
    groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
    groupingSearch.setFillSortFields(true);
    // groupingSearch.setCachingInMB(8.0, true);
    groupingSearch.setAllGroups(true);
    // groupingSearch.setAllGroupHeads(true);
    groupingSearch.setGroupDocsLimit(10);

    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    config.setOpenMode(OpenMode.CREATE_OR_APPEND);
    LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
    Directory directory = NIOFSDirectory.open(new File("/Users/lvyanglin/searchdata/search/offlineindex").toPath()); // newly created files go in a subdirectory
    mergePolicy.setMergeFactor(5);
    config.setMergePolicy(mergePolicy);
    config.setSimilarity(new ClassicSimilarity());
    IndexWriter indexWriter = new IndexWriter(directory, config);

    IndexReader reader = DirectoryReader.open(indexWriter);
    IndexSearcher isearcher = new IndexSearcher(reader);
    Query query = new TermQuery(new Term("id", "20755185"));
    TopGroups<BytesRef> result = groupingSearch.search(isearcher, query, 0, 1000);

    System.out.println("Total hits: " + result.totalHitCount);
    System.out.println("Number of groups: " + result.groups.length);
    for (GroupDocs<BytesRef> groupDocs : result.groups) {
        System.out.println("Group: " + groupDocs.groupValue.utf8ToString());
        System.out.println("Docs in group: " + groupDocs.totalHits);
        // System.out.println("groupDocs.scoreDocs.length:" + groupDocs.scoreDocs.length);
        for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
            System.out.println("compId=" + isearcher.doc(scoreDoc.doc).get("compId"));
        }
    }
But no matter how I adjusted it, the following always appeared:
    Exception in thread "main" java.lang.IllegalStateException: unexpected docvalues type NONE for field 'compId' (expected=SORTED). Re-index with correct docvalues type.
        at org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
        at org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)
        at org.apache.lucene.search.grouping.term.TermFirstPassGroupingCollector.doSetNextReader(TermFirstPassGroupingCollector.java:91)
        at org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)
        at org.apache.lucene.search.MultiCollector.getLeafCollector(MultiCollector.java:121)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)
        at org.apache.lucene.search.grouping.GroupingSearch.groupByFieldOrFunction(GroupingSearch.java:193)
        at org.apache.lucene.search.grouping.GroupingSearch.search(GroupingSearch.java:129)
        at com.shunteng.service.test.v3.GroupSearchTest2.main(GroupSearchTest2.java:79)
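The message says that the field used for grouping must be indexed as a SortedDocValuesField. Because compId is a long, I indexed it with SortedNumericDocValuesField instead, but the error above still occurred after re-running. Baidu searches turned up essentially nothing about the cause, the official site offers little explanation, and the official wiki has no example either. The only option left was to debug the code and see at which step the exception arises. The method that throws it directly is checkField in DocValues.java: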
    private static void checkField(LeafReader in, String field, DocValuesType... expected) {
        FieldInfo fi = in.getFieldInfos().fieldInfo(field);
        if (fi != null) {
            DocValuesType actual = fi.getDocValuesType();
            throw new IllegalStateException("unexpected docvalues type " + actual +
                " for field '" + field + "' " +
                (expected.length == 1
                    ? "(expected=" + expected[0]
                    : "(expected one of " + Arrays.toString(expected)) + "). " +
                "Re-index with correct docvalues type.");
        }
    }
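The method below is the one that calls checkField. Once execution enters checkField for a field that exists in the index (fi != null), the exception is always thrown, so the real question is why execution gets there at all: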
    public static SortedDocValues getSorted(LeafReader reader, String field) throws IOException {
        SortedDocValues dv = reader.getSortedDocValues(field);
        if (dv == null) {
            checkField(reader, field, DocValuesType.SORTED);
            return emptySorted();
        } else {
            return dv;
        }
    }
The getSorted method above holds the key condition: checkField is called only when reader.getSortedDocValues(field) returns null. Stepping into that method gives:
    @Override
    public final SortedDocValues getSortedDocValues(String field) throws IOException {
        ensureOpen();
        Map<String,Object> dvFields = docValuesLocal.get();

        Object previous = dvFields.get(field);
        if (previous != null && previous instanceof SortedDocValues) {
            return (SortedDocValues) previous;
        } else {
            FieldInfo fi = getDVField(field, DocValuesType.SORTED);
            if (fi == null) {
                return null;
            }
            SortedDocValues dv = getDocValuesReader().getSorted(fi);
            dvFields.put(field, dv);
            return dv;
        }
    }
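If previous instanceof SortedDocValues holds, or the field was indexed as a SortedDocValuesField so that getDVField(field, DocValuesType.SORTED) finds it, the method never reaches return null. The root cause is therefore that the grouping field must be indexed with the SortedDocValuesField type. SortedNumericDocValuesField cannot be used for it, which is one of Lucene's quirks. For a long value such as compId, one way to satisfy this is to store its string form as the field's BytesRef value.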
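The lookup chain traced above can be condensed into a small stand-alone model. This is only a sketch of the control flow, not Lucene code: the class DocValuesLookupModel, its addField method, and the string return values are all hypothetical stand-ins, and the enum merely mirrors the names of three Lucene DocValuesType constants.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the per-field doc-values lookup; hypothetical, not Lucene code. */
class DocValuesLookupModel {
    /** Mirrors the names of a few Lucene DocValuesType constants. */
    enum DocValuesType { NONE, SORTED, SORTED_NUMERIC }

    // Field name -> doc-values type recorded when the field was indexed.
    private final Map<String, DocValuesType> fieldInfos = new HashMap<>();

    /** Hypothetical helper: registers (or re-registers) a field with a type. */
    void addField(String field, DocValuesType type) {
        fieldInfos.put(field, type);
    }

    /** Models getSortedDocValues: null unless the field was indexed as SORTED. */
    String getSortedDocValues(String field) {
        DocValuesType type = fieldInfos.get(field);
        if (type != DocValuesType.SORTED) {
            return null;
        }
        return "sorted-values(" + field + ")";
    }

    /** Models getSorted plus checkField: a null lookup on a known field is an error. */
    String getSorted(String field) {
        String dv = getSortedDocValues(field);
        if (dv != null) {
            return dv;
        }
        DocValuesType actual = fieldInfos.get(field);
        if (actual != null) {
            throw new IllegalStateException("unexpected docvalues type " + actual
                + " for field '" + field + "' (expected=SORTED). "
                + "Re-index with correct docvalues type.");
        }
        return "empty"; // unknown field: the model returns an empty placeholder
    }

    public static void main(String[] args) {
        DocValuesLookupModel segment = new DocValuesLookupModel();

        // Indexed with a numeric doc-values type: the SORTED lookup fails.
        segment.addField("compId", DocValuesType.SORTED_NUMERIC);
        try {
            segment.getSorted("compId");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // "Re-indexed" with the expected type: the lookup succeeds.
        segment.addField("compId", DocValuesType.SORTED);
        System.out.println(segment.getSorted("compId"));
    }
}
```

In this model, any recorded type other than SORTED makes the lookup return null, and a null result for a known field is turned into the exception. That matches the order of checks seen in the debugger: the lookup returns null first, and only then does checkField raise the error.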
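I wrote this post to sum up what I learned from tracking down this problem. I ran into the same situation, with no documentation to be found anywhere, when using Alibaba's dubbo. In cases like this, stepping through the code with a debugger to find the root cause is more reliable than searching Baidu, and may well be faster.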