lucene的排序和缓存的应用

Lucene的排序是通过FieldComparator及其子类实现的,以StringOrdValComparator作为例子详细说明lucene的排序的基于缓存FieldCache实现。


思路:用一个数组保存某个filed字段对应的所有的document的最大的一个term。这个数组的index就是docId,值对应所有这个filed所有term的数组的index

StringOrdValComparator 类里面的
    private String[] lookup; 值为某个filed的所有的term的值

private int[] order;  index为docId,值为lookup的index,表示某个document的某个field的最大的term的在lookup中的index。

排序的时候根据docId查询出order[docId]值的大小比较。



初始化
IndexSearcher# public void search(Weight weight, Filter filter, Collector collector) 方法中调用的TopFieldCollector#OneComparatorNonScoringCollector.setNextReader
调用FieldComparator#StringOrdValComparator. setNextReadercollector.setNextReader(subReaders[i], docStarts[i]);
方法中调用
FieldCache.DEFAULT.getStringIndex(reader, field)
初始化filed对应的所有的term信息和frg信息。String就从””开始读取直到读完,


算法是在FieldCacheImpl#StringIndexCache.createValue实现的
代码如下结果就是某个document的对应的某个term的最大值
@Override
    protected Object createValue(IndexReader reader, Entry entryKey)
        throws IOException {
      String field = StringHelper.intern(entryKey.field);
      final int[] retArray = new int[reader.maxDoc()];
      String[] mterms = new String[reader.maxDoc()+1];
      TermDocs termDocs = reader.termDocs();
      TermEnum termEnum = reader.terms (new Term (field));
      int t = 0;  // current term number

      // an entry for documents that have no terms in this field
      // should a document with no terms be at top or bottom?
      // this puts them at the top - if it is changed, FieldDocSortedHitQueue
      // needs to change as well.
      mterms[t++] = null;

      try {
        do {
          Term term = termEnum.term();
          if (term==null || term.field() != field) break;

          // store term text
          // we expect that there is at most one term per document
          if (t >= mterms.length) throw new RuntimeException ("there are more terms than " +
                  "documents in field \"" + field + "\", but it's impossible to sort on " +
                  "tokenized fields");
          mterms[t] = term.text();

          termDocs.seek (termEnum);
          while (termDocs.next()) {
            retArray[termDocs.doc()] = t;
          }

          t++;
        } while (termEnum.next());
      } finally {
        termDocs.close();
        termEnum.close();
      }

      if (t == 0) {
        // if there are no terms, make the term array
        // have a single null entry
        mterms = new String[1];
      } else if (t < mterms.length) {
        // if there are less terms than documents,
        // trim off the dead array space
        String[] terms = new String[t];
        System.arraycopy (mterms, 0, terms, 0, t);
        mterms = terms;
      }

      StringIndex value = new StringIndex (retArray, mterms);
      return value;
    }
  };


排序的实现
Lucene的查询结果是放在优先队列里面的,优先对象是通过org.apache.lucene.search.FieldComparator.StringOrdValComparator进行比较的。比较的StringOrdValComparator. Compare方法

  @Override
    public int compare(int slot1, int slot2) {
      if (readerGen[slot1] == readerGen[slot2]) {
        int cmp = ords[slot1] - ords[slot2];
        if (cmp != 0) {
          return cmp;
        }
      }

      final String val1 = values[slot1];
      final String val2 = values[slot2];
      if (val1 == null) {
        if (val2 == null) {
          return 0;
        }
        return -1;
      } else if (val2 == null) {
        return 1;
      }
      return val1.compareTo(val2);
}

如果优先队列满了,则先和最低层的比较,如果大于最底层的则先替代最底层的,然后对优先队列重新排序 

TopFieldCollector# OneComparatorNonScoringCollector. Collect的红色标识
@Override
    public void collect(int doc) throws IOException {
      ++totalHits;
     if (queueFull) {
        if ((reverseMul * comparator.compareBottom(doc)) <= 0) {
          // since docs are visited in doc Id order, if compare is 0, it means
          // this document is largest than anything else in the queue, and
          // therefore not competitive.
          return;
        }
       
        // This hit is competitive - replace bottom element in queue & adjustTop
        comparator.copy(bottom.slot, doc);
        updateBottom(doc);
        comparator.setBottom(bottom.slot);
      } else {
        // Startup transient: queue hasn't gathered numHits yet
        final int slot = totalHits - 1;
        // Copy hit into queue
        comparator.copy(slot, doc);
        add(slot, doc, Float.NaN);
        if (queueFull) {
          comparator.setBottom(bottom.slot);
        }
      }
    }

你可能感兴趣的:(apache,算法,Lucene)