Custom sorting

Use case: dynamically computing, at search time, which of the indexed restaurants are nearest to or farthest from a given location.

IndexSearcher.java

  /** Expert: Low-level search implementation with arbitrary sorting.  Finds
   * the top <code>n</code> hits for <code>query</code>, applying
   * <code>filter</code> if non-null, and sorting the hits by the criteria in
   * <code>sort</code>.
   *
   * <p>Applications should usually call {@link
   * Searcher#search(Query,Filter,int,Sort)} instead.
   *
   * @throws BooleanQuery.TooManyClauses
   */
  @Override
  public TopFieldDocs search(Weight weight, Filter filter,
      final int nDocs, Sort sort) throws IOException {
    return search(weight, filter, nDocs, sort, true);
  }

SortField.java

  /** Creates a sort with a custom comparison function.
   * @param field Name of field to sort by; cannot be <code>null</code>.
   * @param comparator Returns a comparator for sorting hits.
   */
  public SortField(String field, FieldComparatorSource comparator) {
    initFieldType(field, CUSTOM);
    this.comparatorSource = comparator;
  }

FieldComparatorSource.java

  /**
   * Provides a {@link FieldComparator} for custom field sorting.
   *
   * @lucene.experimental
   *
   */
  public abstract class FieldComparatorSource implements Serializable {

    /**
     * Creates a comparator for the field in the given index.
     *
     * @param fieldname
     *          Name of the field to create comparator for.
     * @return FieldComparator.
     * @throws IOException
     *           If an error occurs reading the index.
     */
    public abstract FieldComparator<?> newComparator(String fieldname, int numHits,
        int sortPos, boolean reversed) throws IOException;
  }

Further computation or processing of search results

Collector.java

   * <p><b>NOTE:</b> The doc that is passed to the collect
   * method is relative to the current reader. If your
   * collector needs to resolve this to the docID space of the
   * Multi*Reader, you must re-base it by recording the
   * docBase from the most recent setNextReader call.
   * Here's a simple example showing how to collect docIDs into a
   * BitSet:</p>
   *
   * <pre>
   * Searcher searcher = new IndexSearcher(indexReader);
   * final BitSet bits = new BitSet(indexReader.maxDoc());
   * searcher.search(query, new Collector() {
   *   private int docBase;
   *
   *   <em>// ignore scorer</em>
   *   public void setScorer(Scorer scorer) {
   *   }
   *
   *   <em>// accept docs out of order (for a BitSet it doesn't matter)</em>
   *   public boolean acceptsDocsOutOfOrder() {
   *     return true;
   *   }
   *
   *   public void collect(int doc) {
   *     bits.set(doc + docBase);
   *   }
   *
   *   public void setNextReader(IndexReader reader, int docBase) {
   *     this.docBase = docBase;
   *   }
   * });
   * </pre>

Extending QueryParser

1. Disabling fuzzy and wildcard queries

  /**
   * Builds a new FuzzyQuery instance
   * @param term Term
   * @param minimumSimilarity minimum similarity
   * @param prefixLength prefix length
   * @return new FuzzyQuery Instance
   */
  protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
    // FuzzyQuery doesn't yet allow constant score rewrite
    return new FuzzyQuery(term,minimumSimilarity,prefixLength); // to disable fuzzy queries, remove this line and throw an exception instead
  }

Custom filters

Some attributes of a search result change frequently, so a document may need to be filtered out during one period and not during another. Adding such an attribute to the index as a field may make the results inaccurate. A better approach is to define a filter that applies specific rules to the search. Take best-selling books: a book may be a best seller for a while and then stop being one, so indexing a "best seller" flag as a field is a poor fit. In that case a custom Filter can compute, for each doc, whether it should be filtered out.

  /**
   *  Abstract base class for restricting which documents may
   *  be returned during searching.
   */
  public abstract class Filter implements java.io.Serializable {

    /**
     * Creates a {@link DocIdSet} enumerating the documents that should be
     * permitted in search results. <b>NOTE:</b> null can be
     * returned if no documents are accepted by this Filter.
     * <p>
     * Note: This method will be called once per segment in
     * the index during searching.  The returned {@link DocIdSet}
     * must refer to document IDs for that segment, not for
     * the top-level reader.
     *
     * @param reader a {@link IndexReader} instance opened on the index currently
     *         searched on. Note, it is likely that the provided reader does not
     *         represent the whole underlying index i.e.
     *         if the index has more than
     *         one segment the given reader only represents a single segment.
     *
     * @return a DocIdSet that provides the documents which should be permitted or
     *         prohibited in search results. <b>NOTE:</b> null can be returned if
     *         no documents will be accepted by this Filter.
     *
     * @see DocIdBitSet
     */
    public abstract DocIdSet getDocIdSet(IndexReader reader) throws IOException;
  }

A DocIdSet behaves like a bit set in which each bit position corresponds to a docid. If a bit is set to 1, that document can appear in the search results; otherwise it cannot.

filterQuery.java runs the query with the filter applied: the filter and the query are combined into the final query expression that is executed.

Performance issues:

1. Lucene internally rewrites a RangeQuery into a BooleanQuery made of OR clauses. If the range expands to more than 1024 clauses, a TooManyClauses exception is thrown.

  /** Thrown when an attempt is made to add more than {@link
   * #getMaxClauseCount()} clauses. This typically happens if
   * a PrefixQuery, FuzzyQuery, WildcardQuery, or TermRangeQuery
   * is expanded to many terms during search.
   */
  public static class TooManyClauses extends RuntimeException {
    public TooManyClauses() {
      super("maxClauseCount is set to " + maxClauseCount);
    }
  }
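
To make the clause limit concrete, here is a minimal stand-alone sketch that models the behaviour without depending on Lucene. The class ClauseLimitSketch, its nested TooManyClauses, and the expandRange method are hypothetical names invented for illustration; the value 1024 is assumed to match the default clause limit of BooleanQuery in the Lucene version quoted here. Real term expansion depends on which terms actually exist in the index, so treat this only as a rough picture of why a wide range fails.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the clause limit. This is NOT Lucene code.
class ClauseLimitSketch {
    // Assumed to mirror the default limit reported by BooleanQuery.getMaxClauseCount().
    static final int MAX_CLAUSE_COUNT = 1024;

    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClauses extends RuntimeException {
        TooManyClauses() {
            super("maxClauseCount is set to " + MAX_CLAUSE_COUNT);
        }
    }

    // Expands the inclusive range [lower, upper] into one OR clause per value,
    // roughly what a rewrite to a BooleanQuery does for each matching term.
    static List<String> expandRange(String field, int lower, int upper) {
        List<String> clauses = new ArrayList<>();
        for (int value = lower; value <= upper; value++) {
            // Refuse to add a clause once the limit has been reached.
            if (clauses.size() >= MAX_CLAUSE_COUNT) {
                throw new TooManyClauses();
            }
            clauses.add(field + ":" + value);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // A narrow range expands without trouble.
        System.out.println(expandRange("price", 1, 3));
        // A range wider than the limit fails during expansion.
        try {
            expandRange("price", 1, 2000);
        } catch (TooManyClauses e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model a range of exactly 1024 values still succeeds and the 1025th clause triggers the exception. In Lucene itself the usual ways out are to raise the limit with BooleanQuery.setMaxClauseCount or to express the range as a filter rather than as an expanded query.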