In Lucene 2, range filtering of results is implemented by RangeFilter. The main method that decides whether a term is "larger" or "smaller" is:
public BitSet bits(IndexReader reader) throws IOException
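The previous post, "[Lucene notes] Writing a numeric filter by extending RangeFilter", extended RangeFilter and overrode the bits method to implement range comparison for numeric data, which gives direct support for numeric values. (The more common approach today is to convert numeric data to strings at indexing time and left-pad them with zeros to a uniform length, so that Lucene's string comparison can be used for numeric range filtering. It works, but it is something of a trick.)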
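For readers who have not seen the zero-padding trick, here is a minimal sketch of the idea in plain Java. The class name `ZeroPadDemo`, the helper `pad`, and the fixed width of 6 are assumptions made for illustration; they are not part of Lucene or of the original post.

```java
class ZeroPadDemo {
    /**
     * Left-pads a non-negative integer with zeros to a fixed width.
     * Negative numbers and decimals would need a different encoding.
     */
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled by this sketch");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value has more than " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "1111" because '9' > '1'.
        System.out.println("9".compareTo("1111") > 0);               // prints "true"
        // Padded to the same width, string order agrees with numeric order.
        System.out.println(pad(9, 6).compareTo(pad(1111, 6)) < 0);   // prints "true"
    }
}
```

The cost of this approach is that the width must be chosen up front and applied consistently at both indexing and query time.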
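In Lucene 3, RangeFilter is no longer supported. Range search has moved to the classes TermRangeFilter, TermRangeQuery and TermRangeTermEnum, so once a system is upgraded, the Lucene 2 approach stops working.
Lucene 3 still compares range bounds as strings. Fundamentally, supporting numeric values only requires rewriting the part that compares strings, but in Lucene 3 the way to do that is different.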
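To make the problem with string comparison concrete, the following sketch contrasts a string-ordered range check with a numeric one. The class and method names are hypothetical, and the code uses only the JDK, not Lucene.

```java
class LexicographicRangeDemo {
    /** Inclusive range check using plain String ordering. */
    static boolean inStringRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    /** Inclusive range check after parsing the value as a double. */
    static boolean inNumericRange(String value, double lower, double upper) {
        double v = Double.parseDouble(value);
        return v >= lower && v <= upper;
    }

    public static void main(String[] args) {
        // "15" falls between "1000" and "2000" as a string, although 15 is not in [1000, 2000].
        System.out.println(inStringRange("15", "1000", "2000"));   // prints "true"
        System.out.println(inNumericRange("15", 1000, 2000));      // prints "false"
        // "1111" is inside the range under both orderings.
        System.out.println(inStringRange("1111", "1000", "2000")); // prints "true"
        System.out.println(inNumericRange("1111", 1000, 2000));    // prints "true"
    }
}
```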
In Lucene 3, the range filter TermRangeFilter delegates its work to TermRangeQuery:
public TermRangeFilter(String fieldName, String lowerTerm, String upperTerm,
                       boolean includeLower, boolean includeUpper) {
    super(new TermRangeQuery(fieldName, lowerTerm, upperTerm, includeLower, includeUpper));
}
TermRangeQuery in turn relies on TermRangeTermEnum to decide whether a term falls inside the range:
@Override
protected FilteredTermEnum getEnum(IndexReader reader) throws IOException {
    return new TermRangeTermEnum(reader, field, lowerTerm, upperTerm,
                                 includeLower, includeUpper, collator);
}
TermRangeTermEnum is where the string comparison is actually implemented:
@Override
protected boolean termCompare(Term term) {
    if (collator == null) {
        // Use Unicode code point ordering
        boolean checkLower = false;
        if (!includeLower) // make adjustments to set to exclusive
            checkLower = true;
        if (term != null && term.field() == field) { // interned comparison
            if (!checkLower || null==lowerTermText || term.text().compareTo(lowerTermText) > 0) {
                checkLower = false;
                if (upperTermText != null) {
                    int compare = upperTermText.compareTo(term.text());
                    /*
                     * if beyond the upper term, or is exclusive and this is equal to
                     * the upper term, break out
                     */
                    if ((compare < 0) || (!includeUpper && compare==0)) {
                        endEnum = true;
                        return false;
                    }
                }
                return true;
            }
        } else {
            // break
            endEnum = true;
            return false;
        }
        return false;
    } else {
        if (term != null && term.field() == field) { // interned comparison
            if ((lowerTermText == null
                    || (includeLower
                        ? collator.compare(term.text(), lowerTermText) >= 0
                        : collator.compare(term.text(), lowerTermText) > 0))
                && (upperTermText == null
                    || (includeUpper
                        ? collator.compare(term.text(), upperTermText) <= 0
                        : collator.compare(term.text(), upperTermText) < 0))) {
                return true;
            }
            return false;
        }
        endEnum = true;
        return false;
    }
}
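The two branches of termCompare differ only in how they order strings: one uses String.compareTo, the other a java.text.Collator. As a small illustration of how these orderings can disagree, here is a JDK-only sketch. The class name `OrderingDemo`, its helper methods, and the choice of the English locale are assumptions for the example.

```java
import java.text.Collator;
import java.util.Locale;

class OrderingDemo {
    /** Ordering used when no collator is set: String.compareTo (UTF-16 code unit order). */
    static int byCompareTo(String a, String b) {
        return a.compareTo(b);
    }

    /** Locale-aware ordering, as used when a Collator is supplied. */
    static int byCollator(String a, String b) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        return english.compare(a, b);
    }

    public static void main(String[] args) {
        // 'a' (97) comes after 'B' (66) in code unit order ...
        System.out.println(byCompareTo("a", "B") > 0);  // prints "true"
        // ... but an English collator sorts "a" before "B".
        System.out.println(byCollator("a", "B") < 0);   // prints "true"
    }
}
```

Neither ordering is numeric, which is why the rewrite replaces the comparison itself.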
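Leaving aside the design of the source code (there are plenty of critical articles on that already), tracing the call chain this far shows which places the rewrite needs to touch. The rewritten code: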
import java.text.Collator;

import org.apache.lucene.search.MultiTermQueryWrapperFilter;

/**
 * Range filter with support for numeric values.
 * @author quzishen
 * @project NormandyPositionII
 * @class NumTermRangeFilter.java
 * @version 1.0
 * @time 2011-1-6
 */
public class NumTermRangeFilter extends MultiTermQueryWrapperFilter<NumTermRangeQuery> {
    private static final long serialVersionUID = 1L;

    protected NumTermRangeFilter(NumTermRangeQuery query) {
        super(query);
    }

    public NumTermRangeFilter(String fieldName, double lowerTerm, double upperTerm,
                              boolean includeLower, boolean includeUpper) {
        super(new NumTermRangeQuery(fieldName, lowerTerm, upperTerm, includeLower, includeUpper));
    }

    public NumTermRangeFilter(String fieldName, double lowerTerm, double upperTerm,
                              boolean includeLower, boolean includeUpper, Collator collator) {
        super(new NumTermRangeQuery(fieldName, lowerTerm, upperTerm, includeLower, includeUpper, collator));
    }

    /** Range with no lower bound. */
    public static NumTermRangeFilter Less(String fieldName, double upperTerm) {
        // Double.MIN_VALUE is the smallest positive double, so it would exclude
        // zero and all negative values; use negative infinity for an open lower bound.
        return new NumTermRangeFilter(fieldName, Double.NEGATIVE_INFINITY, upperTerm, false, true);
    }

    /** Range with no upper bound. */
    public static NumTermRangeFilter More(String fieldName, double lowerTerm) {
        return new NumTermRangeFilter(fieldName, lowerTerm, Double.MAX_VALUE, true, false);
    }

    /** Returns the field name for this filter */
    public String getField() { return query.getField(); }

    /** Returns the lower value of this range filter */
    public double getLowerTerm() { return query.getLowerTerm(); }

    /** Returns the upper value of this range filter */
    public double getUpperTerm() { return query.getUpperTerm(); }

    /** Returns <code>true</code> if the lower endpoint is inclusive */
    public boolean includesLower() { return query.isIncludeLower(); }

    /** Returns <code>true</code> if the upper endpoint is inclusive */
    public boolean includesUpper() { return query.isIncludeUpper(); }

    /** Returns the collator used to determine range inclusion, if any. */
    public Collator getCollator() { return query.getCollator(); }
}
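One Java detail matters for the open-ended Less and More factories: Double.MIN_VALUE is the smallest positive double, not the most negative one. The sketch below (JDK only, with a hypothetical class `OpenBoundDemo`) shows the effect of using it as a "no lower bound" sentinel.

```java
class OpenBoundDemo {
    /** Exclusive lower-bound check, as used by an open-ended range. */
    static boolean aboveLower(double value, double lower) {
        return value > lower;
    }

    public static void main(String[] args) {
        // Double.MIN_VALUE is a tiny positive number (about 4.9e-324).
        System.out.println(Double.MIN_VALUE > 0);                         // prints "true"
        // Used as a lower bound, it rejects zero and every negative value.
        System.out.println(aboveLower(0.0, Double.MIN_VALUE));            // prints "false"
        System.out.println(aboveLower(-5.0, Double.MIN_VALUE));           // prints "false"
        // Negative infinity behaves as a real "no lower bound".
        System.out.println(aboveLower(-5.0, Double.NEGATIVE_INFINITY));   // prints "true"
    }
}
```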
import java.io.IOException;
import java.text.Collator;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FilteredTermEnum;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.util.ToStringUtils;

/**
 * Range query with support for numeric values.
 * @author quzishen
 * @project NormandyPositionII
 * @class NumTermRangeQuery.java
 * @version 1.0
 * @time 2011-1-6
 */
public class NumTermRangeQuery extends MultiTermQuery {
    private static final long serialVersionUID = 1L;

    private double lowerTerm;
    private double upperTerm;
    private Collator collator;
    private String field;
    private boolean includeLower;
    private boolean includeUpper;

    public NumTermRangeQuery(String field, double lowerTerm, double upperTerm,
                             boolean includeLower, boolean includeUpper, Collator collator) {
        this.field = field;
        this.lowerTerm = lowerTerm;
        this.upperTerm = upperTerm;
        this.includeLower = includeLower;
        this.includeUpper = includeUpper;
        this.collator = collator;
    }

    public NumTermRangeQuery(String field, double lowerTerm, double upperTerm,
                             boolean includeLower, boolean includeUpper) {
        this(field, lowerTerm, upperTerm, includeLower, includeUpper, null);
    }

    // Accessors used by NumTermRangeFilter.
    public String getField() { return field; }
    public double getLowerTerm() { return lowerTerm; }
    public double getUpperTerm() { return upperTerm; }
    public boolean isIncludeLower() { return includeLower; }
    public boolean isIncludeUpper() { return includeUpper; }
    public Collator getCollator() { return collator; }

    @Override
    protected FilteredTermEnum getEnum(IndexReader reader) throws IOException {
        return new NumTermRangeTermEnum(reader, field, lowerTerm, upperTerm,
                                        includeLower, includeUpper, collator);
    }

    @Override
    public String toString(String field) {
        StringBuilder buffer = new StringBuilder();
        // Print the field name only when it differs from the default field passed in.
        if (!getField().equals(field)) {
            buffer.append(getField());
            buffer.append(":");
        }
        buffer.append(includeLower ? '[' : '{');
        buffer.append(lowerTerm);
        buffer.append(" TO ");
        buffer.append(upperTerm);
        buffer.append(includeUpper ? ']' : '}');
        buffer.append(ToStringUtils.boost(getBoost()));
        return buffer.toString();
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = super.hashCode();
        result = prime * result + ((collator == null) ? 0 : collator.hashCode());
        result = prime * result + ((field == null) ? 0 : field.hashCode());
        result = prime * result + (includeLower ? 1231 : 1237);
        result = prime * result + (includeUpper ? 1231 : 1237);
        result = prime * result + (int) lowerTerm;
        result = prime * result + (int) upperTerm;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        if (getClass() != obj.getClass()) return false;
        NumTermRangeQuery other = (NumTermRangeQuery) obj;
        if (collator == null) {
            if (other.collator != null) return false;
        } else if (!collator.equals(other.collator)) {
            return false;
        }
        if (field == null) {
            if (other.field != null) return false;
        } else if (!field.equals(other.field)) {
            return false;
        }
        if (includeLower != other.includeLower) return false;
        if (includeUpper != other.includeUpper) return false;
        return lowerTerm == other.lowerTerm && upperTerm == other.upperTerm;
    }
}
import java.io.IOException;
import java.text.Collator;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FilteredTermEnum;
import org.apache.lucene.util.StringHelper;

/**
 * Numeric comparison of terms.
 * @author quzishen
 * @project NormandyPositionII
 * @class NumTermRangeTermEnum.java
 * @version 1.0
 * @time 2011-1-6
 */
public class NumTermRangeTermEnum extends FilteredTermEnum {
    @SuppressWarnings("unused")
    private Collator collator = null;
    private boolean endEnum = false;
    private String field;
    //~~~ in Lucene 3 this is a String.
    private double upperTermText;
    //~~~ in Lucene 3 this is a String.
    private double lowerTermText;
    private boolean includeLower;
    private boolean includeUpper;

    NumTermRangeTermEnum(IndexReader reader, String field, double lowerTermText, double upperTermText,
                         boolean includeLower, boolean includeUpper, Collator collator) throws IOException {
        this.collator = collator;
        this.upperTermText = upperTermText;
        this.lowerTermText = lowerTermText;
        // Unlike TermRangeTermEnum there is no null bound to normalize here,
        // so the include flags are used exactly as given (0 is an ordinary bound).
        this.includeLower = includeLower;
        this.includeUpper = includeUpper;
        this.field = StringHelper.intern(field);
        // Terms are stored in string order, not numeric order, so seeking to the
        // string form of the lower bound could skip matches (for example "10"
        // sorts before "5.0"). Start at the first term of the field instead.
        setEnum(reader.terms(new Term(this.field, "")));
    }

    @Override
    public float difference() {
        return 1.0f;
    }

    @Override
    protected boolean endEnum() {
        return endEnum;
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.search.FilteredTermEnum#termCompare(org.apache.lucene.index.Term)
     */
    @Override
    protected boolean termCompare(Term other) {
        // Overriding this method is what makes the comparison numeric.
        if (other == null || other.field() != field) { // interned comparison
            // Past the last term of this field: stop enumerating.
            endEnum = true;
            return false;
        }
        final double value;
        try {
            value = Double.parseDouble(other.text());
        } catch (NumberFormatException e) {
            // Skip non-numeric terms instead of failing the whole search.
            return false;
        }
        boolean aboveLower = includeLower ? (lowerTermText <= value) : (lowerTermText < value);
        boolean belowUpper = includeUpper ? (upperTermText >= value) : (upperTermText > value);
        return aboveLower && belowUpper;
    }
}
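Because the index keeps terms in string order, a numeric range check has to look at every term of the field and cannot stop at the first term that lies outside the range. The following JDK-only sketch imitates that scan with a TreeSet standing in for the term dictionary. The class `NumericTermScanDemo` and its methods are hypothetical and do not use Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class NumericTermScanDemo {
    /** Numeric range check on a term's text; non-numeric text never matches. */
    static boolean inRange(String text, double lower, double upper,
                           boolean includeLower, boolean includeUpper) {
        final double value;
        try {
            value = Double.parseDouble(text);
        } catch (NumberFormatException e) {
            return false;
        }
        boolean aboveLower = includeLower ? value >= lower : value > lower;
        boolean belowUpper = includeUpper ? value <= upper : value < upper;
        return aboveLower && belowUpper;
    }

    /** Visits all terms in string order and keeps those inside the numeric range. */
    static List<String> scan(SortedSet<String> terms, double lower, double upper,
                             boolean includeLower, boolean includeUpper) {
        List<String> hits = new ArrayList<String>();
        for (String term : terms) {
            if (inRange(term, lower, upper, includeLower, includeUpper)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // String order of these terms: 1, 10, 1111, 5, 7.5, abc
        SortedSet<String> terms =
            new TreeSet<String>(Arrays.asList("1", "1111", "10", "5", "7.5", "abc"));
        System.out.println(scan(terms, 5, 20, true, true));       // prints "[10, 5, 7.5]"
        System.out.println(scan(terms, 5, 20, false, true));      // prints "[10, 7.5]"
        System.out.println(scan(terms, 1000, 2000, true, true));  // prints "[1111]"
    }
}
```

Note that "10" is found even though it sorts before "5" as a string, which is only possible because the scan starts at the first term.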
Usage:
FilteredQuery filteredQuery;
if (NorStringUtils.checkIsNumberic(beginValue) && NorStringUtils.checkIsNumberic(endValue)) {
    // Both bounds are numeric: use the numeric range filter.
    double beginDouble = Double.parseDouble(beginValue);
    double endDouble = Double.parseDouble(endValue);
    NumTermRangeFilter termRangeFilter =
        new NumTermRangeFilter(betweenKey, beginDouble, endDouble, true, true);
    filteredQuery = new FilteredQuery(query, termRangeFilter);
} else {
    // Otherwise fall back to Lucene's string-based range filter.
    TermRangeFilter termRangeFilter =
        new TermRangeFilter(betweenKey, beginValue, endValue, true, true);
    filteredQuery = new FilteredQuery(query, termRangeFilter);
}
Test data:
The index contains the following data:
|Field|Value|
|price|1|
|price|1111|
The search condition is price: [1000 TO 2000]
The search returns:
|Field|Value|
|price|1111|