[Lucene exception] Why am I getting a TooManyClauses exception?

The exception:

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
 at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
 at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
 at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:106)

 

The code that fails:

BooleanQuery stores its search clauses in a field named clauses, which is a List, and uses a second field to cap the length of that list:

private static int maxClauseCount = 1024;

The code that throws the exception is:

/** Adds a clause to a boolean query.
 * @throws TooManyClauses if the new number of clauses exceeds the maximum clause number
 * @see #getMaxClauseCount()
 */
public void add(BooleanClause clause) {
  if (clauses.size() >= maxClauseCount)
    throw new TooManyClauses();

  clauses.add(clause);
}

/** Thrown when an attempt is made to add more than {@link
 * #getMaxClauseCount()} clauses. This typically happens if
 * a PrefixQuery, FuzzyQuery, WildcardQuery, or RangeQuery
 * is expanded to many terms during search.
 */
public static class TooManyClauses extends RuntimeException {
  public TooManyClauses() {}
  public String getMessage() {
    return "maxClauseCount is set to " + maxClauseCount;
  }
}

 

Why does this happen? The official documentation explains it:

Lucene expands the following query types before it runs the search: RangeQuery, PrefixQuery, WildcardQuery, FuzzyQuery.

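For example, if the indexed documents contain the terms car and cars, the query ca* is expanded to car OR cars before the search runs. The number of query clauses therefore grows, and the problem becomes more likely when the index is large, the terms are very similar to one another, and the search pattern is short. The length of this expanded list is limited to 1024 by default. Once it exceeds 1024, the code above throws the exception.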
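The expansion and the limit check can be illustrated with a small sketch that uses only the JDK. It is a simplified model, not Lucene's implementation: the class PrefixExpansionSketch, the method expand, and the use of IllegalStateException in place of TooManyClauses are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Simplified model of prefix expansion; hypothetical, not Lucene's actual code. */
final class PrefixExpansionSketch {

    /** Stand-in for BooleanQuery's default maxClauseCount of 1024. */
    static final int MAX_CLAUSE_COUNT = 1024;

    /** Returns one "clause" per indexed term that starts with the given prefix. */
    static List<String> expand(SortedSet<String> indexedTerms, String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are contiguous in sorted order.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            // Same guard as BooleanQuery.add: refuse to grow past the limit.
            if (clauses.size() >= maxClauseCount) {
                throw new IllegalStateException("maxClauseCount is set to " + maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList("bus", "car", "cars"));
        // ca* matches two terms, so the query becomes "car OR cars".
        System.out.println(expand(terms, "ca", MAX_CLAUSE_COUNT)); // prints [car, cars]
    }
}
```

With a limit of 1 instead of 1024, the same call fails on the second matching term, which is the situation the stack trace above reports at a larger scale.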

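There are three recommended solutions:

1. Replace the offending RangeQuery with a RangeFilter, at some cost in performance;

2. Raise the limit with BooleanQuery.setMaxClauseCount(), for example to 10000, or remove the limit entirely with BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE).

3. Optimize individual fields. For example, store time fields at yyyyMMdd precision so that the hour and minute digits do not multiply the number of terms the query expands to.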
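The effect of the third approach can be estimated with a JDK-only sketch that counts how many distinct terms a three-day span produces at minute precision versus day precision. The class DatePrecisionSketch, the method distinctTerms, and the sample start date are hypothetical. The sketch only formats dates the way an indexer might and does not call Lucene.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Set;
import java.util.TreeSet;

/** Counts distinct date terms at different precisions; hypothetical illustration only. */
final class DatePrecisionSketch {

    static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyyMMddHHmm");
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Formats one timestamp per minute and returns the number of distinct terms. */
    static int distinctTerms(LocalDateTime start, int minutes, DateTimeFormatter precision) {
        Set<String> terms = new TreeSet<>();
        for (int i = 0; i < minutes; i++) {
            terms.add(start.plusMinutes(i).format(precision));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2008, 1, 1, 0, 0); // arbitrary sample date
        int threeDays = 3 * 24 * 60;
        System.out.println(distinctTerms(start, threeDays, MINUTE)); // prints 4320
        System.out.println(distinctTerms(start, threeDays, DAY));    // prints 3
    }
}
```

If every minute in the span actually occurs in the index, a range over those three days could expand to 4320 clauses, well above the default limit of 1024, while the day-precision field needs only 3.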

--------------------------------------

 

The explanation from the official FAQ:

The following types of queries are expanded by Lucene before it does the search: RangeQuery, PrefixQuery, WildcardQuery, FuzzyQuery. For example, if the indexed documents contain the terms "car" and "cars" the query "ca*" will be expanded to "car OR cars" before the search takes place. The number of these terms is limited to 1024 by default. Here's a few different approaches that can be used to avoid the TooManyClauses exception:

  • Use a filter to replace the part of the query that causes the exception. For example, a RangeFilter can replace a RangeQuery on date fields and it will never throw the TooManyClauses exception -- You can even use ConstantScoreRangeQuery to execute your RangeFilter as a Query. Note that filters are slower than queries when used for the first time, so you should cache them using CachingWrapperFilter. Using Filters in place of Queries generated by QueryParser can be achieved by subclassing QueryParser and overriding the appropriate function to return a ConstantScore version of your Query.

  • Increase the number of terms using BooleanQuery.setMaxClauseCount(). Note that this will increase the memory requirements for searches that expand to many terms. To deactivate any limits, use BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE).

  • A specific solution that can work on very precise fields is to reduce the precision of the data in order to reduce the number of terms in the index. For example, the DateField class uses a microsecond resolution, which is often not required. Instead you can save your dates in the "yyyymmddHHMM" format, maybe even without hours and minutes if you don't need them (this was simplified in Lucene 1.9 thanks to the new DateTools class).
