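Since PhraseQuery and SpanQuery came up above, let me say a few words about how they differ; I suspect this confuses a lot of people who are new to Lucene. Both queries match on several Terms at once. With the default slop of 0, a PhraseQuery only matches the terms in exactly the order they appear in the query phrase: if you search for quick lazy and the indexed text reads xxxxxxxxlazy quickxxxxxxx, it will not match. A PhraseQuery has no switch for order; the only way to let it match reordered terms is to raise the slop high enough to pay for the moves (swapping two adjacent terms already costs 2). When you want to control order explicitly, use a SpanQuery (SpanNearQuery), whose inOrder parameter decides whether the terms must appear in the document in the same order as in the query, and whether overlapping is allowed.

What does "overlapping" mean? Take an example. Suppose the field value is "jumps over extremely very lazy brown dog" and your query phrase is "dog over". In the index dog comes after over, but in the query dog comes before over, so the order is reversed. A PhraseQuery with slop 0 cannot match this. Set the SpanQuery's inOrder to false and order is ignored: a match on either dog over or over dog within the number of steps allowed by slop counts as a hit. If inOrder is true, only dog over within the allowed steps counts, over dog does not, and the match may not overlap. Overlap here means this: to get dog over you would have to move over 6 steps to the right, but that carries it across dog, so it overlaps dog. In other words, you may only move within the space between the two terms, not across their boundaries.

I hope that explanation makes sense. Note that slop in both queries is an upper bound on how far apart the terms may be from the arrangement you asked for, although the two count differently: PhraseQuery counts term moves, while SpanNearQuery counts the positions left unmatched between the terms.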
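The step counting above can be sketched in plain Java. This is a simplified model, not Lucene's implementation: it assumes every query term occurs exactly once in the field, at distinct positions, and the class and method names (SlopModel, phraseMoves, spanNearMatches) are made up for illustration. Positions are counted from 0, so in "jumps over extremely very lazy brown dog" over is at 1 and dog is at 6.

```java
final class SlopModel {

    /**
     * Moves a sloppy phrase needs, given the document position of each query
     * term listed in query order (assumes at least one term, one occurrence each).
     */
    static int phraseMoves(int[] docPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < docPositions.length; i++) {
            // Shift each term back by its slot in the query phrase.
            int shifted = docPositions[i] - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    /**
     * Two-clause span-near check: slop bounds the number of unmatched
     * positions lying between the two terms.
     */
    static boolean spanNearMatches(int firstPos, int secondPos, int slop, boolean inOrder) {
        if (inOrder && firstPos > secondPos) {
            return false; // the first clause must come first in the document
        }
        int gap = Math.abs(secondPos - firstPos) - 1;
        return gap <= slop;
    }
}
```

For the query dog over, phraseMoves(new int[]{6, 1}) gives 6, which is the "6 steps" from the example. spanNearMatches(6, 1, 4, false) is true because four positions sit between over and dog, while spanNearMatches(6, 1, slop, true) is false for any slop because dog does not come before over in the document.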
Overview

Although Lucene provides the ability to create your own queries through its API, it also provides a rich query language through the Query Parser, a lexer which interprets a string into a Lucene Query using JavaCC.

Generally, the query parser syntax may change from release to release. This page describes the syntax as of the current release. If you are using a different version of Lucene, please consult the copy of docs/queryparsersyntax.html that was distributed with the version you are using.

Before choosing to use the provided Query Parser, please consider the following:

1. If you are programmatically generating a query string and then parsing it with the query parser then you should seriously consider building your queries directly with the query API. In other words, the query parser is designed for human-entered text, not for program-generated text.
2. Untokenized fields are best added directly to queries, and not through the query parser. If a field's values are generated programmatically by the application, then so should query clauses for this field. An analyzer, which the query parser uses, is designed to convert human-entered text to terms. Program-generated values, like dates, keywords, etc., should be consistently program-generated.
3. In a query form, fields which are general text should use the query parser. All others, such as date ranges, keywords, etc. are better added directly through the query API. A field with a limited set of values, that can be specified with a pull-down menu should not be added to a query string which is subsequently parsed, but rather added as a TermQuery clause.
Java Compiler Compiler (JavaCC) is the most popular parser generator for use with Java applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.
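1. QueryParser is designed for text typed in by users, not for text generated by your application. What does that mean? You have to plan for the worst case: user input is unpredictable, and you cannot expect to dictate what format of query string users will type. If that is what you are trying to do, build your Query objects with the Query API instead.
2. For fields that are not tokenized, build your Query objects directly with the Query API. QueryParser runs the user's input through an analyzer to get N Terms and then matches on those Terms; you need to be clear about this.
3. Point 3 tells you that when you design a search form, ordinary text boxes can go straight through QueryParser, but things like date ranges, keywords, or one or more values picked from a drop-down list to restrict the results should be handled with the Query API.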
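A Term is written as a single word, such as hello; multiple Terms are separated by spaces, such as hello java.
You can prefix a field name, with the field and the Term string separated by a colon, as in title:"The Right Way". Clauses on several fields are joined with OR or AND (the operators must be uppercase),
for example title:"The Right Way" AND text:go
A Term string can also use wildcards for pattern matching, such as title:ja*a, title:ja?a, title:ja* and so on. ? stands for exactly one character and * for zero or more characters.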
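To make the wildcard rules concrete (? stands for exactly one character, * for zero or more), here is a minimal sketch that translates a wildcard term into a java.util.regex pattern. It only illustrates the matching rule; Lucene's WildcardQuery does not work this way internally, and the names WildcardSketch, toRegex and matches are invented for this example.

```java
import java.util.regex.Pattern;

final class WildcardSketch {

    /** Translates a wildcard term into an equivalent regular expression. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");   // zero or more characters
            } else if (c == '?') {
                sb.append('.');    // exactly one character
            } else {
                // Quote everything else so regex metacharacters stay literal.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    /** True if the whole term matches the wildcard expression. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }
}
```

With this model, ja*a accepts both java and jakarta, ja?a accepts java but not jaa, and ja* accepts anything that starts with ja.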
You can also append ~ to a Term to turn it into a FuzzyQuery, as in title:roam~ or title:roam~0.8
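Fuzzy matching is based on edit distance, so roam~ can also find terms such as foam and roams. The sketch below computes the classic Levenshtein distance in plain Java to show what "one edit away" means. It is an illustration only: Lucene's FuzzyQuery uses its own automaton-based implementation and, depending on version and settings, may count a transposition as a single edit. The class name EditDistanceSketch is hypothetical.

```java
final class EditDistanceSketch {

    /** Classic Levenshtein distance: insertions, deletions and substitutions cost 1 each. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i characters of a into ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Both levenshtein("roam", "foam") and levenshtein("roam", "roams") come out as 1, which is why those terms count as close to roam.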
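The QueryParser syntax also supports phrase queries (PhraseQuery) with a slop, as in title:"jakarta apache"~10
Range queries are supported too, as in title:[java TO php] and age:[18 TO 28] (TO must be uppercase).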
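One thing to watch with ranges written in the query string: as far as I know the classic QueryParser compares range bounds as text, not as numbers, so age:[18 TO 28] may not behave the way it looks. The sketch below models that with String.compareTo, which orders plain ASCII terms the same way. It is an assumption-laden illustration rather than Lucene code; TermRangeSketch and inRange are made-up names, and the inclusive flag stands for square brackets [ ] versus curly braces { }.

```java
final class TermRangeSketch {

    /**
     * Text (lexicographic) range check.
     * inclusive = true models [lower TO upper], false models {lower TO upper}.
     */
    static boolean inRange(String lower, String upper, String term, boolean inclusive) {
        int vsLower = term.compareTo(lower);
        int vsUpper = term.compareTo(upper);
        return inclusive
                ? (vsLower >= 0 && vsUpper <= 0)
                : (vsLower > 0 && vsUpper < 0);
    }
}
```

inRange("java", "php", "lucene", true) is true, as you would expect. But inRange("18", "28", "2", true) is also true, because the text "2" sorts between "18" and "28". That is one more reason to build numeric and date ranges through the Query API.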
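Boolean operators, i.e. OR and AND, join multiple Terms. If two Terms are separated only by a space, they are joined with OR by default. Example: title:java^5 AND content:lucen* (the ^5 boosts that clause).
There are also the + and - characters, meaning the clause must match and must not match (i.e. is excluded), as in +jakarta lucene. Note that NOT cannot be used when there is only one Term: NOT "jakarta apache" on its own returns no results.
This, on the other hand, works: "jakarta apache" -"Apache Lucene" matches documents that contain jakarta apache but do not contain Apache Lucene.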
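One way to read the + and - markers is as three kinds of clauses: required (+), prohibited (-) and optional (no marker). The sketch below models how they combine for a single document, represented simply as a set of terms. It is a simplified model of the matching rule only (no scoring, single terms instead of phrases), and BooleanClauseSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Set;

final class BooleanClauseSketch {

    /** Decides whether a document (a set of terms) matches the three clause lists. */
    static boolean matches(Set<String> docTerms,
                           List<String> must,
                           List<String> should,
                           List<String> mustNot) {
        for (String term : mustNot) {
            if (docTerms.contains(term)) {
                return false; // a prohibited term rules the document out
            }
        }
        for (String term : must) {
            if (!docTerms.contains(term)) {
                return false; // every required term has to be present
            }
        }
        if (!must.isEmpty()) {
            return true; // optional terms no longer decide matching
        }
        for (String term : should) {
            if (docTerms.contains(term)) {
                return true; // without required terms, one optional hit is needed
            }
        }
        return false; // nothing positive matched
    }
}
```

For +jakarta lucene the required list is [jakarta] and the optional list is [lucene], so a document with only jakarta matches and a document with only lucene does not. A query made of nothing but prohibited clauses falls through to the last line and matches nothing, which is why NOT on its own is not useful.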
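When OR/AND conditions get complicated and you need to control precedence, group the clauses with parentheses, as in (jakarta OR apache) AND website
When a field is restricted by several values, you can join them with OR/AND or write them together in parentheses, as in title:(+return +"pink panther"); you can of course also split that into title:return AND title:"pink panther"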
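The following characters are special in the query syntax; to use one of them literally, put a backslash (\) in front of it:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \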
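Escaping by hand is easy to get wrong, so here is a minimal sketch of a helper that puts a backslash in front of every character from the list above. Lucene ships its own helper for this (QueryParser.escape), and its exact character set may differ between versions; the class below is a standalone illustration under that caveat, with the made-up name QueryEscapeSketch.

```java
final class QueryEscapeSketch {

    // The characters from the list above; && and || are covered by & and |.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns the input with every special character prefixed by a backslash. */
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, escape("(1+1):2") yields \(1\+1\)\:2, which the parser then treats as a literal term instead of a grouped, field-qualified expression.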
QueryParser parser = new QueryParser(fieldName, new IKAnalyzer());
Query query = parser.parse(queryString);
However, QueryParser cannot fully replace the Query API. It does not cover every Query implementation; for example, it does not support SpanQuery.
public MultiFieldQueryParser(String[] fields, Analyzer analyzer, Map<String,Float> boosts) {
    this(fields, analyzer);
    this.boosts = boosts;
}