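Lucene is an open source search toolkit that gives developers a rich set of query types. The main ones are summarized below.
Type 1: TermQuery. TermQuery is the most basic, atomic query in Lucene. It retrieves the Documents in the index that contain a given term. The code is as follows:
Java code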
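import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Written against the older (pre-3.0) Lucene API: Hits, Field.Index.TOKENIZED.
// STORE_PATH is the index directory path; its definition is not shown here.
public static void main(String[] args) throws IOException {
    createIndex();
    termQuery();
}

// Builds a fresh index containing two posts.
private static void createIndex() throws IOException {
    // The third argument (true) creates a new index, overwriting any existing one.
    IndexWriter writer = new IndexWriter(STORE_PATH,
            new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false);
    Document doc1 = new Document();
    Document doc2 = new Document();
    Field field = new Field("PostTitle", "Lucene开发浅谈", Field.Store.YES,
            Field.Index.TOKENIZED);
    Field field1 = new Field("PostContent", "Lucene是一个开源的搜索工具包",
            Field.Store.YES, Field.Index.TOKENIZED);
    doc1.add(field);
    doc1.add(field1);
    Field field2 = new Field("PostTitle", "云计算浅谈", Field.Store.YES,
            Field.Index.TOKENIZED);
    // Indexed but not stored, so this field is missing from the printed Document.
    Field field3 = new Field("PostContent",
            "云计算是一种基于 Web 的服务,它使得普通计算机拥有超级计算机的能力", Field.Store.NO,
            Field.Index.TOKENIZED);
    doc2.add(field2);
    doc2.add(field3);
    writer.addDocument(doc1);
    writer.addDocument(doc2);
    writer.close();
}

// Finds every Document whose PostContent field contains the term "lucene".
private static void termQuery() throws IOException {
    IndexSearcher searcher = new IndexSearcher(STORE_PATH);
    // The term is lowercase because StandardAnalyzer lowercases tokens at index time.
    Term term = new Term("PostContent", "lucene");
    Query termQuery = new TermQuery(term);
    Hits hits = searcher.search(termQuery);
    System.out.println("TermQuery demo------");
    System.out.println("hits.length()==" + hits.length());
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i));
    }
    searcher.close();
}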
The output is as follows:
TermQuery demo------
hits.length()==1
Document<stored/uncompressed,indexed,tokenized<PostTitle:Lucene开发浅谈> stored/uncompressed,indexed,tokenized<PostContent:Lucene是一个开源的搜索工具包>>
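Note that the demo searches for "lucene" in lowercase even though the indexed text says "Lucene". A TermQuery does not analyze its term; it looks up the exact token stored in the index, and StandardAnalyzer lowercases tokens when indexing. The following is a minimal plain-Java sketch of that idea, not Lucene code: the class TermLookupSketch and its methods index and termLookup are hypothetical names, and the whitespace-and-lowercase "analyzer" is a simplifying assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TermLookupSketch {
    // Toy inverted index: term -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in for an analyzer (assumption): lowercase, then split on whitespace.
    public void index(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Stand-in for a term query: exact lookup, the query term is NOT analyzed.
    public Set<Integer> termLookup(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TermLookupSketch idx = new TermLookupSketch();
        idx.index(1, "Lucene is an open source search toolkit");
        idx.index(2, "Cloud computing is a web based service");
        // Matches document 1: the query term equals the lowercased indexed token.
        System.out.println(idx.termLookup("lucene"));
        // Matches nothing: no token with an uppercase "L" was ever indexed.
        System.out.println(idx.termLookup("Lucene"));
    }
}
```

In practice this means a term built by hand should be normalized the same way the analyzer normalized the indexed text.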
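Type 2: BooleanQuery. A Boolean query applies Boolean operations to the results of other queries to produce the final result. The possible clause combinations are:
1 MUST, MUST
2 MUST, MUST_NOT
3 MUST, SHOULD
4 MUST_NOT, SHOULD
5 SHOULD, SHOULD
6 MUST_NOT, MUST_NOT
The code is as follows:
Java code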
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Reuses createIndex() and STORE_PATH from the TermQuery demo.
public static void main(String[] args) throws IOException {
    createIndex();
    booleanQuery();
}

private static void booleanQuery() throws IOException {
    IndexSearcher searcher = new IndexSearcher(STORE_PATH);
    // StandardAnalyzer indexes each Chinese character as its own term,
    // so single characters can be used as query terms.
    Term term1 = new Term("PostTitle", "谈");
    Term term2 = new Term("PostContent", "源");
    TermQuery termquery1 = new TermQuery(term1);
    TermQuery termquery2 = new TermQuery(term2);

    // MUST + MUST: both terms are required.
    BooleanQuery query = new BooleanQuery();
    query.add(termquery1, BooleanClause.Occur.MUST);
    query.add(termquery2, BooleanClause.Occur.MUST);
    Hits hits = searcher.search(query);
    System.out.println("BooleanQuery demo with MUST and MUST -------");
    System.out.println("hits.length()==" + hits.length());
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i));
    }
    System.out.println("-----------------------------------");

    // MUST + MUST_NOT: the first term is required, the second is forbidden.
    BooleanQuery query1 = new BooleanQuery();
    query1.add(termquery1, BooleanClause.Occur.MUST);
    query1.add(termquery2, BooleanClause.Occur.MUST_NOT);
    Hits hits1 = searcher.search(query1);
    System.out.println("BooleanQuery demo with MUST and MUST_NOT -------");
    System.out.println("hits.length()==" + hits1.length());
    for (int i = 0; i < hits1.length(); i++) {
        System.out.println(hits1.doc(i));
    }
    System.out.println("-----------------------------------");

    // SHOULD + MUST_NOT: the first term is optional, the second is forbidden.
    BooleanQuery query2 = new BooleanQuery();
    query2.add(termquery1, BooleanClause.Occur.SHOULD);
    query2.add(termquery2, BooleanClause.Occur.MUST_NOT);
    Hits hits2 = searcher.search(query2);
    System.out.println("BooleanQuery demo with SHOULD and MUST_NOT -------");
    System.out.println("hits.length()==" + hits2.length());
    for (int i = 0; i < hits2.length(); i++) {
        System.out.println(hits2.doc(i));
    }
    System.out.println("-----------------------------------");

    // SHOULD + SHOULD: a Document matching either term is a hit.
    Term term3 = new Term("PostTitle", "lucene");
    Term term4 = new Term("PostContent", "云");
    TermQuery termquery3 = new TermQuery(term3);
    TermQuery termquery4 = new TermQuery(term4);
    BooleanQuery query3 = new BooleanQuery();
    query3.add(termquery3, BooleanClause.Occur.SHOULD);
    query3.add(termquery4, BooleanClause.Occur.SHOULD);
    Hits hits3 = searcher.search(query3);
    System.out.println("BooleanQuery demo with SHOULD and SHOULD -------");
    System.out.println("hits.length()==" + hits3.length());
    for (int i = 0; i < hits3.length(); i++) {
        System.out.println(hits3.doc(i));
    }
    searcher.close();
}
The output is as follows:
BooleanQuery demo with MUST and MUST -------
hits.length()==1
Document<stored/uncompressed,indexed,tokenized<PostTitle:Lucene开发浅谈> stored/uncompressed,indexed,tokenized<PostContent:Lucene是一个开源的搜索工具包>>
-----------------------------------
BooleanQuery demo with MUST and MUST_NOT -------
hits.length()==1
Document<stored/uncompressed,indexed,tokenized<PostTitle:云计算浅谈>>
-----------------------------------
BooleanQuery demo with SHOULD and MUST_NOT -------
hits.length()==1
Document<stored/uncompressed,indexed,tokenized<PostTitle:云计算浅谈>>
-----------------------------------
BooleanQuery demo with SHOULD and SHOULD -------
hits.length()==2
Document<stored/uncompressed,indexed,tokenized<PostTitle:Lucene开发浅谈> stored/uncompressed,indexed,tokenized<PostContent:Lucene是一个开源的搜索工具包>>
Document<stored/uncompressed,indexed,tokenized<PostTitle:云计算浅谈>>
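When SHOULD is combined with MUST, the hits are exactly those of the MUST clause (the SHOULD clause then only influences scoring). When SHOULD is combined with MUST_NOT, the query behaves like MUST combined with MUST_NOT.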
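The matching rules for the clause combinations can be pictured as set operations on document ids. The sketch below is plain Java rather than Lucene code and ignores scoring entirely; the class BooleanClauseSketch and its methods combine and ids are hypothetical names. It assumes document 1 is the Lucene post and document 2 is the cloud computing post from the demo index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BooleanClauseSketch {
    // Helper that builds a sorted set of document ids.
    public static Set<Integer> ids(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    // MUST sets are intersected; SHOULD sets select documents only when there
    // is no MUST clause; MUST_NOT sets are always subtracted at the end.
    public static Set<Integer> combine(List<Set<Integer>> must,
                                       List<Set<Integer>> should,
                                       List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            result.addAll(must.get(0));
            for (Set<Integer> m : must) {
                result.retainAll(m);
            }
        } else {
            for (Set<Integer> s : should) {
                result.addAll(s);
            }
        }
        for (Set<Integer> n : mustNot) {
            result.removeAll(n);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<Integer>> none = Collections.emptyList();
        Set<Integer> titleTan = ids(1, 2);   // PostTitle contains "谈"
        Set<Integer> contentYuan = ids(1);   // PostContent contains "源"
        Set<Integer> titleLucene = ids(1);   // PostTitle contains "lucene"
        Set<Integer> contentYun = ids(2);    // PostContent contains "云"

        // MUST + MUST
        System.out.println(combine(Arrays.asList(titleTan, contentYuan), none, none));
        // MUST + MUST_NOT
        System.out.println(combine(Arrays.asList(titleTan), none, Arrays.asList(contentYuan)));
        // SHOULD + MUST_NOT
        System.out.println(combine(none, Arrays.asList(titleTan), Arrays.asList(contentYuan)));
        // SHOULD + SHOULD
        System.out.println(combine(none, Arrays.asList(titleLucene, contentYun), none));
        // MUST_NOT only: nothing is selected, so the result is empty
        System.out.println(combine(none, none, Arrays.asList(contentYuan)));
    }
}
```

Under this model the four demo combinations select the same documents as the output above, and a query made only of MUST_NOT clauses selects nothing, since there is no positive clause to pick candidates from.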
Type 3: RangeQuery. As the name suggests, a range query searches within a given range, for example users whose ID lies between 10001 and 10005. The code is as follows:
Java code
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RangeQuery;

// STORE_PATH is the index directory path; its definition is not shown here.
private static void rangeQuery() throws IOException {
    // Rebuild the index with five Documents, one per user ID.
    IndexWriter writer = new IndexWriter(STORE_PATH,
            new StandardAnalyzer(), true);
    String[] userIds = {"10001", "10002", "10003", "10004", "10005"};
    for (String userId : userIds) {
        Document doc = new Document();
        doc.add(new Field("userID", userId, Field.Store.YES,
                Field.Index.TOKENIZED));
        writer.addDocument(doc);
    }
    writer.close();

    IndexSearcher searcher = new IndexSearcher(STORE_PATH);
    // Lower and upper bounds of the range: 10001 to 10005.
    Term start = new Term("userID", "10001");
    Term end = new Term("userID", "10005");
    // true = include both boundary values.
    RangeQuery query = new RangeQuery(start, end, true);
    Hits hits = searcher.search(query);
    System.out.println("RangeQuery demo------");
    System.out.println("hits.length()==" + hits.length());
    for (int i = 0; i < hits.length(); i++) {
        System.out.println(hits.doc(i));
    }
    searcher.close();
}
The output is as follows:
RangeQuery demo------
hits.length()==5
Document<stored/uncompressed,indexed,tokenized<userID:10001>>
Document<stored/uncompressed,indexed,tokenized<userID:10002>>
Document<stored/uncompressed,indexed,tokenized<userID:10003>>
Document<stored/uncompressed,indexed,tokenized<userID:10004>>
Document<stored/uncompressed,indexed,tokenized<userID:10005>>
The third argument of the RangeQuery constructor specifies whether the boundary values are included: true gives a closed interval, false gives an open interval.
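One point worth keeping in mind is that the range is evaluated over terms, which are strings, so the ordering is lexicographic rather than numeric. The user IDs in the demo all have five digits, so both orderings agree. The plain-Java sketch below illustrates the inclusive flag and the string ordering with a sorted set; it is not Lucene code, and the class TermRangeSketch, its method range, and the extra term "9" are hypothetical additions for illustration.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

public class TermRangeSketch {
    // Returns the terms between lower and upper in String (lexicographic) order.
    // A single flag controls both ends, mirroring the closed/open interval above.
    public static NavigableSet<String> range(NavigableSet<String> terms,
                                             String lower, String upper,
                                             boolean inclusive) {
        return terms.subSet(lower, inclusive, upper, inclusive);
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "10001", "10002", "10003", "10004", "10005", "9"));

        // Closed interval: all five IDs from 10001 to 10005.
        System.out.println(range(terms, "10001", "10005", true));
        // Open interval: the boundaries 10001 and 10005 are left out.
        System.out.println(range(terms, "10001", "10005", false));
        // "9" sorts after "10005" as a string, so it falls outside the
        // range above but inside a range whose upper bound is "9".
        System.out.println(range(terms, "10001", "9", true));
    }
}
```

If numeric IDs of different lengths need to be range-searched this way, a common workaround is to zero-pad them to a fixed width before indexing so that string order and numeric order coincide.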