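Preface:
I have read several books that explain how the BooleanQuery clause parameters in Lucene,
BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT and BooleanClause.Occur.SHOULD, relate to one another: they behave much like intersection, union and the other set operations. I won't repeat the books' examples here; instead, here is how I have used them in my day-to-day work.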
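To make the set analogy concrete, here is a minimal sketch that uses plain Java sets in place of the documents matched by each clause. It is only a toy model, not Lucene's API: the class `OccurAsSets` and its helper methods are names made up for this illustration, and the integers stand for document IDs.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model only: each set holds the IDs of the documents that one clause matches.
public class OccurAsSets {
    // Helper (hypothetical) to build a sorted set of document IDs.
    static Set<Integer> of(int... ids) {
        Set<Integer> s = new TreeSet<Integer>();
        for (int id : ids) {
            s.add(id);
        }
        return s;
    }

    // MUST a, MUST b: a document has to match both clauses, so the result is the intersection.
    static Set<Integer> must(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<Integer>(a);
        r.retainAll(b);
        return r;
    }

    // SHOULD a, SHOULD b (no MUST clause): matching either one is enough, so the result is the union.
    static Set<Integer> should(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<Integer>(a);
        r.addAll(b);
        return r;
    }

    // MUST a, MUST_NOT b: documents matching b are excluded, so the result is the difference.
    static Set<Integer> mustNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<Integer>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> a = of(1, 2, 3);
        Set<Integer> b = of(2, 3, 4);
        System.out.println(must(a, b));    // [2, 3]
        System.out.println(should(a, b));  // [1, 2, 3, 4]
        System.out.println(mustNot(a, b)); // [1]
    }
}
```

The model covers only which documents come back; it says nothing about scoring.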
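The books mostly say that combining MUST with SHOULD returns the same results as using MUST alone. The other day I had to build a search whose results are sorted by relevance. Briefly, the requirement was this: after a user enters a keyword, all products from paying members should be recommended to the user, sorted by relevance from high to low. We already have the codes of the paying members, and we know each member's product name and product description. So how do we sort by the keyword the user entered? Bear in mind that the keyword may be completely unrelated to some paying members' products; those unrelated products simply end up at the bottom.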
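The idea behind the requirement can be sketched without Lucene: the MUST clause on the code decides whether a product is returned at all, and the SHOULD clauses on name and description only push matching products higher. The sketch below is a toy model under stated assumptions: the score (1 point for the MUST clause plus 1 per matching SHOULD clause) is invented for illustration and is not Lucene's scoring formula, the class and method names are hypothetical, and the sample data are English stand-ins for four products.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model only: shows that SHOULD clauses change the order, not the membership, of the results.
public class MustShouldRanking {
    static final class Product {
        final String code;
        final String name;
        final String info;

        Product(String code, String name, String info) {
            this.code = code;
            this.name = name;
            this.info = info;
        }
    }

    // Invented score, NOT Lucene's: 1 for the satisfied MUST clause, +1 per matching SHOULD clause.
    static int score(Product p, String keyword) {
        int s = 1;
        if (p.name.contains(keyword)) s++;
        if (p.info.contains(keyword)) s++;
        return s;
    }

    // Returns the codes of all paid products, best toy score first.
    static List<String> rank(List<Product> all, List<String> paidCodes, final String keyword) {
        List<Product> hits = new ArrayList<Product>();
        for (Product p : all) {
            if (paidCodes.contains(p.code)) {
                hits.add(p); // the MUST clause on the code decides membership
            }
        }
        // Collections.sort is stable, so products with equal scores keep their original order.
        Collections.sort(hits, new Comparator<Product>() {
            public int compare(Product a, Product b) {
                return score(b, keyword) - score(a, keyword);
            }
        });
        List<String> codes = new ArrayList<String>();
        for (Product p : hits) {
            codes.add(p.code);
        }
        return codes;
    }

    // English stand-ins for four sample products (assumed data for this sketch).
    static List<Product> sample() {
        List<Product> list = new ArrayList<Product>();
        list.add(new Product("1", "electric stove", "this is an electric stove"));
        list.add(new Product("2", "electric water kettle", "Jiefang brand electric water kettle"));
        list.add(new Product("3", "water cup", "tempered water cup"));
        list.add(new Product("4", "plate", "very good quality"));
        return list;
    }

    public static void main(String[] args) {
        List<String> paid = new ArrayList<String>();
        for (Product p : sample()) {
            paid.add(p.code);
        }
        System.out.println(rank(sample(), paid, "water")); // [2, 3, 1, 4]
    }
}
```

All four products are returned whatever the keyword is; the two that mention the keyword move to the top, and the unrelated ones fall to the bottom.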
Here is the relevant code:
package com.lucene.test;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;

/**
 *
 * @author kernaling.wong
 * This class only demonstrates that SHOULD clauses do not change which results are returned,
 * but do affect their relevance scores; this can be used for relevance-sorted searches.
 */
public class LuceneTest {
    private static String INDEX_PATH = "./Index/";

    public static void main(String[] args) {
        if(false){ // set this to true when building the index
            IndexWriter writer = null;
            try{
                // StandardAnalyzer indexes each Chinese character as its own term,
                // which is why the single-character TermQuery below can match.
                writer = new IndexWriter(INDEX_PATH,new StandardAnalyzer(),MaxFieldLength.LIMITED);

                Document doc = new Document();
                Field fieldCode = new Field("Code","1",Field.Store.YES,Field.Index.ANALYZED);
                Field fieldName = new Field("Name","电热炉",Field.Store.YES,Field.Index.ANALYZED);
                Field fieldInfo = new Field("Info","这是一个电热炉",Field.Store.YES,Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                doc = new Document();
                fieldCode = new Field("Code","2",Field.Store.YES,Field.Index.ANALYZED);
                fieldName = new Field("Name","电水壶",Field.Store.YES,Field.Index.ANALYZED);
                fieldInfo = new Field("Info","解放牌电水壶",Field.Store.YES,Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                doc = new Document();
                fieldCode = new Field("Code","3",Field.Store.YES,Field.Index.ANALYZED);
                fieldName = new Field("Name","水杯",Field.Store.YES,Field.Index.ANALYZED);
                fieldInfo = new Field("Info","钢化水杯",Field.Store.YES,Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                doc = new Document();
                fieldCode = new Field("Code","4",Field.Store.YES,Field.Index.ANALYZED);
                fieldName = new Field("Name","碟子",Field.Store.YES,Field.Index.ANALYZED);
                fieldInfo = new Field("Info","质量很好",Field.Store.YES,Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                System.out.println("Index built!");
            }catch(Exception ex){
                ex.printStackTrace();
            }finally{
                if(writer != null){
                    try{
                        writer.close();
                    }catch(Exception ex){
                        ex.printStackTrace();
                    }
                }
            }
        }else{
            IndexSearcher search = null;
            IndexReader ir = null;
            try{
                search = new IndexSearcher(INDEX_PATH);
                String keyword = "水"; // the keyword the user entered
                BooleanQuery bq = new BooleanQuery();
                /**
                 * The query conditions follow. Whatever keyword the user enters, every paid
                 * member's product must be shown, ordered by relevance from high to low.
                 */
                BooleanQuery tmpBQ = new BooleanQuery(); // sub-query for one product code
                tmpBQ.add(new TermQuery((new Term("Code","1"))),BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ,BooleanClause.Occur.SHOULD);

                tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code","2"))),BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ,BooleanClause.Occur.SHOULD);

                tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code","3"))),BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ,BooleanClause.Occur.SHOULD);

                tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code","4"))),BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ,BooleanClause.Occur.SHOULD);

                Sort sort = new Sort(SortField.FIELD_SCORE); // sort by relevance score
                TopFieldDocs tdocs = search.search(bq,null,4,sort);
                ScoreDoc scoreDocs[] = tdocs.scoreDocs;
                ir = search.getIndexReader();
                for(int i=0;i<scoreDocs.length;i++){
                    Document tmpDoc = ir.document(scoreDocs[i].doc);
                    System.out.println("Score:"+scoreDocs[i].score+"\tProduct code:"+tmpDoc.get("Code")+"\tProduct name:"+tmpDoc.get("Name")+"\tProduct description:"+tmpDoc.get("Info"));
                }
            }catch(Exception ex){
                ex.printStackTrace();
            }finally{
                try{
                    if(search != null){
                        search.close();
                    }
                    if(ir != null){
                        ir.close();
                    }
                }catch(Exception ex){
                    ex.printStackTrace();
                }
            }
        }
    }
}
The test environment was Windows XP + JDK 1.6 with Lucene 2.4.
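Once this is implemented, the results should come back sorted by relevance: products that match the keyword first, unrelated products last.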
Afterword
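One caveat about the approach above: when there are a large number of paying members, that is, when a great many member codes each need their own BooleanClause.Occur.SHOULD clause, I recommend not using this method. BooleanQuery limits the number of clauses that can be added through BooleanQuery.add; the default is 1024, although you can raise the limit manually with setMaxClauseCount(int maxCount). With the approach above, once the number of members is large, say 100,000 or more, search performance becomes the bottleneck.
The solution I have come up with is to change the fields in the Lucene index: instead of adding a SHOULD clause for every paying member, add one field that indicates whether the product belongs to a paying member, then apply a method similar to the one above. That implements the feature far more efficiently. The original method was something I thought up in the heat of the moment; there were not many paying members back then, so it worked without problems. We have since switched to the second method. When you cannot change Lucene's source code and algorithms for the time being, all you can do is change your own approach.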
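To compare how many clauses the two approaches need, here is a minimal sketch in plain Java. It is a toy model rather than Lucene code: `ToyBooleanQuery` is a made-up stand-in that only counts clauses and refuses to go past the default limit of 1024, and the `Paid` flag field is a hypothetical name for the extra field described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: counts clauses to compare the two approaches; it does not search anything.
public class ClauseBudget {
    static final int MAX_CLAUSES = 1024; // the default limit mentioned in the text

    // Made-up stand-in for a boolean query: stores clause labels and enforces the limit on add.
    static final class ToyBooleanQuery {
        private final List<String> clauses = new ArrayList<String>();

        void add(String clause) {
            if (clauses.size() >= MAX_CLAUSES) {
                throw new IllegalStateException("too many clauses, limit is " + MAX_CLAUSES);
            }
            clauses.add(clause);
        }

        int size() {
            return clauses.size();
        }
    }

    // First approach: one SHOULD sub-query per paying member code in the top-level query.
    static ToyBooleanQuery perMemberQuery(int paidMembers) {
        ToyBooleanQuery top = new ToyBooleanQuery();
        for (int code = 1; code <= paidMembers; code++) {
            top.add("SHOULD (+Code:" + code + " Name:kw Info:kw)");
        }
        return top;
    }

    // Second approach: a single MUST clause on a hypothetical "Paid" flag field.
    static ToyBooleanQuery flagQuery() {
        ToyBooleanQuery q = new ToyBooleanQuery();
        q.add("MUST Paid:1");
        q.add("SHOULD Name:kw");
        q.add("SHOULD Info:kw");
        return q;
    }

    // True if the first approach stays within the clause limit for this many members.
    static boolean fits(int paidMembers) {
        try {
            perMemberQuery(paidMembers);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fits(1024));         // true
        System.out.println(fits(100000));       // false
        System.out.println(flagQuery().size()); // 3
    }
}
```

The first approach grows by one top-level clause per paying member, while the flag-field query stays at three clauses however many members there are.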