Some thoughts on MUST and SHOULD in Lucene, from my day-to-day work

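Preface:
I have read a few books that cover the occurrence parameters of BooleanQuery in Lucene: BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT and BooleanClause.Occur.SHOULD. The relationship between them is much like set operations such as intersection and union. I won't repeat the examples from the books here; instead I'll describe how I use these parameters at work.
   The books basically say that when MUST and SHOULD are used together, the result is the same as using MUST alone. That is true as far as which documents match, but the SHOULD clauses still affect the score. The other day I was building a search whose results are sorted by relevance. The requirement, briefly: after the user enters a keyword, all products of paid members should be recommended to the user, sorted from high to low relevance. We already have the codes of the paid members, and we know each member's product name and product description. So how do we sort by the keyword the user entered? Keep in mind that the keyword may be completely unrelated to a paid member's product; those unrelated products should simply rank last.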
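
To make the MUST / SHOULD interplay concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene's real scoring formula: the `Product` type, the `score` method and the English sample data are all assumptions made for illustration. A failed MUST clause excludes the product; each matching SHOULD clause only raises the score.

```java
// Toy model of MUST / SHOULD semantics. This is NOT Lucene's scoring formula;
// every name below is hypothetical and exists only for this illustration.
class MustShouldSketch {

    /** A product with a code, a name and a description. */
    static final class Product {
        final String code;
        final String name;
        final String info;

        Product(String code, String name, String info) {
            this.code = code;
            this.name = name;
            this.info = info;
        }
    }

    /**
     * Returns -1 when the MUST clause (code equality) fails, meaning the
     * product is not part of the result at all. Otherwise returns 1 for the
     * MUST match plus 1 for every SHOULD clause (name, info) that matches.
     */
    static int score(Product p, String mustCode, String keyword) {
        if (!p.code.equals(mustCode)) {
            return -1; // MUST failed: excluded regardless of SHOULD matches
        }
        int s = 1;
        if (p.name.contains(keyword)) {
            s++; // SHOULD on the name matched
        }
        if (p.info.contains(keyword)) {
            s++; // SHOULD on the description matched
        }
        return s;
    }

    public static void main(String[] args) {
        Product kettle = new Product("2", "electric kettle", "Jiefang brand electric kettle");
        Product plate = new Product("4", "plate", "very good quality");

        System.out.println(score(kettle, "2", "kettle")); // 3: MUST plus both SHOULD clauses
        System.out.println(score(plate, "4", "kettle"));  // 1: MUST only, still a result
        System.out.println(score(plate, "2", "kettle"));  // -1: MUST failed
    }
}
```

The plate is still returned when its own code is required, it just scores lower than the kettle. That is the behavior the requirement above relies on.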


Here is the relevant code:

package com.lucene.test;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;

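/**
 * 
 * @author kernaling.wong
 *	This class only demonstrates that SHOULD clauses do not affect which results appear,
 *	but do affect their relevance score; this can be used to implement relevance-sorted searches.
 */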
public class LuceneTest {
	private static String INDEX_PATH = "./Index/";
	public static void main(String[] args) {

		if(false){					// change this to true when building the index
			IndexWriter writer = null;
			try{
				writer = new IndexWriter(INDEX_PATH,new StandardAnalyzer(),MaxFieldLength.LIMITED);
				
				Document doc = new Document();
				Field fieldCode = new Field("Code","1",Field.Store.YES,Field.Index.ANALYZED);
				Field fieldName = new Field("Name","电热炉",Field.Store.YES,Field.Index.ANALYZED);
				Field fieldInfo = new Field("Info","这是一个电热炉",Field.Store.YES,Field.Index.ANALYZED);
				doc.add(fieldCode);
				doc.add(fieldName);
				doc.add(fieldInfo);
				writer.addDocument(doc);
				
				doc = new Document();
				fieldCode = new Field("Code","2",Field.Store.YES,Field.Index.ANALYZED);
				fieldName = new Field("Name","电水壶",Field.Store.YES,Field.Index.ANALYZED);
				fieldInfo = new Field("Info","解放牌电水壶",Field.Store.YES,Field.Index.ANALYZED);
				doc.add(fieldCode);
				doc.add(fieldName);
				doc.add(fieldInfo);
				writer.addDocument(doc);
				
				doc = new Document();
				fieldCode = new Field("Code","3",Field.Store.YES,Field.Index.ANALYZED);
				fieldName = new Field("Name","水杯",Field.Store.YES,Field.Index.ANALYZED);
				fieldInfo = new Field("Info","钢化水杯",Field.Store.YES,Field.Index.ANALYZED);
				doc.add(fieldCode);
				doc.add(fieldName);
				doc.add(fieldInfo);
				writer.addDocument(doc);
				
				doc = new Document();
				fieldCode = new Field("Code","4",Field.Store.YES,Field.Index.ANALYZED);
				fieldName = new Field("Name","碟子",Field.Store.YES,Field.Index.ANALYZED);
				fieldInfo = new Field("Info","质量很好",Field.Store.YES,Field.Index.ANALYZED);
				doc.add(fieldCode);
				doc.add(fieldName);
				doc.add(fieldInfo);
				writer.addDocument(doc);
				
				System.out.println("Index built!");
			}catch(Exception ex){
				ex.printStackTrace();
			}finally{
				if(writer != null){
					try{
						writer.close();
					}catch(Exception ex){
						ex.printStackTrace();
					}
				}
			}
		}else{
			
			IndexSearcher search = null;
			IndexReader ir = null;
			try{
				search = new IndexSearcher(INDEX_PATH);
				
				String keyword = "水";		// the keyword the user entered
				
				BooleanQuery bq = new BooleanQuery();
				
				/**
				 * The query conditions follow. Whatever keyword the user enters, all products
				 * of paid members must be shown, sorted from high to low relevance.
				 */
				BooleanQuery tmpBQ = new BooleanQuery();			// sub-query for one paid member's product code
				tmpBQ.add(new TermQuery((new Term("Code","1"))),BooleanClause.Occur.MUST);
				tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
				tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
				bq.add(tmpBQ,BooleanClause.Occur.SHOULD);
				
				tmpBQ = new BooleanQuery();
				tmpBQ.add(new TermQuery((new Term("Code","2"))),BooleanClause.Occur.MUST);
				tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
				tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
				bq.add(tmpBQ,BooleanClause.Occur.SHOULD);
				
				tmpBQ = new BooleanQuery();
				tmpBQ.add(new TermQuery((new Term("Code","3"))),BooleanClause.Occur.MUST);
				tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
				tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
				bq.add(tmpBQ,BooleanClause.Occur.SHOULD);
				
				tmpBQ = new BooleanQuery();
				tmpBQ.add(new TermQuery((new Term("Code","4"))),BooleanClause.Occur.MUST);
				tmpBQ.add(new TermQuery((new Term("Name",keyword))),BooleanClause.Occur.SHOULD);
				tmpBQ.add(new TermQuery((new Term("Info",keyword))),BooleanClause.Occur.SHOULD);
				bq.add(tmpBQ,BooleanClause.Occur.SHOULD);
				
				
				
				Sort sort = new Sort(SortField.FIELD_SCORE);		// sort by relevance score
				TopFieldDocs tdocs = search.search(bq,null,4,sort);
				
				ScoreDoc scoreDocs[] = tdocs.scoreDocs;
				ir = search.getIndexReader();
				for(int i=0;i<scoreDocs.length;i++){
					Document tmpDoc = ir.document(scoreDocs[i].doc);
					System.out.println("Score:"+scoreDocs[i].score+"\tProduct code:"+tmpDoc.get("Code")+"\tProduct name:"+tmpDoc.get("Name")+"\tDescription:"+tmpDoc.get("Info"));
				}
				
			}catch(Exception ex){
				ex.printStackTrace();
			}finally{
				try{
					if(search != null){
						search.close();
					}
					
					if(ir != null){
						ir.close();
					}
				}catch(Exception ex){
					ex.printStackTrace();
				}
				
			}
		}
		
		
	}
}
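
The four sub-queries in the listing above all have the same shape, so they could be generated in a loop over the member codes. The sketch below does not touch Lucene at all: it only prints the logical structure of the query in Lucene's query-parser notation, where `+` marks a MUST clause and an unprefixed clause is SHOULD. The class and method names are hypothetical, and the keyword is an English stand-in.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: builds a textual description of the nested query,
// not an actual Lucene Query object. All names here are hypothetical.
class QueryShapeSketch {

    /**
     * Describes one sub-query per member code. Inside each sub-query the
     * code is required ("+Code:x") while the Name and Info clauses are
     * optional; the sub-queries themselves are joined as optional clauses.
     */
    static String describe(List<String> memberCodes, String keyword) {
        StringBuilder sb = new StringBuilder();
        for (String code : memberCodes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("(+Code:").append(code)
              .append(" Name:").append(keyword)
              .append(" Info:").append(keyword)
              .append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Arrays.asList("1", "2", "3", "4"), "water"));
    }
}
```

In the real program the loop body would create one BooleanQuery per code and add it to bq with BooleanClause.Occur.SHOULD, exactly as the four hand-written blocks do.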


The test environment is Windows XP + JDK 1.6 with Lucene 2.4.
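After running it, the sorted results should look like this: the products whose name or description contains the keyword (codes 2 and 3) come first, and the unrelated products (codes 1 and 4) are still returned but rank last.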


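Afterword
    One caveat about the approach above: once there are a large number of paid members, that is, once you need a BooleanClause.Occur.SHOULD clause for a great many member codes, I recommend not using this method any more. BooleanQuery also limits how many clauses can be added to it. The default limit is 1024 (adding more throws BooleanQuery.TooManyClauses), although you can raise it manually with BooleanQuery.setMaxClauseCount(int maxClauseCount). With the approach above, when the number of registered users is large, say 100,000 or more, search performance becomes the bottleneck.

    The solution I have come up with is to change the fields in the Lucene index: stop adding a SHOULD clause for each individual paid member, and instead add one field that marks whether a product belongs to a paid (registered) member. A query similar to the one above can then implement the feature much more efficiently.

    The original method was something I came up with in a rush. Back then there were not many paid users, so implementing it that way caused no problems, but I have since switched to the second method. When you cannot change Lucene's source code or algorithms for the time being, all you can do is change your own approach.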
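
The second method can be sketched in plain Java as follows. Again this is a toy model and not Lucene code: the `paid` flag stands in for the extra index field (its name is my assumption, the post does not give one), a single MUST on that flag replaces the per-member sub-queries, and the two SHOULD clauses only influence the order. The number of clauses stays fixed no matter how many paid members exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model of the "paid flag" approach. Not Lucene's scoring formula;
// the Product type and the field name "paid" are hypothetical.
class PaidFlagSketch {

    static final class Product {
        final String code;
        final boolean paid;
        final String name;
        final String info;

        Product(String code, boolean paid, String name, String info) {
            this.code = code;
            this.paid = paid;
            this.name = name;
            this.info = info;
        }
    }

    /** Counts how many of the two SHOULD clauses (name, info) match. */
    static int shouldMatches(Product p, String keyword) {
        int n = 0;
        if (p.name.contains(keyword)) {
            n++;
        }
        if (p.info.contains(keyword)) {
            n++;
        }
        return n;
    }

    /**
     * Keeps only paid products (the single MUST clause) and orders them by
     * the number of SHOULD matches, highest first. The sort is stable, so
     * ties keep their input order.
     */
    static List<String> rank(List<Product> products, String keyword) {
        List<Product> hits = new ArrayList<Product>();
        for (Product p : products) {
            if (p.paid) {
                hits.add(p);
            }
        }
        hits.sort(Comparator.comparingInt((Product p) -> shouldMatches(p, keyword)).reversed());
        List<String> codes = new ArrayList<String>();
        for (Product p : hits) {
            codes.add(p.code);
        }
        return codes;
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
            new Product("1", true, "electric heater", "an electric heater"),
            new Product("2", true, "water kettle", "Jiefang brand water kettle"),
            new Product("3", true, "water cup", "tempered glass cup"),
            new Product("5", false, "water bottle", "a water bottle"));
        System.out.println(rank(products, "water")); // [2, 3, 1]
    }
}
```

Product 5 mentions the keyword twice but is dropped because it fails the required flag, while product 1 matches nothing and is still listed last.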
