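I have read several books that discuss the clause parameters of BooleanQuery in Lucene: BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT, and BooleanClause.Occur.SHOULD. The relationship between them is much like intersection, union, and similar operations on sets. I won't repeat the books' examples here; instead I'll describe how I have used these clauses in my day-to-day work.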
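The books mostly say that when MUST and SHOULD are used together, the result set is the same as with MUST alone. What matters for this post is that the SHOULD clauses still affect each hit's relevance score. The other day I had to build a search whose results are sorted by relevance. The requirement, briefly: after the user enters a keyword, all products from paid members should be recommended to the user, sorted from highest to lowest relevance. We already have the list of paid-member codes, and we know each member's product name and product description. So how do we sort by the keyword the user typed? Keep in mind that the keyword may be completely unrelated to a paid member's products; those unrelated products must still be returned, only ranked last.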
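The idea can be illustrated without Lucene at all. The following is a minimal sketch in plain Java, not Lucene's actual scoring formula: the class `ShouldRankingSketch`, the `Product` type, and the "one point per matching field" score are all assumptions made for illustration. Membership in the result list is decided only by the MUST part (being a paid-member product), while the SHOULD part (keyword hits in the name or description) only changes the order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: it mimics "MUST decides membership, SHOULD adds score".
// The scoring rule here is hypothetical and much simpler than Lucene's.
class ShouldRankingSketch {

    // Hypothetical stand-in for an indexed product document.
    static final class Product {
        final String code;
        final String name;
        final String info;

        Product(String code, String name, String info) {
            this.code = code;
            this.name = name;
            this.info = info;
        }
    }

    // SHOULD part: each field containing the keyword adds one point.
    static int score(Product p, String keyword) {
        int s = 0;
        if (p.name.contains(keyword)) s++;
        if (p.info.contains(keyword)) s++;
        return s;
    }

    // MUST part: every paid-member product is a hit, whatever the keyword.
    // The hits are then ordered by score, highest first (ties keep input order).
    static List<String> rank(List<Product> paidProducts, String keyword) {
        List<Product> hits = new ArrayList<>(paidProducts);
        hits.sort(Comparator.comparingInt((Product p) -> score(p, keyword)).reversed());
        List<String> codes = new ArrayList<>();
        for (Product p : hits) {
            codes.add(p.code);
        }
        return codes;
    }

    // The same four sample products that the Lucene listing indexes.
    static List<Product> sampleProducts() {
        List<Product> products = new ArrayList<>();
        products.add(new Product("1", "电热炉", "这是一个电热炉"));
        products.add(new Product("2", "电水壶", "解放牌电水壶"));
        products.add(new Product("3", "水杯", "钢化水杯"));
        products.add(new Product("4", "碟子", "质量很好"));
        return products;
    }

    public static void main(String[] args) {
        // Products 2 and 3 mention the keyword; 1 and 4 do not but are still listed.
        System.out.println(rank(sampleProducts(), "水"));
    }
}
```

In this toy model an unrelated keyword still returns all four products; they simply all tie with a score of zero.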
Here is the relevant Lucene code:
package com.lucene.test;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;

/**
 * @author kernaling.wong
 * This class only demonstrates that SHOULD does not affect which results appear,
 * but does affect their relevance score. This can be used to implement
 * relevance-sorted searches.
 */
public class LuceneTest {
    private static String INDEX_PATH = "./Index/";

    public static void main(String[] args) {
        if (false) { // change this to true when building the index
            IndexWriter writer = null;
            try {
                writer = new IndexWriter(INDEX_PATH, new StandardAnalyzer(), MaxFieldLength.LIMITED);

                Document doc = new Document();
                Field fieldCode = new Field("Code", "1", Field.Store.YES, Field.Index.ANALYZED);
                Field fieldName = new Field("Name", "电热炉", Field.Store.YES, Field.Index.ANALYZED);
                Field fieldInfo = new Field("Info", "这是一个电热炉", Field.Store.YES, Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                doc = new Document();
                fieldCode = new Field("Code", "2", Field.Store.YES, Field.Index.ANALYZED);
                fieldName = new Field("Name", "电水壶", Field.Store.YES, Field.Index.ANALYZED);
                fieldInfo = new Field("Info", "解放牌电水壶", Field.Store.YES, Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                doc = new Document();
                fieldCode = new Field("Code", "3", Field.Store.YES, Field.Index.ANALYZED);
                fieldName = new Field("Name", "水杯", Field.Store.YES, Field.Index.ANALYZED);
                fieldInfo = new Field("Info", "钢化水杯", Field.Store.YES, Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                doc = new Document();
                fieldCode = new Field("Code", "4", Field.Store.YES, Field.Index.ANALYZED);
                fieldName = new Field("Name", "碟子", Field.Store.YES, Field.Index.ANALYZED);
                fieldInfo = new Field("Info", "质量很好", Field.Store.YES, Field.Index.ANALYZED);
                doc.add(fieldCode);
                doc.add(fieldName);
                doc.add(fieldInfo);
                writer.addDocument(doc);

                System.out.println("索引建立完成!");
            } catch (Exception ex) {
                ex.printStackTrace();
            } finally {
                if (writer != null) {
                    try {
                        writer.close();
                    } catch (Exception ex) {
                        ex.printStackTrace();
                    }
                }
            }
        } else {
            IndexSearcher search = null;
            IndexReader ir = null;
            try {
                search = new IndexSearcher(INDEX_PATH);
                String keyword = "水"; // the keyword the user typed

                BooleanQuery bq = new BooleanQuery();
                /**
                 * The query conditions follow. Whatever keyword the user enters,
                 * every paid-member product must be shown, ordered from highest
                 * to lowest relevance.
                 */
                // One sub-query per paid-member code: MUST on the code keeps the product
                // in the results, SHOULD on the keyword only raises its score.
                BooleanQuery tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code", "1"))), BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name", keyword))), BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info", keyword))), BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ, BooleanClause.Occur.SHOULD);

                tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code", "2"))), BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name", keyword))), BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info", keyword))), BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ, BooleanClause.Occur.SHOULD);

                tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code", "3"))), BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name", keyword))), BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info", keyword))), BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ, BooleanClause.Occur.SHOULD);

                tmpBQ = new BooleanQuery();
                tmpBQ.add(new TermQuery((new Term("Code", "4"))), BooleanClause.Occur.MUST);
                tmpBQ.add(new TermQuery((new Term("Name", keyword))), BooleanClause.Occur.SHOULD);
                tmpBQ.add(new TermQuery((new Term("Info", keyword))), BooleanClause.Occur.SHOULD);
                bq.add(tmpBQ, BooleanClause.Occur.SHOULD);

                Sort sort = new Sort(SortField.FIELD_SCORE); // sort by relevance
                TopFieldDocs tdocs = search.search(bq, null, 4, sort);
                ScoreDoc scoreDocs[] = tdocs.scoreDocs;
                ir = search.getIndexReader();
                for(int i=0;i
The test environment is Windows XP + JDK 1.6 with Lucene 2.4.
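With this implementation the sorted results should look like this: for the keyword "水", products 2 and 3 contain the keyword in both Name and Info, so they should rank first; products 1 and 4 do not contain it and should rank last, but they still appear in the results.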
Postscript
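One caveat about the approach above: once there are a large number of paid members, that is, once a great many member codes each need their own BooleanClause.Occur.SHOULD sub-query, I no longer recommend it. BooleanQuery limits the number of clauses that can be added through BooleanQuery.add. The default limit is 1024, and although it can be raised manually with setMaxClauseCount(int maxCount), with my approach search performance becomes the bottleneck once the number of members reaches, say, 100,000 or more. The solution I have come up with is to change the fields in the Lucene index: instead of adding a SHOULD sub-query for every single paid member, add one field that marks whether the product belongs to a paid member, and then build a query similar to the one above on that field. This achieves the same result much more efficiently. The original approach was something I thought up in a hurry, at a time when there were not many paid members, so it worked without any problems; I have since switched to the second approach. When you cannot, for the time being, change Lucene's source code or algorithms, the only option is to change your own approach.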
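To make the scaling argument concrete, here is a small arithmetic sketch in plain Java. It does not call Lucene; the class and method names are hypothetical, and the counts simply follow the query shapes described in this post: one top-level sub-query of three term clauses per paid member in the first approach, versus one MUST clause on a flag field plus two SHOULD keyword clauses in the second.

```java
// Arithmetic sketch only: counts clauses, does not build real Lucene queries.
class ClauseCountSketch {

    // Default BooleanQuery clause limit, as stated in the text.
    static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;

    // First approach: one SHOULD sub-query per paid member at the top level.
    static long perMemberTopLevelClauses(long paidMembers) {
        return paidMembers;
    }

    // First approach: each sub-query holds 3 term queries (Code, Name, Info).
    static long perMemberTermQueries(long paidMembers) {
        return paidMembers * 3;
    }

    // Second approach (assumed shape): MUST on the flag field,
    // SHOULD on Name, SHOULD on Info. Independent of the member count.
    static long flagFieldClauses() {
        return 3;
    }

    static boolean exceedsDefaultLimit(long clauses) {
        return clauses > DEFAULT_MAX_CLAUSE_COUNT;
    }

    public static void main(String[] args) {
        long members = 100_000;
        System.out.println("per-member top-level clauses: " + perMemberTopLevelClauses(members));
        System.out.println("per-member term queries: " + perMemberTermQueries(members));
        System.out.println("exceeds default limit: "
                + exceedsDefaultLimit(perMemberTopLevelClauses(members)));
        System.out.println("flag-field clauses: " + flagFieldClauses());
    }
}
```

The point of the comparison is that the second query stays the same size however many paid members exist, so the clause limit never needs to be raised.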