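edismax supports a boost parameter whose function value is multiplied into the score, whereas dismax only has bf, whose effect is additive. When results are ranked on several dimensions, the relevance score should really be just one of those dimensions, and an additive combination is awkward to tune.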
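To make the difference concrete, here is a small hypothetical Java sketch. The class and method names are invented for illustration, and no Solr API is involved. It applies the same function value of 2.0 in two ways, additively as bf does and multiplicatively as boost does, at two very different score scales:

```java
// Hypothetical illustration (not Solr code): the same function value applied
// additively (bf-style) and multiplicatively (boost-style).
final class BoostCombination {

    // bf-style: the function value is added to the relevance score
    static double additive(double score, double functionValue) {
        return score + functionValue;
    }

    // boost-style: the relevance score is multiplied by the function value
    static double multiplicative(double score, double functionValue) {
        return score * functionValue;
    }

    public static void main(String[] args) {
        double functionValue = 2.0; // e.g. a hypothetical recency function
        for (double score : new double[] {0.5, 50.0}) {
            System.out.println("score=" + score
                + " additive=" + additive(score, functionValue)
                + " multiplicative=" + multiplicative(score, functionValue));
        }
    }
}
```

With these numbers, the additive version turns a score of 0.5 into 2.5, which is five times larger. It turns a score of 50 into 52, which is only 4% larger. The multiplicative version doubles both. This independence from the score's scale is what makes boost easier to tune.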
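The dismax implementation is fairly simple and easy to follow. edismax is its enhanced version, but in practice it changes quite a lot, for example in the areas described below.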
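First, the main idea behind dismax parsing:
Start by reading the search field names and their boosts from qf. If qf is empty, fall back to the schema's default search field.
Everything is ultimately parsed into a single BooleanQuery.
The main query (mainQuery) is parsed and added first. The boost queries (bq) and boost functions (bf) are added after it.
The main code makes this clearer: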
@Override
public Query parse() throws ParseException {
  SolrParams solrParams = SolrParams.wrapDefaults(localParams, params);

  queryFields = SolrPluginUtils.parseFieldBoosts(solrParams.getParams(DisMaxParams.QF));
  if (0 == queryFields.size()) {
    queryFields.put(req.getSchema().getDefaultSearchFieldName(), 1.0f);
  }

  /* the main query we will execute.  we disable the coord because
   * this query is an artificial construct
   */
  BooleanQuery query = new BooleanQuery(true);

  boolean notBlank = addMainQuery(query, solrParams);
  if (!notBlank)
    return null;
  addBoostQuery(query, solrParams);
  addBoostFunctions(query, solrParams);

  return query;
}
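The qf handling at the top of parse() can be sketched on its own. The following is a simplified, hypothetical stand-in for SolrPluginUtils.parseFieldBoosts plus the default-field fallback. Treating a field with no ^boost as 1.0 is an assumption of this sketch, not a claim about what Solr itself stores:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the qf parsing in parse(): "title^2 body" -> {title=2.0, body=1.0}.
// Assumption: a field without ^boost gets 1.0 here.
final class QfSketch {

    static Map<String, Float> parseFieldBoosts(String qf, String defaultField) {
        Map<String, Float> fields = new LinkedHashMap<>();
        if (qf != null) {
            for (String token : qf.trim().split("\\s+")) {
                if (token.isEmpty()) continue; // "".split(...) yields one empty token
                int caret = token.indexOf('^');
                if (caret < 0) {
                    fields.put(token, 1.0f);
                } else {
                    fields.put(token.substring(0, caret),
                               Float.parseFloat(token.substring(caret + 1)));
                }
            }
        }
        // Mirrors parse(): with no qf, fall back to the schema's default search field
        if (fields.isEmpty()) {
            fields.put(defaultField, 1.0f);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseFieldBoosts("title^2 body", "text")); // {title=2.0, body=1.0}
        System.out.println(parseFieldBoosts("", "text"));             // {text=1.0}
    }
}
```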
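edismax follows broadly the same approach as dismax. The main differences are described below.
When the query string uses +, -, OR, or NOT syntax, edismax ignores mm (minimum should match).
The main code is below. It first counts the +, -, OR, and NOT operators in the query string. A lowercase "or" also counts when lowercaseOperators is enabled: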
// defer escaping and only do if lucene parsing fails, or we need phrases
// parsing fails.  Need to sloppy phrase queries anyway though.
List<Clause> clauses = null;
int numPluses = 0;
int numMinuses = 0;
int numOR = 0;
int numNOT = 0;

clauses = splitIntoClauses(userQuery, false);
for (Clause clause : clauses) {
  if (clause.must == '+') numPluses++;
  if (clause.must == '-') numMinuses++;
  if (clause.isBareWord()) {
    String s = clause.val;
    if ("OR".equals(s)) {
      numOR++;
    } else if ("NOT".equals(s)) {
      numNOT++;
    } else if (lowercaseOperators && "or".equals(s)) {
      numOR++;
    }
  }
}

// mm (default "100%") is applied only when none of the operators above were found
boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0;
if (parsedUserQuery != null && doMinMatched) {
  String minShouldMatch = solrParams.get(DisMaxParams.MM, "100%");
  if (parsedUserQuery instanceof BooleanQuery) {
    SolrPluginUtils.setMinShouldMatch((BooleanQuery) parsedUserQuery, minShouldMatch);
  }
}
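Here is a simplified sketch of that decision. It assumes a whitespace-only tokenizer, whereas Solr's splitIntoClauses also handles quotes, fields, and escaping. The sketch shows which queries keep mm. AND is not counted, so "ipod AND nano" still gets mm:

```java
import java.util.List;

// Hypothetical simplification of the operator count above.
// Assumption: tokens are split on whitespace only (unlike splitIntoClauses).
final class MmDecision {

    static boolean doMinMatched(String userQuery, boolean lowercaseOperators) {
        int numPluses = 0, numMinuses = 0, numOR = 0, numNOT = 0;
        for (String token : userQuery.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            char first = token.charAt(0);
            if (first == '+') { numPluses++; continue; }  // required clause
            if (first == '-') { numMinuses++; continue; } // prohibited clause
            // bare words: only OR and NOT (and "or" if enabled) count; AND does not
            if ("OR".equals(token)) numOR++;
            else if ("NOT".equals(token)) numNOT++;
            else if (lowercaseOperators && "or".equals(token)) numOR++;
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        for (String q : List.of("ipod nano", "+ipod nano", "ipod OR nano", "ipod AND nano", "ipod or nano")) {
            System.out.println(q + " -> mm applied: " + doMinMatched(q, false));
        }
    }
}
```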
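Phrase queries (pf, pf2, pf3) are also handled differently. edismax first collects the ordinary clauses and skips the following:
fielded clauses;
clauses that are already phrase queries;
the bare keywords OR, AND, NOT, and TO.
Because edismax accepts query strings in Lucene syntax, it cannot take the dismax approach. That approach simply strips the \" characters from the query string and wraps the whole thing in double quotes.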
// find non-field clauses
List<Clause> normalClauses = new ArrayList<Clause>(clauses.size());
for (Clause clause : clauses) {
  if (clause.field != null || clause.isPhrase) continue;
  // check for keywords "AND,OR,TO"
  if (clause.isBareWord()) {
    String s = clause.val.toString();
    // avoid putting explicit operators in the phrase query
    if ("OR".equals(s) || "AND".equals(s) || "NOT".equals(s) || "TO".equals(s)) continue;
  }
  normalClauses.add(clause);
}

// full phrase...
addShingledPhraseQueries(query, normalClauses, phraseFields, 0, tiebreaker, pslop);
// shingles...
addShingledPhraseQueries(query, normalClauses, phraseFields2, 2, tiebreaker, pslop);
addShingledPhraseQueries(query, normalClauses, phraseFields3, 3, tiebreaker, pslop);
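The clause filter can be modelled with a hypothetical, stripped-down Clause class. It keeps only the properties the loop inspects and is an illustration, not Solr's own Clause type:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the normalClauses filter; this Clause is NOT Solr's class.
final class NormalClauseFilter {

    static final class Clause {
        final String field;     // null when the clause has no "field:" prefix
        final boolean isPhrase; // true for "quoted phrases"
        final boolean bareWord; // true for a plain word with no +/- and no field
        final String val;

        Clause(String field, boolean isPhrase, boolean bareWord, String val) {
            this.field = field;
            this.isPhrase = isPhrase;
            this.bareWord = bareWord;
            this.val = val;
        }
    }

    static List<String> normalClauseValues(List<Clause> clauses) {
        List<String> normal = new ArrayList<>();
        for (Clause clause : clauses) {
            // fielded clauses and existing phrases are not used for pf/pf2/pf3
            if (clause.field != null || clause.isPhrase) continue;
            // explicit operators must not end up inside a phrase
            if (clause.bareWord && List.of("OR", "AND", "NOT", "TO").contains(clause.val)) continue;
            normal.add(clause.val);
        }
        return normal;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(null, false, true, "ipod"),         // ipod
            new Clause(null, false, true, "OR"),           // OR
            new Clause("title", false, false, "nano"),     // title:nano
            new Clause(null, true, false, "black case"),   // "black case"
            new Clause(null, false, false, "red"));        // +red
        System.out.println(normalClauseValues(clauses));   // [ipod, red]
    }
}
```

For the clauses ipod, OR, title:nano, "black case", and +red, only ipod and red survive. The fielded clause, the phrase, and the operator are dropped. The required clause +red is kept because it is neither fielded nor a bare keyword.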
For comparison, this is how dismax builds its phrase query:
protected Query getPhraseQuery(String userQuery, SolrPluginUtils.DisjunctionMaxQueryParser pp)
    throws ParseException {
  String userPhraseQuery = userQuery.replace("\"", "");
  return pp.parse("\"" + userPhraseQuery + "\"");
}
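The dismax approach is easy to reproduce with plain strings. This hypothetical helper copies the string handling of getPhraseQuery. It shows what that handling would do to a Lucene-syntax query of the kind edismax accepts: the field prefix, OR, and - would all end up inside a single phrase:

```java
// Hypothetical helper copying the string handling of dismax's getPhraseQuery():
// remove every '"' and wrap the remainder in quotes.
final class DismaxPhraseString {

    static String dismaxPhrase(String userQuery) {
        String userPhraseQuery = userQuery.replace("\"", "");
        return "\"" + userPhraseQuery + "\"";
    }

    public static void main(String[] args) {
        System.out.println(dismaxPhrase("ipod \"nano black\""));  // "ipod nano black"
        System.out.println(dismaxPhrase("title:ipod OR -nano")); // "title:ipod OR -nano"
    }
}
```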
The edismax counterpart is addShingledPhraseQueries. It joins each run of shingleSize consecutive clauses into a quoted phrase, or builds a single phrase over all clauses when shingleSize is 0. It then parses those phrases as a dismax query over the given phrase fields and adds the result to the main query as a SHOULD clause:
private void addShingledPhraseQueries(final BooleanQuery mainQuery,
                                      final List<Clause> clauses,
                                      final Map<String,Float> fields,
                                      int shingleSize,
                                      final float tiebreaker,
                                      final int slop)
    throws ParseException {

  if (null == fields || fields.isEmpty() ||
      null == clauses || clauses.size() <= shingleSize)
    return;

  if (0 == shingleSize) shingleSize = clauses.size();

  final int goat = shingleSize - 1; // :TODO: better name for var?

  StringBuilder userPhraseQuery = new StringBuilder();
  for (int i = 0; i < clauses.size() - goat; i++) {
    userPhraseQuery.append('"');
    for (int j = 0; j <= goat; j++) {
      userPhraseQuery.append(clauses.get(i + j).val);
      userPhraseQuery.append(' ');
    }
    userPhraseQuery.append('"');
    userPhraseQuery.append(' ');
  }

  ExtendedSolrQueryParser pp = new ExtendedSolrQueryParser(this, IMPOSSIBLE_FIELD_NAME);
  pp.addAlias(IMPOSSIBLE_FIELD_NAME, tiebreaker, fields);
  pp.setPhraseSlop(slop);
  pp.setRemoveStopFilter(true); // remove stop filter and keep stopwords
  pp.makeDismax = true;
  pp.minClauseSize = 2;

  Query phrase = pp.parse(userPhraseQuery.toString());
  if (phrase != null) {
    mainQuery.add(phrase, BooleanClause.Occur.SHOULD);
  }
}
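The following plain-string sketch of the shingling loop shows which phrases pf, pf2, and pf3 produce. Plain words stand in for Clause objects. Trimming each phrase and returning a list instead of one query string are simplifications of this sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the phrase strings addShingledPhraseQueries() builds,
// using plain words instead of Clause objects.
final class ShingleSketch {

    static List<String> shinglePhrases(List<String> words, int shingleSize) {
        List<String> phrases = new ArrayList<>();
        // same guard as the original: only continue if there are MORE words than shingleSize
        if (words.size() <= shingleSize) return phrases;
        if (shingleSize == 0) shingleSize = words.size(); // pf: one phrase over all words
        int goat = shingleSize - 1;
        for (int i = 0; i < words.size() - goat; i++) {
            phrases.add(String.join(" ", words.subList(i, i + shingleSize)));
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "ipod", "nano", "black");
        System.out.println(shinglePhrases(words, 0)); // [apple ipod nano black]
        System.out.println(shinglePhrases(words, 2)); // [apple ipod, ipod nano, nano black]
        System.out.println(shinglePhrases(words, 3)); // [apple ipod nano, ipod nano black]
    }
}
```

Because of the clauses.size() <= shingleSize guard, pf2 contributes nothing for a two-word query and pf3 contributes nothing for a three-word query. In those cases the only phrase that could be built spans the whole query, which is what pf handles.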
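Another important edismax feature is the boost query, set with the boost parameter.
Like bq and bf, it does not change the number of hits, only the ranking. Its job is to multiply its value into the final score. The functions are parsed in much the same way as bf.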
//
// create a boosted query (scores multiplied by boosts)
//
Query topQuery = query;
multBoosts = solrParams.getParams("boost");
if (multBoosts != null && multBoosts.length > 0) {

  List<ValueSource> boosts = new ArrayList<ValueSource>();
  for (String boostStr : multBoosts) {
    if (boostStr == null || boostStr.length() == 0) continue;
    Query boost = subQuery(boostStr, FunctionQParserPlugin.NAME).getQuery();
    ValueSource vs;
    if (boost instanceof FunctionQuery) {
      vs = ((FunctionQuery) boost).getValueSource();
    } else {
      vs = new QueryValueSource(boost, 1.0f);
    }
    boosts.add(vs);
  }

  if (boosts.size() > 1) {
    ValueSource prod = new ProductFloatFunction(boosts.toArray(new ValueSource[boosts.size()]));
    topQuery = new BoostedQuery(query, prod);
  } else if (boosts.size() == 1) {
    topQuery = new BoostedQuery(query, boosts.get(0));
  }
}
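The arithmetic that BoostedQuery and ProductFloatFunction end up doing for each document can be written down directly. This is a minimal sketch that assumes each boost function has already been evaluated to a plain number for one document. The class and method are hypothetical:

```java
// Hypothetical per-document model of the code above: several boost values are
// multiplied together (ProductFloatFunction-style) and then multiplied into the
// main query score (BoostedQuery-style). Assumes the functions are already evaluated.
final class MultiBoost {

    static double boostedScore(double queryScore, double... boostValues) {
        if (boostValues.length == 0) return queryScore; // no boost param: topQuery = query
        double product = 1.0;
        for (double v : boostValues) {
            product *= v;
        }
        return queryScore * product;
    }

    public static void main(String[] args) {
        System.out.println(boostedScore(2.0));           // 2.0
        System.out.println(boostedScore(2.0, 3.0));      // 6.0
        System.out.println(boostedScore(2.0, 3.0, 0.5)); // 3.0
        System.out.println(boostedScore(2.0, 0.0));      // 0.0
    }
}
```

A boost that evaluates to 0 therefore zeroes a document's score. The document should still count as a hit, which fits the point above that boost affects ranking rather than the number of results.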
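BoostedQuery itself is simple. It multiplies the wrapped query's score (together with the query weight) by the function's value for the current document. The core score method looks like this: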
public float score() throws IOException {
  float score = qWeight * scorer.score() * vals.floatVal(scorer.docID());
  return score > Float.NEGATIVE_INFINITY ? score : -Float.MAX_VALUE;
}
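The clamping in the last line is easy to overlook. This sketch reproduces the same float arithmetic, with plain values passed in for qWeight, the sub-scorer's score, and the function value. The class name is hypothetical:

```java
// Hypothetical sketch of the arithmetic in score() above, using plain floats
// instead of a Scorer and FunctionValues.
final class BoostedScoreSketch {

    static float score(float qWeight, float subScore, float functionValue) {
        float score = qWeight * subScore * functionValue;
        // -Infinity and NaN both fail this comparison and become -Float.MAX_VALUE
        return score > Float.NEGATIVE_INFINITY ? score : -Float.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(score(1.0f, 2.0f, 3.0f));                                         // 6.0
        System.out.println(score(1.0f, 2.0f, Float.NEGATIVE_INFINITY) == -Float.MAX_VALUE);  // true
        System.out.println(score(1.0f, 0.0f, Float.POSITIVE_INFINITY) == -Float.MAX_VALUE);  // true
    }
}
```

Besides -Infinity, a NaN product (for example 0 × Infinity) also fails the > Float.NEGATIVE_INFINITY comparison. It is therefore mapped to -Float.MAX_VALUE as well.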
When reposting, please credit the source: http://blog.csdn.net/duck_genuine/article/details/8060026