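Spell checking (SpellCheck)
First, configure solrconfig.xml. The file may already contain the two elements below (add them if it does not); adjust them to suit your own environment.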
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- The field whose index is used as the basis for spell checking; here it is the field called name -->
    <str name="field">name</str>
    <!-- Directory of the spellcheck index -->
    <str name="spellcheckIndexDir">spellchecker</str>
    <!-- Build the spellcheck index on commit (spell checking only works once the index has been built) -->
    <!-- Alternatively, build it on optimize by replacing "buildOnCommit" with "buildOnOptimize" -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <!-- Default parameters -->
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <!-- Number of suggestions to return (increase as needed) -->
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Once this is configured, the index has to be rebuilt before it takes effect. Then request http://localhost:8080/solr/spell?q=name:王麻字&spellcheck=true
The query result is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="王麻字">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>王麻子</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>
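On the client side, the request and the response above can be handled with nothing but the JDK. The following is a minimal sketch, not part of the original setup: the class name SpellResponseSketch and both method names are made up for illustration, and the base URL is simply the one used in the example request. It URL-encodes the query (which matters for non-ASCII terms such as 王麻字) and collects every <str> found inside an <arr name="suggestion"> element.

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only; not part of Solr.
final class SpellResponseSketch {

    // Builds the /spell request URL, percent-encoding the query as UTF-8.
    static String buildSpellUrl(String baseUrl, String query)
            throws UnsupportedEncodingException {
        return baseUrl + "/spell?q=" + URLEncoder.encode(query, "UTF-8")
                + "&spellcheck=true";
    }

    // Returns the text of every <str> inside an <arr name="suggestion">.
    static List<String> parseSuggestions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<String>();
        NodeList arrays = doc.getElementsByTagName("arr");
        for (int i = 0; i < arrays.getLength(); i++) {
            Element arr = (Element) arrays.item(i);
            if (!"suggestion".equals(arr.getAttribute("name"))) {
                continue; // some other array in the response
            }
            NodeList strs = arr.getElementsByTagName("str");
            for (int j = 0; j < strs.getLength(); j++) {
                result.add(strs.item(j).getTextContent().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // \u738b\u9ebb\u5b57 is the misspelled term from the example request.
        System.out.println(buildSpellUrl("http://localhost:8080/solr",
                "name:\u738b\u9ebb\u5b57"));
        // \u738b\u9ebb\u5b50 is the suggested correction from the example response.
        String xml = "<response><lst name=\"spellcheck\"><lst name=\"suggestions\">"
                + "<lst name=\"\u738b\u9ebb\u5b57\"><int name=\"numFound\">1</int>"
                + "<arr name=\"suggestion\"><str>\u738b\u9ebb\u5b50</str></arr>"
                + "</lst></lst></lst></response>";
        System.out.println(parseSuggestions(xml));
    }
}
```

The parser deliberately ignores which misspelled term a suggestion belongs to; if that matters, walk the <lst> children of suggestions instead.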
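Sometimes we need to spell-check against several fields, but the configuration above accepts only one. To get the same effect we have to take a different route and use copyField. For example, suppose schema.xml defines: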
<field name="a" .../>
<field name="b" .../>
<field name="ab" multiValued="true" .../>
<copyField source="a" dest="ab" />
<copyField source="b" dest="ab" />
Then set the field of the SpellCheckComponent to ab.
To use Solr's SpellCheck feature, the following configuration is required:
1. Add the following snippet at the end of solrconfig.xml:
<!-- spell -->
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- Optional, it is required when more than one spellchecker is configured.
         Select non-default name with spellcheck.dictionary in request handler.
         (With a single spellchecker the name can be omitted; with several,
         the request handler must pick one through spellcheck.dictionary.) -->
    <str name="name">default</str>
    <!-- The classname is optional, defaults to IndexBasedSpellChecker -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- Load tokens from the following field for spell checking,
         analyzer for the field's type as defined in schema.xml are used.
         (This field is the basis of the check: user input is checked against it.) -->
    <str name="field">name_t</str>
    <!-- Optional, by default use in-memory index (RAMDirectory).
         Location of the spellcheck index files; here ./spellchecker1
         resolves to corex\data\spellchecker1 -->
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <!-- Set the accuracy (float) to be used for the suggestions. Default is 0.5 -->
    <str name="accuracy">0.7</str>
    <!-- When to build the spellcheck index: buildOnCommit/buildOnOptimize -->
    <str name="buildOnCommit">true</str>
  </lst>

  <!-- Another spellchecker, using the JaroWinklerDistance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">name_t</str>
    <!-- Use a different Distance Measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
    <str name="buildOnCommit">true</str>
  </lst>

  <!-- Another spellchecker, using the contents of a file as the basis of the check -->
  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
    <str name="buildOnCommit">true</str>
  </lst>

  <!-- This field type's analyzer is used by the QueryConverter to tokenize
       the value for "q" parameter -->
  <str name="queryAnalyzerFieldType">text</str>
</searchComponent>

<!-- The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens.
     Uses a simple regular expression to strip off field markup, boosts, ranges, etc.
     but it is not guaranteed to match an exact parse from the query parser.
     Optional, defaults to solr.SpellingQueryConverter -->
<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>

<!-- Add to a RequestHandler
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
     NOTE: YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.
     THIS IS DONE HERE SOLELY FOR THE SIMPLICITY OF THE EXAMPLE.
     YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
     (The handler below is not required; it is only a simple example, and the same
     settings can be placed in the standard request handler.)
     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
    <str name="spellcheck.dictionary">file</str>
    <!-- omp = Only More Popular -->
    <str name="spellcheck.onlyMorePopular">true</str>
    <!-- exr = Extended Results -->
    <str name="spellcheck.extendedResults">true</str>
    <!-- The number of suggestions to return -->
    <str name="spellcheck.count">1</str>
  </lst>
  <!-- Add to a RequestHandler
       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
       REPEAT NOTE: YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.
       THIS IS DONE HERE SOLELY FOR THE SIMPLICITY OF THE EXAMPLE.
       YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
       !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
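2. If you use the file-based checker, add the spelling suggestions to spellings.txt (the file named in sourceLocation above), one suggestion per line.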
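3. After changing the configuration, rebuild the index. Directories will then appear in the index directory, one for each SpellChecker configured in the spellcheck component (with the spellcheckIndexDir values above: spellchecker1, spellchecker2 and spellcheckerFile).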
4. Add the following method to the page that needs spell checking:
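import java.util.ArrayList;
import java.util.Collection;

import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.SpellCheckComponent;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrRequestHandler;
// In Solr 1.x this class lives in org.apache.solr.request instead.
import org.apache.solr.response.SolrQueryResponse;

/**
 * Gets a spelling suggestion from a core.
 *
 * @param keyword  the user's input to check
 * @param coreName name of the core to query
 * @return the collated suggestion, or an empty collection if there is none
 * @throws Exception if the request fails
 */
private Collection<String> getSpellCheckFromCore(String keyword, String coreName) throws Exception {
    Collection<String> suggestion = new ArrayList<String>();
    // SearchManager is this application's own helper that holds the CoreContainer.
    CoreContainer container = SearchManager.getCoreContainer();
    // getCore() increments the core's reference count, so close it when done.
    SolrCore core = container.getCore(coreName);
    try {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.add(CommonParams.QT, "/spell");
        // Rebuilds the dictionary on every call; this can be dropped when buildOnCommit is enabled.
        params.add(SpellCheckComponent.SPELLCHECK_BUILD, "true");
        params.add(CommonParams.Q, keyword);
        // COMPONENT_NAME is "spellcheck", so this sets spellcheck=true.
        params.add(SpellCheckComponent.COMPONENT_NAME, "true");
        params.add(SpellCheckComponent.SPELLCHECK_COLLATE, "true");

        SolrRequestHandler handler = core.getRequestHandler("/spell");
        SolrQueryResponse rsp = new SolrQueryResponse();
        rsp.add("responseHeader", new SimpleOrderedMap());
        LocalSolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            handler.handleRequest(req, rsp);
        } finally {
            req.close();
        }

        NamedList values = rsp.getValues();
        NamedList spellCheck = (NamedList) values.get("spellcheck");
        if (spellCheck == null) {
            return suggestion;
        }
        NamedList suggestions = (NamedList) spellCheck.get("suggestions");
        if (suggestions == null) {
            return suggestion;
        }
        Boolean correctlySpelled = (Boolean) suggestions.get("correctlySpelled");
        if (correctlySpelled == null) {
            String collation = (String) suggestions.get("collation");
            // Avoid adding null when no collation was produced.
            if (collation != null) {
                suggestion.add(collation);
            }
        }
        return suggestion;
    } finally {
        core.close();
    }
}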
The returned result can be displayed directly at the appropriate place on the page.
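The accuracy setting in step 1 is easier to reason about with numbers. The sketch below is an illustration only, not Solr or Lucene code: the class AccuracySketch and its methods are hypothetical, and it assumes the checker scores a candidate as 1 - editDistance / max(length), which is how Lucene's Levenshtein-based measure is usually described, and keeps candidates whose score is at least the accuracy. Under that assumption, a three-character term with one wrong character scores about 0.67, so it would pass the default 0.5 but not the 0.7 configured above.

```java
// Hypothetical illustration of an accuracy threshold; not Solr/Lucene code.
final class AccuracySketch {

    // Classic dynamic-programming Levenshtein edit distance over chars.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring: 1 - distance / longer length, in the range [0, 1].
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f; // two empty strings are identical
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    // A candidate is kept when its score reaches the configured accuracy.
    static boolean passes(String input, String candidate, float accuracy) {
        return similarity(input, candidate) >= accuracy;
    }

    public static void main(String[] args) {
        String typed = "\u738b\u9ebb\u5b57";     // the misspelled example term
        String indexed = "\u738b\u9ebb\u5b50";   // the intended term
        System.out.println(similarity(typed, indexed));
        System.out.println(passes(typed, indexed, 0.5f));
        System.out.println(passes(typed, indexed, 0.7f));
    }
}
```

If short terms such as names are the main use case, this suggests trying a lower accuracy or a different distanceMeasure and checking the results against real queries.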