Solr Spell Checking (spellCheck)

First, configure solrconfig.xml. The file may already contain the two elements below (add them if it does not); adjust them to suit your own environment.

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <!-- The field whose indexed terms spell checking is based on. Here it is the field named name. -->
      <str name="field">name</str>
      <!-- Directory for the spellcheck index -->
      <str name="spellcheckIndexDir">spellchecker</str>
      <!-- Build the spellcheck index on commit. (Spell checking only works once the index has been built.) -->
      <!-- Alternatively, build it on optimize by replacing "buildOnCommit" with "buildOnOptimize". -->
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <!-- Default parameters -->
    <lst name="defaults">
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <!-- Number of suggestions to return (increase as needed) -->
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

 

 

After changing the configuration, rebuild the index for it to take effect. Then send a request such as http://localhost:8080/solr/spell?q=name:王麻字&spellcheck=true
The response looks like this:

<?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <result name="response" numFound="0" start="0"/>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="王麻字">
          <int name="numFound">1</int>
          <int name="startOffset">0</int>
          <int name="endOffset">3</int>
          <arr name="suggestion">
            <str>王麻子</str>
          </arr>
        </lst>
      </lst>
    </lst>
  </response>
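
A client has to do two things with this exchange: percent-encode the query (王麻字 cannot go into the URL raw) and pull the suggestions out of the XML. The following is a minimal sketch using only the JDK. The class and method names are hypothetical, the HTTP fetch itself is left out, and the parser assumes the plain response format shown above (spellcheck.extendedResults=false), where each suggestion is a str inside an arr named suggestion.

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper; class and method names are illustrative only. */
class SpellcheckClientSketch {

    /** Builds the /spell request URL, percent-encoding the query as UTF-8. */
    static String buildUrl(String solrBase, String query) throws UnsupportedEncodingException {
        return solrBase + "/spell?q=" + URLEncoder.encode(query, "UTF-8") + "&spellcheck=true";
    }

    /** Collects the text of every str element found under an arr named "suggestion". */
    static List<String> extractSuggestions(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> suggestions = new ArrayList<>();
        NodeList arrays = doc.getElementsByTagName("arr");
        for (int i = 0; i < arrays.getLength(); i++) {
            Element arr = (Element) arrays.item(i);
            if (!"suggestion".equals(arr.getAttribute("name"))) {
                continue; // some other array in the response
            }
            NodeList values = arr.getElementsByTagName("str");
            for (int j = 0; j < values.getLength(); j++) {
                suggestions.add(values.item(j).getTextContent().trim());
            }
        }
        return suggestions;
    }

    public static void main(String[] args) throws Exception {
        // The query from the request above, written with Unicode escapes.
        System.out.println(buildUrl("http://localhost:8080/solr", "name:\u738B\u9EBB\u5B57"));

        // A trimmed-down copy of the response above.
        String xml = "<response><lst name=\"spellcheck\"><lst name=\"suggestions\">"
                + "<lst name=\"\u738B\u9EBB\u5B57\"><arr name=\"suggestion\">"
                + "<str>\u738B\u9EBB\u5B50</str></arr></lst>"
                + "</lst></lst></response>";
        System.out.println(extractSuggestions(xml));
    }
}
```

If spellcheck.extendedResults is switched on, the suggestion entries carry extra structure, so the extraction step would need adjusting.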

 

 

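Sometimes spell checking needs to draw on several fields, but the configuration above accepts only one. To get the same effect we have to take a different route and use copyField. For example, suppose schema.xml defines: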

<field name="a" .../> 
<field name="b" .../>
<field name="ab" multiValued="true" .../>
<copyField source="a" dest="ab" /> 
<copyField source="b" dest="ab" />

 

 


Then set the field of the SpellCheckComponent to ab.


To use Solr's SpellCheck feature, the following configuration is required:
1. Add the following fragment at the end of solrconfig.xml:

<!-- spell -->
<searchComponent name="spellcheck" class="org.apache.solr.handler.component.SpellCheckComponent">

    <lst name="spellchecker">
      <!--
           Optional; required when more than one spellchecker is configured.
           Select a non-default name with spellcheck.dictionary in the request handler.
      -->
      <str name="name">default</str>
      <!-- The classname is optional, defaults to IndexBasedSpellChecker -->
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <!--
           Load tokens from the following field for spell checking; the analyzer
           for the field's type as defined in schema.xml is used.
           This is the field that user input is checked against.
      -->
      <str name="field">name_t</str>
      <!--
           Location of the spellcheck index. Optional; by default an in-memory
           index (RAMDirectory) is used.
           ./spellchecker1 refers to corex\data\spellchecker1
      -->
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <!-- Set the accuracy (float) to be used for the suggestions. Default is 0.5 -->
      <str name="accuracy">0.7</str>
      <!-- When to build the spellcheck index: buildOnCommit or buildOnOptimize -->
      <str name="buildOnCommit">true</str>
    </lst>

    <!-- Another spellchecker, using the Jaro-Winkler distance measure -->
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">name_t</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
      <str name="buildOnCommit">true</str>
    </lst>

    <!-- Another spellchecker, using the contents of a file as its dictionary -->
    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
      <str name="buildOnCommit">true</str>
    </lst>

    <!-- This field type's analyzer is used by the QueryConverter to tokenize the value for "q" parameter -->
    <str name="queryAnalyzerFieldType">text</str>

</searchComponent>

<!--
  The SpellingQueryConverter to convert raw (CommonParams.Q) queries into tokens.  Uses a simple regular expression
  to strip off field markup, boosts, ranges, etc. but it is not guaranteed to match an exact parse from the query parser.

  Optional, defaults to solr.SpellingQueryConverter
-->
<queryConverter name="queryConverter" class="solr.SpellingQueryConverter"/>



<!--  Add to a RequestHandler
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NOTE:  YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.  THIS IS DONE HERE SOLELY FOR
THE SIMPLICITY OF THE EXAMPLE.  YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->
<requestHandler name="/spell" class="solr.SearchHandler">
    <lst name="defaults">
      <!-- Optional, must match spell checker's name as defined above, defaults to "default" -->
      <str name="spellcheck.dictionary">file</str>
      <!-- omp = Only More Popular -->
      <str name="spellcheck.onlyMorePopular">true</str>
      <!-- exr = Extended Results -->
      <str name="spellcheck.extendedResults">true</str>
      <!--  The number of suggestions to return -->
      <str name="spellcheck.count">1</str>
    </lst>
<!--  Add to a RequestHandler
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
REPEAT NOTE:  YOU LIKELY DO NOT WANT A SEPARATE REQUEST HANDLER FOR THIS COMPONENT.  THIS IS DONE HERE SOLELY FOR
THE SIMPLICITY OF THE EXAMPLE.  YOU WILL LIKELY WANT TO BIND THE COMPONENT TO THE /select STANDARD REQUEST HANDLER.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-->
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
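
The accuracy value of 0.7 in the default spellchecker above is easier to judge with numbers in hand. The sketch below is not Lucene's code. It assumes the score is a normalized Levenshtein similarity, 1 - editDistance / length of the longer word, and that a candidate is kept when its score is at least the configured accuracy. That is my understanding of the default distance measure, so treat it as an approximation. The class and method names are hypothetical.

```java
/** Illustrative sketch only; not Lucene's implementation. */
class AccuracySketch {

    /** Classic dynamic-programming Levenshtein edit distance, two rows at a time. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means the two strings are identical. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / longest;
    }

    /** True when the candidate scores at least the configured accuracy. */
    static boolean passes(String input, String candidate, double accuracy) {
        return similarity(input, candidate) >= accuracy;
    }

    public static void main(String[] args) {
        System.out.println(passes("spelcheck", "spellcheck", 0.7)); // one edit in ten characters
        System.out.println(passes("abc", "abd", 0.7));              // one edit in three characters
    }
}
```

Under these assumptions, a single wrong character in a three-character term scores about 0.67 and falls below 0.7. Short terms such as personal names may therefore need a lower accuracy than longer words.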

 

 


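2. If you use the file-based checker, add the spelling suggestions to spellings.txt (the file named by sourceLocation above), one suggestion per line.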

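3. After changing the configuration files, rebuild the index. The index directory will then contain one directory for each SpellChecker in the spellcheck component, holding that checker's index files (spellchecker1, spellchecker2 and spellcheckerFile in the configuration above).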

4. Add the following method to the page that needs spell checking:

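// Imports this method needs. SolrQueryResponse is in org.apache.solr.request in
// Solr 1.4 and in org.apache.solr.response from Solr 3.x on.
import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.SpellCheckComponent;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.request.SolrRequestHandler;

/**
 * Gets a spelling suggestion from the given core.
 *
 * @param keyword  the user's search input
 * @param coreName the name of the core to query
 * @return the collated suggestion, or an empty collection if there is none
 * @throws Exception if the request fails
 */
private Collection<String> getSpellCheckFromCore(String keyword, String coreName) throws Exception {
    Collection<String> suggestion = new ArrayList<String>();
    CoreContainer container = SearchManager.getCoreContainer();
    SolrCore core = container.getCore(coreName); // takes a reference that must be released
    try {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.add(CommonParams.QT, "/spell");
        // Forces an index rebuild on every call; can be dropped when buildOnCommit is true.
        params.add(SpellCheckComponent.SPELLCHECK_BUILD, "true");
        params.add(CommonParams.Q, keyword);
        params.add(SpellCheckComponent.COMPONENT_NAME, "true"); // spellcheck=true
        params.add(SpellCheckComponent.SPELLCHECK_COLLATE, "true");

        SolrRequestHandler handler = core.getRequestHandler("/spell");
        SolrQueryResponse rsp = new SolrQueryResponse();
        rsp.add("responseHeader", new SimpleOrderedMap());
        LocalSolrQueryRequest req = new LocalSolrQueryRequest(core, params);
        try {
            handler.handleRequest(req, rsp);
        } finally {
            req.close();
        }

        NamedList values = rsp.getValues();
        NamedList spellCheck = (NamedList) values.get("spellcheck");
        NamedList suggestions = (NamedList) spellCheck.get("suggestions");
        Boolean correctlySpelled = (Boolean) suggestions.get("correctlySpelled");
        String collation = (String) suggestions.get("collation");
        // Offer the collation unless Solr reports the input as correctly spelled;
        // skip it when no collation was produced, so the result never contains null.
        if (!Boolean.TRUE.equals(correctlySpelled) && collation != null) {
            suggestion.add(collation);
        }
    } finally {
        core.close(); // release the reference taken by getCore
    }
    return suggestion;
}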

 

 

The returned result can be displayed directly in the appropriate place on the page.
