Elasticsearch 2.4.4 Custom Dictionary & Synonym Configuration

Custom dictionary:

1. Add the dictionary

mkdir -p elasticsearch-2.4.4/plugins/analysis-ik/config/custom

vi elasticsearch-2.4.4/plugins/analysis-ik/config/custom/ext_word.dic

博世
bosch

Notes:

1. One word per line

2. Encode the file as UTF-8 without a BOM
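A minimal shell sketch of this step, assuming the Elasticsearch directory is given by a hypothetical ES_HOME variable (default elasticsearch-2.4.4). It writes the dictionary with printf, which emits no BOM, names the file ext_word.dic so that it matches the ext_dict entry in the IK configuration, and aborts if the file turns out to start with a BOM.

```shell
#!/usr/bin/env bash
# Sketch: create the IK extension dictionary as UTF-8 without a BOM.
# ES_HOME is a hypothetical variable; point it at your ES 2.4.4 directory.
set -eu
ES_HOME="${ES_HOME:-elasticsearch-2.4.4}"
DICT_DIR="$ES_HOME/plugins/analysis-ik/config/custom"
DICT_FILE="$DICT_DIR/ext_word.dic"
mkdir -p "$DICT_DIR"

# printf writes exactly these bytes: one word per line, no BOM.
printf '%s\n' '博世' 'bosch' > "$DICT_FILE"

# Abort if the file starts with the UTF-8 BOM (bytes EF BB BF),
# e.g. after it has been edited with a Windows editor.
if [ "$(head -c 3 "$DICT_FILE" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
  echo "$DICT_FILE starts with a BOM; remove it before restarting ES" >&2
  exit 1
fi
```

If the file was created in another editor, the same head/od check is a quick way to spot a BOM before restarting.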

2. Modify the IK configuration: register the new dictionary in IKAnalyzer.cfg.xml

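<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">custom/ext_word.dic;custom/single_word_low_freq.dic</entry>
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
</properties>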

3. Restart ES
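A path in ext_dict or ext_stopwords that does not match a file on disk is easy to miss. Before restarting, a sketch like the following can confirm that every listed dictionary exists. It assumes, as the custom/ layout above suggests, that the paths are relative to the directory holding IKAnalyzer.cfg.xml; the function name check_ik_dicts is hypothetical, and the sed extraction only handles one entry per line, as in the file above.

```shell
#!/usr/bin/env bash
# Sketch: check that every file listed in ext_dict and ext_stopwords exists.
# Assumption: entries are relative to the directory holding IKAnalyzer.cfg.xml.
check_ik_dicts() {
  local cfg_dir="$1" missing=0 key paths p
  local -a files
  for key in ext_dict ext_stopwords; do
    # Extract the text between <entry key="KEY"> and </entry>.
    paths="$(sed -n "s|.*<entry key=\"$key\">\(.*\)</entry>.*|\1|p" "$cfg_dir/IKAnalyzer.cfg.xml")"
    # Multiple dictionaries are separated by semicolons.
    IFS=';' read -r -a files <<< "$paths"
    for p in "${files[@]}"; do
      if [ -n "$p" ] && [ ! -f "$cfg_dir/$p" ]; then
        echo "missing dictionary: $cfg_dir/$p" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Usage: check_ik_dicts elasticsearch-2.4.4/plugins/analysis-ik/config (a non-zero return means at least one file is missing).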

Synonym configuration:

1. Add the synonym dictionary:

mkdir -p elasticsearch-2.4.4/config/analysis

vi elasticsearch-2.4.4/config/analysis/synonym.txt

博世,bosch

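Notes:

1. One synonym group per line, with the terms separated by commas

2. Encode the file as UTF-8 without a BOM

3. ES must be restarted after synonym.txt is modified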
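A sketch of this step in shell, assuming the same hypothetical ES_HOME variable (default elasticsearch-2.4.4). It writes synonym.txt with printf (no BOM) and flags any non-blank line that holds only one term, since such a line defines no synonym.

```shell
#!/usr/bin/env bash
# Sketch: write the synonym file used by synonym_filter and sanity-check it.
# ES_HOME is a hypothetical variable; point it at your ES 2.4.4 directory.
set -eu
ES_HOME="${ES_HOME:-elasticsearch-2.4.4}"
SYN_FILE="$ES_HOME/config/analysis/synonym.txt"
mkdir -p "$(dirname "$SYN_FILE")"

# One synonym group per line, terms separated by commas (printf adds no BOM).
printf '%s\n' '博世,bosch' > "$SYN_FILE"

# Report non-blank lines with a single comma-separated term:
# such a line defines no synonym and is most likely a typo.
awk -F',' 'NF == 1 { printf "line %d has a single term: %s\n", NR, $0; bad = 1 }
           END { exit bad }' "$SYN_FILE"
```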

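2. Modify the index configuration

Create a new business index, 2_syn, and add the synonym token filter synonym_filter to its analyzers. synonyms_path is resolved relative to the ES config directory, so analysis/synonym.txt refers to the file created above.

The settings are as follows: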

{
  "index": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "special_character_spliter": {
          "type": "word_delimiter",
          "preserve_original": "true"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "charSplit": {
          "filter": [
            "lowercase",
            "synonym_filter"
          ],
          "type": "custom",
          "tokenizer": "ngram_tokenizer"
        },
        "optik_smart": {
          "filter": [
            "lowercase",
            "light_english_stemmer",
            "special_character_spliter",
            "synonym_filter"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        },
        "optik": {
          "filter": [
            "lowercase",
            "light_english_stemmer",
            "special_character_spliter",
            "synonym_filter"
          ],
          "type": "custom",
          "tokenizer": "ik"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "token_chars": [
            "letter",
            "digit",
            "punctuation"
          ],
          "min_gram": "1",
          "type": "nGram",
          "max_gram": "30"
        }
      }
    }
  }
}
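These settings take effect when the index is created. A minimal curl sketch, assuming the JSON above is saved to a hypothetical file 2_syn_settings.json and ES listens on ES_URL (default http://localhost:9200). The helper wraps the {"index": ...} object in the "settings" key of the create-index request; the Content-Type header is not required by ES 2.x but is harmless.

```shell
#!/usr/bin/env bash
# Sketch: create the 2_syn index from a saved settings file.
# Assumptions: ES_URL and the settings file name are placeholders.
ES_URL="${ES_URL:-http://localhost:9200}"

# Wrap the {"index": {...}} object in the "settings" key.
wrap_settings() {
  printf '{"settings": %s}\n' "$(cat "$1")"
}

# create_index INDEX SETTINGS_FILE: PUT /<index> with the wrapped settings.
create_index() {
  local index="$1" settings_file="$2"
  wrap_settings "$settings_file" |
    curl -sS -XPUT "$ES_URL/$index?pretty" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

Usage: create_index 2_syn 2_syn_settings.json. The analyzers only apply to fields whose mappings reference them.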

 

3. Test the synonyms

GET /2_syn/_analyze?analyzer=optik&pretty=true&text=博世

Result:

{
  "tokens": [
    {
      "token": "博世",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "bosch",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
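The same request can be sent from a shell. A sketch assuming ES_URL (default http://localhost:9200); curl's -G turns the --data-urlencode pairs into a URL-encoded query string, so non-ASCII text such as 博世 is sent safely. The token extraction is a grep/sed shortcut, not a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: run _analyze with curl and print one token per line.
# ES_URL is an assumption.
ES_URL="${ES_URL:-http://localhost:9200}"

# Extract the "token" values from an _analyze JSON response on stdin.
# Tokens containing double quotes are not handled.
list_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/.*: *"\(.*\)"$/\1/'
}

# analyze INDEX ANALYZER TEXT
analyze() {
  curl -sS -G "$ES_URL/$1/_analyze" \
    --data-urlencode "analyzer=$2" --data-urlencode "text=$3" | list_tokens
}
```

Usage: analyze 2_syn optik 博世. When the synonyms are loaded, both 博世 and bosch should appear, as in the result above.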

 

4. Migrate the data

Copy the documents from the old index 2 into 2_syn with the Reindex API:

POST _reindex
{
  "source": {
    "index": "2"
  },
  "dest": {
    "index": "2_syn"
  }
}
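A curl version of the reindex step, as a sketch assuming ES_URL (default http://localhost:9200). After the copy it refreshes the destination index and prints both document counts from _count, which gives a quick check that nothing was lost (counts will differ if the source is still receiving writes).

```shell
#!/usr/bin/env bash
# Sketch: reindex SOURCE into DEST, then compare document counts.
# ES_URL is an assumption.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _reindex request body.
reindex_body() {
  printf '{"source": {"index": "%s"}, "dest": {"index": "%s"}}\n' "$1" "$2"
}

# Print the "count" field of GET /<index>/_count.
doc_count() {
  curl -sS "$ES_URL/$1/_count" | grep -o '"count" *: *[0-9]*' | grep -o '[0-9]*$'
}

# reindex SOURCE DEST
reindex() {
  reindex_body "$1" "$2" |
    curl -sS -XPOST "$ES_URL/_reindex?pretty" \
      -H 'Content-Type: application/json' --data-binary @-
  # Make the copied documents visible to _count before comparing.
  curl -sS -XPOST "$ES_URL/$2/_refresh" > /dev/null
  echo "source: $(doc_count "$1")  dest: $(doc_count "$2")"
}
```

Usage: reindex 2 2_syn.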

 

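Issues:

1. synonym.txt is not reloaded automatically, so ES must be restarted after the synonym dictionary is modified

2. The synonym filter only sees the tokens produced by the tokenizer, so a word that IK does not segment as a single token can never be matched to its synonyms. Such words must also be added to the custom dictionary.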
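The second issue can be checked up front. A sketch, assuming ES_URL (default http://localhost:9200) and that IK is registered under the tokenizer name ik, as used in the settings above: for each word in the custom dictionary it calls _analyze with that tokenizer and reports words that do not come back as a token of their own. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list custom-dictionary words that IK does not emit as a whole token.
# ES_URL is an assumption; run after restarting ES with the new dictionary.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeeds if the _analyze JSON on stdin contains a token exactly equal to $1.
# grep/sed shortcut rather than a JSON parser; quotes inside tokens are not handled.
has_token() {
  grep -o '"token" *: *"[^"]*"' | sed 's/.*: *"\(.*\)"$/\1/' | grep -qxF -- "$1"
}

# check_dict_words DICT_FILE: test each non-empty line with the ik tokenizer.
check_dict_words() {
  local word
  while IFS= read -r word; do
    [ -n "$word" ] || continue
    if ! curl -sS -G "$ES_URL/_analyze" --data-urlencode "tokenizer=ik" \
         --data-urlencode "text=$word" < /dev/null | has_token "$word"; then
      echo "IK does not emit '$word' as a single token" >&2
    fi
  done < "$1"
}
```

Usage: check_dict_words elasticsearch-2.4.4/plugins/analysis-ik/config/custom/ext_word.dic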
