ES Search: Mapping Design for Full-Pinyin and First-Letter Pinyin Prefix Matching

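1. Create the index and design the mapping

Full pinyin and first-letter pinyin need two separate fields. I first tried to handle both with a single field, but no configuration I tried met the requirement.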

PUT aikg_test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "full_pinyin": {
            "type": "text",
            "store": false,
            "term_vector": "with_offsets",
            "analyzer": "full_pinyin_analyzer",
            "boost": 10
          },
          "first_pinyin": {
            "type": "text",
            "store": false,
            "term_vector": "with_offsets",
            "analyzer": "first_pinyin_analyzer",
            "boost": 10
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "first_pinyin_analyzer": {
          "tokenizer": "first_pinyin_letter"
        },
        "full_pinyin_analyzer": {
          "tokenizer": "full_pinyin_letter"
        }
      },
      "tokenizer": {
        "first_pinyin_letter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_none_chinese": false, 
          "keep_none_chinese_in_first_letter": true,
          "none_chinese_pinyin_tokenize": false
        },
        "full_pinyin_letter": {
          "type": "pinyin",
          "keep_first_letter": false,
          "keep_full_pinyin": false,
          "keep_none_chinese": true,
          "keep_none_chinese_in_first_letter": false,
          "none_chinese_pinyin_tokenize": false,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true
        }
      }
    }
  }
}

2. Tokenization examples

2.1 Full-pinyin tokenization

Tokenizer settings:

"full_pinyin_letter": {
  "type": "pinyin",
  "keep_first_letter": false,
  "keep_full_pinyin": false,
  "keep_none_chinese": true,
  "keep_none_chinese_in_first_letter": false,
  "none_chinese_pinyin_tokenize": false,
  "keep_joined_full_pinyin": true,
  "keep_none_chinese_in_joined_full_pinyin": true
}

Analyze request:

GET aikg_test/_analyze
{
  "text": ["刘德华at2016"],
  "analyzer": "full_pinyin_analyzer"
}

Tokenization result:

[Screenshot: _analyze output for full_pinyin_analyzer; image not preserved]

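Key parameters: "keep_joined_full_pinyin": true and "keep_none_chinese_in_joined_full_pinyin": true. The former joins the full pinyin of all the Chinese characters into one token, and the latter appends the non-Chinese characters to that joined token. Also note "keep_full_pinyin": false, which stops the pinyin of each character from being emitted as a separate token.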

2.2 First-letter pinyin tokenization

Tokenizer settings:

"first_pinyin_letter": {
  "type": "pinyin",
  "keep_first_letter": true,
  "keep_full_pinyin": false,
  "keep_none_chinese": false,
  "keep_none_chinese_in_first_letter": true,
  "none_chinese_pinyin_tokenize": false
}

Analyze request:

GET aikg_test/_analyze
{
  "text": ["刘德华at2016"],
  "analyzer": "first_pinyin_analyzer"
}

Tokenization result:

[Screenshot: _analyze output for first_pinyin_analyzer; image not preserved]

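Key parameter: "keep_none_chinese": false. If it is set to true, "刘德华at2016" is split into two tokens, with the non-Chinese part as a token of its own. A prefix search for at then matches this name, even though the name does not start with at. The tokenization with the value set to true looked like this:

[Screenshot: _analyze output with "keep_none_chinese": true; image not preserved]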

With "keep_none_chinese_in_first_letter": true, the first letters of the Chinese characters and the other characters are joined into a single token.
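To see why the extra token matters: a prefix query is a term-level query, so a document matches as soon as any one of its indexed tokens starts with the typed prefix. The sketch below mimics that rule in plain Python. The token lists are assumptions inferred from the description above, not captured _analyze output, and prefix_hits is a hypothetical helper, not part of Elasticsearch.

```python
def prefix_hits(tokens, prefix):
    """Return True if any indexed token starts with the prefix.

    A simplified model of how a term-level prefix query decides a
    match; this is not Elasticsearch code.
    """
    return any(token.startswith(prefix) for token in tokens)


# Assumed tokens for "刘德华at2016" (illustrative, not real analyzer output).
joined_only = ["ldhat2016"]                 # keep_none_chinese: false
with_extra_token = ["ldhat2016", "at2016"]  # keep_none_chinese: true

print(prefix_hits(joined_only, "ldh"))      # True
print(prefix_hits(joined_only, "at"))       # False
print(prefix_hits(with_extra_token, "at"))  # True: the unwanted match
```

Under these assumptions, only the single joined token keeps a search for at from matching a name that starts with ldh.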

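3. Case sensitivity

An uppercase query such as "LDH" does not match 刘德华, because the indexed pinyin tokens are lowercase. The fix is simple: convert the query string to lowercase in the application before searching.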
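A minimal sketch of that normalization, assuming the search is issued as a prefix query against the two sub-fields defined in the mapping. The original post does not show its query, so the function name build_prefix_query and the bool/should shape are illustrative assumptions rather than the author's implementation.

```python
def build_prefix_query(user_input):
    """Build a search body that matches either pinyin sub-field by prefix."""
    # The typed prefix is not run through the pinyin analyzer, so it is
    # lowercased here to line up with the lowercase indexed tokens.
    prefix = user_input.strip().lower()
    return {
        "query": {
            "bool": {
                # Match if either the joined full pinyin or the
                # first-letter token starts with the prefix.
                "should": [
                    {"prefix": {"name.full_pinyin": {"value": prefix}}},
                    {"prefix": {"name.first_pinyin": {"value": prefix}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = build_prefix_query("LDH")
print(body["query"]["bool"]["should"][1])
```

The resulting dictionary can be sent as the JSON body of GET aikg_test/_search with whichever HTTP or Elasticsearch client the application already uses.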


For the other parameters, see the pinyin analysis plugin on GitHub: elasticsearch-analysis-pinyin
