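Today I used an address book to demo Elasticsearch (ES) search. For name search I wanted both Chinese characters and pinyin to work. So, in addition to ik, the Chinese analyzer I normally use, I also installed the pinyin analyzer plugin. Here is a summary of how I used them: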
ik:https://github.com/medcl/elasticsearch-analysis-ik
pinyin:https://github.com/medcl/elasticsearch-analysis-pinyin
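Unzip each downloaded release into the ES plugins directory. You can create a separate ik and pinyin folder for each. The plugin version must match your ES version exactly.
For some reason I could not download the pinyin zip directly, so I downloaded the source, built the package myself, and unzipped the result into plugins/pinyin. Restart ES afterwards so the plugins are loaded.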
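To confirm that both plugins were actually loaded, you can list the installed plugins from the Kibana console. This check is my addition, not a step from the original setup:

```json
GET _cat/plugins?v
```

Each node should show one row per plugin. If ik or pinyin is missing, check that its folder sits directly under plugins and that its version matches ES.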
Test the analyzer that the pinyin plugin provides:
GET _analyze?pretty
{
  "analyzer": "pinyin",
  "text": "刘德华"
}
Result:
{
  "tokens": [
    {
      "token": "liu",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "de",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 1
    },
    {
      "token": "hua",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    },
    {
      "token": "ldh",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    }
  ]
}
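For comparison, the same text can be run through ik. The analyzer name ik_max_word is the one the custom ik analyzer below is built on. According to the ik README, ik_max_word produces the finest-grained split and ik_smart the coarsest. I have not reproduced the output here:

```json
GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "刘德华"
}
```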
Analyzer configuration in the template's settings:
"analysis" : {
  "analyzer" : {
    "ik" : {
      "tokenizer" : "ik_max_word"
    },
    "pinyin_analyzer" : {
      "tokenizer" : "my_pinyin"
    }
  },
  "tokenizer" : {
    "my_pinyin" : {
      "keep_separate_first_letter" : "false",
      "lowercase" : "true",
      "type" : "pinyin",
      "limit_first_letter_length" : "16",
      "remove_duplicated_term" : "true",
      "keep_original" : "true",
      "keep_full_pinyin" : "true",
      "keep_joined_full_pinyin" : "true",
      "keep_none_chinese_in_joined_full_pinyin" : "true"
    }
  }
}
The options under my_pinyin are explained in the documentation at https://github.com/medcl/elasticsearch-analysis-pinyin. Configure them to suit your needs.
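To try option combinations before committing them to the template, `_analyze` also accepts an inline tokenizer definition, so no index is needed. This is a sketch that assumes your ES version supports inline definitions in `_analyze` (5.x and later do). The option names come from the plugin README. The particular values are only an example to vary, not a recommended setup:

```json
GET _analyze?pretty
{
  "tokenizer": {
    "type": "pinyin",
    "keep_first_letter": true,
    "keep_full_pinyin": false,
    "keep_joined_full_pinyin": true,
    "keep_original": true
  },
  "text": "刘德华"
}
```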
A single field can be indexed with several analyzers through multi-fields (fields):
"mappings": {
  "doc": {
    "properties": {
      "PERSON_ENAME": {
        "type" : "text",
        "fields" : {
          "ik" : { "type" : "text", "analyzer" : "ik" },
          "english" : { "type" : "text", "analyzer" : "english" },
          "standard" : { "type" : "text" }
        }
      },
      "CONTACTER_NAME": {
        "type" : "text",
        "fields" : {
          "ik" : { "type" : "text", "analyzer" : "ik" },
          "pinyin" : { "type" : "text", "analyzer" : "pinyin_analyzer" },
          "standard" : { "type" : "text" }
        }
      }
    }
  }
}
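Above, the analysis settings live in an index template. As a sketch, the request below instead combines both fragments into a single index-creation request for the sim index that is searched next. Creating the index directly rather than through a template is my assumption. The `doc` type level matches the 6.x-style mapping above. On ES 7.x and later, drop that level. The second request then checks pinyin_analyzer against the new index:

```json
PUT sim
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik": { "tokenizer": "ik_max_word" },
        "pinyin_analyzer": { "tokenizer": "my_pinyin" }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "lowercase": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "keep_original": true,
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "PERSON_ENAME": {
          "type": "text",
          "fields": {
            "ik": { "type": "text", "analyzer": "ik" },
            "english": { "type": "text", "analyzer": "english" },
            "standard": { "type": "text" }
          }
        },
        "CONTACTER_NAME": {
          "type": "text",
          "fields": {
            "ik": { "type": "text", "analyzer": "ik" },
            "pinyin": { "type": "text", "analyzer": "pinyin_analyzer" },
            "standard": { "type": "text" }
          }
        }
      }
    }
  }
}

GET sim/_analyze?pretty
{
  "analyzer": "pinyin_analyzer",
  "text": "刘德华"
}
```

Compared with the plain pinyin analyzer, keep_original and keep_joined_full_pinyin should add the original text and the joined full pinyin to the token list, going by the option descriptions in the plugin README.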
Search across multiple fields:
POST sim/doc/_search
{
  "query": {
    "multi_match" : {
      "query" : "dfbb",
      "fields" : [
        "PERSON_ENAME.ik",
        "PERSON_ENAME.standard",
        "PERSON_ENAME.english",
        "CONTACTER_NAME.ik",
        "CONTACTER_NAME.standard",
        "CONTACTER_NAME.pinyin"
      ]
    }
  }
}
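The query above uses a first-letter abbreviation. To check the other half of the goal (searching by Chinese characters), the same kind of request can be sent with Chinese input. This assumes a contact named 刘德华 has been indexed. The .ik and .standard subfields handle the characters directly. Because keep_original is true, .pinyin should also keep the original text as a token:

```json
POST sim/doc/_search
{
  "query": {
    "multi_match": {
      "query": "刘德华",
      "fields": [
        "CONTACTER_NAME.ik",
        "CONTACTER_NAME.standard",
        "CONTACTER_NAME.pinyin"
      ]
    }
  }
}
```

multi_match defaults to the best_fields type, so each hit is scored by whichever subfield matched best rather than by the sum over all subfields.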