Learning Elasticsearch (4): Sorting Chinese by Pinyin and Pinyin Search

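  • Using the pinyin analyzer
  1. Download the pinyin analysis plugin from https://github.com/medcl/elasticsearch-analysis-pinyin
  2. Unzip it and enter the elasticsearch-analysis-pinyin directory
  3. In pom.xml, change the Elasticsearch version to the version you are running
  4. Build the package from the command line with mvn package
  5. Go to elasticsearch-analysis-pinyin-master\target\releases and unzip elasticsearch-analysis-pinyin-7.7.0.zip
  6. Copy the unzipped files into plugins/pinyin under the Elasticsearch installation directory
  7. Restart Elasticsearch
  8. If errors are reported, fix them and repeat steps 4 to 7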
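Once Elasticsearch is back up, one way to confirm that the plugin was picked up is the cat plugins API. This is a minimal check, assuming it is run from the same console as the other requests in this post; the pinyin plugin should appear in the component column if it loaded.

```
GET /_cat/plugins?v
```

If the plugin is not listed, check the Elasticsearch startup log. A plugin built for a different version than the running node is a common cause, which is what step 3 guards against.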
  • Index settings
PUT /book
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "pinyin_analyzer": {
                        "tokenizer": "my_pinyin"
                    }
                }, 
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin", 
                        "keep_none_chinese": false,
                        "keep_full_pinyin": false,
                        "keep_joined_full_pinyin": true,
                        "keep_none_chinese_in_joined_full_pinyin": true,
                        "keep_first_letter": false,
                        "keep_none_chinese_in_first_letter": false,
                        "none_chinese_pinyin_tokenize": false
                    }
                }
            }
        }
    }
}

What each tokenizer option set above does:

keep_none_chinese: false. Do not emit non-Chinese letters or digits as tokens of their own (default: true).

keep_full_pinyin: false. Turns off the per-character output 刘德华 -> liu, de, hua (default: true).

keep_joined_full_pinyin: true. 刘德华 -> liudehua.

keep_none_chinese_in_joined_full_pinyin: true. 刘德华2016 -> liudehua2016.

keep_first_letter: false. When enabled, 刘德华 -> ldh.

keep_none_chinese_in_first_letter: false. When enabled, 刘德华2016 -> ldh2016.

none_chinese_pinyin_tokenize: false. Controls whether non-Chinese letters are split into pinyin terms when they spell pinyin; it has no practical use here.

Other options the plugin supports:

keep_separate_first_letter: keep the first letters as separate tokens, e.g. 刘德华 -> l, d, h (default: false).

limit_first_letter_length: maximum length of the first_letter result (default: 16).

lowercase: lowercase non-Chinese letters (default: true).
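To see what these options produce, the _analyze API can run the analyzer against a sample string. This is a small sketch, assuming the book index was created with the settings above. With only keep_joined_full_pinyin enabled, the response would be expected to contain a single liudehua token, but it is worth verifying against your own plugin version.

```
GET /book/_analyze
{
    "analyzer": "pinyin_analyzer",
    "text": "刘德华"
}
```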

  • Field mapping
POST /book/_mapping
{
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "text",
                    "analyzer": "pinyin_analyzer",
                    "fielddata": true
                }
            }
        },
        "author": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "text",
                    "analyzer": "pinyin_analyzer",
                    "fielddata": true
                }
            }
        }
    }
}

Note: "Only text fields support the analyzer mapping parameter", so the sort sub-field has to be of type text. Sorting on a text field also requires "fielddata": true on that field; without it Elasticsearch rejects the sort request.
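To try the sort, the index needs a few documents. The request below is only an illustration: the titles and authors are made-up sample data, chosen so that each title contains 测试.

```
POST /book/_bulk?refresh=true
{ "index": {} }
{ "title": "测试数据", "author": "张三" }
{ "index": {} }
{ "title": "中文测试", "author": "李四" }
{ "index": {} }
{ "title": "阿里测试", "author": "王五" }
```

The refresh=true parameter makes the documents visible to the next search immediately.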

  • Search
GET /book/_search
{
  "query": {
    "match": {
      "title": "测试"
    }
  },
  "from": 0,
  "size": 20,
  "sort": {
    "title.sort": "asc"
  }
}
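  • Pinyin search
    • Install the ik analyzer from https://github.com/medcl/elasticsearch-analysis-ik; the steps are the same as for the pinyin analyzer
    • Restart Elasticsearch after installing
    • Index settings (if the book index from the previous section still exists, delete it first with DELETE /book, because creating an index that already exists fails)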
PUT /book
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_smart_pinyin": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": ["my_pinyin_filter"]
                    },
                    "ik_max_word_pinyin": {
                        "type": "custom",
                        "tokenizer": "ik_max_word",
                        "filter": ["my_pinyin_filter"]
                    },
                    "pinyin_analyzer": {
                        "tokenizer": "my_pinyin_tokenizer"
                    }
                }, 
                "tokenizer": {
                    "my_pinyin_tokenizer": {
                        "type": "pinyin", 
                        "keep_first_letter": false, 
                        "keep_full_pinyin": false, 
                        "keep_joined_full_pinyin": true, 
                        "keep_none_chinese_in_first_letter": true, 
                        "none_chinese_pinyin_tokenize": false, 
                        "lowercase": true, 
                        "with_tone_number": true
                    }
                },
                "filter": {
                    "my_pinyin_filter": {
                        "type": "pinyin", 
                        "keep_first_letter": false, 
                        "keep_full_pinyin": false, 
                        "keep_joined_full_pinyin": true, 
                        "keep_none_chinese_in_first_letter": true, 
                        "none_chinese_pinyin_tokenize": false, 
                        "lowercase": true, 
                        "with_tone_number": true
                    }
                }
            }
        }
    }
}
  • Field mapping
POST /book/_mapping
{
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "ik_max_word_pinyin",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        "author": {
            "type": "text",
            "analyzer": "ik_max_word_pinyin",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        }
    }
}

This configuration supports searching by Chinese text or by pinyin, but it cannot sort by pinyin.
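As a quick check of the pinyin search, the same match query can be given pinyin instead of Chinese. This is a sketch under assumptions: it presumes the index holds documents whose title contains 测试, and whether ceshi matches depends on how ik segments the title and on the filter options above. Inspecting the tokens with _analyze first shows what was actually indexed.

```
GET /book/_analyze
{
    "analyzer": "ik_max_word_pinyin",
    "text": "测试数据"
}

GET /book/_search
{
    "query": {
        "match": {
            "title": "ceshi"
        }
    }
}
```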

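  • Using the ICU analyzer
    • Install the plugin
      • List the installed plugins from the Elasticsearch installation directory: ./bin/elasticsearch-plugin list
      • Install the plugin from the Elasticsearch installation directory: ./bin/elasticsearch-plugin install analysis-icu
      • Restart Elasticsearch
    • Index settings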
PUT /book

POST /book/_mapping
{
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "icu_analyzer",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "icu_collation_keyword",
                    "index": false,
                    "language": "zh",
                    "country": "CN"
                }
            }
        },
        "author": {
            "type": "text",
            "analyzer": "icu_analyzer",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "icu_collation_keyword",
                    "index": false,
                    "language": "zh",
                    "country": "CN"
                }
            }
        }
    }
}
  • Search
GET /book/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {
                      
                    }
                }
            ]
        }
    }, 
    "from": 0, 
    "size": 10, 
    "sort": [
        {
            "title.sort": "asc"
        }
    ]
}
  • When you need both analyzed full-text search and sorting by pinyin, use the ICU analyzer
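Putting the two together with the ICU mapping above, a search can match on the analyzed title field and sort on the title.sort collation sub-field. This is a minimal sketch reusing the 测试 query from earlier; the author.sort tiebreaker is only an illustrative choice.

```
GET /book/_search
{
    "query": {
        "match": {
            "title": "测试"
        }
    },
    "sort": [
        { "title.sort": "asc" },
        { "author.sort": "asc" }
    ]
}
```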

 
