Environment
Elasticsearch 2.1.1, elasticsearch-analysis-pinyin 1.5.2
Step 1
Download the source code from https://github.com/medcl/elasticsearch-analysis-pinyin
Step 2
Run mvn package, then unzip target/releases/elasticsearch-analysis-pinyin-1.5.2.zip and place the contents under es/plugins/
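The build-and-install step above can be sketched as follows; the Elasticsearch install path (ES_HOME) and the plugin directory name are assumptions, so adjust them to your setup:

```shell
# Clone and build the plugin (requires Maven and a JDK).
git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
cd elasticsearch-analysis-pinyin
mvn package

# Unzip the release artifact into the ES plugins directory.
# ES_HOME and the "analysis-pinyin" directory name are assumed values.
ES_HOME=/opt/elasticsearch-2.1.1
mkdir -p "$ES_HOME/plugins/analysis-pinyin"
unzip target/releases/elasticsearch-analysis-pinyin-1.5.2.zip \
      -d "$ES_HOME/plugins/analysis-pinyin"
```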
Step 3
Start Elasticsearch.
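A minimal sketch of starting the node and checking that it is up, assuming the same ES_HOME path as above:

```shell
# Start Elasticsearch as a daemon (omit -d to run in the foreground).
/opt/elasticsearch-2.1.1/bin/elasticsearch -d
# Verify the node responds before continuing.
curl http://localhost:9200/
```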
Step 4
Create a test index:
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["standard"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                }
            }
        }
    }
}'
Step 5
Verify that the pinyin plugin is integrated correctly by entering the following URL in a browser (the text parameter is the URL-encoded form of 刘德华):
http://localhost:9200/medcl/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e&analyzer=pinyin_analyzer
The response below shows that the analysis succeeded. The plugin easily produces the pinyin tokens for Chinese text, so a search can suggest the matching Chinese for pinyin input, improving the search experience.
{"tokens":[{"token":"liu de hua ","start_offset":0,"end_offset":3,"type":"word","position":1}]}
Step 6
Implement pinyin search suggestions similar to the figure.
(1) Use the plugin to update the index settings that power the pinyin-suggestion feature shown above (the index must be closed while the analysis settings change):
curl -XPOST http://localhost:9200/medcl/_close
curl -XPUT http://localhost:9200/medcl/_settings -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["word_delimiter","nGram"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "prefix",
                    "padding_char" : " "
                }
            }
        }
    }
}'
curl -XPOST http://localhost:9200/medcl/_open
(2) Create the mapping, i.e. the index structure and field configuration:
curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type": "string",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10
                    },
                    "primitive": {
                        "type": "string",
                        "store": "yes",
                        "analyzer": "keyword"
                    }
                }
            }
        }
    }
}'
(3) Index some data:
curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'
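To make the queries that follow more interesting, the other names used in the token test further below can be indexed the same way; the document IDs here are made up for illustration:

```shell
# The IDs (jacky, aaron, leon) are arbitrary illustrative values.
curl -XPOST http://localhost:9200/medcl/folks/jacky -d'{"name":"张学友"}'
curl -XPOST http://localhost:9200/medcl/folks/aaron -d'{"name":"郭富城"}'
curl -XPOST http://localhost:9200/medcl/folks/leon -d'{"name":"黎明"}'
```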
(4) Search the data
Enter each of the following URLs in a browser in turn; every one of them will return the record indexed above: 刘德华.
http://localhost:9200/medcl/folks/_search?q=name:刘
http://localhost:9200/medcl/folks/_search?q=name:刘德
http://localhost:9200/medcl/folks/_search?q=name:liu
http://localhost:9200/medcl/folks/_search?q=name:ldh
http://localhost:9200/medcl/folks/_search?q=name:dehua
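The same lookups can also be issued through the request-body search API; a minimal sketch using a match query (not shown in the original steps):

```shell
# Equivalent of ?q=name:ldh, expressed as a JSON query DSL body.
curl -XPOST http://localhost:9200/medcl/folks/_search -d'
{
    "query": {
        "match": { "name": "ldh" }
    }
}'
```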
Use Pinyin-TokenFilter (contributed by @wangweiwei)
curl -XPUT http://localhost:9200/medcl2/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "user_name_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : "pinyin_filter"
                }
            },
            "filter" : {
                "pinyin_filter" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'
Token Test:刘德华 张学友 郭富城 黎明 四大天王
curl -XGET 'http://localhost:9200/medcl2/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e+%e5%bc%a0%e5%ad%a6%e5%8f%8b+%e9%83%ad%e5%af%8c%e5%9f%8e+%e9%bb%8e%e6%98%8e+%e5%9b%9b%e5%a4%a7%e5%a4%a9%e7%8e%8b&analyzer=user_name_analyzer'
{"tokens":[{"token":"ldh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"zxy","start_offset":4,…
In addition, the pinyin plugin supports a few optional parameters:
first_letter controls how the pinyin initials are emitted and can be set to prefix, append, only, or none (the default is none). For "刘德华" above, the corresponding token outputs are "ldh liu de hua", "liu de hua ldh", "ldh", and "liu de hua".
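As an illustration of these options, a tokenizer configured with first_letter set to only could be defined as below (the index name medcl3 and the analyzer/tokenizer names are assumptions, not from the original post); analyzing 刘德华 with it should yield just "ldh":

```shell
# Config fragment only: same settings structure as in Step 4,
# but with first_letter switched to "only".
curl -XPUT http://localhost:9200/medcl3/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "initials_analyzer" : {
                    "tokenizer" : "initials_pinyin"
                }
            },
            "tokenizer" : {
                "initials_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'
```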