Integrating the Pinyin Plugin with Elasticsearch

Test environment

Elasticsearch 2.1.1, elasticsearch-analysis-pinyin 1.5.2

Step 1

Download the source code from https://github.com/medcl/elasticsearch-analysis-pinyin

Step 2

Run mvn package, then unzip target/releases/elasticsearch-analysis-pinyin-1.5.2.zip into es/plugins/

Step 3

Start Elasticsearch

Step 4

Create a test index

curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["standard"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "none",
                    "padding_char" : " "
                }
            }
        }
    }
}'
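
To script the same request rather than typing curl by hand, here is a minimal Python sketch using only the standard library. The helper names pinyin_index_settings and build_create_index_request are hypothetical and not part of the plugin or of Elasticsearch. The settings body mirrors the curl command above, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request


def pinyin_index_settings(first_letter="none", padding_char=" "):
    """Return the same analysis settings as the curl body above."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "pinyin_analyzer": {
                        "tokenizer": "my_pinyin",
                        "filter": ["standard"],
                    }
                },
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "first_letter": first_letter,
                        "padding_char": padding_char,
                    }
                },
            }
        }
    }


def build_create_index_request(base_url, index, settings):
    """Build (but do not send) the PUT request that creates the index."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(f"{base_url}/{index}/", data=body, method="PUT")


req = build_create_index_request(
    "http://localhost:9200", "medcl", pinyin_index_settings()
)
# To actually create the index against a running node (assumed to be local):
# urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to inspect the URL and body before touching a live cluster.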
Step 5

Verify that the pinyin plugin is integrated correctly

Open the following URL in a browser:

http://localhost:9200/medcl/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e&analyzer=pinyin_analyzer

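The response is shown below, and tokenization succeeded. The plugin makes it easy to obtain the pinyin tokens for Chinese text, so search can suggest the matching Chinese for the pinyin a user types, which improves the search experience.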

{"tokens":[{"token":"liu de hua ","start_offset":0,"end_offset":3,"type":"word","position":1}]}
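Step 6: Implement pinyin search similar to the figure


(1) Update the index settings with the plugin to implement the pinyin suggestion feature shown in the figure above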

curl -XPOST http://localhost:9200/medcl/_close
curl -XPUT http://localhost:9200/medcl/_settings -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin",
                    "filter" : ["word_delimiter","nGram"]
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "first_letter" : "prefix",
                    "padding_char" : " "
                }
            }
        }
    }
}'
curl -XPOST http://localhost:9200/medcl/_open
(2) Create the mapping, that is, the index structure and field configuration

curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type": "string",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10
                    },
                    "primitive": {
                        "type": "string",
                        "store": "yes",
                        "analyzer": "keyword"
                    }
                }
            }
        }
    }
}'

(3) Index some data

curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'

(4) Search the data

Enter the following URLs in a browser one at a time. Each one should return the record indexed above: 刘德华.

http://localhost:9200/medcl/folks/_search?q=name:刘
http://localhost:9200/medcl/folks/_search?q=name:刘德
http://localhost:9200/medcl/folks/_search?q=name:liu
http://localhost:9200/medcl/folks/_search?q=name:ldh
http://localhost:9200/medcl/folks/_search?q=name:dehua
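
A browser percent-encodes the Chinese characters in these URLs automatically, but a script has to do it explicitly. The following is a minimal sketch under that assumption. The helper name search_url is hypothetical, and the queries are the five listed above.

```python
from urllib.parse import urlencode


def search_url(base_url, index, doc_type, field, term):
    """Build a URI search URL of the form .../_search?q=field:term."""
    query = urlencode({"q": f"{field}:{term}"})
    return f"{base_url}/{index}/{doc_type}/_search?{query}"


# The five queries from the list above.
terms = ["刘", "刘德", "liu", "ldh", "dehua"]
urls = [search_url("http://localhost:9200", "medcl", "folks", "name", t) for t in terms]
for u in urls:
    print(u)
```

The colon in name:liu is encoded as %3A, which Elasticsearch decodes back to the same query string.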





Use Pinyin-TokenFilter (contributed by @wangweiwei)

curl -XPUT http://localhost:9200/medcl2/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "user_name_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : "pinyin_filter"
                }
            },
            "filter" : {
                "pinyin_filter" : {
                    "type" : "pinyin",
                    "first_letter" : "only",
                    "padding_char" : ""
                }
            }
        }
    }
}'

Token Test:刘德华 张学友 郭富城 黎明 四大天王

curl -XGET http://localhost:9200/medcl2/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e+%e5%bc%a0%e5%ad%a6%e5%8f%8b+%e9%83%ad%e5%af%8c%e5%9f%8e+%e9%bb%8e%e6%98%8e+%e5%9b%9b%e5%a4%a7%e5%a4%a9%e7%8e%8b&analyzer=user_name_analyzer
{"tokens":[{"token":"ldh","start_offset":0,"end_offset":3,"type":"word","position":1},{"token":"zxy","start_offset":4,

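The pinyin plugin also has several optional parameters.
first_letter controls the pinyin initials and can be set to prefix, append, only, or none (the default is none).
For "刘德华" above, these settings produce "ldh liu de hua", "liu de hua ldh", "ldh", and "liu de hua" respectively.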
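
To make the four first_letter modes concrete, here is a small Python illustration that reproduces the strings listed above from a hard-coded list of syllables. It is only a sketch of the described behavior, not the plugin's implementation. The function name apply_first_letter is hypothetical, and the real plugin may differ in details such as the trailing padding character visible in the Step 5 output.

```python
def apply_first_letter(syllables, mode="none", padding_char=" "):
    """Combine pinyin syllables and their initials according to the mode."""
    full = padding_char.join(syllables)          # e.g. "liu de hua"
    initials = "".join(s[0] for s in syllables)  # e.g. "ldh"
    if mode == "none":
        return full
    if mode == "only":
        return initials
    if mode == "prefix":
        return initials + padding_char + full
    if mode == "append":
        return full + padding_char + initials
    raise ValueError(f"unknown first_letter mode: {mode}")


# Syllables for 刘德华, hard-coded for illustration.
syllables = ["liu", "de", "hua"]
for mode in ("prefix", "append", "only", "none"):
    print(mode, "->", repr(apply_first_letter(syllables, mode)))
```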

