ElasticSearch in Practice (6): Chinese Word Segmentation

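The previous parts dealt only with plain English text. English separates words with spaces, so ES can tokenize it out of the box. Chinese has no spaces between words, so searching Chinese text requires installing an additional plugin.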

Download the elasticsearch-analysis-ik plugin

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip

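Unzipping it yields an elasticsearch-analysis-ik-6.5.4 directory. Copy this directory into the plugins directory under the ES installation path. The plugin version must match the ES version, which is 6.5.4 here. On my development machine, the full directory structure looks like this: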

[Figure 1: directory structure of the IK plugin inside the ES plugins directory]
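If you prefer to check the copy step from code instead of by eye, the sketch below tests whether the plugin directory contains a `plugin-descriptor.properties` file. It assumes the standard ES plugin layout, in which every plugin directory carries that descriptor. The class name, the method name and the default ES path are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PluginLayoutCheck {
    // Returns true if <esHome>/plugins/<pluginDir>/plugin-descriptor.properties exists.
    // Assumption: each ES plugin directory ships a plugin-descriptor.properties file.
    static boolean isPluginInstalled(Path esHome, String pluginDir) {
        Path descriptor = esHome.resolve("plugins")
                .resolve(pluginDir)
                .resolve("plugin-descriptor.properties");
        return Files.isRegularFile(descriptor);
    }

    public static void main(String[] args) {
        // Hypothetical default ES home; pass your own path as the first argument
        Path esHome = Paths.get(args.length > 0 ? args[0] : "/opt/elasticsearch-6.5.4");
        System.out.println(isPluginInstalled(esHome, "elasticsearch-analysis-ik-6.5.4"));
    }
}
```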

Restart ES

Check that the plugin is loaded

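Open http://localhost:9200/_cat/plugins in a browser. If the output looks similar to the following (the first column is the node name), the installation succeeded: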

sRdVRrd analysis-ik 6.5.4
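This check can also be scripted. The sketch below assumes the three-column layout shown above (node name, plugin name, version) and simply scans the response text. `CatPluginsCheck` and `hasPlugin` are illustrative names and are not part of any ES client.

```java
public class CatPluginsCheck {
    // Returns true if any line of a _cat/plugins response lists `plugin` at `version`.
    // Assumed column layout (as in the sample above): node name, plugin name, version.
    static boolean hasPlugin(String catOutput, String plugin, String version) {
        return catOutput.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+"))
                .anyMatch(cols -> cols.length >= 3
                        && cols[1].equals(plugin)
                        && cols[2].equals(version));
    }

    public static void main(String[] args) {
        String sample = "sRdVRrd analysis-ik 6.5.4\n";
        System.out.println(hasPlugin(sample, "analysis-ik", "6.5.4")); // prints true
    }
}
```

Checking the version as well as the name catches the common mistake of installing an IK build that does not match the running ES.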

Configure the analyzer

Add an annotation to the text field of the Blog class:

    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String text;
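At the Elasticsearch level, this annotation should translate roughly into a field mapping like the one the sketch below prints. `analyzer` applies at index time and `search_analyzer` applies at query time. The helper is hypothetical and only builds the JSON string. To see what Spring Data actually sent to the cluster, request `GET /website/_mapping`.

```java
public class TextFieldMapping {
    // Hypothetical helper: builds the mapping fragment for one text field
    // with separate index-time and search-time analyzers.
    static String textField(String name, String analyzer, String searchAnalyzer) {
        return String.format(
                "\"%s\": {\"type\": \"text\", \"analyzer\": \"%s\", \"search_analyzer\": \"%s\"}",
                name, analyzer, searchAnalyzer);
    }

    public static void main(String[] args) {
        // Wrap the field in a "properties" object, as in an index mapping body
        System.out.println("{\"properties\": {" + textField("text", "ik_max_word", "ik_smart") + "}}");
    }
}
```

Running it prints `{"properties": {"text": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"}}}`.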

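The annotation tells ES to tokenize this field with ik_max_word at index time, which splits text at the finest granularity. At search time it uses ik_smart, which splits text at the coarsest granularity. A common rationale is that the finer index-time split records more terms per document, while the coarser query-time split keeps a query from matching on overly small fragments.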
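The toy segmenter below makes "finest" versus "coarsest" concrete, using a hypothetical five-word dictionary. It is not IK's algorithm. The fine mode simply emits every dictionary word at every position, and the coarse mode uses plain forward maximum matching. IK has its own dictionaries and ambiguity handling. The sketch only illustrates the shape of the difference, applied to one of the Chinese test sentences inserted later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class GranularityDemo {
    // Hypothetical toy dictionary; IK ships a far larger one.
    static final Set<String> DICT = Set.of("我", "参观", "北京", "大学", "北京大学");
    static final int MAX_WORD_LEN = 4;

    // Finest granularity (in the spirit of ik_max_word): every dictionary word
    // starting at every position, overlaps allowed; unknown characters are dropped.
    static List<String> finest(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len >= 1; len--) {
                String word = text.substring(i, i + len);
                if (DICT.contains(word)) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }

    // Coarsest granularity (in the spirit of ik_smart): forward maximum matching,
    // no overlaps; a character not covered by any word becomes its own token.
    static List<String> coarsest(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(MAX_WORD_LEN, text.length() - i);
            while (len > 1 && !DICT.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "我参观了北京大学";
        System.out.println(finest(text));   // [我, 参观, 北京大学, 北京, 大学]
        System.out.println(coarsest(text)); // [我, 参观, 了, 北京大学]
    }
}
```

The fine mode keeps 北京 and 大学 alongside 北京大学, so a query for any of the three can find the document. The coarse mode produces a single non-overlapping split.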

Rebuild the index and insert Chinese test data

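        // An existing field's analyzer cannot be changed in place,
        // so drop and recreate the index before applying the new mapping
        elasticsearchTemplate.deleteIndex("website");
        elasticsearchTemplate.createIndex("website");
        elasticsearchTemplate.putMapping(Blog.class);
        // parseDate is a date-parsing helper not shown in this post
        List<IndexQuery> indexQueries = Arrays.asList(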
                new IndexQueryBuilder().withObject(new Blog(1, "Mary Jones", "Jane is an expert in her field", 80, parseDate("2019-06-21"))).build(),
                new IndexQueryBuilder().withObject(new Blog(2, "Jane Smith", "I am starting to get the hang of this...", 0, parseDate("2019-06-20"))).build(),
                new IndexQueryBuilder().withObject(new Blog(3, "John Smith", "The Query DSL is really powerful and flexible", 100, parseDate("2019-06-20"))).build(),
                new IndexQueryBuilder().withObject(new Blog(4, "Mary Jones", "Still trying this out...", 0, parseDate("2019-06-20"))).build(),
                new IndexQueryBuilder().withObject(new Blog(5, "Mary Jones", "However did I manage before Elasticsearch?", 200, parseDate("2019-06-19"))).build(),
                new IndexQueryBuilder().withObject(new Blog(6, "Jane Smith", "I like to collect rock albums", 0, parseDate("2019-06-19"))).build(),
                new IndexQueryBuilder().withObject(new Blog(7, "Douglas Fir", "I like to build cabinets", 50, parseDate("2019-06-19"))).build(),
                new IndexQueryBuilder().withObject(new Blog(8, "John Smith", "I love to go rock climbing", 40, parseDate("2019-06-18"))).build(),
                new IndexQueryBuilder().withObject(new Blog(9, "Mary Jones", "I am Mary Jones, welcome to my blog!", 500, parseDate("2019-06-17"))).build(),
                new IndexQueryBuilder().withObject(new Blog(10, "Mary Jones", "My first blog entry", 400, parseDate("2019-06-17"))).build(),
                new IndexQueryBuilder().withObject(new Blog(11, "小明", "试试中文分词", 5, parseDate("2019-06-21"))).build(),
                new IndexQueryBuilder().withObject(new Blog(12, "李三", "我参观了北京大学", 5, parseDate("2019-06-21"))).build()
        );
        elasticsearchTemplate.bulkIndex(indexQueries);
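You can send one of the new sentences to ES's `_analyze` API to see how each IK mode actually splits it. The minimal sketch below uses the Java 11 `java.net.http` client. It assumes ES is running on `localhost:9200` and that the `website` index created above exists. The text is not JSON-escaped, so it must not contain quotes or backslashes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AnalyzeRequest {
    // Builds POST /<index>/_analyze asking ES to tokenize `text` with `analyzer`.
    // Simplification: `text` is inserted without JSON escaping.
    static HttpRequest build(String baseUrl, String index, String analyzer, String text) {
        String body = "{\"analyzer\": \"" + analyzer + "\", \"text\": \"" + text + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body)) // UTF-8 encoded
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Compare the index-time and search-time analyzers on the same sentence
        for (String analyzer : new String[] {"ik_max_word", "ik_smart"}) {
            HttpRequest req = build("http://localhost:9200", "website", analyzer, "我参观了北京大学");
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(analyzer + ": " + resp.body());
        }
    }
}
```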

Test

[Figure 2: test results]
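As a hedged example of such a search, here is how a `match` query for 北京 against the `text` field could be sent over plain HTTP, again assuming ES on `localhost:9200`. With `ik_smart` at search time, 北京 should stay a single term. It can match document 12 (我参观了北京大学) only if `ik_max_word` emitted 北京 as an index term, which ES's `_analyze` API can confirm. The class and method names are illustrative.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class MatchSearch {
    // Builds a match-query body for one field.
    // Simplification: `keyword` is inserted without JSON escaping.
    static String matchBody(String field, String keyword) {
        return "{\"query\": {\"match\": {\"" + field + "\": \"" + keyword + "\"}}}";
    }

    // Builds POST /<index>/_search carrying the match query
    static HttpRequest build(String baseUrl, String index, String field, String keyword) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(matchBody(field, keyword)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = build("http://localhost:9200", "website", "text", "北京");
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}
```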
