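ES 6.8.6: Configuring an Analyzer at Index Creation, Assigning Analyzers to Mapped Fields, and Highlighting Matched Tokens in Query Results (Built-in, icu, ik, and pinyin Analyzers)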

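Contents

    • ES environment
    • Built-in analyzers, using the `simple` analyzer as the example
      • Create the index `simple_news` with `simple` as the analyzer
      • Insert sample data
      • Analyzed query: return the matching documents with matched tokens highlighted
      • How tokens are matched: one worked example; the other analyzers follow the same logic
        • Step 1: analyze the input `三毛 我愿一生流浪 天才作家`
        • Step 2: analyze the matched value `title="我愿一生流浪 | 三毛《撒哈拉的故事"`
        • Highlighting the tokens shared by the query and the `title` value
    • icu analyzer
      • Create the index `icu_news` with `icu_analyzer` as the analyzer
      • Insert sample data
      • Query: return the matching documents with matched tokens highlighted
    • ik analyzer, using the `ik_max_word` mode as the example
      • Create the index `ik_news` with `ik_max_word` as the analyzer
      • Insert sample data
      • Query: return the matching documents with matched tokens highlighted
    • pinyin analyzer
      • Create the index `pinyin_news`
      • Insert sample data
      • Query: return the matching documents with matched tokens highlighted
    • References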

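To see how each analyzer splits text and what tokens it produces, read the previous article:

[ES 6.8.6: Installing and Using Analyzers and Inspecting Their Tokens (Built-in, icu, ik, and pinyin Analyzers) - CSDN blog]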

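ES environment

  • Elasticsearch 6.8.6 with the ik, icu, and pinyin analyzer plugins installed (each plugin version must match the ES version)
  • Postman for sending requests
  • elasticsearch-head for browsing data (https://github.com/mobz/elasticsearch-head)

Notes

  • In the Postman requests below, {{domain}} stands for http://127.0.0.1:9200
  • Every index below sets its default analyzer through the `default` entry under settings.analysis.analyzer: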
"settings": {
    "analysis": {
        "analyzer": {
            "default": {
                # Change the default analyzer; use a built-in name or the name a third-party plugin registers: icu_analyzer, ik_smart, pinyin, etc.
                "type": "simple"
            }
        }
    }
}

        To learn how to configure a custom analyzer, see:

        [ES 6.8.6: Creating a Custom Analyzer for an Index Mapping and Testing How It Matches - CSDN blog]

Built-in analyzers, using the simple analyzer as the example

Create the index simple_news with simple as the analyzer


# Create the index
PUT /simple_news

Request body:
{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    # Change the default analyzer
                    "type": "simple"
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "text",
                    # Assign an analyzer to this field
                    "analyzer": "simple"
                },
                "uv": {
                    "type": "long"
                },
                "create_date": {
                    "type": "date"
                },
                "status": {
                    "type": "integer"
                },
                "remark": {
                    "type": "text",
                    # Assign an analyzer to this field
                    "analyzer": "simple"
                }
            }
        }
    }
}
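
The `#` comments in the request body above are annotations for the reader; JSON itself has no comment syntax, so they have to be removed before sending. As a minimal sketch, the same body can be built as a Python dictionary and serialized, which also makes it easy to swap the analyzer name for the later sections. The helper name `build_index_body` is hypothetical and not part of any Elasticsearch client; the sketch uses `mappings` and `integer`, the key and type name that the create-index API accepts.

```python
import json


def build_index_body(analyzer: str) -> dict:
    """Build a create-index body that uses `analyzer` as the index default
    and on the two text fields (hypothetical helper for illustration)."""
    return {
        "settings": {
            "analysis": {"analyzer": {"default": {"type": analyzer}}}
        },
        "mappings": {
            "_doc": {
                "properties": {
                    "id": {"type": "long"},
                    "title": {"type": "text", "analyzer": analyzer},
                    "uv": {"type": "long"},
                    "create_date": {"type": "date"},
                    "status": {"type": "integer"},
                    "remark": {"type": "text", "analyzer": analyzer},
                }
            }
        },
    }


body = build_index_body("simple")
# Serialize to comment-free JSON that can be pasted into Postman as the PUT body.
payload = json.dumps(body, ensure_ascii=False, indent=4)
print(payload)
```

Calling `build_index_body("icu_analyzer")` or `build_index_body("ik_max_word")` would give the bodies for the other indexes in this article.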

Insert sample data


# Bulk insert
POST /simple_news/_doc/_bulk

Request body:

{"index": {"_id": 1}}
{"id":1,"title":"三毛:她把短暂的一生,活成了十世","uv":120,"create_date":"2024-01-15","status":1,"remark":"来源百度搜索"}
{"index": {"_id": 2}}
{"id":2,"title":"我愿一生流浪 | 三毛《撒哈拉的故事","uv":99,"create_date":"2024-01-14","status":1,"remark":"来源知乎搜索"}
{"index": {"_id": 3}}
{"id":3,"title":"离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!","uv":80,"create_date":"2024-01-15","status":1,"remark":"来源搜狐"}
{"index": {"_id": 4}}
{"id":4,"title":"三毛逝世30周年丨一场与三毛穿越时空的对话","uv":150,"create_date":"2024-01-16","status":1,"remark":"来源澎湃新闻"}
{"index": {"_id": 5}}
{"id":5,"title":"三毛:从自闭少女到天才作家","uv":141,"create_date":"2024-01-18","status":1,"remark":"来源光明网"}
{"index": {"_id": 6}}
{"id":6,"title":"超全整理!三毛最出名的11本著作,没读过的一定要看看","uv":200,"create_date":"2024-01-23","status":1,"remark":"来源知乎搜索"}
{"index": {"_id": 7}}
{"id":7,"title":"三毛的英文名为什么叫Echo?","uv":300,"create_date":"2024-01-21","status":1,"remark":"来源百度知道"}
{"index": {"_id": 8}}
{"id":8,"title":"毛国家统计局发布第三季度贸易数据","uv":50,"create_date":"2024-01-23","status":1,"remark":"来源中华人民共和国商务部"}
{"index": {"_id": 9}}
{"id":9,"title":"网易公布2022年第三季度财报|净收入|毛利润","uv":131,"create_date":"2024-01-22","status":1,"remark":"来源网易科技"}
{"index": {"_id": 10}}
{"id":10,"title":"单季盈利超100亿元!比亚迪三季度毛利率超特斯拉","uv":310,"create_date":"2024-01-23","status":1,"remark":"来源新浪财经"}
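# The body must end with a newline
# Each action line and each document line in a bulk request must be separated by a newline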
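
The two rules above (one JSON object per line, and a trailing newline) are easy to get wrong when the body is typed by hand. The following sketch generates the bulk body from a list of documents; `build_bulk_body` is a hypothetical helper name, and only the first two sample documents are included to keep it short.

```python
import json


def build_bulk_body(docs):
    """Return a newline-delimited bulk body: one action line and one
    source line per document, terminated by a final newline."""
    lines = []
    for doc in docs:
        # Action line: index the document under its own id.
        lines.append(json.dumps({"index": {"_id": doc["id"]}}, ensure_ascii=False))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"


docs = [
    {"id": 1, "title": "三毛:她把短暂的一生,活成了十世", "uv": 120,
     "create_date": "2024-01-15", "status": 1, "remark": "来源百度搜索"},
    {"id": 2, "title": "我愿一生流浪 | 三毛《撒哈拉的故事", "uv": 99,
     "create_date": "2024-01-14", "status": 1, "remark": "来源知乎搜索"},
]

bulk_body = build_bulk_body(docs)
print(bulk_body, end="")
```

The resulting string can be sent as the raw body of `POST /simple_news/_doc/_bulk`.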


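Analyzed query: return the matching documents with matched tokens highlighted

[Elasticsearch official documentation: highlighting configuration]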


# Search the documents in the simple_news index
POST /simple_news/_search

# Request body

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    # The query text for title is analyzed by simple, and the resulting tokens are matched against the index
                    "title": "三毛 我愿一生流浪 天才作家"
                }
            }
        }
    },
    # Highlight settings
    "highlight": {
        # Fields to highlight
        "fields": {
            "remark": {},
            "title": {}
        },
        # 0 disables fragmenting, so the whole field value is returned
        "number_of_fragments": 0,
        # Tags placed after and before each matched token
        "post_tags": [
            ""
        ],
        "pre_tags": [
            ""
        ],
        "require_field_match": false,
        "type": "plain"
    },
    "from": 0,
    "size": 10000,
    "sort": [],
    "aggs": {}
}
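
The same search body is reused for every analyzer in this article, with only the index name and query text changing. Below is a minimal sketch that builds it in Python. The function name `build_highlight_search` is hypothetical, and the `<em>` and `</em>` tags are an assumption: they are the tags Elasticsearch uses when none are configured, not necessarily the ones used in the original requests.

```python
import json


def build_highlight_search(field, text, pre_tag="<em>", post_tag="</em>", size=10):
    """Build a match query on `field` that returns the whole field value
    with every matched token wrapped in pre_tag/post_tag."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {field: {}},
            # 0 means: do not split into fragments, return the full value.
            "number_of_fragments": 0,
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "type": "plain",
        },
        "from": 0,
        "size": size,
    }


search_body = build_highlight_search("title", "三毛 我愿一生流浪 天才作家")
print(json.dumps(search_body, ensure_ascii=False, indent=4))
```

The `bool`/`must` wrapper from the request above is dropped here because a single `match` clause behaves the same without it.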

Query result with the highlighted tokens:

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 4,
        "max_score": 1.9616584,
        "hits": [
            {
                "_index": "simple_news",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.9616584,
                "_source": {
                    "id": 2,
                    "title": "我愿一生流浪 | 三毛《撒哈拉的故事",
                    "uv": 99,
                    "create_date": "2024-01-14",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "title": [
                        "我愿一生流浪 | 三毛《撒哈拉的故事"
                    ]
                }
            },
            {
                "_index": "simple_news",
                "_type": "_doc",
                "_id": "5",
                "_score": 1.3112576,
                "_source": {
                    "id": 5,
                    "title": "三毛:从自闭少女到天才作家",
                    "uv": 141,
                    "create_date": "2024-01-18",
                    "status": 1,
                    "remark": "来源光明网"
                },
                "highlight": {
                    "title": [
                        "三毛:从自闭少女到天才作家"
                    ]
                }
            },
            {
                "_index": "simple_news",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.5754429,
                "_source": {
                    "id": 1,
                    "title": "三毛:她把短暂的一生,活成了十世",
                    "uv": 120,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源百度搜索"
                },
                "highlight": {
                    "title": [
                        "三毛:她把短暂的一生,活成了十世"
                    ]
                }
            },
            {
                "_index": "simple_news",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "id": 3,
                    "title": "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!",
                    "uv": 80,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源搜狐"
                },
                "highlight": {
                    "title": [
                        "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!"
                    ]
                }
            }
        ]
    }
}

How tokens are matched: one worked example; the other analyzers follow the same logic

Take the query result above as the example.

Step 1: analyze the input 三毛 我愿一生流浪 天才作家


# Inspect the analysis result
GET /simple_news/_analyze

# Request body
{
    "analyzer": "simple",
    "text": "三毛 我愿一生流浪 天才作家"
}

The resulting tokens are 三毛, 我愿一生流浪, and 天才作家:

{
    "tokens": [
        {
            "token": "三毛",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "我愿一生流浪",
            "start_offset": 3,
            "end_offset": 9,
            "type": "word",
            "position": 1
        },
        {
            "token": "天才作家",
            "start_offset": 10,
            "end_offset": 14,
            "type": "word",
            "position": 2
        }
    ]
}
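
The simple analyzer splits text wherever it meets a character that is not a letter and lowercases each piece. Chinese characters count as letters, which is why each run of Chinese text between spaces or punctuation stays a single token. The sketch below approximates that rule in Python so the offsets above can be checked by hand. It is an approximation only: it relies on `str.isalpha`, which may classify a few rare characters differently from Lucene, and `simple_analyze` is a hypothetical name.

```python
def simple_analyze(text):
    """Approximate the `simple` analyzer: emit each maximal run of
    letters as one lowercased token, with character offsets."""
    tokens = []
    start = None
    # A trailing space acts as a sentinel so the last token is flushed.
    for index, char in enumerate(text + " "):
        if char.isalpha():
            if start is None:
                start = index
        elif start is not None:
            tokens.append({
                "token": text[start:index].lower(),
                "start_offset": start,
                "end_offset": index,
                "type": "word",
                "position": len(tokens),
            })
            start = None
    return tokens


tokens = simple_analyze("三毛 我愿一生流浪 天才作家")
for token in tokens:
    print(token["token"], token["start_offset"], token["end_offset"])
# 三毛 0 2
# 我愿一生流浪 3 9
# 天才作家 10 14
```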
Step 2: analyze the matched value title="我愿一生流浪 | 三毛《撒哈拉的故事"

        Following the same steps as for the input, analyze 我愿一生流浪 | 三毛《撒哈拉的故事:


# Inspect the analysis result
GET /simple_news/_analyze

# Request body
{
    "analyzer": "simple",
    "text": "我愿一生流浪 | 三毛《撒哈拉的故事"
}

The resulting tokens are 我愿一生流浪, 三毛, and 撒哈拉的故事:

{
    "tokens": [
        {
            "token": "我愿一生流浪",
            "start_offset": 0,
            "end_offset": 6,
            "type": "word",
            "position": 0
        },
        {
            "token": "三毛",
            "start_offset": 9,
            "end_offset": 11,
            "type": "word",
            "position": 1
        },
        {
            "token": "撒哈拉的故事",
            "start_offset": 12,
            "end_offset": 18,
            "type": "word",
            "position": 2
        }
    ]
}
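Highlighting the tokens shared by the query and the title value

        Compare the tokens from the two steps: the query 三毛 我愿一生流浪 天才作家 yields 三毛, 我愿一生流浪, and 天才作家, while the title 我愿一生流浪 | 三毛《撒哈拉的故事 yields 我愿一生流浪, 三毛, and 撒哈拉的故事.
        The tokens they share are 三毛 and 我愿一生流浪, so these two are the matched tokens, and they are the ones that get highlighted.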
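
The same comparison can be run over all ten sample titles to see why the simple analyzer returns exactly four hits. The sketch below approximates the simple analyzer as "runs of letters, lowercased" (an assumption based on `str.isalpha`, not the actual Lucene implementation) and intersects the query tokens with each title's tokens; `simple_tokens` is a hypothetical helper name.

```python
from itertools import groupby


def simple_tokens(text):
    """Approximate the `simple` analyzer: lowercased runs of letters."""
    return ["".join(run).lower()
            for is_letter, run in groupby(text, key=str.isalpha) if is_letter]


titles = {
    1: "三毛:她把短暂的一生,活成了十世",
    2: "我愿一生流浪 | 三毛《撒哈拉的故事",
    3: "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!",
    4: "三毛逝世30周年丨一场与三毛穿越时空的对话",
    5: "三毛:从自闭少女到天才作家",
    6: "超全整理!三毛最出名的11本著作,没读过的一定要看看",
    7: "三毛的英文名为什么叫Echo?",
    8: "毛国家统计局发布第三季度贸易数据",
    9: "网易公布2022年第三季度财报|净收入|毛利润",
    10: "单季盈利超100亿元!比亚迪三季度毛利率超特斯拉",
}

query_terms = set(simple_tokens("三毛 我愿一生流浪 天才作家"))

# A document is a hit when its title shares at least one token with the query.
matched = {}
for doc_id, title in titles.items():
    shared = query_terms & set(simple_tokens(title))
    if shared:
        matched[doc_id] = shared

print(sorted(matched))        # [1, 2, 3, 5]
print(sorted(matched[2]))     # ['三毛', '我愿一生流浪']
```

Documents 4, 6, and 7 contain the characters 三毛 but are not hits, because under this analyzer 三毛 is glued to the neighbouring characters (for example 三毛逝世) and never appears as a token on its own. Document 5 matches only on 三毛, since 天才作家 is part of the longer token 从自闭少女到天才作家.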


icu analyzer

Create the index icu_news with icu_analyzer as the analyzer

Request:

PUT {{domain}}/icu_news

# Request body
{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    # Set the default analyzer
                    "type": "icu_analyzer"
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "text",
                    # Set the analyzer for this field
                    "analyzer": "icu_analyzer"
                },
                "uv": {
                    "type": "long"
                },
                "create_date": {
                    "type": "date"
                },
                "status": {
                    "type": "integer"
                },
                "remark": {
                    "type": "text",
                    "analyzer": "icu_analyzer"
                }
            }
        }
    }
}

Insert sample data

POST {{domain}}/icu_news/_doc/_bulk

# Request body

{"index": {"_id": 1}}
{"id":1,"title":"三毛:她把短暂的一生,活成了十世","uv":120,"create_date":"2024-01-15","status":1,"remark":"来源百度搜索"}
{"index": {"_id": 2}}
{"id":2,"title":"我愿一生流浪 | 三毛《撒哈拉的故事","uv":99,"create_date":"2024-01-14","status":1,"remark":"来源知乎搜索"}
{"index": {"_id": 3}}
{"id":3,"title":"离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!","uv":80,"create_date":"2024-01-15","status":1,"remark":"来源搜狐"}
{"index": {"_id": 4}}
{"id":4,"title":"三毛逝世30周年丨一场与三毛穿越时空的对话","uv":150,"create_date":"2024-01-16","status":1,"remark":"来源澎湃新闻"}
{"index": {"_id": 5}}
{"id":5,"title":"三毛:从自闭少女到天才作家","uv":141,"create_date":"2024-01-18","status":1,"remark":"来源光明网"}
{"index": {"_id": 6}}
{"id":6,"title":"超全整理!三毛最出名的11本著作,没读过的一定要看看","uv":200,"create_date":"2024-01-23","status":1,"remark":"来源知乎搜索"}
{"index": {"_id": 7}}
{"id":7,"title":"三毛的英文名为什么叫Echo?","uv":300,"create_date":"2024-01-21","status":1,"remark":"来源百度知道"}
{"index": {"_id": 8}}
{"id":8,"title":"毛国家统计局发布第三季度贸易数据","uv":50,"create_date":"2024-01-23","status":1,"remark":"来源中华人民共和国商务部"}
{"index": {"_id": 9}}
{"id":9,"title":"网易公布2022年第三季度财报|净收入|毛利润","uv":131,"create_date":"2024-01-22","status":1,"remark":"来源网易科技"}
{"index": {"_id": 10}}
{"id":10,"title":"单季盈利超100亿元!比亚迪三季度毛利率超特斯拉","uv":310,"create_date":"2024-01-23","status":1,"remark":"来源新浪财经"}


The data is the same as above; only the index name and the analyzer differ.

Query: return the matching documents with matched tokens highlighted

POST {{domain}}/icu_news/_search

# Request body

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    # Same query text as above; only the analysis differs
                    "title": "三毛 我愿一生流浪 天才作家"
                }
            }
        }
    },
    "highlight": {
        "fields": {
            "remark": {},
            "title": {}
        },
        "number_of_fragments": 0,
        "post_tags": [
            ""
        ],
        "pre_tags": [
            ""
        ],
        "require_field_match": false,
        "type": "plain"
    },
    "from": 0,
    "size": 10000,
    "sort": [],
    "aggs": {}
}

Query result:
        With the icu analyzer, 天才作家 is highlighted as well, which shows that documents were also matched on 天才作家.

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 7,
        "max_score": 4.7378397,
        "hits": [
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "2",
                "_score": 4.7378397,
                "_source": {
                    "id": 2,
                    "title": "我愿一生流浪 | 三毛《撒哈拉的故事",
                    "uv": 99,
                    "create_date": "2024-01-14",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "title": [
                        "一生流浪 | 三毛《撒哈拉的故事"
                    ]
                }
            },
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "5",
                "_score": 4.1822214,
                "_source": {
                    "id": 5,
                    "title": "三毛:从自闭少女到天才作家",
                    "uv": 141,
                    "create_date": "2024-01-18",
                    "status": 1,
                    "remark": "来源光明网"
                },
                "highlight": {
                    "title": [
                        "三毛:从自闭少女到天才作家"
                    ]
                }
            },
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.83287835,
                "_source": {
                    "id": 1,
                    "title": "三毛:她把短暂的一生,活成了十世",
                    "uv": 120,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源百度搜索"
                },
                "highlight": {
                    "title": [
                        "三毛:她把短暂的一生,活成了十世"
                    ]
                }
            },
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.2876821,
                "_source": {
                    "id": 3,
                    "title": "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!",
                    "uv": 80,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源搜狐"
                },
                "highlight": {
                    "title": [
                        "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!"
                    ]
                }
            },
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "7",
                "_score": 0.19214728,
                "_source": {
                    "id": 7,
                    "title": "三毛的英文名为什么叫Echo?",
                    "uv": 300,
                    "create_date": "2024-01-21",
                    "status": 1,
                    "remark": "来源百度知道"
                },
                "highlight": {
                    "title": [
                        "三毛的英文名为什么叫Echo?"
                    ]
                }
            },
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.18085617,
                "_source": {
                    "id": 4,
                    "title": "三毛逝世30周年丨一场与三毛穿越时空的对话",
                    "uv": 150,
                    "create_date": "2024-01-16",
                    "status": 1,
                    "remark": "来源澎湃新闻"
                },
                "highlight": {
                    "title": [
                        "三毛逝世30周年丨一场与三毛穿越时空的对话"
                    ]
                }
            },
            {
                "_index": "icu_news",
                "_type": "_doc",
                "_id": "6",
                "_score": 0.119052075,
                "_source": {
                    "id": 6,
                    "title": "超全整理!三毛最出名的11本著作,没读过的一定要看看",
                    "uv": 200,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "title": [
                        "超全整理!三毛最出名的11本著作,没读过的一定要看看"
                    ]
                }
            }
        ]
    }
}

ik analyzer, using the ik_max_word mode as the example

Create the index ik_news with ik_max_word as the analyzer

PUT {{domain}}/ik_news


# Request body

{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    # Use ik's finest-grained mode
                    "type": "ik_max_word"
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "text",
                    # Use ik's finest-grained mode
                    "analyzer": "ik_max_word"
                },
                "uv": {
                    "type": "long"
                },
                "create_date": {
                    "type": "date"
                },
                "status": {
                    "type": "integer"
                },
                "remark": {
                    "type": "text",
                    # Use ik's finest-grained mode
                    "analyzer": "ik_max_word"
                }
            }
        }
    }
}

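Insert sample data

        (Omitted.) The data is the same as above; only the index name and the analyzer differ.

Query: return the matching documents with matched tokens highlighted

Request:

POST {{domain}}/ik_news/_search

# Request body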

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "title": "三毛 我愿一生流浪 天才作家"
                }
            }
        }
    },
    "highlight": {
        "fields": {
            "remark": {},
            "title": {}
        },
        "number_of_fragments": 0,
        "post_tags": [
            ""
        ],
        "pre_tags": [
            ""
        ],
        "require_field_match": false,
        "type": "plain"
    },
    "from": 0,
    "size": 10000,
    "sort": [],
    "aggs": {}
}

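Query result:
        Judging by the highlighted tokens, this query returns more documents than the icu query did. ik_max_word splits text at the finest granularity, so it produces more tokens to match on.
        Look at the document with _id=8 in particular: it has almost nothing to do with what the query was meant to find.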

{
    "took": 13,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 10,
        "max_score": 6.2969456,
        "hits": [
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "2",
                "_score": 6.2969456,
                "_source": {
                    "id": 2,
                    "title": "我愿一生流浪 | 三毛《撒哈拉的故事",
                    "uv": 99,
                    "create_date": "2024-01-14",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "title": [
                        "一生流浪 | 三毛《撒哈拉的故事"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "5",
                "_score": 4.8469224,
                "_source": {
                    "id": 5,
                    "title": "三毛:从自闭少女到天才作家",
                    "uv": 141,
                    "create_date": "2024-01-18",
                    "status": 1,
                    "remark": "来源光明网"
                },
                "highlight": {
                    "title": [
                        "三毛:从自闭少女到天才作家"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "1",
                "_score": 2.5841205,
                "_source": {
                    "id": 1,
                    "title": "三毛:她把短暂的一生,活成了十世",
                    "uv": 120,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源百度搜索"
                },
                "highlight": {
                    "title": [
                        "三毛:她把短暂的一生,活成了十世"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.8630463,
                "_source": {
                    "id": 3,
                    "title": "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!",
                    "uv": 80,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源搜狐"
                },
                "highlight": {
                    "title": [
                        "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.6511617,
                "_source": {
                    "id": 4,
                    "title": "三毛逝世30周年丨一场与三毛穿越时空的对话",
                    "uv": 150,
                    "create_date": "2024-01-16",
                    "status": 1,
                    "remark": "来源澎湃新闻"
                },
                "highlight": {
                    "title": [
                        "三毛逝世30周年丨场与三毛穿越时空的对话"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "7",
                "_score": 0.55606395,
                "_source": {
                    "id": 7,
                    "title": "三毛的英文名为什么叫Echo?",
                    "uv": 300,
                    "create_date": "2024-01-21",
                    "status": 1,
                    "remark": "来源百度知道"
                },
                "highlight": {
                    "title": [
                        "三毛的英文名为什么叫Echo?"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "6",
                "_score": 0.5000325,
                "_source": {
                    "id": 6,
                    "title": "超全整理!三毛最出名的11本著作,没读过的一定要看看",
                    "uv": 200,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "title": [
                        "超全整理!三毛最出名的11本著作,没读过的定要看看"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "8",
                "_score": 0.45885387,
                "_source": {
                    "id": 8,
                    "title": "毛国家统计局发布第三季度贸易数据",
                    "uv": 50,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源中华人民共和国商务部"
                },
                "highlight": {
                    "title": [
                        "国家统计局发布第季度贸易数据"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "9",
                "_score": 0.42383182,
                "_source": {
                    "id": 9,
                    "title": "网易公布2022年第三季度财报|净收入|毛利润",
                    "uv": 131,
                    "create_date": "2024-01-22",
                    "status": 1,
                    "remark": "来源网易科技"
                },
                "highlight": {
                    "title": [
                        "网易公布2022年第季度财报|净收入|利润"
                    ]
                }
            },
            {
                "_index": "ik_news",
                "_type": "_doc",
                "_id": "10",
                "_score": 0.09917182,
                "_source": {
                    "id": 10,
                    "title": "单季盈利超100亿元!比亚迪三季度毛利率超特斯拉",
                    "uv": 310,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源新浪财经"
                },
                "highlight": {
                    "title": [
                        "单季盈利超100亿元!比亚迪季度毛利率超特斯拉"
                    ]
                }
            }
        ]
    }
}
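
To make the comparison between analyzers concrete, the sketch below takes the hit `_id` lists copied from the three responses shown so far and computes which documents each analyzer adds over the previous one. The dictionary and the helper name `newly_matched` are illustrative only.

```python
# Hit ids in the order they appear in the responses above.
hits_by_analyzer = {
    "simple": [2, 5, 1, 3],
    "icu_analyzer": [2, 5, 1, 3, 7, 4, 6],
    "ik_max_word": [2, 5, 1, 3, 4, 7, 6, 8, 9, 10],
}


def newly_matched(previous, current):
    """Return the ids that appear in `current` but not in `previous`."""
    return sorted(set(current) - set(previous))


added_by_icu = newly_matched(hits_by_analyzer["simple"], hits_by_analyzer["icu_analyzer"])
added_by_ik = newly_matched(hits_by_analyzer["icu_analyzer"], hits_by_analyzer["ik_max_word"])

print(added_by_icu)  # [4, 6, 7]
print(added_by_ik)   # [8, 9, 10]
```

The documents icu adds (4, 6, 7) all mention 三毛, whereas the three that ik_max_word adds (8, 9, 10) are the statistics and earnings headlines, which share only individual characters with the query. Finer-grained tokens raise recall here at the cost of precision.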

pinyin analyzer

Create the index pinyin_news

PUT {{domain}}/pinyin_news

# Request body
{
    "settings": {
        "index": {
            "number_of_shards": "5",
            "number_of_replicas": "1"
        },
        "analysis": {
            "analyzer": {
                "default": {
                    # Default name of the pinyin analyzer
                    "type": "pinyin"
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "title": {
                    "type": "text",
                    "analyzer": "pinyin"
                },
                "uv": {
                    "type": "long"
                },
                "create_date": {
                    "type": "date"
                },
                "status": {
                    "type": "integer"
                },
                "remark": {
                    "type": "text",
                    "analyzer": "pinyin"
                }
            }
        }
    }
}

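Insert sample data

        (Omitted.) The data is the same as above; only the index name and the analyzer differ.

Query: return the matching documents with matched tokens highlighted

Request:

POST {{domain}}/pinyin_news/_search

# Request body, with the query text written in pinyin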

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    # A query written in pinyin still finds documents whose title is in Chinese characters
                    "title": "sanmao woyuanyishengliulang tiancaizuojia"
                }
            }
        }
    },
    "highlight": {
        "fields": {
            "remark": {},
            "title": {}
        },
        "number_of_fragments": 0,
        "post_tags": [
            ""
        ],
        "pre_tags": [
            ""
        ],
        "require_field_match": false,
        "type": "plain"
    },
    "from": 0,
    "size": 10000,
    "sort": [],
    "aggs": {}
}

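Query result:
        Querying by pinyin also finds the Chinese documents in ES. However, the pinyin analyzer does not appear to support highlighting well: every highlight tag in the response is empty.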

{
    "took": 123,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 10,
        "max_score": 6.046854,
        "hits": [
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "2",
                "_score": 6.046854,
                "_source": {
                    "id": 2,
                    "title": "我愿一生流浪 | 三毛《撒哈拉的故事",
                    "uv": 99,
                    "create_date": "2024-01-14",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "remark": [
                        "来源知乎搜索"
                    ],
                    "title": [
                        "我愿一生流浪 | 三毛《撒哈拉的故事"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "5",
                "_score": 4.6167893,
                "_source": {
                    "id": 5,
                    "title": "三毛:从自闭少女到天才作家",
                    "uv": 141,
                    "create_date": "2024-01-18",
                    "status": 1,
                    "remark": "来源光明网"
                },
                "highlight": {
                    "remark": [
                        "来源光明网"
                    ],
                    "title": [
                        "三毛:从自闭少女到天才作家"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.7759907,
                "_source": {
                    "id": 1,
                    "title": "三毛:她把短暂的一生,活成了十世",
                    "uv": 120,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源百度搜索"
                },
                "highlight": {
                    "remark": [
                        "来源百度搜索"
                    ],
                    "title": [
                        "三毛:她把短暂的一生,活成了十世"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "10",
                "_score": 1.6479323,
                "_source": {
                    "id": 10,
                    "title": "单季盈利超100亿元!比亚迪三季度毛利率超特斯拉",
                    "uv": 310,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源新浪财经"
                },
                "highlight": {
                    "remark": [
                        "来源新浪财经"
                    ],
                    "title": [
                        "单季盈利超100亿元!比亚迪三季度毛利率超特斯拉"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.4564657,
                "_source": {
                    "id": 3,
                    "title": "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!",
                    "uv": 80,
                    "create_date": "2024-01-15",
                    "status": 1,
                    "remark": "来源搜狐"
                },
                "highlight": {
                    "remark": [
                        "来源搜狐"
                    ],
                    "title": [
                        "离世33年仍是“华语顶流”,三毛“珍贵录音”揭露人生真相:世界是对的,但我也没错!"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "8",
                "_score": 1.352735,
                "_source": {
                    "id": 8,
                    "title": "毛国家统计局发布第三季度贸易数据",
                    "uv": 50,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源中华人民共和国商务部"
                },
                "highlight": {
                    "remark": [
                        "来源中华人民共和国商务部"
                    ],
                    "title": [
                        "毛国家统计局发布第三季度贸易数据"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "6",
                "_score": 1.3015552,
                "_source": {
                    "id": 6,
                    "title": "超全整理!三毛最出名的11本著作,没读过的一定要看看",
                    "uv": 200,
                    "create_date": "2024-01-23",
                    "status": 1,
                    "remark": "来源知乎搜索"
                },
                "highlight": {
                    "remark": [
                        "来源知乎搜索"
                    ],
                    "title": [
                        "超全整理!三毛最出名的11本著作,没读过的一定要看看"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "9",
                "_score": 1.2533218,
                "_source": {
                    "id": 9,
                    "title": "网易公布2022年第三季度财报|净收入|毛利润",
                    "uv": 131,
                    "create_date": "2024-01-22",
                    "status": 1,
                    "remark": "来源网易科技"
                },
                "highlight": {
                    "remark": [
                        "来源网易科技"
                    ],
                    "title": [
                        "网易公布2022年第三季度财报|净收入|毛利润"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.5007427,
                "_source": {
                    "id": 4,
                    "title": "三毛逝世30周年丨一场与三毛穿越时空的对话",
                    "uv": 150,
                    "create_date": "2024-01-16",
                    "status": 1,
                    "remark": "来源澎湃新闻"
                },
                "highlight": {
                    "remark": [
                        "来源澎湃新闻"
                    ],
                    "title": [
                        "三毛逝世30周年丨一场与三毛穿越时空的对话"
                    ]
                }
            },
            {
                "_index": "pinyin_news",
                "_type": "_doc",
                "_id": "7",
                "_score": 0.3807567,
                "_source": {
                    "id": 7,
                    "title": "三毛的英文名为什么叫Echo?",
                    "uv": 300,
                    "create_date": "2024-01-21",
                    "status": 1,
                    "remark": "来源百度知道"
                },
                "highlight": {
                    "remark": [
                        "来源百度知道"
                    ],
                    "title": [
                        "三毛的英文名为什么叫Echo?"
                    ]
                }
            }
        ]
    }
}
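
One way to check a response for empty highlight tags, as observed with the pinyin analyzer, is to extract the text between each tag pair and see whether anything is there. The sketch below assumes `<em>` and `</em>` as the tags, and both sample fragments are hypothetical illustrations rather than output copied from the responses above; `highlighted_terms` is a hypothetical helper name.

```python
import re


def highlighted_terms(fragment, pre_tag="<em>", post_tag="</em>"):
    """Return the text wrapped by each pre_tag/post_tag pair in a fragment."""
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, fragment, flags=re.DOTALL)


def has_visible_highlight(fragment):
    """True when at least one tag pair actually wraps some text."""
    return any(highlighted_terms(fragment))


# Hypothetical fragment where the tags wrap the matched tokens.
wrapped = "<em>我愿一生流浪</em> | <em>三毛</em>《撒哈拉的故事"
# Hypothetical fragment where the tag pair is present but wraps nothing.
empty = "<em></em>我愿一生流浪 | 三毛《撒哈拉的故事"

print(highlighted_terms(wrapped))      # ['我愿一生流浪', '三毛']
print(has_visible_highlight(wrapped))  # True
print(has_visible_highlight(empty))    # False
```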

References

  1. Using ElasticSearch: highlight queries
  2. Using Postman for Elasticsearch _bulk batch imports - CSDN blog
