转载:https://blog.csdn.net/gwd1154978352/article/details/83343933
分词器首先看文章:【Elasticsearch】Elasticsearch analyzer 分词器
【Es】ElasticSearch 自定义分词器
Elasticsearch中,内置了很多分词器(analyzers),例如standard (标准分词器)、english(英文分词)和chinese (中文分词),默认的是standard,
standard tokenizer:以单词边界进行切分
standard token filter:什么都不做
lowercase token filter:将所有字母转换为小写
stop token filer(默认被禁用):移除停用词,比如a the it等等
修改分词器设置
启用english,停用词token filter
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"es_std": {
"type": "standard",
"stopwords": "_english_"
}
}
}
}
}
标准分词测试代码
GET /my_index/_analyze
{
"analyzer": "standard",
"text": "a dog is in the house"
}
结果
{
"tokens": [
{
"token": "a",
"start_offset": 0,
"end_offset": 1,
"type": "" ,
"position": 0
},
{
"token": "dog",
"start_offset": 2,
"end_offset": 5,
"type": "" ,
"position": 1
},
{
"token": "is",
"start_offset": 6,
"end_offset": 8,
"type": "" ,
"position": 2
},
{
"token": "in",
"start_offset": 9,
"end_offset": 11,
"type": "" ,
"position": 3
},
{
"token": "the",
"start_offset": 12,
"end_offset": 15,
"type": "" ,
"position": 4
},
{
"token": "house",
"start_offset": 16,
"end_offset": 21,
"type": "" ,
"position": 5
}
]
}
设置的英文分词测试代码
GET /my_index/_analyze
{
"analyzer": "es_std",
"text":"a dog is in the house"
}
结果
{
"tokens": [
{
"token": "dog",
"start_offset": 2,
"end_offset": 5,
"type": "" ,
"position": 1
},
{
"token": "house",
"start_offset": 16,
"end_offset": 21,
"type": "" ,
"position": 5
}
]
}
PUT /my_index
{
"settings": {
"analysis": {
"char_filter": {
"&_to_and": {
"type": "mapping",
"mappings": ["&=> and"]
}
},
"filter": {
"my_stopwords": {
"type": "stop",
"stopwords": ["the", "a"]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"char_filter": ["html_strip", "&_to_and"],
"tokenizer": "standard",
"filter": ["lowercase", "my_stopwords"]
}
}
}
}
}
测试代码
GET /my_index/_analyze
{
"text": "tom&jerry are a friend in the house, , HAHA!!",
"analyzer": "my_analyzer"
}
测试结果
{
"tokens": [
{
"token": "tomandjerry",
"start_offset": 0,
"end_offset": 9,
"type": "" ,
"position": 0
},
{
"token": "are",
"start_offset": 10,
"end_offset": 13,
"type": "" ,
"position": 1
},
{
"token": "friend",
"start_offset": 16,
"end_offset": 22,
"type": "" ,
"position": 3
},
{
"token": "in",
"start_offset": 23,
"end_offset": 25,
"type": "" ,
"position": 4
},
{
"token": "house",
"start_offset": 30,
"end_offset": 35,
"type": "" ,
"position": 6
},
{
"token": "HAHA",
"start_offset": 42,
"end_offset": 46,
"type": "" ,
"position": 7
}
]
}
type中的使用
PUT /my_index/_mapping/my_type
{
"properties": {
"content": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}