Elasticsearch basics: custom analyzers

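I. Analyzers

An analyzer is made up of three parts, applied in this order:

1. Character filters

Preprocess the raw text before it is tokenized, for example by stripping HTML markup or converting "&" to "and".

2. Tokenizer

Splits the text into tokens, for example on whitespace.

3. Token filters

Normalize the tokens, for example by changing case, removing stop words, or adding synonyms.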
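
To make the order of the three stages concrete, here is a minimal sketch in plain Python. It is not how Elasticsearch is implemented; the function names, the sample sentence, and the two-word stop list are all made up for illustration.

```python
from typing import Callable, Iterable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str,
            char_filters: Iterable[CharFilter],
            tokenizer: Tokenizer,
            token_filters: Iterable[TokenFilter]) -> List[str]:
    """Run text through char filters, then the tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)          # stage 1: rewrite the raw string
    tokens = tokenizer(text)              # stage 2: split into tokens
    for token_filter in token_filters:
        tokens = token_filter(tokens)     # stage 3: normalize the tokens
    return tokens


# Hypothetical stand-ins for each stage.
def ampersand_to_and(text: str) -> str:
    return text.replace("&", "and")


def whitespace_tokenizer(text: str) -> List[str]:
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


STOP_WORDS = {"the", "and"}  # tiny illustrative list, not the real one


def remove_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


tokens = analyze(
    "Tom & Jerry chase THE cheese",
    char_filters=[ampersand_to_and],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, remove_stop_words],
)
print(tokens)  # ['tom', 'jerry', 'chase', 'cheese']
```

Note that the "and" produced by the character filter is later dropped by the stop word filter, which shows that each stage only sees the output of the previous one.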

 

II. Built-in analyzers

Standard analyzer

Splits on word boundaries, removes punctuation, and lowercases the tokens.

GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

 

Simple analyzer

Splits on any non-letter character, discards the non-letters, and lowercases the tokens.

GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

 

Stop analyzer

Splits on any non-letter character, discards the non-letters, lowercases the tokens, and removes stop words.

GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

 

Whitespace analyzer

Splits on whitespace and does not lowercase the tokens.

GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

 

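Pattern analyzer

Splits on a regular expression, which defaults to \W+ (runs of non-word characters), and lowercases the tokens.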

GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

 

Keyword analyzer

Does not tokenize; the whole input is emitted as a single token.

GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
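
To compare these behaviors without a running cluster, the descriptions above can be approximated in plain Python. This is only a rough sketch: the function names are made up, the stop word set is a small illustrative subset, and the standard analyzer is left out because its Unicode word-boundary rules do not reduce to a one-line regex.

```python
import re

TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."

# Small illustrative subset of English stop words (an assumption, not the full list).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def simple(text):
    # Keep runs of letters only (digits and punctuation are dropped), then lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop(text):
    # Same as simple, plus stop word removal.
    return [t for t in simple(text) if t not in STOP_WORDS]


def whitespace(text):
    # Split on whitespace only; case and punctuation are preserved.
    return text.split()


def pattern(text, regex=r"\W+"):
    # Split on the regex (default: runs of non-word characters), then lowercase.
    return [t.lower() for t in re.split(regex, text) if t]


def keyword(text):
    # The whole input becomes one token.
    return [text]


for name, fn in [("simple", simple), ("stop", stop), ("whitespace", whitespace),
                 ("pattern", pattern), ("keyword", keyword)]:
    print(f"{name:10} {fn(TEXT)}")
```

For this sentence the visible differences are the leading "2" (dropped by simple and stop, kept by pattern and whitespace), "brown-foxes" (kept whole only by whitespace and keyword), and "in" and "the" (removed only by stop).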

 

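III. Custom analyzers

Each building block can be tried out on its own through the _analyze API before it is combined into a custom analyzer.

Character filters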

# html_strip: remove HTML tags from the text
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<b>hello world</b>"
}
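
What html_strip does can be sketched roughly with Python's standard library. This is an approximation under my own assumptions: the class and function names are made up, and the real filter handles details (such as emitting line breaks for block-level elements) that this sketch ignores.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text content and drops every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    parser = TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


stripped = strip_html("<p>hello <b>world</b> &amp; more</p>")
print(stripped)  # hello world & more
```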

# mapping: replace characters according to a mapping table
POST _analyze
{
  "tokenizer": "standard", 
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["-=>_"]
    }
  ], 
  "text": "123-456, i-test"
}
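
The mapping character filter is essentially a find-and-replace table applied before tokenization. A minimal Python sketch of that idea, assuming "key=>value" rules and longest-key-first matching (the function name is hypothetical):

```python
def apply_mappings(text, mappings):
    """Replace each key with its value; at any position the longest key wins."""
    rules = []
    for rule in mappings:
        key, _, value = rule.partition("=>")
        rules.append((key.strip(), value.strip()))
    # Try longer keys first so that overlapping keys resolve greedily.
    rules.sort(key=lambda kv: len(kv[0]), reverse=True)

    out = []
    i = 0
    while i < len(text):
        for key, value in rules:
            if key and text.startswith(key, i):
                out.append(value)
                i += len(key)
                break
        else:
            # No rule matched here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


mapped = apply_mappings("123-456, i-test", ["-=>_"])
print(mapped)  # 123_456, i_test
```

With the hyphens turned into underscores, the standard tokenizer in the request above no longer has a hyphen to split on.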

# pattern_replace: replace text that matches a regular expression
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "http://(.*)",
      "replacement": "$1"
    }
  ],
  "text": "http://www.elastic.co"
}
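
The same substitution can be tried in Python to check a pattern before sending it to the cluster. This is a sketch, not the filter itself: Elasticsearch uses Java regular expressions, where capture groups in the replacement are written "$1", while Python writes them as "\g<1>", so the hypothetical helper below translates between the two.

```python
import re


def pattern_replace(text, pattern, replacement):
    # Translate Java-style group references ($1) into Python's \g<1> form.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text)


result = pattern_replace("http://www.elastic.co", r"http://(.*)", "$1")
print(result)  # www.elastic.co
```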

Tokenizers

# path_hierarchy: emit one token per level of the path
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/usr/abc/efg"
}
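
The path_hierarchy tokenizer emits every ancestor of the path as its own token. A small Python sketch of that behavior for a simple absolute path (the function name is made up, and the tokenizer's options other than the delimiter are not modeled):

```python
def path_hierarchy(path, delimiter="/"):
    """Return the path prefixes ending at each delimiter, then the full path."""
    tokens = []
    # Start searching at index 1 so a leading delimiter does not yield an empty token.
    pos = path.find(delimiter, 1)
    while pos != -1:
        tokens.append(path[:pos])
        pos = path.find(delimiter, pos + 1)
    if path:
        tokens.append(path)
    return tokens


levels = path_hierarchy("/usr/abc/efg")
print(levels)  # ['/usr', '/usr/abc', '/usr/abc/efg']
```

Indexing all three tokens is what lets a search for "/usr/abc" match documents stored anywhere underneath that directory.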

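# whitespace tokenizer followed by the stop filter; the stop filter is case-sensitive by default, so "The" is kept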
POST _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop"],
  "text": "The rain in Spain falls mainly on the plain."
}
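
Because stop word matching is case-sensitive by default, whether tokens are lowercased first changes the result. A quick Python sketch of the difference, using a small illustrative stop list rather than the real one:

```python
# Illustrative subset of English stop words (an assumption, not the full list).
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

text = "The rain in Spain falls mainly on the plain."
tokens = text.split()  # whitespace tokenization keeps case and punctuation

# Stop filter only: "The" survives because it is not lowercase.
without_lowercase = [t for t in tokens if t not in STOP_WORDS]

# Lowercase first, then stop filter: "the" is now removed as well.
with_lowercase = [t for t in (tok.lower() for tok in tokens) if t not in STOP_WORDS]

print(without_lowercase)  # ['The', 'rain', 'Spain', 'falls', 'mainly', 'plain.']
print(with_lowercase)     # ['rain', 'spain', 'falls', 'mainly', 'plain.']
```

The trailing period stays attached to "plain." in both cases, since the whitespace tokenizer never removes punctuation.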

Token filters

# lowercase the tokens, then remove stop words
POST _analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase", "stop"],
  "text": "The rain in Spain falls mainly on the plain."
}
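
Once the individual pieces behave as expected, they can be combined into a named custom analyzer in the index settings. The sketch below only builds and prints the request body in Python; the index name "my-index" and the analyzer name "my_custom_analyzer" are hypothetical, and the choice of html_strip, whitespace, lowercase, and stop simply reuses the components tried above.

```python
import json

INDEX_NAME = "my-index"                # hypothetical index name
ANALYZER_NAME = "my_custom_analyzer"   # hypothetical analyzer name

settings_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    "char_filter": ["html_strip"],    # stage 1: character filters
                    "tokenizer": "whitespace",        # stage 2: exactly one tokenizer
                    "filter": ["lowercase", "stop"],  # stage 3: token filters, in order
                }
            }
        }
    }
}

# Text that could be pasted into the Kibana console.
request_text = f"PUT {INDEX_NAME}\n{json.dumps(settings_body, indent=2)}"
print(request_text)
```

After the index exists, the analyzer can be checked the same way as the built-in ones, by sending "analyzer": "my_custom_analyzer" in a request to GET my-index/_analyze.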

 

 

Related reading


https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer-anatomy.html#_character_filters
