A Brief Guide to Elasticsearch Analyzer Usage

1. Character filter (char_filter)

// html_strip removes HTML markup such as <b>...</b>; the standard tokenizer then splits the remaining text into words
POST _analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "text": "<b> j am </b>"
}
// the same character filter, but with the keyword tokenizer the stripped text is kept as a single token
POST _analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "keyword",
  "text": "<b> j am </b>"
}
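For reference, the two requests above should return roughly the following token streams on a recent Elasticsearch version (responses abridged to the token values only):

// standard tokenizer: markup stripped, remaining words split apart
{ "tokens": [ { "token": "j" }, { "token": "am" } ] }
// keyword tokenizer: markup stripped, remaining text kept as one token
{ "tokens": [ { "token": " j am " } ] }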

2. Custom char_filter: mapping

// the mapping char filter replaces strings in the input text
POST _analyze
{
  "char_filter": [ 
     {
        "type":"mapping",
        "mappings":["- => &&&", "&&& => %%%%"]
     } 
  ],
  "tokenizer": "keyword",
  "text": "i - haha &&&"
}
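The mapping rules are applied in a single pass over the original text, so the "&&&" produced by the first rule is not rewritten again by the second rule; with the keyword tokenizer the result should come back as one token (response abridged):

{ "tokens": [ { "token": "i &&& haha %%%%" } ] }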

3. Tokenizers

// path_hierarchy
POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text":"/u/b/c/d/c"
}
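The path_hierarchy tokenizer emits the full path plus every ancestor prefix as separate tokens, so the request above should return roughly (response abridged):

{ "tokens": [ { "token": "/u" }, { "token": "/u/b" }, { "token": "/u/b/c" }, { "token": "/u/b/c/d" }, { "token": "/u/b/c/d/c" } ] }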
// token filters: "lowercase" and "stop"
POST _analyze
{
  "tokenizer": "whitespace",
  "filter":["lowercase","stop"],
  "text":"The dog is a big dog"
}
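Here the whitespace tokenizer splits on spaces, lowercase normalizes case, and the stop filter drops English stopwords such as "the", "is", and "a", so only the content words survive (response abridged):

{ "tokens": [ { "token": "dog" }, { "token": "big" }, { "token": "dog" } ] }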

4. Custom analyzer

// custom analyzer: mapping char_filter + pattern tokenizer (split on digits) + lowercase and stop token filters
PUT analyzetest
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer":{
            "type":"custom",
            "tokenizer":"hahah",
            "char_filter":[
               "hahacha_filter"
            ],
            "filter":["lowercase","english_stop"]
         }
      },
      "tokenizer": {
        "hahah":{
          "type":"pattern",
          "pattern":"[0-9]"
        }
      },
      "char_filter": {
        "hahacha_filter":{
          "type":"mapping",
          "mappings":["- =>***"]
        }
      },
      "filter": {
        "english_stop":{
          "type":"stop",
          "stopwords":["_english_"]
        }
      }
    }
  }
}
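The individual pieces can also be tested in isolation: when _analyze is called against the index, it should be able to reference the tokenizer and char_filter defined above by name. A minimal sketch, assuming the index was created as shown; the sample text is only illustrative:

GET analyzetest/_analyze
{
  "char_filter": [ "hahacha_filter" ],
  "tokenizer": "hahah",
  "text": "a-b 12 c"
}
// "-" is rewritten to "***" by the char filter, then the digits act as
// separators for the pattern tokenizer, leaving roughly "a***b " and " c"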

5. Using the custom analyzer

GET analyzetest/_analyze
{
  "analyzer": "my_analyzer",
  "text": "this is - a english 223232"
}
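Because the pattern tokenizer splits only on digits, the char-filtered text "this is *** a english 223232" loses its trailing digits but otherwise stays together, and the stop filter finds no token that exactly matches an English stopword; the response should therefore contain roughly a single long token (abridged):

{ "tokens": [ { "token": "this is *** a english " } ] }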
