默认替换掉所有的标记文本
GET /_analyze
{
"tokenizer": "keyword",
"char_filter": [
"html_strip"
],
"text": "I'm so happy!
"
}
忽略b标签,删除其他标签
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "keyword",
"char_filter": [
"my_custom_html_strip_char_filter"
]
}
},
"char_filter": {
"my_custom_html_strip_char_filter": {
"type": "html_strip",
"escaped_tags": [
"b"
]
}
}
}
}
}
GET /_analyze
{
"tokenizer": "keyword",
"char_filter": [
{
"type": "mapping",
"mappings": [
"٠ => 0",
"١ => 1",
"٢ => 2",
"٣ => 3",
"٤ => 4",
"٥ => 5",
"٦ => 6",
"٧ => 7",
"٨ => 8",
"٩ => 9"
]
}
],
"text": "My license plate is ٢٥٠١٥"
}
/*输出结果*/
[ My license plate is 25015 ]
:)
替换成 _happy_
, :(
替换成_sad_
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_mappings_char_filter"
]
}
},
"char_filter": {
"my_mappings_char_filter": {
"type": "mapping",
"mappings": [
":) => _happy_",
":( => _sad_"
]
}
}
}
}
}
正则表达式字符过滤器使用一个正则表达式匹配应该与指定替换字符串替换字符。替换字符串可以引用正则表达式中的捕获组。
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "(?<=\\p{Lower})(?=\\p{Upper})",
"replacement": " "
}
}
}
},
"mappings": {
"properties": {
"text": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
POST my_index/_analyze
{
"analyzer": "my_analyzer",
"text": "The fooBarBaz method"
}
/*输出*/
[ the, foo, bar, baz, method ]