Elasticsearch Notes: Mapping Parameters

Contents

I. Mapping Parameters

1. analyzer

2. normalizer

3. boost

4. coerce

5. copy_to

6. doc_values

7. dynamic

8. enabled

9. fielddata

10. format

11. ignore_above

12. ignore_malformed

13. index

14. index_options

15. fields

16. norms

17. null_value

18. position_increment_gap

19. search_analyzer

20. similarity

21. store

22. term_vector


I. Mapping Parameters

1. analyzer

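An analyzer can be specified in a query, on a field, or at the index level (in the index settings). In the example below, the text field uses the default standard analyzer, while its text.english sub-field uses the english analyzer.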

PUT /my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": { 
          "type": "text",
          "fields": {
            "english": { 
              "type":     "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

GET my_index/_analyze 
{
  "field": "text", // uses the standard analyzer
  "text": "The quick Brown Foxes."  // returns [ the, quick, brown, foxes ]
}

GET my_index/_analyze 
{
  "field": "text.english", // uses the english analyzer
  "text": "The quick Brown Foxes."  // returns [ quick, brown, fox ]
}

2. normalizer

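A normalizer is similar to an analyzer, but it applies to keyword fields and always produces a single token. It standardizes the value before it is indexed, for example by lowercasing all characters, and it is also applied at search time to queries on the field, such as the term and match queries.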

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

PUT index/_doc/1
{
  "foo": "BÀR"
}

PUT index/_doc/2
{
  "foo": "bar"
}

PUT index/_doc/3
{
  "foo": "baz"
}

POST index/_refresh

GET index/_search
{
  "query": {
    "term": {
      "foo": "BAR"
    }
  }
}

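// BAR is normalized to bar at query time (and BÀR was normalized to bar at index time), so both the term query above and the match query below return documents 1 and 2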
GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

3. boost

Sets the weight of a field in relevance scoring. The default is 1.0.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "boost": 2 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}
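In Elasticsearch 6.x, index-time boosting is deprecated: the boost set in the mapping is actually applied at query time, and only to term-based queries (prefix, range and fuzzy queries are not boosted). Because changing a mapping boost requires reindexing, the reference documentation recommends setting the boost in the query instead. A minimal sketch of the equivalent query-time boost, assuming the title/content mapping above:

```
GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "quick brown fox",
        "boost": 2
      }
    }
  }
}
```

To boost one of several fields, multi_match also accepts the field^boost syntax, for example "fields": [ "title^2", "content" ].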

4. coerce

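coerce cleans up dirty numeric data and defaults to true. For example, the integer 5 might be sent as the string "5" or as the floating-point number 5.0. With coerce enabled:

      Strings are converted to numbers ("5" becomes 5).

      Floating-point numbers are truncated for integer fields (5.0 becomes 5).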
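A sketch adapted from the 6.x reference documentation, with coerce left at its default on number_one and disabled on number_two:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

// Accepted: "10" is coerced to the integer 10 (the _source still contains "10")
PUT my_index/_doc/1
{
  "number_one": "10"
}

// Rejected: coerce is disabled on number_two
PUT my_index/_doc/2
{
  "number_two": "10"
}
```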

5. copy_to

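copy_to copies the values of several fields into a single group field, which can then be queried as one field. For example, first_name and last_name can be combined into full_name. The copied values are indexed, but they are not added to the document's _source.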

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

6. doc_values

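Doc values are enabled by default on every field type that supports them (all types except text). If you do not need to sort or aggregate on a field, or to access its value from a script, you can set doc_values to false to save disk space.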
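A minimal mapping, adapted from the reference documentation: status_code keeps the default doc values, while session_id turns them off. session_id can still be searched, but it can no longer be used for sorting, aggregations or script access. The value "abc123" in the query is a made-up example:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type": "keyword"
        },
        "session_id": {
          "type": "keyword",
          "doc_values": false
        }
      }
    }
  }
}

// Still works: session_id is indexed, only its doc values are disabled
GET my_index/_search
{
  "query": {
    "term": {
      "session_id": "abc123"
    }
  }
}
```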

7. dynamic

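Controls whether new fields found in a document are added to the mapping automatically. Defaults to true. With false, new fields are ignored: they are not added to the mapping and cannot be searched, although they still appear in _source. With strict, a document containing an unknown field is rejected with an exception. The setting is inherited by inner objects unless they override it, as in the example below.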

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": false, 
      "properties": {
        "user": { 
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": { 
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
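To see the effect of the mapping above, one can index a document containing fields that are not yet mapped and then inspect the mapping. The fields age and twitter and their values are made up for illustration. user.age should be ignored, because the user object inherits dynamic: false from the type, while user.social_networks.twitter should be added, because social_networks sets dynamic: true:

```
PUT my_index/_doc/1
{
  "user": {
    "name": "Alice",
    "age": 30,
    "social_networks": {
      "twitter": "@alice"
    }
  }
}

// user.age is absent from the mapping; user.social_networks.twitter has been added
GET my_index/_mapping
```

Both new fields are still returned in _source. If the type used "dynamic": "strict" instead of false, this request would be rejected because of the unknown field user.age.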

8. enabled

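Some fields only need to be stored, not indexed. Setting enabled to false (allowed only on the mapping type itself and on object fields) makes Elasticsearch skip parsing the field's contents entirely: the data can still be retrieved from _source, but it cannot be searched.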
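A sketch adapted from the reference documentation. session_data is kept in _source, but nothing inside it is parsed or indexed, so it can hold arbitrary JSON without adding fields to the mapping:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user_id": {
          "type": "keyword"
        },
        "last_updated": {
          "type": "date"
        },
        "session_data": {
          "enabled": false
        }
      }
    }
  }
}

PUT my_index/_doc/session_1
{
  "user_id": "kimchy",
  "session_data": {
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}
```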

9. fielddata

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/fielddata.html
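text fields do not support doc values, so sorting, aggregating or scripting on them needs fielddata, an in-memory structure that is disabled by default because it is built on the JVM heap and can use a lot of memory. A minimal sketch enabling it on a hypothetical field my_field; in most cases a keyword sub-field (see fields below) is the better choice:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_field": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

// Aggregates on the analyzed terms of my_field, not on the original strings
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "top_terms": {
      "terms": {
        "field": "my_field"
      }
    }
  }
}
```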

10. format

format is mainly used to specify how date values are parsed. For the available formats see https://www.elastic.co/guide/en/elasticsearch/reference/6.3/mapping-date-format.html
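A minimal sketch with a date field that accepts a date-time, a plain date or epoch milliseconds. Multiple formats are separated by ||; without format, a date field defaults to strict_date_optional_time||epoch_millis:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

// Both documents are accepted, each matching one of the formats
PUT my_index/_doc/1
{
  "date": "2015-01-01 12:10:30"
}

PUT my_index/_doc/2
{
  "date": "2015-01-01"
}
```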

11. ignore_above

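This parameter sets a maximum length for keyword strings: strings longer than ignore_above characters are not indexed or stored (the original value is still kept in _source). For arrays, the limit is applied to each element separately.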
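A sketch adapted from the reference documentation, with a limit of 20 characters:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 20
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "message": "Syntax error"
}

// Indexed, but the message value is longer than 20 characters and is not indexed
PUT my_index/_doc/2
{
  "message": "Syntax error with some long stacktrace"
}

// Both documents are returned, but only "Syntax error" appears in the terms aggregation
GET my_index/_search
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}
```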

12. ignore_malformed

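When set to true, a malformed value (for example a string sent to a numeric field) is ignored and the rest of the document is still indexed. When false (the default), the whole document is rejected.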

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}

// Succeeds: ignore_malformed is enabled on number_one, so the bad value is ignored and the rest of the document is indexed
PUT my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

// Fails: ignore_malformed is not enabled on number_two, so the whole document is rejected
PUT my_index/my_type/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}

13. index

Controls whether the field value is indexed. Defaults to true. A field that is not indexed cannot be queried.
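A minimal sketch with a hypothetical internal_note field that is kept only in _source:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "internal_note": {
          "type": "text",
          "index": false
        }
      }
    }
  }
}

// Returns an error, because internal_note is not indexed
GET my_index/_search
{
  "query": {
    "match": {
      "internal_note": "draft"
    }
  }
}
```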

14. index_options

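index_options controls what information is added to the inverted index:

docs: only document numbers are indexed
freqs: document numbers and term frequencies are indexed
positions: document numbers, term frequencies and term positions are indexed
offsets: document numbers, term frequencies, term positions, and the start and end character offsets of each term are indexed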
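text fields default to positions; other field types default to docs. Indexing offsets lets the highlighter read term positions and offsets directly from the index instead of re-analyzing the text. A sketch adapted from the reference documentation:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "index_options": "offsets"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}
```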

15. fields

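fields (multi-fields) lets the same value be indexed in several different ways. For example, a string can be indexed as text for full-text search and as keyword for sorting and aggregations.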

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

 

16. norms

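Norms store normalization factors (such as field length) that are used at query time to compute relevance scores. They are useful for scoring but take up a lot of disk space. Norms are enabled by default on text fields and disabled on keyword fields; if a text field is only used for filtering or aggregations, they can be turned off.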
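A sketch that disables norms on an existing text field, assuming my_index already maps title as text. Norms can be disabled on an existing field through the put mapping API, but they cannot be re-enabled afterwards:

```
PUT my_index/_mapping/_doc
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false
    }
  }
}
```

After this change, the length of title no longer affects its relevance score.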

17. null_value

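By default, a null value cannot be indexed or searched. The null_value parameter replaces explicit null values with the given value, so they can be indexed and searched. The replacement value must have the same datatype as the field.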

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "status_code": {
          "type":       "keyword",
          "null_value": "NULL" 
        }
      }
    }
  }
}

// Explicit null: indexed as "NULL", so this document is found by the query below
PUT my_index/_doc/1
{
  "status_code": null
}
// Empty array: not the same as null, so it is not replaced and this document is not found
PUT my_index/_doc/2
{
  "status_code": [] 
}

GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL" 
    }
  }
}

18. position_increment_gap

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/position-increment-gap.html
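position_increment_gap is the number of fake term positions inserted between the values of an array in a text field (100 by default), so that phrase queries do not match across separate values. A sketch adapted from the reference documentation, relying on dynamic mapping for the names field:

```
PUT my_index/_doc/1
{
  "names": [ "John Abraham", "Lincoln Smith" ]
}

// No match: "Abraham" and "Lincoln" are in different array values
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln"
      }
    }
  }
}

// Matches: slop (101) is larger than position_increment_gap (100)
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "names": {
        "query": "Abraham Lincoln",
        "slop": 101
      }
    }
  }
}
```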

19. search_analyzer

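Usually the same analyzer should be applied at index time and at search time, so that the terms in a query have the same form as the terms in the inverted index. Sometimes, though, a different search analyzer makes sense, for example when using edge_ngram for autocomplete.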

By default, queries use the analyzer specified by the field's analyzer parameter, but this can be overridden with search_analyzer.

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete", 
          "search_analyzer": "standard" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick Brown Fox" 
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br", 
        "operator": "and"
      }
    }
  }
}
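Why the query above matches: at index time the autocomplete analyzer turns each word into its edge n-grams, while at search time the standard analyzer turns "Quick Br" into just the two terms quick and br, both of which were indexed. The index-time tokens can be checked with the _analyze API against the index defined above:

```
// Expected tokens: q, qu, qui, quic, quick, b, br, bro, brow, brown
GET my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "Quick Brown"
}
```

If autocomplete were also used at search time, "Quick Br" would itself be split into n-grams such as q and b, and the query would match far more documents than intended.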

20. similarity

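Specifies the scoring model (similarity) used for a field. The built-in options are "BM25" (the default), "classic" (TF/IDF) and "boolean" (a simple boolean model).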
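A sketch adapted from the reference documentation that keeps the default BM25 on one field and switches another to the boolean model, under which a matching term scores only its query boost, without term-frequency or field-length factors:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "default_field": {
          "type": "text"
        },
        "boolean_sim_field": {
          "type": "text",
          "similarity": "boolean"
        }
      }
    }
  }
}
```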

21. store

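By default, field values are indexed so they can be searched, but they are not stored. This means the field can be queried, but its original value cannot be retrieved on its own.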

That is usually fine, because the _source field already stores a copy of the whole document by default, so the values can be retrieved from _source.

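In some cases the store parameter is still useful. For example, if a document has a title, a date and a very large content field, we may want to retrieve only title and date without loading the whole _source. This can be set up as follows: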

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "store": true 
        },
        "date": {
          "type": "date",
          "store": true 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

GET my_index/_search
{
  "stored_fields": [ "title", "date" ] 
}

22. term_vector

https://www.elastic.co/guide/en/elasticsearch/reference/6.3/term-vector.html
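term_vector controls whether term vectors (the terms of a field, optionally with their positions and offsets) are stored for each document. Accepted values include no (the default), yes, with_positions, with_offsets and with_positions_offsets. A sketch that stores them and reads them back with the term vectors API:

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type":        "text",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "text": "Quick brown fox"
}

// Returns the terms of document 1 with their positions and offsets
GET my_index/_doc/1/_termvectors?fields=text
```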

 
