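Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-params.html
Elasticsearch provides a rich set of mapping parameters for configuring how each field is mapped, such as the field's analyzer, boost (weight), date format, and similarity (scoring) model.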
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analyzer.html
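The analyzer parameter specifies the analyzer for a field (it is often loosely called a "tokenizer", but "analyzer" is the more accurate term). It applies at both index time and search time unless search_analyzer overrides it. The following example configures the IK analyzer, which is provided by the third-party IK analysis plugin.
(1) Create the index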
DELETE my_index
PUT my_index
(2) Tokenize with ik_smart
GET my_index/_analyze
{
"analyzer": "ik_smart",
"text":"安徽省长江流域"
}
{
"tokens": [
{
"token": "安徽省",
"start_offset": 0,
"end_offset": 3,
"type": "CN_WORD",
"position": 0
},
{
"token": "长江流域",
"start_offset": 3,
"end_offset": 7,
"type": "CN_WORD",
"position": 1
}
]
}
(3) Define the mapping
POST my_index/fulltext/_mapping
{
"properties": {
"content": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
}
}
}
(4) Insert data
POST my_index/fulltext/1
{"content":"美国留给伊拉克的是个烂摊子吗"}
POST my_index/fulltext/2
{"content":"公安部:各地校车将享最高路权"}
POST my_index/fulltext/3
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
POST my_index/fulltext/4
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
(5) Query
POST my_index/fulltext/_search
{
"query" : { "match" : { "content" : "中国" }}
}
Query result
{
"took": 135,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.6489038,
"hits": [
{
"_index": "my_index",
"_type": "fulltext",
"_id": "4",
"_score": 0.6489038,
"_source": {
"content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
}
},
{
"_index": "my_index",
"_type": "fulltext",
"_id": "3",
"_score": 0.2876821,
"_source": {
"content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
}
}
]
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/normalizer.html
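The normalizer parameter applies to keyword fields. It works like an analyzer but is guaranteed to produce a single token. It normalizes the value before it is indexed, and also normalizes the query input when the field is searched with a query such as match; a typical use is converting all characters to lowercase. In the example below, both "BÀR" and "bar" match the query "BAR", because all three normalize to "bar".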
DELETE my_index
PUT my_index
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"char_filter": [],
"filter": ["lowercase", "asciifolding"]
}
}
}
},
"mappings": {
"type": {
"properties": {
"foo": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
PUT my_index/type/1
{"foo": "BÀR"}
PUT my_index/type/2
{"foo": "bar"}
PUT my_index/type/3
{"foo": "baz"}
POST my_index/_refresh
GET my_index/_search
{
"query": {
"match": {
"foo": "BAR"
}
}
}
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.2876821,
"hits": [
{
"_index": "my_index",
"_type": "type",
"_id": "2",
"_score": 0.2876821,
"_source": {
"foo": "bar"
}
},
{
"_index": "my_index",
"_type": "type",
"_id": "1",
"_score": 0.2876821,
"_source": {
"foo": "BÀR"
}
}
]
}
}
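To make the result above easier to follow, here is a minimal Python sketch of what the lowercase and asciifolding filters do to the three values. It is only an approximation for simple Latin text: the function name normalize_keyword is hypothetical, and Unicode decomposition is used as a stand-in for asciifolding, which is not how Elasticsearch implements it.

```python
import unicodedata


def normalize_keyword(value: str) -> str:
    """Approximate a lowercase + asciifolding normalizer for simple Latin text."""
    # Decompose accented characters, e.g. "À" becomes "A" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks (rough stand-in for asciifolding).
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase the result; for these inputs the filter order does not matter.
    return folded.lower()


# The three documents from the example, normalized as they would be at index time.
indexed = {doc_id: normalize_keyword(value)
           for doc_id, value in {"1": "BÀR", "2": "bar", "3": "baz"}.items()}

# The query input is normalized the same way before it is compared.
query_term = normalize_keyword("BAR")
matches = sorted(doc_id for doc_id, term in indexed.items() if term == query_term)
print(matches)
```

Documents 1 and 2 end up with the same indexed term as the query, which is why both are returned with the same score.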
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-boost.html
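According to the official documentation: index time boost is deprecated. Instead, the field mapping boost is applied at query time.
In other words, the recommended approach is to specify boost at query time.
A boost value can be set on each query clause to control its relative weight. The default is 1; a boost greater than 1 increases the relative weight of that clause.
DELETE my_index
PUT my_index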
PUT my_index/my_type/1
{
"title":"quick brown fox"
}
POST _search
{
"query": {
"match" : {
"title": {
"query": "quick brown fox",
"boost": 2
}
}
}
}
Query result (the request names no index, so it searches all indices, hence the shard total of 45)
{
"took": 48,
"timed_out": false,
"_shards": {
"total": 45,
"successful": 45,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.7260926,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 1.7260926,
"_source": {
"title": "quick brown fox"
}
}
]
}
}
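The boost parameter increases the relative weight of a clause (when boost is greater than 1) or decreases it (when boost is between 0 and 1). The effect on a document's final _score should not be assumed to be linear, however: setting boost to 2 does not necessarily double the final _score. In a query with several clauses only the boosted clause's contribution changes, and older Elasticsearch versions additionally normalized the score after applying the boost, with each query type using its own normalization algorithm.
What can be relied on is that a higher boost value produces a higher _score for the documents that match the boosted clause.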
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/coerce.html#coerce
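The coerce parameter cleans up dirty data and defaults to true. The integer 5 might arrive as the string "5" or as the floating-point number 5.0; with coerce enabled, such values are converted to the field's type:
(1) Create the index with two integer fields, one of which disables coerce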
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"number_one": {
"type": "integer"
},
"number_two": {
"type": "integer",
"coerce": false
}
}
}
}
}
(2) Index a test document (it succeeds, because the string "10" is coerced to the integer 10)
PUT my_index/my_type/1
{
"number_one": "10"
}
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
(3) Index another test document (it is rejected, because coerce is disabled on number_two)
PUT my_index/my_type/2
{
"number_two": "10"
}
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [number_two]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [number_two]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Integer value passed as String"
}
},
"status": 400
}
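The two outcomes above can be mimicked with a small Python sketch. It is a simplified model, not Elasticsearch's parser: the function name parse_integer is hypothetical, and rejecting every non-integer value when coerce is false is an assumption made to keep the example short.

```python
def parse_integer(value, coerce=True):
    """Simplified model of parsing a JSON value for an integer field."""
    if isinstance(value, bool):
        raise ValueError("booleans are not integers")
    if isinstance(value, int):
        return value  # already a clean integer
    if not coerce:
        # Assumption: with coerce disabled this sketch rejects anything else.
        if isinstance(value, str):
            raise ValueError("Integer value passed as String")
        raise ValueError("value is not an integer")
    if isinstance(value, str):
        return int(float(value))  # "10" -> 10
    if isinstance(value, float):
        return int(value)  # 5.0 -> 5
    raise ValueError("unsupported value")


print(parse_integer("10"))  # number_one: coerce defaults to true
try:
    parse_integer("10", coerce=False)  # number_two: coerce is false
except ValueError as error:
    print("rejected:", error)
```

The first call corresponds to number_one in document 1, the second to number_two in document 2.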
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/copy-to.html
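The copy_to parameter lets you build custom _all-style fields. In other words, the values of several fields can be copied into one combined field, which can then be queried as a single field. For example, first_name and last_name can be copied into a full_name field.
[Example]
(1) Create the index and insert a document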
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
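(2) Query the combined field (note that full_name does not appear in _source of the result; copy_to affects only what is indexed)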
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.5753642,
"_source": {
"first_name": "John",
"last_name": "Smith"
}
}
]
}
}
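As a rough illustration of why the query on full_name matches even though the document never contained that field, the sketch below models copy_to in plain Python. The helper names are hypothetical, and whitespace splitting plus lowercasing stands in for the real analyzer.

```python
def build_indexed_fields(source, mapping):
    """Return field -> list of tokens, applying copy_to. The source dict is not modified."""
    indexed = {}
    for field, value in source.items():
        tokens = value.lower().split()  # stand-in for the real analyzer
        indexed.setdefault(field, []).extend(tokens)
        target = mapping.get(field, {}).get("copy_to")
        if target:
            # The same tokens are also indexed under the copy_to target field.
            indexed.setdefault(target, []).extend(tokens)
    return indexed


def match_all_terms(indexed, field, query):
    """Model of a match query with operator "and": every query term must be present."""
    terms = query.lower().split()
    return all(term in indexed.get(field, []) for term in terms)


mapping = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "last_name": {"type": "text", "copy_to": "full_name"},
    "full_name": {"type": "text"},
}
source = {"first_name": "John", "last_name": "Smith"}
indexed = build_indexed_fields(source, mapping)
print(match_all_terms(indexed, "full_name", "John Smith"))
print(match_all_terms(indexed, "first_name", "John Smith"))
```

Neither first_name nor last_name alone contains both terms, so only the combined field satisfies the "and" operator.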
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/doc-values.html
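doc_values speeds up sorting and aggregations. When the inverted index is built, Elasticsearch also writes an additional column-oriented structure that maps each document to its field value, trading disk space for speed. It is enabled by default on the field types that support it (analyzed text fields do not); it can be disabled for fields that will certainly never be sorted on or aggregated.
DELETE my_index
PUT my_index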
{
"mappings": {
"my_type": {
"properties": {
"status_code": {
"type": "keyword"
},
"session_id": {
"type": "keyword",
"doc_values": false
}
}
}
}
}
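The difference between the two structures can be sketched in a few lines of Python. This is a conceptual model only, with made-up sample data; it says nothing about the on-disk format Elasticsearch uses.

```python
from collections import defaultdict

# Hypothetical sample data: doc id -> status_code value.
docs = {1: "404", 2: "200", 3: "404"}

# Inverted index: term -> doc ids. Answers "which documents contain this term?".
inverted = defaultdict(list)
for doc_id, value in sorted(docs.items()):
    inverted[value].append(doc_id)

# Doc values: doc id -> value. Answers "what is the value of this document?".
doc_values = dict(docs)


def sort_hits(hits):
    """Sorting needs a per-document lookup, which the column store provides."""
    return sorted(hits, key=lambda doc_id: doc_values[doc_id])


def terms_aggregation(hits):
    """Aggregating also walks documents and looks up each value."""
    counts = defaultdict(int)
    for doc_id in hits:
        counts[doc_values[doc_id]] += 1
    return dict(counts)


print(inverted["404"])
print(sort_hits([1, 2, 3]))
print(terms_aggregation([1, 2, 3]))
```

Searching uses the term-to-documents direction, while sorting and aggregating use the document-to-value direction, which is what doc_values stores.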
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/dynamic.html
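The dynamic parameter controls how newly detected fields are handled. It accepts three values:
true: newly detected fields are added to the mapping (the default).
false: newly detected fields are ignored; they are not indexed and so are not searchable, but they still appear in _source.
strict: if a new field is detected, an exception is thrown and the document is rejected.
[Example]
(1) Create the index
The value here is strict; non-boolean values must be quoted.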
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"dynamic": "strict",
"properties": {
"title": { "type": "text"}
}
}
}
}
(2) Insert a new document that contains an unmapped field
PUT my_index/my_type/1
{
"title": "test",
"content": "test dynamic"
}
An exception is thrown:
{
"error": {
"root_cause": [
{
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
}
],
"type": "strict_dynamic_mapping_exception",
"reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
},
"status": 400
}
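The three settings of dynamic can be compared side by side with a small Python model. The function name index_document and the placeholder type "guessed" are hypothetical; real dynamic mapping infers a concrete field type, which this sketch does not attempt.

```python
def index_document(mapping, doc, dynamic=True):
    """Return (stored source, set of indexed field names); may extend mapping in place."""
    indexed = set()
    for field in doc:
        if field in mapping:
            indexed.add(field)
        elif dynamic == "strict":
            # Unknown field under strict: the whole document is rejected.
            raise ValueError(
                "mapping set to strict, dynamic introduction of [%s] is not allowed" % field)
        elif dynamic is True:
            # Unknown field under true: it is added to the mapping and indexed.
            mapping[field] = {"type": "guessed"}  # placeholder for the inferred type
            indexed.add(field)
        # Unknown field under false: ignored, but it stays in the stored source.
    return dict(doc), indexed


doc = {"title": "test", "content": "test dynamic"}
for setting in (True, False, "strict"):
    mapping = {"title": {"type": "text"}}
    try:
        source, indexed = index_document(mapping, doc, dynamic=setting)
        print(setting, sorted(indexed), sorted(mapping))
    except ValueError as error:
        print(setting, "rejected:", error)
```

Only the strict setting rejects the document; with false the content field is kept in the source but cannot be searched.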
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/enabled.html
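Elasticsearch tries to index every field by default. For a field with enabled set to false, Elasticsearch skips parsing of the field's contents entirely: the value can still be retrieved from _source, but it is not searchable. Because the contents are never parsed, the field can hold data of any type. According to the official documentation, enabled can be applied only to the mapping type and to object fields.
[Example]
(1) Create the index and insert a document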
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"name":{"enabled": false}
}
}
}
}
PUT my_index/my_type/1
{
"title": "test enabled",
"name":"chengyuqiang"
}
(2) Retrieve the document
GET my_index/my_type/1
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"title": "test enabled",
"name": "chengyuqiang"
}
}
(3) Search on the field (no hits are returned, because name is not indexed)
GET my_index/_search
{
"query": {
"match": {
"name": "chengyuqiang"
}
}
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
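Date formatting was already covered in section 12.5, "The date type".
The point to stress here is that epoch_millis denotes the number of milliseconds since the epoch, while epoch_second denotes the number of seconds since the epoch.
More built-in date formats: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-date-format.html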
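Mixing up the two epoch formats shifts a date by a factor of 1000, so a quick check can help. The following Python sketch converts the same instant from both representations; the function names are hypothetical and the sample timestamp is chosen only for illustration.

```python
from datetime import datetime, timezone


def from_epoch_millis(millis: int) -> datetime:
    """Interpret the value as milliseconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def from_epoch_second(seconds: int) -> datetime:
    """Interpret the value as seconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The same instant expressed in both units.
print(from_epoch_millis(1420070400000).isoformat())
print(from_epoch_second(1420070400).isoformat())
```

Both calls describe midnight UTC on 1 January 2015; passing the millisecond value to the seconds function would instead produce a date far in the future or an error.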
ignore_above sets the maximum length of strings that are indexed or stored for the field; strings longer than this limit are ignored.
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"message": {
"type": "keyword",
"ignore_above": 20
}
}
}
}
}
PUT my_index/my_type/1
{
"message": "Syntax error"
}
PUT my_index/my_type/2
{
"message": "Syntax error with some long stacktrace"
}
GET my_index/_search
{
"size":0,
"aggs": {
"messages": {
"terms": {
"field": "message"
}
}
}
}
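The mapping sets ignore_above to 20 for the message field. The value in the first document is shorter than 20 characters, so it is indexed. The value in the second document is longer than 20 characters, so the message field of that document is not indexed (the document itself is still indexed, which is why hits.total is 2). The terms aggregation therefore returns only "Syntax error". The result is as follows: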
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"messages": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Syntax error",
"doc_count": 1
}
]
}
}
}
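The aggregation result can be reproduced with a short Python model. It is a sketch under the assumption that the limit is compared against the character count of the string; the function name terms_aggregation is hypothetical.

```python
from collections import Counter


def terms_aggregation(sources, field, ignore_above):
    """Count field values, skipping strings longer than ignore_above characters."""
    counts = Counter()
    for source in sources:
        value = source.get(field)
        if value is not None and len(value) <= ignore_above:
            counts[value] += 1  # only values within the limit reach the index
    return dict(counts)


sources = [
    {"message": "Syntax error"},
    {"message": "Syntax error with some long stacktrace"},
]
print(terms_aggregation(sources, "message", ignore_above=20))
```

The second message has 38 characters, so it never produces a bucket, while both documents still count as hits.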
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/ignore-malformed.html
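ignore_malformed makes it possible to tolerate irregular data. Take a userid field: some users may enter an integer while others enter an e-mail address. By default, trying to index a value of the wrong data type into a field throws an exception, and the whole document is rejected. If ignore_malformed is set to true, the exception is ignored: the malformed field is not indexed, while the other fields of the document are indexed normally.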
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"number_one": {
"type": "integer",
"ignore_malformed": true
},
"number_two": {
"type": "integer"
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Some text value",
"number_one": "foo"
}
PUT my_index/my_type/2
{
"text": "Some text value",
"number_two": "foo"
}
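In the example above, number_one is of type integer with ignore_malformed set to true, so document 1 is indexed successfully even though its number_one value is a string that cannot be parsed as a number. number_two is of type integer with ignore_malformed left at its default of false, so indexing document 2 fails: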
{
"error": {
"root_cause": [
{
"type": "mapper_parsing_exception",
"reason": "failed to parse [number_two]"
}
],
"type": "mapper_parsing_exception",
"reason": "failed to parse [number_two]",
"caused_by": {
"type": "number_format_exception",
"reason": "For input string: \"foo\""
}
},
"status": 400
}
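The difference between the two documents can be modelled in Python as follows. This is a simplified sketch: the function name index_document is hypothetical, and only integer fields are validated.

```python
def index_document(doc, mapping):
    """Return the fields that get indexed, or raise if a malformed value is not tolerated."""
    indexed = {}
    for field, value in doc.items():
        spec = mapping.get(field)
        if spec is None or spec["type"] != "integer":
            indexed[field] = value  # fields this sketch does not validate
            continue
        try:
            indexed[field] = int(value)
        except (TypeError, ValueError):
            if spec.get("ignore_malformed", False):
                continue  # skip only this field, keep the rest of the document
            raise ValueError("failed to parse [%s]" % field)
    return indexed


mapping = {
    "number_one": {"type": "integer", "ignore_malformed": True},
    "number_two": {"type": "integer"},
}
print(index_document({"text": "Some text value", "number_one": "foo"}, mapping))
try:
    index_document({"text": "Some text value", "number_two": "foo"}, mapping)
except ValueError as error:
    print("rejected:", error)
```

Document 1 keeps its text field and silently drops number_one; document 2 is rejected as a whole.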
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/index-options.html
The index_options parameter controls what information is added to the inverted index, for search and highlighting purposes.
Parameter | Description |
---|---|
docs | Only the doc number is indexed. Can answer the question Does this term exist in this field? |
freqs | Doc number and term frequencies are indexed. Term frequencies are used to score repeated terms higher than single terms. |
positions | Doc number, term frequencies, and term positions (or order) are indexed. Positions can be used for proximity or phrase queries. |
offsets | Doc number, term frequencies, positions, and start and end character offsets (which map the term back to the original string) are indexed. Offsets are used by the unified highlighter to speed up highlighting. |
Note: The index_options parameter has been deprecated for numeric fields in 6.0.0.
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"text": {
"type": "text",
"index_options": "offsets"
}
}
}
}
}
PUT my_index/my_type/1
{
"text": "Quick brown fox"
}
GET my_index/_search
{
"query": {
"match": {
"text": "brown fox"
}
},
"highlight": {
"fields": {
"text": {}
}
}
}
{
"took": 50,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.5753642,
"_source": {
"text": "Quick brown fox"
},
"highlight": {
"text": [
"Quick <em>brown</em> <em>fox</em>"
]
}
}
]
}
}
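To show what each index_options level adds, the sketch below builds a toy postings structure in Python for the document used above. It is only a conceptual model: the function name build_postings is hypothetical, and a simple regular expression with lowercasing stands in for the real analyzer.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(doc_id, text, index_options="positions"):
    """Return term -> posting entry holding only what the chosen level records."""
    level = LEVELS.index(index_options)
    postings = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = match.group().lower()
        entry = postings.setdefault(term, {"doc": doc_id})  # docs: doc number only
        if level >= 1:
            entry["freq"] = entry.get("freq", 0) + 1  # freqs: term frequency
        if level >= 2:
            entry.setdefault("positions", []).append(position)  # positions: term order
        if level >= 3:
            # offsets: start and end character offsets in the original string
            entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


for option in LEVELS:
    print(option, build_postings(1, "Quick brown fox", option)["brown"])
```

With offsets recorded, a highlighter can locate "brown" at characters 6 to 11 of the original string without analyzing the text again.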
The index option controls whether field values are indexed. It accepts true or false and defaults to true. Fields that are not indexed are not queryable.
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/multi-fields.html
It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields, configured with the fields parameter. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations.
DELETE my_index
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
}
}
PUT my_index/my_type/1
{
"city": "New York"
}
PUT my_index/my_type/2
{
"city": "York"
}
GET my_index/_search
{
"query": {
"match": {
"city": "york"
}
},
"sort": {
"city.raw": "asc"
},
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}
{
"took": 31,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": null,
"_source": {
"city": "New York"
},
"sort": [
"New York"
]
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": null,
"_source": {
"city": "York"
},
"sort": [
"York"
]
}
]
},
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 1
}
]
}
}
}
The city.raw field is a keyword version of the city field.
The city field can be used for full text search.
The city.raw field can be used for sorting and aggregations.