napoay

Elasticsearch 5.4 Mapping in Detail

Preface
1. Field datatypes
- 1 string type
- 2 text type
- 3 keyword type
- 4 Numeric types
- 5 Object type
- 6 date type
- 7 Array type
- 8 binary type
- 9 ip type
- 10 range types
- 11 nested type
- 12 token_count type
- 13 geo_point type
2. Meta-Fields
- 1 _all
- 2 _field_names
- 3 _id
- 4 _index
- 4 _meta
- 5 _parent
- 6 _routing
- 7 _source
- 8 _type
- 9 _uid
3. Mapping parameters
- 1 analyzer
- 2 normalizer
- 3 boost
- 4 coerce
- 5 copy_to
- 6 doc_values
- 7 dynamic
- 8 enabled
- 9 fielddata
- 10 format
- 11 ignore_above
- 12 ignore_malformed
- 13 include_in_all
- 14 index
- 15 index_options
- 16 fields
- 17 norms
- 18 null_value
- 19 position_increment_gap
- 20 properties
- 21 search_analyzer
- 22 similarity
- 23 store
- 24 term_vector
4. Dynamic mapping
- 1 _default_ mapping
- 2 Dynamic field mapping
- 3 Dynamic templates
- 4 Override default template

Preface

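Disclaimer: this post is translated and compiled from the official Elasticsearch documentation. If you repost it, please credit the source: http://blog.csdn.net/napoay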

1. Field datatypes

1.1 string type

Since Elasticsearch 5.x, the string field type is deprecated and has been replaced by text and keyword. string still works, but using it produces a deprecation warning.

Test:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type":  "string"
        }
      }
    }
  }
}

Result:

#! Deprecation: The [string] field is deprecated, please use [text] or [keyword] instead on [title]
{
  "acknowledged": true,
  "shards_acknowledged": true
}

1.2 text type

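text replaces string for full-text values. Use the text type for fields that are searched as full text, such as the body of an email or a product description. text fields are analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting, and they are seldom used for aggregations (the significant_terms aggregation is a notable exception).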

The following mapping sets the full_name field to text:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": {
          "type":  "text"
        }
      }
    }
  }
}

1.3 keyword type

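The keyword type is for indexing structured content such as email addresses, hostnames, status codes and tags. Use it for fields that need filtering (for example, finding blog posts whose status is published), sorting or aggregations. keyword fields are not analyzed, so they can only be found by searching for their exact value.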

1.4 Numeric types

Elasticsearch supports the following numeric types:

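Type	Range
long	-2^63 to 2^63-1
integer	-2^31 to 2^31-1
short	-32,768 to 32,767
byte	-128 to 127
double	64-bit double-precision IEEE 754 floating point
float	32-bit single-precision IEEE 754 floating point
half_float	16-bit half-precision IEEE 754 floating point
scaled_float	A floating point value stored as a long, scaled by a fixed factor (e.g. a price that only needs cent precision: with a scaling factor of 100, 57.34 is stored as 5734)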

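For double, float and half_float, -0.0 and +0.0 are treated as different values. A term query for -0.0 does not match +0.0, and vice versa. The same applies to range queries: if the upper bound is -0.0, +0.0 does not match, and if the lower bound is +0.0, -0.0 does not match.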

Guidelines for choosing among these numeric types:

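Pick the smallest type that meets your needs. For example, if a field never exceeds 100, byte is enough. For an age field, short is more than enough: the oldest verified human age on record is 122 years, which even fits in a byte. Smaller integer types make indexing and searching more efficient.
For floating-point data, consider scaled_float first. Storing values as integers with a scaling factor is often more efficient.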

Example:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": {
          "type": "integer"
        },
        "time_in_seconds": {
          "type": "float"
        },
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}
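The scaled_float rule from the table (multiply by the scaling factor and store the result as a whole number) and the "pick the smallest integer type" guideline can both be sketched in plain Python. This models only the arithmetic. smallest_integer_type, encode_scaled and decode_scaled are hypothetical helpers, not Elasticsearch APIs.

```python
# Integer types and their inclusive ranges, as listed in the table above.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the narrowest integer type that holds every value in [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a long")


def encode_scaled(value, scaling_factor):
    """Assumed scaled_float model: multiply by the factor and round to a whole number.

    Python's round() sends exact .5 ties to the even neighbour, so values lying
    exactly halfway between two steps may be rounded differently than in Elasticsearch.
    """
    return round(value * scaling_factor)


def decode_scaled(stored, scaling_factor):
    """Recover the floating-point value from the stored whole number."""
    return stored / scaling_factor


print(smallest_integer_type(0, 100))   # byte
print(smallest_integer_type(0, 1000))  # short
stored = encode_scaled(57.34, 100)
print(stored, decode_scaled(stored, 100))  # 5734 57.34
```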

1.5 Object type

JSON documents are hierarchical by nature, so a document may contain inner objects:

PUT my_index/my_type/1
{ 
  "region": "US",
  "manager": { 
    "age":     30,
    "name": { 
      "first": "John",
      "last":  "Smith"
    }
  }
}

In the document above, the outer JSON object contains manager, which in turn contains name. Internally, the document is indexed as a flat list of key-value pairs:

{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith" }
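A minimal Python sketch of this flattening. It assumes that only nested dicts need recursion (arrays of objects are covered in the nested section below). flatten is a hypothetical helper that illustrates the idea; it is not Elasticsearch code.

```python
def flatten(obj, prefix=""):
    """Flatten nested JSON objects into dotted key-value pairs."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Inner objects contribute their fields under "parent.child" names.
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


doc = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
print(flatten(doc))
# {'region': 'US', 'manager.age': 30, 'manager.name.first': 'John', 'manager.name.last': 'Smith'}
```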

The mapping for this document structure:

PUT my_index
{
  "mappings": {
    "my_type": { 
      "properties": {
        "region": {
          "type": "keyword"
        },
        "manager": { 
          "properties": {
            "age":  { "type": "integer" },
            "name": { 
              "properties": {
                "first": { "type": "text" },
                "last":  { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}

1.6 date type

JSON has no date type, so a date in Elasticsearch can be one of the following:

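a string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30";
a long number of milliseconds since the epoch;
an integer number of seconds since the epoch.
Internally, dates are converted to UTC (if a time zone is specified) and stored as a long number of milliseconds since the epoch.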

Date formats can be customized. If no format is specified, the default is:

"strict_date_optional_time||epoch_millis"

Example:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type": "date" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "date": "2015-01-01" } 

PUT my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" } 

PUT my_index/my_type/3
{ "date": 1420070400001 } 

GET my_index/_search
{
  "sort": { "date": "asc"} 
}

A search without sorting returns all three documents:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "date": "2015-01-01T12:10:30Z" } }, { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "date": "2015-01-01" } }, { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "date": 1420070400001 } } ] } }

Result of the sorted search. The sort values are the dates in milliseconds since the epoch:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 3, "max_score": null, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": null, "_source": { "date": "2015-01-01" }, "sort": [ 1420070400000 ] }, { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": null, "_source": { "date": 1420070400001 }, "sort": [ 1420070400001 ] }, { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": null, "_source": { "date": "2015-01-01T12:10:30Z" }, "sort": [ 1420114230000 ] } ] } }
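The sort values above are just the three dates converted to UTC milliseconds since the epoch. This Python sketch reproduces them with the standard datetime module. It assumes only the two string formats used in the sample documents and is not a general implementation of strict_date_optional_time. to_epoch_millis is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_epoch_millis(value):
    """Convert the sample date values to UTC milliseconds since the epoch."""
    if isinstance(value, int):
        return value  # already epoch_millis
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
        return int(parsed.timestamp() * 1000)
    raise ValueError(f"unsupported date: {value!r}")


docs = {"1": "2015-01-01", "2": "2015-01-01T12:10:30Z", "3": 1420070400001}
sort_values = {doc_id: to_epoch_millis(v) for doc_id, v in docs.items()}
print(sorted(sort_values, key=sort_values.get))  # ['1', '3', '2']
```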

1.7 Array type

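Elasticsearch has no dedicated array type. By default, any field can contain zero or more values, but all values in an array must have the same datatype. For example: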

an array of strings: [ "one", "two" ]
an array of integers: [ 1, 3 ]
an array of arrays: [ 1, [ 2, 3 ] ], equivalent to [ 1, 2, 3 ]
an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]

Notes:

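When a field is added dynamically, the first non-null value in the array determines the field type.
Arrays with mixed datatypes, such as [ 1, "abc" ], are not supported.
Arrays may contain null values. An empty array [ ] is treated as a missing field, that is, a field with no values.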

1.8 binary type

The binary type accepts a binary value as a Base64-encoded string. By default the field is not stored and is not searchable.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "text"
        },
        "blob": {
          "type": "binary"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg==" 
}

Searching the blob field:

GET my_index/_search
{
  "query": {
    "match": {
      "blob": "test" 
    }
  }
}

Result:
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Binary fields do not support searching",
        "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
        "index": "my_index"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "3dQd1RRVTMiKdTckM68nPQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "Binary fields do not support searching",
          "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
          "index": "my_index"
        }
      }
    ]
  },
  "status": 400
}

Base64 encoding/decoding tool: http://www1.tc711.com/tool/BASE64.htm
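As an alternative to an online tool, Python's standard base64 module can produce or check such values. The blob value in the example is simply the Base64 encoding of the string "Some binary blob":

```python
import base64

encoded = base64.b64encode(b"Some binary blob").decode("ascii")
print(encoded)  # U29tZSBiaW5hcnkgYmxvYg==

decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # Some binary blob
```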

1.9 ip type

An ip field stores IPv4 or IPv6 addresses. As the query below shows, it can also be queried with CIDR notation:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": {
          "type": "ip"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "ip_addr": "192.168.1.1"
}

GET my_index/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}
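The term query above uses CIDR notation. 192.168.0.0/16 covers every address from 192.168.0.0 to 192.168.255.255, so document 1 (192.168.1.1) is returned. Here is the same containment check with Python's standard ipaddress module, for illustration only. ip_matches is a hypothetical helper.

```python
import ipaddress


def ip_matches(address, cidr):
    """True if the address falls inside the CIDR block (IPv4 or IPv6)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


print(ip_matches("192.168.1.1", "192.168.0.0/16"))  # True
print(ip_matches("10.0.0.1", "192.168.0.0/16"))     # False
print(ip_matches("2001:db8::1", "2001:db8::/32"))   # True
```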

1.10 range types

The following range types are supported:

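Type	Range
integer_range	-2^31 to 2^31-1
float_range	32-bit single-precision IEEE 754 floating point values
long_range	-2^63 to 2^63-1
double_range	64-bit double-precision IEEE 754 floating point values
date_range	date values as 64-bit integers, in milliseconds since the epoch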

Range types suit data such as the values of a time-range picker or an age-range selector in a front-end form.
Example:

PUT range_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "expected_attendees": {
          "type": "integer_range"
        },
        "time_frame": {
          "type": "date_range", 
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

PUT range_index/my_type/1
{
  "expected_attendees" : { 
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : { 
    "gte" : "2015-10-31 12:00:00", 
    "lte" : "2015-11-01"
  }
}

The code above creates the range_index index and indexes one document. Its expected_attendees range is 10 to 20, and its time_frame runs from 2015-10-31 12:00:00 to 2015-11-01.

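Query. With "relation": "within", a document matches only if its time_frame lies entirely inside the query range: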

POST range_index/_search
{
  "query" : {
    "range" : {
      "time_frame" : { 
        "gte" : "2015-08-01",
        "lte" : "2015-12-01",
        "relation" : "within" 
      }
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "range_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "expected_attendees": { "gte": 10, "lte": 20 }, "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" } } } ] } }
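The relation parameter of a range query accepts intersects (the default), contains and within. This Python sketch shows the three checks on closed intervals. It assumes inclusive gte/lte bounds on both sides, and range_matches is a hypothetical helper. The dates are compared as plain strings, which only works because they share the same year-first layout.

```python
def range_matches(doc, query, relation="intersects"):
    """Compare a document range with a query range, both given as (gte, lte)."""
    doc_lo, doc_hi = doc
    q_lo, q_hi = query
    if relation == "within":      # document range lies inside the query range
        return q_lo <= doc_lo and doc_hi <= q_hi
    if relation == "contains":    # document range covers the whole query range
        return doc_lo <= q_lo and q_hi <= doc_hi
    if relation == "intersects":  # the ranges share at least one point
        return doc_lo <= q_hi and q_lo <= doc_hi
    raise ValueError(f"unknown relation: {relation}")


time_frame = ("2015-10-31 12:00:00", "2015-11-01")
print(range_matches(time_frame, ("2015-08-01", "2015-12-01"), "within"))  # True
print(range_matches((10, 20), (15, 30), "within"))                        # False
print(range_matches((10, 20), (15, 30)))                                  # True
```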

1.11 nested type

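The nested type is a specialized version of object that lets arrays of objects be indexed and queried independently of each other. Plain object arrays can cause problems. Consider document my_index/my_type/1: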

PUT my_index/my_type/1
{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

The user field is added dynamically as an object field, and the document is flattened internally into:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ] }

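user.first and user.last are flattened into multi-value fields, so the association between Alice and White is lost. As a result, the document incorrectly matches the following query, even though there is no user named Alice Smith: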

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}

Mapping user as nested fixes this shortcoming of the object type:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": {
          "type": "nested" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }} 
          ]
        }
      }
    }
  }
}

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }} 
          ]
        }
      },
      "inner_hits": { 
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
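With the nested mapping, the first query (Alice Smith) finds nothing, because Alice and Smith are in different user objects. The second query (Alice White) matches, and its inner_hits show which nested object matched. This Python model contrasts the two mappings. It is a simplification that uses exact, case-insensitive comparisons instead of analyzed match queries, and object_match and nested_match are hypothetical helpers.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]


def object_match(users, first, last):
    """Object mapping: values are flattened into independent multi-value fields."""
    firsts = {u["first"].lower() for u in users}
    lasts = {u["last"].lower() for u in users}
    return first.lower() in firsts and last.lower() in lasts


def nested_match(users, first, last):
    """Nested mapping: both conditions must hold within the same object."""
    return any(
        u["first"].lower() == first.lower() and u["last"].lower() == last.lower()
        for u in users
    )


print(object_match(users, "Alice", "Smith"))  # True (a false positive)
print(nested_match(users, "Alice", "Smith"))  # False
print(nested_match(users, "Alice", "White"))  # True
```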

1.12 token_count type

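A token_count field is really an integer field. It stores the number of tokens produced by analyzing a string, not term frequencies, and it is typically added as a sub-field of a text field: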


PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { 
          "type": "text",
          "fields": {
            "length": { 
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "name": "John Smith" }

PUT my_index/my_type/2
{ "name": "Rachel Alice Williams" }

GET my_index/_search
{
  "query": {
    "term": {
      "name.length": 3 
    }
  }
}
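The term query matches document 2, because "Rachel Alice Williams" produces three tokens. This rough Python sketch shows what name.length would hold. It approximates the standard analyzer with a regular expression that splits on non-word characters. That approximation is an assumption and differs from real Unicode text segmentation for many inputs.

```python
import re


def token_count(text):
    """Approximate number of tokens the standard analyzer would emit."""
    return len(re.findall(r"\w+", text))


docs = {"1": "John Smith", "2": "Rachel Alice Williams"}
lengths = {doc_id: token_count(name) for doc_id, name in docs.items()}
print([doc_id for doc_id, n in lengths.items() if n == 3])  # ['2']
```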

1.13 geo_point type

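The geo_point type stores latitude-longitude pairs. The documents below show four accepted forms: an object with lat and lon, a "lat,lon" string, a geohash, and an array in [lon, lat] order (the reverse of the string form):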

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Geo-point as an object",
  "location": { 
    "lat": 41.12,
    "lon": -71.34
  }
}

PUT my_index/my_type/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34" 
}

PUT my_index/my_type/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86" 
}

PUT my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ] 
}

GET my_index/_search
{
  "query": {
    "geo_bounding_box": { 
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}
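All four documents index the same point. To see why the geohash drm3btev3e86 corresponds to lat 41.12, lon -71.34, here is a decoder for the standard geohash algorithm. Each base-32 character holds 5 bits, and the bits alternate between longitude and latitude, each one halving the remaining interval. This is an independent sketch, not Elasticsearch's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def decode_geohash(geohash):
    """Decode a geohash to the (lat, lon) centre of its cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, most significant first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


lat, lon = decode_geohash("drm3btev3e86")
print(round(lat, 2), round(lon, 2))  # 41.12 -71.34
```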

2. Meta-Fields

2.1 _all

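The _all field is a special catch-all field that concatenates the values of all other fields into one string, separated by spaces. It is analyzed and indexed, but not stored. Use it to find documents that contain a keyword without specifying which field to search.
Example: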

PUT my_index/blog/1 
{
  "title":    "Master Java",
  "content":     "learn java",
  "author": "Tom"
}

After analysis with the standard analyzer, the _all field contains the terms: [ "master", "java", "learn", "java", "tom" ]

Search:

GET my_index/_search
{
  "query": {
    "match": {
      "_all": "Java"
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1, "max_score": 0.39063013, "hits": [ { "_index": "my_index", "_type": "blog", "_id": "1", "_score": 0.39063013, "_source": { "title": "Master Java", "content": "learn java", "author": "Tom" } } ] } }

Building a custom _all-style field with copy_to:

PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "title": {
          "type":    "text",
          "copy_to": "full_content" 
        },
        "content": {
          "type":    "text",
          "copy_to": "full_content" 
        },
        "full_content": {
          "type":    "text"
        }
      }
    }
  }
}

PUT myindex/mytype/1
{
  "title": "Master Java",
  "content": "learn Java"
}

GET myindex/_search
{
  "query": {
    "match": {
      "full_content": "java"
    }
  }
}

2.2 _field_names

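The _field_names field indexes the names of all fields in a document that contain a value other than null. It is mainly used by the exists query. Example: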

PUT my_index/my_type/1
{
  "title": "This is a document"
}

PUT my_index/my_type/2?refresh=true
{
  "title": "This is another document",
  "body": "This document has a body"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": [ "body" ] 
    }
  }
}

The query returns only the second document, because the first one has no body field.
The exists query gives the same result:

GET my_index/_search
{
    "query": {
        "exists" : { "field" : "body" }
    }
}
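Conceptually, _field_names records which fields hold a non-null value in each document, and exists filters on that record. This Python sketch illustrates the idea. field_names is a hypothetical helper, and for simplicity it also treats an empty array as having no value.

```python
def field_names(doc):
    """Names of the fields that carry at least one non-null value."""
    names = set()
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        if any(v is not None for v in values):
            names.add(name)
    return names


docs = {
    "1": {"title": "This is a document"},
    "2": {"title": "This is another document", "body": "This document has a body"},
}
print([doc_id for doc_id, doc in docs.items() if "body" in field_names(doc)])  # ['2']
```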

2.3 _id

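Each indexed document has a _type and an _id. The _id field can be used in term, terms, match, query_string and simple_query_string queries, but not in aggregations, scripts or sorting, where the _uid field should be used instead. Example: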

PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}

PUT my_index/my_type/2
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ] 
    }
  }
}

2.4 _index

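When searching across multiple indices, you sometimes need to restrict a query to specific indices, and the _index field makes this easy. The index name can be used in term and terms queries, aggregations, scripts and sorting.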

_index is a virtual field that is not actually added to the Lucene index. It supports term and terms queries (and therefore also match, query_string and simple_query_string), but not prefix, wildcard, regexp or fuzzy queries.

For example, two documents in two indices:


PUT index_1/my_type/1
{
  "text": "Document in index 1"
}

PUT index_2/my_type/2
{
  "text": "Document in index 2"
}

This request queries, aggregates and sorts on the index name, and returns the name through a script field:

GET index_1,index_2/_search
{
  "query": {
    "terms": {
      "_index": ["index_1", "index_2"] 
    }
  },
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_index": { 
        "order": "asc"
      }
    }
  ],
  "script_fields": {
    "index_name": {
      "script": {
        "lang": "painless",
        "inline": "doc['_index']" 
      }
    }
  }
}

2.4 _meta

(Omitted.)

2.5 _parent

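_parent creates a parent-child relationship between documents of different types in the same index. The example below takes four steps. It declares the relationship in the mapping, indexes a parent document, and indexes child documents with the parent's ID. Finally, it finds the parent by querying its children with has_child.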

PUT my_index
{
  "mappings": {
    "my_parent": {},
    "my_child": {
      "_parent": {
        "type": "my_parent" 
      }
    }
  }
}


PUT my_index/my_parent/1 
{
  "text": "This is a parent document"
}

PUT my_index/my_child/2?parent=1 
{
  "text": "This is a child document"
}

PUT my_index/my_child/3?parent=1&refresh=true 
{
  "text": "This is another child document"
}


GET my_index/my_parent/_search
{
  "query": {
    "has_child": { 
      "type": "my_child",
      "query": {
        "match": {
          "text": "child document"
        }
      }
    }
  }
}

2.6 _routing

Elasticsearch uses the following formula to route a document to a shard of its index:

shard_num = hash(_routing) % num_primary_shards
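The formula can be sketched in Python. Elasticsearch hashes the routing value with Murmur3, but the standard library has no Murmur3, so zlib.crc32 stands in here as an assumption. The shard numbers will therefore differ from a real cluster. The key property still holds: the same routing value always lands on the same shard. shard_for is a hypothetical helper.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """shard_num = hash(_routing) % num_primary_shards, with crc32 as a stand-in hash."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# With custom routing, every document routed with "user1" goes to one shard.
user1_shards = {shard_for("user1", 5) for _ in range(3)}
print(user1_shards)
# Without custom routing, the _id is the routing value, so documents spread out.
print([shard_for(doc_id, 5) for doc_id in ("1", "2", "3")])
```

Because the result depends on num_primary_shards, the same routing value would map to a different shard if the shard count changed. This is why the number of primary shards cannot be changed after an index is created.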

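By default, _routing is the document's _id, or the parent ID for child documents. A custom routing value can be set per request. For example, to store all of user1's blog posts on the same shard, specify routing when indexing, and specify the same routing when getting the document: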

PUT my_index/my_type/1?routing=user1&refresh=true 
{
  "title": "This is a document"
}

GET my_index/my_type/1?routing=user1

At search time, you can query the _routing field itself, or pass routing to search only the shards for those values:

GET my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ] 
    }
  }
}

GET my_index/_search?routing=user1,user2 
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}

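The mapping below makes routing required. An index request without a routing value, such as the one that follows it, is then rejected with a routing_missing_exception: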

PUT my_index2
{
  "mappings": {
    "my_type": {
      "_routing": {
        "required": true 
      }
    }
  }
}

PUT my_index2/my_type/1 
{
  "text": "No routing value provided"
}

2.7 _source

_source contains the original JSON body of the document as it was indexed. It is enabled by default, but it can be disabled:

PUT tweets
{
  "mappings": {
    "tweet": {
      "_source": {
        "enabled": false
      }
    }
  }
}

In general, do not disable it unless you can live without all of the following:

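the update, update_by_query and reindex APIs;
highlighting;
reindexing to back up data, change mappings or upgrade an index;
debugging queries or aggregations by looking at the original document.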

2.8 _type

Each indexed document has a _type and an _id. The _type field can be used in queries, aggregations, scripts and sorting. Example:

PUT my_index/type_1/1
{
  "text": "Document with type 1"
}

PUT my_index/type_2/2?refresh=true
{
  "text": "Document with type 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_type": [ "type_1", "type_2" ] 
    }
  },
  "aggs": {
    "types": {
      "terms": {
        "field": "_type", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_type": { 
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "type": {
      "script": {
        "lang": "painless",
        "inline": "doc['_type']" 
      }
    }
  }
}

2.9 _uid

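_uid combines _type and _id in the form {type}#{id} (e.g. my_type#1). Like _type, it can be used in queries, aggregations, scripts and sorting. Example: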

PUT my_index/my_type/1
{
  "text": "Document with ID 1"
}

PUT my_index/my_type/2?refresh=true
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_uid": [ "my_type#1", "my_type#2" ] 
    }
  },
  "aggs": {
    "UIDs": {
      "terms": {
        "field": "_uid", 
        "size": 10
      }
    }
  },
  "sort": [
    {
      "_uid": { 
        "order": "desc"
      }
    }
  ],
  "script_fields": {
    "UID": {
      "script": {
         "lang": "painless",
         "inline": "doc['_uid']" 
      }
    }
  }
}

3. Mapping parameters

3.1 analyzer

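analyzer specifies the analyzer used for the field. It applies at index time and also at search time, unless search_analyzer is set. The following mapping uses ik_max_word, which comes from the third-party IK analysis plugin; the plugin must be installed first: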

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}

3.2 normalizer

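A normalizer is similar to an analyzer, except that it always produces a single token. It applies to keyword fields. It normalizes values before indexing, and also at search time when the field is searched with a query such as match, for example by converting all characters to lowercase. Example: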

PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

PUT index/type/1
{
  "foo": "BÀR"
}

PUT index/type/2
{
  "foo": "bar"
}

PUT index/type/3
{
  "foo": "baz"
}

POST index/_refresh

GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

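BÀR is normalized to bar: lowercase removes the case and asciifolding removes the accent. The query BAR is normalized to bar as well, so documents 1 and 2 match.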
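The lowercase + asciifolding chain can be approximated in Python with the standard unicodedata module. The sketch lowercases the text, decomposes each character (NFKD), and drops the combining marks. It is only an approximation of the asciifolding filter, which also maps many characters that have no Unicode decomposition. normalize is a hypothetical helper.

```python
import unicodedata


def normalize(value):
    """Approximate my_normalizer: lowercase, then fold accents away."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


docs = {"1": "BÀR", "2": "bar", "3": "baz"}
query = normalize("BAR")  # the normalizer is applied to the query as well
print([doc_id for doc_id, foo in docs.items() if normalize(foo) == query])  # ['1', '2']
```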

3.3 boost

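boost sets a field's weight in relevance scoring. For example, to make a match in title count twice as much as a match in content, use the mapping below. content keeps the default boost of 1.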

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "boost": 2 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

Setting boost at query time has the same effect:

POST _search
{
    "query": {
        "match" : {
            "title": {
                "query": "quick brown fox",
                "boost": 2
            }
        }
    }
}

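Prefer query-time boosting. A boost hard-coded in the mapping cannot be changed without reindexing. A query-time boost achieves the same effect and can be adjusted per query.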

3.4 coerce

coerce cleans up dirty data and defaults to true. For example, the integer 5 might be sent as the string "5" or as the floating-point number 5.0. With coerce enabled:

strings are coerced to numbers;
floating-point values are truncated for integer fields.


PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer"
        },
        "number_two": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "number_one": "10" 
}

PUT my_index/my_type/2
{
  "number_two": "10" 
}

number_one is mapped as integer. The value sent is the string "10", but document 1 is still indexed successfully. number_two has coerce disabled, so indexing document 2 fails.
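This Python model shows what coerce does for an integer field, following the two rules above: numeric strings are parsed and floating-point values are truncated. index_integer is a hypothetical helper. For simplicity it rejects every non-integer when coerce is off, and Elasticsearch's own parsing accepts more input forms than this sketch.

```python
def index_integer(value, coerce=True):
    """Return the integer an integer field would index, or raise if the document is rejected."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not an integer")
    if isinstance(value, int):
        return value
    if not coerce:
        raise TypeError(f"coerce is disabled, rejecting {value!r}")
    if isinstance(value, str):
        try:
            return int(value)  # "10" -> 10
        except ValueError:
            value = float(value)  # "5.9" -> 5.9; "foo" raises ValueError
    if isinstance(value, float):
        return int(value)  # truncates toward zero: 5.9 -> 5
    raise TypeError(f"cannot coerce {value!r}")


print(index_integer("10"))  # 10 (number_one, document 1)
try:
    index_integer("10", coerce=False)  # number_two, document 2
except TypeError as err:
    print("rejected:", err)
```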

3.5 copy_to

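copy_to lets you build a custom _all field. The values of several fields are copied into a group field, which can then be queried as a single field. For example, first_name and last_name can be copied into full_name: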

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

3.6 doc_values

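doc_values speed up sorting and aggregations. At index time, Elasticsearch builds an extra column-oriented structure alongside the inverted index, mapping each document to its values. This trades disk space for speed. Doc values are enabled by default. To save disk space, you can disable them on fields you are sure will never be sorted or aggregated on.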

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { 
          "type":       "keyword"
        },
        "session_id": { 
          "type":       "keyword",
          "doc_values": false
        }
      }
    }
  }
}

Note: text fields do not support doc_values.

3.7 dynamic

dynamic controls whether newly detected fields are added to the mapping. It accepts three values:

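true: new fields are added to the mapping (default).
false: new fields are ignored. They still appear in _source but are not indexed; fields must be added to the mapping explicitly.
strict: if a new field is detected, an exception is thrown and the document is rejected.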

Example:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": false, 
      "properties": {
        "user": { 
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": { 
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}

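Here dynamic mapping is disabled at the type level, so new top-level fields are not added. The user object inherits the type-level setting. user.social_networks re-enables dynamic mapping, so new fields can be added inside it.
Note: strict is a string, not a boolean, so it must be quoted: "dynamic": "strict".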
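The inheritance rule can be sketched in Python against the mapping above. The object walk and the helper names effective_dynamic and handle_new_field are hypothetical. The model assumes that each object inherits dynamic from its closest ancestor that sets it, and that the type level defaults to true.

```python
MAPPING = {
    "dynamic": False,
    "properties": {
        "user": {
            "properties": {
                "name": {"type": "text"},
                "social_networks": {"dynamic": True, "properties": {}},
            }
        }
    },
}


def effective_dynamic(mapping, object_path):
    """Dynamic setting that applies to new fields inside the object at object_path."""
    setting = mapping.get("dynamic", True)  # type-level default is true
    node = mapping
    for name in object_path:
        node = node["properties"][name]
        setting = node.get("dynamic", setting)  # inherit unless overridden
    return setting


def handle_new_field(mapping, object_path, field):
    """True: field added; False: field ignored; "strict": document rejected."""
    setting = effective_dynamic(mapping, object_path)
    if setting == "strict":
        raise ValueError(f"strict_dynamic_mapping_exception for {field!r}")
    return setting is True


print(handle_new_field(MAPPING, [], "tags"))                             # False
print(handle_new_field(MAPPING, ["user"], "age"))                        # False
print(handle_new_field(MAPPING, ["user", "social_networks"], "github"))  # True
```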

3.8 enabled

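Elasticsearch parses and indexes all fields by default. enabled can be set to false only on the mapping type and on object fields. For such a field, Elasticsearch skips the contents entirely. The value can still be retrieved from _source, but it is not searchable, and it may hold data of any type. In the example below, session_data holds an object in one document and a string in the other.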

PUT my_index
{
  "mappings": {
    "session": {
      "properties": {
        "user_id": {
          "type":  "keyword"
        },
        "last_updated": {
          "type": "date"
        },
        "session_data": { 
          "enabled": false
        }
      }
    }
  }
}

PUT my_index/session/session_1
{
  "user_id": "kimchy",
  "session_data": { 
    "arbitrary_object": {
      "some_array": [ "foo", "bar", { "baz": 2 } ]
    }
  },
  "last_updated": "2015-12-06T18:20:22"
}

PUT my_index/session/session_2
{
  "user_id": "jpountz",
  "session_data": "none", 
  "last_updated": "2015-12-06T18:22:13"
}

3.9 fielddata

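Searching answers the question "which documents contain this term?". Aggregations answer the opposite question: "which terms does this document contain?". Most fields use doc_values, built at index time, for this access pattern, but text fields do not support doc_values.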

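Instead, text fields use a query-time, in-memory data structure called fielddata. It is built the first time the field is used in an aggregation, for sorting, or in a script. Elasticsearch reads the inverted index from disk, inverts the term-to-document relationship, and keeps the result in memory on the JVM heap.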

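Fielddata is disabled on text fields by default because it can consume a lot of heap memory. It can be enabled with "fielddata": true. Before doing so, think about why you want to aggregate or sort on a text field; in most cases it makes no sense.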

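For example, "New York" is analyzed into the terms new and york. A terms aggregation on the text field therefore produces two buckets, new and york, when you probably want a single "New York" bucket. In that case, add a non-analyzed keyword sub-field instead: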

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": { 
          "type": "text",
          "fields": {
            "keyword": { 
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

With this mapping, use my_field for full-text search and my_field.keyword for aggregations, sorting and scripts.
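The bucket problem can be reproduced with collections.Counter. The analyzer here is a rough stand-in for the standard analyzer (lowercase, split on non-word characters), which is an assumption. The sample values are illustrative.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


values = ["New York", "York"]
text_buckets = Counter(term for v in values for term in analyze(v))  # my_field
keyword_buckets = Counter(values)                                    # my_field.keyword
print(text_buckets)     # Counter({'york': 2, 'new': 1})
print(keyword_buckets)  # Counter({'New York': 1, 'York': 1})
```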

3.10 format

format sets the date formats used to parse (and display) date values:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": {
          "type":   "date",
          "format": "yyyy-MM-dd"
        }
      }
    }
  }
}

More built-in date formats: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html

3.11 ignore_above

Strings longer than the ignore_above setting are not indexed or stored as terms, although they remain in _source:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "keyword",
          "ignore_above": 15
        }
      }
    }
  }
}

PUT my_index/my_type/1 
{
  "message": "Syntax error"
}

PUT my_index/my_type/2 
{
  "message": "Syntax error with some long stacktrace"
}

GET my_index/_search 
{
  "size": 0, 
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

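The mapping sets ignore_above to 15. The message in document 1 is shorter than 15 characters, so it is indexed; the message in document 2 is longer, so it is not. The search matches both documents, but only "Syntax error" should appear as a bucket in the terms aggregation. The output captured below, however, shows an empty buckets array: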

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 2, "max_score": 0, "hits": [] },
  "aggregations": { "messages": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [] } } }
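This Python model shows what the terms aggregation is expected to see. Values longer than ignore_above are kept in _source but never become terms. The sketch assumes the limit is compared against the string's character length.

```python
from collections import Counter

IGNORE_ABOVE = 15
messages = ["Syntax error", "Syntax error with some long stacktrace"]

# Both documents are stored and returned as hits...
hits_total = len(messages)
# ...but only values within the limit are indexed as keyword terms.
buckets = Counter(m for m in messages if len(m) <= IGNORE_ABOVE)
print(hits_total, dict(buckets))  # 2 {'Syntax error': 1}
```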

3.12 ignore_malformed

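ignore_malformed lets Elasticsearch tolerate badly formed data. For example, a login field might receive a date from some clients and an email address from others. By default, indexing a value of the wrong datatype throws an exception, and the whole document is rejected. With ignore_malformed set to true, the exception is ignored. The malformed field is not indexed, and the remaining fields are indexed normally.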

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": {
          "type": "integer",
          "ignore_malformed": true
        },
        "number_two": {
          "type": "integer"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text":       "Some text value",
  "number_one": "foo" 
}

PUT my_index/my_type/2
{
  "text":       "Some text value",
  "number_two": "foo" 
}

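In the example above, number_one is an integer field with ignore_malformed set to true. Document 1 is therefore indexed even though its number_one value is a string; that field is simply not indexed. number_two is an integer field with the default ignore_malformed of false, so document 2 is rejected.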

3.13 include_in_all

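include_in_all controls whether a field's value is included in the _all field. It defaults to true, unless index is set to false for the field.
In the example below, title and content are included in _all, and date is not.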

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { 
          "type": "text"
        },
        "content": { 
          "type": "text"
        },
        "date": { 
          "type": "date",
          "include_in_all": false
        }
      }
    }
  }
}

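include_in_all can also be set at the type level and on object or nested fields, in which case all sub-fields inherit the setting. In the example below, all fields of my_type are excluded from _all. author.first_name and author.last_name are included. editor.last_name is also included, because it overrides the inherited setting, while editor.first_name inherits false: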

PUT my_index
{
  "mappings": {
    "my_type": {
      "include_in_all": false, 
      "properties": {
        "title":          { "type": "text" },
        "author": {
          "include_in_all": true, 
          "properties": {
            "first_name": { "type": "text" },
            "last_name":  { "type": "text" }
          }
        },
        "editor": {
          "properties": {
            "first_name": { "type": "text" }, 
            "last_name":  { "type": "text", "include_in_all": true } 
          }
        }
      }
    }
  }
}

3.14 index

index controls whether a field value is indexed. It accepts true (the default) or false. A field that is not indexed is not searchable.

3.15 index_options

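index_options controls what information is added to the inverted index. It accepts the following values. Analyzed string fields default to positions, and all other fields default to docs: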

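Value	Information indexed
docs	Only the document number.
freqs	Document number and term frequencies.
positions	Document number, term frequencies and term positions. Positions can be used for proximity and phrase queries.
offsets	Document number, term frequencies, positions, and the start and end character offsets of each term. Offsets are used by the postings highlighter.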
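To make the four levels concrete, this Python sketch builds a toy inverted index for one document and keeps only the details each index_options level stores. The word tokenizer, the dictionary layout and the helper name build_postings are assumptions for illustration. Lucene's postings format is far more compact.

```python
import re

# Which parts of each posting are kept at every index_options level.
LEVELS = {
    "docs": ("doc",),
    "freqs": ("doc", "freq"),
    "positions": ("doc", "freq", "positions"),
    "offsets": ("doc", "freq", "positions", "offsets"),
}


def build_postings(doc_id, text, level):
    """Toy inverted index for one document: term -> posting details for that level."""
    full = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = match.group().lower()
        entry = full.setdefault(
            term, {"doc": doc_id, "freq": 0, "positions": [], "offsets": []}
        )
        entry["freq"] += 1
        entry["positions"].append(position)
        entry["offsets"].append((match.start(), match.end()))
    keep = LEVELS[level]
    return {term: {k: e[k] for k in keep} for term, e in full.items()}


print(build_postings(1, "Quick brown fox quick", "docs")["quick"])
# {'doc': 1}
print(build_postings(1, "Quick brown fox quick", "offsets")["quick"])
# {'doc': 1, 'freq': 2, 'positions': [0, 3], 'offsets': [(0, 5), (16, 21)]}
```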

3.16 fields

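fields (multi-fields) lets you index the same value in different ways. For example, a string value can be mapped as a text field for full-text search and as a keyword sub-field for sorting and aggregations. Below, city is used for the full-text query, and city.raw is used for sorting and the terms aggregation: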

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { 
              "type":  "keyword"
            }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "city": "New York"
}

PUT my_index/my_type/2
{
  "city": "York"
}

GET my_index/_search
{
  "query": {
    "match": {
      "city": "york" 
    }
  },
  "sort": {
    "city.raw": "asc" 
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw" 
      }
    }
  }
}

3.17 norms

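Norms store normalization factors that are used at query time to compute a document's relevance score. They are useful for scoring but take up a lot of disk space. If you do not need scoring on a field, disable norms on it.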

3.18 null_value

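A null value cannot be indexed or searched. The null_value parameter replaces explicit null values with the specified value, so that they can be indexed and searched. Example: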

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": {
          "type":       "keyword",
          "null_value": "NULL" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "status_code": null
}

PUT my_index/my_type/2
{
  "status_code": [] 
}

GET my_index/_search
{
  "query": {
    "term": {
      "status_code": "NULL" 
    }
  }
}

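Document 1 matches, because its null status_code is indexed as "NULL". Document 2 does not match: an empty array contains no explicit null, so null_value is not applied.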
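This Python model captures the rule. An explicit null, including a null element inside an array, is replaced by null_value. An empty array contributes nothing. values_to_index is a hypothetical helper.

```python
def values_to_index(value, null_value=None):
    """Terms indexed for one field value under a null_value setting."""
    values = value if isinstance(value, list) else [value]
    indexed = []
    for v in values:
        if v is None:
            if null_value is not None:
                indexed.append(null_value)  # explicit null replaced by null_value
            # without null_value, the null is simply skipped
        else:
            indexed.append(v)
    return indexed


print(values_to_index(None, "NULL"))  # ['NULL'] -> document 1 matches
print(values_to_index([], "NULL"))    # []       -> document 2 does not
```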

3.19 position_increment_gap

To support proximity and phrase queries, analyzed text fields record the position of each term. Consider a field whose value is an array of strings:

 "names": [ "John Abraham", "Lincoln Smith"]

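To keep phrase queries from matching across array elements, Elasticsearch leaves a gap of position_increment_gap positions between the last term of one element and the first term of the next. The gap is 100 by default, and here it falls between Abraham and Lincoln. The following phrase query for "Abraham Lincoln" therefore finds nothing: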

PUT my_index/groups/1
{
    "names": [ "John Abraham", "Lincoln Smith"]
}

GET my_index/groups/_search
{
    "query": {
        "match_phrase": {
            "names": {
                "query": "Abraham Lincoln" 
            }
        }
    }
}

With a slop larger than the gap (here 101), the phrase matches:

GET my_index/groups/_search
{
    "query": {
        "match_phrase": {
            "names": {
                "query": "Abraham Lincoln",
                "slop": 101 
            }
        }
    }
}

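The gap can be set in the mapping with position_increment_gap. With the value 0 below, a phrase query for "Abraham Lincoln" matches across the two elements without any slop: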

PUT my_index
{
  "mappings": {
    "groups": {
      "properties": {
        "names": {
          "type": "text",
          "position_increment_gap": 0 
        }
      }
    }
  }
}
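This Python model shows positions and a two-term phrase check. It assumes that the first term of each later array element is placed position_increment_gap positions after the previous term, plus the usual increment of one. That is a simplified model of Lucene's behaviour. It agrees with the examples above: no match without slop at the default gap, a match with slop 101, and a match with a gap of 0. assign_positions and phrase_match are hypothetical helpers.

```python
import re


def assign_positions(values, gap):
    """Token -> positions for a multi-valued text field (assumed gap model)."""
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # fake gap between consecutive array elements
        for token in re.findall(r"\w+", value.lower()):
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_match(positions, first, second, slop=0):
    """Two-term phrase: second should directly follow first, within slop moves."""
    return any(
        abs(p2 - p1 - 1) <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


names = ["John Abraham", "Lincoln Smith"]
default_gap = assign_positions(names, gap=100)
print(phrase_match(default_gap, "abraham", "lincoln"))            # False
print(phrase_match(default_gap, "abraham", "lincoln", slop=101))  # True
print(phrase_match(assign_positions(names, gap=0), "abraham", "lincoln"))  # True
```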

3.20 properties

Mapping types, object fields and nested fields contain sub-fields, which are defined with the properties parameter:

PUT my_index
{
  "mappings": {
    "my_type": { 
      "properties": {
        "manager": { 
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text"  }
          }
        },
        "employees": { 
          "type": "nested",
          "properties": {
            "age":  { "type": "integer" },
            "name": { "type": "text"  }
          }
        }
      }
    }
  }
}

A matching document:

PUT my_index/my_type/1 
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    {
      "name": "John Smith",
      "age": 34
    },
    {
      "name": "Peter Brown",
      "age": 26
    }
  ]
}

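Sub-fields are referenced with dot notation in queries and aggregations, e.g. manager.name. Fields inside the nested employees objects are aggregated through a nested aggregation: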

GET my_index/_search
{
  "query": {
    "match": {
      "manager.name": "Alice White" 
    }
  },
  "aggs": {
    "Employees": {
      "nested": {
        "path": "employees"
      },
      "aggs": {
        "Employee Ages": {
          "histogram": {
            "field": "employees.age", 
            "interval": 5
          }
        }
      }
    }
  }
}

3.21 search_analyzer

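In most cases the same analyzer should be used at index time and search time, so that the terms of the analyzed query match the terms in the inverted index. Sometimes, though, a different analyzer is needed at search time, for example when using the edge_ngram token filter for autocomplete.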

By default, queries use the analyzer set in the analyzer parameter, but this can be overridden with search_analyzer. Example:

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "autocomplete", 
          "search_analyzer": "standard" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick Brown Fox" 
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "Quick Br", 
        "operator": "and"
      }
    }
  }
}
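This Python model shows why the query matches. At index time, the autocomplete analyzer stores every prefix (edge n-gram) of each lowercased token. At search time, the standard analyzer leaves "Quick Br" as quick and br, and with operator "and" both must be present. The regex tokenizer is a rough stand-in for the standard tokenizer, which is an assumption. standard and autocomplete are hypothetical helpers named after the analyzers.

```python
import re


def standard(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def autocomplete(text, min_gram=1, max_gram=20):
    """standard tokenizer + lowercase + edge_ngram(min_gram..max_gram)."""
    grams = []
    for token in standard(text):
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:size])
    return grams


index_terms = set(autocomplete("Quick Brown Fox"))
query_terms = standard("Quick Br")  # search_analyzer: standard
print(query_terms)                                 # ['quick', 'br']
print(all(t in index_terms for t in query_terms))  # True (operator "and")
```

Using autocomplete at search time as well would turn "Quick Br" into q, qu, ..., b, br. Such short prefixes would match far more documents than intended, which is why the search-time analyzer is standard here.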

3.22 similarity

similarity selects the scoring model for a field. Three similarities are available out of the box:

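BM25: the default similarity in Elasticsearch and Lucene;
classic: TF/IDF scoring, the default before Elasticsearch 5.0;
boolean: a simple boolean similarity for when full-text ranking is not needed; the score depends only on whether the query terms match.
Example: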

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "default_field": { 
          "type": "text"
        },
        "classic_field": {
          "type": "text",
          "similarity": "classic" 
        },
        "boolean_sim_field": {
          "type": "text",
          "similarity": "boolean" 
        }
      }
    }
  }
}

default_field uses BM25 automatically, classic_field uses classic TF/IDF, and boolean_sim_field uses the boolean similarity.
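For reference, here is a sketch of the textbook BM25 score of one query term in one document. It uses Lucene's default parameters k1 = 1.2 and b = 0.75 and Lucene's idf formula. It ignores query boosts and the lossy way Lucene encodes document length, so its numbers will not exactly match Elasticsearch's _score. bm25_term_score is a hypothetical helper.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of a single term to a document's score."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)  # document-length normalization
    return idf * tf * (k1 + 1) / (tf + norm)


short_doc = bm25_term_score(tf=1, doc_len=5, avg_doc_len=10, doc_count=100, doc_freq=10)
long_doc = bm25_term_score(tf=1, doc_len=20, avg_doc_len=10, doc_count=100, doc_freq=10)
print(short_doc > long_doc)  # True: the same match counts more in a shorter field
```

The length normalization in this formula relies on the per-field norms described in 3.17. Disabling norms removes that factor for the field.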

3.23 store

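By default, field values are indexed so they are searchable, but they are not stored. That is usually fine, because _source keeps a copy of the original document. Storing a field can still make sense in some cases. For example, a document might have a title, a date and a very large content field, and you might want to retrieve only title and date without loading the large _source: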

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "store": true 
        },
        "date": {
          "type": "date",
          "store": true 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

GET my_index/_search
{
  "stored_fields": [ "title", "date" ] 
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "fields": { "date": [ "2015-01-01T00:00:00.000Z" ], "title": [ "Some short title" ] } } ] } }

Stored fields are always returned as arrays. To get the original field values, retrieve them from _source instead.

3.24 term_vector

Term vectors contain the following information about the terms produced by analysis:

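a list of terms;
the position (order) of each term;
the start and end character offsets that map each term back to the original string.
The term_vector parameter accepts the following values: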

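Value	Meaning
no	No term vectors are stored (default).
yes	Only the terms are stored.
with_positions	Terms and positions are stored.
with_offsets	Terms and character offsets are stored.
with_positions_offsets	Terms, positions and character offsets are stored.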

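Example (because positions and offsets are stored, the fast vector highlighter will be used for the text field):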

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type":        "text",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "text": "Quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "text": "brown fox"
    }
  },
  "highlight": {
    "fields": {
      "text": {} 
    }
  }
}
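What with_positions_offsets records for the document above can be approximated in Python. This is a sketch that uses a regex plus lowercasing as a stand-in for the standard analyzer. Each term gets its positions and its start/end character offsets in the original string, which is what lets a highlighter wrap the original characters without re-analysing the text.

```python
import re

def term_vector(text):
    """For every term: its positions and (start, end) character offsets."""
    vector = {}
    for position, m in enumerate(re.finditer(r"\w+", text)):
        term = m.group().lower()
        entry = vector.setdefault(term, {"positions": [], "offsets": []})
        entry["positions"].append(position)
        entry["offsets"].append((m.start(), m.end()))
    return vector

text = "Quick brown fox"
tv = term_vector(text)
for term, info in sorted(tv.items()):
    print(term, info)
# brown {'positions': [1], 'offsets': [(6, 11)]}
# fox {'positions': [2], 'offsets': [(12, 15)]}
# quick {'positions': [0], 'offsets': [(0, 5)]}

# The offsets slice the original text, e.g. for highlighting "brown":
start, end = tv["brown"]["offsets"][0]
print(text[:start] + "<em>" + text[start:end] + "</em>" + text[end:])
```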

IV. Dynamic Mapping

4.1 _default_ mapping

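When a mapping type named _default_ is added to an index, it becomes the base mapping for the other types: any new mapping type inherits the settings defined in _default_.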

PUT my_index
{
  "mappings": {
    "_default_": { 
      "_all": {
        "enabled": false
      }
    },
    "user": {}, 
    "blogpost": { 
      "_all": {
        "enabled": true
      }
    }
  }
}

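In the mapping above, _default_ disables the _all field. The user type inherits the settings from _default_, so its _all field is disabled as well. The blogpost type enables _all, overriding the default setting.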

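The _default_ mapping can be updated after the index has been created, but the new defaults only affect mapping types created afterwards.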
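Here is a conceptual Python model of the inheritance above. Each type's mapping is resolved by deep-merging it over _default_, and the type's own settings win. It illustrates the behaviour described under the assumption of a simple recursive merge; it is not Elasticsearch's internal merge code.

```python
import copy

def merge(base, override):
    """Recursively merge override into a copy of base (override wins)."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

def effective_mappings(mappings):
    """Resolve every non-default type against the _default_ type."""
    default = mappings.get("_default_", {})
    return {name: merge(default, m)
            for name, m in mappings.items() if name != "_default_"}

# The mappings from the example above.
mappings = {
    "_default_": {"_all": {"enabled": False}},
    "user": {},
    "blogpost": {"_all": {"enabled": True}},
}
print(effective_mappings(mappings))
# {'user': {'_all': {'enabled': False}}, 'blogpost': {'_all': {'enabled': True}}}
```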

4.2 Dynamic field mapping

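When a document containing a previously unseen field is indexed into Elasticsearch, the new field is automatically added to the type mapping. This behaviour is controlled by the dynamic setting. With dynamic set to false, new fields are ignored: they are not indexed, but still appear in _source. With dynamic set to strict, an exception is thrown and the document is rejected. When dynamic is true (the default), Elasticsearch infers the field type from the JSON value:

JSON value	Inferred field type
null	No field is added
true or false	boolean
floating-point number	float
integer	long
object	object
array	Determined by the first non-null value in the array
string	date (if the value passes date detection), double or long (if it passes numeric detection), otherwise text with a keyword sub-field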
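The table can be read as a small decision procedure. Below is a minimal Python sketch with date and numeric detection switched off, so every string falls through to text. The function name and return strings are illustrative only.

```python
import json

def infer_type(value):
    """Guess the dynamic mapping for a decoded JSON value, following the
    table above (date and numeric detection off)."""
    if value is None:
        return None                 # null: no field is added
    if isinstance(value, bool):     # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item)
        return None
    if isinstance(value, str):
        return "text"               # text with a keyword sub-field
    raise TypeError(f"not a JSON value: {value!r}")

doc = json.loads('{"active": true, "count": 3, "price": 9.5, '
                 '"tags": [null, "a"], "owner": {"name": "x"}, "note": null}')
print({field: infer_type(v) for field, v in doc.items()})
# {'active': 'boolean', 'count': 'long', 'price': 'float', 'tags': 'text', 'owner': 'object', 'note': None}
```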

By default, date detection checks string values against the following date formats (dynamic_date_formats):

[ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]

Example:

PUT my_index/my_type/1
{
  "create_date": "2015/09/02"
}

GET my_index/_mapping

The resulting mapping shows that create_date has been mapped as a date:

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "create_date": {
            "type": "date",
            "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
          }
        }
      }
    }
  }
}

Disabling date detection:

PUT my_index
{
  "mappings": {
    "my_type": {
      "date_detection": false
    }
  }
}

PUT my_index/my_type/1 
{
  "create": "2015/09/02"
}

Check the mapping again; the create field is no longer mapped as a date:

GET my_index/_mapping
Result:
{
  "my_index": {
    "mappings": {
      "my_type": {
        "date_detection": false,
        "properties": {
          "create": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Customising the date formats used for detection:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}

PUT my_index/my_type/1
{
  "create_date": "09/25/2015"
}
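Here is a minimal Python sketch of string date detection covering the cases above: the default slash formats, date_detection switched off, and custom dynamic_date_formats. The strptime patterns are hand-translated assumptions for the Joda patterns used in this section. The optional Z time-zone part and strict_date_optional_time are not modelled.

```python
from datetime import datetime

# Assumed strptime equivalents of the Joda patterns in this section.
DEFAULT_FORMATS = ["%Y/%m/%d %H:%M:%S", "%Y/%m/%d"]  # yyyy/MM/dd HH:mm:ss || yyyy/MM/dd
CUSTOM_FORMATS = ["%m/%d/%Y"]                        # MM/dd/yyyy

def detect_string_type(value, date_detection=True, formats=DEFAULT_FORMATS):
    """Return "date" if detection is on and the string matches one of the
    formats; otherwise the default string mapping, "text"."""
    if date_detection:
        for fmt in formats:
            try:
                datetime.strptime(value, fmt)
                return "date"
            except ValueError:
                continue
    return "text"

print(detect_string_type("2015/09/02"))                          # date
print(detect_string_type("2015/09/02", date_detection=False))    # text
print(detect_string_type("09/25/2015", formats=CUSTOM_FORMATS))  # date
```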

Enabling numeric detection (off by default), so that strings that look like numbers are mapped as numbers:

PUT my_index
{
  "mappings": {
    "my_type": {
      "numeric_detection": true
    }
  }
}

PUT my_index/my_type/1
{
  "my_float":   "1.0", 
  "my_integer": "1" 
}
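The following Python sketch shows numeric detection for string values. Integer-looking strings map to long, and decimal-looking strings map to float, which is how the official version of this example maps my_integer and my_float. Python's parsers accept a few forms (e.g. "1_000", "nan") that Elasticsearch may not, so this is only an approximation.

```python
def detect_numeric(value, numeric_detection=True):
    """Map a string to long, float or (fallback) text."""
    if numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "float"
        except ValueError:
            pass
    return "text"

doc = {"my_float": "1.0", "my_integer": "1"}
print({field: detect_numeric(v) for field, v in doc.items()})
# {'my_float': 'float', 'my_integer': 'long'}
```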

4.3 Dynamic templates

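Dynamic templates let you define custom mappings for dynamically added fields, based on the detected data type and the field name. In the example below, string fields get the mapping:

  "mapping": { "type": "long" }

but only if the field name matches long_* and does not match *_text: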

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "longs_as_strings": {
            "match_mapping_type": "string",
            "match":   "long_*",
            "unmatch": "*_text",
            "mapping": {
              "type": "long"
            }
          }
        }
      ]
    }
  }
}

PUT my_index/my_type/1
{
  "long_num": "5", 
  "long_text": "foo" 
}

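After the document is indexed, long_num is mapped as long. long_text matches *_text, so the template does not apply and it gets the default dynamic mapping for strings (text with a keyword sub-field).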
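This Python sketch shows how the longs_as_strings template decides. match_mapping_type is compared with the type that dynamic mapping detected, then match and unmatch are tested as wildcard patterns on the field name. fnmatchcase also understands ? and [...], which Elasticsearch's simple match patterns do not; for the patterns used here that makes no difference.

```python
from fnmatch import fnmatchcase

# The template from the example above.
TEMPLATE = {
    "match_mapping_type": "string",
    "match": "long_*",
    "unmatch": "*_text",
    "mapping": {"type": "long"},
}

def apply_template(field, detected_type, template=TEMPLATE):
    """Return the template's mapping if the field qualifies, else None
    (meaning the default dynamic mapping is used)."""
    if template.get("match_mapping_type") not in (None, "*", detected_type):
        return None
    if "match" in template and not fnmatchcase(field, template["match"]):
        return None
    if "unmatch" in template and fnmatchcase(field, template["unmatch"]):
        return None
    return template["mapping"]

print(apply_template("long_num", "string"))   # {'type': 'long'}
print(apply_template("long_text", "string"))  # None
```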

4.4 Override default template

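You can override the default mapping for all indices by putting a _default_ type in an index template that matches every index. Example (disable the _all field in all new indices):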

PUT _template/disable_all_field
{
  "order": 0,
  "template": "*", 
  "mappings": {
    "_default_": { 
      "_all": { 
        "enabled": false
      }
    }
  }
}
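Index templates are selected by matching their template pattern against an index's name when the index is created; existing indices are not changed. When several templates match, those with a lower order are applied first and higher orders override them. Below is a Python sketch of that selection step. The "name" key and the second template (logs-*) are hypothetical, added only to show ordering; in the real API the name is part of the URL.

```python
from fnmatch import fnmatchcase

TEMPLATES = [
    {"name": "logs", "order": 1, "template": "logs-*"},             # hypothetical
    {"name": "disable_all_field", "order": 0, "template": "*"},     # from above
]

def templates_for(index_name, templates):
    """Templates matching a new index name, lowest order first, so that
    templates with a higher order are merged last and win."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    return sorted(matching, key=lambda t: t.get("order", 0))

print([t["name"] for t in templates_for("logs-2017", TEMPLATES)])
print([t["name"] for t in templates_for("users", TEMPLATES)])
```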
