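1. Why nested objects and parent-child relationships exist
## Let an example do the talking: index one document, and pay attention to the comments field.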
PUT /my_index/blogpost/1
{
"title": "Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"name": "John Smith",
"comment": "Great article",
"age": 28,
"stars": 4,
"date": "2014-09-01"
},
{
"name": "Alice White",
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
## Run a query and see whether it finds this document.
GET /my_index/blogpost/_search
{
"query": {
"bool": {
"must": [
{ "match": { "comments.name": "Alice" }},
{ "match": { "comments.age": 28 }}
]
}
}
}
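## Conclusion: the document is found, which is surprising, because no single commenter has comments.name "Alice" and comments.age 28. The reason is that Lucene stores the document in flattened form, as shown below: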
{
"title": [ eggs, nest ],
"body": [ making, money, work, your ],
"tags": [ cash, shares ],
"comments.name": [ alice, john, smith, white ],
"comments.comment": [ article, great, like, more, please, this ],
"comments.age": [ 28, 31 ],
"comments.stars": [ 4, 5 ],
"comments.date": [ 2014-09-01, 2014-10-22 ]
}
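The advantage of this storage form is that everything lives in one document, which makes searching fast. The drawback is just as obvious: the association between the fields of each inner comment object is irretrievably lost. Alice is not 28, after all. Nested objects and parent-child relationships were introduced to solve this problem.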
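To make the loss of association concrete, here is a minimal Python sketch that imitates the flattening. It is an illustration only, not how Lucene is implemented: `tokenize` and `flatten` are hypothetical helpers, and the tokenizer is a rough stand-in for the default analyzer.

```python
import re

def tokenize(value):
    """Lowercase strings and split them into word tokens; keep numbers as they are."""
    if isinstance(value, str):
        return re.findall(r"[a-z0-9]+", value.lower())
    return [value]

def flatten(doc, array_field):
    """Merge the values of every object in doc[array_field] into one set per dotted field name."""
    flat = {}
    for obj in doc[array_field]:
        for key, value in obj.items():
            flat.setdefault(f"{array_field}.{key}", set()).update(tokenize(value))
    return flat

doc = {
    "comments": [
        {"name": "John Smith", "age": 28},
        {"name": "Alice White", "age": 31},
    ]
}

flat = flatten(doc, "comments")

# Each clause is checked against the merged values, so the two clauses
# can be satisfied by two different comments.
matches = "alice" in flat["comments.name"] and 28 in flat["comments.age"]
print(matches)  # True
```

Once the values are merged per field, nothing records that "alice" and 31 came from the same comment, which is why the bool query above matches.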
2. Using nested objects
## Create the index and mapping
DELETE my_index
PUT /my_index
{
"mappings": {
"blogpost": {
"properties": {
"comments": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"comment": { "type": "string" },
"age": { "type": "short" },
"stars": { "type": "short" },
"date": { "type": "date" }
}
}
}
}
}
}
## Index two documents
PUT /my_index/blogpost/1
{
"title": "Nest eggs",
"body": "Making your money work...",
"tags": [ "cash", "shares" ],
"comments": [
{
"name": "John Smith",
"comment": "Great article",
"age": 28,
"stars": 4,
"date": "2014-09-01"
},
{
"name": "Alice White",
"comment": "More like this please",
"age": 31,
"stars": 5,
"date": "2014-10-22"
}
]
}
PUT /my_index/blogpost/2
{
"title": "Investment secrets",
"body": "What they don't tell you ...",
"tags": [ "shares", "equities" ],
"comments": [
{
"name": "Mary Brown",
"comment": "Lies, lies, lies",
"age": 42,
"stars": 1,
"date": "2014-10-18"
},
{
"name": "John Smith",
"comment": "You're making it up!",
"age": 28,
"stars": 2,
"date": "2014-10-16"
}
]
}
## Nested query: check that the association inside each inner object now holds (try changing the values)
GET /my_index/blogpost/_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "eggs" }},
{
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{ "match": { "comments.name": "john" }},
{ "match": { "comments.age": 28 }}
]
}}}}
]
}}}
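<1> The title clause runs against the root document.
<2> The nested clause steps down into the nested comments field.
<3> The comments.name and comments.age clauses run against the same nested object (a single comment inside the post).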
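The difference from the flattened case can be sketched in a few lines of Python. This is only a model of the matching semantics, assuming whitespace tokenization; `nested_match` is a hypothetical helper, not an Elasticsearch API.

```python
def nested_match(comments, name_token, age):
    """Return True only if one single comment satisfies both conditions."""
    return any(
        name_token in c["name"].lower().split() and c["age"] == age
        for c in comments
    )

comments = [
    {"name": "John Smith", "age": 28},
    {"name": "Alice White", "age": 31},
]

print(nested_match(comments, "john", 28))   # True
print(nested_match(comments, "alice", 28))  # False
```

Because both conditions are evaluated per comment, "alice" together with 28 no longer matches.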
## Sorting by nested fields
GET /my_index/blogpost/_search
{
"query": {
"nested": {
"path": "comments",
"filter": {
"range": {
"comments.date": {
"gte": "2014-10-01",
"lt": "2014-11-01"
}
}
}
}
},
"sort": {
"comments.stars": {
"order": "asc",
"mode": "min",
"nested_path": "comments",
"nested_filter": {
"range": {
"comments.date": {
"gte": "2014-10-01",
"lt": "2014-11-01"
}
}
}
}
}
}
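<1> The nested query here restricts the results to blog posts that received a comment in October.
<2> Results are sorted in ascending (asc) order by the lowest (min) value of the comments.stars field among the matching comments.
<3> The nested_path and nested_filter in the sort clause repeat the nested query from the query clause. The reason is explained below.
Why do we have to repeat the query conditions in nested_path and nested_filter? Because sorting happens after the query has run. The query selects blog posts that received a comment in October, but what it returns is blog posts, not comments. Without the nested_filter in the sort clause, posts would be sorted on all of their comments, not only on the comments received in October.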
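The effect of the filter on the sort value can be checked by hand with the two posts indexed earlier. The sketch below is a plain Python model, not Elasticsearch code; `in_october` and `sort_key` are hypothetical names, and each comment is reduced to a (date, stars) pair.

```python
# post id -> list of (date, stars) pairs, taken from the two documents indexed earlier
posts = {
    1: [("2014-09-01", 4), ("2014-10-22", 5)],
    2: [("2014-10-18", 1), ("2014-10-16", 2)],
}

def in_october(date):
    # ISO-8601 dates compare correctly as plain strings
    return "2014-10-01" <= date < "2014-11-01"

def sort_key(comments, nested_filter=None):
    """Minimum stars over the comments that pass the optional filter (mode: min)."""
    stars = [s for d, s in comments if nested_filter is None or nested_filter(d)]
    return min(stars)

with_filter = {pid: sort_key(c, in_october) for pid, c in posts.items()}
without_filter = {pid: sort_key(c) for pid, c in posts.items()}

print(with_filter)     # {1: 5, 2: 1}
print(without_filter)  # {1: 4, 2: 1}
```

Post 1 gets the sort value 5 with the filter and 4 without it, because its 4-star comment was written in September. The sketch assumes every post has at least one October comment, which the query guarantees here.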
## Nested aggregations
GET /my_index/blogpost/_search?search_type=count
{
"aggs": {
"comments": {
"nested": {
"path": "comments"
},
"aggs": {
"by_month": {
"date_histogram": {
"field": "comments.date",
"interval": "month",
"format": "yyyy-MM"
},
"aggs": {
"avg_stars": {
"avg": {
"field": "comments.stars"
}
}
}
}
}
}
}
}
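<1> The nested aggregation steps down into the nested comments field.
<2> Comments are bucketed by month based on the comments.date field.
<3> The average number of stars is computed separately for each monthly bucket.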
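As a cross-check of what this aggregation computes, the same bucketing can be done by hand over the four comments indexed earlier. This is a hedged Python sketch; `avg_stars_by_month` is a hypothetical helper, and rounding to two decimals is a choice made here for readability.

```python
from collections import defaultdict

# (date, stars) for every comment in documents 1 and 2
comments = [
    ("2014-09-01", 4),
    ("2014-10-22", 5),
    ("2014-10-18", 1),
    ("2014-10-16", 2),
]

def avg_stars_by_month(comments):
    """Group comments by their yyyy-MM prefix and average the stars in each bucket."""
    buckets = defaultdict(list)
    for date, stars in comments:
        buckets[date[:7]].append(stars)
    return {month: round(sum(v) / len(v), 2) for month, v in sorted(buckets.items())}

result = avg_stars_by_month(comments)
print(result)  # {'2014-09': 4.0, '2014-10': 2.67}
```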
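3. Drawbacks of nested objects
To add, change, or remove a nested object, we have to reindex the whole document. Also keep in mind that a search request returns the whole document, not just the matching nested objects. On the other hand, queries are fast.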
4. Parent-child relationships. Parent-child is similar to the nested model, and the two are interchangeable in most cases, but parent-child has some advantages that nested objects lack:
The parent document can be updated without reindexing the children.
Child documents can be added, changed, or deleted without affecting either the parent or other children. This is especially useful when child documents are large in number and need to be added or changed frequently.
Child documents can be returned as the results of a search request, without returning the whole parent document along with them.
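5. Drawbacks of parent-child: query performance is worse than with nested objects (roughly 5 to 10 times slower), so parent-child suits cases where index-time performance matters more than search-time performance.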
Parent-child joins can be a useful technique for managing relationships when index-time performance is more important than search-time performance, but it comes at a significant cost. Parent-child queries can be 5 to 10 times slower than the equivalent nested query!
6. Using parent-child
## Create the index and mapping
PUT /company
{
"mappings": {
"branch": {},
"employee": {
"_parent": {
"type": "branch"
}
}
}
}
## Index some data
POST /company/branch/_bulk
{ "index": { "_id": "london" }}
{ "name": "London Westminster", "city": "London", "country": "UK" }
{ "index": { "_id": "liverpool" }}
{ "name": "Liverpool Central", "city": "Liverpool", "country": "UK" }
{ "index": { "_id": "paris" }}
{ "name": "Champs Élysées", "city": "Paris", "country": "France" }
## Verify that the documents were indexed
GET /company/branch/_search
{
"query":{"match_all": {}}
}
## Index a document into the child type
PUT /company/employee/1?parent=london
{
"name": "Alice Smith",
"dob": "1970-10-24",
"hobby": "hiking"
}
## Index a few more
POST /company/employee/_bulk
{ "index": { "_id": 2, "parent": "london" }}
{ "name": "Mark Thomas", "dob": "1982-05-16", "hobby": "diving" }
{ "index": { "_id": 3, "parent": "liverpool" }}
{ "name": "Barry Smith", "dob": "1979-04-01", "hobby": "hiking" }
{ "index": { "_id": 4, "parent": "paris" }}
{ "name": "Adrien Grand", "dob": "1987-05-11", "hobby": "horses" }
## Verify the documents in the child type
GET /company/employee/_search
{
"query":{"match_all": {}}
}
## has_child query: find the branches that have employees born in 1980 or later
GET /company/branch/_search
{
"query": {
"has_child": {
"type": "employee",
"query": {
"range": {
"dob": {
"gte": "1980-01-01"
}
}
}
}
}
}
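The join that has_child performs can be modelled in plain Python over the sample data. This sketch only illustrates the semantics, not how Elasticsearch executes the query; `has_child` here is a hypothetical function, and the dictionaries mirror the documents indexed above.

```python
# branch id -> country, and the four employees with their parent branch
branches = {"london": "UK", "liverpool": "UK", "paris": "France"}
employees = [
    {"name": "Alice Smith", "dob": "1970-10-24", "parent": "london"},
    {"name": "Mark Thomas", "dob": "1982-05-16", "parent": "london"},
    {"name": "Barry Smith", "dob": "1979-04-01", "parent": "liverpool"},
    {"name": "Adrien Grand", "dob": "1987-05-11", "parent": "paris"},
]

def has_child(parent_ids, children, predicate):
    """Return the parents that have at least one child satisfying the predicate."""
    matching_parents = {c["parent"] for c in children if predicate(c)}
    return [p for p in parent_ids if p in matching_parents]

# ISO-8601 dates compare correctly as plain strings
result = has_child(branches, employees, lambda e: e["dob"] >= "1980-01-01")
print(result)  # ['london', 'paris']
```

Liverpool drops out because its only employee was born in 1979.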
## has_parent query: find the employees who work in the UK
GET /company/employee/_search
{
"query": {
"has_parent": {
"type": "branch",
"query": {
"match": {
"country": "UK"
}
}
}
}
}
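The reverse direction, has_parent, can be modelled the same way. Again this is a hedged Python sketch of the semantics only: `has_parent` is a hypothetical function, and the country is compared as an exact value, ignoring text analysis.

```python
# branch id -> country, and each employee's parent branch
branches = {"london": "UK", "liverpool": "UK", "paris": "France"}
employees = [
    {"name": "Alice Smith", "parent": "london"},
    {"name": "Mark Thomas", "parent": "london"},
    {"name": "Barry Smith", "parent": "liverpool"},
    {"name": "Adrien Grand", "parent": "paris"},
]

def has_parent(children, parents, predicate):
    """Return the children whose parent satisfies the predicate."""
    return [c for c in children if predicate(parents[c["parent"]])]

uk_staff = [e["name"] for e in has_parent(employees, branches, lambda country: country == "UK")]
print(uk_staff)  # ['Alice Smith', 'Mark Thomas', 'Barry Smith']
```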
## children aggregation: the distribution of employee hobbies in each country
GET /company/branch/_search?search_type=count
{
"aggs": {
"country": {
"terms": {
"field": "country"
},
"aggs": {
"employees": {
"children": {
"type": "employee"
},
"aggs": {
"hobby": {
"terms": {
"field": "hobby"
}
}
}
}
}
}
}
}
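The counts this aggregation is expected to produce for the sample data can be worked out by hand. The following Python sketch does so under simplifying assumptions: `hobbies_by_country` is a hypothetical helper, and field values are used as exact strings, so any lowercasing or tokenization applied by an analyzer is ignored.

```python
from collections import Counter, defaultdict

# branch id -> country, and each employee's hobby and parent branch
branches = {"london": "UK", "liverpool": "UK", "paris": "France"}
employees = [
    {"hobby": "hiking", "parent": "london"},
    {"hobby": "diving", "parent": "london"},
    {"hobby": "hiking", "parent": "liverpool"},
    {"hobby": "horses", "parent": "paris"},
]

def hobbies_by_country(branch_country, employees):
    """Bucket employees by their branch's country, then count hobbies in each bucket."""
    buckets = defaultdict(Counter)
    for e in employees:
        buckets[branch_country[e["parent"]]][e["hobby"]] += 1
    return {country: dict(counts) for country, counts in buckets.items()}

result = hobbies_by_country(branches, employees)
print(result)  # {'UK': {'hiking': 2, 'diving': 1}, 'France': {'horses': 1}}
```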
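7. Grandparents: avoid this kind of relationship where possible, because query performance is very poor.
See: https://www.elastic.co/guide/cn/elasticsearch/guide/current/grandparents.html
8. Parent-child summary: see the link below. It is best not to rely on parent-child relationships.
https://www.elastic.co/guide/cn/elasticsearch/guide/current/parent-child-performance.html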