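ES provides an equivalent of a database join: a field of type join maintains parent-child relationships between documents, and parent and child documents can be maintained independently of each other.
Creating parent-child documents in ES takes three steps:
1. Define the join field in the index mapping
2. Index the parent documents
3. Index the child documents
Each step is demonstrated below.
Suppose we have a blog system in which every blog post has a number of comments. The blog and its comments then form a parent-child relationship.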
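The mapping for parent-child documents is set up as follows:
- set the field type to join
- use relations to declare which relation is the parent and which is the child
An example:
# blog is the parent document, comment is the child document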
PUT blog_index
{
"mappings": {
"properties": {
"blog_comment_join": {
"type": "join",
"relations": {
"blog": "comment"
}
}
}
}
}
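Next, index the parent documents. The join field can be written either as an object with a name key (document 1) or as the bare relation name (document 2):
PUT blog_index/_doc/1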
{
"title": "First Blog",
"author": "Ahri",
"content": "This is my first blog",
"blog_comment_join": {
"name": "blog"
}
}
PUT blog_index/_doc/2
{
"title": "Second Blog",
"author": "EZ",
"content": "This is my second blog",
"blog_comment_join": "blog"
}
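The two parent documents above write the join field in two different ways. As a minimal sketch of why they mean the same thing, the helper below expands the shorthand string into the object form. The name normalize_join is hypothetical and chosen for illustration; it is not part of any Elasticsearch client.

```python
from typing import Any, Dict, Union

JoinValue = Union[str, Dict[str, Any]]


def normalize_join(value: JoinValue) -> Dict[str, Any]:
    """Expand the shorthand join value ("blog") into the object form ({"name": "blog"})."""
    if isinstance(value, str):
        return {"name": value}
    # Already in object form: return a copy so the caller's dict is not shared.
    return dict(value)


# Join field values of documents 1 and 2 from the requests above.
doc1_join = {"name": "blog"}
doc2_join = "blog"

print(normalize_join(doc1_join) == normalize_join(doc2_join))  # prints True
```

Child documents always need the object form, because they have to carry a parent key next to name.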
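One thing to watch when indexing child documents is the routing setting: a child must be stored on the same shard as its parent, so the child's routing value should be the parent's ID, or otherwise match the routing value used for the parent. Example: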
PUT blog_index/_doc/comment-1?routing=1&refresh
{
"user": "Tom",
"content": "Good blog",
"comment_date": "2020-01-01 10:00:00",
"blog_comment_join": {
"name": "comment",
"parent": 1
}
}
PUT blog_index/_doc/comment-2?routing=1&refresh
{
"user": "Jhon",
"content": "Good Job",
"comment_date": "2020-02-01 10:00:00",
"blog_comment_join": {
"name": "comment",
"parent": 1
}
}
PUT blog_index/_doc/comment-3?routing=2&refresh
{
"user": "Jack",
"content": "Great job",
"comment_date": "2020-01-01 10:00:00",
"blog_comment_join": {
"name": "comment",
"parent": 2
}
}
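Why the routing parameter matters can be sketched in a few lines. The shard of a document is derived from a hash of its routing value, and the routing value defaults to the document's own ID. The code below is a simplified model under stated assumptions: zlib.crc32 stands in for the Murmur3 hash that Elasticsearch uses, the formula is reduced to hash modulo shard count, and NUM_PRIMARY_SHARDS, shard_for and shard_for_doc are names invented for this illustration.

```python
import zlib
from typing import Optional

NUM_PRIMARY_SHARDS = 3  # assumed shard count, for illustration only


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value to a shard number (crc32 is a stand-in for the real hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shard_for_doc(doc_id: str, routing: Optional[str] = None) -> int:
    """Without an explicit routing value, the document's own ID is used."""
    return shard_for(routing if routing is not None else doc_id)


parent_shard = shard_for_doc("1")                       # blog 1, default routing
child_shard = shard_for_doc("comment-1", routing="1")   # comment routed by parent ID

# Same routing value, same hash, same shard.
print(parent_shard == child_shard)  # prints True

# Routed by its own ID, the comment may or may not land next to its parent.
print(shard_for_doc("comment-1"))
```

Because both documents hash the same routing value, the child is guaranteed to share a shard with its parent regardless of the shard count, which is what the join field relies on.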
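Besides the common parent-child setup shown above, the ES join field also supports multiple child types and multi-level parent-child relationships, as follows.
Multiple child types
A single parent in a join field can be given several child types. It is configured like this: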
PUT my_index
{
"mappings": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"question": ["answer", "comment"]
}
}
}
}
}
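Multi-level parent-child relationships
If my_index from the previous example still exists, delete it first, because creating an index that already exists fails.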
PUT my_index
{
"mappings": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"question": ["answer", "comment"],
"answer": "vote"
}
}
}
}
}
The hierarchy created by the mapping above is:
question
  answer
    vote
  comment
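The hierarchy follows mechanically from the relations object in the mapping. As a small sketch, the code below inverts that object into a child-to-parent lookup and prints the chain from the root down to each relation. The helper names parent_map and lineage are made up for this illustration.

```python
from typing import Dict, List, Union

Relations = Dict[str, Union[str, List[str]]]

# The relations object from the multi-level mapping above.
relations: Relations = {
    "question": ["answer", "comment"],
    "answer": "vote",
}


def parent_map(rel: Relations) -> Dict[str, str]:
    """Invert the relations object into a child name -> parent name lookup."""
    parents: Dict[str, str] = {}
    for parent, children in rel.items():
        # A single child may be given as a bare string instead of a list.
        for child in ([children] if isinstance(children, str) else children):
            parents[child] = parent
    return parents


def lineage(name: str, rel: Relations) -> List[str]:
    """Return the chain of relation names from the root down to `name`."""
    parents = parent_map(rel)
    chain = [name]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return list(reversed(chain))


for name in ("question", "answer", "comment", "vote"):
    print(" > ".join(lineage(name, relations)))
# question
# question > answer
# question > comment
# question > answer > vote
```

The root of each chain matters for routing: in a multi-level hierarchy a vote document has to live on the same shard as both its answer and its question, so it is routed with the value used for the question at the top of the chain.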
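There are three main queries for parent-child documents:
- parent_id: returns all children of the parent with a given ID
- has_parent: returns the children of every parent that matches a query
- has_child: returns the parents of every child that matches a query
Concrete examples follow: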
# Find all children of the parent document with ID 1
GET blog_index/_search
{
"query": {
"parent_id": {
"type": "comment",
"id": 1
}
}
}
# Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.44183275,
"hits" : [
{
"_index" : "blog_index",
"_type" : "_doc",
"_id" : "comment-1",
"_score" : 0.44183275,
"_routing" : "1",
"_source" : {
"user" : "Tom",
"content" : "Good blog",
"comment_date" : "2020-01-01 10:00:00",
"blog_comment_join" : {
"name" : "comment",
"parent" : 1
}
}
},
{
"_index" : "blog_index",
"_type" : "_doc",
"_id" : "comment-2",
"_score" : 0.44183275,
"_routing" : "1",
"_source" : {
"user" : "Jhon",
"content" : "Good Job",
"comment_date" : "2020-02-01 10:00:00",
"blog_comment_join" : {
"name" : "comment",
"parent" : 1
}
}
}
]
}
}
# Find all children of the parents whose title contains first
GET blog_index/_search
{
"query": {
"has_parent": {
"parent_type": "blog",
"query": {
"match": {
"title": "first"
}
}
}
}
}
# Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "blog_index",
"_type" : "_doc",
"_id" : "comment-1",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"user" : "Tom",
"content" : "Good blog",
"comment_date" : "2020-01-01 10:00:00",
"blog_comment_join" : {
"name" : "comment",
"parent" : 1
}
}
},
{
"_index" : "blog_index",
"_type" : "_doc",
"_id" : "comment-2",
"_score" : 1.0,
"_routing" : "1",
"_source" : {
"user" : "Jhon",
"content" : "Good Job",
"comment_date" : "2020-02-01 10:00:00",
"blog_comment_join" : {
"name" : "comment",
"parent" : 1
}
}
}
]
}
}
# Find the parents of all children whose user contains Jack
GET blog_index/_search
{
"query": {
"has_child": {
"type": "comment",
"query": {
"match": {
"user": "Jack"
}
}
}
}
}
# Response
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "blog_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "Second Blog",
"author" : "EZ",
"content" : "This is my second blog",
"blog_comment_join" : "blog"
}
}
]
}
}
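The direction of each query is easy to mix up, so here is a rough in-memory model of the three queries run against the five documents indexed earlier. It is only a sketch of the semantics, not of how Elasticsearch executes them: the function names mirror the query names for readability, the join field is shortened to a join key, and the match query is approximated by simple Python predicates.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# The five documents from the examples, reduced to the fields the queries use.
DOCS: Dict[str, Doc] = {
    "1": {"title": "First Blog", "join": {"name": "blog"}},
    "2": {"title": "Second Blog", "join": {"name": "blog"}},
    "comment-1": {"user": "Tom", "join": {"name": "comment", "parent": "1"}},
    "comment-2": {"user": "Jhon", "join": {"name": "comment", "parent": "1"}},
    "comment-3": {"user": "Jack", "join": {"name": "comment", "parent": "2"}},
}


def parent_id(child_type: str, pid: str) -> List[str]:
    """IDs of the children of type `child_type` whose parent is `pid`."""
    return [i for i, d in DOCS.items()
            if d["join"]["name"] == child_type and d["join"].get("parent") == pid]


def has_parent(parent_type: str, pred: Callable[[Doc], bool]) -> List[str]:
    """IDs of the children whose parent is of `parent_type` and satisfies `pred`."""
    matching = {i for i, d in DOCS.items()
                if d["join"]["name"] == parent_type and pred(d)}
    return [i for i, d in DOCS.items() if d["join"].get("parent") in matching]


def has_child(child_type: str, pred: Callable[[Doc], bool]) -> List[str]:
    """IDs of the parents with at least one child of `child_type` satisfying `pred`."""
    parents = {d["join"]["parent"] for d in DOCS.values()
               if d["join"]["name"] == child_type and pred(d)}
    return [i for i in DOCS if i in parents]


print(parent_id("comment", "1"))
# ['comment-1', 'comment-2']
print(has_parent("blog", lambda d: "first" in d["title"].lower()))
# ['comment-1', 'comment-2']
print(has_child("comment", lambda d: d["user"] == "Jack"))
# ['2']
```

The three lists contain the same document IDs as the three responses shown above: has_parent filters on the parent and returns children, while has_child filters on the child and returns parents.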
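The Geek Time (极客时间) course 《Elasticsearch核心技术与实战》 gives a comparison of Nested objects and parent-child documents.
In general, most data is read far more often than it is written, so in most cases Nested objects should be the first choice.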
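I'm AhriJ邹同学, a full-stack engineer who works on front end, back end, mini programs, and DevOps. The blog is updated regularly. If anything here is wrong, corrections are welcome, so that we can learn from each other and improve together.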