A Brief Note on Using Elasticsearch Parent-Child Documents

I. Introduction to ES parent-child documents

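ES provides something similar to a Join in a relational database: a field of type join records the parent-child relationship between documents in the same index, and parent documents and child documents can be indexed and updated independently of each other.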

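II. Creating the index and inserting data

Setting up parent-child documents in ES takes three steps:

  • Create the index mapping, declaring a field of type join and the names of the parent and child relations
  • Insert the parent documents
  • Insert the child documents

Each step is demonstrated below.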

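1. Create the index

Suppose we have a blog system in which each blog post has a number of comments. The blog and its comments then form a parent-child relationship.

A parent-child mapping is created by:

  • Setting the field type to join
  • Declaring the parent-child relationship through relations

For example:

# blog is the parent relation, comment is the child relation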
PUT blog_index
{
  "mappings": {
    "properties": {
      "blog_comment_join": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      }
    }
  }
}
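
The same mapping body can also be assembled programmatically. The following is a minimal Python sketch, not an official client API: the helper name build_join_mapping is hypothetical, and the check it performs reflects the rule that a relation name may have several children but only one parent.

```python
import json


def build_join_mapping(field_name, relations):
    """Build an index-creation body containing a single join field.

    `relations` maps a parent relation name to one child name (str)
    or to a list of child names. Hypothetical helper for illustration.
    """
    seen_children = {}
    for parent, children in relations.items():
        names = [children] if isinstance(children, str) else list(children)
        for child in names:
            # A relation name may only be declared as a child of one parent.
            if child in seen_children:
                raise ValueError(
                    f"{child!r} already has parent {seen_children[child]!r}"
                )
            seen_children[child] = parent
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "join", "relations": relations}
            }
        }
    }


# Same body as the PUT blog_index request above.
body = build_join_mapping("blog_comment_join", {"blog": "comment"})
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the request body with whatever HTTP or Elasticsearch client is already in use.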

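2. Insert parent documents

A parent document only has to declare its relation name in the join field. Both the object form (first request) and the shorthand string form (second request) are accepted.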

PUT blog_index/_doc/1
{
  "title": "First Blog",
  "author": "Ahri",
  "content": "This is my first blog",
  "blog_comment_join": {
    "name": "blog"
  }
}


PUT blog_index/_doc/2
{
  "title": "Second Blog",
  "author": "EZ",
  "content": "This is my second blog",
  "blog_comment_join": "blog"
}

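3. Insert child documents

One point needs attention when inserting child documents:

  • routing: a child document must be stored on the same shard as its parent, so the child's routing value should be set to the parent document's ID, or to whatever routing value the parent itself was indexed with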
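
To see why the routing value matters, here is a rough Python sketch of shard selection. It assumes the simplified rule "shard = hash(routing) % number of primary shards", and it uses zlib.crc32 purely as a stand-in hash; Elasticsearch uses its own hash function internally, so the shard numbers here will not match a real cluster. The names shard_for and NUM_SHARDS are hypothetical.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Pick a shard from a routing value (simplified illustration).

    crc32 is a stand-in hash chosen for this sketch only.
    """
    return zlib.crc32(str(routing).encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # assumed number of primary shards

# A parent indexed without ?routing is routed by its own _id.
parent_shard = shard_for("1", NUM_SHARDS)

# A child indexed with ?routing=1 hashes the same value, so it lands
# on the same shard as its parent.
child_shard_with_routing = shard_for("1", NUM_SHARDS)

# Without ?routing the child would be routed by its own _id, which may
# or may not hash to the parent's shard.
child_shard_default = shard_for("comment-1", NUM_SHARDS)

print(parent_shard, child_shard_with_routing, child_shard_default)
```

Because the same input always hashes to the same shard, reusing the parent's routing value is what guarantees co-location.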

The example requests are as follows:

PUT blog_index/_doc/comment-1?routing=1&refresh
{
  "user": "Tom",
  "content": "Good blog",
  "comment_date": "2020-01-01 10:00:00",
  "blog_comment_join": {
    "name": "comment",
    "parent": 1
  }
}

PUT blog_index/_doc/comment-2?routing=1&refresh
{
  "user": "Jhon",
  "content": "Good Job",
  "comment_date": "2020-02-01 10:00:00",
  "blog_comment_join": {
    "name": "comment",
    "parent": 1
  }
}

PUT blog_index/_doc/comment-3?routing=2&refresh
{
  "user": "Jack",
  "content": "Great job",
  "comment_date": "2020-01-01 10:00:00",
  "blog_comment_join": {
    "name": "comment",
    "parent": 2
  }
}
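
When many child documents have to be loaded, the individual PUT requests above could be replaced by a single _bulk request. The sketch below only builds the newline-delimited payload; the helper name bulk_child_lines and the shape of its input are assumptions made for this example, and sending the payload to the _bulk endpoint is left to the client you already use.

```python
import json


def bulk_child_lines(index, join_field, child_type, children):
    """Build an NDJSON bulk payload that indexes child documents.

    Each entry in `children` is a dict with "id", "parent" and "source"
    keys (an input shape assumed for this sketch).
    """
    lines = []
    for child in children:
        # Route every child by its parent's ID so both share a shard.
        action = {
            "index": {
                "_index": index,
                "_id": child["id"],
                "routing": str(child["parent"]),
            }
        }
        source = dict(child["source"])
        source[join_field] = {"name": child_type, "parent": child["parent"]}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


payload = bulk_child_lines(
    "blog_index",
    "blog_comment_join",
    "comment",
    [
        {"id": "comment-1", "parent": 1,
         "source": {"user": "Tom", "content": "Good blog"}},
        {"id": "comment-3", "parent": 2,
         "source": {"user": "Jack", "content": "Great job"}},
    ],
)
print(payload)
```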

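4. Other configurations

Besides the common single parent-child setup above, the ES join field also supports multiple child types and multi-level parent-child relationships, as shown below.

Defining multiple child types

A parent in a join field can be configured with several child types. The mapping is created as follows: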

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"]  
        }
      }
    }
  }
}

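Building multi-level parent-child relationships

# my_index must not already exist; delete the index from the previous example first
PUT my_index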
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],  
          "answer": "vote" 
        }
      }
    }
  }
}

The hierarchy created by the mapping above is:

[Figure 1: question is the parent of answer and comment; answer is the parent of vote]
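
The hierarchy can also be derived directly from the relations object. The following Python sketch walks from a relation name up to its top-level ancestor; the function names parent_of and lineage are hypothetical helpers written for this note.

```python
def parent_of(relations):
    """Invert a relations mapping into a child -> parent lookup."""
    result = {}
    for parent, children in relations.items():
        names = [children] if isinstance(children, str) else children
        for child in names:
            result[child] = parent
    return result


def lineage(relation_name, relations):
    """Return the chain from the top-level ancestor down to relation_name."""
    parents = parent_of(relations)
    chain = [relation_name]
    while chain[-1] in parents:
        nxt = parents[chain[-1]]
        if nxt in chain:
            # Guard against a cyclic definition so the loop terminates.
            raise ValueError("cyclic relations")
        chain.append(nxt)
    return list(reversed(chain))


RELATIONS = {"question": ["answer", "comment"], "answer": "vote"}
print(lineage("vote", RELATIONS))     # ['question', 'answer', 'vote']
print(lineage("comment", RELATIONS))  # ['question', 'comment']
```

This matters for routing: in a multi-level setup a vote document names an answer as its parent, but it still has to live on the same shard as the question at the top of the chain, so it should be indexed with the routing value used for that question.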

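III. Querying parent-child documents

There are three main queries for parent-child documents:

  • parent_id: returns all child documents of the parent with the given ID
  • has_parent: returns the child documents whose parent matches the given query
  • has_child: returns the parent documents that have at least one child matching the given query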
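
The direction of each query is easy to mix up, so here is a small in-memory Python model of the three of them, run over the sample blog and comment documents indexed earlier. It only imitates which documents are returned; it is not how Elasticsearch executes these queries, and the function signatures (with plain Python predicates standing in for the inner query) are assumptions of this sketch.

```python
JOIN = "blog_comment_join"

DOCS = {
    "1": {"title": "First Blog", JOIN: {"name": "blog"}},
    "2": {"title": "Second Blog", JOIN: {"name": "blog"}},
    "comment-1": {"user": "Tom", JOIN: {"name": "comment", "parent": "1"}},
    "comment-2": {"user": "Jhon", JOIN: {"name": "comment", "parent": "1"}},
    "comment-3": {"user": "Jack", JOIN: {"name": "comment", "parent": "2"}},
}


def parent_id(docs, child_type, pid):
    """IDs of children of type child_type whose parent is pid."""
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if doc[JOIN]["name"] == child_type and doc[JOIN].get("parent") == pid
    )


def has_parent(docs, parent_type, predicate):
    """IDs of children whose parent (of parent_type) satisfies predicate."""
    matching = {
        doc_id for doc_id, doc in docs.items()
        if doc[JOIN]["name"] == parent_type and predicate(doc)
    }
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if doc[JOIN].get("parent") in matching
    )


def has_child(docs, child_type, predicate):
    """IDs of parents with at least one child_type child satisfying predicate."""
    return sorted({
        doc[JOIN]["parent"] for doc in docs.values()
        if doc[JOIN]["name"] == child_type and predicate(doc)
    })


print(parent_id(DOCS, "comment", "1"))
print(has_parent(DOCS, "blog", lambda d: "first" in d["title"].lower()))
print(has_child(DOCS, "comment", lambda d: d["user"] == "Jack"))
```

The three calls return the two comments on blog 1, the same two comments again, and blog 2 respectively, matching the hits in the real query responses.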

Concrete query examples follow:

[1] parent_id query
# Find all child documents of the parent document with ID 1
GET blog_index/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": 1
    }
  }
}

# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.44183275,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-1",
        "_score" : 0.44183275,
        "_routing" : "1",
        "_source" : {
          "user" : "Tom",
          "content" : "Good blog",
          "comment_date" : "2020-01-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      },
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-2",
        "_score" : 0.44183275,
        "_routing" : "1",
        "_source" : {
          "user" : "Jhon",
          "content" : "Good Job",
          "comment_date" : "2020-02-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      }
    ]
  }
}

[2] has_parent query
# Find all child documents whose parent's title contains "first"
GET blog_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "title": "first"
        }
      }
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-1",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "user" : "Tom",
          "content" : "Good blog",
          "comment_date" : "2020-01-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      },
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-2",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "user" : "Jhon",
          "content" : "Good Job",
          "comment_date" : "2020-02-01 10:00:00",
          "blog_comment_join" : {
            "name" : "comment",
            "parent" : 1
          }
        }
      }
    ]
  }
}

[3] has_child query
# Find the parent documents that have a child whose user contains "Jack"
GET blog_index/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "user": "Jack"
        }
      }
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Second Blog",
          "author" : "EZ",
          "content" : "This is my second blog",
          "blog_comment_join" : "blog"
        }
      }
    ]
  }
}


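IV. Nested objects vs. parent-child documents

Below is the comparison given in the Geek Time (极客时间) course 《Elasticsearch核心技术与实战》:

[Figure 2: comparison table of Nested objects and parent-child documents]

Nested objects are stored together with their parent and are therefore faster to read, while parent-child documents can be updated independently at the cost of slower queries. Since most data is read far more often than it is written, Nested objects are usually the better default choice.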



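I'm AhriJ (邹同学), an engineer who works across front end, back end, mini programs, and DevOps. This blog is updated regularly; corrections are welcome, so we can learn from each other and improve together.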
