Elasticsearch 7.2 Parent-Child Documents

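Syntax for defining parent-child documents

First, let's look at how to define a parent-child relationship. It clearly differs from the "_parent" approach found in many online tutorials, because later Elasticsearch versions changed the syntax: the relationship is now declared with a join field.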

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}

 

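This creates an index named my_index. my_join_field is a field of type join, and its relations setting declares question as the parent and answer as the child.
To give one parent several child types, use an array instead: "question": ["answer", "comment"]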

Inserting data

A parent document is inserted as follows:

PUT my_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

 

For a parent document the name key can also be omitted, giving the relation name directly as the field value:

PUT my_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}

 

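Inserting child documents

A child document is inserted as follows. Note that routing must be the parent document's id, so that the child lands on the same shard as its parent; for an ordinary document, routing defaults to the document's own id.
Here name is answer, which marks this as a child document, and parent is the id of its parent document.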

PUT /my_index/_doc/3?routing=1
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

 

Querying child documents with parent_id

Pass the parent document's id to the parent_id query; type is the child relation name.

GET my_index/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "1"
    }
  }
}

 

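Performance and limitations of parent-child documents

Parent-child documents are mainly suited to one-to-many relationships between entities, stored in the index in denormalized form.

Parent-child documents have the following characteristics: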

  • Only one join field mapping is allowed per index.
  • Parent and child documents must be indexed on the same shard. This means that the same routing value as the parent's (by default, the parent's id) needs to be provided when getting, deleting, or updating a child document.
  • An element can have multiple children but only one parent.
  • It is possible to add a new relation to an existing join field.
  • It is also possible to add a child to an existing element but only if the element is already a parent.

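Summary

Elasticsearch implements joins through parent-child documents, but an index can contain only one join field, and all parent-child relations must be declared in it.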

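The relation field

Elasticsearch automatically creates an additional field that stores the relationship, named field#parent (here, my_join_field#question).
It can be read as follows: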

POST my_index/_search
{
 "script_fields": {
    "parent": {
      "script": {
         "source": "doc['my_join_field#question']" 
      }
    }
  }
}

 

Part of the response:

{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "8",
"_score" : 1.0,
"fields" : {
  "parent" : [
    "8"
  ]
}
},
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_routing" : "10",
"fields" : {
  "parent" : [
    "10"
  ]
}
}

 

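In this response, a hit with a _routing field is a child document, and its parent value is the id of its parent document. A hit without _routing is a parent document, and its parent value is its own id.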

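Global ordinals

Parent-child join queries use a technique called global ordinals to speed up lookups. By default, global ordinals are built eagerly so that the first query or aggregation does not incur a long delay. They must be rebuilt after any change to a shard, and the more parent ids a shard holds, the longer the rebuild takes. Because the rebuild happens as part of refresh, it can add significant time to each refresh.
If the join field is used infrequently while writes are frequent, eager loading can be disabled by setting "eager_global_ordinals": false on the join field's mapping.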
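
To show where this setting goes, here is a minimal sketch that builds the mapping body from the first example with eager loading switched off. It only constructs and prints JSON with Python's standard library. build_join_mapping is a hypothetical helper name used for illustration, not part of any Elasticsearch client.

```python
import json


def build_join_mapping(field, relations, eager=True):
    """Return a create-index body containing a single join field.

    `field` and `relations` mirror the my_join_field example above;
    the helper itself is hypothetical and only builds a dict.
    """
    join_field = {"type": "join", "relations": relations}
    if not eager:
        # The setting lives on the join field itself, next to "type".
        join_field["eager_global_ordinals"] = False
    return {"mappings": {"properties": {field: join_field}}}


body = build_join_mapping("my_join_field", {"question": "answer"}, eager=False)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of PUT my_index in place of the mapping shown at the start of this article.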

Monitoring global ordinals

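# Per index (the # in the field name is URL-encoded as %23 so that curl does not treat it as a fragment)
curl -X GET "localhost:9200/_stats/fielddata?human&fields=my_join_field%23question&pretty"
# Per index on each node
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?human&fields=my_join_field%23question&pretty"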

Multi-level relations: one parent, several children, and a grandchild

Consider the following structure:

   question
    /    \
   /      \
comment  answer
           |
           |
          vote

 

Creating the index

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],  
          "answer": "vote" 
        }
      }
    }
  }
}

 

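Inserting a grandchild

Note that routing and parent differ here: routing is the id of the grandparent, that is, the question document (1), while parent is, as the name says, the id of the answer document (2).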

PUT my_index/_doc/3?routing=1&refresh 
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2" 
  }
}
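
The routing rule can be sketched in a few lines of Python: walk up the parent chain until the root document is reached, and use that id as routing. This is only an illustration. It assumes the application keeps its own child-to-parent map (parent_of below), and resolve_routing and child_index_request are hypothetical helper names, not Elasticsearch APIs.

```python
from urllib.parse import urlencode


def resolve_routing(doc_id, parent_of):
    """Follow parent links upward and return the id of the root ancestor."""
    seen = set()
    current = doc_id
    while current in parent_of:
        if current in seen:
            raise ValueError("cycle detected in parent chain")
        seen.add(current)
        current = parent_of[current]
    return current


def child_index_request(index, doc_id, relation, parent_id, parent_of, text):
    """Build the request path and body for indexing a child or grandchild."""
    # Routing is the root ancestor's id; "parent" is the direct parent's id.
    routing = resolve_routing(parent_id, parent_of)
    path = f"/{index}/_doc/{doc_id}?" + urlencode({"routing": routing})
    body = {"text": text, "my_join_field": {"name": relation, "parent": parent_id}}
    return path, body


# Assumed bookkeeping: answer 2 belongs to question 1.
parent_of = {"2": "1"}
path, body = child_index_request(
    "my_index", "3", "vote", "2", parent_of, "This is a vote"
)
print("PUT", path)
print(body)
```

For the vote above, the direct parent is answer 2, but the computed routing is 1, the id of the question at the top of the chain.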

 

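The has_child query

This query returns parent documents that have child documents matching a given query. It is expensive, so use it sparingly. Its general form is: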

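GET my_index/_search
{
    "query": {
        "has_child" : {
            "type" : "child", // name of the child relation, e.g. "answer"
            "query" : {
                "match_all" : {}
            },
            "max_children": 10, // optional: a parent matches only if at most this many children match the query
            "min_children": 2, // optional: a parent matches only if at least this many children match the query
            "score_mode" : "min"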
        }
    }
}
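
As a concrete instance for the question/answer mapping used in this article, the sketch below assembles a has_child body in Python. It only builds a dict and does not contact a cluster. has_child_query is a hypothetical helper name, and the match query on text is just an example condition.

```python
import json

# Values accepted by score_mode in the has_child query.
SCORE_MODES = {"none", "avg", "max", "min", "sum"}


def has_child_query(child_type, query, min_children=None,
                    max_children=None, score_mode="none"):
    """Build a search body that returns parents with matching children."""
    if score_mode not in SCORE_MODES:
        raise ValueError(f"unsupported score_mode: {score_mode}")
    if (min_children is not None and max_children is not None
            and min_children > max_children):
        raise ValueError("min_children must not exceed max_children")
    clause = {"type": child_type, "query": query, "score_mode": score_mode}
    # Only include the optional limits when the caller sets them.
    if min_children is not None:
        clause["min_children"] = min_children
    if max_children is not None:
        clause["max_children"] = max_children
    return {"query": {"has_child": clause}}


body = has_child_query("answer", {"match": {"text": "answer"}}, min_children=1)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET my_index/_search to find the questions that have at least one answer whose text matches.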

 

Test code

Part of the test code:

DELETE my_index

PUT /my_index?pretty
{
  "mappings": {
    "properties": {
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}


# Insert parent documents
PUT /my_index/_doc/8?refresh&pretty
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question" 
  }
}

PUT /my_index/_doc/10?refresh&pretty
{
  "text": "This is a new question",
  "my_join_field": {
    "name": "question"
  }
}

PUT /my_index/_doc/12?refresh&pretty
{
  "text": "This is a new question",
  "my_join_field": {
    "name": "question"
  }
}

# Insert child documents
PUT /my_index/_doc/3?routing=8&refresh&pretty
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "8" 
  }
}


PUT /my_index/_doc/4?routing=10&refresh&pretty
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "10"
  }
}

# Query child documents with parent_id
GET my_index/_search
{
  "query": {
    "parent_id": { 
      "type": "answer",
      "id": "8"
    }
  }
}

# Read the relation field
POST my_index/_search
{
 "script_fields": {
    "parent": {
      "script": {
         "source": "doc['my_join_field#question']" 
      }
    }
  }
}
