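Data Relationships
- The real world is full of important relationships
  - Blogs / authors / comments
  - A bank account has many transaction records
  - A customer has many bank accounts
  - A directory contains many files and subdirectories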
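Normalized Design in Relational Databases
Normal forms |
---|
1NF – every attribute holds atomic (indivisible) values |
2NF – eliminates partial functional dependencies of non-prime attributes on a key |
3NF – eliminates transitive functional dependencies of non-prime attributes on a key |
BCNF – eliminates partial and transitive functional dependencies of prime attributes on a key |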
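- The main goal of normalization is to reduce unnecessary updates: each fact is stored once, so a change is made in one place
- Side effect: a fully normalized database often suffers from slow queries
  - The more normalized the database, the more tables a query has to join
- Normalization saves storage space, but storage keeps getting cheaper
- Normalization simplifies updates, but reads may require more operations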
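To make the read-side cost concrete, here is a minimal Python sketch. It assumes two hypothetical in-memory dictionaries, `users` and `blogs`, standing in for normalized tables; the names and data are illustrative only. Reading a blog together with its author takes a second lookup, which plays the role of the join.

```python
# Hypothetical normalized "tables": the author's details are stored once.
users = {1: {"username": "Jack", "city": "Shanghai"}}
blogs = {
    1: {"content": "I like Elasticsearch", "userid": 1},
    2: {"content": "I like Lucene", "userid": 1},
}


def read_blog_with_author(blog_id):
    """Return a blog merged with its author; the second lookup is the join."""
    blog = blogs[blog_id]
    author = users[blog["userid"]]  # follow the foreign key into users
    return {"content": blog["content"], **author}


joined = read_blog_with_author(2)
print(joined)
```

Every extra table in a normalized schema adds one more lookup of this kind to each read.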
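Denormalization
- Denormalized design
  - "Flatten" the data: instead of using relationships, keep redundant copies of the data in each document
- Advantage: there are no joins to handle, so reads perform well
  - Elasticsearch compresses the _source field to reduce the disk space overhead
- Disadvantage: not suitable when data is modified frequently
  - A change to one piece of data (such as a username) may trigger updates to many documents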
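The update fan-out described above can be sketched in a few lines of Python. The `blog_docs` list and the `rename_user` helper are hypothetical and only model documents that embed a copy of the author; they do not call Elasticsearch.

```python
# Hypothetical denormalized blog documents: author data is copied into each one.
blog_docs = [
    {"id": 1, "content": "I like Elasticsearch", "user": {"userid": 1, "username": "Jack"}},
    {"id": 2, "content": "I like Lucene", "user": {"userid": 1, "username": "Jack"}},
    {"id": 3, "content": "Hello", "user": {"userid": 2, "username": "Rose"}},
]


def rename_user(docs, userid, new_name):
    """Rewrite every document that embeds the user; return how many changed."""
    changed = 0
    for doc in docs:
        if doc["user"]["userid"] == userid:
            doc["user"]["username"] = new_name
            changed += 1
    return changed


changed = rename_user(blog_docs, 1, "Jacky")
print(changed)
```

One logical change rewrites as many documents as the user has blogs, which is why this layout suits data that is read far more often than it is modified.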
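Handling Relationships in Elasticsearch
- Relational databases usually normalize data; in Elasticsearch, you usually denormalize it
  - Benefits of denormalizing: faster reads / no table joins / no row locks
- Elasticsearch is not good at handling relationships. There are generally four ways to handle them:
  - Object type
  - Nested object (Nested Object)
  - Parent/child relationship (Parent / Child)
  - Application-side joins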
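Case 1: Blogs and Their Author Information
- Object type
  - Keep the author's information in every blog document
  - If the author's information changes, the related blog documents must be updated
# Index a blog document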
PUT blog/_doc/1
{
"content":"I like Elasticsearch",
"time":"2019-01-01T00:00:00",
"user":{
"userid":1,
"username":"Jack",
"city":"Shanghai"
}
}
- A single query returns both the blog and its author information
# Query the blog
POST blog/_search
{
"query": {
"bool": {
"must": [
{"match": {"content": "Elasticsearch"}},
{"match": {"user.username": "Jack"}}
]
}
}
}
res:
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "blog",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"content" : "I like Elasticsearch",
"time" : "2019-01-01T00:00:00",
"user" : {
"userid" : 1,
"username" : "Jack",
"city" : "Shanghai"
}
}
}
]
Case 2: Documents Containing an Array of Objects
DELETE my_movies
# Mapping for the movies index
PUT my_movies
{
"mappings" : {
"properties" : {
"actors" : {
"properties" : {
"first_name" : {
"type" : "keyword"
},
"last_name" : {
"type" : "keyword"
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
# Index a movie document
POST my_movies/_doc/1
{
"title":"Speed",
"actors":[
{
"first_name":"Keanu",
"last_name":"Reeves"
},
{
"first_name":"Dennis",
"last_name":"Hopper"
}
]
}
# Query the movie
POST my_movies/_search
{
"query": {
"bool": {
"must": [
{"match": {"actors.first_name": "Keanu"}},
{"match": {"actors.last_name": "Hopper"}}
]
}
}
}
res:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.723315,
"hits" : [
{
"_index" : "my_movies",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.723315,
"_source" : {
"title" : "Speed",
"actors" : [
{
"first_name" : "Keanu",
"last_name" : "Reeves"
},
{
"first_name" : "Dennis",
"last_name" : "Hopper"
}
]
}
}
]
}
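Why does the search return an unwanted result?
- When the document is stored, the boundaries between the inner objects are not preserved; the JSON is flattened into key-value pairs, for example "actors.first_name": ["Keanu", "Dennis"] and "actors.last_name": ["Reeves", "Hopper"]
- A query on several of these fields can therefore match values that come from different objects, which produces the unexpected hit
- The Nested Data Type solves this problem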
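The flattening can be imitated with a short Python sketch. The `flatten_object_array` helper is hypothetical and only models the behaviour described above; it is not how Elasticsearch is implemented.

```python
from collections import defaultdict


def flatten_object_array(doc, field):
    """Collapse an array of objects into dotted key -> set of values."""
    flat = defaultdict(set)
    for obj in doc[field]:
        for key, value in obj.items():
            flat[f"{field}.{key}"].add(value)
    return dict(flat)


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat = flatten_object_array(movie, "actors")
# Both conditions hold on the flattened view, although no single actor
# is called "Keanu Hopper".
matches = "Keanu" in flat["actors.first_name"] and "Hopper" in flat["actors.last_name"]
print(flat, matches)
```

Once the values sit in per-field sets, nothing records which first name belonged to which last name, so the combined condition is satisfied.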
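What Is the Nested Data Type
- The nested data type allows each object in an array of objects to be indexed independently
- With the nested and properties keywords, each entry in actors is indexed as a separate document
- Internally, each nested object is stored as its own hidden Lucene document next to the root document, and the documents are joined at query time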
DELETE my_movies
# Create the mapping with a nested field
PUT my_movies
{
"mappings" : {
"properties" : {
"actors" : {
"type": "nested",
"properties" : {
"first_name" : {"type" : "keyword"},
"last_name" : {"type" : "keyword"}
}},
"title" : {
"type" : "text",
"fields" : {"keyword":{"type":"keyword","ignore_above":256}}
}
}
}
}
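Nested Query
- Because each nested object is a separate hidden Lucene document, the nested query evaluates its conditions against one object at a time and joins the result back to the root document
# Nested query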
POST my_movies/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "Speed"}},
{
"nested": {
"path": "actors",
"query": {
"bool": {
"must": [
{"match": {
"actors.first_name": "Keanu"
}},
{"match": {
"actors.last_name": "Hopper"
}}
]
}
}
}
}
]
}
}
}
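As a rough model of what the nested query checks, the Python sketch below requires all conditions to hold on the same object. The `nested_match` helper is hypothetical, and it uses bare keys such as `first_name` instead of the full `actors.first_name` paths for brevity.

```python
def nested_match(doc, path, conditions):
    """Return True if at least one object under `path` meets every condition."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in doc[path]
    )


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

# Values taken from two different actors no longer match.
mixed = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"})
# Values taken from one actor do match.
real = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Reeves"})
print(mixed, real)
```

Under this model the Keanu / Hopper combination finds no document, while Keanu / Reeves still finds the movie.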
Nested Aggregation
# A regular terms aggregation does not work on a nested field (it returns no buckets)
POST my_movies/_search
{
"size": 0,
"aggs": {
"NAME": {
"terms": {
"field": "actors.first_name",
"size": 10
}
}
}
}
# Nested Aggregation
POST my_movies/_search
{
"size": 0,
"aggs": {
"actors": {
"nested": {
"path": "actors"
},
"aggs": {
"actor_name": {
"terms": {
"field": "actors.first_name",
"size": 10
}
}
}
}
}
}
res:
"aggregations" : {
"actors" : {
"doc_count" : 2,
"actor_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Dennis",
"doc_count" : 1
},
{
"key" : "Keanu",
"doc_count" : 1
}
]
}
}
}
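The shape of this response can be reproduced with a small Python sketch that counts values across the nested objects. The `nested_terms` helper is hypothetical and only mirrors the two numbers seen above: the count of nested objects and the per-value buckets.

```python
from collections import Counter


def nested_terms(docs, path, field):
    """Count `field` values over every nested object found under `path`."""
    nested_doc_count = 0
    buckets = Counter()
    for doc in docs:
        for obj in doc.get(path, []):
            nested_doc_count += 1  # one per nested object, not per root document
            buckets[obj[field]] += 1
    return nested_doc_count, dict(buckets)


movies = [
    {
        "title": "Speed",
        "actors": [
            {"first_name": "Keanu", "last_name": "Reeves"},
            {"first_name": "Dennis", "last_name": "Hopper"},
        ],
    }
]

doc_count, buckets = nested_terms(movies, "actors", "first_name")
print(doc_count, buckets)
```

The outer count is 2 because it counts nested actor objects, even though the index holds a single movie.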
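Key Takeaways
- In Elasticsearch, data is usually modeled by denormalizing it (using objects)
  - Benefits: faster reads / no table joins / no row locks
- If documents are not updated frequently, objects can be embedded in the document
- When an object field holds multiple values (an array of objects)
  - Use nested objects (Nested Object) to keep query results correct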
Course Demos
DELETE blog
# Set the mapping for the blog index
PUT /blog
{
"mappings": {
"properties": {
"content": {
"type": "text"
},
"time": {
"type": "date"
},
"user": {
"properties": {
"city": {
"type": "text"
},
"userid": {
"type": "long"
},
"username": {
"type": "keyword"
}
}
}
}
}
}
# Index a blog document
PUT blog/_doc/1
{
"content":"I like Elasticsearch",
"time":"2019-01-01T00:00:00",
"user":{
"userid":1,
"username":"Jack",
"city":"Shanghai"
}
}
# Query the blog
POST blog/_search
{
"query": {
"bool": {
"must": [
{"match": {"content": "Elasticsearch"}},
{"match": {"user.username": "Jack"}}
]
}
}
}
DELETE my_movies
# Mapping for the movies index
PUT my_movies
{
"mappings" : {
"properties" : {
"actors" : {
"properties" : {
"first_name" : {
"type" : "keyword"
},
"last_name" : {
"type" : "keyword"
}
}
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
# Index a movie document
POST my_movies/_doc/1
{
"title":"Speed",
"actors":[
{
"first_name":"Keanu",
"last_name":"Reeves"
},
{
"first_name":"Dennis",
"last_name":"Hopper"
}
]
}
# Query the movie
POST my_movies/_search
{
"query": {
"bool": {
"must": [
{"match": {"actors.first_name": "Keanu"}},
{"match": {"actors.last_name": "Hopper"}}
]
}
}
}
DELETE my_movies
# Create the mapping with a nested field
PUT my_movies
{
"mappings" : {
"properties" : {
"actors" : {
"type": "nested",
"properties" : {
"first_name" : {"type" : "keyword"},
"last_name" : {"type" : "keyword"}
}},
"title" : {
"type" : "text",
"fields" : {"keyword":{"type":"keyword","ignore_above":256}}
}
}
}
}
POST my_movies/_doc/1
{
"title":"Speed",
"actors":[
{
"first_name":"Keanu",
"last_name":"Reeves"
},
{
"first_name":"Dennis",
"last_name":"Hopper"
}
]
}
# Nested query
POST my_movies/_search
{
"query": {
"bool": {
"must": [
{"match": {"title": "Speed"}},
{
"nested": {
"path": "actors",
"query": {
"bool": {
"must": [
{"match": {
"actors.first_name": "Keanu"
}},
{"match": {
"actors.last_name": "Hopper"
}}
]
}
}
}
}
]
}
}
}
# Nested Aggregation
POST my_movies/_search
{
"size": 0,
"aggs": {
"actors": {
"nested": {
"path": "actors"
},
"aggs": {
"actor_name": {
"terms": {
"field": "actors.first_name",
"size": 10
}
}
}
}
}
}
# A regular terms aggregation does not work on a nested field (it returns no buckets)
POST my_movies/_search
{
"size": 0,
"aggs": {
"NAME": {
"terms": {
"field": "actors.first_name",
"size": 10
}
}
}
}
Further Reading
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-nested-query.html