Mongodb-sharding

Practical MongoDB Aggregations Book

概述

分片(sharding)是一种跨多台机器分布数据的方法, MongoDB使用分片来支持具有非常大的数据集和高吞吐量操作的部署。

MongoDB分片群集包含以下组件:

  • 分片(存储):每个分片包含分片数据的子集。 每个分片都可以部署为副本集
  • mongos(路由):mongos充当查询路由器,在客户端应用程序和分片集群之间提供接口
  • config servers(“调度”的配置):配置服务器存储群集的元数据和配置设置。
    从MongoDB 3.4开始,必须将配置服务器部署为副本集(CSRS)。
  • 分片键:分片键可以是单个索引字段,也可以是复合索引所涵盖的多个字段,该复合索引确定集合文档在集群分片之间的分布。
  • 对冲读取:从版本4.4开始,实例可以对冲使用非读取首选项的读取。使用对冲读取实例将读取操作路由到每个查询分片的两个副本集成员,并返回每个分片的第一个响应者的结果。

官方给出的分片集群中组件交互:
Mongodb-sharding_第1张图片
而我看来,从数据和控制的角度,下面的拓扑更为恰当:
Mongodb-sharding_第2张图片

LAB

Sharding的配置参考可以查看官方文档:https://docs.mongodb.com/manual/tutorial/deploy-shard-cluster/

拓扑图Mongodb-sharding_第3张图片

环境准备

所属副本集 主机:Port 存储文件夹
srs01 db1:27017 /mongodb/sharded_cluster/srs01_27017
srs01 db2:27017 /mongodb/sharded_cluster/srs01_27017
srs01 db3:27017 /mongodb/sharded_cluster/srs01_27017
srs02 db1:27018 /mongodb/sharded_cluster/srs02_27018
srs02 db2:27018 /mongodb/sharded_cluster/srs02_27018
srs02 db3:27018 /mongodb/sharded_cluster/srs02_27018
confrs db1:27019 /mongodb/sharded_cluster/confrs_27019
confrs db2:27019 /mongodb/sharded_cluster/confrs_27019
confrs db3:27019 /mongodb/sharded_cluster/confrs_27019

1.创建分片副本集

副本集的具体配置参看:Mongodb-ReplicaSet Install

创建副本集srs01和srs02,如db1,同样的命令用于db2 和 db3

[zyi@db1 ~]$ sudo mongod --port 27017 --dbpath /mongodb/sharded_cluster/srs01_27017/data/db --bind_ip 0.0.0.0 --shardsvr --replSet srs01  --logpath /mongodb/sharded_cluster/srs01_27017/log/srs01.log --fork
about to fork child process, waiting until server is ready for connections.
forked process: 27400
child process started successfully, parent exiting

和正常的副本集的区别在于使用参数:–shardsvr 设置了sharding. clusterRole: shardsvr,如果用conf文件:

sharding:
    clusterRole: shardsvr
replication:
    replSetName: 
net:
    bindIp: localhost,

2.创建配置服务器副本集

创建副本集confrs,如db1,同样的命令用于db2 和 db3

[zyi@db1 ~]$ sudo mongod --port 27019 --dbpath /mongodb/sharded_cluster/confrs_27019/data/db --bind_ip 0.0.0.0 --configsvr --replSet confrs --logpath /mongodb/sharded_cluster/confrs_27019/log/confrs.log --fork
about to fork child process, waiting until server is ready for connections.
forked process: 27790
child process started successfully, parent exiting

和正常的副本集的区别在于使用参数:–configsvr 设置了sharding. clusterRole: configsvr,如果用conf文件:

sharding:
  clusterRole: configsvr
replication:
  replSetName: 
net:
  bindIp: localhost,

3.创建Mongos Router

在另外一台服务器上面启动两个mongos服务,分别使用27018和27019端口。
先启动一个:

[zyi@mongos ~]$ sudo /usr/local/bin/mongos --port 27018 --configdb config/db1:27019,db2:27019,db3:27019 --bind_ip 0.0.0.0 --fork --logpath /mongodb/mongos/mongos.log
about to fork child process, waiting until server is ready for connections.
forked process: 5588
child process started successfully, parent exiting

在启动的时候指定了配置副本集的位置:

--configdb config/db1:27019,db2:27019,db3:27019 

也可以用conf文件的方式:

sharding:
  configDB: /cfg1.example.net:27019,cfg2.example.net:27019
net:
  bindIp: localhost,

再增加一台router

[zyi@centos ~]$ sudo /usr/local/bin/mongos --port 27019 --configdb config/db1:27019,db2:27019,db3:27019 --bind_ip 0.0.0.0 --fork --logpath /mongodb/mongos/mongos.log
about to fork child process, waiting until server is ready for connections.
forked process: 7490
child process started successfully, parent exiting

此时查看sharding状态:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("61f7c76011fc11df29b8cf4b")
  }
  shards:
        {  "_id" : "srs01",  "host" : "srs01/db1:27017,db2:27017,db3:27017",  "state" : 1,  "topologyTime" : Timestamp(1643628963, 2) }
        {  "_id" : "srs02",  "host" : "srs02/db1:27018,db2:27018,db3:27018",  "state" : 1,  "topologyTime" : Timestamp(1643628995, 1) }
  active mongoses:
        "5.0.5" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours: 
                512 : Success
...

可用的router已经变成了2

active mongoses:
        "5.0.5" : 2

4.访问分片

和对单机mongodb的访问方式一致,对sharding的访问可以用mongo shell方式:

[zyi@centos ~]$ sudo /usr/local/bin/mongo --port 27018
MongoDB shell version v5.0.5
connecting to: mongodb://127.0.0.1:27018/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("b5aec9bd-145f-4348-bff2-d07aed46f13a") }
MongoDB server version: 5.0.5
================
Warning: the "mongo" shell has been superseded by "mongosh",
which delivers improved usability and compatibility.The "mongo" shell has been deprecated and will be removed in
an upcoming release.
For installation instructions, see
https://docs.mongodb.com/mongodb-shell/install/
================
---
The server generated these startup warnings when booting: 
        2022-01-31T11:32:54.563+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
        2022-01-31T11:32:54.563+00:00: You are running this process as the root user, which is not recommended
---
mongos> 

如使用图形化终端如compass,访问mongodb://ip of mongos:port即可

增加分片副本集

进入mongos后使用命令添加sharding副本集
使用命令:sh.addShard( “/s1-mongo1.example.net:27018,s1-mongo2.example.net:27018,s1-mongo3.example.net:27018”)

mongos> show dbs
admin   0.000GB
config  0.000GB
mongos> sh.addShard('srs01/db1:27017,db2:27017,db3:27017')
{
        "shardAdded" : "srs01",
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1643628964, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1643628964, 1)
}

添加完成以后可以查看分片状态

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("61f7c76011fc11df29b8cf4b")
  }
  shards:
        {  "_id" : "srs01",  "host" : "srs01/db1:27017,db2:27017,db3:27017",  "state" : 1,  "topologyTime" : Timestamp(1643628963, 2) }
        {  "_id" : "srs02",  "host" : "srs02/db1:27018,db2:27018,db3:27018",  "state" : 1,  "topologyTime" : Timestamp(1643628995, 1) }
  active mongoses:
        "5.0.5" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }

已经加入了shards

shards:
        {  "_id" : "srs01",  "host" : "srs01/db1:27017,db2:27017,db3:27017",  "state" : 1,  "topologyTime" : Timestamp(1643628963, 2) }
        {  "_id" : "srs02",  "host" : "srs02/db1:27018,db2:27018,db3:27018",  "state" : 1,  "topologyTime" : Timestamp(1643628995, 1) }

分片的数据操作

分片键
分片键可以是单个索引字段,也可以是复合索引所涵盖的多个字段,复合索引确定集合文档在集群分片之间的分布。

MongoDB将分片键值(或散列分片键值)的范围划分为分片键值(或散列分片键值)的非重叠范围。每个范围都与一个区块相关联,MongoDB尝试在集群中的分片之间均匀分布区块。

分片规则
对集合进行分片时,你需要选择一个分片键(Shard Key) , shard key 是每条记录都必须包含的,且建立了
索引的单个字段或复合字段,MongoDB按照片键将数据划分到不同的 数据块 中,并将 数据块均衡地分布到所有分片中.为了按照片键划分数据块,MongoDB使用 基于哈希的分片方式(随机平均分配)或者基于范围的分片方式(数值大小分配) 。

  1. 哈希策略
    对于 基于哈希的分片 ,MongoDB计算一个字段的哈希值,并用这个哈希值来创建数据块.
    在使用基于哈希分片的系统中,拥有”相近”片键的文档 很可能不会 存储在同一个数据块中,因此数据的分离性更好一些。
  2. 范围策略
    对 基于范围的分片 ,MongoDB按照片键的范围把数据分成不同部分。
    假设有一个数字的片键:想象一个从负无穷到正无穷的直线,每一个片键的值都在直线上画了一个点.MongoDB把这条直线划分为更短的不重叠的片段,并称之为 数据块 ,每个数据块包含了片键在一定范围内的数据。
    在使用片键做范围划分的系统中,拥有”相近”片键的文档很可能存储在同一个数据块中,因此也会存储在同一个分片中。

我们测试使用哈希策略:
需要使用分片的数据分布,需要先指定数据库和集合的分片键方法

mongos> sh.enableSharding('etaondb')
{
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1643629251, 29),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1643629251, 28)
}
mongos> sh.shardCollection('etaondb.comment',{'nickname':'hashed'})
{
        "collectionsharded" : "etaondb.comment",
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1643629362, 42),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1643629362, 38)
}

此时查看分片状态

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("61f7c76011fc11df29b8cf4b")
  }
  shards:
        {  "_id" : "srs01",  "host" : "srs01/db1:27017,db2:27017,db3:27017",  "state" : 1,  "topologyTime" : Timestamp(1643628963, 2) }
        {  "_id" : "srs02",  "host" : "srs02/db1:27018,db2:27018,db3:27018",  "state" : 1,  "topologyTime" : Timestamp(1643628995, 1) }
  active mongoses:
        "5.0.5" : 2
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled: yes
        Currently running: no
        Failed balancer rounds in last 5 attempts: 0
        Migration results for the last 24 hours: 
                512 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                srs01   512
                                srs02   512
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "etaondb",  "primary" : "srs02",  "partitioned" : true,  "version" : {  "uuid" : UUID("4bd8a393-94dd-442c-8b4d-8327dd5bb7b0"),  "timestamp" : Timestamp(1643629251, 26),  "lastMod" : 1 } }
                etaondb.comment
                        shard key: { "nickname" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                srs01   2
                                srs02   2
                        { "nickname" : { "$minKey" : 1 } } -->> { "nickname" : NumberLong("-4611686018427387902") } on : srs01 Timestamp(1, 0) 
                        { "nickname" : NumberLong("-4611686018427387902") } -->> { "nickname" : NumberLong(0) } on : srs01 Timestamp(1, 1) 
                        { "nickname" : NumberLong(0) } -->> { "nickname" : NumberLong("4611686018427387902") } on : srs02 Timestamp(1, 2) 
                        { "nickname" : NumberLong("4611686018427387902") } -->> { "nickname" : { "$maxKey" : 1 } } on : srs02 Timestamp(1, 3) 
mongos> 

在最后的部分,已经可以看到数据库etaondb的集合comment使用哈希策略。

增加数据
使用*for(var i=1;i<=1000;i++) {db.comment.insert({_id:i+"",nickname:“BoBo”+i})}*增加1000个数据

mongos> for(var i=1;i<=1000;i++) {db.comment.insert({_id:i+"",nickname:"BoBo"+i})}
WriteResult({ "nInserted" : 1 })
mongos> db.comment.count()
1000

在查看分片存储数据情况:

srs01:PRIMARY> use etaondb
switched to db etaondb
srs01:PRIMARY> db.comment.count()
507
......
srs02:PRIMARY> use etaondb
switched to db etaondb
srs02:PRIMARY> db.comment.count()
493

两个分片的数据数分别是507和493,基本上平衡分配。

你可能感兴趣的:(mongodb,数据库)