Practical MongoDB Aggregations Book
分片(sharding)是一种跨多台机器分布数据的方法, MongoDB使用分片来支持具有非常大的数据集和高吞吐量操作的部署。
MongoDB分片群集包含以下组件:
官方给出的分片集群中组件交互:
而我看来,从数据和控制的角度,下面的拓扑更为恰当:
Sharding的配置参考可以查看官方文档:https://docs.mongodb.com/manual/tutorial/deploy-shard-cluster/
所属副本集 | 主机:Port | 存储文件夹 |
---|---|---|
srs01 | db1:27017 | /mongodb/sharded_cluster/srs01_27017 |
srs01 | db2:27017 | /mongodb/sharded_cluster/srs01_27017 |
srs01 | db3:27017 | /mongodb/sharded_cluster/srs01_27017 |
srs02 | db1:27018 | /mongodb/sharded_cluster/srs02_27018 |
srs02 | db2:27018 | /mongodb/sharded_cluster/srs02_27018 |
srs02 | db3:27018 | /mongodb/sharded_cluster/srs02_27018 |
confrs | db1:27019 | /mongodb/sharded_cluster/confrs_27019 |
confrs | db2:27019 | /mongodb/sharded_cluster/confrs_27019 |
confrs | db3:27019 | /mongodb/sharded_cluster/confrs_27019 |
副本集的具体配置参看:Mongodb-ReplicaSet Install
创建副本集srs01和srs02,如db1,同样的命令用于db2 和 db3
[zyi@db1 ~]$ sudo mongod --port 27017 --dbpath /mongodb/sharded_cluster/srs01_27017/data/db --bind_ip 0.0.0.0 --shardsvr --replSet srs01 --logpath /mongodb/sharded_cluster/srs01_27017/log/srs01.log --fork
about to fork child process, waiting until server is ready for connections.
forked process: 27400
child process started successfully, parent exiting
和正常的副本集的区别在于使用参数:–shardsvr 设置了sharding. clusterRole: shardsvr,如果用conf文件:
sharding:
clusterRole: shardsvr
replication:
replSetName:
net:
bindIp: localhost,
创建副本集confrs,如db1,同样的命令用于db2 和 db3
[zyi@db1 ~]$ sudo mongod --port 27019 --dbpath /mongodb/sharded_cluster/confrs_27019/data/db --bind_ip 0.0.0.0 --configsvr --replSet confrs --logpath /mongodb/sharded_cluster/confrs_27019/log/confrs.log --fork
about to fork child process, waiting until server is ready for connections.
forked process: 27790
child process started successfully, parent exiting
和正常的副本集的区别在于使用参数:–configsvr 设置了sharding. clusterRole: configsvr,如果用conf文件:
sharding:
clusterRole: configsvr
replication:
replSetName:
net:
bindIp: localhost,
在另外一台服务器上面启动两个mongos服务,分别使用27018和27019端口。
先启动一个:
[zyi@mongos ~]$ sudo /usr/local/bin/mongos --port 27018 --configdb config/db1:27019,db2:27019,db3:27019 --bind_ip 0.0.0.0 --fork --logpath /mongodb/mongos/mongos.log
about to fork child process, waiting until server is ready for connections.
forked process: 5588
child process started successfully, parent exiting
在启动的时候指定了配置副本集的位置:
--configdb config/db1:27019,db2:27019,db3:27019
也可以用conf文件的方式:
sharding:
configDB: /cfg1.example.net:27019,cfg2.example.net:27019
net:
bindIp: localhost,
再增加一台router
[zyi@centos ~]$ sudo /usr/local/bin/mongos --port 27019 --configdb config/db1:27019,db2:27019,db3:27019 --bind_ip 0.0.0.0 --fork --logpath /mongodb/mongos/mongos.log
about to fork child process, waiting until server is ready for connections.
forked process: 7490
child process started successfully, parent exiting
此时查看sharding状态:
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("61f7c76011fc11df29b8cf4b")
}
shards:
{ "_id" : "srs01", "host" : "srs01/db1:27017,db2:27017,db3:27017", "state" : 1, "topologyTime" : Timestamp(1643628963, 2) }
{ "_id" : "srs02", "host" : "srs02/db1:27018,db2:27018,db3:27018", "state" : 1, "topologyTime" : Timestamp(1643628995, 1) }
active mongoses:
"5.0.5" : 2
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration results for the last 24 hours:
512 : Success
...
可用的router已经变成了2
active mongoses:
"5.0.5" : 2
和对单机mongodb的访问方式一致,对sharding的访问可以用mongo shell方式:
[zyi@centos ~]$ sudo /usr/local/bin/mongo --port 27018
MongoDB shell version v5.0.5
connecting to: mongodb://127.0.0.1:27018/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("b5aec9bd-145f-4348-bff2-d07aed46f13a") }
MongoDB server version: 5.0.5
================
Warning: the "mongo" shell has been superseded by "mongosh",
which delivers improved usability and compatibility.The "mongo" shell has been deprecated and will be removed in
an upcoming release.
For installation instructions, see
https://docs.mongodb.com/mongodb-shell/install/
================
---
The server generated these startup warnings when booting:
2022-01-31T11:32:54.563+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
2022-01-31T11:32:54.563+00:00: You are running this process as the root user, which is not recommended
---
mongos>
如使用图形化终端如compass,访问mongodb://ip of mongos:port即可
进入mongos后使用命令添加sharding副本集
使用命令:sh.addShard( “/s1-mongo1.example.net:27018,s1-mongo2.example.net:27018,s1-mongo3.example.net:27018”)
mongos> show dbs
admin 0.000GB
config 0.000GB
mongos> sh.addShard('srs01/db1:27017,db2:27017,db3:27017')
{
"shardAdded" : "srs01",
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1643628964, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1643628964, 1)
}
添加完成以后可以查看分片状态
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("61f7c76011fc11df29b8cf4b")
}
shards:
{ "_id" : "srs01", "host" : "srs01/db1:27017,db2:27017,db3:27017", "state" : 1, "topologyTime" : Timestamp(1643628963, 2) }
{ "_id" : "srs02", "host" : "srs02/db1:27018,db2:27018,db3:27018", "state" : 1, "topologyTime" : Timestamp(1643628995, 1) }
active mongoses:
"5.0.5" : 1
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
已经加入了shards
shards:
{ "_id" : "srs01", "host" : "srs01/db1:27017,db2:27017,db3:27017", "state" : 1, "topologyTime" : Timestamp(1643628963, 2) }
{ "_id" : "srs02", "host" : "srs02/db1:27018,db2:27018,db3:27018", "state" : 1, "topologyTime" : Timestamp(1643628995, 1) }
分片键
分片键可以是单个索引字段,也可以是复合索引所涵盖的多个字段,复合索引确定集合文档在集群分片之间的分布。
MongoDB将分片键值(或散列分片键值)的范围划分为分片键值(或散列分片键值)的非重叠范围。每个范围都与一个区块相关联,MongoDB尝试在集群中的分片之间均匀分布区块。
分片规则
对集合进行分片时,你需要选择一个分片键(Shard Key) , shard key 是每条记录都必须包含的,且建立了
索引的单个字段或复合字段,MongoDB按照片键将数据划分到不同的 数据块 中,并将 数据块均衡地分布到所有分片中.为了按照片键划分数据块,MongoDB使用 基于哈希的分片方式(随机平均分配)或者基于范围的分片方式(数值大小分配) 。
我们测试使用哈希策略:
需要使用分片的数据分布,需要先指定数据库和集合的分片键方法
mongos> sh.enableSharding('etaondb')
{
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1643629251, 29),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1643629251, 28)
}
mongos> sh.shardCollection('etaondb.comment',{'nickname':'hashed'})
{
"collectionsharded" : "etaondb.comment",
"ok" : 1,
"$clusterTime" : {
"clusterTime" : Timestamp(1643629362, 42),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
},
"operationTime" : Timestamp(1643629362, 38)
}
此时查看分片状态
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("61f7c76011fc11df29b8cf4b")
}
shards:
{ "_id" : "srs01", "host" : "srs01/db1:27017,db2:27017,db3:27017", "state" : 1, "topologyTime" : Timestamp(1643628963, 2) }
{ "_id" : "srs02", "host" : "srs02/db1:27018,db2:27018,db3:27018", "state" : 1, "topologyTime" : Timestamp(1643628995, 1) }
active mongoses:
"5.0.5" : 2
autosplit:
Currently enabled: yes
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration results for the last 24 hours:
512 : Success
databases:
{ "_id" : "config", "primary" : "config", "partitioned" : true }
config.system.sessions
shard key: { "_id" : 1 }
unique: false
balancing: true
chunks:
srs01 512
srs02 512
too many chunks to print, use verbose if you want to force print
{ "_id" : "etaondb", "primary" : "srs02", "partitioned" : true, "version" : { "uuid" : UUID("4bd8a393-94dd-442c-8b4d-8327dd5bb7b0"), "timestamp" : Timestamp(1643629251, 26), "lastMod" : 1 } }
etaondb.comment
shard key: { "nickname" : "hashed" }
unique: false
balancing: true
chunks:
srs01 2
srs02 2
{ "nickname" : { "$minKey" : 1 } } -->> { "nickname" : NumberLong("-4611686018427387902") } on : srs01 Timestamp(1, 0)
{ "nickname" : NumberLong("-4611686018427387902") } -->> { "nickname" : NumberLong(0) } on : srs01 Timestamp(1, 1)
{ "nickname" : NumberLong(0) } -->> { "nickname" : NumberLong("4611686018427387902") } on : srs02 Timestamp(1, 2)
{ "nickname" : NumberLong("4611686018427387902") } -->> { "nickname" : { "$maxKey" : 1 } } on : srs02 Timestamp(1, 3)
mongos>
在最后的部分,已经可以看到数据库etaondb的集合comment使用哈希策略。
增加数据
使用*for(var i=1;i<=1000;i++) {db.comment.insert({_id:i+"",nickname:“BoBo”+i})}*增加1000个数据
mongos> for(var i=1;i<=1000;i++) {db.comment.insert({_id:i+"",nickname:"BoBo"+i})}
WriteResult({ "nInserted" : 1 })
mongos> db.comment.count()
1000
在查看分片存储数据情况:
srs01:PRIMARY> use etaondb
switched to db etaondb
srs01:PRIMARY> db.comment.count()
507
......
srs02:PRIMARY> use etaondb
switched to db etaondb
srs02:PRIMARY> db.comment.count()
493
两个分片的数据数分别是507和493,基本上平衡分配。