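Sharding is how MongoDB scales out: when business data and system load keep growing, sharding is the way to cope. Sharding means splitting the data and spreading it across different servers so that the cluster can handle a heavier load and store more data.

Once a data set grows past a certain size, queries become unbearably slow and seriously hurt the user experience. The usual response is to split large tables and databases along business lines (for MySQL table splitting, see http://www.ttlsa.com/html/1757.html): different pieces of data are stored on different database servers according to a manually agreed policy, the application manages which data lives on which server, and the connection to each server is completely independent. Where I used to work, MySQL table and database splitting was used heavily. For example, the forum attachment table was split by uid into 10 tables, 00 through 09, using uid modulo 10, that is, the last digit of the uid; post_000 through post_014 were stored on server db1, post_015 through post_029 on server db2, and so on. This kind of manual splitting works well, but it is very hard to maintain: rebalancing data and server load, or adding and removing nodes, is difficult, because a change in one place ripples through the whole system.

MongoDB supports auto-sharding: the cluster splits the data and balances the load automatically, which avoids the management burden described above.

MongoDB sharding cuts a collection into small chunks and spreads them across several shards, with each shard responsible for a portion of the whole data set. The chunks are transparent to the application, which does not need to know which data lives on which shard, or even whether sharding is in use. The application connects to a mongos process; mongos knows the mapping between data and shards and forwards each client request to the right shard. When the shards respond, mongos collects the results and returns them to the client.

When sharding is appropriate:
1. The server is running out of disk space.
2. A single mongod cannot keep up with an increasingly heavy write load.
3. You want to keep a large amount of data in memory to improve performance.

Sharding requires three roles:
1. shard server
The container that holds the actual data. Each shard can be a single mongod instance or a replica set. Even when a shard has several servers, only one of them is the primary; the others hold copies of the same data. To get auto-failover inside each shard, configuring a Replica Set for every shard is strongly recommended.
2. config server
To store a collection across multiple shards, you specify a shard key for that collection; the shard key determines which chunk a document belongs to. The config servers store the configuration of all shard nodes, the shard key range of each chunk, the distribution of chunks across the shards, and the sharding configuration of every database and collection in the cluster.
3. route server
The front-end router of the cluster. It routes all requests and aggregates the results. Clients connect here; the router asks the config servers which shards to query or write to, connects to those shards to perform the operation, and returns the result to the client. The client simply sends mongos the same requests it would otherwise send to mongod and does not need to know which shard holds the data.

shard key:
When setting up sharding, you choose a key from the collection as the basis for splitting the data; this key is the shard key. The choice of shard key determines how inserts are distributed across the shards. The shard key must have enough variety in its values for the data to spread well across multiple servers. It is also important to keep chunks at a reasonable size, so that balancing and moving chunks does not consume large amounts of resources.

For MongoDB sharded cluster configuration, see http://www.ttlsa.com/html/1096.html

To check whether you are connected to a sharded cluster: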
> db.runCommand({ isdbgrid : 1});
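The reply to this command can also be checked from a script. The following is a minimal sketch, not captured output: the helper name isMongos and both sample reply documents are assumptions made for illustration. It assumes that a mongos answers with isdbgrid: 1 and ok: 1, while a plain mongod rejects the command with ok: 0.

```javascript
// Interpret the reply to db.runCommand({ isdbgrid: 1 }).
// Assumption: a mongos replies with isdbgrid: 1 and ok: 1, while a plain
// mongod does not know the command and replies with ok: 0.
function isMongos(reply) {
  return Boolean(reply) && reply.ok === 1 && reply.isdbgrid === 1;
}

// Illustrative reply documents (hypothetical values, not real output).
const fromMongos = { isdbgrid: 1, hostname: "example-host", ok: 1 };
const fromMongod = { ok: 0, errmsg: "no such cmd: isdbgrid" };

console.log(isMongos(fromMongos)); // true
console.log(isMongos(fromMongod)); // false
```

In the mongo shell the same helper would be called as isMongos(db.runCommand({ isdbgrid : 1 })).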
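A common sharding layout for production:
Shard server: use Replica Sets so that every data node has backup, automatic failover, and automatic recovery.
Config server: use three config servers to protect metadata integrity (two-phase commit).
Route server: put the routers behind LVS for load balancing and better access performance (high performance). Clients can also connect through a MongoDB driver with a mongodb:// connection string.

A simple way to set up a MongoDB sharding cluster: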
# mongo --nodb
# mkdir -p /data/db/test1
> cluster = new ShardingTest({"shards" : 3, "chunksize" : 1}) // starts mongod on ports 30000, 30001, 30002 and mongos on port 30999
# netstat -tnplu | grep mong
tcp 0 0 0.0.0.0:30000 0.0.0.0:* LISTEN 28000/mongod
tcp 0 0 0.0.0.0:30001 0.0.0.0:* LISTEN 28016/mongod
tcp 0 0 0.0.0.0:30002 0.0.0.0:* LISTEN 28030/mongod
tcp 0 0 0.0.0.0:30999 0.0.0.0:* LISTEN 28048/mongos
tcp 0 0 0.0.0.0:31000 0.0.0.0:* LISTEN 28000/mongod
tcp 0 0 0.0.0.0:31001 0.0.0.0:* LISTEN 28016/mongod
tcp 0 0 0.0.0.0:31002 0.0.0.0:* LISTEN 28030/mongod
tcp 0 0 0.0.0.0:31999 0.0.0.0:* LISTEN 28048/mongos
# ps -ef | grep mong
root 27869 25530 0 22:14 pts/3 00:00:00 ./mongo --nodb
root 28000 27869 0 22:17 pts/3 00:00:00 /usr/local/mongodb-linux-x86_64-2.4.5/bin/mongod --port 30000 --dbpath /data/db/test0 --setParameter enableTestCommands=1
root 28016 27869 0 22:17 pts/3 00:00:00 /usr/local/mongodb-linux-x86_64-2.4.5/bin/mongod --port 30001 --dbpath /data/db/test1 --setParameter enableTestCommands=1
root 28030 27869 0 22:17 pts/3 00:00:00 /usr/local/mongodb-linux-x86_64-2.4.5/bin/mongod --port 30002 --dbpath /data/db/test2 --setParameter enableTestCommands=1
root 28048 27869 0 22:17 pts/3 00:00:00 /usr/local/mongodb-linux-x86_64-2.4.5/bin/mongos --port 30999 --configdb localhost:30000 --chunkSize 1 --setParameter enableTestCommands=1
> db = (new Mongo("localhost:30999")).getDB("test")
m30999| Sat Jul 27 22:24:25.744 [mongosMain] connection accepted from 127.0.0.1:39375 #2 (2 connections now open)
test
mongos> for (var i=0; i<100000; i++) {
... db.users.insert({"username" : "user"+i, "created_at" : new Date()});
... }
mongos> db.users.count()
100000
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
"clusterId" : ObjectId("51f3d68a3b74fc9fc09a0043")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:30000" }
{ "_id" : "shard0001", "host" : "localhost:30001" }
{ "_id" : "shard0002", "host" : "localhost:30002" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0001" }
mongos> sh.enableSharding("test")
m30999| Sat Jul 27 22:28:31.460 [conn2] enabling sharding on: test
{ "ok" : 1 }
mongos> db.users.ensureIndex({"username" : 1})
m30001| Sat Jul 27 22:29:01.272 [conn3] build index test.users { username: 1.0 }
m30001| Sat Jul 27 22:29:01.554 [conn3] build index done. scanned 100000 total records. 0.282 secs
m30001| Sat Jul 27 22:29:01.554 [conn3] insert test.system.indexes ninserted:1 keyUpdates:0 locks(micros) w:282639 282ms
mongos> sh.shardCollection("test.users", {"username" : 1})
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
"clusterId" : ObjectId("51f3d68a3b74fc9fc09a0043")
}
shards:
{ "_id" : "shard0000", "host" : "localhost:30000" }
{ "_id" : "shard0001", "host" : "localhost:30001" }
{ "_id" : "shard0002", "host" : "localhost:30002" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
test.users
shard key: { "username" : 1 }
chunks:
shard0000 5
shard0002 4
shard0001 5
{ "username" : { "$minKey" : 1 } } -->> { "username" : "user16643" } on : shard0000 Timestamp(2, 0)
{ "username" : "user16643" } -->> { "username" : "user2329" } on : shard0002 Timestamp(3, 0)
{ "username" : "user2329" } -->> { "username" : "user29937" } on : shard0000 Timestamp(4, 0)
{ "username" : "user29937" } -->> { "username" : "user36583" } on : shard0002 Timestamp(5, 0)
{ "username" : "user36583" } -->> { "username" : "user43229" } on : shard0000 Timestamp(6, 0)
{ "username" : "user43229" } -->> { "username" : "user49877" } on : shard0002 Timestamp(7, 0)
{ "username" : "user49877" } -->> { "username" : "user56522" } on : shard0000 Timestamp(8, 0)
{ "username" : "user56522" } -->> { "username" : "user63169" } on : shard0002 Timestamp(9, 0)
{ "username" : "user63169" } -->> { "username" : "user69816" } on : shard0000 Timestamp(10, 0)
{ "username" : "user69816" } -->> { "username" : "user76462" } on : shard0001 Timestamp(10, 1)
{ "username" : "user76462" } -->> { "username" : "user83108" } on : shard0001 Timestamp(1, 10)
{ "username" : "user83108" } -->> { "username" : "user89756" } on : shard0001 Timestamp(1, 11)
{ "username" : "user89756" } -->> { "username" : "user96401" } on : shard0001 Timestamp(1, 12)
{ "username" : "user96401" } -->> { "username" : { "$maxKey" : 1 } } on : shard0001 Timestamp(1, 13)
mongos> db.users.find({username: "user12345"}).toArray()
[
{
"_id" : ObjectId("51f3d84f0273316ee2e7f700"),
"username" : "user12345",
"created_at" : ISODate("2013-07-27T14:25:19.769Z")
}
]
mongos> db.users.find({username: "user12345"}).explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"localhost:30000" : [
{
"cursor" : "BtreeCursor username_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"username" : [
[
"user12345",
"user12345"
]
]
},
"server" : "nd0302012029:30000"
}
]
},
"cursor" : "BtreeCursor username_1",
"n" : 1,
"nChunkSkips" : 0,
"nYields" : 0,
"nscanned" : 1,
"nscannedAllPlans" : 1,
"nscannedObjects" : 1,
"nscannedObjectsAllPlans" : 1,
"millisShardTotal" : 0,
"millisShardAvg" : 0,
"numQueries" : 1,
"numShards" : 1,
"indexBounds" : {
"username" : [
[
"user12345",
"user12345"
]
]
},
"millis" : 1
}
mongos> db.users.find().explain()
{
"clusteredType" : "ParallelSort",
"shards" : {
"localhost:30000" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 36924,
"nscannedObjects" : 36924,
"nscanned" : 36924,
"nscannedObjectsAllPlans" : 36924,
"nscannedAllPlans" : 36924,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 71,
"indexBounds" : {
},
"server" : "nd0302012029:30000"
}
],
"localhost:30001" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 33536,
"nscannedObjects" : 33536,
"nscanned" : 33536,
"nscannedObjectsAllPlans" : 33536,
"nscannedAllPlans" : 33536,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 55,
"indexBounds" : {
},
"server" : "nd0302012029:30001"
}
],
"localhost:30002" : [
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 29540,
"nscannedObjects" : 29540,
"nscanned" : 29540,
"nscannedObjectsAllPlans" : 29540,
"nscannedAllPlans" : 29540,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 63,
"indexBounds" : {
},
"server" : "nd0302012029:30002"
}
]
},
"cursor" : "BasicCursor",
"n" : 100000,
"nChunkSkips" : 0,
"nYields" : 0,
"nscanned" : 100000,
"nscannedAllPlans" : 100000,
"nscannedObjects" : 100000,
"nscannedObjectsAllPlans" : 100000,
"millisShardTotal" : 189,
"millisShardAvg" : 63,
"numQueries" : 3,
"numShards" : 3,
"millis" : 72
}
mongos> cluster.stop()
# netstat -tnplu | grep mong
# ps -ef | grep mong
root 27869 25530 0 22:14 pts/3 00:00:03 ./mongo --nodb
Note: # marks commands run in the system shell; > marks commands run in the mongo shell.
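To make the routing in the session above more concrete, here is a rough sketch of a range lookup over the chunk boundaries printed by the second sh.status() call. It is not how mongos is implemented; the function name findShard and the use of null to stand in for $maxKey are assumptions made for illustration. Usernames are compared as plain strings, which is why "user16643" sorts before "user2329".

```javascript
// Chunk upper bounds and owners, copied from the sh.status() output above.
// Each chunk covers [previous max, max); null stands in for $maxKey.
const chunks = [
  { max: "user16643", shard: "shard0000" },
  { max: "user2329",  shard: "shard0002" },
  { max: "user29937", shard: "shard0000" },
  { max: "user36583", shard: "shard0002" },
  { max: "user43229", shard: "shard0000" },
  { max: "user49877", shard: "shard0002" },
  { max: "user56522", shard: "shard0000" },
  { max: "user63169", shard: "shard0002" },
  { max: "user69816", shard: "shard0000" },
  { max: "user76462", shard: "shard0001" },
  { max: "user83108", shard: "shard0001" },
  { max: "user89756", shard: "shard0001" },
  { max: "user96401", shard: "shard0001" },
  { max: null,        shard: "shard0001" },
];

// Return the shard whose chunk range contains the given shard key value.
// The chunks are sorted, so the first upper bound above the key wins.
function findShard(username) {
  for (const chunk of chunks) {
    if (chunk.max === null || username < chunk.max) {
      return chunk.shard;
    }
  }
}

console.log(findShard("user12345")); // shard0000
console.log(findShard("user99999")); // shard0001
```

The result for "user12345" agrees with the explain() output in the session, where only localhost:30000 (shard0000) was queried. A query without the shard key, such as db.users.find(), cannot be narrowed this way, so it is sent to all three shards.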