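There are plenty of guides online for setting up MongoDB's three cluster modes, but they almost always cover each mode separately. A Replica Set mainly provides primary/secondary replication and cannot balance load. Load balancing comes from Sharding, which distributes the data across the shards. With sharding alone, however, a shard that has only a single primary is really gone once that node goes down. So the cluster I deployed uses Replica Set + Sharding together. The OS is Redhat_x64 and the client is the Java client. The MongoDB version is mongodb-linux-x86_64-2.4.9.tgz.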
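To build a MongoDB Sharding Cluster, three roles are needed:
- Shard Server: a mongod instance that stores the actual data chunks. In production, one shard server role can be served by several machines forming a replica set, which avoids a single point of failure on one host.
- Config Server: a mongod instance that stores the metadata of the whole cluster, including chunk information.
- Route Server: a mongos instance acting as the front-end router. Clients connect through it, and it makes the whole cluster look like a single database, so front-end applications can use it transparently.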
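1. Run one mongod instance on each of the 3 machines (called mongod shard11, mongod shard12 and mongod shard13) to form replica set 1, which serves as shard1 of the cluster
2. Run one mongod instance on 1 machine (called mongod shard22), which on its own serves as shard2 of the cluster
3. Run one mongod instance on each machine, giving 3 config servers
4. Run one mongos process on each machine for client connections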
Host | IP | Port information
Server1 | 172.17.253.216 | mongod shard11:27017
Server2 | 172.17.253.217 | mongod shard12:27017
Server3 | 172.17.253.67 | mongod shard13:27017
1. Install the MongoDB software
su - mongodb
tar zxvf mongodb-linux-x86_64-2.4.9.tgz
Create the data directories
Following the sharding layout of this example, create the shard data directories on each server:
Server1:
cd /opt/mongodb
mkdir -p data/shard11
Server2:
cd /opt/mongodb
mkdir -p data/shard12
mkdir -p data/shard22
Server3:
cd /opt/mongodb
mkdir -p data/shard13
1. Configure the replica set used by shard1:
Method 1:
Server1:
cd /opt/mongodb/bin
./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /opt/mongodb/data/shard11 --oplogSize 100 --logpath /opt/mongodb/data/shard11.log --logappend --fork
Server2:
cd /opt/mongodb/bin
./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /opt/mongodb/data/shard12 --oplogSize 100 --logpath /opt/mongodb/data/shard12.log --logappend --fork
Server3:
cd /opt/mongodb/bin
./mongod --shardsvr --replSet shard1 --port 27017 --dbpath /opt/mongodb/data/shard13 --oplogSize 100 --logpath /opt/mongodb/data/shard13.log --logappend --fork
Method 2:
Since there are quite a few options, it is better to put them in a configuration file.
Server1:
vim /opt/mongodb/bin/shard11.conf
#shard11.conf
dbpath = /opt/mongodb/data/shard11
logpath = /opt/mongodb/data/shard11.log
pidfilepath = /opt/mongodb/shard11.pid
directoryperdb = true
logappend = true
replSet = shard1
bind_ip = 172.17.253.216
port = 27017
oplogSize = 100
fork = true
noprealloc = true
cd /opt/mongodb/bin
./mongod --shardsvr -f shard11.conf
When you see the following output, the instance has started successfully:
about to fork child process, waiting until server is ready for connections.
all output going to: /opt/mongodb/data/shard11.log
forked process: 14867
child process started successfully, parent exiting
Server2 (same as above):
vim /opt/mongodb/bin/shard12.conf
#shard12.conf
dbpath = /opt/mongodb/data/shard12
logpath = /opt/mongodb/data/shard12.log
pidfilepath = /opt/mongodb/shard12.pid
directoryperdb = true
logappend = true
replSet = shard1
bind_ip = 172.17.253.217
port = 27017
oplogSize = 100
fork = true
noprealloc = true
cd /opt/mongodb/bin
./mongod --shardsvr -f shard12.conf
Server3 (same as above):
vim /opt/mongodb/bin/shard13.conf
#shard13.conf
dbpath = /opt/mongodb/data/shard13
logpath = /opt/mongodb/data/shard13.log
pidfilepath = /opt/mongodb/shard13.pid
directoryperdb = true
logappend = true
replSet = shard1
bind_ip = 172.17.253.67
port = 27017
oplogSize = 100
fork = true
noprealloc = true
cd /opt/mongodb/bin
./mongod --shardsvr -f shard13.conf
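Parameter reference:
dbpath: the data directory
logpath: the path of the log file
pidfilepath: the PID file, which makes it easy to stop mongodb
directoryperdb: store each database in its own folder named after the database
logappend: append to the log file instead of overwriting it
replSet: the name of the replica set
bind_ip: the IP address mongodb binds to
port: the port used by the mongodb process, 27017 by default
oplogSize: the maximum size of the mongodb operation log (oplog), in MB; the default is 5% of the free disk space
fork: run the process in the background
noprealloc: do not preallocate storage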
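The three shard configuration files differ only in the instance name and the bound IP, so they can be generated instead of typed by hand. The following is a minimal sketch in plain JavaScript (run with Node.js, for example); renderShardConf and its argument names are hypothetical helpers made up for this article, and the paths simply follow the /opt/mongodb layout used above.

```javascript
// Render a mongod options file in the key = value format shown above.
// renderShardConf is a hypothetical helper, not part of MongoDB.
function renderShardConf(opts) {
  var lines = [
    "#" + opts.name + ".conf",
    "dbpath = /opt/mongodb/data/" + opts.name,
    "logpath = /opt/mongodb/data/" + opts.name + ".log",
    "pidfilepath = /opt/mongodb/" + opts.name + ".pid",
    "directoryperdb = true",
    "logappend = true",
    "replSet = " + opts.replSet,
    "bind_ip = " + opts.ip,
    "port = " + opts.port,
    "oplogSize = 100",
    "fork = true",
    "noprealloc = true"
  ];
  return lines.join("\n") + "\n";
}

// Example: the file for Server1 (shard11).
var shard11Conf = renderShardConf({
  name: "shard11",
  replSet: "shard1",
  ip: "172.17.253.216",
  port: 27017
});
```

Writing the returned string to /opt/mongodb/bin/shard11.conf gives the same file as the one edited with vim above; only name and ip change for Server2 and Server3.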
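Initialize the replica set
Configure the primary, secondary and arbiter nodes. You can connect to mongodb from a client machine, or connect directly from any one of the three nodes.
Connect to one of the mongod instances with mongo and run: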
[root@localhost bin]# ./mongo 172.17.253.217:27017
MongoDB shell version: 2.4.9
connecting to: 172.17.253.217:27017/test
> use admin
switched to db admin
> config={_id:'shard1',members:[{_id:0,host:'172.17.253.216:27017',priority:2},{_id:1,host:'172.17.253.217:27017',priority:1},{_id:2,host:'172.17.253.67:27017',arbiterOnly:true}]}
{
    "_id" : "shard1",
    "members" : [
        {
            "_id" : 0,
            "host" : "172.17.253.216:27017",
            "priority" : 2
        },
        {
            "_id" : 1,
            "host" : "172.17.253.217:27017",
            "priority" : 1
        },
        {
            "_id" : 2,
            "host" : "172.17.253.67:27017",
            "arbiterOnly" : true
        }
    ]
}
> rs.initiate(config)   // apply the configuration
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
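The variable name config is arbitrary (conf or config both work), though it is best not to use a MongoDB keyword. The outermost _id is the name of the replica set, and members lists the address and priority of every node. The node with the highest priority becomes the primary, here 172.17.253.216:27017. Pay special attention to the arbiter: it needs the extra setting arbiterOnly:true. Do not leave it out, otherwise the primary/secondary setup will not work as intended.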
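To make the structure of that configuration document explicit, here is a small sketch that builds it from a list of members. It is plain JavaScript written in an ES5 style; buildReplSetConfig is a hypothetical helper name, and the two checks it performs (no duplicate hosts, at least one data-bearing member) are my own sanity checks, not something MongoDB requires you to run.

```javascript
// Build a replica set configuration document like the one typed above.
// buildReplSetConfig is a hypothetical helper, not a MongoDB API.
function buildReplSetConfig(setName, members) {
  var seenHosts = {};
  var cfg = { _id: setName, members: [] };
  members.forEach(function (m, index) {
    if (seenHosts[m.host]) {
      throw new Error("duplicate host: " + m.host);
    }
    seenHosts[m.host] = true;
    var entry = { _id: index, host: m.host };
    if (m.arbiterOnly) {
      entry.arbiterOnly = true;          // arbiters vote but hold no data
    } else if (typeof m.priority === "number") {
      entry.priority = m.priority;       // higher priority is preferred as primary
    }
    cfg.members.push(entry);
  });
  var dataBearing = cfg.members.filter(function (m) { return !m.arbiterOnly; });
  if (dataBearing.length === 0) {
    throw new Error("a replica set needs at least one data-bearing member");
  }
  return cfg;
}

var config = buildReplSetConfig("shard1", [
  { host: "172.17.253.216:27017", priority: 2 },
  { host: "172.17.253.217:27017", priority: 1 },
  { host: "172.17.253.67:27017", arbiterOnly: true }
]);
```

The resulting object has the same shape as the config variable in the shell session, so a document built this way is what rs.initiate() expects.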
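How long the configuration takes to become effective depends on the machines: on decent hardware it usually takes effect within ten or so seconds, while some machines need a minute or two. Once it has taken effect, rs.status() shows information like the following: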
> rs.status()
{
    "set" : "shard1",
    "date" : ISODate("2014-02-13T17:39:46Z"),
    "myState" : 2,
    "members" : [
        {
            "_id" : 0,
            "name" : "172.17.253.216:27017",
            "health" : 1,
            "state" : 6,
            "stateStr" : "UNKNOWN",
            "uptime" : 42,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2014-02-13T17:39:44Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 1,
            "lastHeartbeatMessage" : "still initializing"
        },
        {
            "_id" : 1,
            "name" : "172.17.253.217:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 3342,
            "optime" : Timestamp(1392313137, 1),
            "optimeDate" : ISODate("2014-02-13T17:38:57Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "172.17.253.67:27017",
            "health" : 1,
            "state" : 5,
            "stateStr" : "STARTUP2",
            "uptime" : 40,
            "lastHeartbeat" : ISODate("2014-02-13T17:39:44Z"),
            "lastHeartbeatRecv" : ISODate("2014-02-13T17:39:44Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
> rs.initiate()
{
    "errmsg" : "exception: Can't take a write lock while out of disk space",
    "code" : 14031,
    "ok" : 0
}
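Here the node that should be the primary shows UNKNOWN and the arbiter shows STARTUP2. The error is:
Can't take a write lock while out of disk space
How to fix this? After some searching, the solution was to delete the lock file:
rm /var/lib/mongodb/mongod.lock
On a fresh instance like this one, which holds no data yet, it is also best to delete the journal files, since they take up a lot of disk space, and then restart the mongodb service.
Try again: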
[root@Labs06 bin]# ./mongo 172.17.253.216:27017
MongoDB shell version: 2.4.9
connecting to: 172.17.253.216:27017/test
shard1:PRIMARY>
shard1:PRIMARY> rs.status()
{
    "set" : "shard1",
    "date" : ISODate("2014-02-13T10:53:12Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "172.17.253.216:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 921,
            "optime" : Timestamp(1392313137, 1),
            "optimeDate" : ISODate("2014-02-13T17:38:57Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "172.17.253.217:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 815,
            "optime" : Timestamp(1392313137, 1),
            "optimeDate" : ISODate("2014-02-13T17:38:57Z"),
            "lastHeartbeat" : ISODate("2014-02-13T10:53:10Z"),
            "lastHeartbeatRecv" : ISODate("2014-02-13T10:53:11Z"),
            "pingMs" : 1,
            "syncingTo" : "172.17.253.216:27017"
        },
        {
            "_id" : 2,
            "name" : "172.17.253.67:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 776,
            "lastHeartbeat" : ISODate("2014-02-13T10:53:11Z"),
            "lastHeartbeatRecv" : ISODate("2014-02-13T10:53:10Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}
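Reading a long rs.status() document by eye is error-prone, so a small check like the following can help decide whether the set is really up. This is only a sketch in plain JavaScript: summarizeReplStatus is a hypothetical helper, and the two sample inputs are trimmed-down copies of the two status outputs in this article, keeping just the name, health and stateStr fields.

```javascript
// Summarize the members array of an rs.status() document.
// summarizeReplStatus is a hypothetical helper, not a MongoDB API.
function summarizeReplStatus(status) {
  var summary = { primary: null, secondaries: [], arbiters: [], notReady: [] };
  status.members.forEach(function (m) {
    if (m.health === 1 && m.stateStr === "PRIMARY") {
      summary.primary = m.name;
    } else if (m.health === 1 && m.stateStr === "SECONDARY") {
      summary.secondaries.push(m.name);
    } else if (m.health === 1 && m.stateStr === "ARBITER") {
      summary.arbiters.push(m.name);
    } else {
      // UNKNOWN, STARTUP2, unhealthy members and so on
      summary.notReady.push(m.name + " (" + m.stateStr + ")");
    }
  });
  // Ready means: a primary exists and no member is still starting or unhealthy.
  summary.ready = summary.primary !== null && summary.notReady.length === 0;
  return summary;
}

// Trimmed copy of the first status output (before the disk space fix).
var before = summarizeReplStatus({ members: [
  { name: "172.17.253.216:27017", health: 1, stateStr: "UNKNOWN" },
  { name: "172.17.253.217:27017", health: 1, stateStr: "SECONDARY" },
  { name: "172.17.253.67:27017", health: 1, stateStr: "STARTUP2" }
] });

// Trimmed copy of the second status output (after the fix).
var after = summarizeReplStatus({ members: [
  { name: "172.17.253.216:27017", health: 1, stateStr: "PRIMARY" },
  { name: "172.17.253.217:27017", health: 1, stateStr: "SECONDARY" },
  { name: "172.17.253.67:27017", health: 1, stateStr: "ARBITER" }
] });
```

With these inputs, before.ready is false because two members are still UNKNOWN and STARTUP2, while after.ready is true with 172.17.253.216:27017 as the primary.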
Configure the replica set used by shard2 in the same way:
Here we only add the single node 172.17.253.217:27018.
Configure the 3 config servers
Method 1:
Server1:
mkdir -p /mongodb/data/config
./mongod --configsvr --dbpath /mongodb/data/config --port 20000 --logpath /mongodb/data/config.log --logappend --fork   # a config server also needs a dbpath
Server2:
mkdir -p /mongodb/data/config
./mongod --configsvr --dbpath /mongodb/data/config --port 20000 --logpath /mongodb/data/config.log --logappend --fork
Server3:
mkdir -p /mongodb/data/config
./mongod --configsvr --dbpath /mongodb/data/config --port 20000 --logpath /mongodb/data/config.log --logappend --fork
Method 2:
Since there are quite a few options, it is better to put them in a configuration file.
Server1:
#config.conf
dbpath = /opt/mongodb/data/config
logpath = /opt/mongodb/data/config.log
logappend = true
bind_ip = 172.17.253.216
port = 20000
fork = true
[root@localhost bin]# ./mongod --configsvr -f config.conf
about to fork child process, waiting until server is ready for connections.
forked process: 24132
all output going to: /opt/mongodb/data/config.log
child process started successfully, parent exiting
Server2:
dbpath = /opt/mongodb/data/config
logpath = /opt/mongodb/data/config.log
logappend = true
bind_ip = 172.17.253.217
port = 20000
fork = true
Server3:
dbpath = /opt/mongodb/data/config
logpath = /opt/mongodb/data/config.log
logappend = true
bind_ip = 172.17.253.67
port = 20000
fork = true
2.4 Start the router nodes
Run the following on every server:
[root@localhost bin]# ./mongos --configdb 172.17.253.217:20000,172.17.253.67:20000,172.17.253.216:20000 --port 30000 --chunkSize 5 --logpath /opt/mongodb/data/mongos.log --logappend --fork
about to fork child process, waiting until server is ready for connections.
forked process: 26210
all output going to: /opt/mongodb/data/mongos.log
child process started successfully, parent exiting
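Since the same mongos command has to be run on every server, it can help to assemble it from one list of config servers so that every router gets an identical --configdb value. The sketch below is plain JavaScript; mongosArgs is a hypothetical helper name, the flags are the ones used in the command above, and the check on the number of config servers reflects my understanding that MongoDB 2.4 expects either 1 (testing) or 3 (production) config servers.

```javascript
// Build the mongos argument list for a given set of config servers.
// mongosArgs is a hypothetical helper; the flags are the ones used above.
function mongosArgs(configServers, port, chunkSizeMb, logPath) {
  // Assumption: MongoDB 2.4 expects either 1 or 3 config servers.
  if (configServers.length !== 1 && configServers.length !== 3) {
    throw new Error("expected 1 or 3 config servers, got " + configServers.length);
  }
  return [
    "--configdb", configServers.join(","),
    "--port", String(port),
    "--chunkSize", String(chunkSizeMb),   // chunk size in MB
    "--logpath", logPath,
    "--logappend",
    "--fork"
  ];
}

var args = mongosArgs(
  ["172.17.253.217:20000", "172.17.253.67:20000", "172.17.253.216:20000"],
  30000,
  5,
  "/opt/mongodb/data/mongos.log"
);
var commandLine = "./mongos " + args.join(" ");
```

Here commandLine reproduces the command shown above, and passing the args array to a process launcher avoids quoting problems when the list of config servers changes.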
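Connect to one of the mongos processes and switch to the admin database to do the following configuration.
1. Connect to mongos and switch to admin
./mongo 172.17.253.217:30000/admin
(You must connect to a router node here.)
> db
admin
2. Add the shards
If a shard is a single server, add it with a command like > db.runCommand( { addshard : "<serverhostname>[:<port>]" } ). If a shard is a replica set, describe it in the format replicaSetName/<serverhostname>[:port][,serverhostname2[:port],…]. For this example, run: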
mongos> db.runCommand( { addshard : "shard1/172.17.253.216:27017,172.17.253.67:27017,172.17.253.217:27017", name: "shard1", maxSize: 20480 });
{ "shardAdded" : "shard1", "ok" : 1 }
mongos> db.runCommand( { addshard : "shard2/172.17.253.217:27018", name: "shard2", maxSize: 20480 });
{ "shardAdded" : "shard2", "ok" : 1 }
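The two addshard forms (single server versus replica set) differ only in the optional replicaSetName/ prefix, which the following sketch makes explicit. It is plain JavaScript; shardSpec and addShardCommand are hypothetical helper names, and the objects they return are just the command documents that would be passed to db.runCommand() in a mongos shell.

```javascript
// Build the host string for addshard: "host[:port]" for a single server,
// or "replicaSetName/host1[:port],host2[:port]" for a replica set.
// shardSpec and addShardCommand are hypothetical helpers.
function shardSpec(hosts, replSetName) {
  if (!hosts || hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  var list = hosts.join(",");
  return replSetName ? replSetName + "/" + list : list;
}

// Build the command document, with the optional name and maxSize (in MB).
function addShardCommand(hosts, replSetName, name, maxSizeMb) {
  var cmd = { addshard: shardSpec(hosts, replSetName) };
  if (name !== undefined) { cmd.name = name; }
  if (maxSizeMb !== undefined) { cmd.maxSize = maxSizeMb; }
  return cmd;
}

var shard1Cmd = addShardCommand(
  ["172.17.253.216:27017", "172.17.253.67:27017", "172.17.253.217:27017"],
  "shard1", "shard1", 20480
);
var shard2Cmd = addShardCommand(["172.17.253.217:27018"], "shard2", "shard2", 20480);
var standaloneCmd = addShardCommand(["172.17.253.217:27018"]);
```

shard1Cmd and shard2Cmd correspond to the two commands run above, while standaloneCmd shows the plain host:port form for a shard that is not a replica set.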
mongos> db.runCommand({listshards:1})
{
    "shards" : [
        {
            "_id" : "shard1",
            "host" : "shard1/172.17.253.216:27017,172.17.253.217:27017"
        },
        {
            "_id" : "shard2",
            "host" : "shard2/172.17.253.217:27018"
        }
    ],
    "ok" : 1
}
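Note: when adding the second shard, I got an error saying that the test database already exists. Connecting to the second replica set with the mongo command and dropping the test database with db.dropDatabase() made it possible to add the shard.
3. Optional parameters
name: the name of the shard; if omitted, the system assigns one automatically
maxSize: the maximum disk space the shard may use, in megabytes
4. Listing shards
> db.runCommand( { listshards : 1 } )
If the two shards you added are listed, the shards have been configured successfully.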
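1. Enable sharding for a database
Command:
> db.runCommand( { enablesharding : "<dbname>" } );
Running this command allows the database to span shards; without this step the database is stored on a single shard only. Once sharding is enabled for the database, different collections in it may be stored on different shards, but each collection still lives entirely on one shard. To spread a single collection across shards as well, the collection itself needs some further setup.
2. Shard a collection
To have a single collection stored across shards, give the collection a shard key with the following command:
> db.runCommand( { shardcollection : "<namespace>", key : <shardkeypattern> } );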
Notes:
a. The system automatically creates an index for a sharded collection (the user can also create it in advance).
b. A sharded collection can have only one unique index, which must be on the shard key. No other unique indexes are allowed on the collection.
> db.runCommand( { shardcollection : "test.c1", key : {id: 1} } )
> use test
> for (var i = 1; i <= 200003; i++) db.c1.save({id:i,value1:"1234567890",value2:"1234567890",value3:"1234567890",value4:"1234567890"});
> db.c1.stats()
(This command shows the storage status of the collection.)
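To get a feel for what sharding on {id: 1} means for the 200003 test documents, here is a toy model of range-based routing in plain JavaScript. It is not MongoDB's implementation: the single split point at 100000, the assignment of chunks to shards, and the names chunks, shardFor and countPerShard are all made up for illustration, and -Infinity/Infinity stand in for the real minimum and maximum key bounds. The actual chunk boundaries in your cluster are whatever the config servers record.

```javascript
// Toy model of range-based routing on the shard key `id`.
// The boundary 100000 and the chunk-to-shard assignment are hypothetical.
var chunks = [
  { min: -Infinity, max: 100000, shard: "shard1" },
  { min: 100000, max: Infinity, shard: "shard2" }
];

// Return the shard whose chunk range [min, max) contains the key.
function shardFor(chunkList, key) {
  for (var i = 0; i < chunkList.length; i++) {
    if (key >= chunkList[i].min && key < chunkList[i].max) {
      return chunkList[i].shard;
    }
  }
  throw new Error("no chunk covers key " + key);
}

// Count how many of the ids first..last would land on each shard.
function countPerShard(chunkList, first, last) {
  var counts = {};
  for (var id = first; id <= last; id++) {
    var shard = shardFor(chunkList, id);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

// Same id range as the insert loop above.
var distribution = countPerShard(chunks, 1, 200003);
```

Under these made-up boundaries, distribution holds 99999 documents for shard1 and 100004 for shard2. On a real cluster, the per-shard figures reported by db.c1.stats() play the same role.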