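1. Backing up the data files
Copying the data directory while MongoDB is running is not safe, so shut the server down first and then copy the data directory. The directory then holds a snapshot of the data as of the moment of shutdown, and it can be copied as a backup at any time before the server is restarted.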
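The shutdown-then-copy procedure could be scripted roughly as follows. This is a minimal sketch, not a finished tool: the function name cold_backup and both paths are hypothetical, it assumes mongod has already been stopped cleanly, and the mongod.lock check is only a heuristic.

```shell
#!/bin/sh
# Minimal cold-backup sketch. Assumes mongod has ALREADY been shut down
# cleanly; copying the directory while mongod is running is not safe.
# cold_backup and its arguments are illustrative names, not MongoDB tooling.
cold_backup() {
    dbpath=$1   # mongod data directory (supplied by the caller)
    dest=$2     # directory that will receive the copy

    if [ ! -d "$dbpath" ]; then
        echo "data directory not found: $dbpath" >&2
        return 1
    fi
    # Heuristic (an assumption of this sketch): a non-empty mongod.lock
    # suggests mongod is still running or was not shut down cleanly.
    if [ -s "$dbpath/mongod.lock" ]; then
        echo "mongod.lock is not empty; is mongod still running?" >&2
        return 1
    fi
    # Copy the directory contents, including hidden files.
    mkdir -p "$dest" && cp -R "$dbpath/." "$dest/"
}
```

It would be called as, for example, cold_backup /data/db /opt/backup/db-copy, where both paths are placeholders.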
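2. mongodump and mongorestore
2.1 mongodump
mongodump can take a backup while the server is running. It queries the running MongoDB instance and writes every document it gets back to disk. Because it uses the ordinary query mechanism, the resulting backup is not necessarily a point-in-time snapshot of the server's data, which is especially noticeable when the server is handling writes during the backup.
mongodump has another drawback: the queries it runs during the backup can degrade performance for other clients.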
[root@gflinux102 ~]# mongodump --help
Export MongoDB data to BSON files.
Options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --quiet                               silence all non error diagnostic
                                        messages
  --version                             print the program's version and exit
  -h [ --host ] arg                     mongo host to connect to ( <set
                                        name>/s1,s2 for sets)
  --port arg                            server port. Can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --gssapiServiceName arg (=mongodb)    Service name to use when authenticating
                                        using GSSAPI/Kerberos
  --gssapiHostName arg                  Remote host name to use for purpose of
                                        GSSAPI/Kerberos authentication
  --dbpath arg                          directly access mongod database files
                                        in the given path, instead of
                                        connecting to a mongod server - needs
                                        to lock the data directory, so cannot
                                        be used if a mongod is currently
                                        accessing the same path
  --directoryperdb                      each db is in a separate directory
                                        (relevant only if dbpath specified)
  --journal                             enable journaling (relevant only if
                                        dbpath specified)
  -d [ --db ] arg                       database to use
  -c [ --collection ] arg               collection to use (some commands)
  -o [ --out ] arg (=dump)              output directory or "-" for stdout
  -q [ --query ] arg                    json query
  --oplog                               Use oplog for point-in-time
                                        snapshotting
  --repair                              try to recover a crashed database
  --forceTableScan                      force a table scan (do not use
                                        $snapshot)
  --dumpDbUsersAndRoles                 Dump user and role definitions for the
                                        given database
Back up the test database:
[root@gflinux102 ~]# mongodump --port 10000 -d test -o backup
connected to: 127.0.0.1:10000
2015-02-13T09:51:05.506+0800 DATABASE: test to backup/test
2015-02-13T09:51:05.508+0800 test.system.indexes to backup/test/system.indexes.bson
2015-02-13T09:51:05.510+0800 2 documents
2015-02-13T09:51:05.511+0800 test.rgf to backup/test/rgf.bson
2015-02-13T09:51:05.512+0800 16 documents
2015-02-13T09:51:05.512+0800 Metadata for test.rgf to backup/test/rgf.metadata.json
[root@gflinux102 ~]# ls -R backup/
backup/:
test
backup/test:
rgf.bson rgf.metadata.json system.indexes.bson
[root@gflinux102 ~]#
This creates a backup directory under the current directory, with a subdirectory inside it for the dumped database.
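For regular backups it can help to write each dump into its own dated directory. The following is a minimal sketch that reuses the --port, -d and -o options from the example above; the function name dump_db and the dated directory layout are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: dump one database into a dated directory such as backup/2015-02-13.
# dump_db and the directory naming scheme are illustrative, not part of MongoDB.
dump_db() {
    port=$1   # port mongod listens on, e.g. 10000
    db=$2     # database to dump, e.g. test
    base=$3   # parent directory for all dumps, e.g. backup
    out="$base/$(date +%Y-%m-%d)"

    # --port, -d and -o are the options shown in the mongodump help output.
    mongodump --port "$port" -d "$db" -o "$out" || return 1

    # Print the directory that was used so the caller can record it.
    echo "$out"
}
```

A call such as dump_db 10000 test backup would place the dump under backup/ followed by the current date.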
2.2 mongorestore
[root@gflinux102 ~]# mongorestore --help
Import BSON files into MongoDB.
usage: mongorestore [options] [directory or filename to restore from]
Options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --quiet                               silence all non error diagnostic
                                        messages
  --version                             print the program's version and exit
  -h [ --host ] arg                     mongo host to connect to ( <set
                                        name>/s1,s2 for sets)
  --port arg                            server port. Can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --gssapiServiceName arg (=mongodb)    Service name to use when authenticating
                                        using GSSAPI/Kerberos
  --gssapiHostName arg                  Remote host name to use for purpose of
                                        GSSAPI/Kerberos authentication
  --dbpath arg                          directly access mongod database files
                                        in the given path, instead of
                                        connecting to a mongod server - needs
                                        to lock the data directory, so cannot
                                        be used if a mongod is currently
                                        accessing the same path
  --directoryperdb                      each db is in a separate directory
                                        (relevant only if dbpath specified)
  --journal                             enable journaling (relevant only if
                                        dbpath specified)
  -d [ --db ] arg                       database to use
  -c [ --collection ] arg               collection to use (some commands)
  --objcheck                            validate object before inserting
                                        (default)
  --noobjcheck                          don't validate object before inserting
  --filter arg                          filter to apply before inserting
  --drop                                drop each collection before import
  --oplogReplay                         replay oplog for point-in-time restore
  --oplogLimit arg                      include oplog entries before the
                                        provided Timestamp (seconds[:ordinal])
                                        during the oplog replay; the ordinal
                                        value is optional
  --keepIndexVersion                    don't upgrade indexes to newest version
  --noOptionsRestore                    don't restore collection options
  --noIndexRestore                      don't restore indexes
  --restoreDbUsersAndRoles              Restore user and role definitions for
                                        the given database
  --w arg (=0)                          minimum number of replicas per write
[root@gflinux102 ~]#
Restore the database:
[root@gflinux102 ~]# mongorestore --port 10000 -d test --drop backup/test/
connected to: 127.0.0.1:10000
2015-02-13T09:58:19.104+0800 backup/test/rgf.bson
2015-02-13T09:58:19.104+0800 going into namespace [test.rgf]
2015-02-13T09:58:19.105+0800 dropping
16 objects found
2015-02-13T09:58:19.119+0800 Creating index: { key: { _id: 1 }, name: "_id_", ns: "test.rgf" }
2015-02-13T09:58:19.120+0800 Creating index: { key: { name: 1 }, name: "name_1", ns: "test.rgf" }
[root@gflinux102 ~]#
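-d specifies the database to restore into; with this option a backup can be restored into a database whose name differs from the original. --drop drops each collection (if it exists) before restoring it; without it, the restored data is merged with the existing collection and may overwrite some documents.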
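As an illustration of restoring under a different name, the sketch below loads the dump of test into another database. The target name test_copy and the wrapper function restore_as are hypothetical; the options are the ones used in the example above.

```shell
#!/bin/sh
# Sketch: restore a dump directory into a database of the caller's choosing.
# restore_as is an illustrative wrapper, not a MongoDB command.
restore_as() {
    port=$1        # port mongod listens on, e.g. 10000
    target_db=$2   # database to restore into, e.g. test_copy (hypothetical)
    dump_dir=$3    # directory written by mongodump, e.g. backup/test/

    if [ ! -d "$dump_dir" ]; then
        echo "dump directory not found: $dump_dir" >&2
        return 1
    fi
    # --drop removes each existing collection in target_db before importing,
    # so the result is not merged with older data.
    mongorestore --port "$port" -d "$target_db" --drop "$dump_dir"
}
```

For example, restore_as 10000 test_copy backup/test/ would load the collections dumped from test into test_copy.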
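3. fsync and locking
Although mongodump and mongorestore can back up without downtime, they give up the ability to capture a point-in-time view of the data. With the fsync command, the data directory can be copied while MongoDB is running without corrupting the database.
The fsync command forces the server to flush all pending writes to disk. It can optionally also take a lock that blocks any further writes to the database until the lock is released. This write lock is what makes fsync useful for backups.
3.1 Force a flush and acquire the write lock: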
> use admin
switched to db admin
> db.runCommand({"fsync":1,"lock":1})
{
"info" : "now locked against writes, use db.fsyncUnlock() to unlock",
"seeAlso" : "http://dochub.mongodb.org/core/fsynccommand",
"ok" : 1
}
>
3.2 Start the backup
[root@gflinux102 ~]# mongodump --port 10000 -d test -o /opt/backup
connected to: 127.0.0.1:10000
2015-02-13T10:11:09.731+0800 DATABASE: test to /opt/backup/test
2015-02-13T10:11:09.734+0800 test.system.indexes to /opt/backup/test/system.indexes.bson
2015-02-13T10:11:09.736+0800 2 documents
2015-02-13T10:11:09.736+0800 test.rgf to /opt/backup/test/rgf.bson
2015-02-13T10:11:09.738+0800 16 documents
2015-02-13T10:11:09.738+0800 Metadata for test.rgf to /opt/backup/test/rgf.metadata.json
[root@gflinux102 ~]# ls -R /opt/backup/
/opt/backup/:
test
/opt/backup/test:
rgf.bson rgf.metadata.json system.indexes.bson
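[root@gflinux102 ~]#
At this point the data in the data directory is consistent and is a point-in-time snapshot. Because the write lock is held, a copy of the data directory can safely be used as a backup. If the database runs on a filesystem that supports snapshots, this step is very fast.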
3.3 Unlock
> use admin
switched to db admin
> db.fsyncUnlock()
{ "ok" : 1, "info" : "unlock completed" }
> db.currentOp()
{ "inprog" : [ ] }
>
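db.currentOp() is run to confirm that the lock has been released.
With fsync, backups become flexible: there is no need to stop the server or to give up the point-in-time property of the backup. The cost is that some writes are blocked temporarily.
The only way to get a point-in-time snapshot without holding up reads or writes is to back up from a slave server.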
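The lock, copy, unlock sequence from section 3 can be wrapped in one script so that the unlock step is not forgotten. This is a hedged sketch: the function name locked_copy and its arguments are hypothetical, it assumes the mongo shell is on the PATH and that the lock stays in place after the shell that took it exits, and it does not inspect the replies the server sends back.

```shell
#!/bin/sh
# Sketch: lock writes, copy the data directory, then always unlock.
# locked_copy is an illustrative helper, not a MongoDB command.
locked_copy() {
    port=$1     # port mongod listens on, e.g. 10000
    dbpath=$2   # mongod data directory
    dest=$3     # directory that will receive the copy

    # Same command as in 3.1: flush to disk and block further writes.
    mongo --port "$port" --quiet \
        --eval 'printjson(db.runCommand({fsync: 1, lock: 1}))' admin || return 1

    status=0
    mkdir -p "$dest" && cp -R "$dbpath/." "$dest/" || status=1

    # Unlock even if the copy failed, so writes are not blocked indefinitely.
    mongo --port "$port" --quiet \
        --eval 'printjson(db.fsyncUnlock())' admin || status=1

    return $status
}
```

Unlocking on the failure path is the main design point here: a failed copy should leave the server writable again rather than stuck behind the lock.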
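4. Backing up from a slave
When MongoDB runs with replication, the backup techniques above can be used on a slave as well as on the master, and they work better on a slave. A slave's data is almost in sync with the master's, and because the slave's performance and its availability for reads and writes matter little, any of the three methods above can be used on it: shutting it down, the dump and restore tools, or the fsync command. Backing up from a slave is the recommended way to back up MongoDB.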
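With the dump tools, backing up from a slave mostly means pointing mongodump at the slave instead of the master. A minimal sketch, assuming the slave accepts direct connections from the backup host; the function name dump_from_slave and any host name passed to it are hypothetical.

```shell
#!/bin/sh
# Sketch: run mongodump against a slave rather than the master.
# dump_from_slave and the host name given to it are illustrative only.
dump_from_slave() {
    slave_host=$1   # host name of the slave (placeholder chosen by the caller)
    slave_port=$2   # port the slave listens on
    db=$3           # database to dump
    out=$4          # output directory for the dump

    # --host and --port select which server mongodump connects to.
    mongodump --host "$slave_host" --port "$slave_port" -d "$db" -o "$out"
}
```

A call might look like dump_from_slave slave.example.com 10000 test /opt/backup, with slave.example.com standing in for the real host.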
5. Repairing a database
5.1 Overview of repair
The simplest way to repair all databases is to start the server with --repair:
mongod --repair
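How repair works: every document is exported and then immediately imported again, skipping any that are invalid. When that is done, the indexes are rebuilt.
With a large data set this takes a long time, because all the data has to be validated and every index rebuilt. After a repair there may be fewer documents than before, because corrupt documents are discarded.
Repairing a database also compacts it: unused space (for example, space left behind after dropping a large collection or deleting many documents) is reclaimed by the repair.
5.2 Repairing a running database
To repair a running database, use repairDatabase in the shell.
For example, to repair the test database: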
> use test;
switched to db test
> db.repairDatabase()
{ "ok" : 1 }
>
From a driver rather than the shell, the same thing can be done by running the repairDatabase command:
{"repairDatabase":1}
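The same command document can also be sent non-interactively, which may be convenient from a maintenance script. A minimal sketch, assuming the mongo shell is on the PATH; the wrapper name repair_db is hypothetical, and the sketch only prints the server's reply without checking it.

```shell
#!/bin/sh
# Sketch: send {"repairDatabase": 1} to one database through the mongo shell.
# repair_db is an illustrative wrapper, not a MongoDB command.
repair_db() {
    port=$1   # port mongod listens on, e.g. 10000
    db=$2     # database to repair, e.g. test

    # The database name is passed last so the command runs against it.
    mongo --port "$port" --quiet \
        --eval 'printjson(db.runCommand({"repairDatabase": 1}))' "$db"
}
```

For the example above this would be invoked as repair_db 10000 test.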
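Repairing corrupt data is a last resort. Shutting the server down as cleanly as possible, using replication for failover, and taking regular backups are the most effective ways to manage data.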