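GridFS is a mechanism for storing large binary files in MongoDB. There are several reasons to store files in GridFS:
● GridFS can simplify your stack. If you already run MongoDB, GridFS removes the need for a separate file storage architecture.
● GridFS builds on MongoDB's existing replication and sharding, so failover and scale-out for file storage come easily.
● GridFS avoids some of the problems that file systems have with user-uploaded content. For example, GridFS has no trouble with a large number of files in the same directory.
● GridFS does not produce fragmentation, because MongoDB allocates data file space in 2 GB blocks.
mongofiles is the GridFS utility, used to manage files stored in GridFS.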
-- Help command
[root@racdb ~]# mongofiles --help
Browse and modify a GridFS filesystem.
usage: mongofiles [options] command [gridfs filename]
command:
  one of (list|search|put|get)
  list - list all files. 'gridfs filename' is an optional prefix
         which listed filenames must begin with.
  search - search all files. 'gridfs filename' is a substring
           which listed filenames must contain.
  put - add a file with filename 'gridfs filename'
  get - get a file with filename 'gridfs filename'
  delete - delete all files with filename 'gridfs filename'
options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --version                             print the program's version and exit
  -h [ --host ] arg                     mongo host to connect to (<set
                                        name>/s1,s2 for sets)
  --port arg                            server port. Can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --dbpath arg                          directly access mongod database files
                                        in the given path, instead of
                                        connecting to a mongod server - needs
                                        to lock the data directory, so cannot
                                        be used if a mongod is currently
                                        accessing the same path
  --directoryperdb                      each db is in a separate directly
                                        (relevant only if dbpath specified)
  --journal                             enable journaling (relevant only if
                                        dbpath specified)
  -d [ --db ] arg                       database to use
  -c [ --collection ] arg               collection to use (some commands)
  -l [ --local ] arg                    local filename for put|get (default is
                                        to use the same name as 'gridfs
                                        filename')
  -t [ --type ] arg                     MIME type for put (default is to omit)
  -r [ --replace ]                      Remove other files with same name after
                                        PUT
-- Upload a file
[root@racdb ~]# mongofiles put foo.log
connected to: 127.0.0.1
added file: { _id: ObjectId('56caba480ad7ef0aa8a76f0c'), filename: "foo.log", chunkSize: 261120, uploadDate: new Date(1456126536618), md5: "d1bfff5ab0cc6b652aaf08345b19b7e6", length: 21 }
done!
-- List files
[root@racdb ~]# mongofiles list
connected to: 127.0.0.1
install.log 54876
foo.log 21
-- Download a file
[root@racdb ~]# rm -f foo.log
[root@racdb ~]# mongofiles get foo.log
connected to: 127.0.0.1
done write to: foo.log
[root@racdb ~]# ll foo.log
-rw-r--r--. 1 root root 21 2月 22 15:36 foo.log
-- Delete a file from GridFS
[root@racdb ~]# mongofiles delete install.log
connected to: 127.0.0.1
done!
[root@racdb ~]# mongofiles list
connected to: 127.0.0.1
foo.log 21
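The basic idea behind GridFS is to split a large file into many chunks and store each chunk as a separate document, which is what makes storing large files possible. It is a lightweight file storage specification built on top of ordinary MongoDB documents.
Because MongoDB supports storing binary data in documents, the storage overhead of the chunks is kept to a minimum. Besides the chunks that hold the file itself, a separate document stores the chunking information and the file's metadata.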
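The chunking arithmetic can be sketched in a few lines. This is only an illustration, not MongoDB's own code: chunkLayout is a hypothetical helper, the 261120-byte chunk size is the chunkSize value printed by mongofiles earlier in this post, and the 1,000,000-byte file is made up.

```javascript
// Hypothetical helper: how many chunks a file of `length` bytes needs,
// and how many bytes end up in the final chunk.
function chunkLayout(length, chunkSize) {
  // In this sketch an empty file needs no chunks at all.
  if (length === 0) return { chunks: 0, lastChunkBytes: 0 };
  const chunks = Math.ceil(length / chunkSize);
  const lastChunkBytes = length - (chunks - 1) * chunkSize;
  return { chunks, lastChunkBytes };
}

const CHUNK_SIZE = 261120; // chunkSize value shown in the mongofiles output

console.log(chunkLayout(21, CHUNK_SIZE));      // size of foo.log
console.log(chunkLayout(54876, CHUNK_SIZE));   // size of install.log
console.log(chunkLayout(1000000, CHUNK_SIZE)); // made-up larger file
```

Both sample files from this post fit in a single chunk, while the made-up 1,000,000-byte file would need four chunks, the last one only partly filled.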
GridFS chunks live in their own collection, fs.chunks by default. A document in the chunks collection has the following structure:
{
    "_id" : ObjectId("..."),
    "n" : 0,
    "data" : BinData("..."),
    "files_id" : ObjectId("...")
}
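● _id: the unique ID of the chunk
● files_id: the _id of the file document that holds this chunk's metadata
● n: the chunk number, that is, this chunk's position in the original file
● data: the binary data that makes up this chunk of the file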
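To make the chunk document layout concrete, here is a minimal sketch that splits an in-memory buffer into chunk-like objects with the four fields listed above. It does not talk to MongoDB: toChunkDocs is a hypothetical helper, the ids are random hex strings standing in for ObjectId values, "file-1" is a made-up file id, and the chunk size of 8 bytes is deliberately tiny so that a short string yields several chunks.

```javascript
const crypto = require("crypto");

// Split a Buffer into chunk-like documents (_id, files_id, n, data).
// The _id values are random hex strings standing in for ObjectId values.
function toChunkDocs(filesId, buffer, chunkSize) {
  const docs = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    docs.push({
      _id: crypto.randomBytes(12).toString("hex"),
      files_id: filesId,
      n, // position of this chunk within the file, starting at 0
      data: buffer.subarray(offset, offset + chunkSize),
    });
  }
  return docs;
}

const payload = Buffer.from("Hello MongoDB Gridfs\n"); // 21 bytes
const chunkDocs = toChunkDocs("file-1", payload, 8);   // tiny chunk size, for illustration only

console.log(chunkDocs.map((c) => [c.n, c.data.toString()]));
```

With 21 bytes and 8-byte chunks the sketch produces three chunks numbered 0, 1 and 2, the last one holding the remaining 5 bytes.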
> db.fs.chunks.find()
{ "_id" : ObjectId("56caba48e0355316e5e4ab39"), "files_id" : ObjectId("56caba480ad7ef0aa8a76f0c"), "n" : 0, "data" : BinData(0,"SGVsbG8gTW9uZ29EQiBHcmlkZnMK") }
{ "_id" : ObjectId("56cabb85e0355316e5e4ab3a"), "files_id" : ObjectId("56cabb85d07cdd46e1f143a4"), "n" : 0, "data" : BinData(0,"SGVsbG8gTW9uZ29EQiBHcmlkZnMK") }
{ "_id" :ObjectId("56cabb89e0355316e5e4ab3b"), "files_id" :ObjectId("56cabb895c03f6feeb64bb6e"), "n" : 0,"data" :BinData(0,"5a6J6KOFIGxpYmdjYy00LjQuNy00LmVsNi54ODZfNjQKd2FybmluZzogbGliZ2NjLTQuNC43LTQuZWw2Lng4Nl82NDogSGVhZGVyIFYzIFJTQS9TSEEyNTYgU2lnbmF0dXJlLCBrZXkgSUQgZWM1NTFmMDM6IE5PS0VZCuWuieijhSBmb250cGFja2FnZXMtZmlsZXN5c3RlbS0xLjQxLTEuMS5lbDYu
......
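The shell prints the data field as BinData with a Base64 string. As a small sketch, the payload of the first chunk above can be decoded outside MongoDB with Node.js; the variable names are only illustrative.

```javascript
// Base64 payload copied from the first fs.chunks document above.
const encoded = "SGVsbG8gTW9uZ29EQiBHcmlkZnMK";
const decoded = Buffer.from(encoded, "base64");

// JSON.stringify makes the trailing newline visible as \n.
console.log(JSON.stringify(decoded.toString("utf8")));
console.log(decoded.length);
```

The decoded content is the text "Hello MongoDB Gridfs" followed by a newline, 21 bytes in total, which agrees with the length of 21 reported for foo.log.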
-- Return only the specified fields
> db.fs.chunks.find({}, {"files_id" : 1, "n" : 1})
{ "_id" : ObjectId("56caba48e0355316e5e4ab39"), "files_id" : ObjectId("56caba480ad7ef0aa8a76f0c"), "n" : 0 }
{ "_id" : ObjectId("56cabb85e0355316e5e4ab3a"), "files_id" : ObjectId("56cabb85d07cdd46e1f143a4"), "n" : 0 }
{ "_id" : ObjectId("56cabb89e0355316e5e4ab3b"), "files_id" : ObjectId("56cabb895c03f6feeb64bb6e"), "n" : 0 }
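GridFS file metadata is kept in the fs.files collection (by default). Each document there represents one file in GridFS, and custom metadata related to the file can be stored in it as well.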
> db.fs.files.find()
{ "_id" : ObjectId("56caba480ad7ef0aa8a76f0c"), "filename" : "foo.log", "chunkSize" : 261120, "uploadDate" : ISODate("2016-02-22T07:35:36.618Z"), "md5" : "d1bfff5ab0cc6b652aaf08345b19b7e6", "length" : 21 }
{ "_id" : ObjectId("56cabb85d07cdd46e1f143a4"), "filename" : "foo.log", "chunkSize" : 261120, "uploadDate" : ISODate("2016-02-22T07:40:53.015Z"), "md5" : "d1bfff5ab0cc6b652aaf08345b19b7e6", "length" : 21 }
{ "_id" : ObjectId("56cabb895c03f6feeb64bb6e"), "filename" : "install.log", "chunkSize" : 261120, "uploadDate" : ISODate("2016-02-22T07:40:57.387Z"), "md5" : "fbe1119cd9688d14475e2a84ccd8a7a6", "length" : 54876 }
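● _id: the unique id of the file, stored in each chunk as the value of the files_id key
● length: the total number of bytes in the file content
● chunkSize: the size of each chunk in bytes. The default was historically 256 KB; the files above use 261120 bytes (255 KB). It can be adjusted if needed.
● uploadDate: the timestamp at which the file was stored in GridFS
● md5: the MD5 checksum of the file content, generated on the server side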
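The way these fields tie a file document to its chunks can be sketched as follows: select the chunks whose files_id matches the file's _id, sort them by n, concatenate the data, then compare the result against length and md5. This is an in-memory illustration only. The reassemble helper, the "file-1" id, the 8-byte chunk size and the sample documents are all assumptions made for the example, and the checksum here is computed locally in Node.js rather than by the server.

```javascript
const crypto = require("crypto");

// Rebuild a file from chunk-like documents and check it against its file document.
function reassemble(fileDoc, chunks) {
  const ordered = chunks
    .filter((c) => c.files_id === fileDoc._id) // only this file's chunks
    .sort((a, b) => a.n - b.n);                // restore the original order
  const content = Buffer.concat(ordered.map((c) => c.data));
  const md5 = crypto.createHash("md5").update(content).digest("hex");
  return {
    content,
    lengthOk: content.length === fileDoc.length,
    md5Ok: md5 === fileDoc.md5,
  };
}

// Made-up sample data: three chunks, deliberately listed out of order.
const original = Buffer.from("Hello MongoDB Gridfs\n");
const fileDoc = {
  _id: "file-1",
  filename: "foo.log",
  chunkSize: 8,
  length: original.length,
  md5: crypto.createHash("md5").update(original).digest("hex"),
};
const chunks = [
  { files_id: "file-1", n: 2, data: original.subarray(16) },
  { files_id: "file-1", n: 0, data: original.subarray(0, 8) },
  { files_id: "file-1", n: 1, data: original.subarray(8, 16) },
];

const result = reassemble(fileDoc, chunks);
console.log(result.lengthOk, result.md5Ok);
```

Sorting by n is what makes the stored order of the chunk documents irrelevant; the length and md5 checks then confirm that nothing is missing or corrupted.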
Once the principles behind GridFS are clear, you can run some operations against it directly.
-- Get the list of distinct filenames in GridFS
> db.fs.files.distinct("filename")
[ "foo.log", "install.log" ]
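What distinct does here can be mimicked on plain objects. The sketch below uses the filenames and lengths from the fs.files output shown earlier; it runs in Node.js without a database, and the sort is added only to make the printed order stable, not because distinct guarantees it.

```javascript
// Filenames and lengths as they appear in the fs.files output shown earlier.
const files = [
  { filename: "foo.log", length: 21 },
  { filename: "foo.log", length: 21 },
  { filename: "install.log", length: 54876 },
];

// Rough equivalent of distinct("filename"): keep each name once.
const distinctNames = [...new Set(files.map((f) => f.filename))].sort();

// Count how many stored files share each name.
const counts = {};
for (const f of files) counts[f.filename] = (counts[f.filename] || 0) + 1;

console.log(distinctNames);
console.log(counts);
```

The count shows that foo.log is stored twice: GridFS identifies files by _id, so the same filename can appear more than once. The -r [ --replace ] option in the mongofiles help text exists for this case, removing other files with the same name after a put.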