Advanced MongoDB (GridFS)



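GridFS is MongoDB's mechanism for storing large binary files. There are several reasons to store files in GridFS:

 GridFS can simplify your stack. If you already run MongoDB, GridFS removes the need for a separate file-storage architecture.

 GridFS builds on the replication and sharding you have already set up, so failover and scale-out for file storage come easily.

 GridFS avoids some problems of the file systems typically used for user uploads. For example, GridFS has no trouble with a large number of files in the same directory.

 GridFS does not cause disk fragmentation, because MongoDB allocates data file space in 2 GB chunks.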

 

Using GridFS: mongofiles

mongofiles is the command-line utility for managing files stored in GridFS.

 

--Show the help text

[root@racdb ~]# mongofiles --help
Browse and modify a GridFS filesystem.

usage: mongofiles [options] command [gridfs filename]
command:
  one of (list|search|put|get)
  list - list all files.  'gridfs filename' is an optional prefix
         which listed filenames must begin with.
  search - search all files. 'gridfs filename' is a substring
           which listed filenames must contain.
  put - add a file with filename 'gridfs filename'
  get - get a file with filename 'gridfs filename'
  delete - delete all files with filename 'gridfs filename'
options:
  --help                                produce help message
  -v [ --verbose ]                      be more verbose (include multiple times
                                        for more verbosity e.g. -vvvvv)
  --version                             print the program's version and exit
  -h [ --host ] arg                     mongo host to connect to (<set
                                        name>/s1,s2 for sets)
  --port arg                            server port. Can also use --host
                                        hostname:port
  --ipv6                                enable IPv6 support (disabled by
                                        default)
  -u [ --username ] arg                 username
  -p [ --password ] arg                 password
  --authenticationDatabase arg          user source (defaults to dbname)
  --authenticationMechanism arg (=MONGODB-CR)
                                        authentication mechanism
  --dbpath arg                          directly access mongod database files
                                        in the given path, instead of
                                        connecting to a mongod  server - needs
                                        to lock the data directory, so cannot
                                        be used if a mongod is currently
                                        accessing the same path
  --directoryperdb                      each db is in a separate directly
                                        (relevant only if dbpath specified)
  --journal                             enable journaling (relevant only if
                                        dbpath specified)
  -d [ --db ] arg                       database to use
  -c [ --collection ] arg               collection to use (some commands)
  -l [ --local ] arg                    local filename for put|get (default is
                                        to use the same name as 'gridfs
                                        filename')
  -t [ --type ] arg                     MIME type for put (default is to omit)
  -r [ --replace ]                      Remove other files with same name after
                                        PUT

                                       

--Upload a file

[root@racdb ~]# mongofiles put foo.log

connected to: 127.0.0.1

added file: { _id: ObjectId('56caba480ad7ef0aa8a76f0c'), filename: "foo.log", chunkSize: 261120, uploadDate: new Date(1456126536618), md5: "d1bfff5ab0cc6b652aaf08345b19b7e6", length: 21 }

done!

--List files

[root@racdb ~]# mongofiles list

connected to: 127.0.0.1

install.log     54876

foo.log 21

--Download a file

[root@racdb ~]# rm -f foo.log

[root@racdb ~]# mongofiles get foo.log

connected to: 127.0.0.1

done write to: foo.log

[root@racdb ~]# ll foo.log

-rw-r--r--. 1 root root 21 2  22 15:36 foo.log

--Delete a file from GridFS

[root@racdb ~]# mongofiles delete install.log

connected to: 127.0.0.1

done!

[root@racdb ~]# mongofiles list

connected to: 127.0.0.1

foo.log 21

 

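GridFS internals

The basic idea behind GridFS is to split a large file into many chunks and store each chunk as a separate document, which is what makes storing large files possible. It is a lightweight file-storage specification built on top of ordinary MongoDB documents.

Because MongoDB supports binary data in documents, the storage overhead per chunk is kept to a minimum. In addition to the chunks that hold the file content, a separate document stores the chunking information and the file's metadata.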
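The splitting idea can be sketched in a few lines of plain Python. This is only an illustration of the layout, not the driver's actual implementation: the function name `split_into_chunks` is hypothetical, a random hex string stands in for a real ObjectId, and the 261120-byte chunk size is taken from the mongofiles output earlier in this post.

```python
import hashlib
import uuid


def split_into_chunks(data: bytes, filename: str, chunk_size: int = 261120):
    """Return (file_doc, chunk_docs) shaped like fs.files / fs.chunks documents."""
    # Assumption: a random hex string stands in for a MongoDB ObjectId.
    file_id = uuid.uuid4().hex

    # One document per chunk; "n" is the chunk's position in the file.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]

    # A single metadata document describes the whole file.
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "chunkSize": chunk_size,
        "length": len(data),
        "md5": hashlib.md5(data).hexdigest(),
    }
    return file_doc, chunk_docs


file_doc, chunk_docs = split_into_chunks(b"x" * 600000, "big.bin")
print(len(chunk_docs), [len(c["data"]) for c in chunk_docs])
```

Only the last chunk is shorter than `chunkSize`; every chunk points back to the metadata document through `files_id`.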

 

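GridFS chunks live in their own collection, fs.chunks by default. A document in the chunks collection has the following structure:

{

"_id" : ObjectId("..."),

"n" : 0,

"data" : BinData("..."),

"files_id" : ObjectId("...")

}

 _id: the unique ID of the chunk

 files_id: the _id of the file document that holds this chunk's metadata

 n: the chunk number, i.e. the chunk's position in the original file (starting at 0)

 data: the binary data that makes up this chunk of the file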
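Given these four fields, reading a file back amounts to selecting the chunks with a matching `files_id`, ordering them by `n`, and concatenating `data`. The sketch below works on plain dictionaries rather than a live collection; the helper name `reassemble` and the sample ids "A" and "B" are made up for illustration.

```python
def reassemble(chunk_docs, files_id):
    """Rebuild a file's bytes from chunk documents shaped like fs.chunks."""
    # Keep only this file's chunks and put them in file order.
    own = sorted(
        (c for c in chunk_docs if c["files_id"] == files_id),
        key=lambda c: c["n"],
    )
    # Chunk numbers must run 0, 1, 2, ... with no gaps.
    for expected, chunk in enumerate(own):
        if chunk["n"] != expected:
            raise ValueError(f"missing chunk {expected} for file {files_id}")
    return b"".join(c["data"] for c in own)


# Chunks from two files, deliberately stored out of order.
chunks = [
    {"files_id": "A", "n": 1, "data": b"World"},
    {"files_id": "B", "n": 0, "data": b"other"},
    {"files_id": "A", "n": 0, "data": b"Hello "},
]
restored = reassemble(chunks, "A")
print(restored)
```

The gap check matters because a missing chunk would otherwise produce a silently truncated or corrupted file.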

 

> db.fs.chunks.find()

{ "_id" : ObjectId("56caba48e0355316e5e4ab39"), "files_id" : ObjectId("56caba480ad7ef0aa8a76f0c"), "n" : 0, "data" : BinData(0,"SGVsbG8gTW9uZ29EQiBHcmlkZnMK") }

{ "_id" : ObjectId("56cabb85e0355316e5e4ab3a"), "files_id" : ObjectId("56cabb85d07cdd46e1f143a4"), "n" : 0, "data" : BinData(0,"SGVsbG8gTW9uZ29EQiBHcmlkZnMK") }

{ "_id" :ObjectId("56cabb89e0355316e5e4ab3b"), "files_id" :ObjectId("56cabb895c03f6feeb64bb6e"), "n" : 0,"data" :BinData(0,"5a6J6KOFIGxpYmdjYy00LjQuNy00LmVsNi54ODZfNjQKd2FybmluZzogbGliZ2NjLTQuNC43LTQuZWw2Lng4Nl82NDogSGVhZGVyIFYzIFJTQS9TSEEyNTYgU2lnbmF0dXJlLCBrZXkgSUQgZWM1NTFmMDM6IE5PS0VZCuWuieijhSBmb250cGFja2FnZXMtZmlsZXN5c3RlbS0xLjQxLTEuMS5lbDYu

......
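The shell prints the `data` field as base64 inside `BinData(0, ...)`. As a quick check, the payload of the first two chunks above can be decoded with Python's standard library; the variable names here are only illustrative.

```python
import base64

# Base64 payload copied from the first fs.chunks document above.
encoded = "SGVsbG8gTW9uZ29EQiBHcmlkZnMK"

payload = base64.b64decode(encoded)
print(payload)       # raw bytes stored in the chunk
print(len(payload))  # should match the "length" recorded for foo.log
```

The decoded content is 21 bytes long, which agrees with the length reported for foo.log by mongofiles.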

--Query returning only selected fields

> db.fs.chunks.find({},{"files_id":1,"n":1})

{ "_id" : ObjectId("56caba48e0355316e5e4ab39"), "files_id" : ObjectId("56caba480ad7ef0aa8a76f0c"), "n" : 0 }

{ "_id" : ObjectId("56cabb85e0355316e5e4ab3a"), "files_id" : ObjectId("56cabb85d07cdd46e1f143a4"), "n" : 0 }

{ "_id" : ObjectId("56cabb89e0355316e5e4ab3b"), "files_id" : ObjectId("56cabb895c03f6feeb64bb6e"), "n" : 0 }

 

 

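GridFS file metadata is stored in the fs.files collection (by default). Each document here represents one file in GridFS, and custom metadata related to the file can be stored in it as well.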

> db.fs.files.find()

{ "_id" : ObjectId("56caba480ad7ef0aa8a76f0c"), "filename" : "foo.log", "chunkSize" : 261120, "uploadDate" : ISODate("2016-02-22T07:35:36.618Z"), "md5" : "d1bfff5ab0cc6b652aaf08345b19b7e6", "length" : 21 }

{ "_id" : ObjectId("56cabb85d07cdd46e1f143a4"), "filename" : "foo.log", "chunkSize" : 261120, "uploadDate" : ISODate("2016-02-22T07:40:53.015Z"), "md5" : "d1bfff5ab0cc6b652aaf08345b19b7e6", "length" : 21 }

{ "_id" : ObjectId("56cabb895c03f6feeb64bb6e"), "filename" : "install.log", "chunkSize" : 261120, "uploadDate" : ISODate("2016-02-22T07:40:57.387Z"), "md5" : "fbe1119cd9688d14475e2a84ccd8a7a6", "length" : 54876 }

 

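 _id: the unique ID of the file; stored in each chunk as the value of the files_id key

 length: the total size of the file content in bytes

 chunkSize: the size of each chunk in bytes. The output above shows 261120 bytes (255 KB), the default since MongoDB 2.4.10; earlier releases defaulted to 256 KB. It can be adjusted if necessary.

 uploadDate: the timestamp at which the file was stored in GridFS

 md5: the MD5 checksum of the file content, generated on the server side.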
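The `length` and `chunkSize` fields together determine how many chunk documents a file should have and how large the last one is. The following sketch does that arithmetic; `chunk_layout` is a hypothetical helper, not part of any MongoDB driver.

```python
def chunk_layout(length: int, chunk_size: int = 261120):
    """Return (chunk_count, last_chunk_size) for a file of `length` bytes."""
    # Ceiling division without floating point.
    count = -(-length // chunk_size)
    # Every chunk except the last is exactly chunk_size bytes.
    last = length - (count - 1) * chunk_size if count else 0
    return count, last


# Values taken from the fs.files documents above.
print(chunk_layout(21))      # foo.log
print(chunk_layout(54876))   # install.log
print(chunk_layout(600000))  # a larger, made-up file size
```

Both files from the listing fit in a single chunk, which is consistent with fs.chunks showing only `"n" : 0` for each of them.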

 

With the internals understood, you can work with the GridFS collections directly.

--Get the list of distinct filenames in GridFS

> db.fs.files.distinct("filename")

[ "foo.log", "install.log" ]
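GridFS does not require filenames to be unique, which is why fs.files above holds two documents named foo.log and why `distinct` is useful. The sketch below mimics that query on plain dictionaries and also counts the copies per name; the list literal is a trimmed copy of the fs.files output, with upload times rounded down to the second.

```python
from collections import Counter
from datetime import datetime

# Trimmed copies of the fs.files documents shown earlier.
files = [
    {"filename": "foo.log", "uploadDate": datetime(2016, 2, 22, 7, 35, 36), "length": 21},
    {"filename": "foo.log", "uploadDate": datetime(2016, 2, 22, 7, 40, 53), "length": 21},
    {"filename": "install.log", "uploadDate": datetime(2016, 2, 22, 7, 40, 57), "length": 54876},
]

# Equivalent of db.fs.files.distinct("filename").
distinct_names = sorted({f["filename"] for f in files})

# How many stored copies exist per filename.
copies = Counter(f["filename"] for f in files)

# The most recently uploaded copy of a given name.
latest_foo = max(
    (f for f in files if f["filename"] == "foo.log"),
    key=lambda f: f["uploadDate"],
)

print(distinct_names)
print(dict(copies))
print(latest_foo["uploadDate"])
```

When several copies share a name, the upload date is a simple way to decide which one to treat as current.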
