Miscellaneous learning notes: MogileFS distributed file storage

The MogileFS open-source project


  • Application level -- no special kernel modules required.
  • No single point of failure -- all three components of a MogileFS setup (storage nodes, trackers, and the tracker's database(s)) can be run on multiple machines, so there's no single point of failure. (you can run trackers on the same machines as storage nodes, too, so you don't need 4 machines...) A minimum of 2 machines is recommended.
  • Automatic file replication -- files, based on their "class", are automatically replicated between enough different storage nodes as to satisfy the minimum replica count as requested by their class. For instance, for a photo hosting site you can make original JPEGs have a minimum replica count of 3, but thumbnails and scaled versions only have a replica count of 1 or 2. If you lose the only copy of a thumbnail, the application can just rebuild it. In this way, MogileFS (without RAID) can save money on disks that would otherwise be storing multiple copies of data unnecessarily.
  • "Better than RAID" -- in a non-SAN RAID setup, the disks are redundant, but the host isn't. If you lose the entire machine, the files are inaccessible. MogileFS replicates the files between devices which are on different hosts, so files are always available.
  • Flat Namespace -- Files are identified by named keys in a flat, global namespace. You can create as many namespaces as you'd like, so multiple applications with potentially conflicting keys can run on the same MogileFS installation.
  • Shared-Nothing -- MogileFS doesn't depend on a pricey SAN with shared disks. Every machine maintains its own local disks.
  • No RAID required -- Local disks on MogileFS storage nodes can be in a RAID, or not. It's cheaper not to, as RAID doesn't buy you any safety that MogileFS doesn't already provide.
  • Local filesystem agnostic -- Local disks on MogileFS storage nodes can be formatted with your filesystem of choice (ext3, XFS, etc.). MogileFS does its own internal directory hashing so it doesn't hit filesystem limits such as "max files per directory" or "max directories per directory". Use what you're comfortable with.
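
The "internal directory hashing" in the last point can be illustrated with a small sketch. The layout used here (a zero-padded 10-digit file ID split into nested directories) and the function name `fid_path` are assumptions for illustration, not taken from the MogileFS source. The only point is that spreading files over nested directories keeps each directory small.

```python
def fid_path(device_id: int, fid: int) -> str:
    """Map a numeric file ID to a nested path (hypothetical layout).

    The ID is zero-padded to 10 digits; the first digit and the next two
    groups of three digits become directory levels, so each leaf directory
    holds at most 1000 files and each middle directory at most 1000 subdirs.
    """
    if not 0 <= fid < 10**10:
        raise ValueError("fid must fit in 10 decimal digits")
    padded = f"{fid:010d}"
    top, mid, low = padded[0], padded[1:4], padded[4:7]
    return f"/dev{device_id}/{top}/{mid}/{low}/{padded}.fid"


print(fid_path(3, 1234567))  # /dev3/0/001/234/0001234567.fid
```

With a scheme like this, the per-directory file count is bounded regardless of which local filesystem (ext3, XFS, ...) sits underneath.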

1. Application level -- no special kernel modules are required.

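2. No single point of failure -- all three MogileFS components (storage nodes, trackers, and the tracker database) can run on multiple machines, so there is no single point of failure.

You can run trackers on the same machines as storage nodes, so you do not need 4 machines. A minimum of 2 machines is recommended.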

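3. Automatic file replication -- based on its class, each file is automatically replicated across enough different storage nodes to meet the minimum replica count that class requires.

For example, a photo site can keep three copies of each original photo but only 1 or 2 copies of thumbnails and scaled versions. If the only copy of a thumbnail is lost, the application can simply rebuild it. This avoids spending disk space on copies that are not needed.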

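4. Better than RAID -- in a non-SAN RAID setup the disks are redundant, but the host is not. If the whole machine goes down, the files stored on it become unreachable. MogileFS replicates files between devices on different hosts, so files stay available when a host is lost.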

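5. Flat namespace -- files are identified by named keys in a flat, global namespace. You can create as many namespaces as you like, so multiple applications can share one MogileFS installation even when their keys might otherwise conflict.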

6. Shared-nothing -- MogileFS does not depend on an expensive SAN with shared disks; every machine maintains its own local disks.

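7. No RAID required -- the disks on MogileFS storage nodes can be in a RAID or not. Skipping RAID is cheaper, because RAID adds no safety that MogileFS does not already provide.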

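8. Local filesystem agnostic -- local disks can be formatted with any filesystem (ext3, XFS, etc.). MogileFS does its own internal directory hashing, so it does not hit filesystem limits such as the maximum number of files or subdirectories per directory.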
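
To make points 3 and 5 concrete, here is a toy model of per-class replica counts and namespaced keys. It is only a sketch: the names `FileClass`, `StoredFile`, `replicas_missing`, and the sample namespaces, keys, and hosts are all made up for illustration and are not part of any MogileFS API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class FileClass:
    name: str
    min_replicas: int  # minimum number of distinct hosts that must hold a copy


@dataclass
class StoredFile:
    file_class: FileClass
    hosts: Set[str] = field(default_factory=set)  # hosts currently holding a copy


def replicas_missing(f: StoredFile) -> int:
    """Return how many more hosts need a copy to satisfy the file's class."""
    return max(0, f.file_class.min_replicas - len(f.hosts))


# Hypothetical classes, mirroring the photo-site example.
ORIGINAL = FileClass("original", min_replicas=3)
THUMBNAIL = FileClass("thumbnail", min_replicas=1)

# Files are looked up by (namespace, key), so two applications can use
# the same key without colliding.
index: Dict[Tuple[str, str], StoredFile] = {
    ("photos", "user42/avatar"): StoredFile(ORIGINAL, {"host-a", "host-b"}),
    ("blog", "user42/avatar"): StoredFile(THUMBNAIL, {"host-c"}),
}

for (namespace, key), stored in index.items():
    print(namespace, key, "missing replicas:", replicas_missing(stored))
```

In this model the original photo still needs one more copy on a new host, while the thumbnail already satisfies its class with a single copy.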
