There are many distributed file systems on the market, with new ones appearing all the time.
The main contenders:
MogileFS:
A key-value metadata file system. It has no FUSE support, so applications must go through its API. It is mainly used on the web for serving huge numbers of small images, and for that workload it is far more efficient than MooseFS.
FastDFS:
A key-value file system developed in China as an improvement on MogileFS. It likewise lacks FUSE support, but offers better performance than MogileFS.
MooseFS: supports FUSE and is relatively lightweight, but it has a single point of dependence on the master server. Written in C; its performance is comparatively modest, and it has many users in China.
GlusterFS: supports FUSE; considerably heavier than MooseFS.
Ceph:
Supports FUSE, and its client was merged into the linux-2.6.34 kernel, so Ceph can be chosen as a file system just like ext3 or ReiserFS.
Fully distributed, with no single point of failure; written in C, with good performance.
However, it is built on the still-immature btrfs and is itself far from mature.
Lustre:
Oracle's enterprise-grade product; very large, with deep dependencies on the kernel and ext3.
NFS: the veteran network file system. I have not studied it closely, but NFS has barely evolved in recent years, so it was definitely out.
I originally planned to use MogileFS, since it has the most users and my needs are mostly web-related.
But after studying its API I found that a key-value file system has no directory structure:
you cannot list all files under a subdirectory, you cannot operate on it like a local file system, and everything needs an API call, which is quite unpleasant.
MogileFS probably took this listen-on-a-port-plus-API approach under the influence of memcached, the same team's other famous product,
or perhaps FUSE simply was not yet popular when MogileFS was first designed.
In any case, I resolved to find a distributed file system with FUSE support, which narrowed the choice to MooseFS, GlusterFS, and Ceph.
Technically, Ceph is clearly the best: written in C, merged into the linux-2.6.34 kernel, and built on the btrfs file system, which promises high performance,
while its multi-master architecture completely removes the single point of failure and so delivers high availability.
But Ceph is far too immature: the btrfs it depends on is itself immature, and Ceph's own website explicitly warns against using it in production.
Few people use it in China, and among Linux distributions Ubuntu 10.04 still ships kernel 2.6.32, so Ceph cannot be used directly.
GlusterFS is better suited to very large deployments and its reputation is comparatively poor, so I did not consider it either.
In the end I chose MooseFS, whose strengths and weaknesses are equally obvious.
It has a single point of failure, and its master is memory-hungry; but for my requirements, MooseFS is more than sufficient as storage.
MooseFS has many users in China, quite a few of them running it in production, which reinforced my choice.
The plan: one high-performance server (dual Xeon 5500s, 24 GB RAM) as the master,
and two HP DL360 G4s (six 146 GB SCSI disks each) as chunk servers, forming a distributed file system with a replication goal of 2,
serving every machine in the web tier.
I. About MooseFS
MooseFS is a fault-tolerant network distributed file system.
Distinctive MooseFS features:
* High reliability: data is stored in several copies on different machines.
* Capacity can be expanded dynamically by adding new machines or disks.
* Deleted files are retained for a configurable period of time.
* File snapshots: a consistent copy of a file can be made even while the original is being accessed or written.
II. MooseFS architecture (see diagram)
It comprises four types of machines:
*Managing server(master server)
*Data servers(chunk servers)
*Metadata backup servers(metalogger server)
*Client
III. Supported platforms:
*Linux (Linux 2.6.14 and up have FUSE support included in the official kernel)
*FreeBSD
*NetBSD
*OpenSolaris
*MacOS X
IV. Environment:
Managing server (master server):
OS: CentOS 5.4 IP: 192.168.2.241
Metadata backup server (metalogger server):
OS: CentOS 5.4 IP: 192.168.2.242
Data servers (chunk servers):
OS: CentOS 5.4 IP: 192.168.2.243
OS: CentOS 5.4 IP: 192.168.2.244
Client (mfsmount):
OS: Ubuntu 9.10 IP: 192.168.2.66
V. Installation and configuration
1. On the master server:
* Download and install:
wget http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz
groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfschunkserver --disable-mfsmount --with-default-group=mfs --with-default-user=mfs
make && make install
* Adjust the configuration:
cd /usr/local/mfs/etc
mv mfsexports.cfg.dist mfsexports.cfg
mv mfsmaster.cfg.dist mfsmaster.cfg
cd /usr/local/mfs/var/mfs
mv metadata.mfs.empty metadata.mfs (otherwise startup fails with the following error:)
if this is new instalation then rename metadata.mfs.empty as metadata.mfs
init: file system manager failed !!! error occured during initialization – exiting
* Edit mfsexports.cfg as follows (allowing 192.168.2.66 to mount):
#* / ro
#192.168.1.0/24 / rw
#192.168.1.0/24 / rw,alldirs,maproot=0,password=passcode
#10.0.0.0-10.0.0.5 /test rw,maproot=nobody,password=test
#* . rw
192.168.2.66 / rw,alldirs,maproot=0
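For reference, each active mfsexports.cfg line has three fields: the client address (a single IP, a network, a range, or * for any host), the exported path (/ for the whole tree, . for the meta file system), and a comma-separated option list. A commented sketch, where the subnet line is only an illustration and not part of this setup:

```
# address         path   options
192.168.2.66      /      rw,alldirs,maproot=0   # one host, full tree, root stays root
192.168.2.0/24    /      ro                     # example: a whole subnet, read-only
```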
* Add to /etc/hosts:
192.168.2.241 mfsmaster
* All other configuration files were left at their defaults.
* Start the master:
/usr/local/mfs/sbin/mfsmaster start
* Stop the master:
/usr/local/mfs/sbin/mfsmaster stop
* Start the CGI monitor: /usr/local/mfs/sbin/mfscgiserv
* Stop the CGI monitor: kill the mfscgiserv process, e.g. kill $(pgrep mfscgiserv)
2. On the metalogger server:
* Download and install:
wget http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz
groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfschunkserver --disable-mfsmount --with-default-group=mfs --with-default-user=mfs
make && make install
* Adjust the configuration:
cd /usr/local/mfs/etc
mv mfsmetalogger.cfg.dist mfsmetalogger.cfg
* Add to /etc/hosts:
192.168.2.241 mfsmaster
* Everything else keeps the default configuration.
* Start the metalogger:
/usr/local/mfs/sbin/mfsmetalogger start
* Stop the metalogger:
/usr/local/mfs/sbin/mfsmetalogger stop
Note: the metalogger connects to port 9419 on the master, so make sure the firewall lets that port through; I simply disabled iptables while testing.
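If iptables is to stay enabled on the master, its listening ports need to be opened instead. A sketch of an /etc/sysconfig/iptables fragment, assuming the standard MooseFS port numbers (they should match the settings in mfsmaster.cfg):

```
# /etc/sysconfig/iptables fragment - MooseFS master ports
# 9419 metalogger, 9420 chunkservers, 9421 clients, 9425 CGI monitor
-A INPUT -p tcp --dport 9419 -j ACCEPT
-A INPUT -p tcp --dport 9420 -j ACCEPT
-A INPUT -p tcp --dport 9421 -j ACCEPT
-A INPUT -p tcp --dport 9425 -j ACCEPT
```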
When the master fails and must be rebuilt, both metadata.mfs.back and the most recent changelog have to be copied from the metalogger server; neither one alone is enough.
3. On the chunk servers:
* Download and install:
wget http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz
groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfsmaster --disable-mfsmount --with-default-user=mfs --with-default-group=mfs
make && make install
* Adjust the configuration:
cd /usr/local/mfs/etc
mv mfschunkserver.cfg.dist mfschunkserver.cfg
mv mfshdd.cfg.dist mfshdd.cfg
* Edit mfshdd.cfg to contain:
/store-data
Note: mfshdd.cfg lists the mount points MooseFS may use for storage, one per line. Here that is the /store-data partition; delete the unused example entries, and if you have more partitions, add each of them.
* Add to /etc/hosts:
192.168.2.241 mfsmaster
I left mfschunkserver.cfg at its defaults.
* Change the ownership of /store-data:
chown -R mfs:mfs /store-data
* Start the chunk server:
/usr/local/mfs/sbin/mfschunkserver start
* Stop the chunk server:
/usr/local/mfs/sbin/mfschunkserver stop
The second chunk server is configured exactly the same way, so the steps are not repeated here.
4. Client configuration:
The client needs FUSE (official site: http://fuse.sourceforge.net/); on Ubuntu this means the libfuse-dev package.
* Install FUSE:
Since my client runs Ubuntu 9.10, I installed FUSE directly with apt-get install libfuse-dev.
On non-Ubuntu systems it can be built from source as follows:
wget http://cdnetworks-kr-2.dl.sourceforge.net/project/fuse/fuse-2.X/2.8.3/fuse-2.8.3.tar.gz
tar -zxvf fuse-2.8.3.tar.gz
cd fuse-2.8.3
./configure --prefix=/usr/local/fuse
make
make install
* Install MooseFS:
wget http://moosefs.com/tl_files/mfscode/mfs-1.6.13.tar.gz
groupadd mfs
useradd mfs -g mfs
tar -zxvf mfs-1.6.13.tar.gz
cd mfs-1.6.13
./configure --prefix=/usr/local/mfs --disable-mfsmaster --disable-mfschunkserver --enable-mfsmount --with-default-user=mfs --with-default-group=mfs
make && make install
* Mount MooseFS:
mkdir -p /media/mfs
/usr/local/mfs/bin/mfsmount -H mfsmaster /media/mfs/
mfsmaster accepted connection with parameters:read-write,restricted_ip;root mapped to root:root
* Check the mount:
df -h | grep mfs
mfsmaster:9421 1.8T 139G 1.6T 9% /media/mfs
* Mount the MFSMETA file system (needed later for trash recovery):
mkdir /mnt/mfsmeta
/usr/local/mfs/bin/mfsmount -m /mnt/mfsmeta/ -H mfsmaster
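To remount automatically after a reboot, the simplest approach on this client is to add the mount command to /etc/rc.local; a sketch, assuming mfsmaster already resolves via /etc/hosts as configured above:

```
# /etc/rc.local fragment - mount MooseFS at boot
/usr/local/mfs/bin/mfsmount -H mfsmaster /media/mfs/
```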
* Unmounting:
Use the ordinary Linux umount command, e.g.:
[root@www ~]# umount /media/mfs
If instead you see:
[root@www ~]# umount /media/mfs
umount: /media/mfs: device is busy
umount: /media/mfs: device is busy
then a local process is still using the file system; find out which command it is and exit it rather than forcing the unmount.
* Client-side operations:
* Set the goal (the number of copies kept) for files:
/usr/local/mfs/bin/mfssetgoal -r 3 /media/mfs
(-r applies the setting recursively)
* Check it:
/usr/local/mfs/bin/mfsgetgoal /media/mfs/ubuntu-9.10-server-i386.iso
/media/mfs/ubuntu-9.10-server-i386.iso: 3
* Set the retention time for deleted files to 600 seconds (10 minutes):
/usr/local/mfs/bin/mfssettrashtime -r 600 /media/mfs
* More operations are documented at:
http://www.moosefs.com/reference-guide.html#operations-specific-for-moosefs
VI. Failure recovery test
To simulate a crash, I deleted /usr/local/mfs and rebooted the machine; the file system is now clearly unusable. Reinstall MooseFS with the same configuration as before.
Copy metadata_ml.mfs.back and the most recent changelog (here changelog_ml.30.mfs) from the metadata backup server to the freshly installed master, then run the restore command:
/usr/local/mfs/sbin/mfsmetarestore -m metadata_ml.mfs.back -o metadata.mfs changelog_ml.30.mfs
loading objects (files,directories,etc.) ... ok
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... ok
connecting files and chunks ... ok
applying changes from file: changelog_ml.30.mfs
meta data version: 4633
4765: version mismatch
Then run:
/usr/local/mfs/sbin/mfsmetarestore -a
file 'metadata.mfs.back' not found - will try 'metadata_ml.mfs.back' instead
loading objects (files,directories,etc.) ... ok
loading names ... ok
loading deletion timestamps ... ok
checking filesystem consistency ... ok
loading chunks data ... ok
connecting files and chunks ... ok
applying changes from file: /usr/local/mfs/var/mfs/changelog_ml.30.mfs
meta data version: 4633
4765: version mismatch
Finally, start the master and check how much of the file system was recovered.
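Before merging, it is worth a quick sanity check that the changelog copied from the metalogger really is newer than the metadata snapshot; a minimal sketch (the two files are created here only to demonstrate the comparison, their names mirror the ones above):

```shell
# Create stand-in files: a backdated metadata snapshot and a fresh changelog.
dir=$(mktemp -d)
touch -t 202001010000 "$dir/metadata_ml.mfs.back"   # backdated snapshot
touch "$dir/changelog_ml.30.mfs"                    # just-copied changelog
# -nt compares modification times: the changelog should be the newer file.
if [ "$dir/changelog_ml.30.mfs" -nt "$dir/metadata_ml.mfs.back" ]; then
    newer=yes
    echo "changelog is newer than the snapshot - safe to merge"
else
    newer=no
fi
```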
File recovery from trash
Removed files may be accessed through a separately mounted MFSMETA file system.
In particular it contains directories /trash (containing information about deleted files that are still being stored)
and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).
$ mfssettrashtime 3600 /mnt/mfs-test/test1
/mnt/mfs-test/test1: 3600
$ rm /mnt/mfs-test/test1
$ ls /mnt/mfs-test/test1
ls: /mnt/mfs-test/test1: No such file or directory
The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number
and a path to the file relative to the mounting point with characters / replaced with the | character.
If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
The full path of the file in relation to the mounting point can be read or saved by reading or saving this special file:
# ls -l /mnt/mfs-test-meta/trash/*test1
-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test1
# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'
# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'
test/test2
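The name transformation itself is easy to reproduce; a small sketch (the i-node value is made up for illustration):

```shell
# Build a trash entry name: the 8-digit hex i-node number, followed by the
# path relative to the mount point with every '/' turned into '|'.
inode=00013BC7
path="/test/test2"
entry="${inode}$(printf '%s' "$path" | tr '/' '|')"
echo "$entry"   # 00013BC7|test|test2
```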
Moving this file to the trash/undel subdirectory restores the original file in the regular
MooseFS file system, at the path saved as described above, or at the original path if it was not changed.
# mv /mnt/mfs-test-meta/trash/00013BC7|test1 /mnt/mfs-test-meta/trash/undel/
Note: if a new file with the same path already exists, restoring will not succeed.
Likewise, the file cannot be moved into undel under a different name.
VII. MooseFS maintenance (cluster start-up and shutdown order)
Starting MooseFS cluster
The safest way to start MooseFS (avoiding any read or write errors, inaccessible data or similar problems)
is to run the following commands in this sequence:
start mfsmaster process
start all mfschunkserver processes
start mfsmetalogger processes (if configured)
when all chunkservers get connected to the MooseFS master, the filesystem can be mounted on any number of clients using mfsmount
(you can check if all chunkservers are connected by checking master logs or CGI monitor).
Stopping MooseFS cluster
To safely stop MooseFS:
unmount MooseFS on all clients (using the umount command or an equivalent)
stop chunkserver processes with the mfschunkserver stop command
stop metalogger processes with the mfsmetalogger stop command
stop master process with the mfsmaster stop command.
Maintenance of MooseFS chunkservers
Provided that there are no files with a goal below 2 and no under-goal files
(which can be checked with the mfsgetgoal -r and mfsdirinfo commands), a single chunkserver can be stopped or restarted at any time.
When you need to stop or restart another chunkserver afterwards, be sure that the previous one is connected and there are no under-goal chunks.
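The pre-restart check can be scripted. The sketch below parses sample text standing in for real command output; the "undergoal" field name is an assumption for illustration, not verified against a live cluster, so a real script would adapt the pattern to the actual output:

```shell
# Hard-coded stand-in for goal-report output; a real script would capture
# the output of the check commands on a mounted MooseFS tree instead.
sample='chunks: 3040
undergoal chunks: 0'
# Extract the undergoal count (field after ': ' on the matching line).
under=$(printf '%s\n' "$sample" | awk -F': ' '/undergoal/ {print $2}')
if [ "$under" -eq 0 ]; then
    echo "no under-goal chunks - safe to stop one chunkserver"
fi
```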
MooseFS metadata backups
There are two general parts of metadata:
main metadata file (metadata.mfs, named metadata.mfs.back when the mfsmaster is running), synchronized each hour
metadata changelogs (changelog.*.mfs), stored for last N hours (configured by BACK_LOGS setting)
The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored.
Metadata changelogs should be automatically replicated in real time. Since MooseFS 1.6.5, both tasks are done by mfsmetalogger daemon.
MooseFS master recovery
In case of mfsmaster crash (due to e.g. host or power failure) last metadata changelog needs to be merged into the main metadata file.
It can be done with the mfsmetarestore utility; the simplest way to use it is:
$ mfsmetarestore -a
If the master data are stored in a location other than the one specified during MooseFS compilation,
the actual path needs to be given with the -d option, e.g.:
$ mfsmetarestore -a -d /storage/mfsmaster
MooseFS master recovery from a backup
In order to restore the master host from a backup:
install mfsmaster in the normal way
configure it using the same settings (e.g. by retrieving mfsmaster.cfg file from the backup)
retrieve metadata.mfs.back file from the backup or metalogger host, place it in mfsmaster data directory
copy last metadata changelogs from any metalogger running just before master failure into mfsmaster data directory
merge metadata changelogs using mfsmetarestore command as specified before - either using mfsmetarestore -a,
or by specifying actual file names using non-automatic mfsmetarestore syntax, e.g.
$ mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog.*.mfs
Please also read the mini-howto about preparing a fail-proof setup for master server outages.
That document presents a solution using CARP, in which the metalogger takes over the role of the failed master server.