A MooseFS 1.6.11 failure case


1. System overview

Server           MooseFS version   OS
mfsmaster        1.6.11            CentOS 5.5
mfschunkserver   1.6.11            CentOS 5.5
mfsmetalogger    1.6.11            CentOS 5.5

2. Incident replay

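At 15:11 on April 10, 2012, a colleague checking the servers noticed that the clock on the mfschunkserver machine was off from the real time by more than an hour, so he synchronized it with ntpdate. That is when the nightmare began.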

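mfsmaster and mfschunkserver lost their connection. We already knew that a clock change on a chunkserver briefly breaks the connection between mfsmaster and mfschunkserver, but this time the connection still could not be re-established after 7 minutes.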

The mfsmaster server logged the following errors:

Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885142: 2011/blog/article_photo_b/1023/21/nzGZBk.JPG
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003F769A (inode: 3885143 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885143: 2011/his/blog/head_photo/1023/21/5sCEGW_cut.jpg
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003FB946 (inode: 3885144 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885144: blog/head_photo/118/11850/1185078_l.gif
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003FB947 (inode: 3885145 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885145: blog/head_photo/118/11850/1185078_m.gif
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003FB948 (inode: 3885146 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885146: blog/head_photo/118/11850/1185078_s.gif
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003F769E (inode: 3885147 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885147: 2011/his/blog/head_photo/1023/21/BWpiZ4.gif
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003F769F (inode: 3885148 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885148: 2011/his/blog/head_photo/1023/21/GiPKeB_cut.jpg
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003F76A3 (inode: 3885149 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885149: 2011/his/blog/head_photo/1023/21/FRnrnh.gif
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003F76A4 (inode: 3885150 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885150: 2011/blog/article_photo_s/1023/21/zpW6XB.JPG
Apr 10 15:19:49 ngmaster mfsmaster[1318]: currently unavailable chunk 00000000003F76A5 (inode: 3885151 ; index: 0)
Apr 10 15:19:49 ngmaster mfsmaster[1318]: * currently unavailable file 3885151: 2011/blog/article_photo_b/1023/21/zpW6XB.JPG

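These errors were not new to us: while testing mfs earlier, a colleague and I had confirmed that they can be repaired from the client side with mfsfilerepair. The problem this time was that none of the clients could get into the mount at all, so mfsfilerepair could not be run.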

After some time, the mfschunkserver server also logged the following:

Apr 10 16:44:42 chunker1 ^XX`[10952]: closing *:9422
Apr 10 16:45:05 chunker1 mfschunkserver[11053]: set gid to 500
Apr 10 16:45:05 chunker1 mfschunkserver[11053]: set uid to 500
Apr 10 16:45:27 chunker1 sshd(pam_unix)[11061]: session opened for user root by (uid=0)
Apr 10 16:54:47 chunker1 mfschunkserver[11055]: listen on *:9422
Apr 10 16:54:47 chunker1 mfschunkserver[11055]: connecting ...
Apr 10 16:54:47 chunker1 mfschunkserver[11055]: open files limit: 10000
Apr 10 16:54:47 chunker1 mfschunkserver[11055]: connected to Master
Apr 10 16:54:52 chunker1 mfschunkserver[11055]: testing chunk: /xxtdata/3C/chunk_00000000002B013C_00000001.mfs
Apr 10 16:55:02 chunker1 mfschunkserver[11055]: testing chunk: /xxtdata/36/chunk_00000000002B0136_00000001.mfs
Apr 10 16:55:12 chunker1 mfschunkserver[11055]: testing chunk: /xxtdata/34/chunk_00000000002B0134_00000001.mfs
Apr 10 16:55:22 chunker1 mfschunkserver[11055]: testing chunk: /xxtdata/2E/chunk_00000000002B012E_00000001.mfs
Apr 10 16:55:32 chunker1 mfschunkserver[11055]: testing chunk: /xxtdata/A0/chunk_00000000001903A0_00000002.mfs
Apr 10 16:55:42 chunker1 mfschunkserver[11055]: testing chunk: /xxtdata/1D/chunk_00000000002B011D_00000001.mfs
Apr 10 16:55:50 chunker1 mfschunkserver[11055]: connecting ...
Apr 10 16:55:50 chunker1 mfschunkserver[11055]: connected to Master

 

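In the meantime we tried restarting the mfsmaster service and the mfschunkserver servers to bring MooseFS back up, but every attempt failed. We contacted peers in China who had studied mfs with us, as well as the developers through the official MooseFS website, also without result. With no other option left, we went through the moosefs-1.6.11 source code looking for the error and found that "currently unavailable chunk" comes from the following code in mfsmaster/filesystem.c: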

for (k=0 ; k<(NODEHASHSIZE/3600) && i<NODEHASHSIZE ; k++,i++) {
    for (f=nodehash[i] ; f ; f=f->next) {
        if (f->type==TYPE_FILE || f->type==TYPE_TRASH || f->type==TYPE_RESERVED) {
            valid = 1;
            ugflag = 0;
            for (j=0 ; j<f->data.fdata.chunks ; j++) {
                chunkid = f->data.fdata.chunktab[j];
                if (chunkid>0) {
                    if (chunk_get_validcopies(chunkid,&vc)!=STATUS_OK) {
                        syslog(LOG_ERR,"structure error - chunk %016"PRIX64" not found (inode: %"PRIu32" ; index: %"PRIu32")",chunkid,f->id,j);
                        if (leng<MSGBUFFSIZE) {
                            leng += snprintf(msgbuff+leng,MSGBUFFSIZE-leng,"structure error - chunk %016"PRIX64" not found (inode: %"PRIu32" ; index: %"PRIu32")\n",chunkid,f->id,j);
                        }
                        valid =0;
                        mchunks++;
                    } else if (vc==0) {
                        syslog(LOG_ERR,"currently unavailable chunk %016"PRIX64" (inode: %"PRIu32" ; index: %"PRIu32")",chunkid,f->id,j);
                        if (leng<MSGBUFFSIZE) {
                            leng += snprintf(msgbuff+leng,MSGBUFFSIZE-leng,"currently unavailable chunk %016"PRIX64" (inode: %"PRIu32" ; index: %"PRIu32")\n",chunkid,f->id,j);
                        }
                        valid = 0;
                        mchunks++;
                    } else if (vc<f->goal) {
                        ugflag = 1;
                        ugchunks++;
                    }
                    chunks++;
                }
            }

In moosefs-1.6.24 the corresponding code is:

for (k=0 ; k<(NODEHASHSIZE/14400) && i<NODEHASHSIZE ; k++,i++) {
    for (f=nodehash[i] ; f ; f=f->next) {
        if (f->type==TYPE_FILE || f->type==TYPE_TRASH || f->type==TYPE_RESERVED) {
            valid = 1;
            ugflag = 0;
            for (j=0 ; j<f->data.fdata.chunks ; j++) {
                chunkid = f->data.fdata.chunktab[j];
                if (chunkid>0) {
                    if (chunk_get_validcopies(chunkid,&vc)!=STATUS_OK) {
                        if (errors<ERRORS_LOG_MAX) {
                            syslog(LOG_ERR,"structure error - chunk %016"PRIX64" not found (inode: %"PRIu32" ; index: %"PRIu32")",chunkid,f->id,j);
                            if (leng<MSGBUFFSIZE) {
                                leng += snprintf(msgbuff+leng,MSGBUFFSIZE-leng,"structure error - chunk %016"PRIX64" not found (inode: %"PRIu32" ; index: %"PRIu32")\n",chunkid,f->id,j);
                            }
                            errors++;
                        }
                        notfoundchunks++;
                        if ((notfoundchunks%1000)==0) {
                            syslog(LOG_ERR,"unknown chunks: %"PRIu32" ...",notfoundchunks);
                        }
                        valid =0;
                        mchunks++;
                    } else if (vc==0) {
                        if (errors<ERRORS_LOG_MAX) {
                            syslog(LOG_ERR,"currently unavailable chunk %016"PRIX64" (inode: %"PRIu32" ; index: %"PRIu32")",chunkid,f->id,j);
                            if (leng<MSGBUFFSIZE) {
                                leng += snprintf(msgbuff+leng,MSGBUFFSIZE-leng,"currently unavailable chunk %016"PRIX64" (inode: %"PRIu32" ; index: %"PRIu32")\n",chunkid,f->id,j);
                            }
                            errors++;
                        }
                        unavailchunks++;
                        if ((unavailchunks%1000)==0) {
                            syslog(LOG_ERR,"unavailable chunks: %"PRIu32" ...",unavailchunks);
                        }
                        valid = 0;
                        mchunks++;
                    } else if (vc<f->goal) {
                        ugflag = 1;
                        ugchunks++;
                    }
                    chunks++;
                }
            }

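Comparing the NODEHASHSIZE divisor in the two versions shows that each call of this loop walks only NODEHASHSIZE/3600 hash buckets in 1.6.11 and NODEHASHSIZE/14400 in 1.6.24. In other words, one full pass over the table originally took 3600 s (one hour) and was later stretched to 14400 s (4 hours). We therefore tried upgrading the mfsmaster server (the installation procedure is not repeated here). Very soon the connection between mfschunkserver and mfsmaster was restored, and mfsclient mounts went back to normal. Service was finally restored at 19:05, for a total outage of 3 hours 54 minutes.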

Of course, to keep external users served, a standby system took over during the outage. Otherwise we would have been buried in complaints.

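To keep versions consistent across the whole system, we later upgraded the chunkservers, mfsmount and mfsclient as well, working overtime to do it. Here is a brief summary of what is new in moosefs-1.6.24: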

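     1. The new version no longer has the 2 TB file size limit; a single file can now be up to 128 PB.
     2. Chunkservers no longer check the attributes of every chunk during initialization, which speeds up chunkserver startup.
     3. A 'test' command was added; it returns the pid of the program if it is running.
     4. mfscgiserv now has a lockfile/pidfile, so 'start', 'stop', 'restart' and 'test' can be used with it.
     5. Simple network topology support ("rack awareness") was added; see mfstopology.cfg for details.
     6. When the master server's disk has no space left for metadata, the metadata is stored on a backup server.
     7. Hidden files '.oplog' and '.ophistory' were added; they expose detailed current and historical operations performed through mfsmount, including the client's pid/uid/gid and the type of file operation.
     8. The mount password can be stored in mfsmount.cfg (instead of fstab, which is readable by everyone).
     9. Symlinks are cached on the client side.
     10. The long-standing problem of frequent connections to the master server was fixed: creating many snapshots or using mfstools no longer requires new connections (a proxy is provided for mfstools).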

 

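So, for the sake of system stability, it is worth updating software versions at suitable times. moosefs-1.6.24 in particular made substantial changes to the chunkserver that speed up its startup: because the new version skips the scan of chunk attributes, mfschunkserver saves a great deal of time when it starts. In our tests on the same set of files, mfs-1.6.11 needed 110 s while mfs-1.6.24 needed only 29 s, so chunkserver startup is roughly 3.8 times faster.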

 
