Explaining Drive States in EMC Isilon Storage Systems

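I'm writing this post because I keep running into the same situation. A customer buys drives from us and swaps one in. If the drive shows any state other than HEALTHY, they declare it faulty and demand a return, or simply send it back. When we ask the engineer what state the drive was actually in, the answer is usually something like "not recognized" or "not synced". These answers sound profound but tell you nothing about the drive's real state.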

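Many half-trained engineers think a drive has only two states: working and failed. On Isilon, "working" means HEALTHY, and everything else counts as failed. In fact, every intelligent storage system defines many drive states, each with its own meaning. Only once you know the state can you decide what to do next.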

The table below lists every Isilon drive state. It looks a bit complicated, so afterwards I explain the most common ones:

Columns: State / Description / Where to view / Error state

HEALTHY
  Description: All drives in the node are functioning correctly.
  Note: Nothing more to explain here; the drive is working normally.
  Where to view: Command-line interface, web administration interface
  Error state: no

SMARTFAIL or Smartfail or restripe in progress
  Description: The drive is in the process of being removed safely from the file system, either because of an I/O error or by user request. Nodes or drives in a smartfail or read-only state affect only write quorum.
  Note: The drive has failed (or the user triggered the smartfail), and the system is copying the drive's data to other drives.
  Where to view: Command-line interface, web administration interface
  Error state: no

NOT AVAILABLE
  Description: A drive is unavailable for a variety of reasons. You can click the bay to view detailed information about this condition. In the web administration interface, this state includes the ERASE and SED_ERROR command-line interface states.
  Where to view: Command-line interface, web administration interface
  Error state: yes

SUSPENDED
  Description: This state indicates that drive activity is temporarily suspended and the drive is not in use. The state is manually initiated and does not occur during normal cluster activity.
  Where to view: Command-line interface, web administration interface
  Error state: no

NOT IN USE
  Description: A node in an offline state affects both read and write quorum.
  Where to view: Command-line interface, web administration interface
  Error state: no

REPLACE
  Description: The drive was smartfailed successfully and is ready to be replaced.
  Where to view: Command-line interface only
  Error state: no

STALLED
  Description: The drive is stalled and undergoing stall evaluation. Stall evaluation is the process of checking drives that are slow or having other issues. Depending on the outcome of the evaluation, the drive may return to service or be smartfailed. This is a transient state.
  Note: The drive may have failed, and the system is re-checking whether it really has.
  Where to view: Command-line interface only
  Error state: no

NEW
  Description: The drive is new and blank. This is the state that a drive is in when you run the isi dev command with the -a add option.
  Note: A brand-new drive.
  Where to view: Command-line interface only
  Error state: no

USED
  Description: The drive was added and contained an Isilon GUID but the drive is not from this node. This drive likely will be formatted into the cluster.
  Note: A drive pulled from another system (previously used).
  Where to view: Command-line interface only
  Error state: no

PREPARING
  Description: The drive is undergoing a format operation. The drive state changes to HEALTHY when the format is successful.
  Where to view: Command-line interface only
  Error state: no

EMPTY
  Description: No drive is in this bay.
  Where to view: Command-line interface only
  Error state: no

WRONG_TYPE
  Description: The drive type is wrong for this node. For example, a non-SED drive in a SED node, or a SAS drive instead of the expected SATA drive type.
  Where to view: Command-line interface only
  Error state: no

BOOT_DRIVE
  Description: Unique to the A100 node, which has boot drives in its bays.
  Note: A system boot drive.
  Where to view: Command-line interface only
  Error state: no

SED_ERROR
  Description: The drive cannot be acknowledged by the OneFS system. In the web administration interface, this state is included in Not available.
  Where to view: Command-line interface, web administration interface
  Error state: yes

ERASE
  Description: The drive is ready for removal but needs your attention because the data has not been erased. You can erase the drive manually to guarantee that data is removed. In the web administration interface, this state is included in Not available.
  Where to view: Command-line interface only
  Error state: no

INSECURE
  Description: Data on the self-encrypted drive is accessible by unauthorized personnel. Self-encrypting drives should never be used for non-encrypted data purposes. In the web administration interface, this state is labeled Unencrypted SED.
  Where to view: Command-line interface only
  Error state: yes

UNENCRYPTED SED
  Description: Data on the self-encrypted drive is accessible by unauthorized personnel. Self-encrypting drives should never be used for non-encrypted data purposes. In the command-line interface, this state is labeled INSECURE.
  Where to view: Web administration interface only
  Error state: yes

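The last few states relate to SED (self-encrypting) drives, so I won't go into them; explaining more would only add confusion. Very few customers in mainland China use SED drives; they are used mainly by a small number of multinational companies.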
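The table is easier to reason about once it is machine-readable. The Python sketch below copies the "Where to view" and "Error state" columns into a dictionary. DRIVE_STATES, Interface, is_error_state and shown_in are hypothetical names made up for this post; they are not part of OneFS or any Isilon API. The sketch makes the point of this post concrete: only four states in the table are flagged as error states, and STALLED, NEW, USED and PREPARING are not among them.

```python
from enum import Flag, auto


class Interface(Flag):
    """Where a state can be seen (hypothetical helper type, not a OneFS API)."""
    CLI = auto()
    WEB = auto()


BOTH = Interface.CLI | Interface.WEB

# state -> (interfaces that show it, True if the table marks it as an error state)
DRIVE_STATES = {
    "HEALTHY": (BOTH, False),
    "SMARTFAIL": (BOTH, False),  # "SMARTFAIL or Smartfail or restripe in progress"
    "NOT AVAILABLE": (BOTH, True),
    "SUSPENDED": (BOTH, False),
    "NOT IN USE": (BOTH, False),
    "REPLACE": (Interface.CLI, False),
    "STALLED": (Interface.CLI, False),
    "NEW": (Interface.CLI, False),
    "USED": (Interface.CLI, False),
    "PREPARING": (Interface.CLI, False),
    "EMPTY": (Interface.CLI, False),
    "WRONG_TYPE": (Interface.CLI, False),
    "BOOT_DRIVE": (Interface.CLI, False),
    "SED_ERROR": (BOTH, True),
    "ERASE": (Interface.CLI, False),
    "INSECURE": (Interface.CLI, True),
    "UNENCRYPTED SED": (Interface.WEB, True),
}


def is_error_state(state: str) -> bool:
    """True only for states marked in the Error state column of the table."""
    return DRIVE_STATES[state.upper()][1]


def shown_in(state: str, interface: Interface) -> bool:
    """True if the given interface displays this state."""
    return interface in DRIVE_STATES[state.upper()][0]


error_states = sorted(s for s, (_, err) in DRIVE_STATES.items() if err)
print(error_states)  # ['INSECURE', 'NOT AVAILABLE', 'SED_ERROR', 'UNENCRYPTED SED']
```

A drive that is "not HEALTHY" is therefore not automatically an error; the state tells you which step of the process the drive is in.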

Here are the key states explained:

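  • HEALTHY: the normal state of a drive. After replacing a drive, it must reach this state before the job is done; every other state means some issue remains and still needs handling.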
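  • SMARTFAIL: this is where half-trained engineers most often go wrong. While a drive is in SMARTFAIL you absolutely must not replace it, because the system is still copying the drive's data to other drives. This state lasts a long time, typically 1-2 days, and even longer if something goes wrong.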
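  • SUSPENDED: the drive is suspended. This state is common on Isilon Gen6 systems: before replacing a faulty drive, you must first put the drives in that sled into SUSPENDED, and only then replace the faulty drive.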
  • REPLACE: another very important state. A drive may be physically replaced only once it has reached REPLACE.
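  • STALLED: roughly "stalled" or "slowed down", similar in spirit to NetApp's Maintenance Center. The system suspects the drive may be faulty but is not certain, so it runs a series of tests on it. If the drive passes, it returns to HEALTHY; if it fails, it moves to SMARTFAIL. So if you see this state, do not replace the drive.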
  • NEW: the first state a drive shows after replacement, provided the drive is brand new rather than pulled from another system.
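  • USED: also a very common state. It means the drive was pulled from another system and has been used before. Second-hand drives usually show this state after insertion; you then run add, and sometimes even format the drive, before it becomes normal. The format here is not a traditional low-level format; it only creates two partitions to meet the Isilon file system's requirements, so it is fast and finishes in a few minutes.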
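  • PREPARING: an intermediate state. After you add a drive it first shows PREPARING, and once it has been added to the storage it becomes HEALTHY, usually within a few minutes.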
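  • EMPTY: as the name says, the bay is empty, meaning the system did not detect a drive. "Not detected" can mean two things. Either the operating system did not see the drive at all, which may point to a faulty drive or a problem in the SAS back end; or the underlying OS did see the drive, but OneFS, Isilon's own OS, cannot recognize it. The second case was common in older OneFS versions and is a bug.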
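The lifecycle described in the bullets above can be written down as a small state-transition model. This is a minimal Python sketch, not how OneFS implements drive states. The transitions come from the bullets, except the three marked "assumed" in the comments (SUSPENDED back to HEALTHY, REPLACE to EMPTY when the drive is pulled, EMPTY to NEW or USED when a drive is inserted), which are illustrative assumptions. TRANSITIONS, safe_to_pull and validate_path are hypothetical names. The model encodes the recommended procedure, so a step it rejects, such as pulling a drive straight out of SMARTFAIL, is exactly the mistake this post warns about.

```python
# Allowed next states, following the recommended replacement procedure.
TRANSITIONS = {
    "NEW": {"PREPARING"},                 # brand-new drive, after it is added
    "USED": {"PREPARING"},                # previously used drive, after add (and format if needed)
    "PREPARING": {"HEALTHY"},             # format finished successfully
    "HEALTHY": {"STALLED", "SMARTFAIL", "SUSPENDED"},
    "STALLED": {"HEALTHY", "SMARTFAIL"},  # outcome of stall evaluation
    "SMARTFAIL": {"REPLACE"},             # data has been copied to other drives
    "SUSPENDED": {"HEALTHY"},             # assumed: suspension lifted after sled service
    "REPLACE": {"EMPTY"},                 # assumed: drive physically pulled, bay now empty
    "EMPTY": {"NEW", "USED"},             # assumed: a replacement drive is inserted
}


def safe_to_pull(state: str) -> bool:
    """The rule from this post: only a drive in REPLACE may be physically removed."""
    return state == "REPLACE"


def validate_path(path: list[str]) -> list[tuple[str, str]]:
    """Return every step in `path` that the model above does not allow."""
    return [(a, b) for a, b in zip(path, path[1:])
            if b not in TRANSITIONS.get(a, set())]


failed_drive = ["HEALTHY", "STALLED", "SMARTFAIL", "REPLACE",
                "EMPTY", "USED", "PREPARING", "HEALTHY"]
print(validate_path(failed_drive))                   # []
print([s for s in failed_drive if safe_to_pull(s)])  # ['REPLACE']
print(validate_path(["SMARTFAIL", "EMPTY"]))         # [('SMARTFAIL', 'EMPTY')]
```

The last line is the classic error: the drive left the bay while the system was still copying its data elsewhere.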

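That's all for today. The states above all occur during normal operation, but abnormal situations happen too: a smartfail that makes no progress for days, a drive stuck in PREPARING for days without becoming HEALTHY, or a bay that stays EMPTY no matter how many drives you swap in. For specific problems, add me on WeChat at StorageExpert for further discussion.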
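Abnormal cases like these can be spotted earlier with a simple check on how long a drive has stayed in one state. This is a minimal Python sketch, assuming you already collect each drive's state, how long it has been in that state, and (for EMPTY) whether the underlying OS sees the disk; how to gather that information is outside this sketch. The thresholds in EXPECTED_MAX_HOURS are hypothetical values loosely based on the durations quoted in this post, and DriveObservation, triage and the bay labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits in hours, loosely based on the durations quoted in this post
# (SMARTFAIL: usually 1-2 days; PREPARING: a few minutes). Tune for your own cluster.
EXPECTED_MAX_HOURS = {
    "SMARTFAIL": 48.0,
    "PREPARING": 1.0,
}


@dataclass
class DriveObservation:
    bay: str                           # hypothetical bay label, e.g. "1:5"
    state: str                         # state as reported by OneFS
    hours_in_state: float              # how long the drive has been in this state
    seen_by_os: Optional[bool] = None  # for EMPTY: does the underlying OS see the disk?


def triage(obs: DriveObservation) -> str:
    """Suggest a next step for one drive, following the cases described in this post."""
    limit = EXPECTED_MAX_HOURS.get(obs.state)
    if limit is not None and obs.hours_in_state > limit:
        return (f"{obs.bay}: {obs.state} for {obs.hours_in_state:.0f} h, "
                "longer than expected; investigate before touching the drive")
    if obs.state == "EMPTY":
        if obs.seen_by_os is None:
            return f"{obs.bay}: EMPTY; first check whether the underlying OS detects the disk"
        if obs.seen_by_os:
            return (f"{obs.bay}: OS sees the disk but OneFS does not; "
                    "suspect a OneFS issue (a known bug in older releases)")
        return f"{obs.bay}: OS does not see the disk; suspect the drive itself or the SAS back end"
    return f"{obs.bay}: {obs.state}, no action suggested by this sketch"


for o in [DriveObservation("1:5", "SMARTFAIL", 72),
          DriveObservation("2:3", "EMPTY", 0.5, seen_by_os=True)]:
    print(triage(o))
```

Checking elapsed time first means a drive stuck in SMARTFAIL or PREPARING is flagged before anyone decides to pull it.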
