Like raidtools, mdadm can simulate a disk failure in software, using the --fail or --set-faulty option:
[root@localhost eric4ever]# mdadm --set-faulty --help
Usage: mdadm arraydevice options component devices...
This usage is for managing the component devices within an array.
The --manage option is not needed and is assumed if the first argument
is a device name or a management option.
The first device listed will be taken to be an md array device, and
subsequent devices are (potential) components of that array.
Options that are valid with management mode are:
--add -a : hotadd subsequent devices to the array
--remove -r : remove subsequent devices, which must not be active
--fail -f : mark subsequent devices a faulty
--set-faulty : same as --fail
--run -R : start a partially built array
--stop -S : deactivate array, releasing all resources
--readonly -o : mark array as readonly
--readwrite -w : mark array as readwrite
[root@localhost eric4ever]# mdadm --fail --help
(the output is identical to that of mdadm --set-faulty --help above)
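As the usage text explains, --manage is implied whenever the first argument is an array device, so management options can be given directly after the array name. For instance (a hypothetical example, not run in this session), an array can be toggled read-only and back with:

mdadm /dev/md0 --readonly
mdadm /dev/md0 --readwrite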
Next, let's simulate a failure of /dev/sdb:
[root@localhost eric4ever]# mdadm --manage --set-faulty /dev/md0 /dev/sdb
mdadm: set /dev/sdb faulty in /dev/md0
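Because manage mode processes its options from left to right, a disk can be marked faulty and removed in a single invocation, e.g.:

mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb

Here we only mark /dev/sdb faulty, so that we can watch the spare take over.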
Check the system log; if a spare disk is configured, messages like the following may appear:
kernel: raid5: Disk failure on sdb, disabling device.
kernel: md0: resyncing spare disk sde to replace failed disk
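On a production system you would not normally watch the log by hand; mdadm's monitor mode can poll the array and mail a report when a disk fails. A minimal sketch (the root recipient and 300-second interval are assumptions, adjust as needed):

mdadm --monitor --mail=root --delay=300 --daemonise /dev/md0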
Check /proc/mdstat: if a usable spare disk is configured, the array may already have started rebuilding.
Let's first check the RAID status with the mdadm --detail /dev/md0 command:
[root@localhost eric4ever]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Array Size : 16777088 (16.00 GiB 17.18 GB)
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 24 14:07:55 2007
State : active, degraded, recovering
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 3% complete
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Events : 0.6
Number Major Minor RaidDevice State
0 8 16 0 faulty spare /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
3 8 64 3 spare rebuilding /dev/sde
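For scripting or quick checks, the fields of interest can be filtered out of this report, for example:

mdadm --detail /dev/md0 | grep -E 'State|Rebuild'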
Check /proc/mdstat:
[root@localhost eric4ever]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdb[4] sde[3] sdd[2] sdc[1]
16777088 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
[==>..................] recovery = 10.2% (858824/8388544) finish=12.4min speed=10076K/sec
unused devices: <none>
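Rather than re-running cat by hand, the rebuild can be followed continuously with watch (the 5-second refresh interval here is arbitrary):

watch -n 5 cat /proc/mdstat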
Check the RAID status again:
[root@localhost eric4ever]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Array Size : 16777088 (16.00 GiB 17.18 GB)
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 24 14:08:27 2007
State : active, degraded, recovering
Active Devices : 2
Working Devices : 4
Failed Devices : 1
Spare Devices : 2
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 11% complete
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Events : 0.8
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
3 8 64 3 spare /dev/sde
4 8 16 4 spare /dev/sdb
The rebuild has reached 11%. Now take a look at the log messages:
[root@localhost eric4ever]# tail /var/log/messages
May 24 14:08:27 localhost kernel: --- rd:3 wd:2 fd:1
May 24 14:08:27 localhost kernel: disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
May 24 14:08:27 localhost kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
May 24 14:08:27 localhost kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
May 24 14:08:27 localhost kernel: RAID5 conf printout:
May 24 14:08:27 localhost kernel: --- rd:3 wd:2 fd:1
May 24 14:08:27 localhost kernel: disk 0, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
May 24 14:08:27 localhost kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
May 24 14:08:27 localhost kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
May 24 14:08:27 localhost kernel: md: cannot remove active disk sde from md0 ...
The last line shows the kernel refusing to remove /dev/sde while it is still active in the array. Now use the mdadm -E command to examine /dev/sdb:
[root@localhost eric4ever]# mdadm -E /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 00.90.00
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Array Size : 16777088 (16.00 GiB 17.18 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Update Time : Thu May 24 14:08:27 2007
State : active
Active Devices : 2
Working Devices : 4
Failed Devices : 1
Spare Devices : 2
Checksum : a6a19662 - correct
Events : 0.8
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 16 4 spare /dev/sdb
0 0 0 0 0 faulty removed
1 1 8 32 1 active sync /dev/sdc
2 2 8 48 2 active sync /dev/sdd
3 3 8 64 3 spare /dev/sde
4 4 8 16 4 spare /dev/sdb
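When diagnosing a disagreement between members, it can help to compare the superblocks of all of them side by side. A small sketch, assuming the members are /dev/sdb through /dev/sde as in this example:

for d in /dev/sd[b-e]; do
    echo "== $d =="
    mdadm -E $d | grep -E 'Events|State :'
done

A member whose Events counter lags behind the others was not updated with the latest array state.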
After the automatic recovery completes, check the RAID status once more:
[root@localhost eric4ever]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu May 24 13:45:35 2007
Raid Level : raid5
Array Size : 16777088 (16.00 GiB 17.18 GB)
Used Dev Size : 8388544 (8.00 GiB 8.59 GB)
Raid Devices : 3
Total Devices : 5
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu May 24 14:21:54 2007
State : active
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : 4b15050e:7d0c477d:98ed7d00:0f3c29e4
Events : 0.9
Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
4 8 16 4 spare /dev/sdb
[root@localhost eric4ever]# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdb[4] sde[0] sdd[2] sdc[1]
16777088 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>
We can see that /dev/sde has replaced /dev/sdb. Take a look at the system log:
[root@localhost eric4ever]# tail /var/log/messages
May 24 14:21:54 localhost kernel: --- rd:3 wd:3 fd:0
May 24 14:21:54 localhost kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sde
May 24 14:21:54 localhost kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc
May 24 14:21:54 localhost kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdd
May 24 14:21:54 localhost kernel: md: updating md0 RAID superblock on device
May 24 14:21:54 localhost kernel: md: sdb [events: 00000009]<6>(write) sdb's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: sde [events: 00000009]<6>(write) sde's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: sdd [events: 00000009]<6>(write) sdd's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: sdc [events: 00000009]<6>(write) sdc's sb offset: 8388544
May 24 14:21:54 localhost kernel: md: recovery thread got woken up ...
At this point we can remove the /dev/sdb device from /dev/md0:
[root@localhost eric4ever]# mdadm /dev/md0 -r /dev/sdb
mdadm: hot removed /dev/sdb
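If the removed disk is to be reused in another machine or array, it is a good idea to erase its md superblock first, so that it is not picked up by auto-assembly later:

mdadm --zero-superblock /dev/sdb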
Similarly, we can add a device to /dev/md0 with the following command:
[root@localhost eric4ever]# mdadm /dev/md0 --add /dev/sdf
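The newly added device will appear as a spare in the mdadm --detail /dev/md0 output. If you maintain a configuration file, remember to refresh it as well; a common pattern (the file location is distribution-dependent) is:

mdadm --detail --scan >> /etc/mdadm.conf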