Detecting and Isolating Bad Blocks on a Hard Disk

1.oops ~ # smartctl -H /dev/sdb   <-- check the disk's health status

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.3.1-gentoo] (local build)

Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: FAILED! <---- the value after "result": PASSED means the disk is healthy; if it shows FAILED, replace the disk in the server as soon as possible

Drive failure expected in less than 24 hours. SAVE ALL DATA.

Failed Attributes:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 4095
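For unattended checks, the verdict line can be pulled out of the smartctl -H output with a small script. The following is only a minimal sketch: it assumes the ATA-style output format shown above, and the function name smart_health is made up for this example (it is not part of smartmontools).

```shell
#!/bin/sh
# smart_health: read `smartctl -H` output on stdin and print the verdict
# (PASSED, FAILED, ...) or UNKNOWN if no verdict line is found.
# Assumes the line "SMART overall-health self-assessment test result: ...".
smart_health() {
    awk -F': *' '
        /^SMART overall-health self-assessment test result/ {
            verdict = $2
            sub(/[^A-Za-z].*$/, "", verdict)   # drop "!" and anything after it
            print verdict
            found = 1
            exit
        }
        END { if (!found) print "UNKNOWN" }
    '
}

# Example (needs root and a real disk):
#   smartctl -H /dev/sdb | smart_health
```

A cron job could compare the printed word with PASSED and send a mail otherwise.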

2.oops ~ # smartctl -C -t short /dev/sdb2

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.3.1-gentoo] (local build)

Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Short self-test routine immediately in captive mode".

Drive command "Execute SMART Short self-test routine immediately in captive mode" successful.

Testing has begun.

Please wait 1 minutes for test to complete.

Test will complete after Tue Nov 5 12:19:54 2013

3.oops ~ # smartctl -l selftest /dev/sdb2

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.3.1-gentoo] (local build)

Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART Self-test log structure revision number 1

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

# 1 Short captive Completed: unknown failure 90% 36265 0

# 2 Short captive Completed: unknown failure 90% 36265 0
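The self-test log above can also be checked by a script instead of by eye. This is a rough sketch that assumes the log layout shown above, where failed entries carry a status of the form "Completed: ... failure"; the function name selftest_failures is hypothetical.

```shell
#!/bin/sh
# selftest_failures: read `smartctl -l selftest` output on stdin and print
# how many logged self-tests ended with a "Completed: ... failure" status.
selftest_failures() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c '^# *[0-9][0-9]* .*Completed: .*failure' || true
}

# Example (needs root and a real disk):
#   smartctl -l selftest /dev/sdb | selftest_failures
```

For the log shown above it would report two failed tests.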

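4.#smartctl -A /dev/sda   show the disk's SMART attributes in detail

#smartctl -s on /dev/sda   enable SMART if it is not already enabled

#smartctl -t short /dev/sda   short self-test in the background

#smartctl -t long /dev/sda   long self-test in the background

#smartctl -C -t short /dev/sda   short self-test in the foreground (captive mode)

#smartctl -C -t long /dev/sda   long self-test in the foreground (captive mode). All of these simply run the drive's built-in SMART self-test routines.

#smartctl -X /dev/sda   abort a running background self-test

#smartctl -l selftest /dev/sda   show the self-test log

#smartctl -l error /dev/sda   show the error log summary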

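If the bad sectors are only logical, you can

run fsck directly

fsck -a /dev/sdb

or reformat.

If the bad sectors are physical, you need to

a. back up the data on the disk

b. delete all partitions on the disk <-- this makes it possible to isolate the bad blocks

c. estimate the space affected from the position and size of the bad blocks, then repartition so that the damaged area is left out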
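The estimate in step c is plain arithmetic, sketched below. The sketch assumes 512-byte logical sectors (as shown by fdisk -l on most disks), a safety margin chosen arbitrarily by the user, and a made-up function name bad_region; it does not check the result against the size of the disk.

```shell
#!/bin/sh
# bad_region: print the first and last 512-byte sector of a region to leave
# unpartitioned around one bad block.
#   $1 = bad block number as reported by badblocks
#   $2 = block size in bytes that badblocks was run with (-b)
#   $3 = safety margin on each side, in MiB (an arbitrary choice)
# The body runs in a subshell, so its variables do not leak to the caller.
bad_region() (
    block=$1 bsize=$2 margin_mib=$3
    per_block=$((bsize / 512))        # 512-byte sectors per block
    first=$((block * per_block))      # first sector of the bad block
    last=$((first + per_block - 1))   # last sector of the bad block
    margin=$((margin_mib * 2048))     # 1 MiB = 2048 sectors
    start=$((first - margin))
    if [ "$start" -lt 0 ]; then
        start=0                       # do not run past the start of the disk
    fi
    echo "$start $((last + margin))"
)

# Example with an illustrative block number:
#   bad_region 1000 4096 1     prints "5952 10055"
```

One partition would then end before the first printed sector and the next one start after the second, leaving the gap unused.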

badblocks -w is a destructive test. Do not use it casually!!!

First, scan for bad blocks

5.oops ~ # badblocks -s -b 4096 -c 16 /dev/sdb1 -o bad.sdb1

Checking for bad blocks (read-only test): 13.34% done, 2:24 elapsed. (0/0/0 errors)

2.oops ~ # badblocks -s -b 4096 -c 16 /dev/sdb2 -o bad.sdb2

Checking for bad blocks (read-only test): 0.00% done, 0:22 elapsed. (0/0/0 errors)

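This uses a block size of 4 KB (-b 4096) and tests 16 blocks at a time (-c 16).

The list of bad blocks is written to the bad.sdbX file (-o); if the disk is healthy, the file stays empty. On a large disk, the -s option shows the progress of the scan.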

After a long wait, we end up with the file /root/sdb.bad :

16435904

sdb has one bad block
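The numbers in this file are counted in units of the block size given to badblocks with -b, starting from the beginning of the device that was scanned. A small sketch for turning each entry into a byte offset is shown here; the function name bad_block_offsets is hypothetical.

```shell
#!/bin/sh
# bad_block_offsets: for every block number in a badblocks output file,
# print "<block> <byte offset from the start of the scanned device>".
#   $1 = file written by badblocks -o
#   $2 = block size in bytes that badblocks was run with (-b)
# The body runs in a subshell, so its variables do not leak to the caller.
bad_block_offsets() (
    bsize=$2
    while read -r block; do
        case $block in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a plain number
        esac
        echo "$block $((block * bsize))"
    done < "$1"
)

# Example:
#   bad_block_offsets /root/sdb.bad 4096
```

For block 16435904 and a 4096-byte block size the offset is 67321462784 bytes, roughly 62.7 GiB into the device.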

First, try to back up the bad block with dd

6.dd if=/dev/sdb bs=4096 skip=16435904 of=/tmp/15435904.dat count=1

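If dd reports 0 bytes read, try a few more times. If it still fails, the data in this block is probably lost, but there is no need to worry much: this usually does not cause serious problems.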
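The "try a few more times" part can be automated with a small retry helper. This is a sketch only: the helper name retry and the RETRY_DELAY variable are invented for this example, and it relies on dd exiting with a non-zero status when the read fails.

```shell
#!/bin/sh
# retry: run a command up to $1 times and stop at the first success.
# Returns 0 if one attempt succeeded, 1 if all attempts failed.
# RETRY_DELAY (seconds between attempts, default 1) is a made-up knob.
retry() {
    tries=$1
    shift
    n=1
    while [ "$n" -le "$tries" ]; do
        "$@" && return 0
        n=$((n + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}

# Example (needs root and a real disk):
#   retry 5 dd if=/dev/sdb bs=4096 skip=16435904 of=/tmp/15435904.dat count=1
```

If all attempts fail, the helper returns 1 and the block has to be treated as lost.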

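Use the write test of badblocks to rewrite the bad block, which gives the drive a chance to remap it to a spare sector (Warning: the -w write test overwrites data):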

7.badblocks -w -f -b 4096 /dev/sdb 16435904 16435904

If the earlier step managed to back the block up to /tmp/15435904.dat, write it back:

8.dd if=/tmp/15435904.dat of=/dev/sdb seek=16435904 bs=4096 count=1

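In fact, there is no need to wait for the badblocks scan to finish before starting the repair.

badblocks works on the block device directly, so it can also be run against a disk whose filesystems are mounted. The read-only scan is harmless; a -w write test on a mounted device only runs with -f and can damage the filesystem.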

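Before and after the repair, run a long self-test on the disk with smartctl (smartctl -t long /dev/sdb), then check the result in the self-test log: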

# smartctl -l selftest /dev/sdb
