1.oops ~ # smartctl -H /dev/sdb   (check the drive's health status)
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.3.1-gentoo] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED! <---- The verdict follows "result:". PASSED means the drive is healthy; FAILED, as here, means you should replace the server's disk right away.
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 4095
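In the Failed Attributes table, an attribute counts as failing once its normalized VALUE drops to or below THRESH. Here Reallocated_Sector_Ct sits at 001 against a threshold of 036. The same comparison can be sketched as a shell filter over `smartctl -A` output. This assumes the ATA attribute table layout shown above (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), and `failing_attrs` is a hypothetical helper name.

```shell
# failing_attrs (hypothetical helper): read `smartctl -A` output for an ATA
# drive on stdin and print every attribute whose normalized VALUE is at or
# below its THRESH. Assumed columns:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failing_attrs() {
  awk '
    # Attribute rows are the only lines whose first field is a number.
    $1 ~ /^[0-9]+$/ {
      value = $4 + 0
      thresh = $6 + 0
      # A threshold of 000 means the attribute never triggers a failure.
      if (thresh > 0 && value <= thresh)
        printf "%s %s value=%d thresh=%d raw=%s\n", $1, $2, value, thresh, $10
    }'
}
```

Usage: `smartctl -A /dev/sdb | failing_attrs`. For a drive in the state shown above, this would be expected to list Reallocated_Sector_Ct.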
2.oops ~ # smartctl -C -t short /dev/sdb2
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.3.1-gentoo] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in captive mode".
Drive command "Execute SMART Short self-test routine immediately in captive mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Tue Nov 5 12:19:54 2013
3.oops ~ # smartctl -l selftest /dev/sdb2
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.3.1-gentoo] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed: unknown failure 90% 36265 0
# 2 Short captive Completed: unknown failure 90% 36265 0
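Entry `# 1` in the self-test log is the most recent test. Both entries above ended with "Completed: unknown failure" and 90% remaining, which suggests the short test stopped early. The following minimal sketch checks the latest entry from a script. It only looks for the literal status "Completed without error", and `last_selftest_ok` is a hypothetical name.

```shell
# last_selftest_ok (hypothetical helper): read `smartctl -l selftest` output
# on stdin, print the most recent entry ("# 1") and return
#   0 if it says "Completed without error",
#   1 for any other status,
#   2 if the log has no entries.
last_selftest_ok() {
  latest=$(grep -E '^#[[:space:]]*1[[:space:]]' | head -n 1)
  if [ -z "$latest" ]; then
    echo "no self-test entries found" >&2
    return 2
  fi
  echo "$latest"
  case $latest in
    *"Completed without error"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `smartctl -l selftest /dev/sdb | last_selftest_ok || echo "latest self-test did not pass"`.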
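4.#smartctl -A /dev/sda          show the drive's SMART attributes (-a shows all SMART information)
#smartctl -s on /dev/sda         enable SMART if it is not already turned on
#smartctl -t short /dev/sda      run a short self-test in the background
#smartctl -t long /dev/sda       run a long (extended) self-test in the background
#smartctl -C -t short /dev/sda   run a short self-test in the foreground (captive mode)
#smartctl -C -t long /dev/sda    run a long self-test in the foreground (captive mode)
#smartctl -X /dev/sda            abort a running background self-test
#smartctl -l selftest /dev/sda   show the self-test log
#smartctl -l error /dev/sda      show the drive's error log summary
All of the -t variants simply start the drive's own built-in SMART self-test routines.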
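smartctl also encodes problems in its exit status as a bit mask. For example, a health check that returns FAILED, as in step 1, sets bit 3. The sketch below spells out the bits. Its wording paraphrases the RETURN VALUES section of the smartctl man page, and `decode_smartctl_status` is a hypothetical helper name, not part of smartmontools.

```shell
# decode_smartctl_status (hypothetical helper): explain each bit set in a
# smartctl exit status. Bit meanings paraphrase the smartctl man page.
decode_smartctl_status() {
  rc=$1
  if [ "$rc" -eq 0 ]; then
    echo "ok"
    return 0
  fi
  bit=0
  for msg in \
    "command line did not parse" \
    "device open failed or device did not identify itself" \
    "a SMART or other ATA command failed" \
    "SMART status check returned DISK FAILING" \
    "prefail attributes at or below threshold" \
    "some attributes were at or below threshold in the past" \
    "device error log contains errors" \
    "self-test log contains errors"
  do
    # Test bit number $bit of the exit status.
    if [ $(( (rc >> bit) & 1 )) -eq 1 ]; then
      echo "bit $bit: $msg"
    fi
    bit=$((bit + 1))
  done
}
```

Usage: `smartctl -H /dev/sdb; decode_smartctl_status $?`.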
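If the bad sectors are only logical (soft) errors, you can either
run fsck directly on the unmounted partition:
fsck -a /dev/sdb1
or reformat the partition.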
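fsck should only repair a filesystem that is not mounted. The following minimal guard assumes Linux, where /proc/mounts lists each mounted device in its first column. `is_mounted` and `safe_fsck` are hypothetical names. The check only matches the exact device path, so a filesystem listed under a different name, such as a /dev/mapper path, would be missed.

```shell
# is_mounted (hypothetical helper): succeed if DEVICE appears as a mount
# source in MOUNTS (default /proc/mounts on Linux).
is_mounted() {
  dev=$1
  mounts=${2:-/proc/mounts}
  awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# safe_fsck (hypothetical wrapper): refuse to run `fsck -a` on a mounted
# filesystem; otherwise let fsck repair it automatically.
safe_fsck() {
  if is_mounted "$1"; then
    echo "$1 is mounted; unmount it before running fsck" >&2
    return 1
  fi
  fsck -a "$1"
}
```

Usage: `umount /dev/sdb1 && safe_fsck /dev/sdb1`.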
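If the bad sectors are physical, you need to:
a. Back up the data on the disk
b. Delete all partitions on the disk, so that the bad blocks can be isolated
c. Estimate how much space the bad blocks occupy from their location and size, then repartition the disk so the damaged area lies outside every partition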
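For step c, the block numbers that badblocks reports can be converted into 512-byte sectors, a unit that tools such as fdisk and parted can work in. To do this, multiply each block number by the number of sectors per block. This sketch assumes the list was produced with -b 4096, as in the commands that follow, and that the scanned device is the whole disk. For a partition, add the partition's start sector. `bad_sector_range` is a hypothetical name, and how much margin to leave around the range is a judgment call not covered here.

```shell
# bad_sector_range (hypothetical helper): read a badblocks output file (one
# block number per line) and print the first and last 512-byte sector that
# the bad blocks cover. BS must match the -b value used for the scan.
bad_sector_range() {
  file=$1
  bs=${2:-4096}
  awk -v bs="$bs" '
    NF {
      n = $1 + 0
      if (min == "" || n < min) min = n
      if (n > max) max = n
    }
    END {
      if (min == "") { print "no bad blocks"; exit }
      per = bs / 512   # 512-byte sectors per block
      # %.0f instead of %d so large sector numbers print exactly.
      printf "sectors %.0f-%.0f\n", min * per, (max + 1) * per - 1
    }' "$file"
}
```

For block 16435904, the one found later in this walkthrough, with 4096-byte blocks, this gives sectors 131487232-131487239, about 62.7 GiB into the device.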
badblocks -w is a destructive test (it overwrites data). Do not use it lightly!!!
First, scan for bad sectors:
5.oops ~ # badblocks -s -b 4096 -c 16 /dev/sdb1 -o bad.sdb1
Checking for bad blocks (read-only test): 13.34% done, 2:24 elapsed. (0/0/0 errors)
oops ~ # badblocks -s -b 4096 -c 16 /dev/sdb2 -o bad.sdb2
Checking for bad blocks (read-only test): 0.00% done, 0:22 elapsed. (0/0/0 errors)
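Here -b 4096 sets the block size to 4096 bytes (4 KiB), and -c 16 makes badblocks test 16 blocks at a time (it does not check each block 16 times).
-o writes the numbers of any bad blocks found to bad.sdbX. If the disk is healthy, bad.sdbX stays empty. -s shows progress, which is useful on a large disk. Without -w or -n, badblocks runs the read-only test, as the output above shows.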
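The read-only scan above can be wrapped so that it also reports how many bad blocks it found. This sketch uses exactly the options explained above, and `scan_readonly` is a hypothetical name. Because neither -w nor -n is passed, badblocks only reads from the device.

```shell
# scan_readonly (hypothetical wrapper): run the read-only badblocks scan with
# the options used above and summarize the result.
# Arguments: DEVICE OUTPUT_FILE
scan_readonly() {
  dev=$1
  out=$2
  # -s progress, -b 4096 block size, -c 16 blocks per batch, -o output list.
  badblocks -s -b 4096 -c 16 -o "$out" "$dev" || return
  # Count non-empty lines; an empty file means no bad blocks were found.
  count=$(awk 'NF { n++ } END { print n + 0 }' "$out")
  if [ "$count" -eq 0 ]; then
    echo "$dev: no bad blocks found"
  else
    echo "$dev: $count bad block(s), listed in $out"
  fi
}
```

Usage: `scan_readonly /dev/sdb1 bad.sdb1`.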
After a long wait, we end up with the file /root/sdb.bad, which contains:
16435904
In other words, sdb has one bad block.
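badblocks numbers blocks from the start of the device it scanned. A block number from a scan of a partition such as /dev/sdb2 therefore does not point at the same place on /dev/sdb. The following sketch shows the conversion. It assumes the partition start is read from sysfs (/sys/block/sdb/sdb2/start, given in 512-byte sectors) and that the scan used 4096-byte blocks. `part_block_to_disk_block` is a hypothetical name.

```shell
# part_block_to_disk_block (hypothetical helper): convert a block number that
# badblocks reported for a partition into the same block counted from the
# start of the whole disk.
# Arguments: BLOCK START_SECTOR [BS]
#   START_SECTOR: partition start in 512-byte sectors, e.g. from
#                 /sys/block/sdb/sdb2/start
part_block_to_disk_block() {
  block=$1
  start=$2
  bs=${3:-4096}
  # The conversion only works if the partition starts on a block boundary.
  if [ $(( start * 512 % bs )) -ne 0 ]; then
    echo "partition start is not aligned to $bs-byte blocks" >&2
    return 1
  fi
  echo $(( block + start * 512 / bs ))
}
```

Usage, with a hypothetical block number `$N` from a scan of /dev/sdb2: `part_block_to_disk_block "$N" "$(cat /sys/block/sdb/sdb2/start)"`.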
First, use dd to save as much of the bad block as possible:
6.dd if=/dev/sdb bs=4096 skip=16435904 of=/tmp/15435904.dat count=1
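If dd reports 0 bytes read, try a few more times. If it still fails, the data in this block is probably lost, but that is usually not a serious problem.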
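The "try a few more times" advice can be scripted. This sketch assumes GNU coreutils dd and wc. `backup_block` is a hypothetical name, and it treats any result other than one full block as a failed attempt.

```shell
# backup_block (hypothetical helper): copy one block from DEVICE to OUTFILE
# with dd, retrying when the read comes back empty or short.
# Arguments: DEVICE BLOCK OUTFILE [BS] [TRIES]
backup_block() {
  dev=$1
  block=$2
  out=$3
  bs=${4:-4096}
  tries=${5:-5}
  i=1
  while [ "$i" -le "$tries" ]; do
    rm -f "$out"
    # skip= counts in units of bs, exactly as in step 6.
    dd if="$dev" of="$out" bs="$bs" skip="$block" count=1 2>/dev/null || true
    if [ -f "$out" ] && [ "$(wc -c < "$out")" -eq "$bs" ]; then
      echo "block $block saved to $out after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
  done
  echo "could not read block $block after $tries attempts" >&2
  return 1
}
```

Usage: `backup_block /dev/sdb 16435904 /tmp/15435904.dat`.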
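Next, use badblocks' write test to rewrite the bad block (warning! the -w write test overwrites data). A block number is only meaningful for the device and block size it was found with. Pass the same device as the dd commands (/dev/sdb) and the same -b 4096 used for the scan. Without -b, badblocks assumes 1024-byte blocks and would rewrite a different location:
7.badblocks -w -b 4096 -f /dev/sdb 16435904 16435904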
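When the bad-block list has more than one entry, step 7 has to be repeated for each block. Because -w destroys data, a cautious approach is to generate the commands first and review them before running anything. This sketch assumes a list produced with -b 4096 from the same device. `rewrite_cmds` is a hypothetical name, and it deliberately leaves out -f so that badblocks still refuses to write to a mounted device.

```shell
# rewrite_cmds (hypothetical helper): print, without running, one
# `badblocks -w` command per block number listed in FILE.
# Arguments: DEVICE FILE [BS]
# The two trailing numbers are badblocks' last_block and first_block, so
# passing the same number twice limits the write test to that one block.
rewrite_cmds() {
  dev=$1
  file=$2
  bs=${3:-4096}
  # The || clause also handles a final line without a trailing newline.
  while read -r block || [ -n "$block" ]; do
    [ -n "$block" ] || continue
    echo "badblocks -w -b $bs $dev $block $block"
  done < "$file"
}
```

Usage: `rewrite_cmds /dev/sdb /root/sdb.bad`, then inspect the printed commands before running any of them.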
If the earlier dd backup to /tmp/15435904.dat succeeded, write it back to the same block. seek= must be the bad block number, 16435904:
8.dd if=/tmp/15435904.dat of=/dev/sdb seek=16435904 bs=4096 count=1
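When the target is a regular file, dd also needs conv=notrunc. You might use a regular file when trying these commands on a disk image first. Without conv=notrunc, dd cuts the file off at the seek position, discarding everything after the rewritten block. On a block device such as /dev/sdb the flag makes no difference. The following sketch writes the saved block back and reads it back for comparison, assuming GNU dd and cmp. `restore_block` is a hypothetical name.

```shell
# restore_block (hypothetical helper): write SAVED back to TARGET at BLOCK and
# read the block back to compare.
# Arguments: SAVED TARGET BLOCK [BS]
restore_block() {
  saved=$1
  target=$2
  block=$3
  bs=${4:-4096}
  # seek= counts in units of bs; it must be the same block number that was
  # used with skip= when the block was saved.
  dd if="$saved" of="$target" bs="$bs" seek="$block" count=1 conv=notrunc 2>/dev/null || return 1
  # The read-back may be served from the page cache, so a match shows the
  # write was accepted, not that the sector itself is healthy.
  dd if="$target" bs="$bs" skip="$block" count=1 2>/dev/null | cmp -s - "$saved"
}
```

Usage: `restore_block /tmp/15435904.dat /dev/sdb 16435904`.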
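We don't actually have to wait for the badblocks scan to finish before starting the repair.
Because badblocks works directly on the block device, the read-only scan can run on a mounted system. The write test is different. badblocks refuses -w on a mounted device unless forced with -f (as in step 7), and its man page warns that doing so can crash the system or damage the filesystem. Unmount first whenever you can.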
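Before and after the repair, run a long SMART self-test on the disk (smartctl -t long /dev/sdb), then check the result in the self-test log: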
# smartctl -l selftest /dev/sdb
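Besides the self-test log, the raw counters for reallocated and pending sectors are worth comparing before and after the repair. A successful rewrite would typically be expected to take sectors out of the pending count, either returning them to normal use or moving them into the reallocated count. The following sketch extracts those two raw values from `smartctl -A` output. It assumes the ATA attribute names Reallocated_Sector_Ct and Current_Pending_Sector. `sector_counters` is a hypothetical name, and not every drive reports both attributes.

```shell
# sector_counters (hypothetical helper): read `smartctl -A` output (ATA) on
# stdin and print the RAW_VALUE (10th column) of the reallocated and pending
# sector counters as NAME=VALUE lines.
sector_counters() {
  awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2 "=" $10 }'
}
```

Usage: run `smartctl -A /dev/sdb | sector_counters > before.txt` before the repair, write the same output to after.txt afterwards, and compare the two with `diff before.txt after.txt`.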