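Reference: Dell official community forum: https://www.dell.com/community/Systems-Management/Run-a-Consistency-Check-on-a-PERC-managed-RAID-without-OMSA/m-p/4767003
I. Method 1
Proactive consistency checking can be enabled in the BIOS settings.
The RAID controller provides two kinds of consistency checking: patrol read and consistency check (CC).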
[root@kvm_100_67_159_143 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpCcSched -Info -aALL
Adapter #0
Operation Mode: Concurrent
Execution Delay: 168
Next start time: 03/10/2018, 03:00:00
Current State: Stopped
Number of iterations: 94
Number of VD completed: 0
Excluded VDs : None
Exit Code: 0x00
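The period is 7 days: an interval of 168 hours (24 * 7 = 168).
A consistency check proactively looks for errors. It can be started manually or on a schedule (for example, a periodic check).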
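As a quick check of the arithmetic, the sketch below (plain Python, no MegaCli required) takes the "Next start time" and "Execution Delay" values from the output above and lists the next few start times. It assumes each run starts exactly `Execution Delay` hours after the previous one; `next_runs` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

def next_runs(next_start: str, delay_hours: int, count: int = 3) -> list[datetime]:
    """Start times beginning at `next_start` ("MM/DD/YYYY, HH:MM:SS" as printed by
    -AdpCcSched -Info), spaced `delay_hours` apart (assumed fixed interval)."""
    first = datetime.strptime(next_start, "%m/%d/%Y, %H:%M:%S")
    step = timedelta(hours=delay_hours)
    return [first + i * step for i in range(count)]

print(timedelta(hours=168))   # 168 hours is exactly 7 days
for t in next_runs("03/10/2018, 03:00:00", 168):
    print(t.strftime("%m/%d/%Y, %H:%M:%S"))
```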
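# Set the controller clock to the host's current date and time, so that scheduled start times match host time:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpSetTime `date +%Y%m%d` `date +%H:%M:%S` -aALL -NoLog
# Check the controller clock: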
[root@kvm_100_91_167_90 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpGetTime -aALL
Adapter 0:
Date: 03/13/2018
Time: 11:23:51
Exit Code: 0x00
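Since the clock is set from the host's `date` output, it is worth checking for drift afterwards. The sketch below parses `-AdpGetTime` output like the one above and compares it with a host timestamp; the date/time formats are assumed from this one sample, the host time in the example is made up, and `parse_adapter_time` / `clock_drift` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

# Sample output copied from the -AdpGetTime run above
SAMPLE = """Adapter 0:
Date: 03/13/2018
Time: 11:23:51
Exit Code: 0x00"""

def _field(output: str, name: str) -> str:
    """Return the value of a 'Name: value' line, or raise if it is missing."""
    m = re.search(rf"^{name}:\s*(\S+)", output, re.MULTILINE)
    if m is None:
        raise ValueError(f"{name!r} not found in output")
    return m.group(1)

def parse_adapter_time(output: str) -> datetime:
    """Controller date/time; MM/DD/YYYY and HH:MM:SS formats assumed from the sample."""
    return datetime.strptime(f"{_field(output, 'Date')} {_field(output, 'Time')}",
                             "%m/%d/%Y %H:%M:%S")

def clock_drift(output: str, host_now: datetime) -> timedelta:
    """Positive result: controller clock is ahead of the host clock."""
    return parse_adapter_time(output) - host_now

host_now = datetime(2018, 3, 13, 11, 23, 21)   # hypothetical host time
# Same formats the AdpSetTime command above gets from `date`
print("AdpSetTime args:", host_now.strftime("%Y%m%d"), host_now.strftime("%H:%M:%S"))
print("drift:", clock_drift(SAMPLE, host_now))
```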
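Note: patrol read may not be supported on machines with SSDs.
# Start a patrol read manually:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpPR -Start -aALL
# Display the firmware terminal log (patrol read events are recorded here):
/opt/MegaRAID/MegaCli/MegaCli64 -fwtermlog -dsply -aall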
Patrol read tries to discover disk errors before it is too late and data is lost (this includes hot spares connected to the controller).
This process causes the drives to read the data by issuing "read-verify" commands. With the "read-verify" command, the data from the drives is not transferred to the MegaRAID adapter unless an error is detected and reported by one or more drives in the stripe.
If a single drive reports an error within the stripe, the read patrol function initiates read commands to all the other stripe unit drives and the data for this single failing stripe unit is recreated by the MegaRAID adapter from the remaining data and parity stripe units.
After recreating this data, the adapter then issues a write-verify command to the drive that reported the error on the read-verify command and writes this recreated portion of the stripe to that drive. After this write completes successfully, this is now a known good stripe, and read patrol can continue with the next stripe. In the event that two or more drives report errors during the read-verify portion of the read patrol, the failing stripe will be added to the Bad Stripe Table.
* a delay of 168 hours between successive patrol reads
* uses up to 30% of I/O resources
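The read-verify, rebuild and bad-stripe steps described above can be modeled on a toy RAID 5 stripe whose parity unit is the XOR of its data units. This is an illustration only, not controller firmware: drives are in-memory byte strings, read-verify errors are simulated by a set of failing drive indices, the write-verify is reduced to an in-place assignment, and `patrol_stripe` / `bad_stripe_table` are hypothetical names.

```python
from functools import reduce

def xor_units(units: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized stripe units."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*units))

def patrol_stripe(stripe: list[bytes], failing: set[int], stripe_no: int,
                  bad_stripe_table: list[int]) -> str:
    """Toy patrol read of one stripe (data units plus one XOR parity unit).

    `failing` holds the indices of drives whose read-verify reports an error."""
    if not failing:
        return "good"                       # read-verify passed, no data transferred
    if len(failing) == 1:
        bad = next(iter(failing))
        others = [u for i, u in enumerate(stripe) if i != bad]
        stripe[bad] = xor_units(others)     # rebuild from remaining data + parity
        return f"rebuilt unit {bad}"        # stands in for the write-verify
    bad_stripe_table.append(stripe_no)      # two or more errors: cannot rebuild
    return "added to bad stripe table"

data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
stripe = data + [xor_units(data)]           # last unit is parity
table: list[int] = []
stripe[1] = b"\x00\x00"                     # pretend drive 1 returned bad data
print(patrol_stripe(stripe, {1}, 0, table), stripe[1] == data[1])
print(patrol_stripe(stripe, {0, 2}, 1, table), table)
```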
(1) Patrol read setting
MegaCli64 -AdpPR -Info -aALL
Adapter 0: Patrol Read Information:
Patrol Read Mode: Auto
Patrol Read Execution Delay: 168 hours <-- 7 days
Number of iterations completed: 92
Current State: Stopped
Patrol Read on SSD Devices: Disabled
Remark
By default, patrol read runs automatically (with a delay of 168 hours between successive patrol reads)
and uses up to 30% of I/O resources.
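For monitoring, the `Key: Value` output above is easy to turn into a dictionary. The sketch below was written against this sample only; it strips an `Adapter N:` prefix, splits on the first colon (so values containing times survive), and skips header lines with no value. The same parser should also handle the `-AdpCcSched -Info` output shown earlier, since it uses the same layout. `parse_info` is a hypothetical name.

```python
import re

# Sample output from `MegaCli64 -AdpPR -Info -aALL` (copied from above)
SAMPLE = """Adapter 0: Patrol Read Information:
Patrol Read Mode: Auto
Patrol Read Execution Delay: 168 hours
Number of iterations completed: 92
Current State: Stopped
Patrol Read on SSD Devices: Disabled"""

def parse_info(output: str) -> dict[str, str]:
    """Map 'Key: Value' lines to a dict, ignoring lines without a value."""
    info: dict[str, str] = {}
    for line in output.splitlines():
        line = re.sub(r"^Adapter #?\d+:\s*", "", line.strip())  # drop an 'Adapter 0:' prefix
        key, sep, value = line.partition(":")                   # split at the first colon only
        if sep and key.strip() and value.strip():
            info[key.strip()] = value.strip()
    return info

info = parse_info(SAMPLE)
delay_hours = int(info["Patrol Read Execution Delay"].split()[0])
print(info["Patrol Read Mode"], info["Current State"], delay_hours // 24, "days")
```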
(2) Patrol Read Rate
MegaCli64 -AdpGetProp PatrolReadRate -aALL
Adapter 0: Patrol Read Rate = 30%
P.S.
# Set it to 10%:
MegaCli64 -AdpSetProp PatrolReadRate 10 -aALL
(3) Enable & Disable automatic patrol read
# To enable automatic patrol read:
MegaCli64 -AdpPR -EnblAuto -aALL
# To disable automatic patrol read:
MegaCli64 -AdpPR -Dsbl -aALL
(4) Manual patrol read scan
# Start:
MegaCli64 -AdpPR -Start -aALL
# Stop:
MegaCli64 -AdpPR -Stop -aALL
# Status while a patrol read is in progress:
MegaCli64 -AdpPR -Info -aALL
Patrol Read Mode: Auto
Patrol Read Execution Delay: 168 hours
Number of iterations completed: 92
Current State: Active
Adapter 0: Number of PDs completed: 0
Patrol Read on SSD Devices: Disabled
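If these commands are scripted, a thin wrapper that only emits the flags shown above keeps typos out of cron jobs. The sketch below assumes the install path used at the top of this post, covers only the `-AdpPR` and `PatrolReadRate` commands listed in (2) to (4), and treats the 0 to 100 range check as a sanity check of its own rather than a documented MegaCli limit; `pr_command`, `set_pr_rate_command` and `run` are hypothetical helper names.

```python
import subprocess

MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"   # path as used earlier in this post

# Arguments taken verbatim from the patrol read commands above
PR_ACTIONS = {
    "info": ["-AdpPR", "-Info"],
    "start": ["-AdpPR", "-Start"],
    "stop": ["-AdpPR", "-Stop"],
    "enable_auto": ["-AdpPR", "-EnblAuto"],
    "disable": ["-AdpPR", "-Dsbl"],
}

def pr_command(action: str, adapter: str = "ALL") -> list[str]:
    """argv for a patrol read action on adapter `adapter` (a number or 'ALL')."""
    return [MEGACLI, *PR_ACTIONS[action], f"-a{adapter}"]

def set_pr_rate_command(percent: int, adapter: str = "ALL") -> list[str]:
    """argv for -AdpSetProp PatrolReadRate; the range check is our own sanity check."""
    if not 0 <= percent <= 100:
        raise ValueError("PatrolReadRate is a percentage (0-100)")
    return [MEGACLI, "-AdpSetProp", "PatrolReadRate", str(percent), f"-a{adapter}"]

def run(argv: list[str]) -> str:
    """Run MegaCli (needs root and the MegaCli package) and return its stdout.

    check=False: the exit status is not treated as the success signal here."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout

print(" ".join(pr_command("start")))
print(" ".join(set_pr_rate_command(10)))
```

`run()` deliberately does not raise on a non-zero exit status, on the assumption that MegaCli's process exit status is not always a reliable success signal; the printed `Exit Code:` line is the thing to check.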
(5) Correcting media errors during patrol read
# Get setting
MegaCli64 -AdpGetProp PrCorrectUncfgdAreas -aALL
Adapter 0: PR Correct Unconfigured Areas: Enabled
# Modify setting (1 = enable, 0 = disable):
MegaCli64 -AdpSetProp -PrCorrectUncfgdAreas -1 -aALL
# Follow patrol read in the firmware terminal log:
MegaCli64 -FwTermLog -Dsply -aALL
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=00
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=01
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=02
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=03
12/23/14 17:50:11: EVT#03353-12/23/14 17:50:11: 39=Patrol Read started
After an error is encountered:
12/23/14 17:53:43: EVT#03354-12/23/14 17:53:43: 113=Unexpected sense: PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
12/23/14 17:53:43: Raw Sense for PD 0: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00
Explanation:
Location: "PD 00(e0x20/s0)"  # physical disk 0, enclosure 0x20, slot 0
Address: "Path 1221000000000000"  # SAS address of the drive
Command: "CDB: 4d 00 4d 00 00 00 00 00 20 00"  # the SCSI command descriptor block that was sent
Actual fault: "Sense: 5/24/00"  # sense key / ASC / ASCQ
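To pull the fields explained above out of the firmware log automatically, a regular expression over the "Unexpected sense" event is enough. The pattern below is written against this single sample line only, so other firmware versions may format the event differently; `parse_unexpected_sense` is a hypothetical helper, and the PD number is kept as a string because the log does not say whether it is decimal or hex.

```python
import re
from typing import Optional

LINE = ("12/23/14 17:53:43: EVT#03354-12/23/14 17:53:43: 113=Unexpected sense: "
        "PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, "
        "Sense: 5/24/00")

PATTERN = re.compile(
    r"Unexpected sense: PD (?P<pd>\w+)\(e0x(?P<enclosure>[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
    r" Path (?P<path>[0-9a-fA-F]+), CDB: (?P<cdb>[0-9a-fA-F ]+),"
    r" Sense: (?P<sense>[0-9a-fA-F]+/[0-9a-fA-F]+/[0-9a-fA-F]+)"
)

def parse_unexpected_sense(line: str) -> Optional[dict]:
    """Split an 'Unexpected sense' event into its parts, or return None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(x, 16) for x in m["sense"].split("/"))  # sense fields are hex
    return {
        "pd": m["pd"],                           # kept as text (radix not stated)
        "enclosure": int(m["enclosure"], 16),    # "e0x20" is explicitly hex
        "slot": int(m["slot"]),
        "sas_address": m["path"],
        "cdb": bytes.fromhex(m["cdb"]),          # fromhex ignores the spaces
        "sense": (key, asc, ascq),
    }

print(parse_unexpected_sense(LINE))
```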
SCSI command list, for decoding "CDB: 4d 00 4d 00 00 00 00 00 20 00":
http://en.wikipedia.org/wiki/SCSI_command#List_of_SCSI_commands
Sense key / ASC / ASCQ table, for decoding the type of error "Sense: 5/24/00":
http://en.wikipedia.org/wiki/Key_Code_Qualifier
See also:
http://datahunter.org/megacli#terminal_logging
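The two references above can be applied by hand. Byte 0 of the CDB is the opcode, and in fixed-format sense data (response code 0x70) the sense key is the low nibble of byte 2, the ASC is byte 12 and the ASCQ is byte 13. The sketch below decodes the raw sense bytes from the log using only a tiny, hand-picked subset of the standard tables (not the full lists); read this way, the event is a LOG SENSE command that the drive rejected with ILLEGAL REQUEST / INVALID FIELD IN CDB, a command problem rather than a medium error.

```python
# Small subsets of the standard SCSI tables, enough for this log entry
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x2F: "VERIFY(10)",
           0x4C: "LOG SELECT", 0x4D: "LOG SENSE"}

def decode_fixed_sense(raw: bytes) -> tuple[int, int, int]:
    """(sense key, ASC, ASCQ) from fixed-format sense data (response code 0x70 or 0x71)."""
    if (raw[0] & 0x7F) not in (0x70, 0x71):   # bit 7 is the VALID bit
        raise ValueError("not fixed-format sense data")
    return raw[2] & 0x0F, raw[12], raw[13]

raw = bytes.fromhex("70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00")
cdb = bytes.fromhex("4d 00 4d 00 00 00 00 00 20 00")
key, asc, ascq = decode_fixed_sense(raw)
print(OPCODES.get(cdb[0], hex(cdb[0])), "->", SENSE_KEYS[key], "/", ASC_ASCQ[(asc, ascq)])
```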
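sg_logs: access log pages with the SCSI LOG SENSE command.
This utility sends a SCSI LOG SENSE command to the device and then
outputs the response. The LOG SENSE command is used to fetch log pages.
Known log pages are decoded by default. When the --reset and/or
--select option is given, a SCSI LOG SELECT command is issued to
reset parameters.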
http://manpages.ubuntu.com/manpages/trusty/man8/sg_logs.8.html
Format of the supported LOG SENSE pages:
http://www.bustrace.com/bustrace9/sas/log_sense_supported_pages.htm
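The failing command in the log is itself a LOG SENSE, the same command sg_logs sends. In SPC's 10-byte LOG SENSE CDB, byte 2 holds the page control (PC, top two bits) and the page code (low six bits), and bytes 7 and 8 hold the allocation length. The sketch below builds and decodes that layout; applied to the logged CDB it gives PC = 1 (current cumulative values), page code 0x0D (the Temperature log page) and an allocation length of 32 bytes. The helper names are hypothetical, and nothing is sent to a device.

```python
import struct

LOG_SENSE = 0x4D
# opcode, flags (SP/PPC), PC|page code, subpage, reserved,
# parameter pointer (2 bytes), allocation length (2 bytes), control
CDB_LAYOUT = ">BBBBBHHB"

def build_log_sense_cdb(page_code: int, alloc_len: int, pc: int = 1,
                        subpage: int = 0, param_pointer: int = 0) -> bytes:
    """10-byte LOG SENSE CDB; pc 1 = current cumulative values."""
    if not (0 <= pc <= 3 and 0 <= page_code <= 0x3F):
        raise ValueError("pc must be 0-3 and page_code 0-0x3F")
    return struct.pack(CDB_LAYOUT, LOG_SENSE, 0, (pc << 6) | page_code, subpage, 0,
                       param_pointer, alloc_len, 0)

def describe_log_sense_cdb(cdb: bytes) -> dict:
    """Split a 10-byte LOG SENSE CDB back into its fields."""
    opcode, _flags, pc_page, subpage, _res, pointer, alloc_len, _ctl = struct.unpack(CDB_LAYOUT, cdb)
    return {"opcode": opcode, "pc": pc_page >> 6, "page_code": pc_page & 0x3F,
            "subpage": subpage, "param_pointer": pointer, "alloc_len": alloc_len}

logged = bytes.fromhex("4d 00 4d 00 00 00 00 00 20 00")
print(describe_log_sense_cdb(logged))
print(build_log_sense_cdb(0x0D, 32) == logged)
```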
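Consistency check: a consistency check reads all parts of a stripe, computes the parity from the stripe's data units, and compares the computed parity with the parity read from the drives.
* Not applicable to RAID 0 (there is no redundant data to compare).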
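For a single-parity layout the check described above is just "recompute XOR, compare". The sketch assumes RAID 5 style XOR parity and in-memory stripes; RAID 1/10 would compare mirror copies instead, and RAID 6's second syndrome is not modeled. It also shows why RAID 0 is excluded: without a parity unit there is nothing to compare against. `parity`, `check_stripe` and `check_array` are hypothetical names.

```python
from functools import reduce

def parity(units: list[bytes]) -> bytes:
    """XOR parity of equally sized data units (RAID 5 style, assumed)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*units))

def check_stripe(data_units: list[bytes], stored_parity: bytes) -> bool:
    """Consistency check for one stripe: recompute parity and compare with the stored one."""
    return parity(data_units) == stored_parity

def check_array(stripes: list[tuple[list[bytes], bytes]]) -> list[int]:
    """Indices of stripes whose stored parity does not match their data."""
    return [i for i, (data, p) in enumerate(stripes) if not check_stripe(data, p)]

d0 = [b"\x01", b"\x02", b"\x04"]
d1 = [b"\x10", b"\x20", b"\x40"]
stripes = [(d0, parity(d0)), (d1, b"\x00")]   # stripe 1 has stale parity
print(check_array(stripes))
```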
# When the next consistency check is scheduled
MegaCli64 -AdpCcSched -Info -aALL
Adapter #0
Operation Mode: Disabled
Execution Delay: 168
Next start time: 12/14/2013, 03:00:00
Current State: Stopped
Number of iterations: 0
Number of VD completed: 0
Excluded VDs : None
Exit Code: 0x00
# Consistency Check Rate(CCRate)
# Get
MegaCli64 -AdpGetProp -CCRate -aALL
# Set
MegaCli64 -AdpSetProp -CCRate 10 -aALL
# Set the start time of the scheduled CC:
MegaCli64 -AdpCcSched -SetStartTime yyyymmdd hh -aALL
# Mode: Disabled | Concurrent | Sequential
MegaCli64 -AdpCcSched -ModeConc -aALL <-- changes "Operation Mode:"
MegaCli64 -AdpCcSched -ModeSeq -aALL
MegaCli64 -AdpCcSched -Dsbl -aALL
Remark
ModeConc: The scheduled CC on all of the virtual drives runs concurrently for the given adapter(s).
ModeSeq: The scheduled CC on all of the virtual drives runs sequentially for the given adapter(s).
# Run manually (choose one of -Start, -Abort, -ShowProg, -ProgDsply):
MegaCli64 -LDCC -Start|-Abort|-ShowProg|-ProgDsply -LALL -aALL
# Show progress:
MegaCli64 -LDCC -ShowProg -LALL -aALL
-ShowProg: Displays a snapshot of an ongoing CC.
Check Consistency on VD #0 (target id #0) Completed 2% in 7 Minutes.
MegaCli64 -LDCC -ProgDsply -LALL -aALL
-ProgDsply: Displays ongoing CC progress. The progress displays until at least one CC is completed or a key is pressed.
Progress of Virtual Drives...
Virtual Drive # Percent Complete Time Elps
0 # 02 % 00:07:59
Press key to quit...
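`-ShowProg` only gives a percentage and elapsed minutes, so the remaining time has to be extrapolated. The sketch below parses the line shown above and assumes the check keeps a constant rate, which CCRate and host load can easily break; the elapsed time also appears to be truncated to whole minutes (7 here versus 00:07:59 from `-ProgDsply`), so early estimates are rough. `estimate_cc` is a hypothetical helper.

```python
import re
from datetime import timedelta

LINE = "Check Consistency on VD #0 (target id #0) Completed 2% in 7 Minutes."

def estimate_cc(line: str) -> tuple[int, int, timedelta]:
    """(VD number, percent done, estimated time left) assuming a constant rate."""
    m = re.search(r"VD #(\d+).*Completed (\d+)% in (\d+) Minutes?", line)
    if m is None:
        raise ValueError(f"unrecognised progress line: {line!r}")
    vd, pct, minutes = (int(g) for g in m.groups())
    if pct == 0:
        raise ValueError("no progress yet; cannot extrapolate")
    remaining = timedelta(minutes=minutes * (100 - pct) / pct)
    return vd, pct, remaining

vd, pct, left = estimate_cc(LINE)
print(f"VD {vd}: {pct}% done, about {left} left")
```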
Syntax
Syntax: MegaCLI -AdpPR -SetStartTime yyyymmdd hh -aALL
Cmd: MegaCLI -AdpPR -SetStartTime 20120702 01 -aALL
Syntax: MegaCLI -AdpPR -SetDelay Val -aALL (Val is in hours)
Cmd: MegaCLI -AdpPR -SetDelay 168 -aALL
Syntax: MegaCLI -AdpCcSched -SetStartTime yyyymmdd hh -aN|-a0,1,2|-aALL
Cmd: MegaCLI -AdpCcSched -SetStartTime 20120704 03 -aALL
Syntax: MegaCLI -AdpCcSched -SetDelay Val -aALL (in hours)
Cmd: MegaCLI -AdpCcSched -SetDelay 672 -aALL (672 hours = 28 days)
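The `yyyymmdd hh` and hour-based delay arguments are easy to get wrong by hand. The sketch below formats them from Python `datetime`/`timedelta` values; it only builds argument strings (nothing is executed), rejects start times that are not on a whole hour because the syntax above has no minutes field, and uses hypothetical helper names (`start_time_args`, `delay_arg`).

```python
from datetime import datetime, timedelta

def start_time_args(when: datetime) -> list[str]:
    """yyyymmdd and hh arguments for -AdpPR / -AdpCcSched -SetStartTime."""
    if when.minute or when.second or when.microsecond:
        raise ValueError("the syntax only has an hour field; use a whole hour")
    return [when.strftime("%Y%m%d"), when.strftime("%H")]

def delay_arg(interval: timedelta) -> str:
    """Value for -SetDelay, which is expressed in hours."""
    hours, rest = divmod(interval, timedelta(hours=1))
    if rest:
        raise ValueError("delay must be a whole number of hours")
    return str(hours)

cmd = ["MegaCLI", "-AdpCcSched", "-SetStartTime",
       *start_time_args(datetime(2012, 7, 4, 3)), "-aALL"]
print(" ".join(cmd))
print(delay_arg(timedelta(weeks=4)))
```

For example, a delay of four weeks comes out as the 672 hours used in the last command.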