Running a Consistency Check on a PERC-Managed RAID

Reference (official Dell community forum): https://www.dell.com/community/Systems-Management/Run-a-Consistency-Check-on-a-PERC-managed-RAID-without-OMSA/m-p/4767003
1. Method 1

Proactive (scheduled) consistency checks can be enabled in the BIOS setup.

The RAID controller provides two kinds of integrity checks:

  1. Patrol read
  2. Consistency check

[root@kvm_100_67_159_143 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpCcSched -Info -aALL                                 

Adapter #0

Operation Mode: Concurrent

Execution Delay: 168

Next start time: 03/10/2018, 03:00:00

Current State: Stopped

Number of iterations: 94

Number of VD completed: 0

Excluded VDs          : None

Exit Code: 0x00

 

The cycle is 7 days: the execution delay is given in hours, and 24 * 7 = 168.
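The hours-to-days conversion above can be sketched as a small filter. This is only an illustration: the function name cc_delay_days is made up for this note, and it assumes the "Execution Delay: <hours>" line format shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `MegaCli64 -AdpCcSched -Info -aALL` output on
# stdin and print each "Execution Delay" value converted from hours to
# whole days.
cc_delay_days() {
    awk -F': *' '/^[ \t]*Execution Delay/ { print int($2 / 24) }'
}
```

Usage would look like `/opt/MegaRAID/MegaCli/MegaCli64 -AdpCcSched -Info -aALL | cc_delay_days`, which prints one number per adapter.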

 

A consistency check finds errors proactively. It can be started manually or launched on a schedule, for example at a fixed time.

/opt/MegaRAID/MegaCli/MegaCli64 -AdpSetTime `date +%Y%m%d` `date +%H:%M:%S` -aALL -NoLog

[root@kvm_100_91_167_90 ~]# /opt/MegaRAID/MegaCli/MegaCli64  -AdpGetTime -aALL

Adapter 0:

   Date: 03/13/2018

    Time: 11:23:51

Exit Code: 0x00                               

Manually triggering a patrol read

Machines with SSDs may not support this.

/opt/MegaRAID/MegaCli/MegaCli64 -AdpPR -Start -aALL

/opt/MegaRAID/MegaCli/MegaCli64 -fwtermlog -dsply -aall

 

Patrol read

Patrol read tries to discover disk errors before it is too late and data is lost. It also covers hot spares connected to the controller.

This process causes the drives to read the data by issuing "read-verify" commands. With the "read-verify" command, the data from the drives is not transferred to the MegaRAID adapter unless an error is detected and reported by one or more drives included in the stripe.

If a single drive reports an error within the stripe, the read patrol function initiates read commands to all the other stripe unit drives and the data for this single failing stripe unit is recreated by the MegaRAID adapter from the remaining data and parity stripe units.

After recreating this data, the adapter then issues a write-verify command to the drive that reported the error on the read-verify command and writes this recreated portion of the stripe to that drive. After this write completes successfully, this is now a known good stripe, and read patrol can continue with the next stripe. In the event that two or more drives report errors during the read-verify portion of the read patrol, the failing stripe will be added to the Bad Stripe Table.

* Default delay between patrol reads: 168 hours
* Default load: up to 30% of IO resources

(1) Patrol read setting

MegaCli64 -AdpPR -Info -aALL

Adapter 0: Patrol Read Information:
Patrol Read Mode: Auto

Patrol Read Execution Delay: 168 hours         <-- 7 days

Number of iterations completed: 92

Current State: Stopped

Patrol Read on SSD Devices: Disabled

Remark

By default it is done automatically (with a delay of 168 hours between different patrol reads)
and will take up to 30% of IO resources.

(2) Patrol Read Rate

MegaCli64 -AdpGetProp PatrolReadRate -aALL

Adapter 0: Patrol Read Rate = 30%

P.S.

# Set to 10%

MegaCli64 -AdpSetProp PatrolReadRate 10 -aALL

(3) Enable & Disable automatic patrol read

# To enable automatic patrol read:

MegaCli64 -AdpPR -EnblAuto -aALL

# To disable automatic patrol read:

MegaCli64 -AdpPR -Dsbl -aALL

(4) manual patrol read scan

# Start:

MegaCli64 -AdpPR -Start -aALL

# Stop:

MegaCli64 -AdpPR -Stop -aALL

# Status while a patrol read is running:

MegaCli64 -AdpPR -Info -aALL

Patrol Read Mode: Auto

Patrol Read Execution Delay: 168 hours

Number of iterations completed: 92

Current State: Active

Adapter 0: Number of PDs completed: 0

Patrol Read on SSD Devices: Disabled
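A manual scan can be followed from a script by polling the "Current State" line shown above. The sketch below is an assumption-laden illustration, not a vendor tool: the function names pr_state and wait_for_patrol_read are invented here, only the "Active" and "Stopped" states seen in this post are considered, and only the first adapter in the output is inspected.

```shell
#!/bin/sh
# Hypothetical helper: print the first "Current State" value found in
# `MegaCli64 -AdpPR -Info -aALL` output read from stdin.
pr_state() {
    awk -F': *' '/Current State/ { print $2; exit }'
}

# Hypothetical helper: block until the patrol read is no longer "Active".
# $1: path to the MegaCli64 binary, $2: poll interval in seconds.
wait_for_patrol_read() {
    while [ "$("$1" -AdpPR -Info -aALL | pr_state)" = "Active" ]; do
        sleep "$2"
    done
}
```

For example, `wait_for_patrol_read /opt/MegaRAID/MegaCli/MegaCli64 60` would check once a minute and return when the state changes.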

(5) To correct media error during patrol read

# Get setting

MegaCli64 -AdpGetProp PrCorrectUncfgdAreas -aALL

Adapter 0: PR Correct Unconfigured Areas: Enabled

# Modify Setting

MegaCli -AdpSetProp -PrCorrectUncfgdAreas -1 -aALL

BBU Terminal Logging

MegaCli64 -FwTermLog -Dsply -aALL

12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=00
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=01
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=02
12/23/14 17:50:10: prDiskStart: starting Patrol Read on PD=03
12/23/14 17:50:11: EVT#03353-12/23/14 17:50:11:  39=Patrol Read started

When an error occurs:

12/23/14 17:53:43: EVT#03354-12/23/14 17:53:43: 113=Unexpected sense: PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
12/23/14 17:53:43: Raw Sense for PD 0: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00 00 00

Explanation

Location: "PD 00(e0x20/s0)"                       # disk 0, enclosure 0x20, slot 0

Address: "Path 1221000000000000"         # SAS address of the drive

Command: "CDB: 4d 00 4d 00 00 00 00 00 20 00"

Actual fault: "Sense: 5/24/00"

List_of_SCSI_commands - "CDB: 4d 00 4d 00 00 00 00 00 20 00"

http://en.wikipedia.org/wiki/SCSI_command#List_of_SCSI_commands

 

Type of error/actual fault - "Sense: 5/24/00"

http://en.wikipedia.org/wiki/Key_Code_Qualifier

For details, see:

http://datahunter.org/megacli#terminal_logging
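The fields explained above can be pulled out of the terminal log with a small filter. This is a sketch under assumptions: the function names are invented for this note, the pattern only covers the "Unexpected sense" line layout shown in this post, and the sense key table is limited to a few entries from the standard SCSI list. Read against the standard tables, opcode 4d is LOG SENSE and 5/24/00 is ILLEGAL REQUEST with "invalid field in CDB", which would suggest the drive rejected a log page request rather than reporting a media fault.

```shell
#!/bin/sh
# Hypothetical helper: for each "Unexpected sense" line on stdin, print the
# drive location, the CDB opcode (first CDB byte) and the
# key/ASC/ASCQ sense triple.
parse_unexpected_sense() {
    sed -n 's|.*Unexpected sense: \(PD [^ ]*\) .*CDB: \([0-9a-fA-F][0-9a-fA-F]\) .*Sense: \([0-9a-fA-F/]*\).*|\1 opcode=\2 sense=\3|p'
}

# Hypothetical helper: map a few SCSI sense keys to their standard names.
sense_key_name() {
    case "$1" in
        1) echo "RECOVERED ERROR" ;;
        2) echo "NOT READY" ;;
        3) echo "MEDIUM ERROR" ;;
        4) echo "HARDWARE ERROR" ;;
        5) echo "ILLEGAL REQUEST" ;;
        *) echo "see the Key Code Qualifier table" ;;
    esac
}
```

Usage would look like `MegaCli64 -FwTermLog -Dsply -aALL | parse_unexpected_sense`.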

 

sg_logs

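Access log pages with the SCSI LOG SENSE command.

This utility sends a SCSI LOG SENSE command to the device and then outputs the response. The LOG SENSE command is used to fetch log pages. Known log pages are decoded by default. When the --reset and/or --select option is given, a SCSI LOG SELECT command is issued instead to reset parameters.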

http://manpages.ubuntu.com/manpages/trusty/man8/sg_logs.8.html

Format:

http://www.bustrace.com/bustrace9/sas/log_sense_supported_pages.htm

 

 

Consistency check

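A consistency check reads every part of a stripe, computes the parity from the data portions of the stripe, and then compares the computed parity with the parity read from the drives.

* Not applicable to RAID 0 (there is no redundancy to compare against)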
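The compare step described above can be illustrated with plain XOR parity, as used by single-parity layouts such as RAID 5. This is a toy sketch, not the controller's firmware logic: the function names are made up, each stripe unit is reduced to one small integer, and mirrored or dual-parity levels work differently.

```shell
#!/bin/sh
# Toy model: XOR all arguments together, the way single parity is derived
# from the data units of a stripe.
xor_parity() {
    p=0
    for unit in "$@"; do
        p=$(( p ^ unit ))
    done
    echo "$p"
}

# Succeeds when the stored parity ($1) equals the parity recomputed from
# the data units (remaining arguments).
stripe_is_consistent() {
    stored=$1
    shift
    [ "$(xor_parity "$@")" -eq "$stored" ]
}
```

The same XOR also shows how a single failed unit is rebuilt during a patrol read: XOR-ing the parity with the surviving data units yields the missing one.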

# When the next consistency check is scheduled

MegaCli64 -AdpCcSched -Info -aALL

    Adapter #0
    Operation Mode: Disabled
    Execution Delay: 168
    Next start time: 12/14/2013, 03:00:00
    Current State: Stopped
    Number of iterations: 0
    Number of VD completed: 0
    Excluded VDs          : None
    Exit Code: 0x00

# Consistency Check Rate(CCRate)

# Get

MegaCli64 -AdpGetProp -CCRate -aALL

# Set

MegaCli64 -AdpSetProp -CCRate 10 -aALL

# Scheduled task is set to run

MegaCli64 -AdpCcSched -SetStartTime yyyymmdd hh -aALL

# Mode: Disabled | Concurrent | Sequential

MegaCli64 -AdpCcSched -ModeConc -aALL                     <-- changes "Operation Mode:"

MegaCli64 -AdpCcSched -ModeSeq -aALL

MegaCli64 -AdpCcSched -Dsbl -aALL

Remark

ModeConc: The scheduled CC on all of the virtual drives runs concurrently for the given adapter(s).

ModeSeq: The scheduled CC on all of the virtual drives runs sequentially for the given adapter(s).

# Run manually

MegaCli64 -LDCC -Start|-Abort|-ShowProg|-ProgDsply -LALL -aALL

# Show

MegaCli64 -LDCC -ShowProg -LALL -aALL

-ShowProg: Displays a snapshot of an ongoing CC.

Check Consistency on VD #0 (target id #0) Completed 2% in 7 Minutes.

MegaCli64 -LDCC -ProgDsply -LALL -aALL

-ProgDsply: Displays ongoing CC progress. The progress displays until at least one CC is completed or a key is pressed.

 Progress of Virtual Drives...
 
  Virtual Drive #              Percent Complete                       Time Elps
          0         #                      02 %                        00:07:59
 
    Press <ESC> key to quit...
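The -ShowProg snapshot line can be turned into a rough time estimate. This is a sketch only: the function names are invented, the pattern assumes the exact "Completed N% in M Minutes" wording shown above, and the estimate is a simple linear extrapolation that ignores changes in host IO load.

```shell
#!/bin/sh
# Hypothetical helper: extract "<vd> <percent> <minutes>" from each
# -ShowProg progress line on stdin.
cc_progress() {
    sed -n 's|.*Check Consistency on VD #\([0-9]*\).*Completed \([0-9]*\)% in \([0-9]*\) Minutes.*|\1 \2 \3|p'
}

# Hypothetical helper: linearly extrapolate the remaining minutes per VD.
cc_remaining_minutes() {
    cc_progress | awk '$2 > 0 { printf "VD %d: about %d minutes left\n", $1, $3 * (100 - $2) / $2 }'
}
```

Usage would look like `MegaCli64 -LDCC -ShowProg -LALL -aALL | cc_remaining_minutes`.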

Syntax

Syntax: MegaCLI -AdpPR -SetStartTime yyyymmdd hh -aALL

Cmd: MegaCLI -AdpPR -SetStartTime 20120702 01 -aALL

Syntax: MegaCLI -AdpPR -SetDelay Val -aALL (Val is in hours)

Cmd: MegaCLI -AdpPR -SetDelay 168 -aALL

Syntax: MegaCLI -AdpCcSched -SetStartTime yyyymmdd hh -aN|-a0,1,2|-aALL

Cmd: MegaCLI -AdpCcSched -SetStartTime 20120704 03 -aALL

Syntax: MegaCLI -AdpCcSched -SetDelay Val -aALL (in hours)

Cmd: MegaCLI -AdpCcSched -SetDelay 672 -aALL
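The argument formats in the syntax lines above (yyyymmdd hh, and a delay in hours) can be generated rather than typed by hand. The sketch below only prints the command lines for review instead of running them; the function names are made up for this note, the binary name MegaCli64 is assumed to be on the PATH, and the date parts must be passed as plain integers without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: print a consistency check start-time command.
# $1 year, $2 month, $3 day, $4 hour (plain integers, no leading zeros).
cc_start_command() {
    printf 'MegaCli64 -AdpCcSched -SetStartTime %04d%02d%02d %02d -aALL\n' "$1" "$2" "$3" "$4"
}

# Hypothetical helper: print a delay command for a period given in weeks,
# converted to the hours that -SetDelay expects.
cc_delay_command() {
    printf 'MegaCli64 -AdpCcSched -SetDelay %d -aALL\n' "$(( $1 * 7 * 24 ))"
}
```

For instance, `cc_delay_command 4` prints the 672-hour delay used in the last example above.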
