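Download: http://www.lsi.com/support/pages/download-results.aspx?keyword=megacli , under Management Software and Tools, pick the matching version, unzip the archive, and take the rpm package from the Linux directory.
Install: upload the rpm package and install it with rpm -ivh MegaCli-xxx.rpm. Install path: /opt/MegaRAID/MegaCli. ln -s /opt/MegaRAID/MegaCli/MegaCli64 /bin/MegaCli64 . @zwz notes: the package repository also contains MegaCli, so it can be installed directly with yum.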
Note: this is not the word All, but uppercase A, lowercase L, and uppercase i.
megacli -CfgDsply -a0
...
Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1) //LD1
Name :Virtual Disk 1
RAID Level : Primary-0, Secondary-3, RAID Level Qualifier-0
Size : 1.818 TB
Parity Size : 0
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 1
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Is VD Cached: No
Physical Disk Information:
Physical Disk: 0
Enclosure Device ID: 0
Slot Number: 1 // S1; the first slot in the enclosure is S0
...
'for i in {0..14};do echo megacli -CfgLdAdd -r0[0:$i] -a0 ;done' The address is usually [0:0]; sometimes it takes the [E0:S0] form with a real enclosure ID. Try it and see.
-CfgLdAdd -rX[E0:S0,E1:S1,...] [WT|WB] [NORA|RA|ADRA] [Direct|Cached]
[CachedBadBBU|NoCachedBadBBU] [-szXXX [-szYYY ...]]
[-strpszM] [-Hsp[E0:S0,...]] [-AfterLdX] [-Force]|[Secure]
[-Default| -Automatic| -None| -Maximum| -MaximumWithoutCaching] [-Cache] -aN
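The shell loop above only echoes the commands so they can be reviewed before running. A minimal Python sketch of the same dry run follows; the function name and its parameters are hypothetical, and it only builds strings in the -CfgLdAdd -r0[E:S] -aN form shown above without executing anything.

```python
def single_disk_raid0_commands(enclosure, slots, adapter=0):
    """Build one 'megacli -CfgLdAdd' command string per slot (dry run only)."""
    return [
        "megacli -CfgLdAdd -r0[{}:{}] -a{}".format(enclosure, slot, adapter)
        for slot in slots
    ]


if __name__ == "__main__":
    # Same range as the shell loop: slots 0..14 on enclosure 0, adapter 0.
    for command in single_disk_raid0_commands(0, range(0, 15)):
        print(command)
```

Review the printed commands first, then paste the ones that apply to the real enclosure ID.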
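If creation returns
The specified physical disk does not have the appropriate attributes to complete the requested command.
the replacement disk is most likely an old disk that still carries RAID metadata, and its foreign configuration has to be cleared manually.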
megacli -CfgForeign -Scan -a0 // returns the number of foreign disks
megacli -CfgForeign
-[Dsply|Preview|Import|Clear] [x]|[-Passphrase sssssssssss] -aN|-a0,1,2|-aALL
If the error above appears during creation but the Foreign operations have no effect, the disk may be configured as JBOD
megacli -PdList -a0 // inspect the disks and check whether the state is JBOD
megacli -PDMakeGood -PhysDrv[E0:S0,E1:S1,...] | [-Force] -aN|-a0,1,2|-aALL // reset the state to Unconfigured good; if the operation is refused, add -force to force it, which also wipes the data
megacli -PDRbld -ProgDsply -PhysDrv[0:5] -a0 // 0:5 is the Enclosure ID and Slot Number of the disk being rebuilt
MegaCli -LDSetProp WT -L0 -a0 // -L selects the logical disk, -a selects the controller
-Lx|-L0,1,2|-LALL -aN|-a0,1,2|-aALL
The cache policy can be
WT|WB|ForcedWB
megacli -LDSetProp -Cached -LAll -aAll // set the read cache
-Cached|Direct
megacli -LDGetProp -Cache -L0 -a0 // get the read cache state
megacli -LDSetProp EnDskCache -LAll -aAll // enable the disk cache
megacli -LDSetProp ADRA -LALL -aALL // enable ReadAhead
megacli -AdpAutoRbld -Dsply -a0 // show the rebuild state
-AdpAutoRbld -Dsbl // disable auto rebuild
-AdpAutoRbld -Dsply // show the rebuild state
-AdpAutoRbld -Enbl // enable auto rebuild
megacli -PDRbld -ShowProg -PhysDrv[32:1] -aALL // show the rebuild state of a specific disk
megasasctl and megactl require the megactl package
megaraidsas-status and megaraid-status require the megaraid-status package
megaclisas-status requires the megaclisas-status package
megamgr requires the megamgr package
sudo megasasctl
sudo megaraidsas-status
sudo megaclisas-status
sudo megactl
sudo megaraid-status
sudo megamgr
$ sudo megacli -CfgLdAdd -r0[17:6] -a0
Adapter 0: Configure Adapter Failed
FW error description: The current operation is not allowed because the controller has data in cache for offline or missing virtual drives.
Exit Code: 0x54
sudo megacli -GetPreservedCacheList -a0
Adapter #0
Virtual Drive(Target ID 06): Missing.
Exit Code: 0x00
sudo megacli -DiscardPreservedCache -L6 -a0
Adapter #0
Virtual Drive(Target ID 06): Preserved Cache Data Cleared.
Exit Code: 0x00
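Now running
sudo megacli -CfgLdAdd -r0[17:6] -a0
again no longer reports the error.
Approach:
a. Get the Slot Number
sudo megacli -pdlist -a0 or
sudo megasasctl -v
b. Get the SSD lifetime information with smartctl
sudo smartctl -a -d megaraid,8 /dev/sdd
Here 8 is the Slot Number and /dev/sdd is the device node. The SMART information returned: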
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 096 096 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 544
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 9
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 6
175 Program_Fail_Count_Chip 0x0033 100 100 010 Pre-fail Always - 13044744823
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 084 071 000 Old_age Always - 16 (Min/Max 10/29)
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 6
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 16
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 357286
226 Load-in_Time 0x0032 100 100 000 Old_age Always - 849
227 Torq-amp_Count 0x0032 100 100 000 Old_age Always - 50
228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 12518
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
234 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 357286
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 357643
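Extract the needed value
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
100 100 means no wear. The number decreases as wear accumulates; once it reaches 1 it stops decreasing, and the SSD has reached the end of its life.
c. Reference
利用MegaCli和smartCtl工具获得ssd盘使用情况 (Using MegaCli and smartctl to get SSD usage)
References: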
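Most Inspur servers use an Adaptec RAID controller, which can be managed with arcconf (Adaptec RAID Configuration Utility).
umount /dev/sdX1
Unmount the affected disk.
sudo arcconf getconfig 1 ld
Confirm that the old Logical Drive is gone. If it is not, run sudo /sa/bin/arcconf delete 1 logicaldrive 24
to delete it; here 24 is the faulty Logical Drive.
sudo arcconf getconfig 1 pd
Find the failed disk and note its Reported Channel, Device
numbers.
Ask the data center to inspect the disk; if it is faulty, replace it.
After the disk is replaced, run sudo arcconf getconfig 1 pd
to check that the new disk is recognized. The State of a newly added disk
is Ready.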
Device #23
Device is a Hard drive
State : Online # after the disk is replaced, "State" becomes "Ready"
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,31(31:0) # take the numbers from here
Reported Location : Enclosure 0, Slot 23
Reported ESD(T:L) : 2,0(0:0)
Vendor :
Model : ST32000542AS
Firmware : CC34
Serial number : 6XW1M1V1
Size : 1907729 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
MaxIQ Cache Capable : No
MaxIQ Cache Assigned : No
NCQ status : Enabled
Device #24
Device is an Enclosure services device
Reported Channel,Device(T:L) : 2,0(0:0)
Enclosure ID : 0
Type : SES2
Vendor : LSILOGIC
Model : SASX36 A.1
Firmware : 7015
Status of Enclosure services device
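Run sudo arcconf create 1 jbod 0 31 noprompt
to create the Logical Drive; 0 31 are the Channel, Device numbers noted earlier.
Run dmesg
to check that a new disk device was detected, then partition it and create a file system.
sudo cfdisk /dev/sdX
sudo mkfs.ext3 /dev/sdX1
Set the UUID of the new file system to the old one, or update the UUID of the matching entry in /etc/fstab.
sudo mount -a
Mount the disk and fix the permissions of the mount directory.
Enable the cache: sudo arcconf setcache 1 device 0 31 wb
Here 0 31 are the Channel, Device numbers noted earlier.
Disable the logical drive cache: arcconf setcache 1 LOGICALDRIVE 0 wt
Here 0 is the logicaldrive number.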
=================================================================
Handled with the megacli tool.
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
The controller model can be obtained with the lspci command.
Handled with the arcconf tool.
02:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Handled with the hpacucli tool.
02:01.0 RAID bus controller: Compaq Computer Corporation Smart Array 64xx (rev 01)
Handled with the megacli tool.
02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller 5
omreport tool. SCSI controller from Atto:
05:00.0 Serial Attached SCSI controller: Atto Technology SAS Adapter
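The pairing of lspci lines and management tools above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and matching on these particular substrings is an assumption about how lspci names each vendor, not something the tools themselves define.

```python
# (substring in the lspci line, tool) pairs, following the pairings listed above.
# The substrings are assumptions; adjust them to what lspci prints on your hosts.
TOOL_BY_MARKER = [
    ("LSI Logic", "megacli"),
    ("Adaptec", "arcconf"),
    ("Smart Array", "hpacucli"),
    ("Dell PowerEdge Expandable RAID", "megacli"),
]


def raid_tool_for(lspci_line):
    """Return the management tool for one lspci line, or None if unknown."""
    for marker, tool in TOOL_BY_MARKER:
        if marker in lspci_line:
            return tool
    return None


if __name__ == "__main__":
    line = "02:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)"
    print(raid_tool_for(line))
```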
Check the dmesg log for the disk model: grep -i "ata\|scsi" dmesg.0
sudo lsscsi
[0:0:0:0] cd/dvd Dell Virtual CDROM 123 /dev/scd0
[1:0:0:0] disk Dell Virtual Floppy 123 /dev/sdc
[2:0:0:0] disk ATA Maxtor 7L250S0 1G10 /dev/sda
[2:0:4:0] disk SEAGATE ST3300555SS T105 /dev/sdb
cat /proc/scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: ATA Model: Maxtor 7L250S0 Rev: 1G10
Type: Direct-Access ANSI SCSI revision: 05
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Dell Model: Virtual CDROM Rev: 123
Type: CD-ROM ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: Dell Model: Virtual Floppy Rev: 123
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 04 Lun: 00
Vendor: SEAGATE Model: ST3300555SS Rev: T105
Type: Direct-Access ANSI SCSI revision: 05
sudo smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Maxtor MaXLine III family (ATA/133 and SATA/150)
Device Model: Maxtor 7L250S0
sudo smartctl -a /dev/sdb
Device: SEAGATE ST3300555SS Version: T105
Serial number: 3LM09L9B
Checking the RAID controller battery
sudo megacli -AdpBbuCmd -GetBbuDesignInfo -aALL
BBU Design Info for Adapter: 0
Date of Manufacture: 08/10, 2011
Design Capacity: 1700 mAh
Design Voltage: 3700 mV
Specification Info: 33
Serial Number: 2020
Pack Stat Configuration: 0x0000
Manufacture Name: SANYO
Device Name: DLNU209
Device Chemistry: LION
Battery FRU: N/A
Transparent Learn = 0
App Data = 0
Exit Code: 0x00
RAID controller battery information
megaCli -AdpBbuCmd -GetBbuStatus -aALL |grep 'Charger Status' [show the charging status]
`megaCli -AdpBbuCmd -GetBbuStatus -aALL` [show the BBU status]
`megaCli -AdpBbuCmd -GetBbuCapacityInfo -aALL` [show the BBU capacity information]
`megaCli -AdpBbuCmd -GetBbuDesignInfo -aALL` [show the BBU design parameters]
`megaCli -AdpBbuCmd -GetBbuProperties -aALL` [show the current BBU properties]
arcconf
hpacucli
sudo hpacucli ctrl all show config detail
sudo omreport storage pdisk controller=0
sudo omreport storage vdisk controller=0
[NOTE] The megasasctl/megactl tools show the Enclosure Device ID and Slot Number fairly directly; these two values identify the enclosure and the slot.
. Get disk information
sudo megasasctl
a0 PERC H700 Integrated encl:1 ldrv:1 batt:good
a0d0 1675GiB RAID 10 3x2 optimal
a0e32s0 558GiB a0d0 online
a0e32s1 558GiB a0d0 online
a0e32s2 558GiB a0d0 online
a0e32s3 558GiB a0d0 online
a0e32s4 558GiB a0d0 online
a0e32s5 558GiB a0d0 online
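The disk lines in this output encode adapter, enclosure and slot in the first column (a0e32s1 is adapter 0, enclosure 32, slot 1). A minimal sketch for turning them into megacli [E:S] addresses follows. It assumes the four-column layout shown above; real megasasctl output may carry extra columns, and the function names are hypothetical.

```python
import re

# Matches physical-disk lines such as "a0e32s1 558GiB a0d0 online".
DISK_LINE = re.compile(r"^a(\d+)e(\d+)s(\d+)\s+(\S+)\s+(\S+)\s+(\S+)$")


def parse_megasasctl_disks(text):
    """Return one dict per physical-disk line; other lines are skipped."""
    disks = []
    for line in text.splitlines():
        match = DISK_LINE.match(line.strip())
        if match:
            adapter, enclosure, slot, size, ldrv, state = match.groups()
            disks.append({
                "adapter": int(adapter),
                "enclosure": int(enclosure),
                "slot": int(slot),
                "size": size,
                "logical_drive": ldrv,
                "state": state,
            })
    return disks


def megacli_address(disk):
    """Format a parsed disk as the [E:S] address megacli expects."""
    return "[{}:{}]".format(disk["enclosure"], disk["slot"])


SAMPLE = """a0 PERC H700 Integrated encl:1 ldrv:1 batt:good
a0d0 1675GiB RAID 10 3x2 optimal
a0e32s0 558GiB a0d0 online
a0e32s1 558GiB a0d0 online
"""

if __name__ == "__main__":
    for disk in parse_megasasctl_disks(SAMPLE):
        print(megacli_address(disk), disk["state"])
```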
. Array setup
- Single-disk RAID 0
megacli -CfgLdAdd -r0[E0:S0] -a0
- Two-disk RAID 1
- Multi-disk RAID 6
megacli -CfgLdAdd -r6[17:0,17:1,17:2,17:3,17:4,17:5,17:6,17:7,17:8,17:9,17:10,17:11,17:12,17:13,17:14] -a0
- Multi-disk RAID 5
- Multi-disk RAID 10
- RAID 10 built as 2x2
a0d1 1862GiB RAID 10 2x2 optimal
megacli -CfgSpanAdd -r10 -Array0[32:2,32:3] -Array1[32:4,32:5] -a0
- RAID 10 built as 4x2
a0d1 2233GiB RAID 10 4x2 optimal
megacli -CfgSpanAdd -r10 -Array0[32:2,32:3] -Array1[32:4,32:5] -Array2[32:6,32:7] -Array3[32:8,32:9] -a0
[NOTE] The Reported Channel,Device numbers are what identify a physical disk, and each pair is unique. They can be obtained with arcconf getconfig 1 pd; see also arcconf create -h.
. Get disk information
sudo arcconf getconfig 1 pd
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA0 Gb/s
Reported Channel,Device(T:L) : 0,8(8:0) # take the numbers from here
Reported Location : Enclosure 0, Slot 0
Reported ESD(T:L) : 2,0(0:0)
Vendor : Hitachi
Model : HDS722020ALA330
Firmware : JKAOA3EA
Serial number : JK11A4B8KBT5YW
Size : 1907729 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
MaxCache Capable : No
MaxCache Assigned : No
NCQ status : Enabled
. Array setup
- Single-disk RAID 0
sudo arcconf create 1 logicaldrive max volume 0 31 noprompt
- Multi-disk RAID 60
arcconf create 1 LOGICALDRIVE MAX 60 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 16 0 17 0 18 0 19 0 20 0 21 0 22 0 23 0 24 0 25 0 26 0 27 0 28 0 29 0 30 0 31 noprompt
[NOTE] An MBR partition table only supports partitions up to 2TB; use a GPT partition table for anything larger. Partitions under 2TB can of course use GPT as well.
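The 2TB figure comes from MBR storing sector counts in 32 bits: with 512-byte sectors that is 2^32 x 512 bytes = 2 TiB. A small sketch of the arithmetic, assuming 512-byte logical sectors; the function name is hypothetical, and the returned strings are the label names parted uses for mklabel.

```python
SECTOR_SIZE = 512            # assumed logical sector size in bytes
MBR_MAX_SECTORS = 2 ** 32    # MBR stores sector counts in 32-bit fields
MBR_MAX_BYTES = SECTOR_SIZE * MBR_MAX_SECTORS  # 2199023255552 bytes = 2 TiB


def partition_label_for(disk_bytes, sector_size=SECTOR_SIZE):
    """Pick a parted label: 'gpt' when the disk exceeds the MBR limit, else 'msdos'."""
    limit = sector_size * MBR_MAX_SECTORS
    return "gpt" if disk_bytes > limit else "msdos"


if __name__ == "__main__":
    # A 3 TB drive (2,995,729,203,200 bytes) is above the limit.
    print(partition_label_for(2995729203200))
    # A 1907729 MB drive is below it.
    print(partition_label_for(1907729 * 1024 * 1024))
```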
. cfdisk/fdisk
. parted
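- Choose the partition table
sudo parted -s /dev/sdb mklabel gpt
- Create the partition
sudo parted -s -- /dev/sdb mkpart primary 0 -1s
parted -s -- /dev/sda mkpart primary 0 100%
- **Write the partition table and create the partition in one step**
sudo parted -s /dev/sda -- mklabel gpt mkpart primary 0 -1
. partprobe/partx
. Recovering a partition table with testdisk
[NOTE] mkfs.ext3/ext4 can set the UUID at format time, while mkfs.xfs cannot; for xfs the UUID can only be changed afterwards with xfs_admin.
sudo xfs_admin -U cd1b9cdb-fa95-4bbe-9a93-57196ec5510c /dev/sdg1
[NOTE] Mount disks by UUID rather than by device name: the UUID is unique, and device names can change across reboots while the UUID does not.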
mount -a
chown root.root /srv/0
. Mounting an in-memory file system
mount -t tmpfs -o nosuid,nodev,size=10G,mode=1777 tmpfs /mnt/
. Mounting a removable disk
mount -t ntfs -o uid=music,gid=netease,fmask=133,dmask=022 /dev/sdz1 /mnt/usb/
. Using bind
/vicepa/unfs /home/unfs none defaults,bind 0 0
/vicepa/dfs /home/dfs none defaults,bind 0 0
sudo mount --bind /mnt/ssd/0/ /data/
. Mounting with nobarrier
/dev/sdb1 on /mnt/ndir type xfs (rw,noatime,attr2,delaylog,noquota)
UUID=74e18540-397a-469d-9599-3e404ca23ed2 /mnt/ndir xfs noatime,nobarrier 0 0
/dev/sdb1 on /mnt/ndir type xfs (rw,noatime,attr2,delaylog,nobarrier,noquota)
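. Check the dmesg output
. Check the mount point: whether the directory is readable and writable, and whether it is in use
sudo fuser -mv /srv
. umount the directory
. Find the failed disk
megasasctl
or megactl
. Contact the data center to replace the disk
. Initialize the new disk
. Check the dmesg output for offline messages
. Check the mount point: whether the directory is readable and writable, and whether it is in use
fuser
or lsof
. umount the directory
. Find the failed disk
sudo arcconf getconfig 1 pd|grep "Slot"
. Contact the data center to replace the disk
. Initialize the new disk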
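. Locate the physical disk with bad sectors
Use megasasctl
or sudo megacli -CfgDsply -a0
to find the physical disk with bad sectors; search for 'Media Error Count'
. Find the logical drive the physical disk belongs to
sudo megacli -CfgDsply -a0
// the following line is the one to look for
Virtual Drive: 1 (Target Id: 1)
. Take the logical drive offline
sudo megacli -CfgLdDel -L1 -a0
. Locate the failed disk and light its LED so the data center can identify it
Locate a disk (by driving the matching indicator LED on the enclosure)
sudo megacli -PdLocate -start -PhysDrv[0:5] -a0
// 0:5 is the Enclosure ID and Slot Number of the disk to locate
sudo megacli -PDOffline -PhysDrv[21:0] -a0
. Data center handles the replacement
. umount the failed disk's partition
. Initialize the new disk
. umount the failed disk's partition
. Check for bad sectors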
$ sudo arcconf GETLOGS 1 DEVICE
Controllers found: 1
Command completed successfully.
. Identify the faulty logical drive
[NOTE] arcconf cannot delete the logicaldrive of a mounted disk; trying to do so fails with an error
- Mounted
sudo arcconf delete 1 logicaldrive 15
Controllers found: 1
Logical device 15 is mounted on /mnt/dfs/15 and cannot be deleted.
Command aborted.
- Unmounted
sudo arcconf delete 1 logicaldrive 14
Controllers found: 1
WARNING: logical device 14 may contain a partition.
All data in logical device 14 will be lost.
Delete the logical device?
Press y, then ENTER to continue or press ENTER to abort:
. Light the LED of the failed disk
sudo arcconf IDENTIFY 1 LOGICALDRIVE 14
Controllers found: 1
The specified device is blinking.
Press any key to stop the blinking.
. Data center replaces the disk
. Initialize the new disk
. umount the failed disk's partition
. Use smartctl to read the disk serial number (Serial number)
sudo smartctl -a /dev/sde
Vendor: WDC
Product: WD3000FYYZ-01UL1
Revision: 01.0
User Capacity: 2,995,729,203,200 bytes [2.99 TB]
Logical block size: 512 bytes
Logical Unit id: 0x50014ee2b3ed2863
Serial number: WD-WCC131257088
Device type: disk
Local Time is: Wed Sep 17 12:47:29 2014 CST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
. Find the disk's channel ID by its serial number
sudo arcconf getconfig 1 pd|less
Device #3
Device is a Hard drive
State : Online (JBOD)
Supported : Yes
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,11(11:0) (needed for lighting the LED)
Reported Location : Enclosure 1, Slot 3
Reported ESD(T:L) : 2,1(1:0)
Vendor : WDC
Model : WD3000FYYZ-01UL1
Firmware : 01.01K02
Serial number : WD-WCC131257088
Size : 2861588 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
NCQ status : Enabled
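Paging through the output with less to match a serial number by eye is error-prone on enclosures with many disks. A minimal sketch of the lookup follows; it assumes the field labels shown above ("Device #", "Reported Channel,Device(T:L)", "Serial number") and that the channel line precedes the serial line within each device block. The function name is hypothetical.

```python
import re

CHANNEL_RE = re.compile(r"Reported Channel,Device\(T:L\)\s*:\s*(\d+),(\d+)")
SERIAL_RE = re.compile(r"Serial number\s*:\s*(\S+)")


def channel_device_by_serial(getconfig_pd_text, serial):
    """Return (channel, device) for the disk with this serial, or None."""
    current = None
    for line in getconfig_pd_text.splitlines():
        if line.strip().startswith("Device #"):
            current = None  # a new device block starts
            continue
        match = CHANNEL_RE.search(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)))
            continue
        match = SERIAL_RE.search(line)
        if match and match.group(1) == serial:
            return current
    return None


SAMPLE = """Device #0
Reported Channel,Device(T:L) : 0,8(8:0)
Serial number : JK11A4B8KBT5YW
Device #3
Reported Channel,Device(T:L) : 0,11(11:0)
Reported ESD(T:L) : 2,1(1:0)
Serial number : WD-WCC131257088
"""

if __name__ == "__main__":
    found = channel_device_by_serial(SAMPLE, "WD-WCC131257088")
    if found:
        print("sudo arcconf IDENTIFY 1 DEVICE {} {}".format(*found))
```

The pair it returns is what arcconf IDENTIFY and arcconf create take as the channel and device arguments.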
. Light the LED
sudo arcconf IDENTIFY 1 DEVICE 0 11
Controllers found: 1
The specified device is blinking.
Press any key to stop the blinking.
. Data center replaces the disk
. Initialize the new disk
sudo arcconf create 1 JBOD 0 11 noprompt
. umount the failed disk's partition
. Use smartctl to read the disk serial number (Serial number)
sudo smartctl -a /dev/sde
Vendor: WDC
Product: WD3000FYYZ-01UL1
Revision: 01.0
User Capacity: 2,995,729,203,200 bytes [2.99 TB]
Logical block size: 512 bytes
Logical Unit id: 0x50014ee2b3ed2863
Serial number: WD-WCC131257088
Device type: disk
Local Time is: Wed Sep 17 12:47:29 2014 CST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
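. Power off and have the data center replace the disk
[NOTE] SCSI controllers have no LED locate mechanism, so the data center cannot easily tell which disk it is and may pull the wrong one. Providing the disk serial number lets them check it by hand and avoid pulling the wrong disk.
. Data center replaces the disk
. Initialize the new disk
. The following problem appears when creating a vdisk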
sudo megacli -CfgLdAdd -r0[17:6] -a0
Adapter 0: Configure Adapter Failed
FW error description:
The current operation is not allowed because the controller has data in cache for offline or missing virtual drives.
Exit Code: 0x54
- Fix:
sudo megacli -GetPreservedCacheList -a0
sudo megacli -DiscardPreservedCache -L6 -a0
- Result:
sudo megacli -GetPreservedCacheList -a0
Adapter #0
Virtual Drive(Target ID 06): Missing.
Exit Code: 0x00
sudo megacli -DiscardPreservedCache -L6 -a0
Adapter #0
Virtual Drive(Target ID 06): Preserved Cache Data Cleared.
Exit Code: 0x00
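The -L argument for -DiscardPreservedCache is the Target ID printed by -GetPreservedCacheList (06 becomes -L6). A minimal sketch that derives the discard commands from that output follows. It assumes the "Virtual Drive(Target ID NN)" line format shown above, only builds command strings, and uses a hypothetical function name.

```python
import re

TARGET_RE = re.compile(r"Virtual Drive\(Target ID (\d+)\)")


def discard_commands(preserved_cache_output, adapter=0):
    """Build one -DiscardPreservedCache command per listed virtual drive."""
    target_ids = [int(m.group(1)) for m in TARGET_RE.finditer(preserved_cache_output)]
    return [
        "megacli -DiscardPreservedCache -L{} -a{}".format(target_id, adapter)
        for target_id in target_ids
    ]


SAMPLE = """Adapter #0
Virtual Drive(Target ID 06): Missing.
Exit Code: 0x00
"""

if __name__ == "__main__":
    for command in discard_commands(SAMPLE):
        print(command)
```

Discarding preserved cache throws away data that never reached disk, so review the list before running anything it prints.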
. A newly replaced disk is in JBOD state, so creating the vdisk fails
sudo megacli -CfgLdAdd -r0[21:7] -a0
The specified physical disk does not have the appropriate attributes to complete the requested command.
Exit Code: 0x26
- Check whether the disk is in JBOD state
sudo megacli -PDList -a0 | grep JBOD
[NOTE] A newly replaced disk in JBOD state makes vdisk creation fail
- Fix:
sudo megacli -PDMakeGood -PhysDrv[21:7] -force -a0
Adapter: 0: EnclId-21 SlotId-7 state changed to Unconfigured-Good.
Exit Code: 0x00
sudo megacli -CfgLdAdd -r0[21:7] -a0
Adapter 0: Created VD 8
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
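. Clearing the cache with the megacli management tool
The tool fails with an error and cannot clear the cache; contact the data center to handle it manually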
sudo megacli -DiscardPreservedCache -L5 -a0
Adapter #0
Segmentation fault
. Setting the RAID controller cache policy
- Check the RAID cache state
sudo megacli -LDGetProp -Cache -Lall -a0
Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteThrough, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, Write Cache OK if bad BBU
Exit Code: 0x00
- Set the RAID controller cache policy
sudo megacli -LDSetProp ForcedWB -L2 -a0
Set Write Policy to Forced WriteBack on Adapter 0, VD 2 (target id: 2) success
Exit Code: 0x00
[WARNING] Forcing the cache policy to ForcedWB while the RAID controller battery is faulty is risky: if the machine crashes or loses power, the data in the cache cannot be flushed to disk and is lost. This trades safety for performance and is not recommended.
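To audit which virtual drives carry the risky setting, the -LDGetProp -Cache output can be scanned for "Write Cache OK if bad BBU". The sketch below assumes the line format shown above (tolerating optional spaces around the hyphen in "Adapter 0-VD 2"); the function name is hypothetical.

```python
import re

POLICY_LINE = re.compile(
    r"Adapter (\d+)\s*-\s*VD (\d+)\(target id: \d+\): Cache Policy:\s*(.+)"
)
RISKY_FIELD = "Write Cache OK if bad BBU"


def risky_virtual_drives(ldgetprop_output):
    """Return (adapter, vd) pairs that keep write cache on with a bad BBU."""
    risky = []
    for line in ldgetprop_output.splitlines():
        match = POLICY_LINE.search(line)
        if not match:
            continue
        fields = [field.strip() for field in match.group(3).split(",")]
        if RISKY_FIELD in fields:
            risky.append((int(match.group(1)), int(match.group(2))))
    return risky


SAMPLE = """Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteThrough, ReadAdaptive, Direct, No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, Write Cache OK if bad BBU
Exit Code: 0x00
"""

if __name__ == "__main__":
    for adapter, vd in risky_virtual_drives(SAMPLE):
        print("adapter {} VD {} keeps write cache without a healthy BBU".format(adapter, vd))
```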
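. Checking the lifetime of an Intel SSD (LSI RAID controller)
Approach:
- Get the Slot Number
sudo megacli -pdlist -a0
or
sudo megasasctl -v
- Get the SSD lifetime information with smartctl
sudo smartctl -a -d megaraid,8 /dev/sdd
Here 8 is the Slot Number and /dev/sdd is the device node. The SMART information returned: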
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 096 096 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 544
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 9
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 6
175 Program_Fail_Count_Chip 0x0033 100 100 010 Pre-fail Always - 13044744823
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 084 071 000 Old_age Always - 16 (Min/Max 10/29)
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 6
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 16
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 357286
226 Load-in_Time 0x0032 100 100 000 Old_age Always - 849
227 Torq-amp_Count 0x0032 100 100 000 Old_age Always - 50
228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 12518
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
234 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
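- Extract the needed value
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
100 100 means no wear. The number decreases as wear accumulates; once it reaches 1 it stops decreasing, and the SSD has reached the end of its life.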
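Picking attribute 233 out of the table can be scripted. The sketch below assumes the whitespace-separated column layout shown above (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); the function name is hypothetical, and the sample rows are copied from the output in this section.

```python
def smart_attribute(smartctl_output, attr_id):
    """Return name/value/worst/thresh/raw for one SMART attribute ID, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with the numeric ID.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return {
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": " ".join(fields[9:]),
            }
    return None


SAMPLE = """ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 096 096 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 084 071 000 Old_age Always - 16 (Min/Max 10/29)
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
"""

if __name__ == "__main__":
    wear = smart_attribute(SAMPLE, 233)
    print(wear["name"], wear["value"])
```

A monitoring check could alert once the VALUE column of attribute 233 drops below a threshold chosen for the fleet.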
- References
http://hatemysql.com/2011/03/08/%E5%88%A9%E7%94%A8megacli%E5%92%8Csmartctl%E5%B7%A5%E5%85%B7%E8%8E%B7%E5%BE%97ssd%E7%9B%98%E4%BD%BF%E7%94%A8%E6%83%85%E5%86%B5/[利用MegaCli和smartCtl工具获得ssd盘使用情况]
http://blog.yufeng.info/archives/1096[smartctl获取RAID卡下intel ssd寿命]
sudo arcconf SAVESUPPORTARCHIVE
tar -zpcv -f Support.tar.gz /var/log/Support/
http://www.debian-administration.org/article/643/Migrating_a_live_system_from_ext3_to_ext4_filesystem[Migrating_a_live_system_from_ext3_to_ext4_filesystem]
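. Setting up NFS
ii nfs-common 1:1.2.2-4squeeze2 NFS support files common to client and server
ii nfs-kernel-server 1:1.2.2-4squeeze2 support for NFS kernel server
cat /etc/exports
/mnt/nfs *(rw,all_squash,anonuid=65534,anongid=65534,no_subtree_check)
sudo /etc/init.d/nfs-kernel-server restart
. Check that it works
showmount -e
. Mounting on clients
10.131.3.7:/mnt/nfs /nfs nfs nfsvers=3,proto=tcp,rsize=8192,wsize=8192,hard,intr 0 3
. The system disk is read-only; can the machine simply be rebooted?
Wait for the application owners to confirm before rebooting. The disk may have failed while the service is still running, and a hasty reboot can cause an outage.
. How to inspect cloud disks
curl 169.254.169.254/latest/meta-data/block-device-mapping/
. Do the RAID controllers in Inspur servers have a battery?
The RAID controllers in our Inspur servers generally have no battery, so battery failures do not apply.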
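To limit how much disk space users consume on Linux, use quota. quota also works over nfs; it only needs to be installed on the nfs server
# apt-get install quota quotatool
Edit /etc/default/quota and set
run_warnquota="true"
warnquota emails users who exceed their disk quota. Configure /etc/warnquota.conf, using /usr/share/doc/quota/warnquota.conf as a reference, so that it looks up users' email addresses via ldap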
# Command used to send email
MAIL_CMD = "/usr/sbin/sendmail -t"
# From email used in generated emails
FROM = "[email protected]"
SUPPORT = "[email protected]"
# Subject line
SUBJECT = "=?utf-8?Q?=E8=AF=B7=E6=82=A8=E6=B8=85=E7=90=86=E6=9C=8D=E5=8A=A1=E5=99=A8=E4=B8=8A=E7=9A=84=E4=B8=BB=E7=9B=AE=E5=BD=95?="
# Send a copy to this address
CC_TO = "[email protected]"
CC_BEFORE = "2 days"
MESSAGE = "您好 %i,||请在宽限期(grace)內用帐号 %i 登录任意一台服务器清理您的主目录 /styx/home/%i,|使磁盘使用量(used)小于软限制(soft),否则您将不能正常使用磁盘。|"
SIGNATURE = "如有任何疑问,请联系邮件 [email protected],POPO [email protected],或者电话 0571-8985-2495。"
CHARSET = "UTF-8"
LDAP_MAIL = "true"
LDAP_URI = "ldap://directory.163.org"
LDAP_BASEDN = "dc=corp,dc=netease,dc=com"
LDAP_SEARCH_ATTRIBUTE = "uid"
LDAP_MAIL_ATTRIBUTE = "mail"
LDAP_DEFAULT_MAIL_DOMAIN = "corp.netease.com"
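Note that SUBJECT does not accept Chinese written directly; it has to be written in an encoding that email supports (such as quopri), generated for example like this: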
python -c 'print "=?utf-8?Q?"+"测试".encode("quopri")+"?="'
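The one-liner above uses Python 2 syntax (print statement, str.encode("quopri")). A Python 3 sketch of the same idea follows; it is an illustration rather than the original author's method. It encodes every byte as =XX, which is valid Q-encoding and avoids the soft line breaks that quopri.encodestring inserts into long inputs. The function names are hypothetical.

```python
from email.header import decode_header


def encode_subject(text, charset="utf-8"):
    """Q-encode text as a single encoded-word, e.g. for SUBJECT in warnquota.conf."""
    encoded = "".join("={:02X}".format(byte) for byte in text.encode(charset))
    return "=?{}?Q?{}?=".format(charset, encoded)


def decode_subject(encoded):
    """Decode a single encoded-word back to text (used here as a round-trip check)."""
    raw, charset = decode_header(encoded)[0]
    return raw.decode(charset)


if __name__ == "__main__":
    subject = encode_subject("测试")
    print(subject)
    print(decode_subject(subject))
```

Decoding the result with decode_subject before pasting it into the config is a cheap way to confirm the subject reads as intended.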
Enable quota on the partition. For ext file systems, edit /etc/fstab and add the usrquota/grpquota mount options, then
# mount -o remount /mount/point
An xfs file system cannot have quota enabled by remount; it needs umount/mount
Use edquota or quotatool to configure per-user/group disk limits
# edquota -u $user
# quotatool -b -u $user -l $hard_limit -q $soft_limit $mount_point
Then turn quota on
# /etc/init.d/quota start