Environment: HP DL380 Gen9 with iLO 4. The iLO page reports a degraded disk, but the OS (CentOS 7) still shows the corresponding logical drive (the RAID volume) as healthy. After the disk is physically replaced, the new device node /dev/sdX does not appear in the OS; normally you would have to reboot the OS and re-create the RAID manually.
Goal: without rebooting, re-create the RAID 0 logical drive directly with hpssacli.
Prerequisite: delete the LV, VG and PV that belonged to the replaced Ceph OSD beforehand (see the notes at the end).
1. If the replaced disk is not re-added to a RAID logical drive, the OS does not show the new disk device:
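A minimal sketch of that cleanup, assuming the failed OSD's data LV lives in a dedicated ceph-<uuid> VG on the dead disk (all names below are placeholders; look up the real ones first):
lvs -o lv_name,vg_name,devices                    # find the LV/VG that sat on the failed disk
lvremove -y ceph-<vg-uuid>/osd-data-<lv-uuid>     # remove the OSD data LV
vgremove -y ceph-<vg-uuid>                        # remove its volume group
pvremove -ff /dev/sdj                             # wipe the PV label (may fail if the disk is already gone)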
[root@cmp21 osd]# lsblk -d
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 558.9G 0 disk
sdb 8:16 0 1.7T 0 disk
sdc 8:32 0 1.7T 0 disk
sdd 8:48 0 1.7T 0 disk
sde 8:64 0 1.7T 0 disk
sdf 8:80 0 1.7T 0 disk
sdg 8:96 0 1.7T 0 disk
sdh 8:112 0 1.7T 0 disk
sdi 8:128 0 1.7T 0 disk
sdk 8:160 0 1.7T 0 disk
sdl 8:176 0 1.7T 0 disk
sdm 8:192 0 1.7T 0 disk
sdn 8:208 0 1.7T 0 disk
sdo 8:224 0 1.7T 0 disk
sdp 8:240 0 1.7T 0 disk
sdq 65:0 0 1.7T 0 disk
sdr 65:16 0 1.7T 0 disk
sds 65:32 0 1.7T 0 disk
sdt 65:48 0 1.7T 0 disk
sdu 65:64 0 1.7T 0 disk
Note that /dev/sdj is missing.
2. Check the logical drives with hpssacli; logicaldrive 10 is reported as Failed:
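To confirm which logical drive maps to the missing device letter, the controller's detail view can help. A sketch (it assumes this hpssacli version prints a Disk Name field in the detail output, which is worth verifying):
hpssacli ctrl slot=0 ld all show detail | grep -E "Logical Drive|Status|Disk Name"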
[root@cmp21 osd]# hpssacli ctrl slot=0 ld all show status
logicaldrive 1 (558.9 GB, 1): OK
logicaldrive 2 (1.6 TB, 0): OK
logicaldrive 3 (1.6 TB, 0): OK
logicaldrive 4 (1.6 TB, 0): OK
logicaldrive 5 (1.6 TB, 0): OK
logicaldrive 6 (1.6 TB, 0): OK
logicaldrive 7 (1.6 TB, 0): OK
logicaldrive 8 (1.6 TB, 0): OK
logicaldrive 9 (1.6 TB, 0): OK
logicaldrive 10 (1.6 TB, 0): Failed
logicaldrive 11 (1.6 TB, 0): OK
logicaldrive 12 (1.6 TB, 0): OK
logicaldrive 13 (1.6 TB, 0): OK
logicaldrive 14 (1.6 TB, 0): OK
logicaldrive 15 (1.6 TB, 0): OK
logicaldrive 16 (1.6 TB, 0): OK
logicaldrive 17 (1.6 TB, 0): OK
logicaldrive 18 (1.6 TB, 0): OK
logicaldrive 19 (1.6 TB, 0): OK
logicaldrive 20 (1.6 TB, 0): OK
logicaldrive 21 (1.6 TB, 0): OK
The physical drives, however, all show OK:
[root@cmp21 osd]# hpssacli ctrl slot=0 pd all show status
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 600 GB): OK
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 600 GB): OK
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 1800.3 GB): OK
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 1800.3 GB): OK
physicaldrive 1I:1:5 (port 1I:box 1:bay 5, 1800.3 GB): OK
physicaldrive 1I:1:6 (port 1I:box 1:bay 6, 1800.3 GB): OK
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 1800.3 GB): OK
physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 1800.3 GB): OK
physicaldrive 1I:1:9 (port 1I:box 1:bay 9, 1800.3 GB): OK
physicaldrive 1I:1:10 (port 1I:box 1:bay 10, 1800.3 GB): OK
physicaldrive 1I:1:11 (port 1I:box 1:bay 11, 1800.3 GB): OK
physicaldrive 1I:1:12 (port 1I:box 1:bay 12, 1800.3 GB): OK
physicaldrive 1I:1:13 (port 1I:box 1:bay 13, 1800.3 GB): OK
physicaldrive 1I:1:14 (port 1I:box 1:bay 14, 1800.3 GB): OK
physicaldrive 1I:1:15 (port 1I:box 1:bay 15, 1800.3 GB): OK
physicaldrive 1I:1:16 (port 1I:box 1:bay 16, 1800.3 GB): OK
physicaldrive 1I:1:17 (port 1I:box 1:bay 17, 1800.3 GB): OK
physicaldrive 1I:1:18 (port 1I:box 1:bay 18, 1800.3 GB): OK
physicaldrive 1I:1:19 (port 1I:box 1:bay 19, 1800.3 GB): OK
physicaldrive 1I:1:20 (port 1I:box 1:bay 20, 1800.3 GB): OK
physicaldrive 1I:1:21 (port 1I:box 1:bay 21, 1800.3 GB): OK
physicaldrive 1I:1:22 (port 1I:box 1:bay 22, 1800.3 GB): OK
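Before deleting the failed LD it is worth double-checking which array and physical drive it sits on; a quick sketch:
hpssacli ctrl slot=0 ld 10 show    # shows the array letter and the member physicaldrive of logicaldrive 10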
3. Delete logicaldrive 10:
[root@cmp21 osd]# hpssacli ctrl slot=0 ld 10 delete
Warning: Deleting an array can cause other array letters to become renamed.
E.g. Deleting array A from arrays A,B,C will result in two remaining
arrays A,B ... not B,C
Warning: Deleting the specified device(s) will result in data being lost.
Continue? (y/n) y
4. Re-create the RAID 0 logical drive on the replaced physical drive:
[root@cmp21 osd]# hpssacli ctrl slot=0 create type=ld drives=1I:1:11 raid=0
Warning: Creation of this logical drive has caused array letters to become
renamed.
5. Check the disk information in the OS; /dev/sdj now appears again:
[root@cmp21 osd]# lsblk -d
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 558.9G 0 disk
sdb 8:16 0 1.7T 0 disk
sdc 8:32 0 1.7T 0 disk
sdd 8:48 0 1.7T 0 disk
sde 8:64 0 1.7T 0 disk
sdf 8:80 0 1.7T 0 disk
sdg 8:96 0 1.7T 0 disk
sdh 8:112 0 1.7T 0 disk
sdi 8:128 0 1.7T 0 disk
sdj 8:144 0 1.7T 0 disk
sdk 8:160 0 1.7T 0 disk
sdl 8:176 0 1.7T 0 disk
sdm 8:192 0 1.7T 0 disk
sdn 8:208 0 1.7T 0 disk
sdo 8:224 0 1.7T 0 disk
sdp 8:240 0 1.7T 0 disk
sdq 65:0 0 1.7T 0 disk
sdr 65:16 0 1.7T 0 disk
sds 65:32 0 1.7T 0 disk
sdt 65:48 0 1.7T 0 disk
sdu 65:64 0 1.7T 0 disk
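Before handing /dev/sdj back to Ceph, you may also want to confirm the rebuilt LD is healthy and the new device is blank; a sketch:
hpssacli ctrl slot=0 ld all show status    # the re-created logicaldrive (its number may have changed) should report OK
wipefs -n /dev/sdj                         # dry run: list any leftover filesystem/LVM signatures, expect none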
Notes:
The LV, VG and PV that belong to the replaced Ceph OSD must be deleted before the disk is swapped. Otherwise the new device gets a fresh, higher device letter instead of reusing the original one, as on this host:
[root@cmp13 ~]# lsblk -d
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 558.9G 0 disk
sdb 8:16 0 1.7T 0 disk
sdc 8:32 0 1.7T 0 disk
sdd 8:48 0 1.7T 0 disk
sde 8:64 0 1.7T 0 disk
sdf 8:80 0 1.7T 0 disk
sdg 8:96 0 1.7T 0 disk
sdh 8:112 0 1.7T 0 disk
sdi 8:128 0 1.7T 0 disk
sdj 8:144 0 1.7T 0 disk
sdk 8:160 0 1.7T 0 disk
sdl 8:176 0 1.7T 0 disk
sdm 8:192 0 1.7T 0 disk
sdn 8:208 0 1.7T 0 disk
sdo 8:224 0 1.7T 0 disk
sdp 8:240 0 1.7T 0 disk
sdq 65:0 0 1.7T 0 disk
sds 65:32 0 1.7T 0 disk
sdt 65:48 0 1.7T 0 disk
sdu 65:64 0 1.7T 0 disk
sdv 65:80 0 1.7T 0 disk
In that state, deleting the LD with hpssacli and re-creating it does not bring the new device letter back; duplicate-UUID ("multiple uuid") errors appear and pvscan --cache fails. The errors can be observed with:
pvs
pvscan --cache
Delete the LD with hpssacli:
hpssacli ctrl slot=0 ld x delete
Rescan the SCSI host to refresh the disks:
echo "- - -" > /sys/class/scsi_host/hostx/scan
Remove the stale block device node (/dev/sdv in this example; replace sdx below with the stale device name):
echo 1 > /sys/block/sdx/device/delete
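If it is unclear which hostX belongs to the Smart Array controller, rescanning every SCSI host is harmless; a sketch:
for h in /sys/class/scsi_host/host*; do echo "- - -" > "$h/scan"; done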
Clear the error "Error reading device /dev/ceph-3310ea32-d1d3-4b15-86ef-ef3047684498/osd-data-99e5b417-19f9-427c-aa77-d8dc7cece135 at 0 length 512." by removing the stale device-mapper entry:
[root@cmp13 ~]# pvs
Error reading device /dev/ceph-3310ea32-d1d3-4b15-86ef-ef3047684498/osd-data-99e5b417-19f9-427c-aa77-d8dc7cece135 at 0 length 512.
Error reading device /dev/ceph-3310ea32-d1d3-4b15-86ef-ef3047684498/osd-data-99e5b417-19f9-427c-aa77-d8dc7cece135 at 0 length 4.
Error reading device /dev/ceph-3310ea32-d1d3-4b15-86ef-ef3047684498/osd-data-99e5b417-19f9-427c-aa77-d8dc7cece135 at 4096 length 4.
PV VG Fmt Attr PSize PFree
/dev/sdb ceph-8fb04441-4899-4470-979f-b82319df1c04 lvm2 a-- <1.64t 0
/dev/sdc ceph-dea3360a-dea5-45fd-918b-86b530750d51 lvm2 a-- <1.64t 0
/dev/sdd ceph-91a54c77-1d74-4c7a-96c7-d97cda9ce269 lvm2 a-- <1.64t 0
/dev/sde ceph-76e9dd42-35ac-426d-9733-1c123be18a4c lvm2 a-- <1.64t 0
/dev/sdf ceph-d8ed0ac2-07e2-47ca-8be2-2517064336fe lvm2 a-- <1.64t 0
/dev/sdg ceph-f82bd637-daf2-4ad2-9660-95df9b1dfca9 lvm2 a-- <1.64t 0
/dev/sdh ceph-c688cfe8-0488-4646-942c-bba590eb01c5 lvm2 a-- <1.64t 0
/dev/sdi ceph-70ae3747-d758-4b28-973f-0acd2d21e6b2 lvm2 a-- <1.64t 0
/dev/sdj ceph-44c4d98f-1940-4d9f-b132-3fdf5178233c lvm2 a-- <1.64t 0
/dev/sdk ceph-a1b13b3b-b8d7-40d4-b7d4-ca5174adb012 lvm2 a-- <1.64t 0
/dev/sdl ceph-2f15820c-0e12-4518-a103-26bd3ee35d4e lvm2 a-- <1.64t 0
/dev/sdm ceph-ed5d6897-d93c-4037-a3e8-5a7fce070ba8 lvm2 a-- <1.64t 0
/dev/sdn ceph-b8502c37-69b9-45a9-bbaf-d29d6d322d991 lvm2 a-- <1.64t 0
/dev/sdo ceph-6968d0c9-6b29-4457-8e08-b292324675f4 lvm2 a-- <1.64t 0
/dev/sdp ceph-a102a5f5-a59e-4dd0-9c88-8c721603af6e lvm2 a-- <1.64t 0
/dev/sdq ceph-49543ed2-4b3e-43d9-aa25-7693ded3c898 lvm2 a-- <1.64t 0
/dev/sds ceph-15fb9166-e800-4556-bfee-4a97b7f598b5 lvm2 a-- <1.64t 0
/dev/sdt ceph-f7f0916c-385d-4f41-a041-0009d914bd0e lvm2 a-- <1.64t 0
/dev/sdu ceph-42315175-e55c-4bc3-8db6-c184986f28b4 lvm2 a-- <1.64t 0
[root@cmp13 ~]# dmsetup remove /dev/ceph-3310ea32-d1d3-4b15-86ef-ef3047684498/osd-data-99e5b417-19f9-427c-aa77-d8dc7cece135
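After removing the stale device-mapper entry, refreshing the LVM cache should make the errors disappear; a quick check:
pvscan --cache    # rebuild the LVM device cache
pvs               # the "Error reading device ..." lines should be gone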
References:
17 hpacucli Command Examples for Linux on HP Servers
Opening the CLI in Console mode
Physically removed a disk before deactivating volume group and can’t get LVM to stop printing errors about it
Scanning storage disks on Linux without a reboot