Linux: disk hot-swap experiment

Disk hot-swap experiment:

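     The purpose of this experiment is to prepare for disk failures after a Ceph cluster has been installed. If the hardware supports hot-swapping, a failed disk can be replaced while the machine is running and the new disk can be added back to Ceph storage.

     If the disks do not support hot-swapping, the system cannot detect the new disk after a failed one is replaced, and the machine has to be rebooted before the disk is recognized again.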

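Hardware:
Machine model: Dell 630; OS: CentOS 7.4
RAID controller: PERC H730P Mini, with four disks of roughly 1 TB each (the kernel log reports 1.20 TB / 1.09 TiB) configured as non-RAID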

Result: disk hot-swapping is supported. After a disk is replaced, the system detects the newly inserted disk automatically.

Procedure:
With the machine powered on, pull out one disk (sdd) and watch the log:
[root@localhost ~]# tail -f /var/log/messages
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#7 CDB: Write(10) 2a 00 20 81 07 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545326848
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#6 CDB: Write(10) 2a 00 20 81 05 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545326336
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#5 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#5 CDB: Write(10) 2a 00 20 81 03 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545325824
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#4 CDB: Write(10) 2a 00 20 81 01 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545325312
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#3 CDB: Write(10) 2a 00 20 80 ff 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545324800
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#2 CDB: Write(10) 2a 00 20 80 fd 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545324288
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#1 CDB: Write(10) 2a 00 20 80 fb 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545323776
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#0 CDB: Write(10) 2a 00 20 80 f9 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545323264
Nov 21 00:43:44 localhost kernel: Aborting journal on device sdd1-8.
Nov 21 00:43:44 localhost kernel: JBD2: Error -5 detected when updating journal superblock for sdd1-8.
Nov 21 00:43:44 localhost systemd: Unmounting /data-d...
Nov 21 00:43:44 localhost kernel: EXT4-fs error (device sdd1): ext4_put_super:791: Couldn't clean up the journal
Nov 21 00:43:44 localhost kernel: EXT4-fs (sdd1): Remounting filesystem read-only
Nov 21 00:43:44 localhost systemd: Unmounted /data-d.
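
The log above is long but repetitive: every failed write produces one "I/O error, dev ..." line. A minimal sketch for summarizing such a log per device is shown here. The function name count_io_errors is made up for this example, and it only assumes the message format seen in the excerpt above.

```shell
#!/bin/sh
# Count kernel I/O errors per block device in syslog-style text read on stdin.
# Matches lines of the form seen in the log above:
#   ... kernel: blk_update_request: I/O error, dev sdd, sector 545326848
count_io_errors() {
    awk '/I\/O error, dev / {
             for (i = 1; i < NF; i++)
                 if ($i == "dev") {
                     dev = $(i + 1)
                     sub(/,$/, "", dev)   # strip the trailing comma from "sdd,"
                     n[dev]++
                 }
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Example with two lines copied from the log above:
count_io_errors <<'EOF'
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545326848
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545326336
EOF
```

On a live system the same function could be fed with `count_io_errors < /var/log/messages` (root is usually required to read that file). For the full excerpt above it would report 8 errors, all on sdd.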

Check the block devices; sdd is no longer present:
[root@localhost /]# lsblk 
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0  1.1T  0 disk 
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0  1.1T  0 part 
  ├─centos-root 253:0    0   50G  0 lvm  /
  ├─centos-swap 253:1    0 31.4G  0 lvm  [SWAP]
  └─centos-home 253:2    0    1T  0 lvm  /home
sdb               8:16   0  1.1T  0 disk 
└─sdb1            8:17   0  1.1T  0 part /data-b
sdc               8:32   0  1.1T  0 disk 
└─sdc1            8:33   0  1.1T  0 part /data-c
sr0              11:0    1 1024M  0 rom  
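
Spotting a missing disk by eye works with four disks but gets tedious on a larger storage node. The following sketch compares a list of expected disk names against what the system currently reports. The function name missing_disks and the list of expected names are assumptions made for this example.

```shell
#!/bin/sh
# Print every expected disk name that is absent from the list on stdin.
# stdin: one device name per line, for example the output of
#        `lsblk -dn -o NAME` (whole devices only, no header).
# args:  the disk names that should be present.
missing_disks() {
    present=$(cat)
    for disk in "$@"; do
        # -x: whole-line match, -F: treat the name as a fixed string
        printf '%s\n' "$present" | grep -qxF -- "$disk" || printf '%s\n' "$disk"
    done
}

# Example using the device names from the lsblk output above:
printf 'sda\nsdb\nsdc\nsr0\n' | missing_disks sda sdb sdc sdd
```

On a live system the input would come from `lsblk -dn -o NAME | missing_disks sda sdb sdc sdd`. With the state shown above, the only name printed is sdd.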



Plug the disk (sdd) back in and check the log:
[root@localhost ~]# tail -f /var/log/messages
Nov 21 00:46:37 localhost systemd: Started Network Manager Script Dispatcher Service.
Nov 21 00:46:37 localhost nm-dispatcher: Dispatching action 'dhcp4-change' for em1
Nov 21 00:48:01 localhost kernel: scsi 0:0:3:0: Direct-Access     HGST     HUC101812CSS204  FJ23 PQ: 0 ANSI: 6
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Disabling DIF Type 2 protection
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] 2344225968 512-byte logical blocks: (1.20 TB/1.09 TiB)
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: Attached scsi generic sg3 type 0
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Write Protect is off
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Write cache: disabled, read cache: enabled, supports DPO and FUA
Nov 21 00:48:01 localhost kernel: sdd: sdd1
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Attached SCSI disk
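Check the block devices again; sdd has been recognized by the system (sdd1 is listed without a mount point, so it still has to be mounted again):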
[root@localhost /]# lsblk 
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0  1.1T  0 disk 
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0  1.1T  0 part 
  ├─centos-root 253:0    0   50G  0 lvm  /
  ├─centos-swap 253:1    0 31.4G  0 lvm  [SWAP]
  └─centos-home 253:2    0    1T  0 lvm  /home
sdb               8:16   0  1.1T  0 disk 
└─sdb1            8:17   0  1.1T  0 part /data-b
sdc               8:32   0  1.1T  0 disk 
└─sdc1            8:33   0  1.1T  0 part /data-c
sdd               8:48   0  1.1T  0 disk 
└─sdd1            8:49   0  1.1T  0 part 
sr0              11:0    1 1024M  0 rom 
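
In this experiment the kernel picked up the disk on its own. When a disk does not show up after insertion, a manual SCSI bus rescan through sysfs is a common thing to try before rebooting. Whether it helps depends on the controller and backplane, and it was not tested in this experiment. The sketch below is an illustration only; the function name rescan_scsi_hosts and its optional directory argument are made up for this example.

```shell
#!/bin/sh
# Ask every SCSI host to rescan its bus by writing "- - -" to its scan file.
# $1: directory holding the hostN entries; defaults to the real sysfs path,
#     where writing requires root. The argument exists so the function can be
#     tried against a scratch directory first.
rescan_scsi_hosts() {
    root=${1:-/sys/class/scsi_host}
    for scan in "$root"/host*/scan; do
        [ -e "$scan" ] || continue     # the glob matched nothing
        echo "- - -" > "$scan"         # wildcards for channel, target and LUN
    done
}

# Usage on a real machine (as root), followed by a check with lsblk:
#   rescan_scsi_hosts
#   lsblk
```

The three dashes are wildcards, so each host scans all of its channels, targets and LUNs.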


About disk partition UUIDs

References:

http://blog.csdn.net/blaider/article/details/48264473
http://blog.csdn.net/smstong/article/details/46417213


Experiment:

[root@localhost /]# blkid /dev/sdd1
/dev/sdd1: UUID="aa9766c9-08e0-4f47-bfe0-4d5d9fda9b6b" TYPE="ext4"
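After the partition is deleted, recreated and formatted again, the UUID changes (the UUID belongs to the filesystem and is generated when the filesystem is created):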
[root@localhost /]# blkid /dev/sdd1
/dev/sdd1: UUID="5e673343-791a-4f60-a9ec-fdd52c8c394c" TYPE="ext4"
[root@localhost /]# ls -l /dev/disk/by-uuid/
lrwxrwxrwx. 1 root root 10 11月 21 01:05 5e673343-791a-4f60-a9ec-fdd52c8c394c -> ../../sdd1
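
Because the UUID changes when the filesystem is recreated, any configuration that refers to the old UUID has to be updated. As a sketch, assuming the partition is mounted by UUID from /etc/fstab (the original setup does not say how /data-d was mounted), the helpers below pull the UUID out of a blkid line and print a matching fstab entry. The function names and the mount options "defaults 0 2" are assumptions for this example.

```shell
#!/bin/sh
# Extract the filesystem UUID from one line of blkid output read on stdin.
# The leading space in the pattern keeps PARTUUID="..." from matching.
uuid_from_blkid() {
    sed -n 's/.* UUID="\([^"]*\)".*/\1/p'
}

# Print an fstab entry for the given UUID, mount point and filesystem type.
fstab_entry() {
    printf 'UUID=%s %s %s defaults 0 2\n' "$1" "$2" "$3"
}

# Example with the blkid line shown above and the /data-d mount point
# that appears in the earlier log:
line='/dev/sdd1: UUID="5e673343-791a-4f60-a9ec-fdd52c8c394c" TYPE="ext4"'
uuid=$(printf '%s\n' "$line" | uuid_from_blkid)
fstab_entry "$uuid" /data-d ext4
```

On a live system `blkid -s UUID -o value /dev/sdd1` prints the UUID directly, so the parsing step is only needed when working from saved output.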
