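What is a SCSI lock?
In a shared-storage environment, multiple hosts may access the same storage device at the same time. If several hosts write to the same LUN at the same moment, the LUN has no way of knowing which data was written first and which later. SCSI locks (reservations) were introduced to prevent the data corruption this would cause. In the diagram below, when HostA reads from or writes to the LUN it places a SCSI lock on it, and HostB can then no longer access that LUN.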
HostA   HostB
   \     /
    \   /
     Lun
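1. Types of SCSI locks
There are currently two types of SCSI lock: SCSI-2 Reservation and SCSI-3 Reservation. SCSI-3 Reservation is also called Persistent Reservation. The two types cannot coexist on the same LUN.
SCSI-2 Reservation allows the device to be accessed only by the initiator that issued the reservation; here the initiator usually means an HBA. For example, if fcs0 on HostA places a SCSI-2 lock on a LUN, even fcs1 on the same HostA cannot access that LUN. This is why SCSI-2 Reservation is sometimes called single-path reservation.
SCSI-3 Reservation (Persistent Reservation) locks the device using a PR key. A host normally has one unique PR key, and different hosts have different PR keys. SCSI-3 Reservation is therefore typically used in multipath shared environments.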
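The difference between the two lock types can be sketched with a small Python model. It only illustrates the access rules described above, not the SCSI protocol itself; the class names, method names and PR key values are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Initiator:
    """One HBA port on a host (illustrative names only)."""
    host: str
    port: str
    pr_key: Optional[str] = None  # key this initiator registers for SCSI-3


@dataclass
class Lun:
    scsi2_holder: Optional[Initiator] = None  # SCSI-2: the reserving initiator
    pr_key: Optional[str] = None              # SCSI-3: the reserving PR key

    def reserve_scsi2(self, who: Initiator) -> None:
        if self.pr_key is not None:
            raise RuntimeError("the two lock types cannot coexist on one LUN")
        self.scsi2_holder = who

    def reserve_scsi3(self, who: Initiator) -> None:
        if self.scsi2_holder is not None:
            raise RuntimeError("the two lock types cannot coexist on one LUN")
        if who.pr_key is None:
            raise ValueError("a PR key is required for a persistent reservation")
        self.pr_key = who.pr_key

    def can_access(self, who: Initiator) -> bool:
        if self.scsi2_holder is not None:
            # SCSI-2: only the exact initiator that reserved the LUN gets in.
            return who == self.scsi2_holder
        if self.pr_key is not None:
            # SCSI-3: any initiator registered with the reserving key gets in.
            return who.pr_key == self.pr_key
        return True  # no lock at all


a_fcs0 = Initiator("HostA", "fcs0", pr_key="key-a")
a_fcs1 = Initiator("HostA", "fcs1", pr_key="key-a")
b_fcs0 = Initiator("HostB", "fcs0", pr_key="key-b")

lun = Lun()
lun.reserve_scsi2(a_fcs0)
scsi2_result = [lun.can_access(i) for i in (a_fcs0, a_fcs1, b_fcs0)]

lun = Lun()
lun.reserve_scsi3(a_fcs0)
scsi3_result = [lun.can_access(i) for i in (a_fcs0, a_fcs1, b_fcs0)]

print(scsi2_result)  # [True, False, False]
print(scsi3_result)  # [True, True, False]
```

With the SCSI-2 lock, the second path of HostA is shut out just like HostB; with the SCSI-3 lock, both paths of HostA share the key and only HostB is refused.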
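2. When is a device locked?
A device is generally locked when it is opened, for example by varyonvg or dd. Note that with a command such as dd, the device is locked while the command runs and is unlocked automatically when it finishes.
Note: varyonvg -c does not lock the device.
Also, once a VG has been varied on, only varyoffvg or varyonvg -b releases the locks on the VG's devices. Running shutdown directly does not perform a varyoffvg, so the locks are not released.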
3. Handling SCSI-3 locks
1) Check the sense data in SC_DISK_ERR* or FSCSI_ERR* entries.
01 - this indicates the SCSI status field is valid
18 - SCSI device is reserved by another host
For example:
SENSE DATA
0600 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
This is usually seen in an SC_DISK_ERR* or FSCSI_ERR* error (errpt -a output).
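Checking a sense dump by eye is error-prone, so the test can be written as a few lines of Python. This is a sketch under one assumption taken from the example above: the status-valid byte (01) and the SCSI status byte (18, the SCSI status code for a reservation conflict) sit at byte offsets 24 and 25 of the dump. The function names are hypothetical.

```python
def parse_sense_words(text: str) -> bytes:
    """Turn the whitespace-separated hex groups from errpt into raw bytes."""
    return bytes.fromhex("".join(text.split()))


def reservation_conflict(sense: bytes, offset: int = 24) -> bool:
    """True if the status-valid byte is 0x01 and the SCSI status is 0x18.

    offset=24 is an assumption derived from the sample dump, not a
    documented constant.
    """
    if len(sense) < offset + 2:
        return False
    status_valid, scsi_status = sense[offset], sense[offset + 1]
    return status_valid == 0x01 and scsi_status == 0x18


SAMPLE = (
    "0600 0000 0000 0000 0000 0000 0000 0000 "
    "0000 0000 0000 0000 0118 0000 0000 0000"
)

sense = parse_sense_words(SAMPLE)
print(len(sense))                   # 32
print(reservation_conflict(sense))  # True
```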
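2) Handling persistent reservations on ESS/DS8000/DS6000
lquerypr -vh /dev/vpathX shows the persistent reservation. If a PR exists, the command returns the PR key, which can be matched against the output of uname -a.
lquerypr -ch /dev/vpathX clears the persistent reservation. Note: use this command with great caution!
In an SDDPCM environment, use pcmquerypr instead.
3) Handling persistent reservations on DS4000
In SM, use Advanced -> Maintenance -> Persistent Reservations to view and clear the SCSI-3 reservation on a logical drive.
4. Handling SCSI-2 locks
Sometimes lquerypr shows no lock on the vpath, or the persistent reservation output in SM also shows no lock, yet the hdisk/vpath still cannot be accessed. In that case, check whether a SCSI-2 reservation exists.
Note: the commands below are not limited to SCSI-2 locks; they also apply to SCSI-3 locks.
1) DS4000: check with hlmTestLunShow SSID.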
hlmTestLunShow 8
LunNumber:0x8 LunInfo :0x6642c5c State:0x0
QuiescenceFlag:0x0 Owner:0x1 IsReady :0x1
reserveId:0xe resv3rdId:0xffff
value = 128 = 0x80
In the output, reserveId shows the SCSI-2 reservation and resv3rdId shows the SCSI-3 reservation. A value of 0xffff means there is no lock; 0xe means the LUN is held by the host whose host ID is 0xe.
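If the output has to be checked for many LUNs, the two fields can be extracted with a short script. The following Python sketch parses text shaped like the sample above; the function names are hypothetical, and the only rule it encodes is the one just stated (0xffff means no lock, anything else is the holding host ID).

```python
import re

NO_LOCK = 0xFFFF  # per the text above, 0xffff means "no lock"


def parse_lun_show(output: str) -> dict:
    """Collect every `name:0x...` or `name :0x...` pair into a dict of ints."""
    pairs = re.findall(r"(\w+)\s*:\s*(0x[0-9a-fA-F]+)", output)
    return {name: int(value, 16) for name, value in pairs}


def describe_locks(fields: dict) -> dict:
    """Map each reservation type to the holding host ID, or None if unlocked."""
    def holder(field_name):
        value = fields.get(field_name, NO_LOCK)
        return None if value == NO_LOCK else value

    return {"scsi2": holder("reserveId"), "scsi3": holder("resv3rdId")}


SAMPLE = """\
LunNumber:0x8 LunInfo :0x6642c5c State:0x0
QuiescenceFlag:0x0 Owner:0x1 IsReady :0x1
reserveId:0xe resv3rdId:0xffff
value = 128 = 0x80
"""

locks = describe_locks(parse_lun_show(SAMPLE))
print(locks)  # {'scsi2': 14, 'scsi3': None}
```

For the sample, the result says that host ID 14 (0xe) holds a SCSI-2 reservation and that no SCSI-3 reservation is present.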
To release the lock:
- hlmTestRelease reserveId,SSID
- Move the LUN from one controller to the other
- On AIX, use the HACMP command /usr/sbin/cluster/events/utils/cl_flutereset /dev/hdiskXX
2) DS6800: check with catreef "fb/volstatus lss", where lss is the LSS ID.
catreef "fb/volstatus 0x12"
Vol   Rsv    DA State  FB Status  Known Format Status
----  -----  --------  ---------  -------------------
1200  PPRC   GOOD      Ready      formatted
1201  TRAD   GOOD      Ready      formatted
1202  PR     GOOD      Ready      formatted
1203  None   GOOD      Ready      formatted
PPRC  PPRC (suggests the volume is probably a PPRC target)
TRAD  Traditional SCSI-2 reserve
PR    SCSI-3 persistent reserve
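To pick out the reserved volumes from a long listing, the table can be parsed mechanically. The Python sketch below assumes the column layout of the sample (volume ID first, Rsv code second); the function name is hypothetical, and reading "None" as "no reservation" is my interpretation of the sample rather than something the legend states.

```python
RSV_MEANING = {
    "PPRC": "PPRC (probably a PPRC target)",
    "TRAD": "traditional SCSI-2 reserve",
    "PR": "SCSI-3 persistent reserve",
    "None": "no reservation",  # assumed meaning, not given in the legend
}


def parse_volstatus(output: str) -> dict:
    """Return {volume_id: rsv_code} for every data row of the listing."""
    volumes = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows start with a four-digit hexadecimal volume ID.
        if len(parts) >= 2 and len(parts[0]) == 4:
            try:
                int(parts[0], 16)
            except ValueError:
                continue  # separator line such as "----"
            volumes[parts[0]] = parts[1]
    return volumes


SAMPLE = """\
Vol  Rsv   DA State  FB Status  Known Format Status
---- ----- --------- ---------- -------------------
1200 PPRC  GOOD      Ready      formatted
1201 TRAD  GOOD      Ready      formatted
1202 PR    GOOD      Ready      formatted
1203 None  GOOD      Ready      formatted
"""

volumes = parse_volstatus(SAMPLE)
for vol, rsv in volumes.items():
    print(vol, RSV_MEANING.get(rsv, "unknown code"))

# Volumes that hold a SCSI-2 or SCSI-3 lock.
reserved = {vol: rsv for vol, rsv in volumes.items() if rsv in ("TRAD", "PR")}
print(reserved)  # {'1201': 'TRAD', '1202': 'PR'}
```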
To release the lock:
- cmt -aRESET_LUN_RESERVATION -t volume
For example: cmt -aRESET_LUN_RESERVATION -t 0x1202
3) DS8000: check with cat "/dev/cpss0/fb/volstatuslss", where lss is the LSS ID, as in the example below.
cat "/dev/cpss0/fb/volstatus0x98"
Vol   Rsv    DA State  FB Status  Known Format Status
----  -----  --------  ---------  -------------------
981D  PR     GOOD      Ready      formatted
To release the lock:
- cmt -aRESET_LUN_RESERVATION -t volume
For example: cmt -aRESET_LUN_RESERVATION -t 0x981D
5. reserve_policy
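Each vendor's devices and drivers have their own attributes, but most of them are similar. Here we take the MPIO reserve_policy attribute as an example: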
No Reserve reservation policy
If you set MPIO devices with this reserve policy, there is no reserve being made on MPIO devices. A device without reservation can be accessed by any initiators at any time. Input/output can be sent from all the paths of the MPIO device. This is the default reserve policy of SDDPCM. (Be sure to take note of this point.)
Exclusive Host Access single-path reservation policy
This is the scsi-2 reservation policy. If you set this reserve policy for MPIO devices, only the fail_over path selection algorithm can be selected for the devices. With this reservation policy, an MPIO device only has one path being opened, and a scsi-2 reservation is made by this path on the device. Input/output can only be sent through this path. When this path is broken, another path will be opened and scsi-2 reservation will be made by the new path. All input and output will be routed to this path.
Persistent Reserve Exclusive Host Access reservation policy
If you set an MPIO device with this persistent reserve policy, a persistent reservation is made on this device with a persistent reserve (PR) key. Any initiators who register with the same PR key can access this device. Normally, you should pick a unique PR key for a server. Different servers should have different unique PR key. Input and output is routed to all paths of the MPIO device, because all paths of an MPIO device are registered with the same PR key. In a nonconcurrent clustering environment, such as HACMP, this is the reserve policy that you should select.
Current HACMP clustering software supports no_reserve policy with Enhanced Concurrent Mode volume group. HACMP support for persistent reserve policies for supported storage MPIO devices is not available.
Persistent Reserve Shared Host Access reservation policy
If you set an MPIO device with this persistent reserve policy, a persistent reservation is made on this device with a persistent reserve (PR) key. However, any initiators that implemented persistent registration can access this MPIO device, even if the initiators are registered with different PR keys. In a concurrent clustering environment, such as HACMP, this is the reserve policy that you should select for sharing resources among multiple servers.
Current HACMP clustering software supports no_reserve policy with Enhanced Concurrent Mode volume group. HACMP support for persistent reserve policies for supported storage MPIO devices is not available.
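The four policies differ mainly in who may reach the device once a reservation is in place. The Python sketch below restates that rule set as a single function. It is an illustration only: the function and its parameters are hypothetical, and the attribute values no_reserve, single_path, PR_exclusive and PR_shared are, to my knowledge, the names AIX MPIO uses for these policies, which should be confirmed on the target system (for example with lsattr -R -l hdiskX -a reserve_policy).

```python
def can_access(policy: str, *, is_reserving_path: bool = False,
               registered: bool = False, same_key: bool = False) -> bool:
    """Decide whether an initiator may access a device under a given policy.

    is_reserving_path: the initiator is the one path that made the SCSI-2 reserve
    registered:        the initiator has registered a PR key with the device
    same_key:          that key equals the key holding the reservation
    """
    if policy == "no_reserve":
        return True                     # nothing is reserved
    if policy == "single_path":
        return is_reserving_path        # SCSI-2: one path only
    if policy == "PR_exclusive":
        return registered and same_key  # same PR key required
    if policy == "PR_shared":
        return registered               # any registered key is enough
    raise ValueError(f"unknown reserve_policy: {policy}")


# A second server that registered its own, different PR key:
policies = ("no_reserve", "single_path", "PR_exclusive", "PR_shared")
second_server = {p: can_access(p, registered=True, same_key=False) for p in policies}
print(second_server)
# {'no_reserve': True, 'single_path': False, 'PR_exclusive': False, 'PR_shared': True}
```

The last line shows why the shared policy suits concurrent access from several servers, while the exclusive policy keeps every other server out.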
6. Lock-related commands on AIX
1) varyonvg/varyoffvg
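varyonvg locks the associated hdisk/vpath devices. In general, DS4000 hdisks and DS6000/DS8000/ESS vpath devices receive a SCSI-3 lock.
Early DS4000 firmware (SM 8.4 or earlier, unconfirmed), however, uses SCSI-2 locks.
For DS6000/DS8000/ESS, SCSI-2 locks are used if vpath is not in use.
The lock type can of course also be specified by changing the attributes of dpo, hdisk, vpath and so on.
varyoffvg releases the locks on the VG's devices in the normal way.
varyonvg -b also releases the locks on the VG's devices. It is normally run on the host where the VG is currently in use, together with the -u flag, and can be used for LVM operations in an HA environment.
Note: the -b flag uses SC_FORCED_OPEN to break the lock on the hdisks in the VG, but for SCSI and FC devices it also breaks the locks on all LUNs at the target address where the hdisk resides. For example, if hdisk0 and hdisk1 are both under fcs0, with hdisk0 in datavg and hdisk1 in testvg, then running varyonvg -b datavg unlocks both hdisk0 and hdisk1.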
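The side effect of the -b flag can be made concrete with a toy calculation: given which disks belong to which VG and which target address each disk sits on, list every disk whose lock is broken. This Python sketch only models the rule stated in the note above; the function name, the target labels, and the extra appvg/hdisk2 entry are invented for the illustration.

```python
from collections import defaultdict


def disks_unlocked_by_forced_open(vg, vg_members, target_of):
    """Return every disk whose lock is broken by a forced open of vg's disks."""
    by_target = defaultdict(set)
    for disk, target in target_of.items():
        by_target[target].add(disk)

    affected = set()
    for disk in vg_members[vg]:
        # A forced open hits all LUNs on the same target address.
        affected |= by_target[target_of[disk]]
    return sorted(affected)


vg_members = {"datavg": ["hdisk0"], "testvg": ["hdisk1"], "appvg": ["hdisk2"]}
target_of = {"hdisk0": "target-A", "hdisk1": "target-A", "hdisk2": "target-B"}

unlocked = disks_unlocked_by_forced_open("datavg", vg_members, target_of)
print(unlocked)  # ['hdisk0', 'hdisk1']
```

hdisk1 loses its lock although it belongs to testvg, which is why volume groups should not share target addresses when this flag is used.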
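In addition, under certain specific conditions varyonvg -b can be run on another AIX host that shares the VG but has not varied it on, in order to release the lock. For example, HostA and HostB share hdisk0, hdisk0 makes up datavg, and datavg is currently varied on at HostA. Under certain conditions, running "varyonvg -b datavg" on HostB can release the lock on hdisk0. Take special note: varyonvg was not designed for this use and unpredictable results are possible, so use it with caution! One situation where this method may be worth trying is when HostA and HostB share a non-IBM storage system that has no unlocking tool of its own and is not supported by HACMP. The datavg built on hdisk0 is varied on at HostA, and HostA then crashes. HostB cannot take over normally, and the lock on hdisk0 is never released. In this case you can try varyonvg -b datavg on HostB to release the lock, but success is not guaranteed (it depends on what the storage vendor supports).
In short, when a device is no longer needed, run varyoffvg properly before shutting down. Use varyonvg -b to release locks only with caution.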
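2) HACMP-related commands
Normally, during an HACMP failover, the /usr/es/sbin/cluster/events/utils/cl_disk_available script is called to determine the device type, whether the device is locked, and so on; it then calls the appropriate command to release the lock.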
/usr/es/sbin/cluster/events/utils:
cl_flutereset (for DS4000)
cl_fscsilunreset (for SCSI-3)
cl_iscsilunreset (for iSCSI)
cl_pscsilunreset (for SCSI-2)
cl_scdiskreset (for IBM 7135)
cl_vpathreset (for sdd)
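Note: run on their own, these commands do not necessarily work with every storage system, and running them on their own is not officially supported by IBM.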
3) lquerypr/pcmquerypr/pcmgenprkey
These commands are described in detail in the Multipath Subsystem Device Driver User's Guide and are not covered one by one here.
7. A case to watch out for (excerpted from the Multipath Subsystem Device Driver User's Guide)
Understanding the persistent reserve issue when migrating from SDD to non-SDD volume groups after a system reboot
There is an issue with migrating from SDD to non-SDD volume groups after a system reboot. This issue only occurs if the SDD volume group was varied on prior to the system reboot and auto varyon was not set when the volume group was created. After the system reboot, the volume group will not be varied on.
The command to migrate from SDD to non-SDD volume group (vp2hd) will succeed, but a subsequent command to vary on the volume group will fail. This is because during the reboot, the persistent reserve on the physical volume of the volume group was not released, so when you vary on the volume group, the command will do a SCSI-2 reserve and fail with a reservation conflict.
There are two ways to avoid this issue.
1. Unmount the filesystems and vary off the volume groups before rebooting the system.
2. Execute lquerypr -Vh /dev/vpathX on the physical LUN before varying on volume groups after the system reboot. If the LUN is reserved by the current host, release the reserve by executing lquerypr -Vrh /dev/vpathX command. After successful execution, you will be able to vary on the volume group successfully.
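To summarize briefly, this problem has two causes:
1. When AIX varies on a VG made up of hdisks, the hdisks receive a SCSI-2 reservation; when it varies on a VG made up of vpaths, the vpaths receive a SCSI-3 reservation.
2. With the VG varied on, shutting down AIX directly does not release the lock.
This problem is very typical; apply the same reasoning to similar cases.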
Appendix:
1. Description of the varyonvg -b flag:
Breaks disk reservations on disks locked as a result of a normal varyonvg command. Use this flag on a volume group that is already varied on.
Notes:
- This flag unlocks all disks in a given volume group.
- The -b flag opens the disks in the volume group using SC_FORCED_OPEN flag. For SCSI and FC disks this forces open all luns on the target address that this disk resides on. Volume Groups should therefore not share target addresses when using this varyon option.
- The -b flag can cause a system hang if used on a volume group that contains an active paging space.
2. Using the SC_FORCED_OPEN Option
The SC_FORCED_OPEN option causes the SCSI device driver to call the SCSI adapter device driver's Bus Device Reset ioctl (SCIORESET) operation on the first open. This forces the device to release another initiator's reservation. After the SCIORESET command is completed, other SCSI commands are sent as in a normal open. If any of the SCSI commands fail due to a reservation conflict, the open registers the failure as an EBUSY status. This is also the result if a reservation conflict occurs during a normal open. The SCSI device driver should require the caller to have appropriate authority to request the SC_FORCED_OPEN option because this request can force a device to drop a SCSI reservation. If the caller attempts to initiate this system call without the proper authority, the SCSI device driver should return a value of -1, with the errno global variable set to a value of EPERM.
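The open sequence described in this excerpt can be traced with a short simulation. The Python sketch below is a model of the decision flow only, not driver code: the Device class, the open_device function and the reset_clears_reservation switch (standing in for storage that does not honor the reset) are all hypothetical, and the (return value, errno) pair imitates the -1/errno convention mentioned in the text.

```python
import errno


class Device:
    """Toy SCSI device that remembers which initiator holds the reservation."""

    def __init__(self, reserved_by=None, reset_clears_reservation=True):
        self.reserved_by = reserved_by
        self.reset_clears_reservation = reset_clears_reservation


def open_device(dev, initiator, *, forced=False, authorized=False):
    """Return (rc, errno_value): (0, 0) on success, (-1, errno) on failure."""
    if forced:
        if not authorized:
            return -1, errno.EPERM  # forced open needs proper authority
        # Bus Device Reset: asks the device to drop another initiator's reservation.
        if dev.reset_clears_reservation:
            dev.reserved_by = None
    if dev.reserved_by not in (None, initiator):
        return -1, errno.EBUSY      # reservation conflict
    dev.reserved_by = initiator     # a normal open reserves the device
    return 0, 0


dev = Device(reserved_by="HostA")
normal = open_device(dev, "HostB")
unauthorized = open_device(dev, "HostB", forced=True)
forced_ok = open_device(dev, "HostB", forced=True, authorized=True)

print(normal == (-1, errno.EBUSY))        # True
print(unauthorized == (-1, errno.EPERM))  # True
print(forced_ok, dev.reserved_by)         # (0, 0) HostB
```

Setting reset_clears_reservation=False makes the forced open end in EBUSY as well, which mirrors the earlier warning that breaking a lock this way is not guaranteed to work on every storage system.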
3. Responsibilities of the SCSI Device Driver
SCSI device drivers are responsible for the following actions:
- Interfacing with block I/O and logical-volume device-driver code in the operating system.
- Translating I/O requests from the operating system into SCSI commands suitable for the particular SCSI device. These commands are then given to the SCSI adapter device driver for execution.
- Issuing any and all SCSI commands to the attached device. The SCSI adapter device driver sends no SCSI commands except those it is directed to send by the calling SCSI device driver.
- Managing SCSI device reservations and releases. In the operating system, it is assumed that other SCSI initiators might be active on the SCSI bus. Usually, the SCSI device driver reserves the SCSI device at open time and releases it at close time (except when told to do otherwise through parameters in the SCSI device driver interface). Once the device is reserved, the SCSI device driver must be prepared to reserve the SCSI device again whenever a Unit Attention condition is reported through the SCSI request-sense data.
4. Responsibilities of the Device Driver
FCP, iSCSI, and Virtual SCSI Client device drivers are responsible for the following actions:
- Interfacing with block I/O and logical-volume device-driver code in the operating system.
- Translating I/O requests from the operating system into commands suitable for the particular device. These commands are then given to the adapter device driver for execution.
- Issuing any and all commands to the attached device. The adapter device driver sends no commands except those it is directed to send by the calling device driver.
- Managing device reservations and releases. In the operating system, it is assumed that other initiators might be active on the transport layer. Usually, the device driver reserves the device at open time and releases it at close time (except when told to do otherwise through parameters in the device driver interface). Once the device is reserved, the device driver must be prepared to reserve the device again whenever a Unit Attention condition is reported through the request-sense data.