The system runs AIX 6.1 with EMC CLARiiON storage and Oracle 10.2.0.5.
lsvg data03vg
VOLUME GROUP:       data03vg                 VG IDENTIFIER:   00f79d1100004c00000001386f00edfb
VG STATE:           active                   PP SIZE:         128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       5315 (680320 megabytes)
MAX LVs:            512                      FREE PPs:        711 (91008 megabytes)
LVs:                78                       USED PPs:        4604 (589312 megabytes)
OPEN LVs:           0                        QUORUM:          3 (Enabled)
TOTAL PVs:          5                        VG DESCRIPTORS:  5
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         5                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per VG:     130048
MAX PPs per PV:     2032                     MAX PVs:         64
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable
PV RESTRICTION:     none                     INFINITE RETRY:  no
lsvg data01vg
VOLUME GROUP:       data01vg                 VG IDENTIFIER:   00f79d1100004c00000001386effcc48
VG STATE:           active                   PP SIZE:         128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       6378 (816384 megabytes)
MAX LVs:            512                      FREE PPs:        1146 (146688 megabytes)
LVs:                88                       USED PPs:        5232 (669696 megabytes)
OPEN LVs:           0                        QUORUM:          4 (Enabled)
TOTAL PVs:          6                        VG DESCRIPTORS:  6
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         6                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per VG:     130048
MAX PPs per PV:     2032                     MAX PVs:         64
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable
PV RESTRICTION:     none                     INFINITE RETRY:  no
lsvg data02vg
VOLUME GROUP:       data02vg                 VG IDENTIFIER:   00f79d1100004c00000001386f007c90
VG STATE:           active                   PP SIZE:         128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:       2126 (272128 megabytes)
MAX LVs:            512                      FREE PPs:        18 (2304 megabytes)
LVs:                39                       USED PPs:        2108 (269824 megabytes)
OPEN LVs:           0                        QUORUM:          2 (Enabled)
TOTAL PVs:          2                        VG DESCRIPTORS:  3
STALE PVs:          0                        STALE PPs:       0
ACTIVE PVs:         2                        AUTO ON:         no
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
MAX PPs per VG:     130048
MAX PPs per PV:     2032                     MAX PVs:         64
LTG size (Dynamic): 1024 kilobyte(s)         AUTO SYNC:       no
HOT SPARE:          no                       BB POLICY:       relocatable
PV RESTRICTION:     none                     INFINITE RETRY:  no
All three volume groups report VG Mode: Non-Concurrent, so next look at the state of the PVs in each VG:
lsvg -p data03vg
data03vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower11      active            1063        43          01..00..00..00..42
hdiskpower17      removed           1063        167         21..00..00..00..146
hdiskpower18      removed           1063        167         21..00..00..00..146
hdiskpower19      removed           1063        167         21..00..00..00..146
hdiskpower20      removed           1063        167         21..00..00..00..146
lsvg -p data01vg
data01vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower7       active            1063        0           00..00..00..00..00
hdiskpower8       active            1063        20          00..00..00..00..20
hdiskpower9       active            1063        24          02..00..00..00..22
hdiskpower10      active            1063        0           00..00..00..00..00
hdiskpower16      missing           1063        551         21..00..105..212..213
hdiskpower21      missing           1063        551
lsvg -p data02vg
data02vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower0       active            1063        0           00..00..00..00..00
hdiskpower24      active            1063        18          00..00..00..00..18
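With several volume groups and a dozen PVs, the non-active disks are easy to miss by eye. The following is a minimal sketch, not part of the original procedure, that filters `lsvg -p` output down to the PVs in a state other than active. The function name `non_active_pvs` is made up for illustration, and the sketch assumes the usual layout in which the first line is the VG name and the second is the column header.

```shell
# Reads `lsvg -p <vg>` output on stdin and prints "<pv> <state>" for every
# physical volume whose state is anything other than "active".
# Assumption: line 1 is "<vg>:" and line 2 is the column header.
non_active_pvs() {
    awk 'NR > 2 && NF >= 2 && $2 != "active" { print $1, $2 }'
}

# Demo on rows copied from the data03vg listing. On a live system the
# equivalent would be:  lsvg -p data03vg | non_active_pvs
non_active_pvs <<'EOF'
data03vg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdiskpower11      active            1063        43          01..00..00..00..42
hdiskpower17      removed           1063        167         21..00..00..00..146
hdiskpower18      removed           1063        167         21..00..00..00..146
EOF
```

Matching on the second column rather than grepping for "removed" or "missing" means any unexpected state is reported as well.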
Thu Mar 21 17:53:58 BEIST 2013
Errors in file /oracle/app/oracle/admin/ctsdb/bdump/ctsdb2_m000_19595456.trc:
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
odmget HACMPdisktype
HACMPdisktype:
        PdDvLn = "disk/pseudo/power"
        ghostdisks = "SCSI3"
        checkres = "SCSI_TUR"
        breakres = "/usr/lpp/EMC/Symmetrix/bin/emcpowerreset"
        parallel = "false"
        makedev = "MKDEV"
        reserved1 = ""
        reserved2 = ""
        reserved3 = ""
lssrc -a | grep cl
 clcomd           caa              7929856      active
 clcomdES         clcomdES         9633858      active
 clstrmgrES       cluster          9240596      active
 gsclvmd                                        inoperative
 clinfoES         cluster          17104944     active
 clconfd          caa                           inoperative
 nimsh            nimclient                     inoperative
gsclvmd is inoperative on both nodes, so it looks like the only option is to restart HACMP to bring gsclvmd back up.
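The same check can be scripted so it is easy to repeat on each node. This is a rough sketch under the assumption that the subsystem name is the first column and the status is the last column of `lssrc -a` output, which holds for the listing here even when the group and PID columns are empty. The helper name `subsystem_status` is invented for this example.

```shell
# Reads `lssrc -a` output on stdin and prints the status (last column)
# of the subsystem named in $1. Prints nothing if the name is not found.
subsystem_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Demo on rows copied from the listing. On a live node the equivalent
# would be:  lssrc -a | subsystem_status gsclvmd
subsystem_status gsclvmd <<'EOF'
 clcomd           caa              7929856      active
 clstrmgrES       cluster          9240596      active
 gsclvmd                                        inoperative
 clinfoES         cluster          17104944     active
EOF
```

Reading the last field instead of a fixed column number avoids being thrown off by rows where the group or PID is blank.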
Node 1:
su - oracle
srvctl stop listener -n ctscrm1
ps -ef | grep "LOCAL=NO" | grep -v grep | awk '{print $2}' | xargs kill -9
oracle> alter system switch logfile;
oracle> alter system checkpoint;
srvctl stop instance -d ctsdb -i ctsdb1
Node 2:
su - oracle
srvctl stop listener -n ctscrm2
ps -ef | grep "LOCAL=NO" | grep -v grep | awk '{print $2}' | xargs kill -9
oracle> alter system switch logfile;
oracle> alter system checkpoint;
srvctl stop instance -d ctsdb -i ctsdb2
crsctl stop crs
smit clstop
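Disaster struck again. HACMP did stop on both nodes, but the cluster was already in an abnormal state. I had assumed that stopping HACMP would also vary off the VGs, and in fact none of them were varied off. When I tried to vary them off by hand, data01vg on node 1 hung indefinitely. With no other way out, I ran shutdown -Fr, and after the reboot all the VGs were varied off. Then I started changing the settings for the EMC CLARiiON storage, which needs to run in a parallel environment: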
smit hacmp > extended configure > extended resource configure > hacmp extended resource configure > configure custom disk methods > change/show custom disk methods. Select disk/pseudo/power and change parallel to true. Do this on both nodes.
odmget HACMPdisktype
HACMPdisktype:
        PdDvLn = "disk/pseudo/power"
        ghostdisks = "SCSI3"
        checkres = "SCSI_TUR"
        breakres = "/usr/lpp/EMC/Symmetrix/bin/emcpowerreset"
        parallel = "true"
        makedev = "MKDEV"
        reserved1 = ""
        reserved2 = ""
        reserved3 = ""
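Since the change has to be made on both nodes, it can help to confirm the value from the ODM rather than trusting the SMIT screen. The sketch below is an illustration only: it pulls the parallel attribute for one PdDvLn out of `odmget HACMPdisktype` output, assuming the stanza layout of `name = "value"` lines shown here. The function name `parallel_setting` is hypothetical.

```shell
# Reads `odmget HACMPdisktype` output on stdin and prints the value of the
# "parallel" attribute for the stanza whose PdDvLn equals $1.
# Assumption: attributes appear as  name = "value"  with PdDvLn first.
parallel_setting() {
    awk -v want="$1" '
        $1 == "PdDvLn"   { gsub(/"/, "", $3); current = $3 }
        $1 == "parallel" { gsub(/"/, "", $3); if (current == want) print $3 }
    '
}

# Demo on the stanza from the listing. On a live node the equivalent
# would be:  odmget HACMPdisktype | parallel_setting disk/pseudo/power
parallel_setting disk/pseudo/power <<'EOF'
HACMPdisktype:
        PdDvLn = "disk/pseudo/power"
        ghostdisks = "SCSI3"
        checkres = "SCSI_TUR"
        parallel = "true"
        makedev = "MKDEV"
EOF
```

Running it on each node and comparing the two results is a quick way to catch a node where the SMIT change was skipped.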
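Start HACMP: smit clstart. This time something else went wrong and startup failed. I did not write down the exact error, but it said the volume group timestamps were out of sync. Carrying on: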
On nodes 1 and 2:
exportvg data01vg; importvg -y data01vg hdiskpower7
exportvg data02vg; importvg -y data02vg hdiskpower24
exportvg data03vg; importvg -y data03vg hdiskpower17
Start HACMP again with smit clstart. This time it worked: HACMP started normally, and checking each VG showed they were all in Concurrent mode.
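Checking the mode of each VG means reading the long `lsvg` report three times over. A small sketch of how the one relevant field could be extracted is given below. It assumes the mode is the single word that follows the "VG Mode:" label, as in the reports earlier in this post, and the function name `vg_mode` is made up for the example.

```shell
# Reads `lsvg <vg>` output on stdin and prints the word that follows
# the "VG Mode:" label (for example Concurrent or Non-Concurrent).
vg_mode() {
    awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "VG" && $(i + 1) == "Mode:") { print $(i + 2); exit }
    }'
}

# Demo on two lines in the format of the earlier reports. On a live node
# one could loop over the VGs instead, for example:
#   for vg in data01vg data02vg data03vg; do
#       echo "$vg: $(lsvg "$vg" | vg_mode)"
#   done
vg_mode <<'EOF'
Concurrent:         Enhanced-Capable         Auto-Concurrent: Disabled
VG Mode:            Non-Concurrent
EOF
```

Scanning for the label rather than relying on a fixed line number keeps the helper working if other fields share the line with VG Mode.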
I thought the problem was solved at this point, so I started CRS: crsctl start crs; crs_stat -t
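CRS did not start. There was no response at all. /etc/init.d/init.crs start did not work either: no response, and nothing new in the logs. Then it occurred to me that after the exportvg and importvg, the device permissions had not been set again. Fix the ownership and permissions of the raw devices: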
cd /dev
chown oracle:dba rdb*
chown oracle:dba rbss*
chown oracle:dba rctsbss*
chown oracle:dba rvote*
chmod 766 rvote*
chmod 766 *ocr*
/etc/init.d/init.crs start
This time it started. After a short while crs_stat -t showed everything online, and with that the problem was fully resolved.
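Because the ownership of the raw devices was lost after the exportvg and importvg, it is worth checking before starting CRS next time. The following is only a sketch: it scans `ls -l` output and lists entries whose owner or group differs from the expected pair. The function name `not_owned_by` and the device names and major/minor numbers in the demo are hypothetical; only the rvote* pattern and oracle:dba come from the steps above.

```shell
# Reads `ls -l` output on stdin and prints the name of every entry whose
# owner is not $1 or whose group is not $2. Lines with fewer than nine
# fields (such as a "total" line) are skipped.
not_owned_by() {
    awk -v user="$1" -v group="$2" \
        'NF >= 9 && ($3 != user || $4 != group) { print $NF }'
}

# Demo on hypothetical rows. On a live node the equivalent would be:
#   ls -l /dev/rvote* | not_owned_by oracle dba
not_owned_by oracle dba <<'EOF'
crw-rw----    1 root     system       45,  3 Mar 21 18:20 rvote01
crw-rw----    1 oracle   dba          45,  4 Mar 21 18:20 rvote02
EOF
```

An empty result means every listed device already has the expected owner and group.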
lslpp -l | grep -i emc
  EMC.CELERRA.aix.rte          COMMITTED  EMC CELERRA AIX Support
  EMC.CLARiiON.aix.rte         COMMITTED  EMC CLARiiON AIX Support
  EMC.CLARiiON.fcp.rte         COMMITTED  EMC CLARiiON FCP Support
  EMC.CLARiiON.ha.rte          COMMITTED  EMC CLARiiON HA Concurrent
  EMCpower.base                COMMITTED  PowerPath Base Driver and
  EMCpower.encryption          COMMITTED  PowerPath Encryption with RSA
  EMCpower.migration_enabler
  EMCpower.mpx                 COMMITTED  PowerPath Multi_Pathing
  EMC.CELERRA.aix.rte          COMMITTED  EMC CELERRA AIX Support
  EMC.CLARiiON.aix.rte         COMMITTED  EMC CLARiiON AIX Support
  EMC.CLARiiON.fcp.rte         COMMITTED  EMC CLARiiON FCP Support
  devices.common.IBM.modemcfg.data
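EMC.CLARiiON.ha.rte (EMC CLARiiON HA Concurrent) is the package through which the EMC storage provides HA concurrent support. Without it, concurrent mode is not supported, so it has to be installed. After that, change the parallel setting as described above: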
smit hacmp > extended configure > extended resource configure > hacmp extended resource configure > configure custom disk methods > change/show custom disk methods
/usr/sbin/cluster/utilities/clcustomdisk -c -tdisk/pseudo/power -Ndisk/pseudo/power -gSCSI3 -hSCSI_TUR -b/usr/lpp/EMC/CLARiiON/bin/emcpowerreset -ptrue -mMKDEV