CRS-1006/CRS-0215/CRS-0233 after relocating an Oracle RAC under VMware

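   I recently relocated an Oracle 10g RAC that runs on virtual machines. After the move, the VIP resource of the cluster would not start and returned CRS-0233: Resource or relatives are currently involved with another operation. The reason turned out to be simple: after a move your address changes, and you have to start using the new one.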

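1. Environment
  Oracle 10g RAC + Suse 10
  Note: after relocating the RAC virtual machines, we normally choose "copied" when adding them back (I have not tried "moved", so I do not know whether it causes problems).
  Because "copied" was chosen, each virtual machine generates a new UUID (a UUID is a number generated on one machine that is guaranteed to be unique among the machines in the same virtual environment).
  The MAC addresses and network interface names change as well (on first boot the original eth0 and eth1 are unavailable), so they usually have to be fixed.
  Different Linux distributions handle the new interfaces differently. On Oracle Linux and RedHat you can simply delete the old interfaces and rename the new ones to the original names.
  On Suse Linux it takes a little more work; see: http://blog.csdn.net/robinson_0612/article/details/8131771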
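
One way to see which interface names and MAC addresses a copied virtual machine ended up with is to read them from sysfs. The sketch below assumes the kernel exposes /sys/class/net/<name>/address, which is an assumption about the environment and not something shown in this post; list_ifaces is a hypothetical helper name.

```shell
# Hypothetical helper: print "<interface> <MAC>" for every interface found
# under a sysfs-style directory (default: /sys/class/net).
list_ifaces() {
    root=${1:-/sys/class/net}
    for dir in "$root"/*; do
        # Skip entries without a readable address file (also covers the
        # case where the glob matched nothing).
        [ -r "$dir/address" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/address")"
    done
}
```

Calling list_ifaces with no argument on each node gives a quick list to compare against the MAC addresses in the virtual machine settings.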

2. The CRS-1006/CRS-0215/CRS-0233 errors
  #After fixing the network cards, reboot both nodes
  #The vip resource is OFFLINE
  oracle@bo2dbp:~> ./crs_stat.sh |grep bo2dbp
   Resource name                                Target     State             
  --------------                                ------     -----             
  ora.bo2dbp.ASM1.asm                           ONLINE     ONLINE on bo2dbp  
  ora.bo2dbp.LISTENER_BO2DBP.lsnr               ONLINE     OFFLINE           
  ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr        ONLINE     OFFLINE           
  ora.bo2dbp.gsd                                ONLINE     ONLINE on bo2dbp  
  ora.bo2dbp.ons                                ONLINE     OFFLINE           
  ora.bo2dbp.vip                                ONLINE     OFFLINE           
  ora.ora10g.db                                 ONLINE     ONLINE on bo2dbp  
  ora.ora10g.ora10g1.inst                       ONLINE     ONLINE on bo2dbp 
  
  #Try to start ons manually
  oracle@bo2dbp:~> crs_start ora.bo2dbp.ons 
  Attempting to start `ora.bo2dbp.ons` on member `bo2dbp`
  Start of `ora.bo2dbp.ons` on member `bo2dbp` failed.
  CRS-1006: No more members to consider
  
  CRS-0215: Could not start resource 'ora.bo2dbp.ons'.
  
  #Starting it with onsctl fails as well
  oracle@bo2dbp:~> onsctl start
  Number of onsconfiguration retrieved, numcfg = 2
  onscfg[0]
     {node = bo2dbp.2gotrade.com, port = 6200}
  Adding remote host bo2dbp.2gotrade.com:6200
  onscfg[1]
     {node = bo2dbs.2gotrade.com, port = 6200}
  Adding remote host bo2dbs.2gotrade.com:6200
  Number of onsconfiguration retrieved, numcfg = 2
  onscfg[0]
     {node = bo2dbp.2gotrade.com, port = 6200}
  Adding remote host bo2dbp.2gotrade.com:6200
  onscfg[1]
     {node = bo2dbs.2gotrade.com, port = 6200}
  Adding remote host bo2dbs.2gotrade.com:6200
  onsctl: ons failed to start

  #Try to start the vip manually; this returns CRS-0233
  oracle@bo2dbp:~> crs_start ora.bo2dbp.vip
  CRS-0233: Resource or relatives are currently involved with another operation.

3. Analysis
  #Check the IP addresses on node bo2dbp
  oracle@bo2dbp:~> ifconfig     #the interfaces on this system are currently named eth2 and eth5
  eth2      Link encap:Ethernet  HWaddr 00:0C:29:4A:66:28  
            inet addr:192.168.7.51  Bcast:192.168.7.255  Mask:255.255.255.0
  
  eth5      Link encap:Ethernet  HWaddr 00:0C:29:4A:66:32  
            inet addr:10.10.7.51  Bcast:10.10.7.255  Mask:255.255.255.0
            
  #Check the cluster network configuration on node bo2dbp; the interface names from iflist match the actual NIC names
  oracle@bo2dbp:~> oifcfg iflist
  eth2  192.168.7.0
  eth5  10.10.7.0
  
  oracle@bo2dbp:~> oifcfg getif -global  #here the registered interface names do not match the actual NIC names
  eth3  192.168.7.0  global  public
  eth4  10.10.7.0  global  cluster_interconnect

  #Check the IP addresses on node bo2dbs
  oracle@bo2dbs:~> ifconfig 
  eth5      Link encap:Ethernet  HWaddr 00:0C:29:27:43:EB  
       inet addr:10.10.7.52  Bcast:10.10.7.255  Mask:255.255.255.0
       
  eth6      Link encap:Ethernet  HWaddr 00:0C:29:27:43:E1 
       inet addr:192.168.7.52  Bcast:192.168.7.255  Mask:255.255.255.0
       
  #Check the cluster network configuration on node bo2dbs; the interface names from iflist match the actual NIC names
  oracle@bo2dbs:~> oifcfg iflist
  eth6  192.168.7.0
  eth5  10.10.7.0
  
  oracle@bo2dbs:~> oifcfg getif -global #same mismatch here; these are presumably the original interface names
  eth3  192.168.7.0  global  public
  eth4  10.10.7.0  global  cluster_interconnect
  
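  #As shown above, the interface names differ between the nodes, and the cluster network configuration still uses the old names, so it has to be updated
  #To make the names consistent, the interfaces are renamed to bond1 and bond2 below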
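
The comparison done by eye above can be sketched as a small script. This is a minimal sketch and not part of the original troubleshooting session: missing_interfaces is a hypothetical helper, and it assumes the two oifcfg outputs have the column layout shown in the transcript (interface name in the first column).

```shell
# Hypothetical helper: print interface names that are registered in the
# cluster configuration (saved "oifcfg getif -global" output, second file)
# but do not exist on the host (saved "oifcfg iflist" output, first file).
missing_interfaces() {
    awk '
        # First file: remember every interface name the OS reports.
        FILENAME == ARGV[1] { if (NF) present[$1] = 1; next }
        # Second file: print registered names the OS did not report.
        NF && !($1 in present) { print $1 }
    ' "$1" "$2"
}
```

For example, save the two outputs with `oifcfg iflist > iflist.txt` and `oifcfg getif -global > getif.txt`, then run `missing_interfaces iflist.txt getif.txt` on each node. With the outputs shown above it would list eth3 and eth4 on both nodes.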

4. Fixing the problem
  #Rename the NICs consistently; for the method see: http://blog.csdn.net/robinson_0612/article/details/8131771
  #The result after renaming
  oracle@bo2dbp:~> oifcfg iflist
  bond1  192.168.7.0
  bond2  10.10.7.0     
  oracle@bo2dbs:~> oifcfg iflist
  bond1  192.168.7.0
  bond2  10.10.7.0
  
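  #In the queries below, public and cluster_interconnect at the cluster layer still show the old configuration
  #They should be changed to match; we leave them for now to see what error appears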
  oracle@bo2dbp:~> oifcfg getif -global
  eth3  192.168.7.0  global  public
  eth4  10.10.7.0  global  cluster_interconnect
  oracle@bo2dbs:~> oifcfg getif -global
  eth3  192.168.7.0  global  public
  eth4  10.10.7.0  global  cluster_interconnect
 
  #Restart crs
  oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
  root's password:
  Attempting to start CRS stack 
  The CRS stack will be started shortly
  
  #The query below shows that the CRS daemons are healthy
  oracle@bo2dbp:~> crsctl check crs
  CSS appears healthy
  CRS appears healthy
  EVM appears healthy
  
  #The result below is still the same as before
  oracle@bo2dbp:~> ./crs_stat.sh |grep bo2dbp
   Resource name                                Target     State             
  --------------                                ------     -----             
  ora.bo2dbp.ASM1.asm                           ONLINE     ONLINE on bo2dbp  
  ora.bo2dbp.LISTENER_BO2DBP.lsnr               ONLINE     OFFLINE           
  ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr        ONLINE     OFFLINE           
  ora.bo2dbp.gsd                                ONLINE     ONLINE on bo2dbp  
  ora.bo2dbp.ons                                ONLINE     OFFLINE          
  ora.bo2dbp.vip                                ONLINE     OFFLINE           
  ora.ora10g.db                                 ONLINE     ONLINE on bo2dbp  
  ora.ora10g.ora10g1.inst                       ONLINE     ONLINE on bo2dbp  
 
  #Stop all resources
  oracle@bo2dbp:~> crs_stop -all
 
  #Use oifcfg to change the cluster network configuration
  oracle@bo2dbp:~> oifcfg delif -global
  oracle@bo2dbp:~> oifcfg getif -global
  oracle@bo2dbp:~> oifcfg setif -global bond1/192.168.7.0:public
  oracle@bo2dbp:~> oifcfg setif -global bond2/10.10.7.0:cluster_interconnect
  oracle@bo2dbp:~> oifcfg getif -global
  bond1  192.168.7.0  global  public
  bond2  10.10.7.0  global  cluster_interconnect
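
The two setif commands above follow the pattern interface/subnet:type, so they can be derived from the old registration and the new interface names. The sketch below only prints the commands instead of running them. It assumes the old `oifcfg getif -global` output was saved to a file before `oifcfg delif -global` was run; plan_setif is a hypothetical helper name.

```shell
# Hypothetical helper: print (do not run) the "oifcfg setif" commands that
# re-register each subnet from the old getif output (second file) under the
# interface name that now carries that subnet (iflist output, first file).
plan_setif() {
    awk '
        # First file: map subnet -> current interface name.
        FILENAME == ARGV[1] { if (NF >= 2) name_for[$2] = $1; next }
        # Second file: columns are name, subnet, scope, type.
        NF >= 4 && ($2 in name_for) {
            printf "oifcfg setif -global %s/%s:%s\n", name_for[$2], $2, $4
        }
    ' "$1" "$2"
}
```

Reviewing the printed commands before running them keeps the change explicit. A subnet that no longer appears in the iflist output is skipped without a warning, so compare the number of printed lines with the number of old registrations.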
 
  #After a reboot the resource states are still the same as before
  #Check the vip log first; we solve the vip problem first
  bo2dbp:/u01/oracle/crs/log/bo2dbp/racg # tail -50 ora.bo2dbp.vip.log
  2012-12-28 11:25:13.783: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: 
            clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:25:13.783: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: 
       clsrcexecut: cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip start bo2dbp
  
  2012-12-28 11:25:13.783: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.220s
  
  2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]:
       clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut:
       cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check bo2dbp
  
  2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.190s
  
  2012-12-28 11:25:16.979: [RACG][2151948784] [16581][2151948784][ora.bo2dbp.vip]: end for resource = 
       ora.bo2dbp.vip, action = start, status = 1, time = 6.430s
  
  2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: eth3: error 
       fetching interface information: Device not found    #eth3 is reported as not found; we want bond1 to be used
  checkIf: interface eth3 is down           
  Invalid parameters, or failed to bring up VIP (host=bo2dbp)
  
  2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: 
    clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: 
    cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip start bo2dbp
  
  2012-12-28 11:25:23.807: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut:
    rc = 1, time = 3.220s
  
  2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: 
    env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: 
    cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check bo2dbp
  
  2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.210s
  
  2012-12-28 11:25:27.018: [RACG][695611888] [17488][695611888][ora.bo2dbp.vip]: end for resource = 
    ora.bo2dbp.vip, action = start, status = 1, time = 6.450s
  
  2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: eth3: error 
    fetching interface information: Device not found   #eth3 is reported as not found again
  checkIf: interface eth3 is down
  Invalid parameters, or failed to bring up VIP (host=bo2dbp)
  
  2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: 
    env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: 
    cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip start bo2dbp
  
  2012-12-28 11:25:33.822: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.210s
  
  2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: 
    env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: 
    cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /u01/oracle/crs/bin/racgvip check bo2dbp
  
  2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: clsrcexecut: rc = 1, time = 3.240s
  
  2012-12-28 11:25:37.063: [RACG][3600347632] [18308][3600347632][ora.bo2dbp.vip]: end for resource = 
    ora.bo2dbp.vip, action = start, status = 1, time = 6.490s
 
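  #The log shows it is still the interface name problem
  #The line action = start, status = 1, time = 6.490s fits as well: Target is ONLINE while State is actually OFFLINE
  #The interface layer has already been changed, so eth3 must still be recorded in the OCR without having been updated; try updating it next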
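
Picking the failing interface out of a long racg log can be sketched with sed. This is only an illustration: down_interfaces is a hypothetical helper, and it assumes the log contains lines of the form "checkIf: interface <name> is down" as in the excerpt above.

```shell
# Hypothetical helper: print each distinct interface name that the VIP log
# reports as down, based on the "checkIf: interface <name> is down" lines.
down_interfaces() {
    sed -n 's/^.*checkIf: interface \([^ ]*\) is down.*$/\1/p' "$1" | sort -u
}
```

For example, `down_interfaces /u01/oracle/crs/log/bo2dbp/racg/ora.bo2dbp.vip.log` would print eth3 for the log shown above.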
  
  bo2dbp:/u01/oracle/crs/bin # ./srvctl modify nodeapps -n bo2dbp -A 192.168.7.61/255.255.255.0/bond1
  
  #Update the second node in the same way
  bo2dbs:~ # /u01/oracle/crs/bin/srvctl modify nodeapps -n bo2dbs -A 192.168.7.62/255.255.255.0/bond1
 
  #The vip now starts successfully
  oracle@bo2dbp:~> crs_start ora.bo2dbp.vip
  Attempting to start `ora.bo2dbp.vip` on member `bo2dbp`
  Start of `ora.bo2dbp.vip` on member `bo2dbp` succeeded.
 
  #Next, check the ons log
  oracle@bo2dbp:/u01/oracle/crs/log/bo2dbp/racg> tail -20 ora.bo2dbp.ons.log
   
  ............ 
  onscfg[0]
     {node = bo2dbp.2gotrade.com, port = 6200}
  Adding remote host bo2dbp.2gotrade.com:6200
  onscfg[1]
     {node = bo2dbs.2gotrade.com, port = 6200}
  Adding remote host bo2dbs.2gotrade.com:6200
  ons is n
  2012-12-28 11:00:49.345: [    RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: ot running ...
  
  2012-12-28 11:00:49.345: [    RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: 
    clsrcexecut: env ORACLE_CONFIG_HOME=/u01/oracle/crs
  
  2012-12-28 11:00:49.345: [    RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: clsrcexecut: 
    cmd = /u01/oracle/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/oracle/crs/bin/onsctl ping
  
  2012-12-28 11:00:49.345: [    RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: clsrcexecut: rc = 1, time = 0.210s
  
  2012-12-28 11:00:49.346: [    RACG][2554102256] [19071][2554102256][ora.bo2dbp.ons]: end 
    for resource = ora.bo2dbp.ons, action = start, status = 1, time = 7.560s
  
  2012-12-28 11:00:55.661: [    RACG][368746992] [19812][368746992][ora.bo2dbp.ons]: onsctl: shutting down ons daemon ...
  CONNECT: Connection refused
  Number of onsconfiguration retrieved, numcfg = 2
  onscfg[0]
     {node = bo2dbp.2gotrade.com, port = 6200}
  Adding remote host bo2dbp.2gotrade.com:6200
  onscfg[1]
     {node = bo2dbs.2gotrade.com, 
  2012-12-28 11:00:55.661: [    RACG][368746992] [19812][368746992][ora.bo2dbp.ons]: port = 6200}
  ...............
  
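  #I remembered a similar ons error from before, hit during installation and caused by a missing local loopback entry.
  #It happened again here: when /etc/hosts was copied from the original configuration, that entry was accidentally commented out...
  #For details see http://blog.csdn.net/robinson_0612/article/details/6303583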
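
A quick check for the loopback entry could look like the sketch below. It assumes that the entry in question is an uncommented line mapping 127.0.0.1 to localhost, which is an assumption since the post does not show the hosts file itself; has_loopback_entry is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit status 0) if the hosts file has an
# uncommented line that maps 127.0.0.1 to the name "localhost".
has_loopback_entry() {
    awk '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # rest of the line is a comment
                if ($i == "localhost") found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "${1:-/etc/hosts}"
}
```

Usage: `has_loopback_entry || echo "loopback entry missing in /etc/hosts"`, run on each node before starting ons.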
  
  #ons now starts successfully
  oracle@bo2dbp:~> crs_start ora.bo2dbp.ons
  Attempting to start `ora.bo2dbp.ons` on member `bo2dbp`
  Start of `ora.bo2dbp.ons` on member `bo2dbp` succeeded.
  
  #Author : Robinson
  #Blog   : http://blog.csdn.net/robinson_0612

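5. Summary
 a. After migrating the virtual machines of a RAC, first update the paths of all disks in each virtual machine's configuration file (local disk, asmdisk, ocr, votingdisk)
 b. Adding the virtual machine as "copied" (not sure about "moved") changes the network cards, mainly to keep the MAC addresses unique
 c. The network has to be reconfigured; to keep the original interface names, rename the interfaces or edit the relevant configuration files from the command line
 d. If the network was reconfigured through an X window tool, check whether the hosts file was changed along with it
 e. If new interface names or new IP addresses are used, reconfigure the cluster network layer
 f. The new interface names or IP addresses also have to be updated in the OCR
 g. Finally, start from the logs when analyzing and solving a problem; they are the fastest way to locate it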
