11g R2: Deleting and Re-adding a Node After the Node's OS Was Rebuilt (Procedure and Troubleshooting)

Symptom:

http://www.santongit.com/thread-12327-1-1.html 

  A two-node RAC database running on RedHat 6.3 x86_64. Because of a business issue, the OS on the node 2 server was reinstalled.
  Node 2 now has to be rebuilt.

Rebuilding the node
  Part 1: Remove node 2's information from the cluster
      Because the OS on node 2 has been reinstalled, the local cleanup steps that would normally be run on that node can be skipped.
      Remove node 2's information from the cluster directly on node 1.
    (1):
        [root@racdb1 ~]# olsnodes -t -s                   ##### list the nodes in the cluster
        [root@racdb1 ~]# crsctl unpin css -n racdb2       ##### run on all remaining nodes
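        As a rough illustration (the exact states shown are assumptions for this scenario), olsnodes -t -s prints one line per node with its activity and pin status; after the OS reinstall racdb2 would typically show as Inactive:
            racdb1  Active    Unpinned
            racdb2  Inactive  Unpinned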
        
    (2): Delete node 2's database instance using dbca
     [oracle@racdb1 ~]$ dbca        # use the DBCA GUI
     Verify that the racdb2 instance has been deleted.
        Check the active instances:
        [oracle@racdb1 ~]$ sqlplus / as sysdba
        SQL> select thread#,status,instance from v$thread;

       Note: this step may report errors, because node 2's OS has been reinstalled and DBCA cannot find the
           corresponding files on node 2 while deleting the instance. It is enough that the racdb2 instance can
           no longer be seen in the database.

          Check the database configuration:
           [root@racdb1 ~]# srvctl config database -d orcl
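
       As an alternative to the GUI, a minimal sketch of the silent dbca call; the instance name orcl2 here is an assumption and must be replaced with the real name of node 2's instance:
           [oracle@racdb1 ~]$ dbca -silent -deleteInstance -nodeList racdb2 -gdbName orcl -instanceName orcl2 -sysDBAUserName sys -sysDBAPassword "***"     # instance name assumed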
           
     (3): Stop the racdb2 node's nodeapps from node racdb1

          [oracle@racdb1 bin]$ srvctl stop nodeapps -n racdb2 -f
                           
     (4): Update the cluster node list as the oracle user on the remaining nodes
            Run on every node that will remain ---------- since this is a two-node cluster, running it on racdb1 is enough:
            [root@racdb1 ~]# su - oracle
            [oracle@racdb1 ~]$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={racdb1}"
       Note: this reports an error, because the installer also tries to run the corresponding command on racdb2 as part of the cluster operation. The error is as follows:
          SEVERE: oracle.sysman.oii.oiip.oiipg.OiipgRemoteOpsException: Error occured while trying to run Unix command /u01/app/11.2.0/grid/oui/bin/../bin/runInstaller  -paramFile /u01/app/11.2.0/grid/oui/bin/../clusterparam.ini  -silent -ignoreSysPrereqs -updateNodeList -noClusterEnabled ORACLE_HOME=/u01/app/11.2.0/grid CLUSTER_NODES=racdb1,racdb2 CRS=true  "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=racdb2 -remoteInvocation -invokingNodeName racdb1 -logFilePath "/u01/app/oraInventory/logs" -timestamp 2014-12-03_11-23-57PM on nodes racdb2. [PRKC-1044 : Failed to check remote command execution setup for node racdb2 using shells /usr/bin/ssh and /usr/bin/rsh 
  File "/usr/bin/rsh" does not exist on node "racdb2"
  No RSA host key is known for racdb2 and you have requested strict checking.Host key verification failed.]
  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runCmdOnUnix(OiipgClusterRunCmd.java:276)
  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runAnyCmdOnNodes(OiipgClusterRunCmd.java:369)
  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runCmd(OiipgClusterRunCmd.java:314)
  at oracle.sysman.oii.oiic.OiicBaseInventoryApp.runRemoteInvOpCmd(OiicBaseInventoryApp.java:281)
  at oracle.sysman.oii.oiic.OiicUpdateNodeList.clsCmdUpdateNodeList(OiicUpdateNodeList.java:296)
  at oracle.sysman.oii.oiic.OiicUpdateNodeList.doOperation(OiicUpdateNodeList.java:240)
  at oracle.sysman.oii.oiic.OiicBaseInventoryApp.main_helper(OiicBaseInventoryApp.java:890)
  at oracle.sysman.oii.oiic.OiicUpdateNodeList.main(OiicUpdateNodeList.java:401)
  Caused by: oracle.ops.mgmt.cluster.ClusterException: PRKC-1044 : Failed to check remote command execution setup for node racdb2 using shells /usr/bin/ssh and /usr/bin/rsh 
  File "/usr/bin/rsh" does not exist on node "racdb2"
  No RSA host key is known for racdb2 and you have requested strict checking.Host key verification failed.
  at oracle.ops.mgmt.cluster.ClusterCmd.runCmd(ClusterCmd.java:2149)
  at oracle.sysman.oii.oiip.oiipg.OiipgClusterRunCmd.runCmdOnUnix(OiipgClusterRunCmd.java:270)
  ... 7 more
  SEVERE: Remote 'UpdateNodeList' failed on nodes: 'racdb2'. Refer to '/u01/app/oraInventory/logs/UpdateNodeList2014-12-03_11-23-57PM.log' for details.
  It is recommended that the following command needs to be manually run on the failed nodes: 
   /u01/app/11.2.0/grid/oui/bin/runInstaller -updateNodeList -noClusterEnabled ORACLE_HOME=/u01/app/11.2.0/grid CLUSTER_NODES=racdb1,racdb2 CRS=true  "INVENTORY_LOCATION=/u01/app/oraInventory" LOCAL_NODE=
  Please refer 'UpdateNodeList' logs under central inventory of remote nodes where failure occurred for more details.
  
        Because racdb2's OS was reinstalled and the corresponding files no longer exist there, the remote update cannot succeed.
        At this point open the RAC node/CRS configuration file inventory.xml and remove the racdb2 node entries by hand.
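        A minimal sketch of the manual edit, assuming the central inventory location /u01/app/oraInventory shown in the error above:
            [root@racdb1 ~]# cp /u01/app/oraInventory/ContentsXML/inventory.xml /u01/app/oraInventory/ContentsXML/inventory.xml.bak    # back up first
            [root@racdb1 ~]# vi /u01/app/oraInventory/ContentsXML/inventory.xml
            # In the <NODE_LIST> of the affected <HOME> entry, delete the line
            #     <NODE NAME="racdb2"/>
            # so that only <NODE NAME="racdb1"/> remains, then save the file.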
        
    (5): Remove the racdb2 node's VIP
        [root@racdb1 ~]# crs_stat -t
        If the racdb2 node's VIP resource still exists, run:
        [root@racdb1 ~]# srvctl stop vip -i ora.racdb2.vip -f
        [root@racdb1 ~]# srvctl remove vip -i ora.racdb2.vip -f
        [root@racdb1 ~]# crsctl delete resource ora.racdb2.vip -f
       (6): Delete the racdb2 node from any remaining node (racdb1)
        [root@racdb1 ~]# crsctl delete node -n racdb2
        [root@racdb1 ~]# olsnodes -t -s

       (7): Update the cluster node list as the grid user on the remaining node (racdb1)
        Run on every node that will remain:
        [grid@racdb1 ~]$ $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={racdb1}" CRS=true
        
        Note: this fails with the same PRKC-1044 error shown in step (4); the installer again tries to run the remote update on racdb2 over ssh/rsh and cannot reach it.
         
          Because racdb2 was reinstalled and the corresponding files are missing, this command cannot complete either. Open inventory.xml on racdb1 again and manually remove the racdb2 entries from the Grid Infrastructure (CRS) home, as in the sketch after step (4).
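          A quick way to confirm the manual cleanup (inventory path assumed from the error log above) is to grep for racdb2 in the central inventory; once both homes have been edited it should return nothing:
            [grid@racdb1 ~]$ grep -i racdb2 /u01/app/oraInventory/ContentsXML/inventory.xml
            [grid@racdb1 ~]$    # no output expected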
        (8): Verify that the racdb2 node has been deleted
        On any remaining node:
        [grid@racdb1 ~]$ cluvfy stage -post nodedel -n racdb2
        [grid@racdb1 ~]$ crsctl status resource -t
        Confirm in the database as well that the racdb2 instance is gone.
        Check the active instances:
        [oracle@racdb1 ~]$ sqlplus / as sysdba
        SQL> select thread#,status,instance from v$thread;
    
At this point all of node 2's information has been removed from the cluster.

Because racdb2's OS had been reinstalled, the node-list and cluster-information updates fail during node removal. They can be fixed by hand, but the later installation may still complain
that racdb2 was not cleaned up completely.
    


Part 2: Re-add racdb2
    
  (1): Create the same OS users and groups as on the existing node

  (2): Configure the hosts file; the new node and the existing node must have identical entries

  (3): Configure the kernel parameters and user limits to match the existing node

  (4): Create the required directories
       (a combined sketch of steps (1)-(4) follows below)
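
  A minimal sketch of this preparation, run as root on the new racdb2; the UIDs/GIDs, IP addresses and directory paths below are placeholders and must be taken from racdb1 (compare with the output of id grid, id oracle and /etc/hosts there):

      # groups and users, same numeric IDs as on racdb1 (values assumed)
      groupadd -g 1000 oinstall ; groupadd -g 1200 asmadmin ; groupadd -g 1201 asmdba
      groupadd -g 1202 asmoper  ; groupadd -g 1300 dba      ; groupadd -g 1301 oper
      useradd -u 1100 -g oinstall -G asmadmin,asmdba,asmoper grid
      useradd -u 1101 -g oinstall -G dba,oper,asmdba oracle

      # /etc/hosts, identical on both nodes (addresses are placeholders)
      # 192.168.1.11  racdb1       192.168.1.12  racdb2
      # 192.168.1.21  racdb1-vip   192.168.1.22  racdb2-vip
      # 10.10.10.11   racdb1-priv  10.10.10.12   racdb2-priv

      # software directories with the same paths and ownership as on racdb1 (exact paths must match racdb1)
      mkdir -p /u01/app/oraInventory
      mkdir -p /u01/app/grid/product/11.2.0/grid            # Grid home path as used later in this document
      mkdir -p /app/oracle/product/11.2.0/db_1              # DB home path as used later in this document
      chown -R grid:oinstall /u01/app
      chown -R oracle:oinstall /app/oracle
      chmod -R 775 /u01/app /app/oracle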
  
  (5): Check whether racdb2 meets the RAC installation prerequisites (run from an existing node as the grid/oracle user)
   [root@racdb1 ~]# su - grid
   [grid@racdb1 ~]$ cluvfy stage -pre nodeadd -n racdb2 -fixup -verbose
   [grid@racdb1 ~]$ cluvfy stage -post hwos -n racdb2

  (6): Add the clusterware software to the new node
   Run the following command from an existing node to add the cluster software to the new node (as the grid user):
  [root@racdb1 ~]# su - grid
  [grid@racdb1 ~]$ /u01/app/grid/product/11.2.0/grid/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={racdb2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={racdb2-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={racdb2-priv}"     # run as the grid user
   
Note: because node 2's OS had already been reinstalled at removal time, none of the removal steps could be run on node 2 itself, and its information was cleaned out on node 1 by hand.
        After that manual cleanup the verification looks clean, but addNode.sh reports the following error:
          Performing tests to see whether nodes racdb2,racdb2 are available
    ............................................................... 100% Done.

          Error ocurred while retrieving node numbers of the existing nodes. Please check if clusterware home is properly configured.
    SEVERE:Error ocurred while retrieving node numbers of the existing nodes. Please check if clusterware home is properly configured.
                  The solution found on Oracle's site for this is:
                   
                   [grid@racdb1 bin]$ ./detachHome.sh
                   Starting Oracle Universal Installer...

                   Checking swap space: must be greater than 500 MB.   Actual 2986 MB    Passed
                   The inventory pointer is located at /etc/oraInst.loc
                   The inventory is located at /u01/app/oraInventory
                   'DetachHome' was successful.

                   [grid@racdb1 bin]$ ./attachHome.sh
                   Starting Oracle Universal Installer...

                   Checking swap space: must be greater than 500 MB.   Actual 2986 MB    Passed
                   Preparing to launch Oracle Universal Installer from /tmp/OraInstall2010-06-01_08-53-48PM. Please wait ...[grid@racdb1 bin]$ The inventory pointer is located at /etc/oraInst.loc
                   The inventory is located at /u01/app/oraInventory
                   'AttachHome' was successful.
                 These two steps simply rebuild inventory.xml from the cluster configuration.
                 Some sources suggest running the script below instead. In my view it must not be run here, because it would strip racdb1's cluster information from the inventory, which would be fatal for the cluster!!
                  [grid@racdb1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={racdb1}" -local
                   Starting Oracle Universal Installer...

                   Checking swap space: must be greater than 500 MB.   Actual 2671 MB    Passed
                    The inventory pointer is located at /etc/oraInst.loc
                    The inventory is located at /u01/app/oraInventory


  (7): Run the root scripts that the installer prompts for
   /u01/app/oraInventory/orainstRoot.sh                         # run as root on the new node racdb2
   /u01/app/grid/product/11.2.0/grid/root.sh                    # run as root on the new node racdb2
  
  (8): Verify that the clusterware software was added successfully
  [grid@racdb1 bin]$ cluvfy stage -post nodeadd -n racdb2 -verbose 


   (9): Add the database software on the new node
  Install the database software for the new node (run from an existing node as the oracle user):
  [root@racdb1 ~]# su - oracle
  [oracle@racdb1 ~]$ /app/oracle/product/11.2.0/db_1/oui/bin/addNode.sh -silent "CLUSTER_NEW_NODES={racdb2}"
  Run the root.sh script that the installer prompts for:
     /app/oracle/product/11.2.0/db_1/root.sh                    # run as root on the new node racdb2

              Note: while adding the database software, the file copy may silently fail without reporting an error. In that case the Oracle software can be copied directly from racdb1 to racdb2; when it is copied by hand like this,
                    root.sh does not need to be run.
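              A minimal sketch of such a manual copy, assuming passwordless ssh from racdb1 to racdb2 and the DB home path used in this document:
                    [oracle@racdb1 ~]$ cd /app/oracle/product/11.2.0
                    [oracle@racdb1 11.2.0]$ tar cf - db_1 | ssh racdb2 "cd /app/oracle/product/11.2.0 && tar xf -"
                    # afterwards compare ownership and permissions of the copied home with racdb1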

         (10): Add the instance
  [oracle@racdb1 ~]$ dbca
  or add the instance directly from the command line (run from an existing node as the oracle user):
  [oracle@racdb1 ~]$ dbca -silent -addInstance -nodeList racdb2 -gdbName orcl -instanceName orcldb2 -sysDBAUserName sys -sysDBAPassword "***"     # run as the oracle user

              Note: after adding the instance, if the database software was copied over directly, the new instance may fail to start with the following errors:

                    ORA-01078: failure in processing system parameters
                    ORA-01565: error in identifying file '+DATA1/orcl/spfileorcl.ora'
                    ORA-17503: ksfdopn:2 Failed to open file +DATA1/orcl/spfileorcl.ora
              These errors occur because two of the oracle binaries in the directly copied software have the wrong permissions; fix them as follows:
                     cd $GRID_HOME/bin
                     chmod 6751 oracle

                     cd $ORACLE_HOME/bin
                     chmod 6751 oracle
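
              A quick check after the permission fix; the asmadmin group ownership shown here is an assumption based on the usual role separation:
                     ls -l $ORACLE_HOME/bin/oracle
                     # expected something like: -rwsr-s--x 1 oracle asmadmin ... oracle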

  (11): Verify the added instance
   Check the active instances:
  [oracle@racdb1 ~]$ sqlplus / as sysdba
   SQL> select thread#,status,instance from gv$thread;
