Oracle Study: Oracle 11g RAC Failure (Failed to create or upgrade OLR)
System environment:
Operating system: RedHat EL5.5
Cluster: Oracle Grid 11.2.0.1.0
Oracle: Oracle 11g 11.2.0.1.0
Symptom:
While adding a new node to an Oracle 11gR2 RAC cluster, running root.sh on the new node failed as follows:
[root@node3 install]# /u01/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2015-09-29 15:03:49: Parsing the host name
2015-09-29 15:03:49: Checking for super user privileges
2015-09-29 15:03:49: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
PROTL-16: Internal Error
Command return code of 41 (10496) from command: /u01/11.2.0/grid/bin/ocrconfig -local -upgrade grid oinstall
Failed to create or upgrade OLR
Checking the logs:
[root@node3 node3]# more altertnod3.log
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
[client(7222)]CRS-10001:ACFS-9200: Supported
2015-09-29 18:02:06.259
[client(7429)]CRS-2106:The OLR location /u01/11.2.0/grid/cdata/node3.olr is inaccessible. Details in /u01/11.2.0/grid/log/node3/client/ocrconfig_7429.log.
2015-09-29 18:02:06.269
[client(7429)]CRS-2101:The OLR was formatted using version 3.
2015-09-29 18:02:12.436
[ohasd(7468)]CRS-2112:The OLR service started on node node3.
2015-09-29 18:02:12.750
[ohasd(7468)]CRS-2772:Server 'node3' has been assigned to pool 'Free'.
2015-09-29 18:02:37.305
[ohasd(7468)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-09-29 18:02:37.331
[gpnpd(8641)]CRS-2328:GPNPD started on node node3.
[root@node3 node3]# more /u01/11.2.0/grid/log/node3/client/ocrconfig_7429.log
Oracle Database 11g Clusterware Release 11.2.0.1.0 - Production Copyright 1996, 2009 Oracle. All rights reserved.
2015-09-29 18:02:06.229: [ OCRCONF][3046557376]ocrconfig starts...
2015-09-29 18:02:06.237: [ OCRCONF][3046557376]Upgrading OCR data
2015-09-29 18:02:06.257: [ OCROSD][3046557376]utread:3: Problem reading buffer 9245000 buflen 4096 retval 0 phy_offset 102400 retry 0
2015-09-29 18:02:06.257: [ OCROSD][3046557376]utread:3: Problem reading buffer 9245000 buflen 4096 retval 0 phy_offset 102400 retry 1
2015-09-29 18:02:06.257: [ OCROSD][3046557376]utread:3: Problem reading buffer 9245000 buflen 4096 retval 0 phy_offset 102400 retry 2
2015-09-29 18:02:06.257: [ OCROSD][3046557376]utread:3: Problem reading buffer 9245000 buflen 4096 retval 0 phy_offset 102400 retry 3
2015-09-29 18:02:06.257: [ OCROSD][3046557376]utread:3: Problem reading buffer 9245000 buflen 4096 retval 0 phy_offset 102400 retry 4
2015-09-29 18:02:06.257: [ OCROSD][3046557376]utread:3: Problem reading buffer 9245000 buflen 4096 retval 0 phy_offset 102400 retry 5
2015-09-29 18:02:06.257: [ OCRRAW][3046557376]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
2015-09-29 18:02:06.257: [ OCRRAW][3046557376]proprioini: all disks are not OCR/OLR formatted
2015-09-29 18:02:06.257: [ OCRRAW][3046557376]proprinit: Could not open raw device
2015-09-29 18:02:06.257: [ default][3046557376]a_init:7!: Backend init unsuccessful : [26]
2015-09-29 18:02:06.257: [ OCRCONF][3046557376]Exporting OCR data to [OCRUPGRADEFILE]
2015-09-29 18:02:06.257: [ OCRAPI][3046557376]a_init:7!: Backend init unsuccessful : [33]
2015-09-29 18:02:06.257: [ OCRCONF][3046557376]There was no previous version of OCR. error:[PROCL-33: Oracle Local Registry is not configured]
2015-09-29 18:02:06.258: [ OCROSD][3046557376]utread:3: Problem reading buffer 9246000 buflen 4096 retval 0 phy_offset 102400 retry 0
Problem analysis:
MOS note [ID 1068212.1] describes this error and attributes it to unpublished bug 8670579, which reportedly fails to recognize newer AMD processors. In this environment, however, the host runs an Intel CPU, so the AMD-specific bug should not be the cause.
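Before ruling the bug in or out, it is worth confirming the CPU vendor directly. A minimal check (the `cpu_vendor` helper name is ours, not Oracle's):

```shell
# cpu_vendor: print the CPU vendor string from cpuinfo-formatted input
cpu_vendor() { awk -F': *' '/^vendor_id/ { print $2; exit }'; }

# On the new node: "GenuineIntel" rules the AMD-specific bug out,
# "AuthenticAMD" would make the MOS note worth pursuing further.
[ -r /proc/cpuinfo ] && cpu_vendor < /proc/cpuinfo
```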
Re-checking the system environment on the new node (node3) turned up the following:
[oracle@node3 ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 zcx1 localhost //Here 127.0.0.1 is unexpectedly mapped to zcx1, node1's hostname: this file was copied over from node1 and never edited, which most likely caused the failure.
192.168.8.101 zcx1
192.168.8.103 zcx1-vip
10.10.10.101 zcx1-priv
192.168.8.102 zcx2
192.168.8.104 zcx2-vip
10.10.10.102 zcx2-priv
192.168.8.105 rac_scan
192.168.8.106 node3
192.168.8.107 node3-vip
10.10.10.106 node3-priv
Fix the hosts file (on all nodes):
127.0.0.1 localhost
192.168.8.101 zcx1
192.168.8.103 zcx1-vip
10.10.10.101 zcx1-priv
192.168.8.102 zcx2
192.168.8.104 zcx2-vip
10.10.10.102 zcx2-priv
192.168.8.105 rac_scan
192.168.8.106 node3
192.168.8.107 node3-vip
10.10.10.106 node3-priv
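A quick guard against this class of mistake: the 127.0.0.1 line should map only to localhost-style names, never to a cluster hostname. A sketch (the `check_loopback` helper is ours, not part of any Oracle tooling):

```shell
# check_loopback FILE: succeed only if the 127.0.0.1 entry lists nothing
# but localhost-style names; a cluster hostname there can break OLR creation
check_loopback() {
  awk '$1=="127.0.0.1" { for (i = 2; i <= NF; i++) if ($i !~ /^localhost/) bad=1 }
       END { exit bad }' "$1"
}

# Run on every node before executing root.sh
check_loopback /etc/hosts && echo "loopback line OK" \
  || echo "loopback line maps a cluster hostname - fix /etc/hosts"
```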
Delete the grid installation information for the new node, update the node list, and run the add-node script again from an existing node:
[grid@zcx1 ~]$ sh addnode.sh
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 1905 MB Passed
Checking monitor: must be configured to display at least 256 colors. Actual 16777216 Passed
Oracle Universal Installer, Version 11.2.0.1.0 Production
Copyright (C) 1999, 2009, Oracle. All rights reserved.
Performing tests to see whether nodes zcx2,node3 are available
............................................................... 100% Done.
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
Source: /u01/11.2.0/grid
New Nodes
Space Requirements
New Nodes
node3
/u01: Required 3.72GB : Available 25.25GB
Installed Products
Product Names
Oracle Grid Infrastructure 11.2.0.1.0
Sun JDK 1.5.0.17.0
Installer SDK Component 11.2.0.1.0
Oracle One-Off Patch Installer 11.2.0.0.2
Oracle Universal Installer 11.2.0.1.0
Oracle Configuration Manager Deconfiguration 10.3.1.0.0
Enterprise Manager Common Core Files 10.2.0.4.2
Oracle DBCA Deconfiguration 11.2.0.1.0
Oracle RAC Deconfiguration 11.2.0.1.0
Oracle Quality of Service Management (Server) 11.2.0.1.0
Installation Plugin Files 11.2.0.1.0
Universal Storage Manager Files 11.2.0.1.0
Oracle Text Required Support Files 11.2.0.1.0
Automatic Storage Management Assistant 11.2.0.1.0
Oracle Database 11g Multimedia Files 11.2.0.1.0
Oracle Multimedia Java Advanced Imaging 11.2.0.1.0
Oracle Globalization Support 11.2.0.1.0
Oracle Multimedia Locator RDBMS Files 11.2.0.1.0
Oracle Core Required Support Files 11.2.0.1.0
Bali Share 1.1.18.0.0
Oracle Database Deconfiguration 11.2.0.1.0
Oracle Quality of Service Management (Client) 11.2.0.1.0
Expat libraries 2.0.1.0.1
Oracle Containers for Java 11.2.0.1.0
Perl Modules 5.10.0.0.1
Secure Socket Layer 11.2.0.1.0
Oracle JDBC/OCI Instant Client 11.2.0.1.0
Oracle Multimedia Client Option 11.2.0.1.0
LDAP Required Support Files 11.2.0.1.0
Character Set Migration Utility 11.2.0.1.0
Perl Interpreter 5.10.0.0.1
PL/SQL Embedded Gateway 11.2.0.1.0
OLAP SQL Scripts 11.2.0.1.0
Database SQL Scripts 11.2.0.1.0
Oracle Extended Windowing Toolkit 3.4.47.0.0
SSL Required Support Files for InstantClient 11.2.0.1.0
SQL*Plus Files for Instant Client 11.2.0.1.0
Oracle Net Required Support Files 11.2.0.1.0
Oracle Database User Interface 2.2.13.0.0
RDBMS Required Support Files for Instant Client 11.2.0.1.0
Enterprise Manager Minimal Integration 11.2.0.1.0
XML Parser for Java 11.2.0.1.0
Oracle Security Developer Tools 11.2.0.1.0
Oracle Wallet Manager 11.2.0.1.0
Enterprise Manager plugin Common Files 11.2.0.1.0
Platform Required Support Files 11.2.0.1.0
Oracle JFC Extended Windowing Toolkit 4.2.36.0.0
RDBMS Required Support Files 11.2.0.1.0
Oracle Ice Browser 5.2.3.6.0
Oracle Help For Java 4.2.9.0.0
Enterprise Manager Common Files 10.2.0.4.2
Deinstallation Tool 11.2.0.1.0
Oracle Java Client 11.2.0.1.0
Cluster Verification Utility Files 11.2.0.1.0
Oracle Notification Service (eONS) 11.2.0.1.0
Oracle LDAP administration 11.2.0.1.0
Cluster Verification Utility Common Files 11.2.0.1.0
Oracle Clusterware RDBMS Files 11.2.0.1.0
Oracle Locale Builder 11.2.0.1.0
Oracle Globalization Support 11.2.0.1.0
Buildtools Common Files 11.2.0.1.0
Oracle RAC Required Support Files-HAS 11.2.0.1.0
SQL*Plus Required Support Files 11.2.0.1.0
XDK Required Support Files 11.2.0.1.0
Agent Required Support Files 10.2.0.4.2
Parser Generator Required Support Files 11.2.0.1.0
Precompiler Required Support Files 11.2.0.1.0
Installation Common Files 11.2.0.1.0
Required Support Files 11.2.0.1.0
Oracle JDBC/THIN Interfaces 11.2.0.1.0
Oracle Multimedia Locator 11.2.0.1.0
Oracle Multimedia 11.2.0.1.0
HAS Common Files 11.2.0.1.0
Assistant Common Files 11.2.0.1.0
PL/SQL 11.2.0.1.0
HAS Files for DB 11.2.0.1.0
Oracle Recovery Manager 11.2.0.1.0
Oracle Database Utilities 11.2.0.1.0
Oracle Notification Service 11.2.0.0.0
SQL*Plus 11.2.0.1.0
Oracle Netca Client 11.2.0.1.0
Oracle Net 11.2.0.1.0
Oracle JVM 11.2.0.1.0
Oracle Internet Directory Client 11.2.0.1.0
Oracle Net Listener 11.2.0.1.0
Cluster Ready Services Files 11.2.0.1.0
Oracle Database 11g 11.2.0.1.0
-----------------------------------------------------------------------------
Instantiating scripts for add node (Tuesday, September 29, 2015 5:51:51 PM CST)
. 1% Done.
Instantiation of add node scripts complete
Copying to remote nodes (Tuesday, September 29, 2015 5:51:53 PM CST)
............................................................................................... 96% Done.
Home copied to new nodes
Saving inventory on nodes (Tuesday, September 29, 2015 5:56:12 PM CST)
SEVERE:Remote 'AttachHome' failed on nodes: 'node3'. Refer to '/u01/app/oraInventory/logs/addNodeActions2015-09-29_05-51-42PM.log' for details.
You can manually re-run the following command on the failed nodes after the installation:
/u01/11.2.0/grid/oui/bin/runInstaller -attachHome -noClusterEnabled ORACLE_HOME=/u01/11.2.0/grid ORACLE_HOME_NAME=Ora11g_gridinfrahome1 CLUSTER_NODES=zcx1,zcx2,node3 CRS=true "INVENTORY_LOCATION=/u01/app/oraInventory" -invPtrLoc "/u01/11.2.0/grid/oraInst.loc" LOCAL_NODE=<node on which command is to be run>.
Please refer 'AttachHome' logs under central inventory of remote nodes where failure occurred for more details.
SEVERE:Remote 'UpdateNodeList' failed on nodes: 'node3'. Refer to '/u01/app/oraInventory/logs/addNodeActions2015-09-29_05-51-42PM.log' for details.
You can manually re-run the following command on the failed nodes after the installation:
/u01/11.2.0/grid/oui/bin/runInstaller -updateNodeList -noClusterEnabled ORACLE_HOME=/u01/11.2.0/grid CLUSTER_NODES=zcx1,zcx2,node3 CRS=true "INVENTORY_LOCATION=/u01/app/oraInventory" -invPtrLoc "/u01/11.2.0/grid/oraInst.loc" LOCAL_NODE=<node on which command is to be run>.
Please refer 'UpdateNodeList' logs under central inventory of remote nodes where failure occurred for more details.
. 100% Done.
Save inventory complete
WARNING:
The following configuration scripts need to be executed as the "root" user in each cluster node.
/u01/11.2.0/grid/root.sh #On nodes node3
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts in each cluster node
The Cluster Node Addition of /u01/11.2.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.
[grid@zcx1 ~]$
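For reference, the addnode.sh wrapper run above presumably drives the stock OUI add-node flow; on 11.2 the documented silent invocation looks roughly like the sketch below. The node names are this environment's, and the exact flags should be checked against your OUI version:

```shell
# Run as the grid user from an existing node (e.g. zcx1):
#   cd /u01/11.2.0/grid/oui/bin
#   ./addNode.sh -silent "CLUSTER_NEW_NODES={node3}" \
#                "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node3-vip}"
```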
Clean up the old grid configuration on the new node, then re-run the root.sh script:
[grid@node3 ~]$ /u01/11.2.0/grid/oui/bin/runInstaller -updateNodeList -noClusterEnabled ORACLE_HOME=/u01/11.2.0/grid CLUSTER_NODES=zcx1,zcx2,node3 CRS=true "INVENTORY_LOCATION=/u01/app/oraInventory" -invPtrLoc "/u01/11.2.0/grid/oraInst.loc" LOCAL_NODE=node3
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4095 MB Passed
The inventory pointer is located at /u01/11.2.0/grid/oraInst.loc
The inventory is located at /u01/app/oraInventory
[root@node3 ~]# /u01/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
2015-09-29 18:01:23: Parsing the host name
2015-09-29 18:01:23: Checking for super user privileges
2015-09-29 18:01:23: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
Usage: srvctl <command> <object> [<options>]
commands: enable|disable|start|stop|status|add|remove|modify|getenv|setenv|unsetenv|config
objects: database|service|asm|diskgroup|listener|home|ons|eons
For detailed help on each command and object and its options use:
srvctl <command> -h or
srvctl <command> <object> -h
PRKO-2012 : nodeapps object is not supported in Oracle Restart
ACFS-9200: Supported
CRS-4046: Invalid Oracle Clusterware configuration.
CRS-4000: Command Stop failed, or completed with errors.
CRS-4046: Invalid Oracle Clusterware configuration.
CRS-4000: Command Stop failed, or completed with errors.
You must kill crs processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
/bin/dd: opening `/u01/11.2.0/grid/cdata/node3.olr': No such file or directory
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
[root@node3 install]# perl roothas.pl -deconfig -force
2015-09-29 15:41:36: Checking for super user privileges
2015-09-29 15:41:36: User has super user privileges
2015-09-29 15:41:36: Parsing the host name
Using configuration parameter file: ./crsconfig_params
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Delete failed, or completed with errors.
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Stop failed, or completed with errors.
You must kill ohasd processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
ACFS-9200: Supported
acfsroot: ACFS-9313: No ADVM/ACFS installation detected.
Either /etc/oracle/olr.loc does not exist or is not readable
Make sure the file exists and it has read and execute access
/bin/dd: opening `': No such file or directory
Successfully deconfigured Oracle Restart stack
[root@node3 ~]# /u01/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n)
[n]: y
Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2015-09-29 18:02:04: Parsing the host name
2015-09-29 18:02:04: Checking for super user privileges
2015-09-29 18:02:04: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node zcx1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'node3'
CRS-2676: Start of 'ora.mdnsd' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'node3'
CRS-2676: Start of 'ora.gipcd' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node3'
CRS-2676: Start of 'ora.gpnpd' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node3'
CRS-2676: Start of 'ora.cssdmonitor' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node3'
CRS-2672: Attempting to start 'ora.diskmon' on 'node3'
CRS-2676: Start of 'ora.diskmon' on 'node3' succeeded
CRS-2676: Start of 'ora.cssd' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node3'
CRS-2676: Start of 'ora.ctssd' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'node3'
CRS-2676: Start of 'ora.drivers.acfs' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node3'
CRS-2676: Start of 'ora.asm' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node3'
CRS-2676: Start of 'ora.crsd' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node3'
CRS-2676: Start of 'ora.evmd' on 'node3' succeeded
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
node3 2015/09/29 18:05:39 /u01/11.2.0/grid/cdata/node3/backup_20150929_180539.olr
error: failed to stat /media/RHEL_5.5 i386 DVD: No such file or directory
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB. Actual 4095 MB Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
------ The script ran successfully!
Check the CRS status:
[root@node3 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@node3 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DG1.dg ora....up.type ONLINE ONLINE node3
ora....ER.lsnr ora....er.type ONLINE ONLINE node3
ora....N1.lsnr ora....er.type ONLINE ONLINE zcx1
ora....VOTE.dg ora....up.type ONLINE ONLINE node3
ora.RCY1.dg ora....up.type ONLINE ONLINE node3
ora.asm ora.asm.type ONLINE ONLINE node3
ora.eons ora.eons.type ONLINE ONLINE node3
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE node3
ora....SM3.asm application ONLINE ONLINE node3
ora....E3.lsnr application ONLINE ONLINE node3
ora.node3.gsd application OFFLINE OFFLINE
ora.node3.ons application ONLINE ONLINE node3
ora.node3.vip ora....t1.type ONLINE ONLINE node3
ora.oc4j ora.oc4j.type OFFLINE OFFLINE
ora.ons ora.ons.type ONLINE ONLINE node3
ora.prod.db ora....se.type ONLINE ONLINE zcx1
ora....taf.svc ora....ce.type ONLINE ONLINE zcx1
ora....ry.acfs ora....fs.type ONLINE ONLINE node3
ora.scan1.vip ora....ip.type ONLINE ONLINE zcx1
ora....SM1.asm application ONLINE ONLINE zcx1
ora....X1.lsnr application ONLINE ONLINE zcx1
ora.zcx1.gsd application OFFLINE OFFLINE
ora.zcx1.ons application ONLINE ONLINE zcx1
ora.zcx1.vip ora....t1.type ONLINE ONLINE zcx1
ora....SM2.asm application ONLINE ONLINE zcx2
ora....X2.lsnr application ONLINE ONLINE zcx2
ora.zcx2.gsd application OFFLINE OFFLINE
ora.zcx2.ons application ONLINE ONLINE zcx2
ora.zcx2.vip ora....t1.type ONLINE ONLINE zcx2
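In this `crs_stat -t` output the gsd and oc4j resources showing OFFLINE/OFFLINE are expected defaults on 11.2.0.1; what matters is any resource whose Target is ONLINE while its State is OFFLINE. A small filter for that (our helper, assuming the default column order Name/Type/Target/State/Host):

```shell
# flag_offline: from `crs_stat -t` style output, print resources whose
# Target column is ONLINE but whose State column is OFFLINE
flag_offline() { awk 'NF >= 4 && $3 == "ONLINE" && $4 == "OFFLINE" { print $1 }'; }

# On a cluster node:  crs_stat -t | flag_offline
```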