记一次复杂的rac升级

本案例记录了crs升级,主机升级过程,期间感触颇多,经过几个小时的努力,所幸升级成功!这里我要说的是,如果应用上线的时候,能把数据库升级至10.2.0.3或者把主机版本升级至5306以上,就不会有如此故障出现。目前国内很多应用厂商不应该只管产品上线,而把“擦屁股”的活,交给客户来做。

一、问题诊断
某客户数据库自3月初以来频繁发生数据库重启现场,任意节点后台alert日志显示
ORA-27300: OS system dependent operation:invalid_process failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpstime3
ORA-27144: attempt to kill process failed
经诊断,是由于Oracle 10.2.0.1在AIX 5305上bug导致,详见metalink ID 458442.1。Oracle提供解决方案只要升级操作系统即可,考虑到数据库版本过低,故决定先升级数据库至10.2.0.4,后升级操作系统至5309。
二、升级前准备
1、检查hosts文件配置
确保双节点检查/etc/hosts文件配置正确
127.0.0.1 localhost loopback # loopback (lo0) name/address

#public network
192.162.1.97 racdb01
192.162.1.98 racdb02

#virtual ip address
192.162.1.99 racdb01_vip
192.162.1.100 racdb02_vip

#private network used for rac
10.0.0.1 racdb01_priv
10.0.0.2 racdb02_priv
2、检查crs运行状况
确保双节点crs运行正常
oracle.orcl1@racdb01:/home/oracle$crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

oracle.orcl2@racdb02:/home/oracle$crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

3、检查ocr,voting盘权限
确保ocr,voting权限正常。
crw-rw-rw- 1 root dba 26, 6 Mar 13 22:31 ocr_disk2
crw-rw-rw- 1 root dba 26, 5 Mar 13 22:31 ocr_disk1
crw-rw-rw- 1 oracle dba 26, 8 Mar 13 22:31 voting_disk2
crw-rw-rw- 1 oracle dba 26, 9 Mar 13 22:31 voting_disk3
crw-rw-rw- 1 oracle dba 26, 7 Mar 13 22:31 voting_disk1
4、检查listener
由于Oracle 10.2.0.1 监听存在bug,检查双节点监听,确保监听运行正常。
oracle.orcl1@racdb01:/home/oracle$lsnrctl status LISTENER_RACDB01

LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.1.0 - Production on 13-MAR-2010 21:54:41

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=racdb01_vip)(PORT=1521)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias LISTENER_RACDB01
Version TNSLSNR for IBM/AIX RISC System/6000: Version 10.2.0.1.0 - Production
Start Date 14-JAN-2010 08:44:52
Uptime 58 days 13 hr. 9 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/products/rdbms/network/admin/listener.ora
Listener Log File /oracle/products/rdbms/network/log/listener_racdb01.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.162.1.99)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.162.1.97)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "orcl" has 2 instance(s).
Instance "orcl1", status READY, has 2 handler(s) for this service...
Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 2 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...
Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "orcl_XPT" has 2 instance(s).
Instance "orcl1", status READY, has 2 handler(s) for this service...
Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully

oracle.orcl2@racdb02:/home/oracle$lsnrctl status LISTENER_RACDB02

LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.1.0 - Production on 13-MAR-2010 21:53:55

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=racdb02_vip)(PORT=1521)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias LISTENER_RACDB02
Version TNSLSNR for IBM/AIX RISC System/6000: Version 10.2.0.1.0 - Production
Start Date 14-JAN-2010 01:02:53
Uptime 58 days 20 hr. 51 min. 1 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/products/rdbms/network/admin/listener.ora
Listener Log File /oracle/products/rdbms/network/log/listener_racdb02.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.162.1.100)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.162.1.98)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM2", status BLOCKED, has 1 handler(s) for this service...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "orcl" has 2 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...
Instance "orcl2", status READY, has 2 handler(s) for this service...
Service "orclXDB" has 2 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...
Instance "orcl2", status READY, has 1 handler(s) for this service...
Service "orcl_XPT" has 2 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...
Instance "orcl2", status READY, has 2 handler(s) for this
5、确定Oracle Time Zone
由于Oracle升级会带来Time Zone变化,需要事先检查其状况
SQL> select version from v$timezone_file;

VERSION
----------
2

SQL> select c.owner || '.' || c.table_name || '(' || c.column_name || ') -' || c.data_type || ' ' col
2 from dba_tab_cols c, dba_objects o
3 where c.data_type like '%TIME ZONE'
4 and c.owner=o.owner
5 and c.table_name = o.object_name
6 and o.object_type = 'TABLE'
7 order by col
8 /

COL
--------------------------------------------------------------------------------
SYS.ALERT_QT(SYS_NC00029$) -TIMESTAMP(0) WITH TIME ZONE
SYS.FGR$_FILE_GROUPS(CREATION_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.FGR$_FILE_GROUP_FILES(CREATION_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.FGR$_FILE_GROUP_VERSIONS(CREATION_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.OPTSTAT_HIST_CONTROL$(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
SYS.OPTSTAT_HIST_CONTROL$(SVAL2) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_EVENT_LOG(LOG_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_EVENT_QTAB(SYS_NC00032$) -TIMESTAMP(0) WITH TIME ZONE
SYS.SCHEDULER$_EVENT_QTAB(SYS_NC00044$) -TIMESTAMP(0) WITH TIME ZONE
SYS.SCHEDULER$_EVENT_QTAB(SYS_NC00045$) -TIMESTAMP(0) WITH TIME ZONE
SYS.SCHEDULER$_GLOBAL_ATTRIBUTE(ATTR_TSTAMP) -TIMESTAMP(6) WITH TIME ZONE

COL
--------------------------------------------------------------------------------
SYS.SCHEDULER$_JOB(END_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB(LAST_ENABLED_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB(LAST_END_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB(LAST_START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB(NEXT_RUN_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB(START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB_RUN_DETAILS(LOG_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB_RUN_DETAILS(REQ_START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_JOB_RUN_DETAILS(START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_SCHEDULE(END_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_SCHEDULE(REFERENCE_DATE) -TIMESTAMP(6) WITH TIME ZONE

COL
--------------------------------------------------------------------------------
SYS.SCHEDULER$_STEP_STATE(END_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_STEP_STATE(START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW(ACTUAL_START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW(END_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW(LAST_START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW(MANUAL_OPEN_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW(NEXT_START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW(START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW_DETAILS(LOG_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW_DETAILS(REQ_START_DATE) -TIMESTAMP(6) WITH TIME ZONE
SYS.SCHEDULER$_WINDOW_DETAILS(START_DATE) -TIMESTAMP(6) WITH TIME ZONE

COL
--------------------------------------------------------------------------------
SYS.TSM_DST$(DST_END_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.TSM_DST$(DST_INST_START_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.TSM_DST$(DST_START_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.TSM_SRC$(SRC_END_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.TSM_SRC$(SRC_INST_START_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.TSM_SRC$(SRC_START_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_ALERT_HISTORY(CREATION_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_ALERT_HISTORY(TIME_SUGGESTED) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_ALERT_OUTSTANDING(CREATION_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_ALERT_OUTSTANDING(TIME_SUGGESTED) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_AUX_HISTORY(SAVTIME) -TIMESTAMP(6) WITH TIME ZONE

COL
--------------------------------------------------------------------------------
SYS.WRI$_OPTSTAT_AUX_HISTORY(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_HISTGRM_HISTORY(SAVTIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_HISTGRM_HISTORY(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(SAVTIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_IND_HISTORY(SAVTIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_IND_HISTORY(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_OPR(END_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_OPR(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_OPR(START_TIME) -TIMESTAMP(6) WITH TIME ZONE
SYS.WRI$_OPTSTAT_TAB_HISTORY(SAVTIME) -TIMESTAMP(6) WITH TIME ZONE

COL
--------------------------------------------------------------------------------
SYS.WRI$_OPTSTAT_TAB_HISTORY(SPARE6) -TIMESTAMP(6) WITH TIME ZONE
WMSYS.WM$VERSIONED_TABLES(SYS_NC00020$) -TIMESTAMP(0) WITH TIME ZONE
WMSYS.WM$VERSIONED_TABLES(SYS_NC00021$) -TIMESTAMP(0) WITH TIME ZONE

58 rows selected.

SQL> SELECT object_name FROM dba_objects WHERE object_id IN (SELECT obj# FROM scheduler$_window);

OBJECT_NAME
--------------------------------------------------------------------------------
WEEKNIGHT_WINDOW
WEEKEND_WINDOW

SQL> SELECT object_name FROM dba_objects WHERE object_id IN (SELECT obj# FROM scheduler$_job);

OBJECT_NAME
--------------------------------------------------------------------------------
PURGE_LOG
FGR$AUTOPURGE_JOB
AUTO_SPACE_ADVISOR_JOB
RLM$EVTCLEANUP
RLM$SCHDNEGACTION

6、重启crs
重启双节点crs和rdbms,确保crs和rdbms运行正常。
在一节点停止双节点crs
oracle.orcl1@racdb01:/home/oracle$srvctl stop service -d orcl
oracle.orcl1@racdb01:/home/oracle$srvctl stop asm -n racdb01
oracle.orcl1@racdb01:/home/oracle$srvctl stop asm -n racdb02
oracle.orcl1@racdb01:/home/oracle$srvctl stop nodeapps -n racdb01
oracle.orcl1@racdb01:/home/oracle$srvctl stop nodeapps -n racdb02

双节点停止crs,注意需要root权限执行。
oracle.orcl1@racdb01:/home/oracle$crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

oracle.orcl2@racdb02:/home/oracle$crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

oracle.orcl1@racdb01:/home/oracle$crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

双节点启动crs,注意需要root权限执行。
oracle.orcl1@racdb01:/home/oracle$crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

oracle.orcl2@racdb02:/home/oracle$crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

由于10.2.0.1存在bug,二节点启动alert日志显示如下,此过程持续5分钟左右,期间会将所有事务挂起。
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
7、备份归档日志
为防止意外发生,进行数据库备份。由于存在全量备份,只要增量备份归档日志即可。
RMAN> backup archivelog all format '/rman/mcb_%d__%s_%p_%T.bkp';
。。。
channel ORA_DISK_1: starting archive log backupset
channel ORA_DISK_1: specifying archive log(s) in backup set
input archive log thread=1 sequence=32493 recid=127642 stamp=713571037
input archive log thread=1 sequence=32494 recid=127643 stamp=713571868
input archive log thread=1 sequence=32495 recid=127646 stamp=713571946
input archive log thread=2 sequence=31557 recid=127641 stamp=713571035
input archive log thread=2 sequence=31558 recid=127644 stamp=713571854
input archive log thread=2 sequence=31559 recid=127645 stamp=713571927
channel ORA_DISK_1: starting piece 1 at 13-MAR-10
channel ORA_DISK_1: finished piece 1 at 13-MAR-10
piece handle=/rman/mcb_ORCL__42373_1_20100313.bkp tag=TAG20100313T222538 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 13-MAR-10

Starting Control File and SPFILE Autobackup at 13-MAR-10
piece handle=+ASM_RECO/orcl/autobackup/2010_03_13/s_713571940.980.713571941 comment=NONE
Finished Control File and SPFILE Autobackup at 13-MAR-10
8、确认数据库闪回是否打开
一旦发生意外,启用闪回功能,即可回到意外发生之前状态。
SQL> select flashback_on from v$database;
FLASHBACK_ON
------------------
YES

9、添加环境变量
双节点oracle用户下添加环境变量
export PATH=$PATH:/home/oracle/scripts:$ORACLE_HOME/OPatch

10、检查rsh互通性
由于在升级过程节点之间需要拷贝软件,必须确保双节点root用户,oracle用户rsh互通。
11、检查inventory.xml
确定CRS_HOME和ORACLE_HOME。
oracle.orcl1@racdb01:/oracle/oraInventory/ContentsXML$more inventory.xml





10.2.0.1.0
2.1.0.6.0
















12、检查小补丁
双节点检查小布丁应用情况
oracle.orcl2@racdb02:/home/oracle$opatch lsinventory -all
Invoking OPatch 10.2.0.1.0

Oracle interim Patch Installer version 10.2.0.1.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..


Oracle Home : /oracle/products/rdbms
Central Inventory : /oracle/oraInventory
from : /oracle/products/rdbms/oraInst.loc
OPatch version : 10.2.0.1.0
OUI version : 10.2.0.1.0
OUI location : /oracle/products/rdbms/oui
Log file location : /oracle/products/rdbms/cfgtoollogs/opatch/opatch-00_Mar_13_22-55-57-GMT-08_Sat.log

Lsinventory Output file location : /oracle/products/rdbms/cfgtoollogs/opatch/lsinv/lsinventory-00_Mar_13_22-55-57-GMT-08_Sat.txt

--------------------------------------------------------------------------------
List of Oracle Homes:
Name Location
OUIHomeCRS /oracle/products/crs
OUIHomedb10g /oracle/products/rdbms

Installed Top-level Products (1):

Oracle Database 10g 10.2.0.1.0
There are 1 products installed in this Oracle Home.


Interim patches (2) :

Patch 4632780 : applied on Wed Oct 29 13:50:27 GMT-08:00 2008
Created on 29 Dec 2005, 02:24:12 hrs US/Pacific
Bugs fixed:
4632780

Patch 4861123 : applied on Wed Oct 29 13:47:28 GMT-08:00 2008
Created on 9 Dec 2005, 21:51:18 hrs US/Pacific
Bugs fixed:
4518443


Rac system comprising of multiple nodes
Local node = racdb02
Remote node = racdb01

--------------------------------------------------------------------------------

OPatch succeeded.

13、更改crs用户属性
双节点更改crs用户属性
#chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle

14、备份ocr和voting盘
停止crs,rdbms各种服务和进程
备份ocr盘
oracle.orcl1@racdb01:/home/oracle$ocrconfig -showbackup

racdb02 2010/03/13 16:44:09 /oracle/products/crs/cdata/crs

racdb02 2010/03/13 12:44:09 /oracle/products/crs/cdata/crs

racdb02 2010/03/13 08:44:09 /oracle/products/crs/cdata/crs

racdb02 2010/03/12 00:44:07 /oracle/products/crs/cdata/crs

racdb02 2010/03/05 00:43:59 /oracle/products/crs/cdata/crs

oracle.orcl2@racdb02:/home/oracle$ocrconfig -export /rman/ocr.bak

备份voting盘
oracle.orcl1@racdb01:/home/oracle$crsctl query css votedisk
0. 0 /dev/voting_disk1
1. 0 /dev/voting_disk2
2. 0 /dev/voting_disk3

located 3 votedisk(s).

oracle.orcl2@racdb02:/rman$dd if=/dev/voting_disk1 of=/rman/voting.bak
2097152+0 records in.
2097152+0 records out.
15、备份crs和rdbms软件
双节点备份crs和rdbms软件
oracle.orcl1@racdb01:/oracle$tar -cf products.tar products
oracle.orcl1@racdb01:/oracle$tar -cf oraInventory.tar oraInventory
oracle.orcl2@racdb02:/oracle$tar -cf products.tar products
oracle.orcl2@racdb02:/oracle$tar -cf oraInventory.tar oraInventory
16、检查xlc版本和补丁IY73570
双节点检查xlc版本和补丁IY73570应用情况,注意xlc版本必须高于8.0
oracle.orcl1@racdb01:/oracle/products/crs/log/racdb01/cssd$lslpp -l |grep -i xlc
xlC.adt.include 8.0.0.0 COMMITTED C Set ++ Application
xlC.aix50.rte 8.0.0.0 COMMITTED C Set ++ Runtime for AIX 5.0
xlC.cpp 6.0.0.0 COMMITTED C for AIX Preprocessor
xlC.msg.en_US.cpp 6.0.0.0 COMMITTED C for AIX Preprocessor
xlC.msg.en_US.rte 8.0.0.0 COMMITTED C Set ++ Runtime
xlC.rte 8.0.0.0 COMMITTED C Set ++ Runtime

oracle.orcl1@racdb01:/oracle/products/crs/log/racdb01/cssd$/usr/sbin/instfix -a -ivk IY73570
IY73570 Abstract: Aug 2005 IBM C++ Runtime Environment for AIX

IY73570 Symptom Text:


----------------------------
Fileset xlC.aix50.rte:7.0.0.5 is applied on the system.
All filesets for IY73570 were found.
当xlc版本低于8.0时,比如6.0,未安装补丁IY73570,crs安装最后执行root.sh时,报如下错误:
oracle.orcl1@racdb01:/oracle/products/crs/install$./root102.sh
Error : Please change the CRS_ORACLE_USER id to have the following OS capabilities :
< CAP_PROPAGATE CAP_BYPASS_RAC_VMM CAP_NUMA_ATTACH >
by running command then 'crsctl stop crs' and 'crsctl start crs'.
The CSS daemon, ocssd.bin, will not run in realtime without this change
Oracle strongly recommends that the CSS daemon run realtime
Creating pre-patch directory for saving pre-patch clusterware files
Completed patching clusterware files to /oracle/products/crs
Relinking some shared libraries.
Relinking of patched files is complete.
WARNING: directory '/oracle/products' is not owned by root
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
exec(): 0509-036 Cannot load program crsctl.bin because of the following errors:
0509-130 Symbol resolution failed for crsctl.bin because:
0509-136 Symbol __ct__Q2_3std8_LocinfoFPCci (number 176) is not exported from
dependent module /usr/lib/libC.a[ansi_64.o].
0509-192 Examine .loader section symbols with the
'dump -Tv' command.


三、CRS升级
本次crs升级采用Non Rolling Upgrade,即采用2个节点同时升级的方式。

1、执行slibclean
注意双节点root用户执行
/usr/sbin/slibclean

2、升级crs
根据图形界面指示依次next。
升级至最后,图形界面显示:
The installer has detected that your Cluster Ready Services (CRS) installation is distributed across the following nodes:

racdb01
racdb02

Because the software consists of local identical copies distributed across each of the nodes in the cluster, it is possible to patch your CRS installation in a rolling manner, one node at a time.

To complete the installation of this patchset, you must perform the following tasks on each node:

1. Log in as the root user.
2. As the root user, perform the following tasks:

a. Shutdown the CRS daemons by issuing the following command:
/oracle/products/crs/bin/crsctl stop crs
b. Run the shell script located at:
/oracle/products/crs/install/root102.sh
This script will automatically start the CRS daemons on the
patched node upon completion.

3. After completing this procedure, proceed to the next node and repeat.
双节点停止crs
oracle.orcl1@racdb01:/home/oracle$crsctl stop crs
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon
双节点执行/root102.sh
oracle.orcl2@racdb02:/oracle/products/crs/install$./root102.sh
Error : Please change the CRS_ORACLE_USER id to have the following OS capabilities :
< CAP_PROPAGATE CAP_BYPASS_RAC_VMM CAP_NUMA_ATTACH >
by running command then 'crsctl stop crs' and 'crsctl start crs'.
The CSS daemon, ocssd.bin, will not run in realtime without this change
Oracle strongly recommends that the CSS daemon run realtime
Creating pre-patch directory for saving pre-patch clusterware files
Completed patching clusterware files to /oracle/products/crs
Relinking some shared libraries.
Relinking of patched files is complete.
WARNING: directory '/oracle/products' is not owned by root
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Adding daemons to inittab
Waiting for the patched CRS daemons to start.
This may take a while on some systems.
.
10204 patch successfully applied.
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 2: racdb02 racdb02_priv racdb02
Creating OCR keys for user 'root', privgrp 'system'..
Operation successful.
clscfg -upgrade completed successfully

四、升级rdbms
1、执行root.sh
升级完成之后双节点root权限执行root.sh,结果显示如下:
oracle.orcl1@racdb01:/oracle/products/rdbms/oui/bin$/oracle/products/rdbms/root.sh
Running Oracle10 root.sh script...

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /oracle/products/rdbms

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.

oracle.orcl2@racdb02:/oracle/products/crs/install$/oracle/products/rdbms/root.sh
Running Oracle10 root.sh script...

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /oracle/products/rdbms

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying dbhome to /usr/local/bin ...
The file "oraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying oraenv to /usr/local/bin ...
The file "coraenv" already exists in /usr/local/bin. Overwrite it? (y/n) [n]: y
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed
2、刷新数据字典
SQL> STARTUP UPGRADE
SQL> SPOOL patch.log
SQL> @?/rdbms/admin/catupgrd.sql
SQL> SPOOL OFF
期间出现如下问题,
SQL> alter session set current_schema="SYS";

Session altered.

SQL>
SQL> REM recompile invalid ORDSYS types
SQL> execute DBMS_SESSION.reset_package;

PL/SQL procedure successfully completed.

SQL> execute utl_recomp.recomp_serial('ORDSYS');
BEGIN utl_recomp.recomp_serial('ORDSYS'); END;

*
ERROR at line 1:
ORA-03113: end-of-file on communication channel


SQL>
SQL> select o.name, o.status, o.type# from sys.obj$ o, sys.user$ u
2 WHERE o.status > 1
3 AND (u.name = 'ORDSYS'
4 OR u.name = 'ORDPLUGINS'
5 OR u.name = 'SI_INFORMTN_SCHEMA')
6 AND u.user# = o.owner#
7 AND rownum < 50;
ERROR:
ORA-03114: not connected to ORACLE

alert日志显示:
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","oracle/OracleStringSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:06 2010
Trace dumping is performing id=[cdmp_20100314020906]
Sun Mar 14 02:09:13 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","oracle/aurora/rdbms/DbmsJavaSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:14 2010
Trace dumping is performing id=[cdmp_20100314020914]
Sun Mar 14 02:09:27 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","oracle/aurora/rdbms/DbmsJavaSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:29 2010
SERVER COMPONENT id=JAVAVM: status=LOADING, version=10.2.0.1.0, timestamp=2010-03-14 02:09:29
Sun Mar 14 02:09:29 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","java/lang/StringSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:29 2010
Trace dumping is performing id=[cdmp_20100314020929]
Sun Mar 14 02:09:30 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","java/lang/StringSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:32 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","java/lang/StringSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:33 2010
Trace dumping is performing id=[cdmp_20100314020933]
Sun Mar 14 02:09:33 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-04031: unable to allocate 4096 bytes of shared memory ("java pool","java/lang/StringSYS","joxlod exec hp","SGAClass")
Sun Mar 14 02:09:35 2010
SERVER COMPONENT id=XML: status=VALID, version=10.2.0.4.0, timestamp=2010-03-14 02:09:35
SERVER COMPONENT id=CATJAVA: status=LOADING, version=10.2.0.1.0, timestamp=2010-03-14 02:09:35

Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:15 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:17 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:20 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:22 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:24 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:27 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:29 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
Sun Mar 14 02:15:32 2010
Errors in file /oracle/admin/orcl/udump/orcl1_ora_790752.trc:
ORA-00600: internal error code, arguments: [26599], [1], [190], [], [], [], [], []
经诊断,由Oracle 10.2.0.1 SGA自动分配内存bug所致,解决办法将sga_target=0,java_pool_size=300M。

3、执行changePerm.sh
双节点执行changePerm.sh
oracle.orcl2@racdb02:/oracle/products/rdbms/install$./changePerm.sh

-------------------------------------------------------------------------------
Disclaimer: The purpose of this script is to relax permissions on some of the
files in the database Oracle Home so that all clients can access them.
Please note that Oracle Corporation recommends using the most restrictive file
permissions as possible for your given implementation. Running this script
should be done only after considering all security ramifications.
-------------------------------------------------------------------------------

-n Do you wish to continue (y/n) [n]:
y
Spooling the error log /tmp/changePerm_err.log...
Finished running the script successfully
4、升级OEM
oracle.orcl1@racdb01:/oracle/products/rdbms/install$emca -upgrade db -cluster

STARTED EMCA at Mar 14, 2010 3:55:10 AM
EM Configuration Assistant, Version 10.2.0.1.0 Production
Copyright (c) 2003, 2005, Oracle. All rights reserved.

Enter the following information:
ORACLE_HOME for the database to be upgraded: /oracle/products/rdbms
Database unique name: orcl
Listener port number: 1521

Do you wish to continue? [yes(Y)/no(N)]: Y
Mar 14, 2010 3:55:58 AM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /oracle/products/rdbms/cfgtoollogs/emca/orcl/emca_2010-03-14_03-55-10-AM.log.
Mar 14, 2010 3:56:00 AM oracle.sysman.emcp.util.DBControlUtil stopOMS
INFO: Stopping Database Control (this may take a while) ...
Mar 14, 2010 3:56:11 AM oracle.sysman.emcp.EMDBCConfig instantiateOC4JConfigFiles
INFO: Propagating /oracle/products/rdbms/oc4j/j2ee/OC4J_DBConsole_racdb01_orcl1 to remote nodes ...
Mar 14, 2010 3:56:13 AM oracle.sysman.emcp.EMDBCConfig instantiateOC4JConfigFiles
INFO: Propagating /oracle/products/rdbms/oc4j/j2ee/OC4J_DBConsole_racdb02_orcl2 to remote nodes ...
Mar 14, 2010 3:57:50 AM oracle.sysman.emcp.util.DBControlUtil secureDBConsole
INFO: Securing Database Control (this may take a while) ...
Mar 14, 2010 3:59:08 AM oracle.sysman.emcp.util.DBControlUtil startOMS
INFO: Starting Database Control (this may take a while) ...
Mar 14, 2010 4:00:55 AM oracle.sysman.emcp.EMDBPostConfig performUpgrade
INFO: Database Control started successfully
Mar 14, 2010 4:00:55 AM oracle.sysman.emcp.EMDBPostConfig performUpgrade
INFO: >>>>>>>>>>> The Database Control URL is https://racdb01:1158/em <<<<<<<<<<<
Mar 14, 2010 4:00:55 AM oracle.sysman.emcp.EMDBPostConfig showClusterDBCAgentMessage
INFO:
**************** Current Configuration ****************
INSTANCE NODE DBCONTROL_UPLOAD_HOST
---------- ---------- ---------------------

orcl1 racdb01 racdb01
orcl2 racdb02 racdb01
五、操作系统升级数据后续操作步骤
1、编译rdbms
双节点编译rdbms软件
oracle.orcl1@racdb01:/oracle/products/rdbms/bin$relink all
2、编译crs
oracle.orcl1@racdb01:/oracle/products/crs/bin$./genclntsh
3、重新配置oem
主机升级完成后oem后台日志显示
2010-03-16 08:28:49 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:28:55 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:29:01 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:29:08 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:29:14 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:29:20 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:29:26 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
2010-03-16 08:29:32 Thread-1 WARN http: snmehl_connect: connect failed to (racdb01:3938): A remote host refused an attempted connect operation. (error = 79)
经诊断是由于主机升级之后时区变化所致
1)停止dbconsole
oracle.orcl1@racdb01:/home/oracle$emctl stop dbconsole
Oracle Enterprise Manager 10g Database Control Release 10.2.0.4.0
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
https://racdb01:1158/em/console/aboutApplication
Stopping Oracle Enterprise Manager 10g Database Control ...
... Stopped.
2)重新设置时区
oracle.orcl1@racdb01:/home/oracle$export TZ=BEIST-8
oracle.orcl1@racdb01:/home/oracle$emctl resetTZ agent
Oracle Enterprise Manager 10g Database Control Release 10.2.0.4.0
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
Updating /oracle/products/rdbms/racdb01_orcl1/sysman/config/emd.properties...
Time zone set to +08:00.

To complete this process, you must either:

connect to the database served by this DBConsole as user 'sysman', and execute:

SQL> exec mgmt_target.set_agent_tzrgn('racdb01:3938','+08:00')

-- or --

connect to the database served by this DBConsole as user 'sys', and execute:

SQL> alter session set current_schema = SYSMAN;
SQL> exec mgmt_target.set_agent_tzrgn('racdb01:3938','+08:00')

oracle.orcl1@racdb01:/home/oracle$sqlplus "/as sysdba"

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Mar 16 08:22:50 2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> alter session set current_schema = SYSMAN;

Session altered.

SQL> exec mgmt_target.set_agent_tzrgn('racdb01:3938','+08:00')

PL/SQL procedure successfully completed.

SQL> exit
3)启动dbconsole
oracle.orcl1@racdb01:/home/oracle$emctl start dbconsole
Oracle Enterprise Manager 10g Database Control Release 10.2.0.4.0
Copyright (c) 1996, 2007 Oracle Corporation. All rights reserved.
https://racdb01:1158/em/console/aboutApplication
Starting Oracle Enterprise Manager 10g Database Control .................. started.
------------------------------------------------------------------
Logs are generated in directory /oracle/products/rdbms/racdb01_orcl1/sysman/log

你可能感兴趣的:(ORACLE管理)