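While checking the backup output scripts for the database today, I found that the RMAN backup had failed.
This post continues with clearing the hung sessions in the database.
After the earlier work, all JOBs in the database are back to normal: no session has been holding a lock for a long time, and the transaction view shows no long-running transactions either: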
SQL> SELECT INSTANCE_NAME FROM V$INSTANCE;
INSTANCE_NAME
----------------
tradedb1
1 row selected.
SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
2 FROM V$LOCK
3 WHERE CTIME > 86400
4 AND CTIME < 864000;
no rows selected
SQL> SELECT ADDR, START_DATE
2 FROM V$TRANSACTION
3 WHERE START_DATE < TRUNC(SYSDATE);
no rows selected
SQL> SELECT SID, JOB, LAST_DATE, THIS_DATE
2 FROM DBA_JOBS_RUNNING;
no rows selected
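The two CTIME bounds in the V$LOCK query are plain seconds, so it helps to see them as days. A minimal sketch of that arithmetic (the helper name `describe` is mine, not part of any Oracle tooling):

```python
from datetime import timedelta

# Thresholds copied from the V$LOCK query above; CTIME is reported in seconds.
LOWER_SECONDS = 86400
UPPER_SECONDS = 864000


def describe(seconds: int) -> str:
    """Render a number of seconds as whole days plus leftover whole hours."""
    delta = timedelta(seconds=seconds)
    hours = delta.seconds // 3600
    return f"{delta.days} d {hours} h"


print(describe(LOWER_SECONDS))  # 1 d 0 h
print(describe(UPPER_SECONDS))  # 10 d 0 h
```

In other words, the query looks for locks that have been in their current mode for more than one day but less than ten.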
Checking the other instance:
SQL> SELECT INSTANCE_NAME FROM V$INSTANCE;
INSTANCE_NAME
----------------
tradedb2
SQL> SELECT SID, TYPE, ID1, ID2, LMODE, REQUEST, CTIME, BLOCK
2 FROM V$LOCK
3 WHERE CTIME > 86400
4 AND CTIME < 864000;
no rows selected
SQL> SELECT ADDR, START_DATE
2 FROM V$TRANSACTION
3 WHERE START_DATE < TRUNC(SYSDATE);
no rows selected
SQL> SELECT SID, JOB, LAST_DATE, THIS_DATE
2 FROM DBA_JOBS_RUNNING;
no rows selected
Although the database now looks healthy, starting RMAN and connecting to the database still fails:
bash-3.00$ rman target /
Recovery Manager: Release 10.2.0.3.0 - Production on Tue May 26 16:54:55 2009
Copyright (c) 1982, 2005, Oracle. All rights reserved.
RMAN-06900: WARNING: unable to generate V$RMAN_STATUS or V$RMAN_OUTPUT row
RMAN-06901: WARNING: disabling update of the V$RMAN_STATUS and V$RMAN_OUTPUT rows
ORACLE error from target database:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-06003: ORACLE error from target database:
ORA-03114: not connected to ORACLE
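The problem is not fully solved, but the earlier steps did help: the JOBs that followed ran again and finished cleanly. Since the resources RMAN needs are still locked, some hung sessions must remain. I checked the sessions, picked out the suspect ones by logon time, and used each session's wait event to decide whether it should be killed from the operating system: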
SQL> SELECT SID, USERNAME, PROGRAM, SERVICE_NAME, LOGON_TIME
2 FROM V$SESSION
3 WHERE LOGON_TIME BETWEEN TO_DATE('2009-5-23', 'YYYY-MM-DD')
4 AND TO_DATE('2009-5-25', 'YYYY-MM-DD')
5 ORDER BY 5;
 SID USERNAME       PROGRAM                       SERVICE_NAME          LOGON_TIME
---- -------------- ----------------------------- --------------------- -------------------
 299 ZHEJIANG                                     SYS$USERS             2009-05-23 17:01:01
 106 ZHEJIANG                                     SYS$USERS             2009-05-23 18:27:00
 145 ZHEJIANG                                     SYS$USERS             2009-05-23 18:27:00
 103 ZHEJIANG                                     SYS$USERS             2009-05-23 18:28:56
 161 ZHEJIANG                                     SYS$USERS             2009-05-23 18:29:04
 281 ZHEJIANG_SELE  oracle@newreport (TNS V1-V3)  tradedb.us.oracle.com 2009-05-24 01:29:04
 121                oracle@ahrac1 (q002)          SYS$BACKGROUND        2009-05-24 03:00:00
 114 GPO                                          SYS$USERS             2009-05-24 22:53:05
8 rows selected.
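The wait-event query that comes next covers only seven of these eight sessions: SID 121, the q002 background process, is left out. The rule behind that choice is not spelled out here, so the sketch below simply assumes "skip anything running under SYS$BACKGROUND". The names `candidate_sids` and `in_list` are made up for illustration.

```python
from typing import List, Tuple

# (SID, SERVICE_NAME) pairs copied from the V$SESSION output above.
SESSIONS: List[Tuple[int, str]] = [
    (299, "SYS$USERS"),
    (106, "SYS$USERS"),
    (145, "SYS$USERS"),
    (103, "SYS$USERS"),
    (161, "SYS$USERS"),
    (281, "tradedb.us.oracle.com"),
    (121, "SYS$BACKGROUND"),
    (114, "SYS$USERS"),
]


def candidate_sids(rows: List[Tuple[int, str]]) -> List[int]:
    """Keep the SIDs of foreground sessions; assumed rule, not stated in the post."""
    return [sid for sid, service in rows if service != "SYS$BACKGROUND"]


def in_list(sids: List[int]) -> str:
    """Format SIDs as a SQL IN (...) list."""
    return "(" + ", ".join(str(sid) for sid in sids) + ")"


print(in_list(candidate_sids(SESSIONS)))  # (299, 106, 145, 103, 161, 281, 114)
```

Killing an Oracle background process can take the whole instance down, which is a good reason to keep such a filter explicit rather than editing the IN list by hand.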
SQL> SELECT SID, EVENT, P1TEXT, P1, P2TEXT, P2, SECONDS_IN_WAIT TIME
2 FROM V$SESSION_WAIT
3 WHERE SID IN (299, 106, 145, 103, 161, 281, 114);
       SID EVENT                          P1TEXT             P1 P2TEXT           P2    TIME
---------- ------------------------------ ---------- ---------- ---------- -------- -------
       103 gc cr request                  file#              31 block#       136123  260432
       106 gc cr request                  file#              31 block#       136123  260549
       114 SQL*Net message from client    driver id  1952673792 #bytes            1    2017
       145 gc cr request                  file#              31 block#       136123  260442
       161 gc cr request                  file#              22 block#       199111  260370
       281 SQL*Net message from client    driver id  1413697536 #bytes            1  234836
       299 gc cr request                  file#              31 block#        22463  260550
7 rows selected.
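Two columns in this output are easier to read after a little arithmetic. SECONDS_IN_WAIT can be turned into days, and the two "driver id" values happen to decode to ASCII text when read as four big-endian bytes. Reading a driver id this way is a common DBA habit rather than something I am quoting from the Oracle documentation, so treat it as an interpretation. Both helper names below are made up for illustration.

```python
def decode_driver_id(p1: int) -> str:
    """Read a 32-bit P1 value as big-endian ASCII bytes, dropping NUL padding."""
    raw = p1.to_bytes(4, byteorder="big")
    return raw.rstrip(b"\x00").decode("ascii")


def wait_in_days(seconds_in_wait: int) -> float:
    """Convert SECONDS_IN_WAIT to days, rounded to two decimals."""
    return round(seconds_in_wait / 86400, 2)


print(decode_driver_id(1952673792))  # tcp
print(decode_driver_id(1413697536))  # TCP
print(wait_in_days(260432))          # 3.01
print(wait_in_days(2017))            # 0.02
```

The gc cr request sessions have all been stuck for roughly three days, while SID 114 has only been idle for about half an hour.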
SQL> SELECT 'kill -9 ' || SPID
2 FROM V$PROCESS
3 WHERE ADDR IN
4 (SELECT PADDR FROM V$SESSION
5 WHERE SID IN (299, 106, 145, 103, 161, 281, 114));
'KILL-9'||SPID
--------------------
kill -9 18558
kill -9 13025
kill -9 13027
kill -9 16230
kill -9 16437
kill -9 2389
kill -9 27
7 rows selected.
SQL> HOST
$ kill -9 18558
$ kill -9 13025
$ kill -9 13027
$ kill -9 16230
$ kill -9 16437
$ kill -9 2389
$ kill -9 27
$ exit
The same steps, run on the other instance:
SQL> SELECT SID, USERNAME, PROGRAM, SERVICE_NAME, LOGON_TIME
2 FROM V$SESSION
3 WHERE LOGON_TIME BETWEEN TO_DATE('2009-05-23', 'YYYY-MM-DD')
4 AND TO_DATE('2009-05-25', 'YYYY-MM-DD')
5 ORDER BY 5
6 ;
       SID USERNAME   PROGRAM                 SERVICE_NAME          LOGON_TIME
---------- ---------- ----------------------- --------------------- -------------------
        99 ZHEJIANG                           SYS$USERS             2009-05-23 17:06:21
       279 ZHEJIANG                           SYS$USERS             2009-05-23 18:21:15
       274 ZHEJIANG                           SYS$USERS             2009-05-23 18:21:32
       315 ZHEJIANG                           SYS$USERS             2009-05-23 18:23:02
SQL> SELECT SID, EVENT, P1TEXT, P1, P2TEXT, P2, SECONDS_IN_WAIT TIME
2 FROM V$SESSION_WAIT
3 WHERE SID IN (99, 279, 274, 315);
       SID EVENT                P1TEXT             P1 P2TEXT                  P2       TIME
---------- -------------------- ---------- ---------- --------------- ---------- ----------
        99 enq: TT - contention name|mode  1414791172 tablespace ID            3     261689
       274 enq: TT - contention name|mode  1414791172 tablespace ID            3     261635
       279 enq: TT - contention name|mode  1414791172 tablespace ID            3     261686
       315 enq: TT - contention name|mode  1414791172 tablespace ID            3     261479
SQL> SELECT COUNT(*) FROM V$SESSION WHERE EVENT = 'enq: TT - contention';
COUNT(*)
----------
129
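The P1 value 1414791172 reported for the enq: TT - contention waits packs two things into one number: the upper two bytes are the ASCII lock type and the lower two bytes are the requested mode. A minimal sketch of unpacking it (the function name `decode_enqueue` is mine; the mode labels follow the LMODE/REQUEST numbering used by V$LOCK):

```python
from typing import Tuple

# Lock mode numbering as used by the LMODE and REQUEST columns of V$LOCK.
LOCK_MODES = {
    0: "none",
    1: "null (NULL)",
    2: "row-S (SS)",
    3: "row-X (SX)",
    4: "share (S)",
    5: "S/Row-X (SSX)",
    6: "exclusive (X)",
}


def decode_enqueue(p1: int) -> Tuple[str, int]:
    """Split a name|mode value into the two-letter lock type and the mode."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode


name, mode = decode_enqueue(1414791172)
print(name, mode, LOCK_MODES[mode])  # TT 4 share (S)
```

So all of these sessions are requesting the TT enqueue in share mode, with P2 pointing at tablespace ID 3.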
SQL> SELECT 'kill -9 ' || SPID
2 FROM V$PROCESS
3 WHERE ADDR IN
4 (SELECT PADDR FROM V$SESSION
5 WHERE SID IN (99, 279, 274, 315));
'KILL-9'||SPID
--------------------
kill -9 5199
kill -9 5638
kill -9 8112
kill -9 2297
SQL> HOST
$ kill -9 5199
$ kill -9 5638
$ kill -9 8112
$ kill -9 2297
$ exit
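Even after killing a large number of hung processes, the database still shows many similar sessions, so clearing hung processes only treats the symptom, not the cause.
So far the RMAN connection problem remains unsolved, and so do the multiple RACGMAIN CHECK processes mentioned earlier. This clearly has to be fixed at its root.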