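A colleague was running a SQL statement in Toad when the wireless network suddenly dropped, and asked me to check what had happened to the session. The details are shown below (some values are replaced with xxx). Because the work involved historical archive data and was done under a dedicated user, the following SQL is enough to find the corresponding session: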
SQL> SELECT B.USERNAME ,
2 B.SID ,
3 B.SERIAL# ,
4 LOGON_TIME ,
5 A.OBJECT_ID
6 FROM V$LOCKED_OBJECT A, V$SESSION B
7 WHERE A.SESSION_ID = B.SID AND B.USERNAME=&USERNAME
8 ORDER BY B.LOGON_TIME;
USERNAME SID SERIAL# LOGON_TIM OBJECT_ID
------------------------------ ---------- ---------- --------- ----------
xxxxxx 523 41890 06-MAY-16 825891
xxxxxx 523 41890 06-MAY-16 825892
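After I ran the kill session statement, a check showed that the session still existed, but its SERIAL# had changed. Trying to kill the session again raised ORA-00030, as shown below: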
SQL> alter system kill session '523, 41890' immediate;
System altered.
SQL> SELECT A.ORACLE_USERNAME ,
2 A.OS_USER_NAME ,
3 B.OWNER ,
4 B.OBJECT_NAME ,
5 A.SESSION_ID ,
6 A.PROCESS ,
7 A.LOCKED_MODE
8 FROM V$LOCKED_OBJECT A, DBA_OBJECTS B
9 WHERE B.OBJECT_ID = A.OBJECT_ID AND B.OWNER=&OWNER
10 ORDER BY A.ORACLE_USERNAME,
11 A.OS_USER_NAME;
ORACLE_USERNAME OS_USER_NAME OWNER OBJECT_NAME SESSION_ID PROCESS LOCKED_MODE
---------------- ------------- ----------- ----------------- ---------------------- -------------
xxxxxxxxxxxxxxx ZhanxxxnL xxxxxxxxxxxx INV_xxxx_HD 523 6208:7548 3
xxxxxxxxxxxxxxx ZhanxxxxL xxxxxxxxxxxx INV_xxxx_LINES 523 6208:7548 3
SQL> SELECT B.USERNAME ,
2 B.SID ,
3 B.SERIAL# ,
4 LOGON_TIME ,
5 A.OBJECT_ID
6 FROM V$LOCKED_OBJECT A, V$SESSION B
7 WHERE A.SESSION_ID = B.SID AND B.USERNAME=&USERNAME
8 ORDER BY B.LOGON_TIME;
USERNAME SID SERIAL# LOGON_TIM OBJECT_ID
------------------------------ ---------- ---------- --------- ----------
xxxxxxxxxxxxxx 523 41891 06-MAY-16 825892
xxxxxxxxxxxxxx 523 41891 06-MAY-16 825891
SQL> alter system kill session '523, 41891' immediate;
alter system kill session '523, 41891' immediate
*
ERROR at line 1:
ORA-00030: User session ID does not exist.
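Before retrying the kill, it can help to check whether Oracle has already marked the session as killed. The following is a minimal sketch, assuming the standard STATUS column of V$SESSION, which the original transcript does not query. A value of KILLED would be consistent with the changing SERIAL# and the ORA-00030 error above.

```sql
-- Sketch: inspect the state of session 523 (SID taken from the transcript above).
-- STATUS = 'KILLED' means the session is already marked for termination
-- and is waiting for cleanup, so repeating ALTER SYSTEM KILL SESSION adds nothing.
SELECT sid,
       serial#,
       status,
       logon_time
  FROM v$session
 WHERE sid = 523;
```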
I looked up the description, cause, and solution for ORA-00030 with oerr and on Metalink, as shown below:
SQL> ho oerr ora 30
00030, 00000, "User session ID does not exist."
// *Cause: The user session ID no longer exists, probably because the
// session was logged out.
// *Action: Use a valid session ID.
The command may have been issued for one or more of the following reasons:
1. The process no longer exists at the os level, but does show up as active in v$session.
2. The user reboots the client machine without logging off, leaving a shadow process.
3. That session is holding onto a lock that needs to be released.
CAUSE
This error occurs because PMON is already trying to kill the session.
This is indicated by the fact that the serial number keeps changing.
When PMON attempts to cleanup a dead session, it will increase the serial number.
PMON may take a long time to clean up the process. If the process was doing a very large transaction at the time it aborted, then PMON has to rollback the large transaction.
When PMON makes progress, i.e. if it manages to free at least some of the process's resource, it will repeatedly keep trying to delete the process. When it finally gets to the point where it can't free up any of the process's resource (i.e. there are no more free buffers), it will print a message to the trace file and try to delete that process a second time.
The problem is encountered when PMON lacks the resources needed to remove the process. If there are not enough buffers, then the removal of the process is delayed. This is a free buffer problem in the data cache.
SOLUTION
Encountering an ORA-30 when attempting to manually kill a process is not necessarily a bug but a result of trying to kill a process already marked as killed.
PMON can take anywhere from 5 minutes to over 24 hours to clean up a job. The impact is that often the process being cleaned up is holding locks that prevents others from performing certain operations.
The solution is to wait for PMON to clean up the process.
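Since the note attributes the delay to rolling back a large transaction, one way to judge whether the wait is making progress is to watch the undo usage of the session's transaction. This is a sketch and not part of the original troubleshooting: it assumes the killed session is still linked to its transaction through V$SESSION.TADDR, and it uses SID 523 from the transcript. If USED_UBLK and USED_UREC shrink between runs, the rollback is presumably advancing.

```sql
-- Sketch: run repeatedly and compare the values between runs.
-- USED_UBLK = undo blocks in use, USED_UREC = undo records in use.
SELECT s.sid,
       s.serial#,
       s.status,
       t.used_ublk,
       t.used_urec
  FROM v$session s
  JOIN v$transaction t
    ON t.addr = s.taddr
 WHERE s.sid = 523;
```

If the query returns no rows, the session has no active transaction attached, and this check says nothing about cleanup progress.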
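So essentially the only option is to wait for PMON to clean up the process. After waiting ten minutes or so, the session was still not cleaned up, so I checked what the session was doing. I also found material online saying that the session can be killed at the operating system level.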
SQL> select event from v$session_wait where sid=523;
EVENT
----------------------------------------------------------------
db file sequential read
SQL> select sql_text from v$session a,v$sqltext_with_newlines b
2 where decode(a.sql_hash_value, 0, prev_hash_value, sql_hash_value)=b.hash_value
3 and a.sid=&sid order by piece;
Enter value for sid: 523
old 3: and a.sid=&sid order by piece
new 3: and a.sid=523 order by piece
SQL_TEXT
----------------------------------------------------------------
DELETE from inv_xxx_lines WHERE (xxx) IN ( SELECT tr
ans_line_id FROM xxxx GROUP BY trans_line_id HAVING C
OUNT(xxxxx) > 1) AND ROWID NOT IN (SELECT MIN(ROWID) FRO
M xxxx GROUP BY xxx HAVING COUNT(*) > 1)
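So I tried killing the corresponding operating system process. Afterwards, a check confirmed that the session was gone. I cannot tell whether PMON happened to finish cleaning up the session at that moment or whether killing the OS process really did the job, because I cannot reproduce the scenario. If I run into it again, I can test and verify. I am recording it here for future reference.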
SQL> ! kill -9 4884
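The transcript does not show how OS process ID 4884 was obtained. A common approach, given here as an assumption rather than the author's actual method, is to join V$SESSION to V$PROCESS and read SPID before issuing the kill. For a session already marked KILLED, PADDR may no longer point at the original server process. On versions that have the CREATOR_ADDR column in V$SESSION (the database version is not stated in the source), joining on that column is an alternative.

```sql
-- Sketch: map session 523 to its server process; SPID is the OS process ID.
SELECT s.sid,
       s.serial#,
       s.status,
       p.spid
  FROM v$session s
  JOIN v$process p
    ON p.addr = s.paddr
 WHERE s.sid = 523;
-- Alternative for an already killed session (assumes CREATOR_ADDR exists):
SELECT p.spid
  FROM v$session s
  JOIN v$process p
    ON p.addr = s.creator_addr
 WHERE s.sid = 523;
```

Both queries assume a dedicated server connection, where one OS process serves exactly one session. The SPID should be confirmed as belonging to that user session's server process, not to a background process such as PMON, before it is passed to kill.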
References:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=533785808734847&id=1011386.6&_afrWindowMode=0&_adf.ctrl-state=13ipo04jjr_4
http://www.linuxidc.com/Linux/2011-09/43730.htm