DB Name |
DB Id |
Instance |
Inst num |
Startup Time |
Release |
RAC |
ORCL |
4567890123 |
ORCL1 |
1 |
07-Jan-15 19:01 |
11.1.0.6.0 |
YES |
Host Name |
Platform |
CPUs |
Cores |
Sockets |
Memory (GB) |
orcl1 |
AIX-Based Systems (64-bit) |
16 |
8 |
|
61.50 |
|
Snap Id |
Snap Time |
Sessions |
Cursors/Session |
Begin Snap: |
41170 |
06-Mar-15 09:58:35 |
205 |
17.2 |
End Snap: |
41171 |
06-Mar-15 10:58:26 |
194 |
18.3 |
Elapsed: |
|
59.85 (mins) |
|
|
DB Time: |
|
644.49 (mins) |
|
|
查看CPU个数:
$ bindprocessor -q
The available processors are: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
在此快照间隔中, 总共约 60 分钟, cpu 就共有 60*16=960 分钟, DBtime为 644.49 分钟,则:
cpu 花费了 644.49 分钟在处理 Oralce 非空闲等待和运算上(比方逻辑读)
也就是说 cpu 有 644.49 /960*100%=67% 花费在处理 Oracle 的操作上,这还不包括后台进程
Top5 Timed Foreground Events
Event |
Waits |
Time(s) |
Avg wait (ms) |
% DB time |
Wait Class |
cursor: pin S wait on X |
3,579,491 |
35,037 |
10 |
90.61 |
Concurrency |
library cache pin |
259 |
3,508 |
13543 |
9.07 |
Concurrency |
DB CPU |
|
108 |
|
0.28 |
|
DFS lock handle |
13,879 |
6 |
0 |
0.02 |
Other |
SQL*Net message from dblink |
6,283 |
3 |
0 |
0.01 |
Network |
在这些等待事件(除了db cpu)中, 等待次数最高的是cursor: pin S wait on X
生成一下这个一个小时内的ash报告进一步分析
DB Name |
DB Id |
Instance |
Inst num |
Release |
RAC |
Host |
ORCL |
4567890123 |
ORCL1 |
1 |
11.1.0.6.0 |
YES |
orcl1 |
CPUs |
SGA Size |
Buffer Cache |
Shared Pool |
ASH Buffer Size |
16 |
25,231M (100%) |
5,888M (23.3%) |
10,942M (43.4%) |
M (%) |
|
Sample Time |
Data Source |
Analysis Begin Time: |
06-Mar-15 09:58:35 |
DBA_HIST_ACTIVE_SESS_HISTORY |
Analysis End Time: |
06-Mar-15 10:58:35 |
DBA_HIST_ACTIVE_SESS_HISTORY |
Elapsed Time: |
60.0 (mins) |
|
Sample Count: |
4,699 |
|
Average Active Sessions: |
13.05 |
|
Avg. Active Session per CPU: |
0.82 |
|
Report Target: |
None specified |
|
Blocking Sid |
% Activity |
Event Caused |
% Event |
User |
Program |
# Samples Active |
XIDs |
2069, 3099 |
76.21 |
cursor: pin S wait on X |
76.21 |
USR_ABCD |
dcba@orcl1 (TNS V1-V3) |
901/360 [250%] |
0 |
由此可以看出cursor:pin S wait on X 是由 “2069,3099”这个session 导致的。
Sid, Serial# |
% Activity |
Event |
% Event |
User |
Program |
# Samples Active |
XIDs |
1969,22749 |
7.64 |
cursor: pin S wait on X |
7.62 |
USR_ABCD |
dcba@orcl1 (TNS V1-V3) |
358/360 [ 99%] |
0 |
1978, 6440 |
7.64 |
cursor: pin S wait on X |
7.62 |
USR_ABCD |
dcba@orcl1 (TNS V1-V3) |
358/360 [ 99%] |
0 |
1989,17438 |
7.64 |
cursor: pin S wait on X |
7.64 |
USR_ABCD |
dcba@orcl1 (TNS V1-V3) |
359/360 [100%] |
0 |
2004, 8866 |
7.64 |
cursor: pin S wait on X |
7.60 |
USR_ABCD |
dcba@orcl1 (TNS V1-V3) |
357/360 [ 99%] |
0 |
2069, 3099 |
7.64 |
library cache pin |
7.64 |
USR_ABCD |
dcba@orcl1 (TNS V1-V3) |
359/360 [100%] |
0 |
上面红色部分是“2069,3099”这个session 的等待事件
SQL Id |
SQL Text |
dn00zn4qjg469 |
SELECT * FROM xx WHERE XX; |
分析到这里,就需要到ORCL1 实例上(不是到ORCL2实例上)执行select * from v$session where sid='2069' and serial#='3099'去看看这个session 是哪个主机发出的。
SQL> set line 300
SQL> select sid,machine,program,status,event from v$session wheresid='2069' and serial#='3099';
SID MACHINE PROGRAM STATUS EVENT
---------- ------------- ------------------------- -------- ----------------------
2069 orcl1 dcba@orcl1 (TNS V1-V3) ACTIVE library cache pin
其实,cursor: pin S wait on X 也好,library cache pin 也罢,这都是shared pool 的等待事件。
可以通过下面的sql语句查看这个session正在运行的这个sql--dn00zn4qjg469 是啥样的sql语句
SQL> select sql_text from v$sql where sql_Id='dn00zn4qjg469';
此时咨询了以下应用软件的工程师,了解一下那个sql语句是干啥用的,结果工程师回应如果是查询语句,把它切掉就行
但是kill session是有原则的,应该先kill os pid,然后再去kill session
每个session都在 db server 上有一个操作系统进程
那么现在就需要通过session去找os pid
SQL> select p.spid from v$session s,v$process p wheres.paddr=p.addr and s.sid='2069';
SPID
------------------------
345054
查出来os pid是345054,然后在ps -ef | grep 345054
进一步确认 345054是不是 oracle数据库自己的进程
$ ps -ef | grep 345054
oracle 303344 406340 0 17:13:47 pts/1 0:00 grep 345054
arcsde 345054 340258 0 Feb 09 - 0:50 oracleORCL1(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
LOCAL=YES的进程,是由在数据库服务器上的进程(典型的就是一些业务程序)与oracle 数据库交互后,oracle 数据库产生的进程。
再 ps -ef | grep 340258
340258 是 345054的父进程。
$ ps -ef | grep 340258
arcsde 340258 463126 0 Feb 09 - 0:01 /u01/arfgd/sdeee/bin/gsrvresri_sde WIN-L41
arcsde 345054 340258 0 Feb 09 - 0:50 oracleORCL1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 484124 406340 0 17:20:30 pts/1 0:00 grep 340258
红色处就是应用程序开发商在数据库服务器上部署的业务程序
开始kill os pid: kill -9 345054
然后再kill session: alter system kill session'2069,3099';
$ kill -9 345054
$ ps -ef | grep 345054
oracle 303264 406340 0 17:27:59 pts/1 0:00 grep 345054
$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.6.0 - Production on Fri Mar 6 17:28:07 2015
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bitProduction
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> alter system kill session '2069,3099';
alter system kill session '2069,3099'
*
ERROR at line 1:
ORA-00031: session marked for kill
kill session的时候仅仅是将会话杀掉。在有些时候,由于较大的事务或需要运行较长的SQL语句将导致需要kill的session并不能立即杀掉。对于这种情况将收到 "marked for kill"提示(如上),一旦会话当前事务或操作完成,该会话被立即杀掉。
查看一下那个进程,结果还在
SQL> select sid,machine,program,status,event from v$session wheresid='2069' and serial#='3099';
SID MACHINE PROGRAM STATUS EVENT
---------- -------------- --------------------------- -------- ------------------------
2069 orcl1 abcd@orcl1 (TNS V1-V3) KILLED library cache pin
等待一会再查询
SQL> select sid,machine,program,status,event from v$session wheresid='2069' and serial#='3099';
no rows selected
ok,整理了这个文档仅供参考