Response time = Service time + Wait time
| | Snap Id | Snap Time | Sessions | Cursors/Session |
|---|---|---|---|---|
| Begin Snap: | 9820 | 13-Feb-12 08:00:17 | 735 | 89.3 |
| End Snap: | 9821 | 13-Feb-12 09:00:13 | 965 | 92.3 |
| Elapsed: | | 59.93 (mins) | | |
| DB Time: | | 437.53 (mins) | | |
| | Snap Id | Snap Time | Sessions | Cursors/Session |
|---|---|---|---|---|
| Begin Snap: | 571 | 31-May-10 08:00:16 | 472 | 56.9 |
| End Snap: | 572 | 31-May-10 09:00:17 | 307 | 50.1 |
| Elapsed: | | 60.02 (mins) | | |
| DB Time: | | 1,139.42 (mins) | | |
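DB Time and Elapsed in these headers are usually read together: DB Time divided by Elapsed gives the average number of sessions that were active in the database during the snapshot window. The sketch below only illustrates that arithmetic; `average_active_sessions` is a hypothetical helper name, not an Oracle API, and the inputs are the minutes shown in the two headers above.

```python
def average_active_sessions(db_time_min: float, elapsed_min: float) -> float:
    """DB Time / Elapsed: the average number of sessions busy in the database."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    return db_time_min / elapsed_min


# (DB Time, Elapsed) in minutes, copied from the two AWR headers above.
reports = {
    "snap 9820-9821": (437.53, 59.93),
    "snap 571-572": (1139.42, 60.02),
}

aas = {name: average_active_sessions(db, el) for name, (db, el) in reports.items()}
for name, value in aas.items():
    print(f"{name}: {value:.2f} sessions active on average")
```

For the first report this gives roughly 7.3 active sessions, for the second roughly 19. Whether that is heavy depends on how many CPUs the host has, which these headers do not show.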
CPU utilization = CPU time / (Elapsed time * CPU cores)
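The formula can be applied directly. In this sketch the function name is hypothetical; the CPU time (17,217 s) and NUM_CPU_CORES (8) are taken from the Top 5 and Operating System Statistics tables that appear later in this article, and the 60-minute elapsed time is an assumption because that report's header is not shown.

```python
def cpu_utilization(cpu_time_s: float, elapsed_s: float, cpu_cores: int) -> float:
    """CPU time / (Elapsed time * CPU cores), as a fraction between 0 and 1."""
    if elapsed_s <= 0 or cpu_cores <= 0:
        raise ValueError("elapsed time and core count must be positive")
    return cpu_time_s / (elapsed_s * cpu_cores)


# Assumed inputs: CPU time and NUM_CPU_CORES come from tables later in this
# article; the 60-minute elapsed time is a guess because that header is not shown.
util = cpu_utilization(cpu_time_s=17_217, elapsed_s=60 * 60, cpu_cores=8)
print(f"CPU utilization: {util:.1%}")
```

Dividing by NUM_CPUS (16 logical CPUs in that report) instead of cores would halve the result; the formula above uses cores.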
| Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
|---|---|---|---|---|---|
| latch: cache buffers chains | 76,560 | 33,283 | 435 | 48.7 | Concurrency |
| CPU time | | 23,129 | | 33.8 | |
| log file sync | 28,057 | 2,754 | 98 | 4.0 | Commit |
| db file sequential read | 312,477 | 2,169 | 7 | 3.2 | User I/O |
| log file parallel write | 24,580 | 633 | 26 | .9 | System I/O |
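Response time = Service time + Wait time can be applied to the table above: CPU time is the service component and the wait events make up the wait component, while each Avg Wait(ms) value is simply Time(s) × 1000 / Waits. This sketch uses only the rows shown, so it approximates rather than reproduces the full breakdown. Leaving log file parallel write (a wait taken by the LGWR background process) out of the foreground sum is a modeling assumption, and the variable names are illustrative.

```python
# event -> (Waits, Time(s)), copied from the table above.
wait_events = {
    "latch: cache buffers chains": (76_560, 33_283),
    "log file sync": (28_057, 2_754),
    "db file sequential read": (312_477, 2_169),
    "log file parallel write": (24_580, 633),
}
cpu_time_s = 23_129  # the "CPU time" row: the service-time component

# LGWR, not user sessions, waits on this event; excluding it is a modeling choice.
BACKGROUND_EVENTS = {"log file parallel write"}

# Avg Wait(ms) = Time(s) * 1000 / Waits; the table shows these values rounded.
avg_wait_ms = {event: secs * 1000 / n for event, (n, secs) in wait_events.items()}

# Response time = Service time + Wait time, over the foreground rows shown only.
wait_time_s = sum(
    secs for event, (_, secs) in wait_events.items() if event not in BACKGROUND_EVENTS
)
response_time_s = cpu_time_s + wait_time_s

for event, ms in avg_wait_ms.items():
    print(f"{event:30s} {ms:8.1f} ms")
print(f"service {cpu_time_s} s + wait {wait_time_s} s = {response_time_s} s")
print(f"wait share: {wait_time_s / response_time_s:.1%}")
```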
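In AWR, Time Model Statistics answers questions such as "How much time did foreground processes actually consume?" and "How much time was spent parsing statements?". Let's look at an example: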
| Statistic Name | Time (s) | % of DB Time |
|---|---|---|
| connection management call elapsed time | 21,708.46 | 15.22 |
| sql execute elapsed time | 21,596.44 | 15.14 |
| parse time elapsed | 17,648.50 | 12.37 |
| hard parse elapsed time | 9,081.89 | 6.37 |
| hard parse (sharing criteria) elapsed time | 4,158.48 | 2.92 |
| hard parse (bind mismatch) elapsed time | 3,189.62 | 2.24 |
| PL/SQL execution elapsed time | 2,362.21 | 1.66 |
| DB CPU | 2,311.28 | 1.62 |
| PL/SQL compilation elapsed time | 145.25 | 0.10 |
| failed parse elapsed time | 89.78 | 0.06 |
| sequence load elapsed time | 69.16 | 0.05 |
| repeated bind elapsed time | 0.21 | 0.00 |
| DB time | 142,622.31 | |
| background elapsed time | 4,871.65 | |
| background cpu time | 22.41 | |
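Each % of DB Time value above is that statistic's time divided by DB time (142,622.31 s). Because the statistics nest inside each other (hard parse time is part of parse time, for example), the percentages are not meant to add up to 100. The sketch below recomputes a few of them and applies the 90% floor for sql execute elapsed time that the author gives in the paragraph that follows; the constant and variable names are illustrative.

```python
DB_TIME_S = 142_622.31
SQL_EXECUTE_FLOOR_PCT = 90.0  # the author's rule of thumb for a healthy system

# A few rows from the Time Model Statistics table above, in seconds.
time_model = {
    "sql execute elapsed time": 21_596.44,
    "connection management call elapsed time": 21_708.46,
    "parse time elapsed": 17_648.50,
    "hard parse elapsed time": 9_081.89,
    "DB CPU": 2_311.28,
}

# "% of DB Time" is each statistic's time divided by DB time.
pct_of_db_time = {name: 100 * secs / DB_TIME_S for name, secs in time_model.items()}

for name, pct in pct_of_db_time.items():
    print(f"{name:40s} {pct:6.2f}%")

sql_ok = pct_of_db_time["sql execute elapsed time"] >= SQL_EXECUTE_FLOOR_PCT
print("sql execute share at or above the floor:", sql_ok)
```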
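As we can see, the time actually spent executing SQL (sql execute elapsed time) is only 15.14% of DB time. This ratio is very low: on a reasonably healthy system it should be no lower than 90%, and often higher. Connection management (connection management call elapsed time) takes 15.22%, SQL parsing (parse time elapsed) takes 12.37%, and hard parsing (hard parse elapsed time) alone accounts for 6.37%. Taking these together, we can make some reasonable inferences: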
| Statistic | Total |
|---|---|
| NUM_LCPUS | 0 |
| NUM_VCPUS | 0 |
| AVG_BUSY_TIME | 149,127 |
| AVG_IDLE_TIME | 211,742 |
| AVG_IOWAIT_TIME | 40,260 |
| AVG_SYS_TIME | 14,015 |
| AVG_USER_TIME | 135,004 |
| BUSY_TIME | 2,387,903 |
| IDLE_TIME | 3,389,758 |
| IOWAIT_TIME | 646,085 |
| SYS_TIME | 225,900 |
| USER_TIME | 2,162,003 |
| LOAD | 0 |
| OS_CPU_WAIT_TIME | 2,474,600 |
| RSRC_MGR_CPU_WAIT_TIME | 0 |
| PHYSICAL_MEMORY_BYTES | 49,123,688,448 |
| NUM_CPUS | 16 |
| NUM_CPU_CORES | 8 |
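These counters can be turned into a host-level CPU picture. In V$OSSTAT the *_TIME statistics are in hundredths of a second, so BUSY_TIME / (BUSY_TIME + IDLE_TIME) is the fraction of all available CPU time that was busy, and the per-CPU AVG_BUSY_TIME + AVG_IDLE_TIME roughly equals the snapshot interval. A minimal sketch using the values above; the variable names are illustrative.

```python
# Values from the Operating System Statistics table above.
# V$OSSTAT reports the *_TIME statistics in hundredths of a second.
os_stats = {
    "BUSY_TIME": 2_387_903,
    "IDLE_TIME": 3_389_758,
    "USER_TIME": 2_162_003,
    "SYS_TIME": 225_900,
    "AVG_BUSY_TIME": 149_127,
    "AVG_IDLE_TIME": 211_742,
}

# Fraction of all available CPU time (summed over CPUs) that was busy.
host_busy = os_stats["BUSY_TIME"] / (os_stats["BUSY_TIME"] + os_stats["IDLE_TIME"])
# How much of the busy time was spent in user mode rather than in the kernel.
user_share = os_stats["USER_TIME"] / os_stats["BUSY_TIME"]
# Per-CPU busy + idle time approximates the length of the snapshot interval.
interval_s = (os_stats["AVG_BUSY_TIME"] + os_stats["AVG_IDLE_TIME"]) / 100

print(f"host CPU busy: {host_busy:.1%}")
print(f"user-mode share of busy time: {user_share:.1%}")
print(f"implied interval: {interval_s / 60:.1f} min")
```

With these numbers the host was about 41% busy, about 90% of that in user mode, and the implied interval is about 60 minutes. Here USER_TIME + SYS_TIME equals BUSY_TIME exactly.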
| Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
|---|---|---|---|---|---|
| CPU time | | 17,217 | | 54.5 | |
| db file sequential read | 1,155,423 | 8,422 | 7 | 26.6 | User I/O |
| db file scattered read | 30,541 | 660 | 22 | 2.1 | User I/O |
| gc current block 2-way | 900,865 | 538 | 1 | 1.7 | Cluster |
| gc cr grant 2-way | 467,627 | 244 | 1 | .8 | Cluster |
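For I/O wait events such as the common db file sequential read, the average wait should generally not exceed 10 ms. This rule of thumb comes from the mechanics and rotational speed of today's hard disks: a single ordinary 10,000 RPM disk has a seek time of roughly 7-10 ms. Once storage cache, file system cache, I/O size and similar factors are taken into account, the value is usually between 4 and 7 ms. The figure is only meaningful when the I/O subsystem is running at a normal IOPS level (I/O operations per second, a measure of the storage's concurrent throughput). If the average is well above the rule of thumb, you need to look for the specific cause: the storage cache may not be enabled, a disk in the RAID group may have failed, the RAID level may be unsuitable, and so on. It is also possible that the system's I/O load is simply too high and the storage is overloaded; in that case, analyze it together with the application's I/O volume (physical and logical reads), and in particular the I/O of individual SQL statements.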
| Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
|---|---|---|---|---|---|
| db file sequential read | 561,555 | 37,163 | 66 | 34.73 | User I/O |
| read by other session | 405,945 | 20,251 | 50 | 18.92 | User I/O |
| log file sync | 30,655 | 15,101 | 493 | 14.11 | Commit |
| direct path read | 171,963 | 12,254 | 71 | 11.45 | User I/O |
| DB CPU | | 9,907 | | 9.26 | |
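The 10 ms rule of thumb can be checked mechanically: compute the average single-block read time as Time(s) × 1000 / Waits and compare it with the ceiling. The sketch uses the db file sequential read rows from the two Top 5 tables above (7 ms and 66 ms). The 10 ms constant is the author's rule of thumb, not an Oracle-defined limit, and the helper name is hypothetical.

```python
SEQ_READ_CEILING_MS = 10.0  # the author's rule of thumb, not an Oracle limit


def avg_wait_ms(waits: int, time_s: float) -> float:
    """Average wait in milliseconds, as in the AWR Avg wait (ms) column."""
    return time_s * 1000 / waits


# db file sequential read rows (Waits, Time(s)) from the two tables above.
samples = {
    "7 ms example": (1_155_423, 8_422),
    "66 ms example": (561_555, 37_163),
}

verdicts = {}
for name, (waits, time_s) in samples.items():
    ms = avg_wait_ms(waits, time_s)
    verdicts[name] = ms <= SEQ_READ_CEILING_MS
    label = "within" if verdicts[name] else "ABOVE"
    print(f"{name}: {ms:.1f} ms, {label} the {SEQ_READ_CEILING_MS:.0f} ms rule of thumb")
```

Passing the check only says the average is plausible; as noted above, the figure is meaningful only when the I/O subsystem is at a normal IOPS level.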
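Everything so far has been about time; to close this article, let's look at counts. First, for some latch-related wait events, the number of waits matches the number of sleeps recorded for the latch. This applies to events such as library cache lock/pin and latch: shared pool. For example: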
| Event | Waits | %Time-outs | Total Wait Time (s) | Avg wait (ms) | Waits/txn |
|---|---|---|---|---|---|
| rdbms ipc reply | 1,622 | 0.00 | 1 | 0 | 0.05 |
| latch: cache buffers chains | 3,941 | 0.00 | 1 | 0 | 0.12 |
| Latch Name | Get Requests | Misses | Sleeps | Spin Gets | Sleep1 | Sleep2 | Sleep3 |
|---|---|---|---|---|---|---|---|
| cache buffers chains | 3,686,756,120 | 7,504,592 | 3,941 | 7,505,392 | 0 | 0 | |
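The claim above can be verified from the two tables: the Waits value for latch: cache buffers chains and the Sleeps value for the cache buffers chains latch should be the same number. A minimal sketch with the values copied from those tables; the variable names are illustrative.

```python
# "latch: cache buffers chains" row of the wait-event table above.
event_waits = 3_941
# "cache buffers chains" row of the latch table above.
latch_sleeps = 3_941

# Per the text, each sleep on the latch shows up as one wait on the
# corresponding "latch: ..." event, so the two counters should agree.
counts_match = event_waits == latch_sleeps
print(f"waits={event_waits}, sleeps={latch_sleeps}, match={counts_match}")
```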
| Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
|---|---|---|---|---|---|
| CPU time | | 17,217 | | 54.5 | |
| db file sequential read | 1,155,423 | 8,422 | 7 | 26.6 | User I/O |
| db file scattered read | 30,541 | 660 | 22 | 2.1 | User I/O |
| gc current block 2-way | 900,865 | 538 | 1 | 1.7 | Cluster |
| gc cr grant 2-way | 467,627 | 244 | 1 | .8 | Cluster |
Operating System Statistics

| Statistic | Total |
|---|---|
| NUM_LCPUS | 0 |
| NUM_VCPUS | 0 |
| AVG_BUSY_TIME | 149,127 |
| AVG_IDLE_TIME | 211,742 |
| AVG_IOWAIT_TIME | 40,260 |
| AVG_SYS_TIME | 14,015 |
| AVG_USER_TIME | 135,004 |
| BUSY_TIME | 2,387,903 |
| IDLE_TIME | 3,389,758 |
| IOWAIT_TIME | 646,085 |
| SYS_TIME | 225,900 |
| USER_TIME | 2,162,003 |
| LOAD | 0 |
| OS_CPU_WAIT_TIME | 2,474,600 |
| RSRC_MGR_CPU_WAIT_TIME | 0 |
| PHYSICAL_MEMORY_BYTES | 49,123,688,448 |
| NUM_CPUS | 16 |
| NUM_CPU_CORES | 8 |
| Statistic | Value |
|---|---|
| Avg global enqueue get time (ms) | 2.5 |
| Avg global cache cr block receive time (ms) | 172.2 |
| Avg global cache current block receive time (ms) | 6.6 |
| Avg global cache cr block build time (ms) | 0.0 |
| Avg global cache cr block send time (ms) | 0.0 |
| Global cache log flushes for cr blocks served % | 13.6 |
| Avg global cache cr block flush time (ms) | 0.8 |
| Avg global cache current block pin time (ms) | 0.2 |
| Avg global cache current block send time (ms) | 0.0 |
| Global cache log flushes for current blocks served % | 0.3 |
| Avg global cache current block flush time (ms) | 3.2 |
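Oracle's published figures for these values range from a few ms up to 10 ms, but here Avg global cache cr block receive time (ms) reaches a startling 172.2 ms. This metric is the average time an Oracle instance takes to receive a global CR (consistent-read) block. Such a value shows that hot-block contention between the RAC nodes is severe: you need to spread the I/O at the storage level, or tune the application to reduce the total I/O volume and lower the chance of different nodes accessing the same data blocks. For specific techniques for spreading I/O hot spots, refer to other documents.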
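The same kind of screening can be applied to the global cache statistics: flag every millisecond-valued metric above the 10 ms upper end of the range quoted in the paragraph above. This is a sketch; the threshold constant and variable names are illustrative, and the two percentage rows are left out because they are not times.

```python
GC_CEILING_MS = 10.0  # upper end of the "a few ms to 10 ms" range quoted above

# Millisecond-valued rows from the global cache statistics table above.
gc_ms = {
    "Avg global enqueue get time (ms)": 2.5,
    "Avg global cache cr block receive time (ms)": 172.2,
    "Avg global cache current block receive time (ms)": 6.6,
    "Avg global cache cr block build time (ms)": 0.0,
    "Avg global cache cr block send time (ms)": 0.0,
    "Avg global cache cr block flush time (ms)": 0.8,
    "Avg global cache current block pin time (ms)": 0.2,
    "Avg global cache current block send time (ms)": 0.0,
    "Avg global cache current block flush time (ms)": 3.2,
}

# Keep only the metrics whose average exceeds the ceiling.
outliers = {name: ms for name, ms in gc_ms.items() if ms > GC_CEILING_MS}
for name, ms in outliers.items():
    print(f"{name}: {ms} ms exceeds {GC_CEILING_MS:.0f} ms")
```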