gc buffer busy acquire和gc buffer busy release原理及案例

         昨天正式环境上出现数据库CPU 100%的问题,数据库是128个CPU,128G内存,power系列,非常强劲,十几万的逻辑读只需要1s。出现问题之后,调整两条负载最高的两条SQL,问题解决,但有两个cluster类别的等待事件第 一次见,在metlink中找了一篇文章:

共享:RAC等待事件:gc buffer busy acquire

概述

---------------------
         gc buffer busy是RAC数据库中常见的等待事件,11g开始gc buffer  busy分为gc buffer busy acquire和gc buffer  busy release:
         gc buffer busy acquire是当session#1尝试请求访问远程实例(remote  instance) buffer,但是在session#1之前已经有相同实例上另外一个session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy acquire。
        gc buffer busy release是在session#1之前已经有远程实例的session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy release。
原因/解决方法
---------------------
- 热点块(hot block)
在AWR中Segments by Global Cache Buffer Busy 记录了访问频繁的gc buffer.
解决方法可以根据热点块的类型采取不同的解决方法,比如采取分区表,分区索引,反向index等等。这点与单机数据库中的buffer busy waits类似。


- 低效SQL语句
低效SQL语句会导致不必要的buffer被请求访问,增加了buffer busy的机会。在AWR中可以找到TOP SQL。解决方法可以优化SQL语句减少buffer访问。这点与单机数据库中的buffer busy waits类似。


- 数据交叉访问
RAC数据库,同一数据在不同数据库实例上被请求访问。
如果应用程序可以实现,那么我们建议不同的应用功能/模块数据分布在不同的数据库实例上被访问,避免同一数据被多个实例交叉访问,可以减少buffer的争用,避免gc等待。


- Oracle bug
建议安装Oracle推荐的最新Patch Set和PSU。
Patch set和PSU信息请参考:Oracle Recommended Patches -- Oracle Database (Doc ID 756671.1)


当时数据库的负载,已经有几千个session堵塞了。

Snap Id Snap Time Sessions Cursors/Session Instances
Begin Snap: 4531 01-6月 -15 09:00:06 1137 106.2 2
End Snap: 4533 01-6月 -15 11:00:05 2728 42.4 2
Elapsed:   119.98 (mins)      
DB Time:   72,409.59 (mins)      

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
gc buffer busy acquire 1,164,631 2324.3K 1996 53.5 Cluster
DB CPU   200.7K   4.6  
enq: TX - row lock contention 21,390 179K 8368 4.1 Application
rdbms ipc reply 9,488,345 162.2K 17 3.7 Other
buffer busy waits 3,899 94.5K 24243 2.2 Concurrency
gc buffer busy release 2,288 62.3K 27219 1.4 Cluster
log file sync 96,502 50.9K 528 1.2 Commit
latch: row cache objects 185,628 47.8K 258 1.1 Concurrency
gc current request 20 40.3K 2.0E+06 .9 Cluster
db file sequential read 4,074,919 30.7K 8 .7 User I/O

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • %Total - Elapsed Time as a percentage of Total DB time
  • %CPU - CPU Time as a percentage of Elapsed Time
  • %IO - User I/O Time as a percentage of Elapsed Time
  • Captured SQL account for 84.5% of Total DB Time (s): 4,344,576
  • Captured PL/SQL account for 0.2% of Total DB Time (s): 4,344,576

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
2,295,852.93 1,559 1,472.64 52.84 0.05 0.04 9baga95gnc3qc JDBC Thin Client UPDATE BENCH_DONE T SET T....
546,544.31 3,694 147.95 12.58 16.40 0.01 8fcx0qg5xa5wy JDBC Thin Client SELECT ID, PLAN_STAT...
234,360.72 1,715 136.65 5.39 0.00 0.00 8w406h1z76j1d JDBC Thin Client UPDATE BENCH_DONE T SET T....
216,796.09 895 242.23 4.99 16.60 0.02 azj8hg39rjqyq JDBC Thin Client SELECT COUNT(1) ALL_COUNT, NVL...

SQL ordered by Gets

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • %Total - Buffer Gets as a percentage of Total Buffer Gets
  • %CPU - CPU Time as a percentage of Elapsed Time
  • %IO - User I/O Time as a percentage of Elapsed Time
  • Total Buffer Gets: 26,785,040,521
  • Captured SQL account for 47.2% of Total
Buffer Gets Executions Gets per Exec %Total Elapsed Time (s) %CPU %IO SQL Id SQL Module SQL Text
7,426,092,420 3,694 2,010,311.97 27.72 546,544.31 16.4 0 8fcx0qg5xa5wy JDBC Thin Client SELECT ID, PLAN_STAT...
2,468,792,930 895 2,758,427.85 9.22 216,796.09 16.6 0 azj8hg39rjqyq JDBC Thin Client SELECT COUNT(1) ALL_COUNT, NVL...
230,823,164 3,586 64,367.87 0.86 11,911.51 12.8 .1 16mmtrx7rw0fb JDBC Thin Client select dutylogvos0_.M_DUTY_REC...
228,243,875 1,559 146,404.03 0.85 2,295,852.93 .1 0 9baga95gnc3qc JDBC Thin Client UPDATE BENCH_DONE T SET T....

总结:本次事件主要是低效SQL引起,逻辑读较高,执行频率又高。9baga95gnc3qc这条SQL在上周的表现是10万逻辑读,本周涨到14万,在此对此数据库打一个基线,一个节点,2小时执行1千次,逻辑读超过10万,则需要优化。

你可能感兴趣的:(gc buffer busy acquire和gc buffer busy release原理及案例)