[20170324] CPU at 100%: analyzing latch free waits


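--//This came up over the past few days in an ITPUB discussion: http://www.itpub.net/thread-2085574-7-1.html
--//The application clearly has a large number of SQL tuning problems.

--//The original poster uploaded AWR reports for 20160926, 20160927, 20170322, 20170323 and 20170324.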

--//20160926
Top 10 Foreground Events by Total Wait Time

Event                 Waits     Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
DB CPU                          71.1K                                  101.3
direct path read      120,510   2518.5                  21             3.6         User I/O

--//The top event is DB CPU (effectively 100% of DB time) and the second is direct path read. The SQL statements are clearly doing a lot of full table scans.

--//20160927:
Top 10 Foreground Events by Total Wait Time

Event                     Waits     Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
DB CPU                              4010                                   108.2
read by other session     35,597    72.3                    2              2.0         User I/O
db file sequential read   36,028    62.9                    2              1.7         User I/O
db file scattered read    29,435    60.8                    2              1.6         User I/O
direct path read          191,418   44.3                    0              1.2         User I/O

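--//With plenty of memory, I/O stays fast even for direct path reads. See my earlier post:
http://blog.itpub.net/267265/viewspace-2134041/ => [20170221]数据文件与文件系统缓存.txt (data files and the file system cache)

--//In other words, the file system cache masks the flaws in the SQL execution plans; once OS memory gets tight, the problem is fully exposed.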
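
--//A quick sanity check of the two direct path read rows above: dividing total wait time by the number of waits gives the latency per wait.
--//This is only a sketch. The helper name avg_wait_ms is my own, and the numbers are copied from the 20160926 and 20160927 excerpts.

```python
def avg_wait_ms(total_wait_sec, waits):
    """Average time per wait in milliseconds (hypothetical helper)."""
    return total_wait_sec * 1000.0 / waits

# direct path read rows copied from the two AWR excerpts above
day_0926 = avg_wait_ms(2518.5, 120510)   # AWR shows Wait Avg(ms) = 21
day_0927 = avg_wait_ms(44.3, 191418)     # AWR shows Wait Avg(ms) = 0 (rounded)
ratio = day_0926 / day_0927

print(f"20160926: {day_0926:.2f} ms per wait")
print(f"20160927: {day_0927:.2f} ms per wait")
print(f"20160926 is roughly {ratio:.0f}x slower per wait")
```

--//A sub-millisecond average for reads that supposedly bypass the buffer cache is what you would expect when the blocks are served from the file system cache rather than from disk.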

--//20170322:
--//This is the report for the period with CPU at 100% and latch free waits:
Top 10 Foreground Events by Total Wait Time

Event                         Waits    Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
DB CPU                                 124.7K                                 66.6
latch free                    4,969    822.1                   165            .4          Other
latch: cache buffers chains   22,886   138.1                   6              .1          Concurrency
db file sequential read       56,230   111                     2              .1          User I/O
log file sync                 35,309   93                      3              .0          Commit

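--//direct path read no longer shows up here; presumably the blocks now come through the buffer cache, which is why latch: cache buffers chains (the CBC latch) appears.
--//The CBC latch waits total only 138.1 seconds (6 ms on average), but latch free already averages 165 ms per wait: the CPUs simply cannot keep up. The I/O-related wait times, by contrast, are tiny.
--//Seen from another angle, this shows that CPU at 100% can sometimes be worse than busy I/O.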
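
--//To make the per-event averages concrete, here is a small sketch that recomputes Wait Avg(ms) from the Waits and Total Wait Time columns of the 20170322 table.
--//The dictionary layout and the helper avg_wait_ms are my own. I also assume "124.7K" means 124,700 seconds of DB CPU.

```python
def avg_wait_ms(total_wait_sec, waits):
    """Average time per wait in milliseconds (hypothetical helper)."""
    return total_wait_sec * 1000.0 / waits

# (waits, total wait time in seconds) copied from the 20170322 excerpt
events = {
    "latch free": (4969, 822.1),
    "latch: cache buffers chains": (22886, 138.1),
    "db file sequential read": (56230, 111.0),
    "log file sync": (35309, 93.0),
}

averages = {name: avg_wait_ms(sec, waits) for name, (waits, sec) in events.items()}
for name, ms in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{name:<28} {ms:7.1f} ms")

# Sum of the four listed waits versus DB CPU (assumed 124.7K = 124,700 sec)
total_wait_sec = sum(sec for _, sec in events.values())
db_cpu_sec = 124.7 * 1000
print(f"listed waits: {total_wait_sec:.1f} s, DB CPU: {db_cpu_sec:.0f} s")
```

--//The 165 ms figure belongs to latch free, not to the CBC latch, and all four listed waits together are small next to the DB CPU time.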

--//20170323:
--//This is the situation after the restart on 20170322; things have eased. Look closely at the 15:00:44-18:00:53 window:

IOStat by Function summary

'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,G,T,P are in multiples of 1000
ordered by (Data Read + Write) desc

Function Name      Reads: Data  Reqs per sec  Data per sec  Writes: Data Reqs per sec Data per sec Waits: Count Avg Tm(ms)
Direct Reads       17.7G        1.70          1.673M        0M           0.00         0M           18.3K        0.09
Others             290M         1.21          .027M         390M         1.05         .036M        24.4K        0.02
Buffer Cache Reads 319M         3.50          .03M          0M           0.00         0M           34.6K        0.80
DBWR                 0M         0.00          0M            235M         1.86         .022M        20.1K        0.03
LGWR                 0M         0.00          0M            124M         5.03         .011M        108.7K       0.01
Direct Writes        0M         0.00          0M            110M         1.09         .01M         11.8K        0.00
TOTAL:            18.3G         6.40          1.729M        859M         9.04         .079M        217.9K       0.14

--//Direct path reads total 17.7G, yet Avg Tm(ms) is only 0.09 ms, which is very fast. This shows that the file system cache is masking the poor SQL execution plans.
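
--//The Direct Reads row can be cross-checked with a little arithmetic. This is a sketch, assuming the report covers exactly the 15:00:44-18:00:53 window quoted above on a single day.
--//All variable names are my own.

```python
from datetime import datetime

# Snapshot window quoted in the note above (assumed to be the whole report interval)
start = datetime.strptime("15:00:44", "%H:%M:%S")
end = datetime.strptime("18:00:53", "%H:%M:%S")
elapsed_sec = (end - start).total_seconds()

# 'Data' columns use multiples of 1024, so 17.7G is 17.7 * 1024 MB
direct_read_mb = 17.7 * 1024
mb_per_sec = direct_read_mb / elapsed_sec      # AWR reports 1.673M per second

# Other columns use multiples of 1000, so 18.3K waits is 18,300
waits = 18.3 * 1000
total_wait_sec = waits * 0.09 / 1000.0         # waits * Avg Tm(ms), in seconds

print(f"interval: {elapsed_sec:.0f} s")
print(f"direct reads: {mb_per_sec:.3f} MB/s")
print(f"time spent waiting on direct reads: {total_wait_sec:.2f} s")
```

--//Under these assumptions, 17.7G of direct reads cost under two seconds of wait time in a three-hour window, which real disk reads could hardly deliver.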

--//20170324, after 2 indexes were created.
Top 10 Foreground Events by Total Wait Time

Event                     Waits     Total Wait Time (sec)   Wait Avg(ms)   % DB time   Wait Class
DB CPU                              6010                                   86.3
db file sequential read   320,152   790.3                   2              11.3        User I/O
log file sync             29,891    111.5                   4              1.6         Commit
db file parallel read     1,175     60.2                    51             .9          User I/O
read by other session     28,873    53.7                    2              .8          User I/O
direct path read          14,374    15.2                    1              .2          User I/O

--//The DB CPU share of DB time has dropped. Now look at the I/O figures:
IOStat by Function summary

'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,G,T,P are in multiples of 1000
ordered by (Data Read + Write) desc

Function Name      Reads: Data  Reqs per sec  Data per sec  Writes: Data Reqs per sec Data per sec Waits: Count Avg Tm(ms)
Direct Reads       13.9G        2.00          1.977M        0M           0.00         0M           14.4K        0.75
Buffer Cache Reads  8.5G        50.96          1.209M       0M           0.00         0M          333.5K        2.38
Others              395M        1.35          .055M         656M         1.30         .091M        19.1K        0.14
DBWR                  0M        0.00          0M            375M         3.21         .052M        23.1K        0.12
LGWR                  1M        0.01          0M            227M         5.47         .032M        78.5K        1.16
Direct Writes         0M        0.00          0M             69M         1.10         .01M         7907         0.00
TOTAL:             22.8G        54.31         3.241M        1.3G         11.08        .184M       476.6K        1.89

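--//Direct Reads are still high, so the SQL needs further tuning. As long as the OS has enough memory the problem stays hidden; once memory runs short it is exposed.
--//SQL tuning is the real fix.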
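
--//To put "still high" in numbers, this sketch computes the share of Direct Reads in the total read volume for the 20170323 and 20170324 IOStat tables.
--//The helper to_mb and the dictionary layout are my own. The two reports may cover different intervals, so only the shares are compared, not the absolute volumes.

```python
def to_mb(value):
    """Convert an AWR 'Data' value such as '17.7G' or '319M' to MB (multiples of 1024)."""
    units = {"M": 1, "G": 1024, "T": 1024 ** 2}
    return float(value[:-1]) * units[value[-1]]

# 'Reads: Data' values copied from the two IOStat by Function tables above
reads = {
    "20170323": {"Direct Reads": "17.7G", "Buffer Cache Reads": "319M", "TOTAL": "18.3G"},
    "20170324": {"Direct Reads": "13.9G", "Buffer Cache Reads": "8.5G", "TOTAL": "22.8G"},
}

share = {}
for day, row in reads.items():
    share[day] = 100.0 * to_mb(row["Direct Reads"]) / to_mb(row["TOTAL"])
    print(f"{day}: direct reads are {share[day]:.1f}% of all data read")
```

--//The two new indexes moved a good part of the reads to the buffer cache, but well over half of the data read still comes from direct path reads.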

Source: ITPUB博客 (ITPUB blog), link: http://blog.itpub.net/267265/viewspace-2136024/. If you repost this, please credit the source; otherwise legal liability will be pursued.
