Heap size 80869K exceeds notification threshold (51200K)

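      A while ago the alert log reported that a required heap size had exceeded the configured threshold: Heap size 80869K exceeds notification threshold (51200K). Starting with Oracle 10.2.0.2 this threshold was raised to 50MB, which, with LRU aging, should in theory be sufficient. In this case the message appeared because the SGA was heavily fragmented and no chunk large enough to hold the current statement could be found at that moment. Note that the database did not raise an ORA-04031 error at this point. The details follow.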

 

I. Notification extracted from the alert log

Fri Feb  7 19:47:21 2014
Memory Notification: Library Cache Object loaded into SGA
Heap size 80869K exceeds notification threshold (51200K)
Details in trace file /u02/database/UK3200/udump/UK3200_ora_6240.trc
KGL object name :INSERT INTO ACCOUNTING_FILE_DTL_TBL (ACCOUNT_CODE_ID, ACCOUNT_CODE, ACCOUNT_NAME, VOUCHER_NO, ACC_NUM, CURR_CD, ACCOUNT_PERIO
D, VALUE_DATE, TRANS_DATE, ACC_POS_HIST_ID, DEBIT_CREDIT, TRANS_DESC, TRANS_TYPE_CD, AE_ID, AMOUNT, CATEGORY, FX, DESCRIPTION, CUST_SUB_TYPE,
VOU_TYPE, DISPLAY_TYPE, IS_HIDE_WHEN_ZERO, TYPE_GRP, DESC_ACCTING_STD, ACCOUNT_JV_TYPE_ID, JV_DESC, INPUT_DATE ) WITH VOU_TBL AS ( SELECT A.AC
C_NUM, A.CURR_CD, SUBSTR (:B2 , 1, 6) AS ACCOUNT_PERIOD, A.VALUE_DATE, A.APPROVAL_DATE, A.TRANSACTION_C

Mon Feb 10 18:22:32 2014
Memory Notification: Library Cache Object loaded into SGA
Heap size 80865K exceeds notification threshold (51200K)
KGL object name :INSERT INTO ACCOUNTING_FILE_DTL_TBL (ACCOUNT_CODE_ID, ACCOUNT_CODE, ACCOUNT_NAME, VOUCHER_NO, ACC_NUM, CURR_CD, ACCOUNT_PERIO
D, VALUE_DATE, TRANS_DATE, ACC_POS_HIST_ID, DEBIT_CREDIT, TRANS_DESC, TRANS_TYPE_CD, AE_ID, AMOUNT, CATEGORY, FX, DESCRIPTION, CUST_SUB_TYPE,
VOU_TYPE, DISPLAY_TYPE, IS_HIDE_WHEN_ZERO, TYPE_GRP, DESC_ACCTING_STD, ACCOUNT_JV_TYPE_ID, JV_DESC, INPUT_DATE ) WITH VOU_TBL AS ( SELECT A.AC
C_NUM, A.CURR_CD, SUBSTR (:B2 , 1, 6) AS ACCOUNT_PERIOD, A.VALUE_DATE, A.APPROVAL_DATE, A.TRANSACTION_C

# The trace file under udump has grown to 724MB, the cumulative result of this notification
oracle@linux-1234:~> ls -hltr /u02/database/UK3200/udump/UK3200_ora_6240.trc
-rw-r----- 1 oracle oinstall 724M 2014-02-07 19:47 /u02/database/UK3200/udump/UK3200_ora_6240.trc
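
Since the same notification recurs over several days, it can help to pull the reported heap sizes out of the alert log in one pass. The following is a minimal sketch, not an Oracle tool: the function name `find_heap_warnings` and the `sample` list are hypothetical, and the regular expression only assumes the message wording shown in the excerpt above.

```python
import re

# Matches the notification line exactly as it is worded in the alert log excerpt.
HEAP_RE = re.compile(r"Heap size (\d+)K exceeds notification threshold \((\d+)K\)")

def find_heap_warnings(lines):
    """Return a list of (heap_kb, threshold_kb) tuples, one per matching line."""
    hits = []
    for line in lines:
        m = HEAP_RE.search(line)
        if m:
            hits.append((int(m.group(1)), int(m.group(2))))
    return hits

# Hypothetical sample: the relevant lines copied from the excerpt above.
sample = [
    "Fri Feb  7 19:47:21 2014",
    "Memory Notification: Library Cache Object loaded into SGA",
    "Heap size 80869K exceeds notification threshold (51200K)",
    "Mon Feb 10 18:22:32 2014",
    "Memory Notification: Library Cache Object loaded into SGA",
    "Heap size 80865K exceeds notification threshold (51200K)",
]

warnings = find_heap_warnings(sample)
for heap_kb, threshold_kb in warnings:
    print(f"heap {heap_kb}K vs threshold {threshold_kb}K "
          f"(ratio {heap_kb / threshold_kb:.2f})")
```

On a real system the lines would come from reading the alert log file instead of the inline list.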


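II. Fault analysis
      When a SQL or PL/SQL statement is to be executed, a contiguous block of free memory must be allocated from the library cache to parse it. Oracle first scans the shared pool for free memory. If no free chunk of exactly the right size is found, it looks for a larger one. If it finds a free chunk larger than the request, it splits that chunk and returns the remainder to the free list. This is how fragmentation arises: after the system has been running for a long time, a large number of small memory fragments accumulate. When a large block is then requested, the total free space in the shared pool may still be large, yet no single contiguous free chunk satisfies the request. The system then ages out statements that are no longer in use according to the LRU algorithm, and if memory still cannot be allocated after that, an ORA-04031 error may be raised.
      Generally, if shared_pool_size turns out to be large enough, an ORA-04031 error is usually caused by excessive fragmentation.
      Fragmentation cannot be avoided entirely, but it should be kept to a minimum. Potential causes of fragmentation include:
             SQL that is not shared;
             too many unnecessary parse calls (soft parses);
             bind variables not being used

      In our case there was enough memory, but a contiguous free block could not be obtained at that moment. That said, the INSERT statement is indeed long, close to 400 lines.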
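
The split-and-return behavior described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the function `allocate`, the chunk sizes and the best-fit policy are all made up for illustration, nothing is ever freed or coalesced, and it does not reflect how Oracle's shared pool heap manager is actually implemented.

```python
def allocate(free_chunks, size):
    """Toy allocator over a list of free chunk sizes.

    An exact fit is consumed whole. Otherwise the smallest larger chunk is
    split and the remainder is put back on the free list. Returns True on
    success, False when no single chunk is large enough.
    """
    candidates = [c for c in free_chunks if c >= size]
    if not candidates:
        return False
    chunk = min(candidates)
    free_chunks.remove(chunk)
    if chunk > size:
        free_chunks.append(chunk - size)  # leftover fragment
    return True

# Three free chunks of 40 units each, separated by memory that is in use.
free_chunks = [40, 40, 40]

# Three requests of 30 units each split every chunk and leave 10-unit fragments.
for request in (30, 30, 30):
    allocate(free_chunks, request)

total_free = sum(free_chunks)
big_request_ok = allocate(free_chunks, 25)

print("free chunks:", sorted(free_chunks))
print("total free :", total_free)
print("25-unit request satisfied:", big_request_ok)
```

The total free space (30 units) exceeds the 25-unit request, yet the request fails because no single fragment is large enough. That is the situation the analysis attributes to this database.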

Analysis of shared_pool and SGA

-- Environment
SQL> select * from v$version where rownum<2;

BANNER
----------------------------------------------------------------
Oracle Database 10g Release 10.2.0.3.0 - 64bit Production

-- Object reload statistics per namespace; despite some INVALIDATIONS, the overall picture is good
SQL> @reload_ratio.sql

NAMESPACE             GETS    GETHITS GETHIT_RATIO       PINS    PINHITS PINHIT_RATIO    RELOADS INVALIDATIONS
--------------- ---------- ---------- ------------ ---------- ---------- ------------ ---------- -------------
SQL AREA            607614     100098        16.47  268976344  267458010        99.44     103083        101598
TABLE/PROCEDURE    4601210    4474456        97.25   29426797   28904427        98.22     178625             0
BODY                 39663      34617        87.28    2503211    2490492        99.49       6638             0
TRIGGER              72063      70249        97.48    1631610    1628323         99.8       1470             0
INDEX                94655      51973        54.91     469366     380268        81.02       3599             0
CLUSTER              27630      27383        99.11      61368      60883        99.21        238             0
OBJECT                   0          0          100          0          0          100          0             0
PIPE                     0          0          100          0          0          100          0             0
JAVA SOURCE            160        150        93.75        239        102        42.68         84             0
JAVA RESOURCE           81         75        92.59        243        160        65.84         27             0
JAVA DATA              162        159        98.15       1130       1126        99.65          0             0

11 rows selected.

-- Library cache hit ratio
SQL> SELECT SUM (pins) "Executions",
  2  SUM (reloads) "Cache Misses while Executing",
  3  ROUND ( (SUM (pins) / (SUM (reloads) + SUM (pins))) * 100, 2) "Hit Ratio, %"
  4  FROM V$LIBRARYCACHE;

Executions Cache Misses while Executing Hit Ratio, %
---------- ---------------------------- ------------
 303544356                       294260         99.9
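
As a quick check on the arithmetic, the ratio printed above can be recomputed from the two totals with the same formula the query uses, pins / (reloads + pins). The function name `hit_ratio` is just an illustrative helper.

```python
def hit_ratio(pins, reloads):
    """Library cache hit ratio in percent, using the formula from the query."""
    return round(pins / (reloads + pins) * 100, 2)

# Totals taken from the query output above.
ratio = hit_ratio(pins=303544356, reloads=294260)
print(ratio)
```

The result rounds to 99.9, so reloads are a very small fraction of executions on this instance.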

-- Free memory currently available in the shared pool
SQL> SELECT pool,name,bytes/1024/1024 FROM v$sgastat WHERE name LIKE '%free memory%' AND pool = 'shared pool';

POOL         NAME                       BYTES/1024/1024
------------ -------------------------- ---------------
shared pool  free memory                     91.7747498

-- Shared pool advice; according to the output, the current shared_pool is 340MB
SQL> SELECT shared_pool_size_for_estimate est_size,
  2  shared_pool_size_factor size_factor,
  3   estd_lc_size,
  4  estd_lc_memory_objects obj_cnt,estd_lc_time_saved_factor sav_factor
  5  FROM v$shared_pool_advice;

EST_SIZE SIZE_FACTOR ESTD_LC_SIZE    OBJ_CNT SAV_FACTOR
---------- ----------- ------------ ---------- ----------
       196       .5765           36       2822       .803
       232       .6824           71       4400      .8567
       268       .7882          106       5114      .9078
       304       .8941          142       6095      .9532
       340           1          177       7757          1
       376      1.1059          212       9903     1.0434
       412      1.2118          248      12080     1.0874
       448      1.3176          283      14288     1.1382
       484      1.4235          317      16277     1.1818
       520      1.5294          351      16612     1.2166
       556      1.6353          386      18515     1.2442
       592      1.7412          421      18663     1.2679
       628      1.8471          456      18885     1.2877
       664      1.9529          490      19127     1.3033
       700      2.0588          525      19255     1.3164

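-- According to the SGA advice, even growing the SGA to 715MB only brings ESTD_DB_TIME down to 219849, which is not much of a gain
-- In other words, enlarging the SGA would do little for this database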
--Author: Leshami
--Blog  : http://blog.csdn.net/leshami   
SQL> select sga_size,sga_size_factor,estd_db_time from v$sga_target_advice order by 1;

  SGA_SIZE SGA_SIZE_FACTOR ESTD_DB_TIME
---------- --------------- ------------
       286              .5       675749
       429             .75       257214
       572               1       222542
       715            1.25       219849
       858             1.5       219048
      1001            1.75       219048
      1144               2       219048    
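
To put a number on "little benefit", the estimated DB time at each advised size can be compared with the value at the current size (factor 1, 572MB). This is a small sketch that uses only the figures from the advice output above; the helper name `pct_change` is hypothetical.

```python
# (sga_size_mb, estd_db_time) pairs copied from the v$sga_target_advice output.
advice = [
    (286, 675749),
    (429, 257214),
    (572, 222542),   # current size, SGA_SIZE_FACTOR = 1
    (715, 219849),
    (858, 219048),
    (1001, 219048),
    (1144, 219048),
]

baseline = dict(advice)[572]

def pct_change(db_time):
    """Percentage change in estimated DB time relative to the current size."""
    return round((db_time - baseline) / baseline * 100, 2)

changes = {size: pct_change(db_time) for size, db_time in advice}
for size, change in changes.items():
    print(f"{size:>5} MB: {change:+.2f}%")
```

Growing the SGA by 25% to 715MB lowers the estimate by only about 1.2%, and the estimate stops improving altogether from 858MB upward.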


III. Solution proposed by Oracle
Memory Notification: Library Cache Object Loaded Into Sga (Doc ID 330239.1)

APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.
Oracle Server Enterprise Edition
SYMPTOMS
The following messages are reported in alert.log after 10g Release 2 is installed.
        Memory Notification: Library Cache Object loaded into SGA
        Heap size 2294K exceeds notification threshold (2048K)
CHANGES
Installed / Upgraded to 10g Release 2
CAUSE
These are warning messages that should not cause the program responsible for these errors to fail.  They appear as a result of new event messaging mechanism and memory manager in 10g Release 2.
The meaning is that the process is just spending a lot of time in finding free memory extents during an allocate as the memory may be heavily fragmented.  Fragmentation in memory is impossible to eliminate completely, however, continued messages of large allocations in memory indicate there are tuning opportunities on the application. 
The messages do not imply that an ORA-4031 is about to happen.

 

SOLUTION
In 10g we have a new undocumented parameter that sets the KGL heap size warning threshold.   This parameter was not present in 10gR1.  Warnings are written if heap size exceeds this threshold.
   
Set  _kgl_large_heap_warning_threshold  to a reasonable high value or zero to prevent these warning messages. Value needs to be set in bytes.

If you want to set this to 8192 (8192 * 1024) and are using an spfile:

(logged in as "/ as sysdba")

SQL> alter system set "_kgl_large_heap_warning_threshold"=8388608 scope=spfile ;

SQL> shutdown immediate
SQL> startup

If using an "old-style" init parameter,

Edit the init parameter file and add

_kgl_large_heap_warning_threshold=8388608
 NOTE:  The default threshold in 10.2.0.1 is 2M.   So these messages could show up frequently in some application environments.
In 10.2.0.2,  the threshold was increased to 50MB after regression tests, so this should be a reasonable and recommended value.

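Final resolution:

1. Change the hidden parameter as described in the Metalink note. It is not clear whether this setting has any negative side effects, and a Google search turned up no answer to that question.
2. Tune the code (omitted).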
SQL> @hidden_para
Enter value for para: large_heap

KSPPINM                            KSPPSTVL              DESCRIB
---------------------------------  --------------------  ---------------------------------------------
_kgl_large_heap_warning_threshold  52428800              maximum heap size before KGL writes warnings
                                                         to the alert log

SQL> alter system set "_kgl_large_heap_warning_threshold"=82809856 scope=spfile ;               
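
The parameter is set in bytes while the alert log reports kilobytes, so the values are easier to compare after a conversion. The sketch below uses only numbers that appear in this post; the helper `kb_to_bytes` is hypothetical.

```python
KB = 1024

def kb_to_bytes(kb):
    """Convert a size in kilobytes, as printed in the alert log, to bytes."""
    return kb * KB

default_threshold = kb_to_bytes(51200)   # threshold shown in the alert log
note_example = kb_to_bytes(8192)         # example value from the support note
new_threshold = 82809856                 # value set with ALTER SYSTEM above

print(default_threshold)    # current value of the hidden parameter
print(note_example)         # value used in the note's ALTER SYSTEM example
print(new_threshold // KB)  # new threshold expressed in K
```

The new value works out to exactly 80869K, the larger of the two heap sizes reported in the alert log. Whether an object of exactly that size would still trigger the message depends on how the comparison is implemented, which this post does not establish, so a somewhat larger value may be the safer choice if the goal is to silence the warning for this statement.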
