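A while ago the alert log reported that a heap had grown past the notification threshold: Heap size 80869K exceeds notification threshold (51200K). Since Oracle 10.2.0.2 the default for this threshold has been 50MB, which, combined with LRU aging, should in theory be enough. In this case the shared pool in the SGA was heavily fragmented, and for a moment Oracle could not find room for the current statement, so it wrote this notification. Note that the database did not raise ORA-04031 at this point. The details follow.
I. Messages extracted from the alert log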
Fri Feb 7 19:47:21 2014
Memory Notification: Library Cache Object loaded into SGA
Heap size 80869K exceeds notification threshold (51200K)
Details in trace file /u02/database/UK3200/udump/UK3200_ora_6240.trc
KGL object name :INSERT INTO ACCOUNTING_FILE_DTL_TBL (ACCOUNT_CODE_ID, ACCOUNT_CODE, ACCOUNT_NAME, VOUCHER_NO, ACC_NUM, CURR_CD, ACCOUNT_PERIO
D, VALUE_DATE, TRANS_DATE, ACC_POS_HIST_ID, DEBIT_CREDIT, TRANS_DESC, TRANS_TYPE_CD, AE_ID, AMOUNT, CATEGORY, FX, DESCRIPTION, CUST_SUB_TYPE, VOU_TYPE, DISPLAY_TYPE, IS_HIDE_WHEN_ZERO, TYPE_GRP, DESC_ACCTING_STD, ACCOUNT_JV_TYPE_ID, JV_DESC, INPUT_DATE ) WITH VOU_TBL AS ( SELECT A.AC
C_NUM, A.CURR_CD, SUBSTR (:B2 , 1, 6) AS ACCOUNT_PERIOD, A.VALUE_DATE, A.APPROVAL_DATE, A.TRANSACTION_C
Mon Feb 10 18:22:32 2014
Memory Notification: Library Cache Object loaded into SGA
Heap size 80865K exceeds notification threshold (51200K)
KGL object name :INSERT INTO ACCOUNTING_FILE_DTL_TBL (ACCOUNT_CODE_ID, ACCOUNT_CODE, ACCOUNT_NAME, VOUCHER_NO, ACC_NUM, CURR_CD, ACCOUNT_PERIO
D, VALUE_DATE, TRANS_DATE, ACC_POS_HIST_ID, DEBIT_CREDIT, TRANS_DESC, TRANS_TYPE_CD, AE_ID, AMOUNT, CATEGORY, FX, DESCRIPTION, CUST_SUB_TYPE, VOU_TYPE, DISPLAY_TYPE, IS_HIDE_WHEN_ZERO, TYPE_GRP, DESC_ACCTING_STD, ACCOUNT_JV_TYPE_ID, JV_DESC, INPUT_DATE ) WITH VOU_TBL AS ( SELECT A.AC
C_NUM, A.CURR_CD, SUBSTR (:B2 , 1, 6) AS ACCOUNT_PERIOD, A.VALUE_DATE, A.APPROVAL_DATE, A.TRANSACTION_C

# The trace file under udump has grown to 724MB, accumulated from these notifications
oracle@linux-1234:~> ls -hltr /u02/database/UK3200/udump/UK3200_ora_6240.trc
-rw-r----- 1 oracle oinstall 724M 2014-02-07 19:47 /u02/database/UK3200/udump/UK3200_ora_6240.trc
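To see how far over the threshold each notification was, the two sizes can be pulled out of the message text. The following is only a minimal sketch in Python: the regular expression and the `parse_notification` helper are my own illustration, not an Oracle tool, and the sample text is trimmed from the alert-log excerpt above.

```python
import re

# Matches the "Heap size ...K exceeds notification threshold (...K)" line.
PATTERN = re.compile(r"Heap size (\d+)K exceeds notification threshold \((\d+)K\)")

def parse_notification(text):
    """Return a list of (heap_kb, threshold_kb) tuples found in text."""
    return [(int(heap), int(limit)) for heap, limit in PATTERN.findall(text)]

# Sample trimmed from the alert log shown above.
sample = """Fri Feb 7 19:47:21 2014
Memory Notification: Library Cache Object loaded into SGA
Heap size 80869K exceeds notification threshold (51200K)
Mon Feb 10 18:22:32 2014
Memory Notification: Library Cache Object loaded into SGA
Heap size 80865K exceeds notification threshold (51200K)
"""

events = parse_notification(sample)
for heap_kb, threshold_kb in events:
    over_mb = (heap_kb - threshold_kb) / 1024
    print(f"heap {heap_kb}K, threshold {threshold_kb}K, over by {over_mb:.1f} MB")
```

Both notifications come from the same INSERT statement and exceed the 50MB threshold by roughly 29MB.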
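II. Fault analysis
Before a SQL or PL/SQL statement can be executed, Oracle must allocate a contiguous piece of free space in the library cache to parse it. Oracle first scans the shared pool for free memory. If no free chunk of exactly the right size is found, it looks for a larger one; when it finds a free chunk larger than the request, it splits the chunk and puts the remainder back on the free list. This is how fragmentation arises. After the system has been running for a long time, a large number of small fragments accumulate. When a large allocation is then requested, the total free space in the shared pool may still be substantial, yet no single contiguous free chunk is large enough. Oracle then ages out statements that are no longer in use according to the LRU algorithm; if the allocation still cannot be satisfied after that, ORA-04031 may be raised.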
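The split-and-return behaviour described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Oracle's actual allocator: the `allocate` function and the chunk sizes are hypothetical, and the model ignores coalescing of adjacent chunks, LRU aging, and the real free-list layout.

```python
def allocate(free_chunks, size):
    """Take `size` units from a list of free chunk sizes; return True on success.

    An exact-fit chunk is preferred. Otherwise the smallest chunk that is
    large enough is split and the remainder goes back on the free list.
    """
    if size in free_chunks:
        free_chunks.remove(size)
        return True
    larger = [chunk for chunk in free_chunks if chunk > size]
    if not larger:
        return False  # no single chunk is big enough
    chunk = min(larger)
    free_chunks.remove(chunk)
    free_chunks.append(chunk - size)  # leftover fragment
    return True

# Four free chunks of 100 units each (400 units in total).
free = [100, 100, 100, 100]

# Four requests of 60 units each split every chunk and leave a 40-unit fragment.
for _ in range(4):
    allocate(free, 60)

# A request for 80 units now fails although 160 units are still free in total.
ok = allocate(free, 80)
print(ok, sum(free), max(free))
```

The final request fails because the largest remaining chunk is 40 units, which mirrors the situation in the text: plenty of free memory in total, but no contiguous chunk of the required size.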
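In general, if shared_pool_size turns out to be large enough, ORA-04031 is usually caused by excessive fragmentation.
Fragmentation cannot be avoided entirely, but it should be kept to a minimum. Potential causes of fragmentation include:
SQL that is not shared;
Too many unnecessary parse calls (soft parses);
Bind variables not being used.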
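The effect of bind variables on SQL sharing can be shown without a database. Oracle matches shared cursors by statement text, so the sketch below simply counts distinct texts. The `accounts` table and the `:id` placeholder are hypothetical names used only for illustration.

```python
account_ids = [101, 102, 103, 104, 105]

# Literal values embedded in the text: each value produces a different statement.
literal_sql = {f"SELECT * FROM accounts WHERE id = {i}" for i in account_ids}

# Bind placeholder: the text is identical for every value; values travel separately.
bind_sql = {"SELECT * FROM accounts WHERE id = :id" for _ in account_ids}

print(len(literal_sql), len(bind_sql))
```

With literals, five executions yield five distinct statement texts, each a candidate for its own library cache entry; with a bind variable there is one text that all executions can share.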
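In our case there was enough memory overall; Oracle was just momentarily unable to obtain a contiguous free chunk. That said, the INSERT statement in question is indeed long, close to 400 lines.
Analysis of the shared pool and SGA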
--Environment
SQL> select * from v$version where rownum<2;

BANNER
----------------------------------------------------------------
Oracle Database 10g Release 10.2.0.3.0 - 64bit Production

--Object reload statistics; despite some INVALIDATIONS, overall behaviour is good
SQL> @reload_ratio.sql

NAMESPACE             GETS    GETHITS GETHIT_RATIO       PINS    PINHITS PINHIT_RATIO    RELOADS INVALIDATIONS
--------------- ---------- ---------- ------------ ---------- ---------- ------------ ---------- -------------
SQL AREA            607614     100098        16.47  268976344  267458010        99.44     103083        101598
TABLE/PROCEDURE    4601210    4474456        97.25   29426797   28904427        98.22     178625             0
BODY                 39663      34617        87.28    2503211    2490492        99.49       6638             0
TRIGGER              72063      70249        97.48    1631610    1628323         99.8       1470             0
INDEX                94655      51973        54.91     469366     380268        81.02       3599             0
CLUSTER              27630      27383        99.11      61368      60883        99.21        238             0
OBJECT                   0          0          100          0          0          100          0             0
PIPE                     0          0          100          0          0          100          0             0
JAVA SOURCE            160        150        93.75        239        102        42.68         84             0
JAVA RESOURCE           81         75        92.59        243        160        65.84         27             0
JAVA DATA              162        159        98.15       1130       1126        99.65          0             0

11 rows selected.

--Library cache hit ratio
SQL> SELECT SUM (pins) "Executions",
  2         SUM (reloads) "Cache Misses while Executing",
  3         ROUND ( (SUM (pins) / (SUM (reloads) + SUM (pins))) * 100, 2) "Hit Ratio, %"
  4    FROM V$LIBRARYCACHE;

Executions Cache Misses while Executing Hit Ratio, %
---------- ---------------------------- ------------
 303544356                       294260         99.9

--Free memory currently available in the shared pool
SQL> SELECT pool,name,bytes/1024/1024 FROM v$sgastat WHERE name LIKE '%free memory%' AND pool = 'shared pool';

POOL         NAME                       BYTES/1024/1024
------------ -------------------------- ---------------
shared pool  free memory                     91.7747498

--Shared pool advice; the row with size factor 1 shows the current shared pool is 340MB
SQL> SELECT shared_pool_size_for_estimate est_size,
  2         shared_pool_size_factor size_factor,
  3         estd_lc_size,
  4         estd_lc_memory_objects obj_cnt,estd_lc_time_saved_factor sav_factor
  5    FROM v$shared_pool_advice;

  EST_SIZE SIZE_FACTOR ESTD_LC_SIZE    OBJ_CNT SAV_FACTOR
---------- ----------- ------------ ---------- ----------
       196       .5765           36       2822       .803
       232       .6824           71       4400      .8567
       268       .7882          106       5114      .9078
       304       .8941          142       6095      .9532
       340           1          177       7757          1
       376      1.1059          212       9903     1.0434
       412      1.2118          248      12080     1.0874
       448      1.3176          283      14288     1.1382
       484      1.4235          317      16277     1.1818
       520      1.5294          351      16612     1.2166
       556      1.6353          386      18515     1.2442
       592      1.7412          421      18663     1.2679
       628      1.8471          456      18885     1.2877
       664      1.9529          490      19127     1.3033
       700      2.0588          525      19255     1.3164

--Per the SGA advice, even growing the SGA to 715MB only brings ESTD_DB_TIME down to 219849, which is not much of a gain
--In other words, enlarging the SGA would do little for this database
--Author: Leshami
--Blog : http://blog.csdn.net/leshami
SQL> select sga_size,sga_size_factor,estd_db_time from v$sga_target_advice order by 1;

  SGA_SIZE SGA_SIZE_FACTOR ESTD_DB_TIME
---------- --------------- ------------
       286              .5       675749
       429             .75       257214
       572               1       222542
       715            1.25       219849
       858             1.5       219048
      1001            1.75       219048
      1144               2       219048
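As a sanity check, the ratios reported above can be re-derived from the raw counters. The sketch below uses the same formula as the V$LIBRARYCACHE query. For GETHIT_RATIO it assumes that reload_ratio.sql computes GETHITS / GETS * 100; the script itself is not shown, so that is a guess. The `pct` helper is my own name.

```python
def pct(hits, total):
    """Percentage rounded to two decimals, like ROUND(..., 2) in the query."""
    return round(hits / total * 100, 2)

# Overall library cache hit ratio: pins / (pins + reloads).
pins, reloads = 303544356, 294260
overall_hit_ratio = pct(pins, pins + reloads)

# SQL AREA get-hit ratio (assumed formula: gethits / gets).
gets, gethits = 607614, 100098
sql_area_gethit_ratio = pct(gethits, gets)

print(overall_hit_ratio, sql_area_gethit_ratio)
```

Both results agree with the figures in the transcript (99.9 and 16.47).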
III. Oracle's recommended solution
Memory Notification: Library Cache Object Loaded Into Sga (Doc ID 330239.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.
Oracle Server Enterprise Edition
SYMPTOMS
The following messages are reported in alert.log after 10g Release 2 is installed.
Memory Notification: Library Cache Object loaded into SGA
Heap size 2294K exceeds notification threshold (2048K)
CHANGES
Installed / Upgraded to 10g Release 2
CAUSE
These are warning messages that should not cause the program responsible for these errors to fail. They appear as a result of new event messaging mechanism and memory manager in 10g Release 2.
The meaning is that the process is just spending a lot of time in finding free memory extents during an allocate as the memory may be heavily fragmented. Fragmentation in memory is impossible to eliminate completely, however, continued messages of large allocations in memory indicate there are tuning opportunities on the application.
The messages do not imply that an ORA-4031 is about to happen.
SOLUTION
In 10g we have a new undocumented parameter that sets the KGL heap size warning threshold. This parameter was not present in 10gR1. Warnings are written if heap size exceeds this threshold.
Set _kgl_large_heap_warning_threshold to a reasonable high value or zero to prevent these warning messages. Value needs to be set in bytes.
If you want to set this to 8192 (8192 * 1024) and are using an spfile:
(logged in as "/ as sysdba")
SQL> alter system set "_kgl_large_heap_warning_threshold"=8388608 scope=spfile ;
SQL> shutdown immediate
SQL> startup
If using an "old-style" init parameter,
Edit the init parameter file and add
_kgl_large_heap_warning_threshold=8388608
NOTE: The default threshold in 10.2.0.1 is 2M. So these messages could show up frequently in some application environments.
In 10.2.0.2, the threshold was increased to 50MB after regression tests, so this should be a reasonable and recommended value.
Final resolution:
1. Changed the hidden parameter as described in the Metalink note. I could not determine whether this setting has any side effects, and a Google search did not turn up an answer.
2. Adjusted the code (omitted).

SQL> @hidden_para
Enter value for para: large_heap

KSPPINM                           KSPPSTVL             DESCRIB
--------------------------------- -------------------- ---------------------------------------------
_kgl_large_heap_warning_threshold 52428800             maximum heap size before KGL writes warnings to the alert log

SQL> alter system set "_kgl_large_heap_warning_threshold"=82809856 scope=spfile ;
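Because the parameter is set in bytes while the alert log reports kilobytes, it helps to check the conversion. The short sketch below is plain arithmetic; `kb_to_bytes` is a hypothetical helper name. It covers the three values that appear in this article.

```python
def kb_to_bytes(kb):
    """Convert a size in KB, as printed in the alert log, to bytes."""
    return kb * 1024

note_example = kb_to_bytes(8192)        # example value in the Oracle note
default_threshold = kb_to_bytes(51200)  # 50MB default since 10.2.0.2
reported_heap = kb_to_bytes(80869)      # heap size from the first notification

print(note_example, default_threshold, reported_heap)
```

The results are 8388608, 52428800 and 82809856. The new threshold of 82809856 is therefore exactly the 80869K heap size from the first alert-log message, expressed in bytes.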