昨天去一客户那里做巡检,遇到了经典的ora-04030错误,alert日志如下:
Tue Oct 28 09:57:46 2014
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177302):
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177302/wmsdb_ora_33358038_i177302.trc
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177303):
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177303/wmsdb_ora_33358038_i177303.trc
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177302/wmsdb_ora_33358038_i177302.trc:
ORA-04030: out of process memory when trying to allocate 169040 bytes (pga heap,kgh stack)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177304):
ORA-04030: out of process memory when trying to allocate 160 bytes (pga heap,control file cache)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177304/wmsdb_ora_33358038_i177304.trc
Tue Oct 28 09:57:53 2014
Trace dumping is performing id=[cdmp_20141028095753]
Tue Oct 28 09:57:54 2014
Sweep Incident[177303]: completed
Tue Oct 28 09:57:57 2014
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177302/wmsdb_ora_33358038_i177302.trc:
ORA-04030: out of process memory when trying to allocate 160 bytes (pga heap,control file cache)
ORA-04030: out of process memory when trying to allocate 118808 bytes (QERHJ hash-joi,kllcqas:kllsltba)
Errors in file /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/trace/wmsdb_ora_33358038.trc (incident=177305):
ORA-04030: out of process memory when trying to allocate 16776728 bytes (QERHJ hash-joi,QERHJ Hash Table Entries)
Incident details in: /oracle/app/oracle/diag/rdbms/wmsdb/wmsdb/incident/incdir_177305/wmsdb_ora_33358038_i177305.trc
Trace dumping is performing id=[cdmp_20141028095802]
......
该客户的数据库是Oracle 11.1.0.7,运行在p550主机上,16个逻辑CPU,16G内存,AIX 6.1 64位系统,共分配了8G内存给SGA,2G内存给PGA,由memory_target=10G参数动态调整,同时,memory_max_target也设置成10G,表示Oracle最多可以占用OS的内存为10G(物理内存的62.5%)。这2个参数是11g新增的,可以动态分配SGA和PGA,而在10g中仅只能通过设置sga_target来动态管理sga中的各内存组件,pga是要另外手动设置的。
SQL> show parameters target
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
archive_lag_target integer 0
db_flashback_retention_target
integer
1440
fast_start_io_target integer 0
fast_start_mttr_target integer 0
memory_max_target
big integer
10G
memory_target big integer
10G
pga_aggregate_target
big integer
2G
sga_target big integer
8G
网上google了一下,很多关于ora-04030的错误的描述都大同小异,主要由以下几种:
A. 对于32 BIT系统,有SGA 1.7G限制
B. 某些OS系统本身也有一些内存参数限制
C. OS系统本身物理内存+Swap的限制
对于A,由于系统是64 bit的,没有1.7G的限制;
对于B,用ulimit查看的结果为ulimited;
对于C,OS本身的内存为16G,不过用topas查看,内存使用率已经为90%
通常ORA-04030的问题一般是PGA过度分配造成的(对应的操作是sort/hash_join)。从9i开始,pga_aggregate_target指定了所有session总共使用的最大PGA上限,这里为2G。如果该值被设定了则默认的workarea_size_policy=auto, sort_area_size/sort_area_retained_size将被忽略。那么直接减小pga_aggregate_target就能解决一部分ORA-04030问题。
另外,该错误意味着Oracle服务器进程无法从操作系统分配更多内存。该内存由PGA(Program Global Area)组成,其内容取决于服务器配置。对于专用的服务器进程,内存包含堆栈以及用于保存用户会话数据、游标信息和排序区的UGA(User Global Area)。在多线程配置中(共享服务器),UGA被分配在SGA(System Global Area)中,所以在这种配置下UGA不是造成ORA-4030错误的原因。因此,ORA-04030表示进程需要更多内存(堆栈 UGA 或 PGA)来执行其任务。
由于发生了这个错误,因此无法从操作系统分配内存。这个错误可能是进程本身导致的,例如进程需要过多的内存,或者一些其他原因导致操作系统内存被耗尽,例如SGA太大或系统虚拟内存(物理Memory + Swap)中要容纳的进程过多。许多操作系统会对单个进程能够获取的内存量加以限制,以便自我保护。
查阅了一篇官方文档,是这么说的:
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later --11.2.0.3版本居然也没有修复这个bug!
Information in this document applies to any platform.
***Checked for currency 24-July-2014***
SYMPTOMS
1. A session crashes with:
ORA-04030:out of process memory when trying to allocate 4194344 bytes(QERHJ hash-join,QERHJ list array)
2. Review of the instance parameters set reveals:
"_pga_max_size"=614400KB
CAUSE
The cause of this problem has been identified in:
Bug:13447197 - ORA-04030: OUT OF PROCESS MEMORY WHEN TRYING TO ALLOCATE 262168 BYTES (QERGH HAS
suspended as related to:
unpublished Bug:9506362 - ORA-04030: OUT OF PROCESS MEMORY (QERHJ HASH-JOI,KLLCQAS:KLLSLTBA)
Both bugs have been suspended due to lack of reproducibility.
--这是一个在11g上的bug
SOLUTION
As the bugs have been suspended due to lack of information to perform analysis, the only possible
workaround is to set the _PGA_MAX_SIZE instance parameter to a smaller value.
--唯一可能的解决方案是设置隐含参数“_PGA_MAX_SIZE”为一个较小的值
This parameter determines the maximum size which can be used for per-process PGA memory. The default value
is 200MB and the range of valid values is from 10MB up to 4TB-1.
--这个参数可以指定为每一个进程分配地PGA内存,缺省值为200M,合法的值为10M~(4T-1)
The per-process PGA memory can be limited by setting the _PGA_MAX_SIZE to a smaller value which internally
forces the hash-join to use the less memory and avoid the ORA-4030, like in:
SQL> alter system set "_pga_max_size"=100M;
--用“_PGA_MAX_SIZE”强制hash-join用更少的内存,来避免ora-04030,如设置为100M。
有关避免此错误的一般建议
1.一些操作需要大量的内存,对于排序问题,减少SORT_AREA_SIZE会有所帮助。Oracle服务器进程会将
PGA中的SORT_AREA_SIZE字节分配给排序操作。如果完成搜索需要更多内存,服务器进程将会使用temporary
segment。这意味着减少SORT_AREA_SIZE会对需要大量排序操作的查询性能产生影响。
2.对于9i及更高版本,通过将参数WORKAREA_SIZE_POLICY设置为AUTO,以及在初始化文件中指定
PGA_AGGREGATE_TARGET的大小,即可启用自动SQL执行内存管理功能。使用自动PGA内存管理将有助于减少发生
ORA-04030错误的可能性。注意,OpenVMS操作系统在Oracle 9i版本上不支持PGA_AGGREGATE_TARGET,但是在
Oracle 10g版本上是支持的。
3.PL/SQL程序也可分配大量内存,因此可能需要重写应用程序的某些部分。尽管PL/SQL表非常容易使用,但它确实需要
在PGA中分配内存。
4.查看optimizer策略,一些访问路径可能会因排序操作、较多行上的函数使用等原因而需要更多内存。
5.在某些操作系统上(例如 Microsoft Windows),可能要降低 SGA 的大小以便于PGA获得更大的内存。
6.确保操作系统和Oracle的内存限制设置合理。
7.确保有足够的可用内存(物理Memory和Swap)。