数据库常见告警、报错与解决方法记录

遇到的错误号,错误原因、处理方法及参考文档小结

一、 Oracle 报错记录

报错号 原因 参考 建议
error while loading shared libraries: libskgxp10.so: cannot open shared object file: No such file or directory 环境变量LD_LIBRARY_PATH没有设置 Oracle的常见错误及解决办法 - 沧海一滴 - 博客园

.bash_profile 设置

LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib

ORA-609

The ORA-609 error is thrown when a client connection of any kind failed to complete or aborted the connection process before the connection/authentication process was complete.


Very often, this connection abort is due to a timeout.  Beginning with 10gR2, a default value for inbound connect timeout has been set at 60 seconds.  This time limit is often inadequate for the entire connection process to complete. 

文档 ID 1116960.1

1.将SQLNET.INBOUND_CONNECT_TIMEOUT设为120秒(默认60秒)应该能解决大多数ORA-609问题

2.如果该报错发生在实例正在关闭时,可以忽略

执行sqlplus / as sysdba报错

ORA-12547: TNS:lost contact

$ORACLE_HOME/bin/oracle文件权限问题

造成错误“ORA-12547: TNS:lost contact”的常见原因有哪些? - ^_^小麦苗^_^ - 博客园

在Oracle中,如果$ORACLE_HOME/bin/oracle文件的属主或权限出了问题,该如何修复呢 - 墨天轮

chmod 6751 $ORACLE_HOME/bin/oracle
ORA-00471: DBWR process terminated with error DBWR进程由于错误中止(例如OOM,kill -9等),会导致DB挂掉 ORA-00471: DBWR process terminated with error案例 - 潇湘隐者 - 博客园 正常启动数据库即可,如果是因为触发OOM,需要再具体分析占用内存过高原因
ORA-46268: Conflicting operation on audit table(s) Session was interrupted by CTRL+C https://nazim-dba.blogspot.com/2018/06/error-sql-begin-dbmsauditmgmt.html 可忽略
从库(physical standby)偶尔报错 ORA-00322 ORA-00312 This errors are written due to a very small timing window, when the SRL header
is being updated to reflect new contents from the primary and in the same way
the ARCH process is doing his activity and seeing a CF flog entry that reflects an earlier state.
Physical standby - alert*log shows ORA-322, ORA-312 intermittently (文档 ID 1592057.1)

可忽略

check the alert.log for any further issues. As long as ORA-322, ORA-312 is not combined with other issues, you can ignore this error.

查看视图相关sql执行计划时报错

ORA-01039: insufficient privileges on underlying objects of the view

用户无视图基表select权限

无,另外使用TKPROF EXPLAIN PLAN 时也可能遇到该报错,参考 文档 ID 1029064.6

授予视图基表select权限
ORA-00700[kskvmstatact] 发生swap Doc ID 1919850.1 可忽略

ORA-00600: [1433], [60], [], [], [], [], [], [], [], [], [], [] 导致数据库宕机

----- Call Stack Trace -----
skdstdst <- ksedst <- dbkedDefDump <- ksedmp <- ksdxdmp
<- ksdxfdmp <- ksdxdocmdmultex <- 204 <- ksdxdocmdmult <- ksadms
<- ksbend <- ksadmp <- ksaamb <- ksasnd <- ksbasend
<- ksoreq_submit <- ksbsrv_opt <- ksvspawn <- ksvcreateex <- ksvcreate
<- kebm_mmon_schedule_ <- slave <- dbke_mmon_process_i <- ksb_run_managed_act <- ion
<- ksbcti <- ion <- ksbabs <- kebm_mmon_main <- ksbrdp
<- opirip <- opidrv <- sou2o <- opimai_real <- ssthrdmain
<- main <- libc_start_main <- start

命中Bug 13541842,Bug描述中并没有记载触发条件。后台进程是Oracle进行调度的,很难确认具体在什么情况下会触发。另外,从 Bug 13541842 的内容中并没有找到DG相关的描述,所以这个问题不是DG特有的问题,是数据库的问题。

这个问题发生在Oracle的进程间传递内部消息的动作上。当内部消息在message queue中找不到空闲缓冲的时候,就会发生这个错误。可能与归档进程、mmon、dbwr进程或rac有关,遇到的情况是与MMON 进程有关。

故障发生过程:
DB遇到了Bug 13541842 -> 导致内部消息在message queue中找不到空闲缓冲 -> 导致ORA-00600 [1433],[60]报错 -> DB宕机

Doc ID 13541842.8

SR 3-21912811411

数据库挂了后一般可以直接startup

无workaround,需要打对应补丁13541842

ORA-00600: [qerixSuspend:!miss]

Main Stack:

qerixSaveFetchState2 <- qertbSaveFetchState2 <- qerjoSaveMethod <- kcbpioqWillWaitForIO <- kcbgtcr <- kcbgtcr <- ktrget2 <- kdsgrp <- qetlbr <- qertbFetchByRowID <- qerjotRowProc <- qertbFetchByRowID <- qerjotFetch <- qerjotFetch <- qerjotFetch <- qerjotFetch <- qeruaFetch <- opifch2 <- opifch <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main

报错sql执行计划中使用了过多的NESTED LOOPS(没查到限制值是多少)

Explain plan output shows that the query is using excessive NESTED LOOPS.  This resulted in this ORA-600[qerixSuspend:!miss] error.

文档 ID 2389197.1

根本方法:优化sql

sql中可能使用了大量or条件(例如上百个),能否减少条件;使用NESTED LOOPS是否高效,能否用其他关联方法

workaround

1.  At the session level:

alter session set "_nlj_batching_enabled"=0;

~or~

2.  At the query level(加hint):

/*+ opt_param('_nlj_batching_enabled',0) */

~or~

3.  At the system level:

alter system set "_nlj_batching_enabled"=0;

ORA-00600: [kkoljt1]

Main Stack:
kkoljt <- kkogfp <- kkooqb <- kkoqbc <- apakkoqb <- apaqbdDescendents <- apaqbd<- kkofkrSetup <- apadrv <- opitca <- kksFullTypeCheck <- rpiswu2 <- kksLoadChild<- kxsGetRuntimeLock <- kksfbc <- kkspsc0 <- kksParseCursor <- opiosq0 <- kpoal8 <- opiodr<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main

In case a query with a left outer joins crashes with ORA-600 [kkoljt1] you might hit this bug.

Bug 21577599 - ORA-600 kkoljt1 left outer join and join elimination ( Doc ID 21577599.8 )

SR 3-25073217651

1. 打对应patch
2. 升级到12.2
3. workaround:

alter session set "_complex_view_merging"=false;

或者
alter session set "_simple_view_merging"=false;

ORA-00600 [kdifind:kcbget_24]

Main Stack:

dbgeEndDDEInvocationImpl <- kcbget <- ktbxchg <- kdifind <- kdimod0 <- kauXuFinish <- kxibFinish <- qerupFetch <- updaul <- updThreePhaseExe <- 352 <- updexe <- opiexe <- opipls

在对表或索引块的非连续性multi read时遇到Bug 13464002 ORA-00600 [kdifind:kcbget_24] Errors (文档 ID 1466413.1)
Bug 13464002 - ORA-600 [kcbchg1_12] or ORA-600 [kdifind:kcbget_24] (文档 ID 13464002.8)

1. 升级至 12.1 或 11.2.0.4
2. 安装补丁 13464002

3. workaround
ALTER SYSTEM SET "_db_file_noncontig_mblock_read_count"=0 SCOPE=SPFILE; (重启数据库生效)

Setting "_db_file_noncontig_mblock_read_count"=0 will disable prefetching of non-contiguous blocks.

ORA-00600 [733] [314572880] [top call heap]


Main Stack:
ksmarfg <- kghgex <- kghfnd <- kghalo <- kghgex <- kghfnd <- kghalo <- kghgrw <- msqapp <- msqfun
<- k2copnws <- msqopnws <- msqbin <- k2copnws <- msqopnws <- msqbin <- k2copnws <- msqopnws
<- msqbin <- k2copnws <- msqopnws <- msqbin <- k2copnws <- msqopnws <- msqbin <- k2copnws
<- msqopnws <- msqbin <- k2copnws <- msqopnws <- msqbin <- k2copnws <- msqopnws <- msqbin
<- k2copnws <- msqopnws <- msqbin <- k2copnws <- msqopnws <- msqbin <- k2copnws

Current SQL: select * from v$sql_plan where plan_hash_value='xxxx'

BUG 19846328 - ORA-00600 [733] [TOP CALL HEAP] WHILE SELECTING FROM V$SQL_PLAN
Status: 36 - Duplicate Bug


BUG 28741243 - FA QUERIES FAIL WITH EXPLAIN PLAN

Doc ID 19846328.8

Doc ID 28741243.8

workaround 

alter session set query_rewrite_enabled = false; (测试设置后能查出结果,但还是会有600报错)

或打对应补丁

ORA-00600: [17113], [0x000000000]

Main Stack:
kgherror <- kghprmalo <- kghalp <- kksalc <- qksbgCreateCursorEnv <- qksbgGetCursorVal
<- kxsBindPeekEnabled <- kxsPeekBinds <- kkscsCompareBinds <- kksfbc <- opiexe <- kpoal8
<- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain
<- main

This appears to be a memory management problem in attempting to acquire a chunk of memory.

Select fails With ORA-600 [17113] ( Doc ID 1399120.1 ) Solution
==================
This should be treated similar to an ORA-4031 error.
  • Perform alter system flush shared_pool and retry the failing transaction.
  • Shutdown and startup of the database may also help.
  • Please check the available memory for the database instance and if possible increase the SGA or shared_pool_size.

ORA-03170: deadlocked on readable physical standby (undo segment 401)

仅在从库执行select count(*) from tmp_v类似语句查询特定视图时才会报错,重启DB后依旧报错

CALL STACK
============
skdstdst <- ksedst <- dbkedDefDump <- ksedmp <- dbkdaKsdActDriver
<- dbgdaExecuteAction <- dbgdaRunAction <- dbgdRunActions <- dbgdProcessEventAct <- ions
<- dbgdChkEventKgErr <- dbkdChkEventRdbmsEr <- ksfpec <- dbgePostErrorKGE <- 1063
<- dbkePostKGE_kgsf <- kgesev <- ksesec1 <- ktuGetUsegDba <- ktuGetCommitTimesMa
<- ktuGetCommitTimes <- ktrvgxct1 <- kdlw_locator_op_rea <- kdlw_locator_op <- kdlf_locator_op
<- koklCurrentDataInli <- koklglen2 <- ne <- koklglen <- kokle_l2be_intermed
<- iate <- kokle_l2be <- iate <- evaopn2 <- evaopn2
<- evareo <- expepr <- evaiand <- expeal <- kdstfRRRRRRRRRRRRRR
<- kmP <- kdsttgr <- kmP <- qertbFetch <- rwsfcd
<- qerlbFetch <- qerhjFetch <- rwsfcd <- qerhjFetch <- qersoProcessULS
<- qersoFetchSimple <- 1196 <- qersoFetch <- rwsfcd <- qerltFetch
<- insdlexe <- insExecStmtExecIniE <- ngine <- insexe <- ngine
<- qes3tExecSQL <- qerleStart <- rwsstd <- qergsStart <- selexe0
<- opiexe <- kpoal8 <- opiodr <- ttcpip <- opitsk
<- opiino <- opiodr <- opidrv <- sou2o <- opimai_real
<- ssthrdmain <- main <- libc_start_main <- start

命中 Bug 24578056

Bug 25883955 - QUERIES FAIL WITH ORA-3170 ON ACTIVE DATA GUARD --> Closed as Duplicate of Unpublished Bug 24578056

Queries Fail Intermittently With ORA-3170 On ADG ( Doc ID 2311894.1 )

1. upgrade your DB to DATABASE BUNDLE PATCH 12.1.0.2.190716

2. 打补丁 24578056

workaround

建议业务方将报错sql移到主库查询

ORA-22921: length of input buffer is smaller than amount requested

there is no any known bug found, and
according other case, there is a workaround can solve this issue,
so please take following action to avoid this issue.

Action Plan
=================

SQL> alter system set "_report_capture_cycle_time"=0 scope=both sid='*‘;

无文档,SR回复

workaround

alter system set "_report_capture_cycle_time"=0 scope=both sid='*‘;

将该参数设为0会禁用12c的"Automatic Report Capturing Feature"新特性,该新特性由MMON_SLAVE监控资源消耗高的sql并自动产生监控报告,禁用该新特性对DB无害。

ORA-12516:TNS监听程序找不到符合协议堆栈要求的可用处理程序

ORA-12516:TNS:listener could not find available handler with matching protocol stack

当前连接数超了最大连接数,登录时会遇到ORA-00020 maximum number of processes (%s) exceeded报错

常规做法:

利用 lsnr_analyzer.pl 分析突增的连接来源并联系业务方调整连接数。

紧急做法(难定位或需要快速恢复):

1. 停监听 lsnrctl stop,等数据库访问连接数下降用户再登录;或者ps -ef |grep LOCAL=NO,kill掉几个进程然后进去;或者ps -ef|grep LOCAL=NO|grep -v grep|cut -c 9-15|xargs kill -9,杀掉部分oracle连接

2. 登录并改大processes参数,重启数据库

ORA-235

Concurrent update activity on a control file caused a process to read inconsistent information from the control file without a lock.

文档 ID 2312580.1 可忽略

grid中出现大量类似信息 

[OLOGGERD(9360)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 9360

Bug 24339991(内部文档,无法查看)

Doc ID 2279733.1

SR 3-16292501171

以下workaround为SR提供

root用户执行

/u01/app/products/grid/12.1.0/bin/crsctl stop resource ora.crf -init
/u01/app/products/grid/12.1.0/bin/crsctl modify resource ora.crf -attr "AUTO_START=never" -init -unsupport

验证方法
crsctls stat res ora.crf -t -init <<< This command should return: "OFFLINE/OFFLINE"
crsctl stat res ora.crf -p -init|grep AUTO_START <<< This command should return: never.

[ORAAGENT(29527)]CRS-5818: Aborted command 'check' for resource 'ora.ractest.racsrv.svc'. Details at (:CRSAGF00113:) {0:25:2} in /.../crsd_oraagent_oracle.trc

[ORAAGENT(29527)]CRS-5832: Agent '/.../oraagent_oracle' was unable to process commands. Details at (:CRSAGF00128:) {1:17772:489} in /.../crsd_oraagent_oracle.trc

ORA-07445: exception encountered: core dump [__intel_ssse3_rep_memcpy()+10272] [SIGSEGV] [ADDR:0x7F36575F2903] [PC:0x5A9AD50] [Address not mapped to object] [] 

Main Stack:
qerocConvertToSQL <- qerocPlsFetch <- qerocFetch <- kpofchswcbk <- rpiswu2 <- kpofrws
<- opifch2 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main

Bug 18191823 - Hang / ORA-600 [19708] etc.. referencing stale REFCURSOR bind Doc ID 18191823.8

1. Upgrade to 12.2.0.1 or apply 12.1.0.2.160719, where the bug is fixed.

2. Apply patch 18191823 -> to be able to suggest one, provide the output of 'opatch lsinventory -detail'. Copy/paste the output into a file and upload the file.

3. As a workaround, ensure the duration of the bound PL/SQL variable is at least as long as that of the ref cursor to which it is bound.

ORA-07445: exception encountered: core dump [qerocPlsFetch()+117] [SIGSEGV] [ADDR:0x7F964F70FCC0] [PC:0xAA7BD45] [Invalid permissions for mapped object] []

Main Stack:
qerocPlsFetch <- qerocFetch <- kpofchswcbk <- rpiswu2 <- kpofrws <- opifch2 <- kpoal8
<- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain
<- main

Bug 18191823 - Hang / ORA-600 [19708] etc.. referencing stale REFCURSOR bind

有没有发现虽然报错不同,但是Main Stack和bug跟上面那个是一样的

Doc ID 18191823.8

1. Upgrade to 12.2.0.1 or apply 12.1.0.2.160719, where the bug is fixed.

2. Apply patch 18191823 -> to be able to suggest one, provide the output of 'opatch lsinventory -detail'. Copy/paste the output into a file and upload the file.

3. As a workaround, ensure the duration of the bound PL/SQL variable is at least as long as that of the ref cursor to which it is bound.

ORA-07445 [kteclck()+586] [SIGSEGV] [ADDR:0x0] [PC:0x147EDBA] [Address not mapped to object]

This bug may be suspected if the followings are true:
 - this issue only occurs on Standby/ADG 
 - ORA-7445 [kteclck()] occurs
 - the plan table of SQL being parsed shows "TEMP TABLE TRANSFORMATION"
 - stack is likely to include: 
   ... kteclck ktelbhw qesldBumpHwm qesldCommitHwmLoad ...

Bug 17426539  ORA-7445 in kteclck() on Dataguard Read Only Standby(已合并到 Bug 25489342  ORA-7445 in kteclck() on Dataguard Read Only Standby Due to a Regression of Bug 17426539)

bug触发原因

This codepath is exercised when a segment header block of a
temp segment is trying to undergo a redo-only change to set
or clear the locker transaction id in its header structure. 

文档 ID 17426539.8
文档 ID 25489342.8

1. 升级至18.1.0版本

2. 将查询移至主库执行

3. 查询中不使用temp table(未实际测试)

4. 打补丁 25489342

ORA-07286: sksagdi: cannot obtain device information *Cause:  Stat on the log archiving device failed.
无法进行归档或将归档发送至从库
Doc ID 316281.1 找到无法归档的原因,一般在alter日志中可以看到详细报错。可能是LOG_ARCHIVE_DEST设置错误、可能是主库或从库归档目录(或FRA区)满了,确定后具体处理
ORA-48913 MAX_DUMP_FILE_SIZE 设置过低,trace file写满 文档 ID 1153040.1 视情况而定,可以加大MAX_DUMP_FILE_SIZE的值或将其设为unlimited,如果够大了可以忽略
ORA-3136 从 10.2.0.1 开始,参数 SQLNET.INBOUND_CONNECT_TIMEOUT 的默认设置是60秒。如果客户端无法在60秒内进行身份验证,警告将出现在警报日志中,客户端连接将被终止。 文档 ID 2331569.1

通常需要在监听器和数据库中加大 INBOUND CONNECT TIMEOUT 的值。通常建议将数据库(sqlnet.ora)值设置得稍微高于监听器(listener.ora)

当oracle性能压力过大时,也可能出现大量此报错

ORA-01555 快照过旧  

检查undo大小是否合理,如果合理通常是找到对应的SQL语句进行优化。

ORA-00028 session 被kill Doc ID 1230858.1 可忽略
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_38" Seed database was most likely not created right by package dbms_stats.init_package not being ran.  文档 ID 2127675.1

sqlplus / as sysdba

EXEC dbms_stats.init_package();

ORA-00600[pmuocon2#1:invalid magic number] 自定义汇聚函数bug Bug 21519686 - ORA-600 [pmuocon2#1: invalid magic num] from SQL using UDAG (used defined aggregate) (Doc ID 21519686.8)

以下选择其一:

1.打patch 21519686

2.升级至12.2

3.修改隐含参数_odci_aggregate_save_space = true

ORA-00600:[qernsRowP], [1], [], [], [], [], [], [], [], [], [], []

可能是GROUP BY NOSORT hit中了Oracle的bug,但遇到的sql并不符合mos 文档中bug的描述 ID 285913.1

 Alter session set events '10119 trace name context forever, level 1';

(注意在系统级别设置这个事件可能会对数据库性能造成影响)

ORA-1652

Failed to allocate an extent of the required number of blocks for a temporary segment in the tablespace indicated.

文档 ID 1267351.1
NOTE:364417.1

增加tempfile或找出使用temp tablespace使用过量的语句,进行优化

CRS-10001 ACFS-9203:true

ADVM/ACFS device drivers were installed/loaded.

oerr acfs 9203

可忽略
ORA-12751 
  • There is a database issue that prevents the slave action from completing on time
  • There is an issue related to how long the action being performed is taking such that it violates in internal run time policy (i.e. a run time policy violation). For example: A specific SQL or set of SQL statements in a particular thread of operation is taking too long to complete 
  • A thread of operation exceeds pre-determined CPU usage levels.

文档 ID 761298.1

 

When encountering this issue, check for other database issues at the time and investigate those as they could be slowing the actions such that limits are exceeded.

If the database is generally performing slowly then investigate the cause of the slowness, if the database hangs then investigate the hang

ORA-03137: TTC protocol internal error : [12333] [254] [64] [49] [] [] [] [] merge语句中绑定变量太长

一个绑定变量太长

ORA-3137[12333] on a MERGE Statement using a Bind Variable Larger than 1000 Bytes(文档 ID 2307683.1)

两个绑定变量太长

Merge or Insert is Failing with ORA-3137 [12333](文档 ID 2039740.1)

一个绑定变量太长,可以选择以下任一种解决方法:

  • 升级到18.1
  • 安装补丁21616967
  • 将merge语句改写为insert或者update

两个绑定变量太长:

修改merge语句,避免在sql语句中使用两个太长的绑定变量

 ORA-01274: cannot add data file that was originally created as '/path/data01.dbf'

 Automated standby file management was disabled, so an added file could not automatically be created on the standby. 

 The error from the creation attempt is displayed in another message. The control file file entry for the file is 'UNNAMEDnnnnn'.

 文档 ID 739618.1

 alter database create datafile '/oracle/product/GSIPRDGB/dbs/UNNAMED00210' as new;

ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT=AUTO scope=both;

alter database recover managed standby database using current logfile disconnect;

ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [], [], [], [], [] on ADG导致备库hang死

----- Call Stack Trace -----

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- ksfdmp <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl <- dbgeEndDDEInvocation
<- nImpl <- kcbgtcr <- kcbgcurstdby <- kcbgcur <- ktfbhget
<- ktfbfef <- qerfxFetch <- qerjotFetch <- qerjotFetch <- rwsfcd
<- qeruaFetch <- qervwFetch <- qerghFetch <- qervwFetch <- qersoProcessULS
<- qersoFetch <- qerjotFetch <- opifch2 <- kpoal8 <- opiodr
<- ttcpip <- opitsk <- opiino <- opiodr <- opidrv
<- sou2o <- opimai_real <- ssthrdmain <- main <- libc_start_main
<- start

命中Bug 18899974,在备库上,对表空间的元数据进行检查时,通过调用kcbgcur()内部函数发生异常,SCN出现问题,备库自身的SCN与主库同步过来的SCN号不能达成一致。

Bug Symptoms:

- ORA-00600: internal error code, arguments: [kcbgtcr_13]
- Current SQL was selecting from SYS.DBA_TABLESPACES, SYS.DBA_DATA_FILES, SYS.DBA_FREE_SPACE, SYS.DBA_TEMP_FILES, GV$SORT_SEGMENT, SYS.TS$, V$PARAMETER for 'UNDO'
- The call stack matches
- This is not RAC specific and the problem can be manifested in non-RAC databases

Bug 18899974 - ORA-600 [kcbgtcr_13] on ADG for SPACE metadata blocks and UNDO blocks ( Doc ID 18899974.8 )

紧急处理方法

DBA重启备库。若hang死无法登录,使用sqlplus -prelim。参考 sqlplus -prelim使用方法_Hehuyi_In的博客-CSDN博客

文档方法:

  • 升级到 12.2 版本
  • Apply patch 18899974
ORA-04030(koh-kghu sessi,pmuccst: adt/record) 单个进程使用内存超出4G(默认) Doc ID 1325100.1

DBA介入,找出消耗内存超过限制的进程。

可以修改以下参数将单个进程使用内存限制提高至16G

For versions 11.2.0.4 and lower:
_use_realfree_heap=TRUE
_realfree_heap_pagesize_hint = 262144

For 12.1 and higher:
_use_realfree_heap=TRUE
_realfree_heap_pagesize = 262144

ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

1.执行DML时,当要操作的数据已经被加锁,这时在另一个回话中再次要取得这个对象的锁时,新会话要么被挂起,要么抛出ORA-00054异常。


2.当DML在执行中,又同时在相同对象上执行DDL语句。比如Update操作的事务尚未提交,在另一个会话中开始执行表结构修改、变更索引的SQL时,也往往会出现ORA-00054异常。

文档 ID 1945579.1

找出占用资源的会话正在执行的语句,联系开发判断能否kill掉释放资源

ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5

The error messages indicating that oracle has problem in forking more process, the maximum number of PROCESSES allowed per user could be too low

Doc ID 392006.1

① 调整/etc/security/limits.conf 中ora用户的nproc限制值

② 修改/etc/security/limits.d/90-nproc.conf配置,这是linux 6 中的新特性,在5中没有该文件

RHEL6下引入了配置文件/etc/security/limits.d/90-nproc.conf
只有当使用*号让全局用户生效的时候,生效的nproc的值大小是受文件/etc/security/limits.d/90-nproc.conf中nproc值大小制约的,而如果仅仅是针对某个用户,那么就不受该文件nproc值大小的影响。

③ 重启监听,可以的话最好也重启数据库

报错与上面类似,但是原因不同

ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2

This happens due to less space available for network buffer reservation.

报错信息说明udp buffer耗尽,需要增加udp buffer避免此问题发生

Oracle Linux: ORA-27301:OS Failure Message: No Buffer Space Available ( Doc ID 2041723.1 )

For udp buffer, oracle-rdbms-server-12cR1-preinstall requirement lower limits.
To avoid unnecessary UDP buffer overflow and 'gc cr multi block request' wait, we always suggest to set them to 4K:
Please refer to:
https://community.oracle.com/login.jspa?referer=https://community.oracle.com/message/12081569#12081569

disabled NUMA and increased MTU and UDP buffer

改法参考

https://blog.csdn.net/Hehuyi_In/article/details/89562117

UDP buffer是指:

net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 4194304
net.core.wmem_max = 4194304

ORA-07445: 出现异常错误: 核心转储 [qecinisub()+60] [SIGSEGV][ADDR:0x8A8] [PC:0xCCDF3
0C] [Address not mapped to object] []
hit中Oracle bug 21522416,该问题不一定会复现

Bug 21522416(内部文档,无法访问)

 alter system set optimizer_dynamic_sampling=0; 

若问题继续出现,建议联系Oracle Support

ORA-16401: archive log rejected by Remote File Server (RFS)

ORA-16055: FAL request rejected

主库切换日志过于频繁 文档 ID 1243177.1

1. Ignore these Messages as long as the Standby Database keeps synchronized with the Primary

2. Database Increase the Size of the Online Redologs to reduce Redolog Switch Frequency

3. Increase Network Bandwith between the Primary and Standby Database

ora-1882 occured when executing a select statement with dblink

Bug 16731148 ORA-1882 using TIMESTAMP literals over a database link

An unexpected ORA-1882 may be raised when using TIMESTAMP literals
across a database link connection if the NLS_NUMERIC_CHARACTERS setting
does not use a dot (".") as the decimal separator.

Doc ID 16731148.8

workaround

alter session set NLS_NUMERIC_CHARACTERS = ',.';
select timestamp '2012-07-24 07:00:00.00' from dual@somelink;

PRVG-1101 : SCAN name "cluscan.us.oracle.com" failed to resolve
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "cluscan.us.oracle.com"
PRVF-4657 : Name resolution setup check for "scanclunm" (IP address: 10.4.0.202) failed

Cause 1. SCAN name is expected to be resolved by local hosts file instead of DNS or GNS

Cause 2. nslookup fails to find record for SCAN name

Cause 3. SCAN name is canonical name(CNAME record) in DNS

文档 ID 887471.1 如果rac安装时scan name 是通过hosts文件而非DNS或GNS解析,该报错可忽略
ORA-04031: unable to allocate 65576 bytes of shared memory ("shared pool","unknown object","ktli log buf s","ktli log bufs")

一般ORA-04031错误可能由两个原因引起:

1.内存中存在大量碎片,导致在分配内存的时候,没有连续的内存可存放

2.   共享池容量不足

对于第一种原因,一般是需要在开发的角度上入手,比如增加绑定变量、缩短sql语句、减少硬解析来改善和避免

对于第二种,需要调整sga及shared_pool相关参数可能还需要扩大内存

ORA-00600: internal error code, arguments: [ktecgsc:objdchk_kcbgcur_3]

原因可能是truncate的并发或者速度太快,来不及将段头的信息更新,就进行了CR(consistent read)校验。这个校验其实是不必要的。在11.2.0.3中进行了这个CR校验,在11.2.0.4中就取消了。

SR 3-17769074971

文档 ID 2101512.1

文档 ID 2230425.1

文档 ID 2101512.1

以下选择其一:

  1. 升级至11.2.0.4或更高版本
  2. 安装补丁15974138
ORA-02063: preceding 2 lines from MES

[Oracle] 分布式事务和两阶段提交(2PC)_haiross的专栏-CSDN博客

DBA_2PC_PENDING中的分布式锁-解锁_ITPUB博客

Ocssd.Bin Process Consumes 100% CPU in only one node of the RAC OneNode environment Bug 22986384 - OCSSD threads 7,8 and 13 are using large amount of cpu

Doc ID 22986384.8

安装补丁 Patch 22986384 

ora_m001_p+ process consumed high CPU

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14932 oracle 20 0 5277220 329092 260452 R 100.0 2.0 0:29.17 ora_m001_p+
3852 root 20 0 1885036 34864 19104 S 2.0 0.2 437:20.36 ohasd.bin
5688 root 20 0 1050156 22068 14072 S 2.0 0.1 418:32.31 orarootage+
3978 root 20 0 1589596 56604 20016 S 1.0 0.4 199:10.02 orarootage+

命中bug 20636003

ORA-12751: "Cpu Time Or Run Time Policy Violation" Reported Due to DBMS_FEATURE_AWR Running Slow on 12c ( Doc ID 2095583.1 )

apply the patch 20636003

OR
A number of workarounds are available:

1)If a very high retention period is not needed, lower it to 14 days, or a week, to have less snapshots.


2)Disable Feature Usage Statistics 
alter system set "_swrf_mmon_dbfus"=false;

3)The problem query is recursive and is run as part of Dynamic Sampling Directives introduced in 12c (the /* DS_SVC */ comment identifies that this statement is from the SYS.DBMS_FEATURE_AWR package). These directives can be disabled while the Feature Usage Statistics remain enabled. In order that this expensive recursive query will not run anymore, Dynamic Sampling Directives can be disabled using the following command:
alter system set "_optimizer_dsdir_usage_control"=0;

ORA-00600: internal error code, arguments: [ktecgsc:objdchk_kcbgcur_3], [1654385], [4], [0], [0], [], [], [], [], [], [], []

从报错信息看,大部分都是在报错前执行了truncate语句。而truncate table的时候,表的object id不变,但是data object id是会变。

段头中记录的信息,和buffer cache中记录的信息,由于某些原因不一致,导致校验报错了。

原因可能是truncate的并发或者速度太快,来不及将段头的信息更新,就进行了CR(consistent read)校验。这个校验其实是不必要的。在11.2.0.3中进行了这个CR校验,在11.2.0.4中就取消了。

SR 3-17769074971

安装补丁 Patch 15974138: UNNECESSARY FORCE CR SEGMENT HEADER REQUEST
ORA-00904:"WM_CONCAT":标识符无效 12c中删除了WM_CONCAT函数,改用LISTAGG代替 文档 ID 2215183.1 改用LISTAGG代替
ORA-07445: 出现异常错误: 核心转储 [kghalp()+51] [SIGSEGV] [ADDR:0x7FC0FF4671DB] [PC:0xCD890C3] [Address not mapped to object] []  The error seems to be hit when allocating memory from the SQL Costing code as we have kkeutlCompHistActVals() -> qksshMemAllocPerm() 

Bug 21522416 - ORA-7445 [KGHALP()+51] [SIGSEGV]
SR 3-17358806271 : What caused ORA-07445:[kghalp()+51] [SIGSEGV] error

alter system set optimizer_dynamic_sampling=0; 

The parameter OPTIMIZER_DYNAMIC_SAMPLING controls the level of sampling performed by the optimizer. The only impact of setting it to 0 will be that at the run time (while an SQL is running), if there is a better plan available, it won't switch to the better plan, it will rather stick to the present plan. 

ORA-07445: 出现异常错误: 核心转储 [kkeutlCompHistActVals()+235] [SIGSEGV] [ADDR:0x7FA39F33B968] [PC:0xCA87E9B] [Address not mapped to object] []

----- Call Stack Trace -----

skdstdst <- ksedst <- dbkedDefDump <- ksedmp <- ssexhd
<- sslsshandler <- sighandler <- kkeutlCompHistActVa <- kkeh1sl_Int <- ls
<- kkehsl_Int <- kkehslCursorMatch <- kkocsPredSel <- kkscsVerifyBindEqui <- vRP
<- kkscsVerifyBindEqui <- valence <- kkscsMatchEquivLite <- rals <- kkscsCompareBinds
<- kksfbc <- opiexe <- kpoal8 <- opiodr <- ttcpip
<- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main <- libc_start_main <- start

报错语句 "select ... decode(ma.iffreesample, 'Y', '是', 'N', '否') as ifFreeSample," 中有中文逗号,可能这个中文逗号实际上是有一些不可见字符,而不可见字符无法复制,所以在您那边执行没报错而在sqlt采集重现时报错是ora-0091

<<<一般中文逗号有可能会导致ORA-07445的报错吗?
如前所述,从call stack上看这是个解析时的报错(kkscsCompareBinds、kkscsVerifyBindEquivalence、kkocsPredSel、kkehslCursorMatch这些都是跟绑定变量和谓词选择性相关的),而这个中文逗号前后的不可见字符会影响到sql解析,所以是有可能导致这个7445报错的。

SR回复:

这个ORA-07445的确没有已知的bug相匹配,但是开bug的话也要经过严格审计,首先需要排除这个中文逗号的常规语法问题才行。只能先建议您修正这个sql的语法问题再观察是否还有ORA-07445[kkeutlCompHistActVals()+235]报错。

修正中文逗号为英文,并检查前后是否有不可见字符。修改后的代码已发版,目前暂时没有遇到报错。

14:36:09

ORA-00600: internal error code, arguments: [pfri.c: pfri8: plio mismatch ], [], [], [], [], [], [], [], [], [], [], []
ORA-04061: existing state of package body "XXX" has been invalidated
ORA-04065: not executed, altered or dropped package body "XXX" 

15:00

ORA-07445: 出现异常错误: 核心转储 [kgmexec()+400] [SIGSEGV] [ADDR:0x30] [PC:0x93C9750] [Address not mapped to object] []

Error: ORA-600[pfri.c: pfri8: plio mismatch]/ORA-4061/ORA-4065
Main Stack:
pfri8_inst_spec <- pfri1_inst_spec <- pfrrun <- plsql_run <- pricar <- pricbr <- prient2
<- prient <- kkxrpc <- kporpc <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main

Error: ORA-07445 [kgmexec()+400] [SIGSEGV] [ADDR:0x30] [PC:0x93C9750] [Address not mapped to object]
Main Stack:
kgmexec <- evapls <- evaopnc <- rpiswu2 <- evaopn2 <- qeruaRowProcedure <- qerjoFetch
<- rwsfcd <- qeruaFetch <- qersoProcessULS <- qersoFetch <- qervwFetch <- qercoFetch
<- opifch2 <- opifch <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main

SR回复:

They do not appear at the same time, I do not see them being related. It could be related to the invalid objects, but it's not the same issue.
Please compile the invalid objects, especially the ones under APPS schema and check if the ORA-07445 is still seen. If the error is still seen then it will have to be investigated separately.

the timeline is: package APPS.OE_ORDER_UTIL is invalid -> ORA-00600[pfri.c: pfri8: plio mismatch] -> recompilation of the invaild object -> library cache pin wait event

无相关文档,SR回复这两个bug应无关联,应与invalid objects相关

Seems that package "XXX" is invalid. Please recompile the package and try again.

alter package "XXX" compile;
alter package "XXX" compile body;

ORA-28040: 没有匹配的验证协议 9i clients are not supported with Oracle Database 12

文档 ID 2111118.1

java.sql.SQLException: ORA-28040: 没有匹配的验证协议_ddd306的专栏-CSDN博客

以下选择其一:

1. 升级jdbc至11g或12c(官方文档推荐)

2. 在sqlnet.ora文件的最后添加SQLNET.ALLOWED_LOGON_VERSION=8

ORA-08102:index key not found,obj#57848,file 6, block 6324(2)

ORA-08102这种错误说明索引或表出现了数据不一致的,索引上记录的键值和表里的数据不一致,引起访问失败

遇到的情况是在online创建索引过程中恰好碰上了CLEANUP_ONLINE_IND_BUILD job 运行。可能是online建索引的时候,会创建一些临时extent,但是CLEANUP_ONLINE_IND_BUILD会清理掉。导致create index online虽然显示是创建成功的,但是实际索引是有问题的。
后续查询的时候,去找这些被job清理的extent去了。所以就找不到索引的对象。

查看官方文档这个报错还可能和bug 21532755 有关

文档 ID 8102.1

ORA-08102 错误解决_数据库技术_Linux公社-Linux系统门户网站

文档 ID 21532755.8

Online Index Rebuild Problem in 12C/18C/19C | Neil Chandler's DB Blog

法一

SELECT owner, object_name, object_type
FROM Dba_Objects WHERE object_id IN (57848);

可以查出object_name的名字,然后重建索引

alter index PK_TB_WARE rebuild online; (记得要用rebuild online ,因为他会重新读表来创建索引,而rebuild可能会读取原先的索引段而不会去读表)

注意,这个时候千万不要人为终止,否则会遇到ora-08104

如果还不能解决,就删除重新创建

drop index PK_TB_WARE;

create index PK_TB_WARE on tb_ware(id);

法二

analyze table t validate structure cascade

检查表里的行数据的完全性,并检查表或者索引的结构,并把分析过的结果写入INDEX_STATS 数据字典中

控制文件备份遇到报错

ORA-00230: operation disallowed: snapshot control file enqueue unavailable

控制文件备份需要持有查看持有CF enqueue,但该锁目前被其他会话持有(一般会是其他rman进程或者CKPT进程) http://ju.outofmemory.cn/entry/179736

查看持有CF enqueue会话

SELECT s.SID, USERNAME AS "User", PROGRAM, MODULE, ACTION, LOGON_TIME "Logon" FROM V$SESSION s, V$ENQUEUE_LOCK l WHERE l.SID = s.SID AND l.TYPE = 'CF'  -- AND l.ID1 = 0 AND l.ID2 = 2;

根据查到的进程信息判断是kill掉还是等待当前持锁会话运行完

ORA-00600 [KTSLU_PUA_REMCHK-1]

导致主从同步中断

The assert is raised when trying to apply Redo for Secure LOB Segment. Redo was generated with changes introduced by fix for Bug:22905136. This fix was included in 12.1.0.2.170418DBBP. The error is generated because redo generated is not compatible with environments running on a release lower than  12.1.0.2.170418 DBBP or without patch:22905136 installed.

遇到的情况是主库打了12.1.0.2.170418 DBBP而从库没打,两边小版本不一致

ORA-00600:[KTSLU_PUA_REMCHK-1] Could be generated after Applying April 2017 Database Bundle Patch (12.1.0.2.170418 DBBP) (文档 ID 2267842.1)

两边小版本必须一致,要么没打补丁的库打补丁,要么打了补丁的库回滚;一般第一种比较推荐

ORA-07445[pesld10_Undo_XREF_Instance()+23]

This issue is caused by a product defect.

It was investigated in:

     Bug 13554646 - ORA-7445 [PESLD10_UNDO_XREF_INSTANCE()+60]

which was ultimately closed as a duplicate of unpublished Bug 13429702.

Error in the Alert Log: ORA-7445[pesld10_Undo_XREF_Instance()+4] (文档 ID 1456810.1)

法一:升级到已修复版本

法二:打补丁Patch 13429702 

ORA-00600: internal error code, arguments: [qcscbndv1], [65535] sql中绑定变量数超过oracle上限65535,在11.2前该报错为ORA-7445[opiaba],遇到的报错sql绑定变量超过12万个 ORA-600[qcscbndv1], [65535, ORA-600[Kghssgfr2], ORA-600[17112] Instance Failure (文档 ID 1311230.1) 请业务方改sql
Error connecting to database ERP: ORA-10 EBS专有报错,APPLSYSPUB用户状态为EXPIRED(GRACE)或EXPIRED时会出现

Configuring Authentication

【DB笔试面试543】Oracle用户的状态有几种?分别表示什么含义? - 云+社区 - 腾讯云

Concurrent Processing R12 : Unable To View The Concurrent Request Output In A Browser (Doc ID 1121304.1)

修改用户密码,最简单是设回原密码
ORA-00600[kkqctViewOnLeftJPPD: No View QB found!] 与 Optimizer Join Push Predicate (JPPD) 有关,在11.2.0.3版本遇到 Bug 12820813 - ORA-600 [kkqctViewOnLeftJPPD: No View QB found!] during JPPD (Doc ID 12820813.8)

法一:打对应补丁

法二(workaround):alter session set "_optimizer_push_pred_cost_based"=FALSE;

二、 SQL Server 报错记录

报错号 原因 参考 建议
eventlog:SSPI handshake failed with error code 0x8009030c, state 14 while establishing a connection with integrated security; the connection has been closed. Reason: AcceptSecurityContext failed. The Windows error code indicates the cause of failure. The logon attempt failed   [CLIENT: 192.168.1.3]

192.168.1.3中有账号连到数据库时登录验证失败。

常见于192.168.1.3中曾用A用户(遇到的情况是域账号)连接到数据库,后来A用户密码被修改,但192.168.1.3中仍在使用旧密码尝试连接,导致登录验证报错

登录192.168.1.3服务器断开以前使用A用户的连接

或者

登录数据库服务器kill掉A用户的连接

eventlog:BackupIoRequest::ReportIoError: write failure on backup device 'VNBU0-10424-14500-1538441467'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.). NBU备份服务器内存不足导致备份失败 一般下次备份时会成功,如果不成功可以手动发起

Error 8623

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

sql语句过于复杂,例如in中有太多值或者sql嵌套太多层

Explicitly including an extremely large number of values (many thousands of values separated by commas) within the parentheses, in an IN clause can consume resources and return errors 8623 or 8632.

IN (Transact-SQL) - SQL Server | Microsoft Docs

可以利用扩展事件找到对应语句

SQL Server捕获发生The query processor ran out of internal resources and could not produce a query plan...错误的SQL语句 - 潇湘隐者 - 博客园

To work around this problem, store the items in the IN list in a table, and use a SELECT subquery within an IN clause.

三、 PostgreSQL 报错记录

报错信息 原因 参考 建议

psql连接数据库遇到报错

-bash-4.2$ psql
psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

可能原因:

1. pg实例未启动

2. 安装了多个pg实例,参数文件中的 unix_socket_directories 指向 /tmp 而 psql 的 libpq 指向/var/run/postgresql/

https://dba.stackexchange.com/questions/182189/how-do-i-access-postgres-when-i-get-an-error-about-var-run-postgresql-s-pgsql

1. 启动pg实例

2. 使用 psql -h /tmp 连接,-h可以在连接时指定 unix_socket_directories

psql连接报错

psql: symbol lookup error: psql: undefined symbol: PQsetErrorContextVisibility

LD_LIBRARY_PATH环境变量配置错误,psql无法找到其libpq.so文件 postgresql - psql: symbol lookup error: psql: undefined symbol: PQsetErrorContextVisibility - Stack Overflow

find / -name libpq.so

将pg的lib库路径添加至LD_LIBRARY_PATH环境变量即可

你可能感兴趣的:(Oracle,报错&告警,SQLServer,oracle,postgresql,sqlserver,报错,告警)