oracle等待事件3——高速缓冲内enq锁

6、 enq:TC-contention

在手动执行检查点操作中,一部分需要获得TC锁(thread checkpointlock 或 tablespace checkpointlock )在获得TC锁过程中,若发生争用,则需要等待enq:TC-contention 事件。事实上获得TC锁的过程稍微复杂。

1) 服务器进程首先以X模式获得TC锁

2) 服务器进程将已获得的TC锁变更为SSX模式。同时,CKPT进程以SS模式获得该锁。CKPT获得锁后执行检查点操作。

3) 服务器欲重新以X模式获得TC锁,等待CKPT释放该锁,这时的等待事件就是enqueue:TC-contention

4) 检查点工作结束后,CKPT进程将会释放TC锁,服务器进程就会获得TC锁,因此得知检查点工作已经结束。


Enq:TC-contention 等待即便在没有多个进程引起争用的情况下,也可以发生,在这一点上与其他锁争用引起的等待现象不同。需要理解的是在等待现象中,存在只有争用才能引发的等待现象,但是也存在不发生争用,也会单纯为了等待工作结束而等待的情况。。

发生检查点的情况虽然很多,但不是所有的情况都会发生TC锁引起的等待,之后再进程由服务器进程引发的检查点同步过程中发生。

enq:TC-contention 等待发生的代表案例如下:并行查询   和 表空间热备

1:、并行查询(parallel query)

pq发生检查点的原因是slave session引起的direct path read。这就是所谓的“直接路径读”,它不经过高速缓冲区直接读取数据文件。oracle在如下三种情况下使用direct path read(也叫physical read direct) 方式的读取。

(1)内存区域上不能完成排序工作时,会在临时段的区域里存储和读取的过程中,发生direct path write ,direct path read 。这时的等待事件可以通过direct path read temp、direct path write temp 观察。

(2)slave session(从属会话)为了扫描直接读取的数据文件时,使用direct path read 。这时等待事件通过direct path read 事件观察。

(3)若判断是因为I/O系统的性能下降,导致不能将以足够快的速度读取,oracle为了临时方便会使用direct path read。


slave session执行direct path read 对象时数据文件,从数据文件上直接读取数据时,因为不经过SGA,所以可能发生当期SGA上的块和数据文件上的块之间版本不一致的现象,为了防止这些现象,oracle对数据文件执行direct path read 之前,应该执行检查点。coordinate session在驱动slave session之前,对于执行direct path read现象,请求段级别的检查点,检查点发生之前一直处于enq:TC-contention等待事件状态。coordinate session上可以发现enq:TC-eontention等待,slave session上则可以发现direct path read 等待。

2、表空间热备份(tablespace hot backup)

执行alter tablespace 。。begin backup后,将属于此表空间的所有高速缓冲区的脏数据记录到磁盘上,这个过程经历enq:TC-contention 等待。



7、enq:CI-contention 和 enq:RO-contention


“Cross Instance call Enqueue”是一种在一个或多个instance实例间调用后台进程行为时用到的队列锁,具体调用的后台进程行为包括检查点checkpoint、日志切换logfile switch、shutdown实例、载入数据文件头等等。需要注意的是这种Enqueue Lock并不仅仅在RAC中使用,即便是单节点也会用到。CI锁的数量取决于并行执行Cross Instance Call调用的进程的总数。

SQL> col ksqsttyp for a20
SQL> col ksqstrsn for a20
SQL> col ksqstexpl for a80
SQL> set linesize 200 pagesize 2000;
SQL> select ksqsttyp,ksqstrsn,ksqstexpl from x$ksqst where ksqsttyp='CI';

KSQSTTYP             KSQSTRSN             KSQSTEXPL
-------------------- -------------------- --------------------------------------------------------------------------------
CI                   contention           Coordinates cross-instance function invocations

SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     FALSE
cluster_database_instances           integer     1


SQL> select * from v$enqueue_stat where eq_type='CI';

   INST_ID EQ TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
---------- -- ---------- ----------- ---------- ----------- -------------
         1 CI        595           0        595           0             0

当系统中出现有大量这种跨实例后台进程调用时,将出现CI队列锁的争用。

假设在一个RAC场景中,同时有大量的回话开始对不同的数据表执行TRUNCATE截断操作,TRUNCATE的一个前提是在所有实例上(因为对象表的dirty buffer可能分布在多个实例上)发生对象级别的检查点(object level checkpoint),检查点发生时CKPT进程会通知DBWR写出指定对象表相关的脏块,DBWR需要扫描Buffer Cache以找出脏块,而如果Buffer Cache很大那么扫描将花费大量的时间,而在此过程中前台进程将一直排他地持有着本地的CI队列锁,这就将造成CI锁的严重争用。

为了减少CI队列锁地争用,我们第一步所要做的是找出实际的Cross Instance call跨实例调用的类型。这里要另外提一下的是在10g以前不管是v$session_wait或statspack中都不会将enqueue锁等待事件的具体enqueue lock类型写明,一般需要我们从p1/p2/p3列中找出enqueue的具体身份,例如”WAIT #1: nam=’enqueue’ ela= 910796 p1=1128857606   p2=1   p3=4″,这里的p1为1128857606也就是16进制的43490006,高位的’4349′转换为ascii码也就是’CI’,而这里的p2/p3对应为V$lock中的ID1/ID2,ID1=1代表了”Reuse (checkpoint and invalidate) block range”,ID2=4代表了”Mounted excl, use to allocate mechanism”。

具体ID1/ID2代表的含义在不同版本中有所变化,可以参考下表:

Id1, Id2 Combination: Oracle 10gR1 27 TO 29 *Same as 9i R2 30 process waiters after row cache requeue 31 Active Change Directory extent relocation 32 block change tracking state change 33 kgl mulitversion obsolete 34 set previous resetlogs data 35 set recovery destination pointer 36 fast object reuse request 37 test ksbcic() 38 ASM diskgroup discovery wait 39 ASM diskgroup release 40 ASM push DB updates 41 ASM add ACD chunk 42 ASM map resize message 43 ASM map lock message 44 ASM map unlock message (phase 1) 45 ASM map unlock message (phase 2) 46 ASM generate add disk redo marker 47 ASM check of PST validity 48 ASM offline disk CIC 49 Logical Standby Sync Point SCN 50 update SQL Tuning Base existence bitvector 51 PQ induced Checkpointing 52 ASM F1X0 relocation 53 Scheduler autostart 54 KZS increment grant/revoke counter 55 ASM disk operation message 56 ASM I/O error emulation 57 DB Supp log cursor invalidation 58 Cache global range invalidation 59 Cache global object invalidation 60 ASM Pre-Existing Extent Lock wait 61 Perform a ksk action through DBWR 62 ASM diskgroup refresh wait Oracle 10gR2 30 to 62 *Same as 10gR1 63 KCBO object checkpoint 64 KCBO object pq checkpoint 65 global health check event 66 Oracle Label Security refresh 67 thread internal enable 68 cross-instance registration 69 KGL purge unused subheaps 70 clear pin instance flag 71 Rolling operations CIC Oracle 9iR2 Id1 Meaning ~~~ ~~~~~~ 25 TO 26 *Same as 9i R1 27 set Database Force Logging mode 28 invalidate cached file address translations 29 Cursor Unauthorize Mode 30 snapshot too old diagnosis 31 process waiters after row cache requeue Oracle 9iR1 Id1 Meaning ~~~ ~~~~~~~~ 18 TO 24 *Same as Oracle 8i 25 Update Dscn Tracking (ktcndt) 26 Purge dictionary Object number Cache Oracle 8i Id1 Meaning ~~~ ~~~~~~~ 0 TO 17 *Same as Oracle 8(please see "Oracle8*" for the Meaning) 18 Object reuse request 19 Rolling release checks 20 Propagate begin backup scn for a file 21 Refresh top plan (for db scheduler) 22 Clear checkpoint progress record 23 Drop temp file 24 Quiesce database Restricted Id2 Meaning ~~~ ~~~~~~~ 0x01 Used to pass in parameters 0x02 Used to invoke the function in backgroud process 0x03 Used to indicate the foreground has not returned 0x04 Mounted excl, use to allocate mechanism 0x05 Used to queue up interested clients Oracle 8* Id1 Meaning ~~~ ~~~~~~~ 0 Checkpoint block range 1 Reuse (checkpoint and invalidate) block range 2 LGWR Checkpointing and Hot Backup 3 DBWR syncronization of SGA with control file 4 Log file add/drop/rename notification 5 Write buffer for CR read 6 Test call 7 Invalidate KCK cache in all instances 8 Alter rollback segment optimal 9 Signal Query Servers/coordinator 10 Create Remote parallel query Server 11 Set Global Partitions 12 Stop Disk Writes 13 Drop Sort Segments 14 Release unused space from Sort Segments 15 Instance Recovery for Parallel operation Group 16 Validate parallel slave Lock Value 17 Check transaction state objects 18 Flush blocks in object 19 Rolling release checks 20 Propagate begin backup scn for a file 21 Clear checkpoint progress record Oracle 7 Id1 Meaning ~~~ ~~~~~~~ 0 Flush buffers for reuse as new class 1 LGWR checkpointing and Hot Backup 2 DBWR synchronization of SGA with control file 3 Log file add/drop/rename notification 4 Write buffer for CR read 5 Test Call 6 Invalidate KCK cache in all instances 7 Alter rollback segment optimal 8 Signal Query Servers/coordinator 9 Create Remote Parallel Query Server 10 Set Global Partitions 11 Stop Disk Writes 12 Drop Sort Segments 13 Release unused space from Sort Segments 14 Instance Recovery for Parallel operation Group 15 Validate parallel slave Lock Value 16 Check Transaction State Objects Id2 Meaning ~~~ ~~~~~~~ 1 Pass in Parameters 2 Invoke the call in background process 3 Foreground has not returned yet 4 Used to allocate the CI call 5 Used to queue up interested clients 


enq: RO - fast object reuse 等待事件

查了一下这个等待,出现这个等待比较高的情况一般都有异常:
1.truncate表或者分区表时
2.收集统计信息采用degree>1时
这个event表示在等待DBWR to clean cache.
出现异常的时候症状:the CKPT background process is the one holding the needed RO enqueue although it is actually doing nothing.

Bug:7385253.

这个wait event表示在等待DBWR to clean cache.

如果要优化这个问题,需要综合考虑,比如减少cache size,增加dbwr process或减少MTTR等等。

Is it still locked? 
For some reason the truncate is still waiting for CKPT process. When you truncate or drop a table, CKPT does a range flush of the db_cache_size, which seems to be completed according to your alert_log. 
That was an issue in 9i-10g. This looks like bug 4201369 which is supposed to be fixed in 10.1.0.5. I will suggest you open a tar on this! They will make you do a hanganalysis which should clarify the issue.


该等待事件多与bug 相关

 

2.1 Bug 1Bug 7385253

 

Bug 7385253 - Slow Truncate / DBWR useshigh CPU / CKPT blocks on RO enqueue [ID 7385253.8]

 

Affects:

Product (Component)

Oracle Server (Rdbms)

Range of versions believed to be affected

Versions >= 10 but BELOW 11.2

Versions confirmed as being affected

  • 11.1.0.7
  • 10.2.0.4
  • 10.2.0.3
  • 10.2.0.2

Platforms affected

Generic (all / most platforms affected)

Fixed:

This issue is fixed in

  • 11.2.0.1 (Base Release)
  • 11.1.0.7.3 (Patch Set Update)
  • 10.2.0.5 (Server Patch Set)
  • 10.2.0.4.1 (Patch Set Update)
  • 11.1.0.7 Patch 25 on Windows Platforms
  • 10.2.0.4 Patch 14 on Windows Platforms
  • 10.2.0.4 RAC Recommended Patch Bundle #3
  • 10.2.0.4 Generic Recommended Patch Bundle #3

 

 

该Bug的3个表现:

(1)    Hang(Involving Shared Resource)

(2)    PerformanceAffected (General)

(3)    Waits for "enq:RO - fast object reuse"

 

 

DBWR may use alot of CPU and seem to spin in or around kcbo_write_qdue to large number offree buffers on the object reuse queue or checkpoint queue.

 

In some casesthe CKPT holds the RO enqueue for very long blocking other operations  with waitevent "enq: RO - fast objectreuse".

 

Operations so farreported being affected are :

- Apply Processes in StandBy databases

- Gather stats

- Truncates

- drop/shrink/alter tablespace

 

Note: This fix was previously incorrectlylisted as not affecting 11g.

     The bug itself is present in 11g but it is unlikely to show anysignificant symptom due to other 11g changes meaning that free buffers are nolonger kept on the object queue.

 

对与该Bug 的解决方法:

setting _db_fast_obj_truncate=FALSE <--did not fix the issue
enabling asyn i/o <-- customer refused to implement to avoid corruptionsrisk
applying 7287289 <-- did not fix the issue

 

2.2 文档二

 

'enq: RO - fastobject reuse' contention when gathering schema/table statistics in parallel [ID762085.1]

 

 

Symptoms

(1)Database has been recently upgradedfrom 10.2.0.1 to 10.2.0.4.
(2)There is 'enq: RO - fastobject reuse' contention when gathering schema/table statistics in parallelusing DBMS_STATS package (with DEGREE>1).

 

其也是因为Bug 7385253导致这个问题。

 

解决方法:

1) Flushing the buffer cache.
OR
2) Setting "_db_fast_obj_truncate" =FALSE. This reverts back to the9i way of invalidating buffers in the buffer cache. 

Kindly note thatboth workarounds could have an impact on the database performance. Instead, itis recommended applying the corresponding patch.

--2种解决方法对db 性能都有很大影响,建议应用合适的patch

 

2.3 文档三

Bug8544896 - Waits for "enq: RO - fast object reuse" with high DBWR CPU[ID 8544896.8]

 

Affects:

Product (Component)

Oracle Server (Rdbms)

Range of versions believed to be affected

Versions >= 10.2.0.4 but BELOW 10.2.0.5

Versions confirmed as being affected

  • 10.2.0.4

Platforms affected

Generic (all / most platforms affected)


 It is believed to be a regression in default behaviour thus:
   Regression introduced in 10.2.0.4

Fixed:

This issue is fixed in

  • 10.2.0.4.3 (Patch Set Update)
  • 10.2.0.4 Patch 27 on Windows Platforms

 

 

This problem is introduced in 10.2.0.4.

 

Sessions can wait on "enq: RO - fastobject reuse" while DBWR consumes lots of CPU when performing truncatetype operations.

 

 

Workaround:

(1)Flush the buffer cache beforetruncating

 OR

(2) set _db_fast_obj_truncate = FALSE.

 

 

我这里出现这2个等待事件都与Truncate 操作有关。

你可能感兴趣的:(oracle等待事件3——高速缓冲内enq锁)