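There was a time when how well someone understood SMON was an important yardstick for judging a DBA's theoretical knowledge, and many companies still ask "What does SMON do?" in DBA interviews. It is an open-ended question: nobody expects the candidate to give a complete answer (which is almost impossible, even after reading this article). How many functions the candidate can name is a measure of breadth of knowledge (and if the candidate prepared specifically for this question, so much the better, since it shows an ability to think about a problem systematically). The interviewer can then dig into any one function to gauge depth of knowledge. But I digress.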
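The SMON we all know is a diligent worker that carries out a series of system-level tasks. Unlike the PMON (Process Monitor) background process, SMON handles more of the work that concerns the system as a whole, which means it ends up doing a lot of obscure "grunt work". When the system generates these "garbage tasks" frequently, SMON may not be able to keep up. For that reason SMON became a little lazy in 10g: if it receives too many work notifications in a short period (SMON: system monitor process posted), it may deliberately slack off so that it does not become too busy (SMON: Posted too frequently, trans recovery disabled). This is covered in detail later.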
SMON's main functions include:
1. Cleaning up temporary segments (SMON cleanup temporary segments)
Trigger scenarios
Many people misunderstand what "temporary segments" means here and assume it refers to the sort segments in a temporary tablespace. In fact, it mainly refers to temporary segments in permanent tablespaces. Temporary segments in temporary tablespaces are also cleaned up by SMON, but that cleanup happens only at instance startup.
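Permanent tablespaces contain temporary segments too. For example, when we create a table or index in a permanent tablespace with a DDL command such as create table/index, the server process first allocates enough extents in that tablespace. These extents remain temporary (Temporary Extents) until the command finishes; only when the table or index has been fully built is the temporary segment converted into a permanent segment.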
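Likewise, when a segment is dropped with the drop command, it is first converted into a temporary segment and that temporary segment is then cleaned up (DROP object converts the segment to temporary and then cleans up the temporary segment). Normally, cleanup follows the rule that whoever created the temporary segment is responsible for cleaning it up. In other words, a temporary segment created by a server process during an index rebuild should be cleaned up by that same server process once the rebuild completes. If the server process terminates unexpectedly before it has cleaned up the temporary segment, or if it hits an ORA- error that makes the statement fail, SMON is posted to finish the cleanup.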
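For temporary segments in permanent tablespaces, SMON performs a cleanup every three minutes (provided it has been posted). If SMON is too busy, temporary segments may go uncleaned for a long time. A typical problem this causes: after a rebuild index online fails, any later rebuild index command requires that the temporary segment left by the failed attempt has already been cleaned up, so it has to keep waiting until that cleanup completes. In 10gR2 we can use dbms_repair.online_index_clean to clean up the leftovers of an online index rebuild manually: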
The dbms_repair.online_index_clean function has been created to cleanup online index rebuilds.
Use the dbms_repair.online_index_clean function to resolve the issue.
Please note if you are unable to run the dbms_repair.online_index_clean function it is due to the fact
that you have not installed the patch for Bug 3805539 or are not running on a release that includes this fix.
The fix for this bug is a new function in the dbms_repair package called dbms_repair.online_index_clean,
which has been created to cleanup online index [[sub]partition] [re]builds.
New functionality is not allowed in patchsets;
therefore, this is not available in a patchset but is available in 10gR2.
Check your patch list to verify the database is patched for Bug 3805539
using the following command and patch for the bug if it is not listed:
opatch lsinventory -detail
Cleanup after a failed online index [re]build can be slow to occur, preventing subsequent such operations
until the cleanup has occurred.
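As a minimal sketch of what such a manual cleanup might look like, the anonymous PL/SQL block below calls dbms_repair.online_index_clean with its documented default arguments written out explicitly. It assumes a release that includes the function, a suitably privileged session (for example SYS), and that cleaning every index rather than a single object_id is what you want; the output messages are illustrative only.

```sql
SET SERVEROUTPUT ON
DECLARE
  cleaned BOOLEAN;
BEGIN
  -- ALL_INDEX_ID targets every index that needs cleanup;
  -- LOCK_WAIT keeps retrying to get the lock on the underlying object.
  -- Both are the package defaults, spelled out here for clarity.
  cleaned := DBMS_REPAIR.ONLINE_INDEX_CLEAN(
               object_id     => DBMS_REPAIR.ALL_INDEX_ID,
               wait_for_lock => DBMS_REPAIR.LOCK_WAIT);

  IF cleaned THEN
    DBMS_OUTPUT.PUT_LINE('online index cleanup completed');
  ELSE
    -- FALSE means at least one index could not be cleaned up
    DBMS_OUTPUT.PUT_LINE('cleanup incomplete; retry later');
  END IF;
END;
/
```

To clean a single index instead, pass its object id (from DBA_OBJECTS) as object_id; DBMS_REPAIR.LOCK_NOWAIT can be used when the call should not wait for the lock.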
Next, let's see in practice how SMON cleans up temporary segments in a permanent tablespace:
Set event 10500 to trace the SMON process (this diagnostic event is described later):
SQL> alter system set events '10500 trace name context forever,level 10';
System altered.
In the first session, run a create table command, which generates a number of Temporary Extents:
SQL> create table smon as select * from ymon;
In another session, query the DBA_EXTENTS view to see how many temporary extents have been created:
SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
  COUNT(*)
----------
       117
Terminate the session running the create table above, wait a while, and then look at the trace file of the SMON background process, which shows the following:
*** 2011-06-07 21:18:39.817
SMON: system monitor process posted msgflag:0x0200 (-/-/-/-/TMPSDROP/-/-)
*** 2011-06-07 21:18:39.818
SMON: Posted, but not for trans recovery, so skip it.
*** 2011-06-07 21:18:39.818
SMON: clean up temp segments in slave
SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
  COUNT(*)
----------
         0
As the trace shows, SMON cleaned up the temporary segment through a slave process.
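Unlike temporary segments in permanent tablespaces, extents in a temporary tablespace are, for performance reasons, not released and returned as soon as an operation completes. Instead, these Temporary Extents are marked as available so that the next sort operation can reuse them. SMON still cleans up these temporary segments, but only at instance startup: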
For performance issues, extents in TEMPORARY tablespaces are not released or deallocated once the operation is complete. Instead, the extent is simply marked as available for the next sort operation. SMON cleans up the segments at startup. A sort segment is created by the first statement that used a TEMPORARY tablespace for sorting, after startup. A sort segment created in a TEMPORARY tablespace is only released at shutdown. The large number of EXTENTS is caused when the STORAGE clause has been incorrectly calculated.
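The reuse of sort-segment extents described above can be observed in the V$SORT_SEGMENT view. The query below is a small sketch, not part of the original walkthrough; it assumes the session has SELECT access to the V$ views and uses only standard columns of that view.

```sql
-- One row per sort segment (per temporary tablespace) in this instance.
-- FREE_EXTENTS counts extents that are allocated to the sort segment
-- but not currently in use, i.e. available for the next sort.
SELECT tablespace_name,
       current_users,
       total_extents,
       used_extents,
       free_extents
FROM   v$sort_segment;
```

If the description above holds, TOTAL_EXTENTS should stay the same after a large sort finishes, while USED_EXTENTS drops and FREE_EXTENTS rises.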
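Symptoms
The following query returns the total number of Temporary Extents in the database. Compare the total over a period of time; if it is decreasing, SMON is cleaning up temporary segments: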
SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
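When the total alone is not enough, a breakdown by tablespace and segment can show where the leftover temporary extents live. This is a hedged variation on the query above using standard DBA_EXTENTS columns; the column aliases are illustrative.

```sql
-- Break the temporary extents down by tablespace and segment,
-- largest first, to see which leftovers SMON still has to clean up.
SELECT tablespace_name,
       owner,
       segment_name,
       COUNT(*)                          AS extent_count,
       ROUND(SUM(bytes) / 1024 / 1024, 1) AS size_mb
FROM   dba_extents
WHERE  segment_type = 'TEMPORARY'
GROUP  BY tablespace_name, owner, segment_name
ORDER  BY extent_count DESC;
```

Running it a few minutes apart and comparing the rows gives the same signal as the simple count, but per segment.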
You can also check the "SMON posted for dropping temp segment" statistic in the v$sysstat view to see how often SMON has been asked to clean up:
SQL> select name,value from v$sysstat where name like '%SMON%';
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
total number of times SMON posted                                         8
SMON posted for undo segment recovery                                     0
SMON posted for txn recovery for other instances                          0
SMON posted for instance recovery                                         0
SMON posted for undo segment shrink                                       0
SMON posted for dropping temp segment                                     1
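In addition, SMON holds the Space Transaction (ST) enqueue for long periods during the cleanup, so other sessions may time out waiting for the ST lock and fail with ORA-01575: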
01575, 00000, "timeout waiting for space management resource"
// *Cause: failed to acquire necessary resource to do space management.
// *Action: Retry the operation.
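How to stop SMON from cleaning up temporary segments
You can disable SMON's cleanup of temporary segments (disable SMON from cleaning temp segments) by setting the diagnostic event event='10061 trace name context forever, level 10'.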
alter system set events '10061 trace name context forever, level 10';
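Since the event stays in effect until it is switched off or the instance restarts, it is worth knowing how to reverse it. The statement below is a sketch using the standard event syntax; it assumes the event was set dynamically with ALTER SYSTEM as above rather than through the event parameter in the parameter file.

```sql
-- Turn event 10061 off again so that SMON resumes cleaning temp segments.
-- If the event was set in the parameter file instead, remove it there
-- and restart the instance.
alter system set events '10061 trace name context off';
```

The same "off" form can be used for other events that were set with "forever".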
Related diagnostic events
Besides event 10061, event 10500 can be used to trace SMON's post information. For how to set this event, see <EVENT: 10500 "turn on traces for SMON">