RESMGR:cpu quantum等待事件处理过程

阅读更多
由于数据库上线过程中出现大量的RESMGR:cpu quantum等待事件,出现性能问题,关闭了resource manager功能,关闭过程如下:
ALTER SYSTEM SET “_resource_manager_always_on”=FALSE SCOPE=SPFILE SID='*';

   execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
   execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
   execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
   execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
   execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
   execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
   execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

#4 重启数据库使生效

重启后,resource manager 关闭,数据库不在出现RESMGR:cpu quantum等待事件。
但是数据库自动任务计划调度开始报错,后台报错如下:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186"
ORA-29373: resource manager is not on

视图检查报错:
SQL> select client_name,window_name,job_name,job_status,job_start_time from dba_autotask_job_history ;

CLIENT_NAME          WINDOW_NAME          JOB_NAME                       JOB_STATUS                     JOB_START_TIME
-------------------- -------------------- ------------------------------ ------------------------------ ----------------------------------------
auto optimizer stats WEDNESDAY_WINDOW     ORA$AT_OS_OPT_SY_102           FAILED                         09-APR-14 10.00.01.582006 PM PRC
collection

auto optimizer stats FRIDAY_WINDOW        ORA$AT_OS_OPT_SY_105           FAILED                         11-APR-14 10.00.06.999466 PM PRC
collection

auto optimizer stats TUESDAY_WINDOW       ORA$AT_OS_OPT_SY_99            FAILED                         08-APR-14 10.00.07.885416 PM PRC
collection


三 前期故障排查
为了确认故障原因,我们对该报错进行打开29373 errorstack,收集了报错中的详细跟踪日志:
后台alert报错如下:
Sun May 04 06:00:04 2014
Dumping diagnostic data in directory=[cdmp_20140504060004], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination].
Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:
ORA-29373: resource manager is not on
Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_271"
ORA-29373: resource manager is not on
Dumping diagnostic data in directory=[cdmp_20140504060009], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination].


详细的trace跟踪报告如下:
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-29373: resource manager is not on
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

从trace文件中我们也只能确认是因为resource manager造成JOB无法调度

不过在检测中我们发现,管理resource manager的进程DBRM依旧在数据库中存在,
#ps –ef | grep dbrm
oracle 34920 1 0 Apr18 ? 00:00:59  ora_dbrm_nfdb1


这从一定程度上给予我们一定的怀疑方向,可能存在resource并没有完全关闭,而且从详细的trace文件中的从call stack trace里来看, 当前进程也没有没有关于resource manager的函数调用,而是一直在向另外一个进程post message。
所以,我们怀疑错误很有可能是由于DBRM没有正常关闭造成。

现对以上分析结果进行判断,获取可能依据

四 故障模拟

故障模拟一共分为如下几个部分:
步骤 操作 结果
Step1 和故障操作一样,设置“_resource_manager_always_on隐含参数,关闭resource manager windows 调用计划 后台报错
Step2 删除隐含参数,只设置resource manager windows 调用计划关闭 后台不报错
Step3 添加2个隐含参数,关闭resource manager windows 调用计划 后台不报错

详细测试过程如下:
测试我们采用将数据库时间设置到22点,让其自动调度JOB计划
4.1 模拟管理库故障环境
关闭数据库,调整时间,设置隐含参数,关闭windows 计划
关闭数据库:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

调整时间:
[root@rhel6 ~]# date -s 2014/05/14
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Wed May 14 21:57:35 CST 2014

添加隐含参数,启动至open:
SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
SQL> startup force(测试环境,直接force启动,生产环境勿如此操作)

设置resource manager plan:
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

我们观察22点的alert信息,确实开始报错:
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27"
ORA-29373: resource manager is not on
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28"
ORA-29373: resource manager is not on
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29"
ORA-29373: resource manager is not on
Wed May 14 22:00:04 2014
XDB installed.
XDB initialized.

检查DBRM进程已经存在:
[root@rhel6 ~]# ps -ef | grep dbrm
ora11g    4346     1  0 21:57 ?        00:00:00 ora_dbrm_ora11g


检查后台JOB执行记录视图:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;

CLIENT_NAME                         WINDOW_NAME                    JOB_NAME                       JOB_STATUS                     JOB_START_TIME
----------------------------------- ------------------------------ ------------------------------ ------------------------------ ----------------------------------------
auto optimizer stats collection     WEDNESDAY_WINDOW               ORA$AT_OS_OPT_SY_27            FAILED                         14-MAY-14 10.00.03.233570 PM PRC

模拟过程与生产管理库一致,结果也一致,DBRM进程已经存在。


4.2 模拟去除隐含参数
关闭数据库,调整时间,去除隐含参数,关闭windows 计划
关闭数据库:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

调整时间:
[root@rhel6 ~]# date -s 2014/05/14
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Wed May 14 21:57:35 CST 2014

去除隐含参数,启动至open:
隐含参数去除采用create pfile from spfile;
删除spfile,编辑pfile文件,删除隐含参数,以pfile启动数据库

设置resource manager plan:(由于前面已经设置过,无需再设置)

我们观察22点的alert信息,发现没有报错
Tue May 13 22:00:03 2014
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Tue May 13 22:00:05 2014
XDB installed.
XDB initialized.


检查 DBRM进程:
[root@rhel6 ~]# ps -ef | grep dbrm
ora11g    3844     1  0 21:55 ?        00:00:00 ora_dbrm_ora11g

说明:此时resource manager由于只是关闭了resource manager plan计划,没有真正关闭resource manager 因此该进程依旧存在。

检查后台JOB执行视图信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;

CLIENT_NAME                              WINDOW_NAME                    JOB_NAME                  JOB_STATUS           JOB_START_TIME
---------------------------------------- ------------------------------ ------------------------- -------------------- ----------------------------------------
auto optimizer stats collection          TUESDAY_WINDOW                 ORA$AT_OS_OPT_SY_24       SUCCEEDED            13-MAY-14 10.00.02.102741 PM PRC

说明在隐含参数除掉的情况下,JOB可以正常执行,后台没有报错。

4.3 模拟隐含参数及resource manager plan均存在的情况
关闭数据库,调整时间,添加隐含参数,关闭windows 计划
关闭数据库:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
调整时间:
[root@rhel6 ~]# date -s 2014/05/15
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Thu May 15 21:57:35 CST 2014
添加隐含参数,启动至open:
SQL> alter system set "_resource_manager_always_off"=true scope=spfile;
SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
SQL> startup force(测试环境,直接force启动,生产环境勿如此操作)
设置resource manager plan:(由于前面已经设置过,无需再设置)

我们观察22点的alert信息,发现没有报错
Thu May 15 22:00:03 2014
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
Thu May 15 22:00:05 2014
XDB installed.
XDB initialized.
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

检查 DBRM进程:
[root@rhel6 ~]# ps -ef | grep dbrm
root      4650  3647  0 21:58 pts/2    00:00:00 grep dbrm

说明:此时DBRM进程消失

检查后台JOB执行视图信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;

CLIENT_NAME                         WINDOW_NAME                    JOB_NAME                       JOB_STATUS                     JOB_START_TIME
----------------------------------- ------------------------------ ------------------------------ ------------------------------ ----------------------------------------
auto optimizer stats collection     THURSDAY_WINDOW                ORA$AT_OS_OPT_SY_30            SUCCEEDED                      15-MAY-14 10.00.02.115232 PM PRC

说明在隐含参数两个都添加的情况下,完全屏蔽resource manager的的情况下,JOB可以正常执行,后台没有报错。


五 总结说明
以上测试结果证明,后台报错JOB执行失败原因应该是DBRM进程依旧活动,而DBRM进程是管理RESOURCE manager 当去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE
或者将两个参数全部添加,均可避免该错误,统计信息自动收集也可以自动执行
由于管理数据库相对重要,且上线时候出现过严重的RESMGR:cpu quantum等待事件,建议不要移除隐含参数,而是添加"_resource_manager_always_off"=true隐含参数,重启数据库

当然,可以先在另外两套库上进行确认,再在管理库上进行操作

你可能感兴趣的:(RESMGR:cpu quantum等待事件处理过程)