阅读更多
由于数据库上线过程中出现大量的RESMGR:cpu quantum等待事件,出现性能问题,关闭了resource manager功能,关闭过程如下:
ALTER SYSTEM SET “_resource_manager_always_on”=FALSE SCOPE=SPFILE SID='*';
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
#4 重启数据库使生效
重启后,resource manager 关闭,数据库不在出现RESMGR:cpu quantum等待事件。
但是数据库自动任务计划调度开始报错,后台报错如下:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186"
ORA-29373: resource manager is not on
视图检查报错:
SQL> select client_name,window_name,job_name,job_status,job_start_time from dba_autotask_job_history ;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME
-------------------- -------------------- ------------------------------ ------------------------------ ----------------------------------------
auto optimizer stats WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_102 FAILED 09-APR-14 10.00.01.582006 PM PRC
collection
auto optimizer stats FRIDAY_WINDOW ORA$AT_OS_OPT_SY_105 FAILED 11-APR-14 10.00.06.999466 PM PRC
collection
auto optimizer stats TUESDAY_WINDOW ORA$AT_OS_OPT_SY_99 FAILED 08-APR-14 10.00.07.885416 PM PRC
collection
三 前期故障排查
为了确认故障原因,我们对该报错进行打开29373 errorstack,收集了报错中的详细跟踪日志:
后台alert报错如下:
Sun May 04 06:00:04 2014
Dumping diagnostic data in directory=[cdmp_20140504060004], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination].
Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:
ORA-29373: resource manager is not on
Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_271"
ORA-29373: resource manager is not on
Dumping diagnostic data in directory=[cdmp_20140504060009], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination].
详细的trace跟踪报告如下:
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-29373: resource manager is not on
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
从trace文件中我们也只能确认是因为resource manager造成JOB无法调度
不过在检测中我们发现,管理resource manager的进程DBRM依旧在数据库中存在,
#ps –ef | grep dbrm
oracle 34920 1 0 Apr18 ? 00:00:59 ora_dbrm_nfdb1
这从一定程度上给予我们一定的怀疑方向,可能存在resource并没有完全关闭,而且从详细的trace文件中的从call stack trace里来看, 当前进程也没有没有关于resource manager的函数调用,而是一直在向另外一个进程post message。
所以,我们怀疑错误很有可能是由于DBRM没有正常关闭造成。
现对以上分析结果进行判断,获取可能依据
四 故障模拟
故障模拟一共分为如下几个部分:
步骤 操作 结果
Step1 和故障操作一样,设置“_resource_manager_always_on隐含参数,关闭resource manager windows 调用计划 后台报错
Step2 删除隐含参数,只设置resource manager windows 调用计划关闭 后台不报错
Step3 添加2个隐含参数,关闭resource manager windows 调用计划 后台不报错
详细测试过程如下:
测试我们采用将数据库时间设置到22点,让其自动调度JOB计划
4.1 模拟管理库故障环境
关闭数据库,调整时间,设置隐含参数,关闭windows 计划
关闭数据库:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
调整时间:
[root@rhel6 ~]# date -s 2014/05/14
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Wed May 14 21:57:35 CST 2014
添加隐含参数,启动至open:
SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
SQL> startup force(测试环境,直接force启动,生产环境勿如此操作)
设置resource manager plan:
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');
execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
我们观察22点的alert信息,确实开始报错:
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27"
ORA-29373: resource manager is not on
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28"
ORA-29373: resource manager is not on
Wed May 14 22:00:03 2014
Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29"
ORA-29373: resource manager is not on
Wed May 14 22:00:04 2014
XDB installed.
XDB initialized.
检查DBRM进程已经存在:
[root@rhel6 ~]# ps -ef | grep dbrm
ora11g 4346 1 0 21:57 ? 00:00:00 ora_dbrm_ora11g
检查后台JOB执行记录视图:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME
----------------------------------- ------------------------------ ------------------------------ ------------------------------ ----------------------------------------
auto optimizer stats collection WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_27 FAILED 14-MAY-14 10.00.03.233570 PM PRC
模拟过程与生产管理库一致,结果也一致,DBRM进程已经存在。
4.2 模拟去除隐含参数
关闭数据库,调整时间,去除隐含参数,关闭windows 计划
关闭数据库:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
调整时间:
[root@rhel6 ~]# date -s 2014/05/14
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Wed May 14 21:57:35 CST 2014
去除隐含参数,启动至open:
隐含参数去除采用create pfile from spfile;
删除spfile,编辑pfile文件,删除隐含参数,以pfile启动数据库
设置resource manager plan:(由于前面已经设置过,无需再设置)
我们观察22点的alert信息,发现没有报错
Tue May 13 22:00:03 2014
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Tue May 13 22:00:05 2014
XDB installed.
XDB initialized.
检查 DBRM进程:
[root@rhel6 ~]# ps -ef | grep dbrm
ora11g 3844 1 0 21:55 ? 00:00:00 ora_dbrm_ora11g
说明:此时resource manager由于只是关闭了resource manager plan计划,没有真正关闭resource manager 因此该进程依旧存在。
检查后台JOB执行视图信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME
---------------------------------------- ------------------------------ ------------------------- -------------------- ----------------------------------------
auto optimizer stats collection TUESDAY_WINDOW ORA$AT_OS_OPT_SY_24 SUCCEEDED 13-MAY-14 10.00.02.102741 PM PRC
说明在隐含参数除掉的情况下,JOB可以正常执行,后台没有报错。
4.3 模拟隐含参数及resource manager plan均存在的情况
关闭数据库,调整时间,添加隐含参数,关闭windows 计划
关闭数据库:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
调整时间:
[root@rhel6 ~]# date -s 2014/05/15
[root@rhel6 ~]# date -s 21:57:00
[root@rhel6 ~]# clock -w
[root@rhel6 ~]# date
Thu May 15 21:57:35 CST 2014
添加隐含参数,启动至open:
SQL> alter system set "_resource_manager_always_off"=true scope=spfile;
SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE;
SQL> startup force(测试环境,直接force启动,生产环境勿如此操作)
设置resource manager plan:(由于前面已经设置过,无需再设置)
我们观察22点的alert信息,发现没有报错
Thu May 15 22:00:03 2014
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
Thu May 15 22:00:05 2014
XDB installed.
XDB initialized.
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
检查 DBRM进程:
[root@rhel6 ~]# ps -ef | grep dbrm
root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm
说明:此时DBRM进程消失
检查后台JOB执行视图信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME
----------------------------------- ------------------------------ ------------------------------ ------------------------------ ----------------------------------------
auto optimizer stats collection THURSDAY_WINDOW ORA$AT_OS_OPT_SY_30 SUCCEEDED 15-MAY-14 10.00.02.115232 PM PRC
说明在隐含参数两个都添加的情况下,完全屏蔽resource manager的的情况下,JOB可以正常执行,后台没有报错。
五 总结说明
以上测试结果证明,后台报错JOB执行失败原因应该是DBRM进程依旧活动,而DBRM进程是管理RESOURCE manager 当去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE
或者将两个参数全部添加,均可避免该错误,统计信息自动收集也可以自动执行
由于管理数据库相对重要,且上线时候出现过严重的RESMGR:cpu quantum等待事件,建议不要移除隐含参数,而是添加"_resource_manager_always_off"=true隐含参数,重启数据库
当然,可以先在另外两套库上进行确认,再在管理库上进行操作