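Maximum Possible SCN Value and SCN Exhaustion
The CPU (Critical Patch Update) for the first quarter of 2012 included an important change related to the SCN. According to the patch notes, under abnormal conditions the Oracle SCN can grow abnormally until every transaction in the database stops. Because the SCN cannot move backward, the database must be rebuilt before it can be used again.
The bug number is: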
13489660 - DB-10.2.0.5-MOLECULE-020-CPUJAN2012
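The fix for this bug corrects the SCN problem.
The impact of this bug is that the Oracle SCN can be advanced abnormally until it reaches its limit, at which point the database can no longer work properly and can only be rebuilt. The probability of this happening is low, however, because Oracle internally keeps SCN growth within a reasonable range: the SCN may grow by at most 16,384 (16K) per second.
Oracle records the SCN in 6 bytes, that is, 48 bits, so its upper bound is: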
SQL> col scn for 99999999999999999
SQL> select power(2,48) scn from dual;
SCN
------------------
281474976710656
--- SCN headroom: how long the SCN space lasts
Oracle internally limits SCN growth to no more than 16K per second. At that rate, the SCN space lasts about 544 years:
SQL> select power(2,48) / 16 / 1024 / 3600 / 24 / 365 from dual;
POWER(2,48)/16/1024/3600/24/365
-------------------------------
544.770078
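The same arithmetic can be reproduced outside the database. The following is a minimal sketch in Python; the constant names are illustrative and not part of any Oracle tool.

```python
# Back-of-the-envelope check of the "about 544 years" figure.
SCN_BITS = 48                        # the SCN is stored in 6 bytes
MAX_SCN_PER_SECOND = 16 * 1024       # 16K SCNs per second
SECONDS_PER_YEAR = 3600 * 24 * 365   # same 365-day year as the SQL query

scn_space = 2 ** SCN_BITS
years = scn_space / MAX_SCN_PER_SECOND / SECONDS_PER_YEAR

print(scn_space)
print(round(years, 6))
```

The two printed values correspond to the results of the two SQL queries above.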
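However, when something goes wrong, and especially when queries cross databases over a DB link, the SCNs of the databases involved are synchronized.
Databases can access each other's data through a dblink. When a transaction is committed through a dblink, the two databases hold different SCNs, so to keep the transaction consistent Oracle synchronizes on the larger of the two and updates the SCN on both ends of the dblink. If the source database suffers from an excessively high SCN generation rate, then as the workload keeps running the SCN anomaly spreads through dblinks to other related databases, and the more heavily the dblinks are used, the faster it spreads. If an enterprise has a mesh of dblinks, the SCN problem can easily propagate across the whole network and, in extreme cases, cause widespread outages.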
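The "larger SCN wins" rule and the way it spreads can be illustrated with a toy model. The sketch below is not how Oracle implements synchronization; the database names, SCN values, and link chain are all hypothetical, and the model ignores the normal SCN growth on each database.

```python
def sync_over_dblink(scns, a, b):
    """Toy model: after a call over a link, both ends hold the larger SCN."""
    higher = max(scns[a], scns[b])
    scns[a] = higher
    scns[b] = higher

# Hypothetical databases; "A" has an abnormally high SCN.
scns = {"A": 9_000_000_000, "B": 2_000_000, "C": 3_000_000, "D": 1_500_000}

# Hypothetical chain of DB links: A-B, B-C, C-D.
links = [("A", "B"), ("B", "C"), ("C", "D")]

for a, b in links:
    sync_over_dblink(scns, a, b)

print(scns)
```

Although only B is linked directly to A, after one pass over the chain every database carries A's SCN, which is the contagion effect described above.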
Now let us test how DB links interact with checkpoints and the SCN.
The test proceeds as follows:
1) Get the system SCN of the remote database
[oracle@ora11] /home/oracle> sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Oct 24 20:59:02 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select dbms_flashback.GET_SYSTEM_CHANGE_NUMBER scn from dual;
SCN
----------
2264312 --- value at the start of the test
2) Query the SCN through the DB link
--- Get the database names
set serveroutput on
set feedback off
DECLARE
r_gname VARCHAR2 (40);
l_gname VARCHAR2 (40);
BEGIN
EXECUTE IMMEDIATE 'select GLOBAL_NAME from global_name@xulq_link'
INTO r_gname;
DBMS_OUTPUT.put_line ('gname of remote:' || r_gname);
SELECT GLOBAL_NAME INTO l_gname FROM global_name;
DBMS_OUTPUT.put_line ('gname of locald:' || l_gname);
END;
/
gname of remote:XULQ
gname of locald:SDXJ
-- Get the SCN of both databases
declare
r_scn number;
l_scn number;
begin
execute immediate
'select dbms_flashback.GET_SYSTEM_CHANGE_NUMBER@xulq_link from dual' into r_scn;
dbms_output.put_line('scn of remote:'||r_scn);
select dbms_flashback.GET_SYSTEM_CHANGE_NUMBER into l_scn from dual;
dbms_output.put_line('scn of locald:'||l_scn);
end;
/
scn of remote:82921684
scn of locald: 82921684
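As we can see, after the query over the DB link the SCNs of the two databases are synchronized.
Next, run a checkpoint manually. The checkpoint SCN of the database has advanced as well: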
SQL> select dbms_flashback.GET_SYSTEM_CHANGE_NUMBER scn from dual;
SCN
----------
82921851 --- value after the jump caused by DB link access
SQL> select file#,CHECKPOINT_CHANGE# scn from v$datafile;
FILE# SCN
---------- ----------
1 82921825
2 82921825
3 82921825
4 82921825
5 82921825
6 82921825
6 rows selected.
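Cause: this mechanism exists to meet the needs of distributed transactions; here it is simply triggered through the DB link.
Current maximum SCN
The largest SCN a database can hold at the current moment is called the "maximum reasonable SCN". It can be calculated as follows: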
col scn for 999,999,999,999,999,999
select
(
(
(
(
(
(
to_char(sysdate,'YYYY')-1988
)*12+
to_char(sysdate,'mm')-1
)*31+to_char(sysdate,'dd')-1
)*24+to_char(sysdate,'hh24')
)*60+to_char(sysdate,'mi')
)*60+to_char(sysdate,'ss')
) * to_number('ffff','XXXXXXXX')/4 scn
from dual
/
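This is the SCN ceiling algorithm: counting starts at 00:00:00 on January 1, 1988, every month is treated as having 31 days, and each elapsed second allows up to 16K SCNs. The factor to_number('ffff','XXXXXXXX')/4 in the query evaluates to 16383.75, a close approximation of 16K (16,384).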
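For readers who want to check the calculation without a database session, here is a minimal Python sketch of the same formula. It assumes the exact 16 * 1024 multiplier rather than the 16383.75 factor of the query above; the function name is illustrative.

```python
from datetime import datetime

SCN_PER_SECOND = 16 * 1024   # 16K SCNs allowed per elapsed second

def max_reasonable_scn(now):
    """Seconds since 1988-01-01 on a 31-day-month calendar, times 16K."""
    months = (now.year - 1988) * 12 + (now.month - 1)
    days = months * 31 + (now.day - 1)
    hours = days * 24 + now.hour
    minutes = hours * 60 + now.minute
    seconds = minutes * 60 + now.second
    return seconds * SCN_PER_SECOND

# One second after the epoch allows exactly 16K SCNs.
print(max_reasonable_scn(datetime(1988, 1, 1, 0, 0, 1)))

# Ceiling for the current wall-clock time.
print(max_reasonable_scn(datetime.now()))
```

Note that the 31-day-month calendar is not a real calendar; it only needs to grow monotonically with time.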
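The CPU patch ships a script, scnhealthcheck.sql, that checks how much SCN space the database has left.
The script uses the same algorithm described above. It subtracts the database's current SCN from the maximum reasonable SCN to produce an indicator called headroom, that is, the space remaining above the current SCN, and finally converts this headroom into days.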
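The conversion from SCN space to days is a single division. The sketch below shows it in Python with hypothetical inputs: the current SCN is the value seen in the DB link test earlier, and the ceiling is chosen to sit exactly 100 days' worth of SCNs above it so that the result is easy to verify.

```python
SCN_PER_SECOND = 16 * 1024
SECONDS_PER_DAY = 60 * 60 * 24

def headroom_days(ceiling_scn, current_scn):
    """Remaining SCN space expressed in days at the 16K-per-second rate."""
    return (ceiling_scn - current_scn) / (SCN_PER_SECOND * SECONDS_PER_DAY)

# Hypothetical values, for illustration only.
current_scn = 82_921_851
ceiling_scn = current_scn + 100 * SCN_PER_SECOND * SECONDS_PER_DAY

print(headroom_days(ceiling_scn, current_scn))
```

A headroom that shrinks over time means the database is consuming SCNs faster than the ceiling rises.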
The content of the script is as follows:
------script begin ----
Rem
Rem $Header: rdbms/admin/scnhealthcheck.sql st_server_tbhukya_bug-13498243/8 2012/01/17 03:37:18 tbhukya Exp $
Rem
Rem scnhealthcheck.sql
Rem
Rem Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.
Rem
Rem NAME
Rem scnhealthcheck.sql - Scn Health check
Rem
Rem DESCRIPTION
Rem Checks scn health of a DB
Rem
Rem NOTES
Rem .
Rem
Rem MODIFIED (MM/DD/YY)
Rem tbhukya 01/11/12 - Created
Rem
Rem
define LOWTHRESHOLD=10
define MIDTHRESHOLD=62
define VERBOSE=FALSE
set veri off;
set feedback off;
set serverout on
DECLARE
verbose boolean:=&&VERBOSE;
BEGIN
For C in (
select
version,
date_time,
dbms_flashback.get_system_change_number current_scn,
indicator
from
(
select
version,
to_char(SYSDATE,'YYYY/MM/DD HH24:MI:SS') DATE_TIME,
((((
((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
(((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
(to_number(to_char(sysdate,'HH24'))*60*60) +
(to_number(to_char(sysdate,'MI'))*60) +
(to_number(to_char(sysdate,'SS')))
) * (16*1024)) - dbms_flashback.get_system_change_number)
/ (16*1024*60*60*24)
) indicator
from v$instance
)
) LOOP
dbms_output.put_line( '-----------------------------------------------------'
|| '---------' );
dbms_output.put_line( 'ScnHealthCheck' );
dbms_output.put_line( '-----------------------------------------------------'
|| '---------' );
dbms_output.put_line( 'Current Date: '||C.date_time );
dbms_output.put_line( 'Current SCN: '||C.current_scn );
if (verbose) then
dbms_output.put_line( 'SCN Headroom: '||round(C.indicator,2) );
end if;
dbms_output.put_line( 'Version: '||C.version );
dbms_output.put_line( '-----------------------------------------------------'
|| '---------' );
IF C.version > '10.2.0.5.0' and
C.version NOT LIKE '9.2%' THEN
IF C.indicator>&MIDTHRESHOLD THEN
dbms_output.put_line('Result: A - SCN Headroom is good');
dbms_output.put_line('Apply the latest recommended patches');
dbms_output.put_line('based on your maintenance schedule');
IF (C.version < '11.2.0.2') THEN
dbms_output.put_line('AND set _external_scn_rejection_threshold_hours='
|| '24 after apply.');
END IF;
ELSIF C.indicator<=&LOWTHRESHOLD THEN
dbms_output.put_line('Result: C - SCN Headroom is low');
dbms_output.put_line('If you have not already done so apply' );
dbms_output.put_line('the latest recommended patches right now' );
IF (C.version < '11.2.0.2') THEN
dbms_output.put_line('set _external_scn_rejection_threshold_hours=24 '
|| 'after apply');
END IF;
dbms_output.put_line('AND contact Oracle support immediately.' );
ELSE
dbms_output.put_line('Result: B - SCN Headroom is low');
dbms_output.put_line('If you have not already done so apply' );
dbms_output.put_line('the latest recommended patches right now');
IF (C.version < '11.2.0.2') THEN
dbms_output.put_line('AND set _external_scn_rejection_threshold_hours='
||'24 after apply.');
END IF;
END IF;
ELSE
IF C.indicator<=&MIDTHRESHOLD THEN
dbms_output.put_line('Result: C - SCN Headroom is low');
dbms_output.put_line('If you have not already done so apply' );
dbms_output.put_line('the latest recommended patches right now' );
IF (C.version >= '10.1.0.5.0' and
C.version <= '10.2.0.5.0' and
C.version NOT LIKE '9.2%') THEN
dbms_output.put_line(', set _external_scn_rejection_threshold_hours=24'
|| ' after apply');
END IF;
dbms_output.put_line('AND contact Oracle support immediately.' );
ELSE
dbms_output.put_line('Result: A - SCN Headroom is good');
dbms_output.put_line('Apply the latest recommended patches');
dbms_output.put_line('based on your maintenance schedule ');
IF (C.version >= '10.1.0.5.0' and
C.version <= '10.2.0.5.0' and
C.version NOT LIKE '9.2%') THEN
dbms_output.put_line('AND set _external_scn_rejection_threshold_hours=24'
|| ' after apply.');
END IF;
END IF;
END IF;
dbms_output.put_line(
'For further information review MOS document id 1393363.1');
dbms_output.put_line( '-----------------------------------------------------'
|| '---------' );
END LOOP;
end;
/
------script end ----
-- The script above can be used as-is.
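The decision logic of the script can be summarized in a few lines. The following Python sketch mirrors its A/B/C grading with the same thresholds (10 and 62 days); the boolean flag newer_than_10205 is a hypothetical stand-in for the script's version string comparison, and the advice text printed by the script is omitted.

```python
LOW_THRESHOLD = 10   # days, same as LOWTHRESHOLD in the script
MID_THRESHOLD = 62   # days, same as MIDTHRESHOLD in the script

def classify(headroom_days, newer_than_10205):
    """Return the result grade (A, B, or C) for a given headroom."""
    if newer_than_10205:
        if headroom_days > MID_THRESHOLD:
            return "A"          # headroom is good
        if headroom_days <= LOW_THRESHOLD:
            return "C"          # headroom is low, contact support
        return "B"              # headroom is low, patch now
    # Older versions have no intermediate grade.
    return "C" if headroom_days <= MID_THRESHOLD else "A"

print(classify(100, True))
print(classify(30, True))
print(classify(5, True))
print(classify(30, False))
```

On older versions anything at or below 62 days is graded C directly, which is stricter than the three-level grading applied to newer versions.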
After the patch is applied, a new hidden parameter, _external_scn_rejection_threshold_hours, is introduced. It is usually set to 24 hours:
_external_scn_rejection_threshold_hours=24
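This setting lowers the headroom threshold at which an externally supplied SCN is rejected: the previous default was at least 31 days' worth, and it is reduced to 24 hours' worth, which leaves more reasonable room for the SCN to grow. Without further control, however, the SCN can still exceed the maximum reasonable range and cause database problems.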
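To get a feel for the sizes involved, the sketch below converts a threshold in hours into an SCN count. It assumes that "N hours worth" is measured at the 16K-per-second rate used throughout this article; that conversion is an assumption made for illustration, not a documented definition of the parameter.

```python
SCN_PER_HOUR = 16 * 1024 * 3600   # assumed: 16K SCNs per second

def hours_worth(hours):
    """SCN count corresponding to the given number of hours."""
    return hours * SCN_PER_HOUR

new_threshold = hours_worth(24)        # 24-hour setting
old_threshold = hours_worth(31 * 24)   # former 31-day default

print(new_threshold)
print(old_threshold)
print(old_threshold // new_threshold)
```

Under this assumption the 24-hour setting shrinks the rejection window by a factor of 31.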
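The impact of this problem can be extremely severe, so we recommend that users check the current SCN usage of their databases.
When the SCN warning thresholds are reached, the following messages may appear in the database: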
Advanced SCN by 8381 minutes worth to 0x0bad.4ab15e1,by distributed transaction remote logon,remote DB:ORCL.
Warning - High Database SCN: Current SCN value is 0x0b7b.0008e40b, threshold SCN value is 0x0b75.055dc000
If you have not previously reported this warning on this database, please notify Oracle Support so that additional diagnosis can be performed.
Warning: The SCN headroom for this database is only NN days!
Warning: The SCN headroom for this database is only N hours!
Rejected the attempt to advance SCN over limit by 984 hours worth to 0x0c00.0000ff66, by distributed transaction remote logon, remote DB: DB.ORCL.ORACLE.COM.
Client info : DB logon user SYS, machine sun, program sqlplus@orcl (TNS V1-V3), and OS user oracle
Rejected the attempt to advance SCN over limit by 9875 hours worth to 0x0c00.000003e6, by distributed transaction logon, remote DB: DB.ORCL.ORACLE.COM.
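The SCN values in these messages are printed in hexadecimal as two parts separated by a dot. The sketch below converts such a value to a decimal SCN, assuming the usual layout in which the part before the dot (the wrap) forms the high-order bits and the part after the dot (the base) forms the low-order 32 bits. The function name is illustrative.

```python
def parse_hex_scn(text):
    """Convert an alert-log SCN such as '0x0b7b.0008e40b' to an integer."""
    wrap_hex, base_hex = text.lower().removeprefix("0x").split(".")
    wrap = int(wrap_hex, 16)   # high-order part
    base = int(base_hex, 16)   # low-order 32 bits
    return (wrap << 32) + base

current = parse_hex_scn("0x0b7b.0008e40b")
threshold = parse_hex_scn("0x0b75.055dc000")

print(current)
print(threshold)
print(current > threshold)
```

With the two values taken from the "High Database SCN" warning above, the current SCN is indeed larger than the threshold SCN, which is why the warning is raised.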
MOS reference documents:
NOTE:1376995.1 - Information on the System Change Number (SCN) and how it is used in the Oracle Database
NOTE:1393363.1 - Installing, Executing and Interpreting output from the "SCNhealthcheck.sql" script
NOTE:1388639.1 - Evidence to collect when reporting "high SCN rate" issues to Oracle Support
NOTE:1393360.1 - ORA-19706 and Related Alert Log Messages