ORA-16191的分析

ORA-16191的分析 

DATAGUARD的数据传输方式有两种:LGWR和ARCH。其中前者传送的是联机日志的redo,后者传输的是归档。
FAL是9i推出的DATAGUARD的一大改进,它可以自动传送备库缺失的日志,一般情况下,它是不需要手工干预的。
主库的日志要传输到备库,两者的密码文件中设置的口令必须一致。然而,如果我们不小心设置了不一致的密码,则需要重新生成备库的密码文件,但重置密码文件后,FAL还并不能自动处理日志的GAP,需要DBA做一些额外的工作。

在这种情况下,用LGWR和ARCH两种方式的处理方式有所区别,下面分别讨论:
一、ARCH方式传送日志
--主库参数
SQL> show parameter dest_2

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_create_online_log_dest_2 string
log_archive_dest_2 string SERVICE=standby LGWR ASYNC VA
LID_FOR=(ONLINE_LOGFILES,PRIMA
RY_ROLE) DB_UNIQUE_NAME=standb
y
--在备库上查询
SQL> select * from v$standby_log;

no rows selected

我们知道,用LGWR传送日志,如果备库没有standby redo log,则oracle自动会变成用ARCH传送归档。

为了模拟主备库密码文件不一致,我们重新生成备库密码文件,并置一个与主库不一样的密码。
[oracle@standby dbs]$ mv orapwprimary orapwprimary.bak
[oracle@standby dbs]$ orapwd file=orapwprimary password=aaa entries=10

接着重启主库,查看主库的alert日志,可以发现下面的信息:

Tue Jul 10 22:10:38 2007
Errors in file /u01/oracle/admin/primary/bdump/primary_arc1_3338.trc:
ORA-16055: FAL request rejected
ARCH: FAL archive failed. Archiver continuing
Tue Jul 10 22:10:38 2007
ORACLE Instance primary - Archival Error. Archiver continuing.
Tue Jul 10 22:10:47 2007
FAL[server]: Fail to queue the whole FAL gap
GAP - thread 1 sequence 68-68
DBID 1463588919 branch 626106231
Tue Jul 10 22:15:50 2007
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE, 
and that the SYS password is same in the password files.
returning error ORA-16191
------------------------------------------------------------
Tue Jul 10 22:15:50 2007
Errors in file /u01/oracle/admin/primary/bdump/primary_arc1_3338.trc:
ORA-16191: Primary log shipping client not logged on standby
PING[ARC1]: Heartbeat failed to connect to standby 'standby'. Error is 16191.

从上面的信息不难看出,oracle的确换成了ARCH的方式传送日志,并且因为密码文件的原因不能传送到备库中。
如果查看备库的日志,也可以发现类似如下的信息:
FAL[client]: Failed to request gap sequence 
GAP - thread 1 sequence 68-68
DBID 1463588919 branch 626106231
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------

接着,我们在备库还原密码文件,并重启备库:
[oracle@standby dbs]$ rm orapwprimary
[oracle@standby dbs]$ mv orapwprimary.bak orapwprimary
SQL> shutdown immediate
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 167772160 bytes
Fixed Size 1260696 bytes
Variable Size 134218600 bytes
Database Buffers 29360128 bytes
Redo Buffers 2932736 bytes
Database mounted.
SQL> alter database recover managed standby database disconnect from session;

Database altered.

然而,隔一段时间后观察主、备库的日志,仍然有上面的报错信息,说明这种情况下重启备库是没有效果的。

我们再重启主库试试:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 167772160 bytes
Fixed Size 1260696 bytes
Variable Size 150995816 bytes
Database Buffers 12582912 bytes
Redo Buffers 2932736 bytes
Database mounted.
Database opened.

隔一小段时间可以分别在主、备库上看到如下信息:
--主库
Tue Jul 10 22:41:57 2007
******************************************************************
LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2
******************************************************************
--备库
Tue Jul 10 22:31:57 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[5]: Assigned to RFS process 16058
RFS[5]: Identified database type as 'physical standby'
Primary database is in MAXIMUM PERFORMANCE mode
Primary database is in MAXIMUM PERFORMANCE mode
RFS[5]: No standby redo logfiles created
Tue Jul 10 22:32:26 2007
Media Recovery Log /u01/archivelog/1_70_626106231.dbf
Media Recovery Waiting for thread 1 sequence 71 (in transit)

以上信息充分说明,DATAGUARD日志传送已经恢复正常。
实际上,经过测试,我们在修改密码文件后只需要重启主库即可恢复日志正常传送。

二、LGWR方式传送日志
为了让oracle可以用LGWR方式传送日志,我们先在备库上建立几个standby redo log。
SQL> alter database add standby logfile group 4 '/u01/oracle/oradata/primary/standbyredo04.log' size 50m;

Database altered.

SQL> alter database add standby logfile group 5 '/u01/oracle/oradata/primary/standbyredo5.log' size 50m;

Database altered.

SQL> alter database add standby logfile group 6 '/u01/oracle/oradata/primary/standbyredo6.log' size 50m;

Database altered.

SQL> alter database add standby logfile group 7 '/u01/oracle/oradata/primary/standbyredo7.log' size 50m;

Database altered.

重新生成一个与主库不一致的密码文件:
[oracle@standby dbs]$ mv orapwprimary orapwprimary.bak
[oracle@standby dbs]$ orapwd file=orapwprimary password=aaa entries=10

重新生成一个密码文件后,必须重启备库才能模拟错误(不重启的话是不会影响日志传输的)
SQL> shutdown immediate
ORA-01109: database not open


Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 167772160 bytes
Fixed Size 1260696 bytes
Variable Size 142607208 bytes
Database Buffers 20971520 bytes
Redo Buffers 2932736 bytes
Database mounted.
SQL> alter database recover managed standby database disconnect from session;

Database altered.

此时主库alert日志报错如下:
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE, 
and that the SYS password is same in the password files.
returning error ORA-16191
------------------------------------------------------------
Tue Jul 10 23:06:30 2007
Errors in file /u01/oracle/admin/primary/bdump/primary_arc0_3457.trc:
ORA-16191: Primary log shipping client not logged on standby
PING[ARC0]: Heartbeat failed to connect to standby 'standby'. Error is 16191.

备库alert报错:
Tue Jul 10 22:53:06 2007
FAL[client]: Failed to request gap sequence 
GAP - thread 1 sequence 79-79
DBID 1463588919 branch 626106231
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------
上面的信息与ARCH方式传送日志的测试报错类似,就不解析了。
我们重启备库,并备库观察日志:
Tue Jul 10 23:13:28 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[2]: Assigned to RFS process 16235
RFS[2]: Identified database type as 'physical standby'
Tue Jul 10 23:25:21 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[3]: Assigned to RFS process 16237
RFS[3]: Identified database type as 'physical standby'
RFS[3]: Successfully opened standby log 4: '/u01/oracle/oradata/primary/standbyredo04.log'
Tue Jul 10 23:25:23 2007
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[4]: Assigned to RFS process 16239
RFS[4]: Identified database type as 'physical standby'
Primary database is in MAXIMUM PERFORMANCE mode
Primary database is in MAXIMUM PERFORMANCE mode
RFS[4]: Successfully opened standby log 4: '/u01/oracle/oradata/primary/standbyredo04.log'

从备库的alert信息看,日志传送已经恢复正常。

但此时主库还会继续报如下的错误:
Tue Jul 10 23:18:31 2007
FAL[server]: Fail to queue the whole FAL gap
GAP - thread 1 sequence 79-79
DBID 1463588919 branch 626106231
FAL[server]: Fail to queue the whole FAL gap
GAP - thread 1 sequence 80-80
DBID 1463588919 branch 626106231
LNS1 started with pid=18, OS id=3499

这是因为虽然备库重建了密码文件,并且存在standby redo log,但主库仍然尝试用ARCH的方式传送日志,直到切换日志。
我们切换一下主库日志,主库alert信息就不会再报错了:
SQL> alter system switch logfile;

System altered.

Tue Jul 10 23:35:02 2007
ARC2: Standby redo logfile selected for thread 1 sequence 80 for destination LOG_ARCHIVE_DEST_2
Tue Jul 10 23:35:05 2007
LNS: Standby redo logfile selected for thread 1 sequence 81 for destination LOG_ARCHIVE_DEST_2

进一步测试可以发现:在这种情况下,只重启主库也可以恢复日志的正常传送。

简单总结:
1、ARCH方式传送日志时,如果主备库密码文件不一致,在重新设置密码文件后,必须重启主库才能使日志传送恢复正常。
2、LGWR方式传送日志时,如果主备库密码文件不一致,在重新设置密码文件后,只需可以重启主库或重启备库就可使日志传送恢复正常。

看来LGWR在减少可能的数据损失同时,还可以减低主库重启的几率,推荐大家使用这种方式作为日志传送手段。

你可能感兴趣的:(ORA-16191的分析)