Oracle 11g R2 Data Guard Broker: Automatic Failover

Also known as fast-start failover

 

http://docs.oracle.com/cd/B28359_01/server.111/b28295/cli.htm#BABEIIHD

7.6 Scenario 5: Enabling Fast-Start Failover and Starting the Observer

You can enable fast-start failover from any site, including the observer site, while connected to any database in the broker configuration. Enabling fast-start failover does not trigger a failover. Instead, it allows the observer to begin observing the primary and standby databases and initiate a fast-start failover should conditions warrant a failover.

This section describes the steps to enable fast-start failover and start the observer:

· Ensure standby redo logs are configured on the primary and target standby databases.

· Ensure the LogXptMode property is set to SYNC.

· Set the FastStartFailoverTarget configuration property.

· Upgrade the protection mode to MAXAVAILABILITY, if necessary.

· Enable Flashback Database on the primary and target standby databases, if necessary.

· Start the observer.

· Enable fast-start failover.

· Verify the fast-start failover configuration.
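As a rough sketch of how the checklist above maps onto DGMGRL commands, the Python helper below only builds the command strings; it connects to nothing and runs nothing. The database names DBMAST and DBSALVE are the ones used later in this article, while the function name and the idea of emitting plain strings are assumptions made for illustration. Standby redo logs and Flashback Database are configured in SQL*Plus rather than DGMGRL, so they are not in the list. START OBSERVER keeps its dgmgrl session occupied, which is why it is placed last here and would normally be run from its own session.

```python
def fsfo_dgmgrl_commands(primary, standby):
    """Return DGMGRL commands for enabling fast-start failover, in order.

    Hypothetical helper: it only formats strings and executes nothing.
    """
    return [
        # Redo transport is set to SYNC on both databases.
        f"EDIT DATABASE '{primary}' SET PROPERTY LogXptMode='SYNC';",
        f"EDIT DATABASE '{standby}' SET PROPERTY LogXptMode='SYNC';",
        # Each database names the other one as its failover target.
        f"EDIT DATABASE '{primary}' SET PROPERTY FastStartFailoverTarget='{standby}';",
        f"EDIT DATABASE '{standby}' SET PROPERTY FastStartFailoverTarget='{primary}';",
        "EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;",
        "ENABLE FAST_START FAILOVER;",
        # Blocks the session it runs in, so it goes last.
        "START OBSERVER;",
    ]


if __name__ == "__main__":
    for command in fsfo_dgmgrl_commands("DBMAST", "DBSALVE"):
        print(command)
```

The output can be pasted into dgmgrl line by line, or reviewed before anything is changed on a real configuration.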

 

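Data Guard failover operations

A failover can be performed in three ways: with SQL, manually through the DG broker, or automatically through the DG broker (fast-start failover).

Flashback Database must be enabled so that the original primary can be reinstated (REINSTATE) once it has been recovered.

1. First, the SQL procedure

Shut down the primary now: SHUTDOWN IMMEDIATE

On the standby: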

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;

alter database recover managed standby database disconnect from session;

Alter database commit to switchover TO PRIMARY with session shutdown;

SHUTDOWN IMMEDIATE;

STARTUP;

alter system switch logfile;

SELECT NAME,OPEN_MODE,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS,DB_UNIQUE_NAME FROM V$DATABASE;

 

2. Manual failover with the DG broker

Shut down the primary:

SQL> shutdown abort

DGMGRL> failover to 'DBSALVE';

Restart the original primary (dg2):

SQL> startup
ORACLE instance started.
ORA-16649: possible failover to another database prevents this database from being opened

SQL> select open_mode,database_role,db_unique_name,flashback_on from v$database;

OPEN_MODE            DATABASE_ROLE    DB_UNIQUE_NAME       FLASHBACK_ON
-------------------- ---------------- -------------------- ------------------
MOUNTED              PRIMARY          DBMAST               YES

At this point the broker's DMON process detects that there are two primaries, so the original primary can only be started in MOUNT state.

DGMGRL> reinstate database 'DBMAST';

 

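Automatic fast-start failover

DGMGRL> connect sys/oracle@DBMAST;
Connected.

DGMGRL> show configuration;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBMAST  - Primary database
    DBSALVE - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> enable fast_start failover;
Error: ORA-16651: requirements not met for enabling fast-start failover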

 

This requires Flashback Database to be turned on. No restart is needed, and the database does not have to be in MOUNT state.

SQL> alter database flashback on;

Database altered.

set linesize 1000
col db_unique_name format a15
col open_mode format a20
col flashback_on format a15
col database_role format a20
col dataguard_broker format a20
col protection_mode format a25
col switchover_status format a25

SQL> select open_mode,database_role,db_unique_name,flashback_on from v$database;

OPEN_MODE            DATABASE_ROLE        DB_UNIQUE_NAME  FLASHBACK_ON
-------------------- -------------------- --------------- ---------------
READ WRITE           PRIMARY              DBMAST          YES

 

 

On the standby, cancel recovery first:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL> alter database recover managed standby database using current logfile disconnect;

Database altered.

SQL> select flashback_on from v$database;

FLASHBACK_ON
---------------
YES

 

 

Enable fast-start failover again:

DGMGRL> connect sys/oracle@dbmast;
Connected.

DGMGRL> show configuration;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBMAST  - Primary database
    DBSALVE - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> enable fast_start failover;
Enabled.

 

 

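DGMGRL> show configuration;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBMAST  - Primary database
      Warning: ORA-16819: fast-start failover observer not started
    DBSALVE - (*) Physical standby database
      Warning: ORA-16819: fast-start failover observer not started

Fast-Start Failover: ENABLED

Configuration Status:
WARNING

DGMGRL> start observer;
Observer started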

 

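Once the observer has been started, its dgmgrl session has to stay attached, so it is best to start it in the background, for example:
nohup dgmgrl sys/oracle@db1 'start observer' &

By default, the observer creates a binary file, fsfo.dat, to store the connection information for the primary and the standby. The file is created in the directory from which dgmgrl was invoked.

Open a new window to check the result: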

 

 

DGMGRL> connect sys/oracle@dbmast;
Connected.

DGMGRL> show configuration;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBMAST  - Primary database
    DBSALVE - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS

 

 

Check the information on the primary; the primary and the standby show the same values.

col FS_FAILOVER_OBSERVER_HOST for a30

select fs_failover_observer_present,fs_failover_observer_host,fs_failover_threshold from v$database;

FS_FAILOVER_OBSERVER_PRESENT  FS_FAILOVER_OBSERVER_HOST       FS_FAILOVER_THRESHOLD
----------------------------  ------------------------------  ---------------------
YES                           DB-MASTER                                          30

 

 

DGMGRL> SHOW FAST_START FAILOVER;

Fast-Start Failover: ENABLED

  Threshold:          30 seconds
  Target:             DBSALVE
  Observer:           DB-MASTER
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    (none)

 

DGMGRL> show configuration verbose;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBMAST  - Primary database
    DBSALVE - (*) Physical standby database

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: ENABLED

  Threshold:          30 seconds
  Target:             DBSALVE
  Observer:           DB-MASTER
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
SUCCESS

 

The related properties can be changed, for example:

edit configuration set property FastStartFailoverThreshold=120;

 

 

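2.8 Verifying automatic failover

As mentioned earlier, a failover is triggered in the following situations:

1) Instance Failure

2) Shutdown Abort

3) Offline Datafiles due to I/O error

4) Network disconnection

Here we simulate the primary going down: run shutdown abort on the primary, then check the state of the primary and the standby.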

 

(1) First, configure TAF on the client

Add the following entry to tnsnames.ora:

DBMAST =
  (DESCRIPTION =
    (LOAD_BALANCE = OFF)
    (FAILOVER = ON)
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.200)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.202)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = DBMAST)
      (FAILOVER_MODE =
        (TYPE = select)
        (METHOD = basic)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

 

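The intent of these parameters is: retry up to 180 times, 5 seconds apart, and then connect to the address ending in .202. Load balancing is turned off. Note that RETRIES and DELAY are FAILOVER_MODE subparameters, so they govern reconnect attempts after a failover rather than the order in which the two addresses are tried.

TYPE=select means that sessions with open queries can resume fetching after the failover.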
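To put numbers on the entry above, here is a minimal Python sketch of how long the client would keep retrying under these settings. Treating the window as simply RETRIES multiplied by DELAY is a simplification that ignores connect timeouts, and the function name is hypothetical. The threshold of 120 seconds is the FastStartFailoverThreshold value set earlier in this article.

```python
def taf_retry_window_seconds(retries, delay):
    """Approximate upper bound on TAF retrying: RETRIES attempts, DELAY seconds apart."""
    return retries * delay


window = taf_retry_window_seconds(retries=180, delay=5)
threshold = 120  # FastStartFailoverThreshold, in seconds

print(window)              # 900
print(window > threshold)  # True
```

Under this simplified model the client's retry window (15 minutes) is much longer than the time the observer waits before failing over, so the client should still be retrying when the new primary comes up.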

FAILOVER_MODE parameter

The FAILOVER_MODE parameter must be placed inside the CONNECT_DATA section. It accepts the subparameters listed in the following table:

 

FAILOVER_MODE Subparameter: Description

BACKUP: Specify a different net service name for backup connections. A backup should be specified when using preconnect to pre-establish connections.

TYPE: Specify the type of failover. Three types of Oracle Net failover functionality are available by default to Oracle Call Interface (OCI) applications:

· session: Set to fail over the session. If a user's connection is lost, a new session is automatically created for the user on the backup. This type of failover does not attempt to recover selects.

· select: Set to enable users with open cursors to continue fetching on them after failure. However, this mode involves overhead on the client side in normal select operations.

· none: This is the default. No failover functionality is used. This can also be explicitly specified to prevent failover from happening.

METHOD: Determines how fast failover occurs from the primary node to the backup node:

· basic: Set to establish connections at failover time. This option requires almost no work on the backup server until failover time.

· preconnect: Set to pre-establish connections. This provides faster failover but requires that the backup instance be able to support all connections from every supported instance.

RETRIES: Specify the number of times to attempt to connect after a failover. If DELAY is specified, RETRIES defaults to five retry attempts.

Note: If a callback function is registered, then this subparameter is ignored.

DELAY: Specify the amount of time in seconds to wait between connect attempts. If RETRIES is specified, DELAY defaults to one second.

Note: If a callback function is registered, then this subparameter is ignored.
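The defaulting rules for RETRIES and DELAY described above can be sketched in Python as follows. The function name is hypothetical, and returning None for both values when neither subparameter is given is an assumption of this sketch, since the table does not cover that case.

```python
def resolve_taf_defaults(retries=None, delay=None):
    """Apply the documented defaults for the RETRIES and DELAY subparameters.

    Returns a (retries, delay) tuple. None means "not specified"; the case
    where both are missing is left as (None, None) by assumption.
    """
    if retries is None and delay is not None:
        retries = 5  # DELAY given without RETRIES: five retry attempts
    elif delay is None and retries is not None:
        delay = 1    # RETRIES given without DELAY: one second between attempts
    return retries, delay


print(resolve_taf_defaults(delay=5))               # (5, 5)
print(resolve_taf_defaults(retries=3))             # (3, 1)
print(resolve_taf_defaults(retries=180, delay=5))  # (180, 5)
```

The last call corresponds to the tnsnames.ora entry used in this article, where both values are set explicitly and no default applies.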

Failover verification

Simulate a crash of the primary:

SQL> shutdown abort;
ORACLE instance shut down.

In the standby's window:

DGMGRL> connect sys/oracle@dbsalve;
Connected.

DGMGRL> show configuration verbose;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBMAST  - Primary database
    DBSALVE - (*) Physical standby database

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '120'
    OperationTimeout                = '30'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: ENABLED

  Threshold:          120 seconds
  Target:             DBSALVE
  Observer:           DB-Salve
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
ORA-01034: ORACLE not available
ORA-16625: cannot reach database "DBMAST"
DGM-17017: unable to determine configuration status

 

The broker trace file shows:

04/21/2016 09:35:56
Failed to connect to remote database DBMAST. Error is ORA-01034
Failed to send message to site DBMAST. Error code is ORA-01034.
04/21/2016 09:37:10
FAILOVER TO DBSALVE
Beginning failover to database DBSALVE
Notifying Oracle Clusterware to teardown database for FAILOVER
04/21/2016 09:37:12
Notifying DMON of db close
DMON: Old primary "DBMAST" needs reinstatement
04/21/2016 09:37:21
Protection mode set to MAXIMUM AVAILABILITY
04/21/2016 09:37:23
Deferring associated archivelog destinations of sites permanently disabled due to Failover
Notifying Oracle Clusterware to buildup primary database after FAILOVER
Posting DB_DOWN alert ...
       ... with reason Data Guard Fast-Start Failover - Primary Disconnected
04/21/2016 09:37:24
Command FAILOVER TO DBSALVE completed
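The timestamps in the trace above can be turned into durations with a few lines of Python. This is only a sketch: the three timestamps are copied from the trace, while the variable names and the choice of which entries to compare are assumptions made for illustration.

```python
from datetime import datetime

# Timestamp layout used by the broker trace entries above.
TRACE_FORMAT = "%m/%d/%Y %H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two broker trace timestamps."""
    started = datetime.strptime(start, TRACE_FORMAT)
    ended = datetime.strptime(end, TRACE_FORMAT)
    return (ended - started).total_seconds()


first_error = "04/21/2016 09:35:56"     # first failed connection to DBMAST
failover_start = "04/21/2016 09:37:10"  # FAILOVER TO DBSALVE begins
failover_done = "04/21/2016 09:37:24"   # failover command completed

print(elapsed_seconds(first_error, failover_start))    # 74.0
print(elapsed_seconds(failover_start, failover_done))  # 14.0
```

So the failover itself took about 14 seconds once it began. The first figure measures only the gap between the first logged connection error and the start of the failover; the trace does not show when the observer began counting toward the threshold.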

 

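Finally, the observer output:

DGMGRL> start observer;
Observer started

09:37:10.65  Thursday, April 21, 2016
Initiating Fast-Start Failover to database "DBSALVE"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "DBSALVE"
09:37:24.91  Thursday, April 21, 2016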

 

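What puzzles me is that the application process connected to the standby immediately instead of retrying the primary, so the configuration does not seem to do what I intended. In other words, after the primary went down, during the 2 minutes in which the standby was still waiting, the application did not retry the primary 180 times. It connected to the standby right away, and the standby was still open read-only at that point.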

 

 

Now bring the original primary back up:

SQL> startup mount
ORACLE instance started.

Total System Global Area  446775296 bytes
Fixed Size                  2254104 bytes
Variable Size             360712936 bytes
Database Buffers           79691776 bytes
Redo Buffers                4116480 bytes
Database mounted.

col name format a10;
col db_unique_name format a10;
col open_mode format a20;
col protection_mode format a20;
col database_role format a20;
col switchover_status format a20;

SQL> SELECT NAME,OPEN_MODE,PROTECTION_MODE,DATABASE_ROLE,SWITCHOVER_STATUS,DB_UNIQUE_NAME FROM V$DATABASE;

NAME       OPEN_MODE            PROTECTION_MODE      DATABASE_ROLE        SWITCHOVER_STATUS    DB_UNIQUE_
---------- -------------------- -------------------- -------------------- -------------------- ----------
DBMAST     MOUNTED              MAXIMUM AVAILABILITY PRIMARY              NOT ALLOWED          DBMAST

It still has the primary role.

 

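Observer window output:

DGMGRL> --start observer;

10:15:12.53  Thursday, April 21, 2016
Initiating reinstatement for database "DBMAST"...
Reinstating database "DBMAST", please wait...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "DBMAST" failed
10:15:36.84  Thursday, April 21, 2016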

 

 

DGMGRL> show configuration verbose;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBSALVE - Primary database
      Warning: ORA-16817: unsynchronized fast-start failover configuration

    DBMAST  - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '120'
    OperationTimeout                = '30'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: ENABLED

  Threshold:          120 seconds
  Target:             DBMAST
  Observer:           DB-Salve
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
WARNING

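DGMGRL> show configuration;

Configuration - DG_BROKER_SALVE

  Protection Mode: MaxAvailability
  Databases:
    DBSALVE - Primary database
    DBMAST  - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
ORA-16610: command "REINSTATE DATABASE DBMAST" in progress
DGM-17017: unable to determine configuration status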

 

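To be continued. There are three open questions so far:

Question 1: the client TAF configuration does not behave as intended.

Question 2: the original primary cannot be reinstated after it is brought back up.

Question 3: what is the connect descriptor shown in the trace below, and where does it come from?

Database trace file on the original primary: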

Fatal NI connect error 12514, connecting to:
 (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.202)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DBSALVE_DGB)(INSTANCE_NAME=DBSALVE)(CID=(PROGRAM=oracle)(HOST=DB-MASTER)(USER=oracle))))

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
  Time: 21-APR-2016 10:41:18
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12564

TNS-12564: TNS:connection refused
    ns secondary err code: 0
    nt main err code: 0
    nt secondary err code: 0
    nt OS err code: 0

 

Broker trace file:

Failed to connect to remote database DBSALVE. Error is ORA-12514
Failed to send message to site DBSALVE. Error code is ORA-12514.
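As a starting point for question 3, the connect descriptor from the database trace can be taken apart with a small Python sketch. The descriptor string is copied from the trace; the function name and the regular expression are assumptions made for illustration, not an Oracle tool. The service name DBSALVE_DGB looks like a broker-managed service for DBSALVE (the db_unique_name plus a _DGB suffix), but that reading is an assumption, since the trace itself does not say so. ORA-12514 means the listener at that address does not currently know the requested service.

```python
import re

DESCRIPTOR = (
    "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)"
    "(HOST=192.168.0.202)(PORT=1521)))"
    "(CONNECT_DATA=(SERVICE_NAME=DBSALVE_DGB)(INSTANCE_NAME=DBSALVE)"
    "(CID=(PROGRAM=oracle)(HOST=DB-MASTER)(USER=oracle))))"
)


def descriptor_fields(descriptor):
    """Extract the first HOST, PORT, SERVICE_NAME and INSTANCE_NAME values."""
    fields = {}
    for key in ("HOST", "PORT", "SERVICE_NAME", "INSTANCE_NAME"):
        # Match "(KEY=value)" where the value contains no parentheses.
        match = re.search(r"\(" + key + r"=([^()]*)\)", descriptor)
        if match:
            fields[key] = match.group(1)
    return fields


print(descriptor_fields(DESCRIPTOR))
```

The extracted host and service name are what to compare against the services registered with the listener on 192.168.0.202 (for example in the output of lsnrctl status on that host).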

 

FAQ: if you accidentally close the window in which start observer was running, run STOP OBSERVER and then run START OBSERVER in a new window.

