Data Recovery Advisor
The Data Recovery Advisor (DRA) is a new tool that automatically diagnoses failures in the database, determines the repair options and, optionally, executes the repairs if you want. A database failure could be a corruption or loss of data, including the loss of datafiles. You can access DRA through the RMAN client or Enterprise Manager.
The DRA uses Oracle Database 11g’s new diagnosability infrastructure and the new diagnostic concepts such as problems, incidents, and failures to automatically diagnose data failures. The DRA can help you fix problems such as lost datafiles,data block corruption, I/O failures, and a crashed database. Here are the main goals of the DRA:
- Early detection of data failures, which limits damage
- Automatic detection of failures, along with recommendations and even repair of the failures
Reduced database downtime
The Data Recovery Advisor can diagnose and help fix problems such as the following:
- Inaccessible datafiles and control files
- Physical data corruption such as invalid block header field values
- Inconsistencies such as one datafile being older than others
- I/O failures such as hardware errors
A checker or check is a specific diagnostic procedure registered with the Health Monitor. You can proactively invoke these checkers, known as data integrity checks,yourself, and they are also run reactively by the database itself. The Health Monitor automatically runs a data integrity check whenever it finds an error in the database.
The data integrity check searches the database for any failures that caused the errors.A failure is defined as a problem such as data corruption that is diagnosed by the database. All diagnosed failures are stored in the ADR.
You can use the DRA to fix the data failures that caused the errors, but you can only do so after the failure diagnostics are stored in the ADR. You can use the Data Recovery Advisor to diagnose a failure and in most cases, fix it, either through a manual course of action or through an automated repair strategy. The failure data is stored outside the database. Consequently, you can view failures and fix them even when the database is in the nomount state.
All findings in the DRA are organized into groups pertaining to a specific failure.The DRA also assigns severity levels such as critical and high to each database failure it captures. Once the DBA requests the DRA for repair advice, the DRA provides all the automatic and manual repair choices available to you, along with its advice. You can choose to fix the problem yourself using the manual methods or to have the
DRA perform the repairs.
Note that the DRA consolidates related failures into a single failure. If 100 data blocks are corrupted in a datafile, the DRA will show a single failure. You can drill down to the level of an individual sub-failure.
The DRA doesn’t automatically tag a database error that is logged in the alert log file, such as an ORA-600 error, as a failure. To be considered a failure, a problem must be diagnosed by a check (or checker) registered with the Health Monitor.So, the chain of causation is this: an error leads to a Health Monitor data integrity check, which investigates the failures related to the error. If the database finds related failures, it lodges the information about the failure in the ADR. It is at this point that you can call on the DRA to generate repair advice.The preceding chain of causation relates to a reactive data integrity check, but the logic is the same when you launch your own proactive data integrity checks. A failure that your check reveals will be treated just as a failure that’s detected by a Health Monitor data integrity check. Also, any validate or backup commands you issue through RMAN will invoke an automatic data integrity check.
Failure
You must be aware of three failure characteristics: status, priority and grouping:
Failure status
When the database first reports a failure, the failure has a status of open. Once the failure is repaired, the status changes to closed.When you run the list failure command, it automatically closes all failures that have been resolved. If, for some reason, the database doesn’t set the status of a fixed failure to closed, you can do so by using the change failure . . . closed command. But remember that if you try to set the status of an unresolved failure to closed, the Data Recovery Advisor will re-create the failure with a different ID following the next data integrity check, when the check encounters the unresolved failure.
Failure priority
A failure can be assigned three levels of priority: critical,high, or low. A high priority level, such as the one assigned to a missing current control file, could bring the database down. Events such as a missing
archived red log are assigned the high level of priority. The DRA, however,assigns only the critical or high level of priority to a failure, and the list failure command shows you only the failures with these two levels
of priority. If, in your opinion, a failure with a high priority isn’t really a big deal and can wait for fixing, you can manually change the priority level of that failure to low, thus keeping that failure from appearing in the list
failure output. You can’t, however, change the priority level of a failure from critical to low.
Failure grouping
The DRA always groups related failures together under a single parent failure. You can, however, use the detail option in the list failure command to view details about the individual failures in a group.
Repair Options
The DRA offers you two types of repair options in some cases, manual and automatic.It’s your responsibility to perform the manual repair actions, whereas the DRA performs the automatic repair actions itself. The DRA first checks to see if it can perform the automatic repair before offering the repair choice to you. For example, in the case of a failure repair that involves restoring and recovering a datafile, the DRA will offer to automatically perform this repair only after first verifying that the necessary datafile backups and archived redo logs do exist and that RMAN can access them. It may further divide the manual repair actions into mandatory or optional actions. If a manual option is easier to perform than undertaking a more drastic repair in the way of restoring and recovering datafiles, for example, the Data Recovery Advisor presents both the manual and the automatic repair choices. If the Data Recovery Advisor knows that a problem can’t be fixed with an automatic repair, the DRA will state that the manual repair action is mandatory. If a repair action can be performed either manually or automatically, such as restoring and replacing a missing datafile, the DRA will offer both options.
Whenever the DRA offers an automatic repair option, it shows you the directory location where the repair script it’s going to use is stored. You can then edit and execute this file yourself, if you so desire.Wherever it’s possible to do so, the DRA tries to consolidate its repairs for multiple failures into a single repair consisting of several individual steps. If it can’t repair a failure that results from, say, a missing controlfile, it’ll report that information to you and suggest that you first fix those other problems. In this case, you may create a new control file and issue the list failure, advise failure, and repair failure commands once again, so the DRA can fix the failure for you.
Oracle does inform you that a DRA failure repair sometimes may not do the trick because the DRA doesn’t check every single byte in a backup file or an archived redo log. This leaves room for the possibility that a corrupt block in either a backup file or an archived redo log may still keep you from fixing the database failure.
In Oracle Database 11g Release 1, the Data Recovery Advisor supports only a single instance and not Oracle RAC databases. If all the instances of an Oracle RAC setup crash, you can mount the database in the single instance mode and use the DRA to fix certain problems as well as invoke data recovery checks. You can’t, however, use the DRA to detect failures such as an inaccessible datafile, which is local to another instance in the Oracle RAC configuration.
You can invoke the DRA through RMAN as well as through Enterprise Manager.In Enterprise Manager, the DRA is integrated with both the Support Workbench and the Health Monitor.
Using RMAN to Manage the Data Recovery Advisor
You can use the RMAN interface to invoke the Data Recovery Advisor and manage and resolve various types of data failures. Before you actually walk through a data failure repair with RMAN, it’s important to review the key RMAN commands pertaining to the DRA.
Viewing Failures
Use the list failure command to view a list of database failures that were caught by invoking an automatic or manual database check prompted by an error in the database. Here’s an example:
RMAN> list failure;
The V$IR_FAILURE view stores the results of all the list failure commands you issue.The list failure command lists a failure only if the database has diagnosed a failure. The list failure command doesn’t initiate a data integrity check by itself—it merely reports an already diagnosed failure.
You can view individual sub-failures by issuing the list failure . . . detail command. Because the Data Recovery Advisor consolidates failures wherever possible, you use the detail option to list individual failures. You can provide options such as critical, high, low, or closed to limit the list failure command’s output to only those failures that belong to a specific priority level. By default, the list failure command shows all failures with a priority level of critical or high. Here are some list failure command variations:
RMAN> list failure critical;
RMAN> list failure open;
RMAN> list failure closed;
RMAN> list failure exclude failure 12345;
Dealing with Failures
Use the change failure command to change the priority level of a database failure. This command is useful when you want to close a failure that’s really trivial.Note that you can change a priority of high to low and vice versa, but you can’t change a critical priority level. The default output list of the list failure command shows all high-priority failures. If there are too many high-priority failures that you’re already aware of, you may want to change a failure’s status from high to low to avoid seeing those failures in the output of the list failure command.The following example shows how to change the priority level of a command
from high to low:
RMAN> change failure 1 priority low;
RMAN> change failure 102 closed;
You can change a problem’s status to closed any time you want, even if the failure wasn’t really fixed. However, if the DRA gets new checker data pertaining to that problem, it will re-create the failure with a different ID.
Getting Repair Advice
Use the advise failure command to get advice on a failure. The command prints a summary of the failure. It also implicitly closes all open problems that have been fixed. By default, the command reports on all critical and high priority repairs. The command lists all failures that have a critical or high level of priority.The command shows a summary of the repair options as well as a single repair option for each failure. The repair options can be manual as well as automated. The two repair options are a no-data-loss and a data-loss repair.
Manual repair options can be mandatory or optional. Often, a manual repair option may be the best option, as it avoids more extreme measures such as restoring and recovering entire datafiles, as in the automatic repair option. After showing the available repair options, RMAN generates a repair script to implement the repair option it recommends. You can then use this script, also called a manual checklist, after making any necessary changes to it, to repair the failure.
Using the RMAN-generated repair script may often help you fix a problem quicker than letting RMAN perform the repair. For example, it’s far easier to manually restore a lost table from another database rather than to let RMAN perform a laborious restore and recover operation to fix the problem.
Once you issue the advise failure command,
RMAN> advise failure;
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
82 HIGH OPEN 05-OCT-13 One or more non-system datafiles are missing
analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete
Mandatory Manual Actions
========================
no manual actions available
Optional Manual Actions
=======================
1. If file /u01/app/oradata/orcl/users01.dbf was unintentionally renamed or moved, restore it
Automated Repair Options
========================
Option Repair Description
------ ------------------
1 Restore and recover datafile 4
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/diag/rdbms/orcl/orcl/hm/reco_1324291144.hm
you can query the V$IR_ MANUAL_CHECKLIST view to examine the recommended repair, as shown here:
SQL>select advise_id,rank,required,message from v$ir_manual_checklist;
ADVISE_ID RANK REQUIRED MESSAGE
---------- ---------- --------- --------------------------------------------------------------------------------
101 0 NO If file /u01/app/oradata/orcl/users01.dbf was unintentionally renamed or moved,
restore it
In the V$IR_MANUAL_CHECKLIST view, the MESSAGE column shows a summary of the repair advice offered by the DRA.
Note that the DRA always tries to consolidate the repair actions so multiple failures can be fixed with a single repair job.
Repairing Failures
Once the advise failure command reports its repair recommendations, you can either manually repair the failures or choose the automatic repair option. If you want the Data Recovery Advisor to automatically fix the failures, simply issue the repair failure command so RMAN can fix the failure. Sometimes, the advise failure command may ask you to perform some manual repairs first before executing the repair failure command. Here’s the basic repair failure command:
RMAN> repair failure;
You must issue the advise failure command before you can fix problems with the repair failure command. Otherwise, you’ll get an error, as shown in this example:
RMAN> repair failure;
using target database control file instead of recovery catalog
RMAN-00571: =======================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ========
RMAN-00571:================================================================
RMAN-03002: failure of repair command at 10/21/2007 12:15:24
RMAN-06954: REPAIR command must be preceded by ADVISE command in same session
By default, the repair failure command will implement the single repair recommendation shown in the output for the advise failure command. You can also choose to merely view RMAN’s repair procedure without actually repairing the failure by using the preview option with the repair failure command:
RMAN> repair failure preview;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/diag/rdbms/orcl/orcl/hm/reco_1324291144.hm
contents of repair script:
# restore and recover datafile
sql 'alter database datafile 4 offline';
restore datafile 4;
recover datafile 4;
sql 'alter database datafile 4 online';
The repair failure preview command is helpful in ascertaining the actual steps of the RMAN repair procedure, without embarking on the repair process. The V$IR_REPAIR view shows the results of the repair failure command. Here’s the structure of the V$IR_REPAIR view:
SQL> select repair_id,advise_id,summary,rank
2 from v$ir_repair;
REPAIR_ID ADVISE_ID SUMMARY RANK
---------- ---------- ------------------------- ----------
102 101 NO DATA LOSS 1
You can also check the v$ir_failure_set VIEW,it's a cross-reference of failure and advise identifiers.
SQL> select advise_id,failure_id,manual_repairs_only from v$ir_failure_set;
ADVISE_ID FAILURE_ID MANUAL_RE
---------- ---------- ---------
101 85 NO
Following is a sample to use repair failure command to automatically repaire failure.
RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /u01/app/diag/rdbms/orcl/orcl/hm/reco_1324291144.hm
contents of repair script:
# restore and recover datafile
sql 'alter database datafile 4 offline';
restore datafile 4;
recover datafile 4;
sql 'alter database datafile 4 online';
Do you really want to execute the above repair (enter YES or NO)? Yes
executing repair script
sql statement: alter database datafile 4 offline
Starting restore at 05-OCT-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oradata/orcl/users01.dbf
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/dbs/01olkkka_1_1
channel ORA_DISK_1: piece handle=/u01/app/oracle/dbs/01olkkka_1_1 tag=TAG20131005T084914
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 05-OCT-13
Starting recover at 05-OCT-13
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:00
Finished recover at 05-OCT-13
sql statement: alter database datafile 4 online
repair failure complete
Although RMAN helped us in this case by providing a repair option and easily let us fix the problem, the Data Recovery Advisor can’t fix every failure ,like data dictionary corruption. Here’s an example:
RMAN> list failure;
using target database control file instead of recovery catalog
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- --------------------
42725 CRITICAL OPEN 03-OCT-07 SQL dictionary health check:
dependency$.dobj# fk 126 on object DEPENDENCY$ failed
42722 CRITICAL OPEN 03-OCT-07 SQL dictionary health check:
dependency$.dobj# fk 126 on object DEPENDENCY$ failed
Issue the advise failure command, to view the repair options for the failures:
RMAN> advise failure;
List of Database Failures
=========================
Failure ID Priority Status Time Detected Summary
---------- -------- --------- ------------- -------
42725 CRITICAL OPEN 03-OCT-07 SQL dictionary health check:
dependency$.dobj# fk 126 on object DEPENDENCY$ failed
42722 CRITICAL OPEN 03-OCT-07 SQL dictionary health check:
dependency$.dobj# fk 126 on object DEPENDENCY$ failed
Mandatory Manual Actions
========================
1. Please contact Oracle Support Services to resolve
failure 42725: SQLdictionary health check:
dependency$.dobj# fk 126 on object DEPENDENCY$ failed
2. Please contact Oracle Support Services to resolve failure
42722: SQLdictionary health check: dependency$.
dobj# fk 126 on object DEPENDENCY$ failed
Optional Manual Actions
=======================
no manual actions available
Automated Repair Options
========================
no automatic repair options available
The Data Recovery Advisor doesn’t provide either a manual or an automated repair option in this case. Instead, it asks you to contact Oracle Support Services to resolve the failure reported by the SQL dictionary health check.
Proactive Checks
Although the database runs reactive checks on its own, Oracle recommends that your run proactive database health checks on a regular basis. It’s also best to run a proactive check to dig deeper into a problem when a reactive check indicates there is a database component failure. You can run these proactive checks through the Health Monitor or by executing the new validate database command.
The new validate database command in Oracle Database 11g lets you invoke a proactive health check for the database and its components. The validate database command is similar to the old validate backupset command, but is more sophisticated. Any time the validate database command detects a database failure, it initiates a failure assessment and logs the failure in the ADR. You can view these failure findings when you issue the list failure command. Here’s an example showing how to use the command:
RMAN> validate database;
Starting validate at 16-OCT-07
using target database control file instead of recovery catalog allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=109 device type=DISK
RMAN-06169: could not read file header for datafile 7 error reason 7
RMAN-00571: ===================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =======
RMAN-00571: ===================================================
RMAN-03002: failure of validate command at 10/16/2007 12:25:33
RMAN-06056: could not access datafile 7
You can employ the validate database command to validate at a fine level of granularity because you can validate individual backup sets or even individual data blocks now with the command. By default, the command checks for physical and not logical database corruption. However, you can make the command check for logical corruption by specifying the CHECK LOGICAL option. Of the two types of block corruption, intrablock and interblock, the validate database command checks for intrablock corruption only.
You can refer the VALIDATE command syntax from the following link:
http://docs.oracle.com/cd/E11882_01/backup.112/e10643/rcmsynta2024.htm#RCMRF162
参考至:《McGraw.Hill.OCP.Oracle.Database.11g.New.Features.for.Administrators.Exam.Guide.Apr.2008》
本文原创,转载请注明出处、作者
如有错误,欢迎指正