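On the blog of Xiaohe (小荷), an IBM DBA, I came across an example of recovering data with LogMiner. I know a little about LogMiner, but I had never actually used it for recovery, so I decided to test it myself. The original post is:
A Moment of Customer Carelessness, a DBA's Nightmare (客户的一次疏忽,DBA的一次噩梦)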
http://www.oracleblog.org/working-case/dba-always-bad-luck-with-careless-customer/
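1. Some theory before the test
1.1 Supplemental logging
First, a look at what supplemental logging covers. LogMiner relies on it for the following:
(1) Index clusters, chained rows, and migrated rows;
(2) Direct-path inserts;
(3) Extracting the LogMiner dictionary into the redo logs;
(4) DDL tracking;
(5) Generating SQL_REDO and SQL_UNDO with identification key information;
(6) LONG and LOB data types.
Here we focus on two of these: DDL tracking and the generation of SQL_REDO and SQL_UNDO.
Oracle's online redo logs record every operation against the database, both DDL and DML. Supplemental logging is what enables DDL tracking. In other words, DML can be mined directly, but to mine DDL you must first enable supplemental logging so that Oracle records the additional DDL information LogMiner needs. (Oracle's documentation goes further and says that at least minimal supplemental logging must be enabled before generating any redo that LogMiner will analyze.)
Because the database can be recovered from archived and online redo, the DDL is certainly recorded in a form Oracle itself understands even when supplemental logging is off; we just cannot mine it. Once supplemental logging is enabled, it becomes minable.
Supplemental logging is not enabled by default, because recording the extra information adds to the redo write load.
SQL_REDO is the SQL that reproduces the original operation (DDL or DML), and SQL_UNDO is the SQL that reverses it. Our recovery relies on SQL_UNDO.
Supplemental logging must be enabled before the DML and DDL you want to mine are executed. Otherwise the SQL_REDO and SQL_UNDO that LogMiner reconstructs can be incomplete or unreadable, showing Oracle's internal IDs instead of names. The translation of those IDs into names is the job of the LogMiner dictionary described in 1.2.
Enable supplemental logging: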
SQL>alter database add supplemental log data;
Disable supplemental logging:
SQL>alter database drop supplemental log data;
Check the supplemental logging status:
SQL>select supplemental_log_data_min from v$database;
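The statements above deal with minimal supplemental logging only. Item (5) in the list refers to identification key logging, which is a separate, heavier level. As a rough sketch, assuming you want primary key columns logged with every update (and can afford the extra redo), the levels can be checked and switched like this:

```sql
-- Show which database-level supplemental logging options are active.
SELECT supplemental_log_data_min,
       supplemental_log_data_pk,
       supplemental_log_data_ui
  FROM v$database;

-- Optionally log primary key columns with every update.
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

-- Remove that option again when it is no longer needed.
ALTER DATABASE DROP SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
```

The test in this article uses only the minimal level, so the two ALTER statements are optional.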
1.2 The three LogMiner dictionary modes
I described these in detail in an earlier blog post:
Oracle LogMiner Overview (Oracle Logminer 说明)
http://blog.csdn.net/xujinyang/article/details/6972909
LogMiner dictionary :
The LogMiner dictionary allows LogMiner to provide table and column names, instead of internal object IDs, when it presents the redo log data that you request.
LogMiner uses the dictionary to translate internal object identifiers and datatypes to object names and external data formats. Without a dictionary, LogMiner returns internal object IDs and presents data as binary data.
In short, when you analyze redo logs and archived logs with LogMiner you should supply a LogMiner dictionary; otherwise the results are unreadable. Take this statement as an example:
INSERT INTO HR.JOBS(JOB_ID, JOB_TITLE, MIN_SALARY, MAX_SALARY) VALUES('IT_WT','Technical Writer', 4000, 11000);
Without a dictionary to translate it, the mined result is:
insert into "UNKNOWN"."OBJ# 45522"("COL 1","COL 2","COL 3","COL 4") values (HEXTORAW('45465f4748'),HEXTORAW('546563686e6963616c20577269746572'),HEXTORAW('c229'),HEXTORAW('c3020b'));
That is hardly readable. Now let's look at the three modes.
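As an aside on the untranslated output above: the HEXTORAW values are the columns' internal storage bytes, so they can be decoded by hand when no dictionary is available. A small sketch using the UTL_RAW package, with literals copied from the statement above and assuming an ASCII-compatible database character set:

```sql
-- Character data: the raw bytes are simply the characters.
-- Returns: Technical Writer
SELECT utl_raw.cast_to_varchar2(HEXTORAW('546563686e6963616c20577269746572')) AS job_title
  FROM dual;

-- Numeric data: Oracle's internal NUMBER format.
-- Returns: 4000 and 11000
SELECT utl_raw.cast_to_number(HEXTORAW('c229'))   AS min_salary,
       utl_raw.cast_to_number(HEXTORAW('c3020b')) AS max_salary
  FROM dual;
```

This only helps with individual values; you still have to work out which table and columns OBJ# and COL n refer to, which is exactly what the dictionary does for you.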
1.2.1 Online Catalog
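The database's own data dictionary is used online for the translation. The database must be open, and only DML can be mined. The online catalog reflects only the current version of each table, so it works only if the table has not been changed by DDL: you can mine changes made since the table's last DDL, but nothing earlier.
This is the most efficient option, but the drawbacks are plain. The dictionary tables are critical system tables, and this method puts extra load on the database.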
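For reference, a minimal sketch of a mining session in this mode. The log file path is one of the archived logs used later in section 3.7 and stands in for whatever file you need to analyze; the session must run on the source database while it is open.

```sql
BEGIN
  -- Register one log file to analyze.
  dbms_logmnr.add_logfile(
    logfilename => '+FRA/anqing/archivelog/2_75_751552735.arc',
    options     => dbms_logmnr.new);

  -- Use the live data dictionary of the open database for name translation.
  dbms_logmnr.start_logmnr(
    options => dbms_logmnr.dict_from_online_catalog);
END;
/
```

No dictionary file or UTL_FILE_DIR setting is involved here, which is why this mode is the quickest to set up.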
1.2.2 Extracting a LogMiner Dictionary to the Redo Log Files
The process of extracting the dictionary to the redo log files does consume database resources, but if you limit the extraction to off-peak hours, then this should not be a problem, and it is faster than extracting to a flat file. Depending on the size of the dictionary, it may be contained in multiple redo log files. If the relevant redo log files have been archived, then you can find out which redo log files contain the start and end of an extracted dictionary.
To do so, query the V$ARCHIVED_LOG view, as follows:
SQL>SELECT NAME FROM V$ARCHIVED_LOG WHERE DICTIONARY_BEGIN='YES';
SQL>SELECT NAME FROM V$ARCHIVED_LOG WHERE DICTIONARY_END='YES';
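This method requires supplemental logging. The dictionary information is extracted into the online redo logs, which reduces the database resources consumed during mining. If the dictionary is large, a log switch and archiving may happen while it is being written, so the dictionary can span several files; the two queries above show which ones. Because the dictionary lives in those log files, they must be included in the mining session, otherwise LogMiner raises ORA-01371.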
To extract a LogMiner dictionary to the redo log files, the database must be open and in ARCHIVELOG mode and archiving must be enabled. While the dictionary is being extracted to the redo log stream, no DDL statements can be executed. Therefore, the dictionary extracted to the redo log files is guaranteed to be consistent (whereas the dictionary extracted to a flat file is not).
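There is also a serious catch: while the extraction is running, all DDL is blocked. DDL statements can run only after the extraction finishes, so a long extraction means DDL is blocked for a long time.
DDL is rare on a production database, but extracting the dictionary to the redo logs still carries risk. The third method is therefore the most practical one.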
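For comparison with the flat-file method used in the rest of this article, the redo-log variant looks roughly like this. It is a sketch, not something run as part of this test, and it assumes the database is open and in ARCHIVELOG mode:

```sql
-- Write the dictionary into the redo stream.
EXECUTE dbms_logmnr_d.build(options => dbms_logmnr_d.store_in_redo_logs);

-- Later, after adding the log files that contain the dictionary
-- (DICTIONARY_BEGIN / DICTIONARY_END in V$ARCHIVED_LOG) plus the logs to analyze:
EXECUTE dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_redo_logs);
```

The log files are registered with dbms_logmnr.add_logfile between the two calls, in the same way as shown in section 3.7.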
1.2.3 Extracting the LogMiner Dictionary to a Flat File
When the LogMiner dictionary is in a flat file, fewer system resources are used than when it is contained in the redo log files. Oracle recommends that you regularly back up the dictionary extract to ensure correct analysis of older redo log files.
Be sure that no DDL operations occur while the dictionary is being built.
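Supplemental logging is needed here as well. When extracting to the online redo logs, Oracle blocks DDL until the dictionary extraction finishes; when extracting to a flat file, it is up to the user to guarantee that consistency. The dictionary should also be extracted again before each mining session so that it stays consistent with the database.
The big weakness of this method is that it needs the UTL_FILE_DIR parameter, and changing that parameter requires a database restart.
This article uses the extract-to-flat-file method to demonstrate data recovery.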
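3. A recovery example with LogMiner
Production systems usually do not have supplemental logging enabled, so LogMiner is not a common way to recover data. For DML it is reasonably feasible; for DDL it is not.
Moreover, when supplemental logging is not enabled, the SQL_REDO and SQL_UNDO that come out of mining may not be translated through the data dictionary and are very hard to read. For example: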
delete from "UNKNOWN"."OBJ# 54173" where "COL 1" = HEXTORAW('53434f5454') and "COL 2" = HEXTORAW('504b5f44455054') and "COL 3" IS NULL and "COL 4" = HEXTORAW('c3060c30') and"COL 5" = HEXTORAW('c3060c30') and "COL 6" = HEXTORAW('494e444558') and "COL 7" = HEXTORAW('7869061e14303a') and "COL 8" = HEXTORAW('7869061e14303a') and "COL 9" =HEXTORAW('323030352d30362d33303a31393a34373a3537') and "COL 10" = HEXTORAW('56414c4944') and "COL 11" = HEXTORAW('4e') and "COL 12" = HEXTORAW('4e') and "COL 13" =HEXTORAW('4e') and ROWID = 'AAANOdAABAAATEmAAc';
insert into "UNKNOWN"."OBJ# 54173"("COL 1","COL 2","COL 3","COL 4","COL 5","COL 6","COL 7","COL 8","COL 9","COL 10","COL 11","COL 12","COL 13") values(HEXTORAW('53434f5454'),HEXTORAW('504b5f44455054'),NULL,HEXTORAW('c3060c30'),HEXTORAW('c3060c30'),HEXTORAW('494e444558'),HEXTORAW('7869061e14303a'),HEXTORAW('7869061e14303a'),HEXTORAW('323030352d30362d33303a31393a34373a3537'),HEXTORAW('56414c4944'),HEXTORAW('4e'),HEXTORAW('4e'),HEXTORAW('4e'));
LogMiner can serve as a complement to Flashback. For Flashback, see my blog post:
Oracle Flashback Technology Summary (Oracle Flashback 技术 总结)
http://blog.csdn.net/xujinyang/article/details/6830438
3.1 Enable supplemental logging
SYS@anqing2(rac2)> alter database add supplemental log data;
Database altered.
SYS@anqing2(rac2)> select supplemental_log_data_min from v$database;
SUPPLEME
--------
YES
This setting can be changed dynamically; no database restart is needed.
3.2 Create a test table
SYS@anqing2(rac2)> create table huaining as select * from dba_objects;
Table created.
SYS@anqing2(rac2)> select count(*) from huaining;
COUNT(*)
----------
50253
-- Check the current time
SYS@anqing2(rac2)> alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
Session altered.
SYS@anqing2(rac2)> select sysdate from dual;
SYSDATE
-------------------
2011-06-19 13:23:29
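Everything before this timestamp is our original data. Next we run some DML. Afterwards we could use Flashback to roll the changes back, but here we use LogMiner to mine the DML and then undo it with SQL statements.
3.3 Some DML operations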
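A wall-clock time works as a marker, but an SCN is more precise. As an optional extra, not part of the original test, the current SCN can be recorded at this point in either of two ways:

```sql
-- Current system change number, as a precise "before" marker.
SELECT current_scn FROM v$database;

-- Equivalent call through the DBMS_FLASHBACK package.
SELECT dbms_flashback.get_system_change_number AS current_scn FROM dual;
```

The recorded value could later be passed as the startscn argument of dbms_logmnr.start_logmnr to limit the mining range.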
SYS@anqing2(rac2)> select distinct owner from huaining;
OWNER
------------------------------
MDSYS
TSMSYS
DMSYS
PUBLIC
OUTLN
CTXSYS
OLAPSYS
SYSTEM
EXFSYS
SCOTT
ORACLE_OCM
DBSNMP
ORDSYS
ORDPLUGINS
SYSMAN
XDB
SYS
WMSYS
SI_INFORMTN_SCHEMA
19 rows selected.
SYS@anqing2(rac2)> delete from huaining where owner='SCOTT';
6 rows deleted.
SYS@anqing2(rac2)> commit;
Commit complete.
SYS@anqing2(rac2)> update huaining set owner='DAVE' where object_id<20;
18 rows updated.
SYS@anqing2(rac2)> commit;
Commit complete.
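Suppose a long time has passed, longer than the undo retention period, so Flashback is no longer possible. At that point LogMiner can dig out the DML executed during that window, and running the SQL_UNDO statements brings the data back.
3.4 Set the dictionary directory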
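To judge which method can still reach back far enough, two quick checks may help. This is a sketch; how long archived logs stay on disk depends entirely on your backup and retention policy.

```sql
-- Undo retention target, in seconds (relevant for Flashback).
SELECT value FROM v$parameter WHERE name = 'undo_retention';

-- Oldest archived log that has not been deleted (relevant for LogMiner).
SELECT MIN(first_time) AS oldest_archived_redo
  FROM v$archived_log
 WHERE deleted = 'NO';
```

If the change you need to undo is older than the first value allows but newer than the second, LogMiner is the option that remains.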
SYS@anqing2(rac2)> show parameter utl
NAME TYPE VALUE
----------------------------- ----------------------- ----------
create_stored_outlines string
utl_file_dir string
The parameter is currently empty.
SYS@anqing2(rac2)> alter system set utl_file_dir='/u01/backup' scope=both;
alter system set utl_file_dir='/u01/backup' scope=both
*
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified
-- A restart is required for the change to take effect
SYS@anqing2(rac2)> alter system set utl_file_dir='/u01/backup' scope=spfile;
System altered.
3.5 Restart the instances
[oracle@rac1 u01]$ sh crs_stat.sh
Name Target State Host
------------------------------ ---------- --------- -------
ora.anqing.anqing1.inst ONLINE ONLINE rac1
ora.anqing.anqing2.inst ONLINE ONLINE rac2
ora.anqing.db ONLINE ONLINE rac1
ora.rac1.ASM1.asm ONLINE ONLINE rac1
ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1
ora.rac1.gsd ONLINE ONLINE rac1
ora.rac1.ons ONLINE ONLINE rac1
ora.rac1.vip ONLINE ONLINE rac1
ora.rac2.ASM2.asm ONLINE ONLINE rac2
ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2
ora.rac2.gsd ONLINE ONLINE rac2
ora.rac2.ons ONLINE ONLINE rac2
ora.rac2.vip ONLINE ONLINE rac2
[oracle@rac1 u01]$ srvctl stop database -d anqing
[oracle@rac1 u01]$ sh crs_stat.sh
Name Target State Host
------------------------------ ---------- --------- -------
ora.anqing.anqing1.inst ONLINE OFFLINE
ora.anqing.anqing2.inst ONLINE OFFLINE
ora.anqing.db OFFLINE OFFLINE
ora.rac1.ASM1.asm ONLINE ONLINE rac1
ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1
ora.rac1.gsd ONLINE ONLINE rac1
ora.rac1.ons ONLINE ONLINE rac1
ora.rac1.vip ONLINE ONLINE rac1
ora.rac2.ASM2.asm ONLINE ONLINE rac2
ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2
ora.rac2.gsd ONLINE ONLINE rac2
ora.rac2.ons ONLINE ONLINE rac2
ora.rac2.vip ONLINE ONLINE rac2
-- Unfortunately, the startup failed
[oracle@rac1 u01]$ srvctl start database -d anqing
PRKP-1001 : Error starting instance anqing1 on node rac1
CRS-0215: Could not start resource 'ora.anqing.anqing1.inst'.
PRKP-1001 : Error starting instance anqing2 on node rac2
CRS-0215: Could not start resource 'ora.anqing.anqing2.inst'.
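The logs showed nothing useful. I then restarted CRS as well, and this time even ASM would not start. On a whim I connected with sqlplus, and to my surprise both ASM and the database came up. I don't want to dig into this problem today, so I'll set it aside; the database is up, and that is enough.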
[oracle@rac1 u01]$ sh crs_stat.sh
Name Target State Host
------------------------------ ---------- --------- -------
ora.anqing.anqing1.inst ONLINE ONLINE rac1
ora.anqing.anqing2.inst ONLINE ONLINE rac2
ora.anqing.db ONLINE ONLINE rac2
ora.rac1.ASM1.asm ONLINE ONLINE rac1
ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1
ora.rac1.gsd ONLINE ONLINE rac1
ora.rac1.ons ONLINE ONLINE rac1
ora.rac1.vip ONLINE ONLINE rac1
ora.rac2.ASM2.asm ONLINE ONLINE rac2
ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2
ora.rac2.gsd ONLINE ONLINE rac2
ora.rac2.ons ONLINE ONLINE rac2
ora.rac2.vip ONLINE ONLINE rac2
SYS@anqing2(rac2)> show parameter utl;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
create_stored_outlines string
utl_file_dir string /u01/backup
3.6 Build the data dictionary
SYS@anqing2(rac2)> execute dbms_logmnr_d.build ('dict.ora','/u01/backup',dbms_logmnr_d.store_in_flat_file);
PL/SQL procedure successfully completed.
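The dictionary file is plain text and can be viewed directly with cat.
3.7 Add the log files
Use the following queries to find the relevant archived logs and the current online redo log: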
SQL>select * from v$archived_log order by stamp desc;
SQL>select MEMBER from v$logfile where group# in (select group# from v$log where status ='CURRENT');
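Rather than scanning the whole of V$ARCHIVED_LOG, the list can be narrowed to the logs that overlap the period of interest. A sketch using the timestamp recorded in 3.2 as the lower bound:

```sql
ALTER SESSION SET nls_date_format = 'yyyy-mm-dd hh24:mi:ss';

-- Archived logs whose time range reaches past the start of the window.
SELECT thread#, sequence#, name, first_time, next_time
  FROM v$archived_log
 WHERE next_time >= TO_DATE('2011-06-19 13:23:29', 'yyyy-mm-dd hh24:mi:ss')
   AND deleted = 'NO'
 ORDER BY thread#, sequence#;
```

On RAC each instance writes its own redo thread, so logs from every thread that was active in the window should be considered.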
SYS@anqing2(rac2)> exec dbms_logmnr.add_logfile(LogFileName=>'+DATA/anqing/onlinelog/redo03.log',Options=>dbms_logmnr.new);
PL/SQL procedure successfully completed.
SYS@anqing2(rac2)> exec dbms_logmnr.add_logfile(LogFileName=>'+FRA/anqing/archivelog/2_75_751552735.arc',Options=>dbms_logmnr.addfile);
PL/SQL procedure successfully completed.
SYS@anqing2(rac2)> exec dbms_logmnr.add_logfile(LogFileName=>'+FRA/anqing/archivelog/2_72_751552735.arc',Options=>dbms_logmnr.addfile);
PL/SQL procedure successfully completed.
SYS@anqing2(rac2)> exec dbms_logmnr.add_logfile(LogFileName=>'+FRA/anqing/archivelog/2_73_751552735.arc',Options=>dbms_logmnr.addfile);
PL/SQL procedure successfully completed.
SYS@anqing2(rac2)> exec dbms_logmnr.add_logfile(LogFileName=>'+FRA/anqing/archivelog/2_74_751552735.arc',Options=>dbms_logmnr.addfile);
PL/SQL procedure successfully completed.
SYS@anqing2(rac2)> exec dbms_logmnr.add_logfile(LogFileName=>'+FRA/anqing/archivelog/2_76_751552735.arc',Options=>dbms_logmnr.addfile);
PL/SQL procedure successfully completed.
3.8 Start LogMiner
SYS@anqing2(rac2)> execute dbms_logmnr.start_logmnr(dictfilename=>'/u01/backup/dict.ora',options=>dbms_logmnr.ddl_dict_tracking);
PL/SQL procedure successfully completed.
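The call above mines everything in the registered files. As a variation, the session can be limited to a time window and to committed transactions only, which cuts down the number of rows to sift through. A sketch, using the timestamp from 3.2 as an assumed start of the window:

```sql
BEGIN
  dbms_logmnr.start_logmnr(
    dictfilename => '/u01/backup/dict.ora',
    starttime    => TO_DATE('2011-06-19 13:23:29', 'yyyy-mm-dd hh24:mi:ss'),
    endtime      => SYSDATE,
    -- Options are numeric constants and are combined by addition.
    options      => dbms_logmnr.ddl_dict_tracking
                  + dbms_logmnr.committed_data_only);
END;
/
```

With committed_data_only, rolled-back and in-flight transactions are left out of V$LOGMNR_CONTENTS, which is usually what you want when building an undo script.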
3.9 View the results
The mined data is exposed through the V$LOGMNR_CONTENTS view, but its contents are visible only to the current session, so we create a table to save them.
-- Set the date format
SQL>alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';
SYS@anqing2(rac2)> create table hn_logmnr nologging as select * from v$logmnr_contents where 1=2;
Table created.
SYS@anqing2(rac2)> insert /*+append */ into hn_logmnr select * from v$logmnr_contents;
250078 rows created.
SYS@anqing2(rac2)> commit;
Commit complete.
/* Formatted on 2011/6/19 14:12:25 (QP5 v5.163.1008.3004) */
SELECT SCN,
timestamp,
session#,
sql_redo,
sql_undo
FROM hn_logmnr
WHERE sql_redo LIKE 'delete from%HUAINING%';
The output is long, so only one row is shown here:
SQL_REDO:
/* Formatted on 2011/6/19 14:14:19 (QP5 v5.163.1008.3004) */
DELETE FROM "SYS"."HUAINING"
WHERE "OWNER" = 'SCOTT'
AND "OBJECT_NAME" = 'PK_DEPT'
AND "SUBOBJECT_NAME" IS NULL
AND "OBJECT_ID" = '51147'
AND "DATA_OBJECT_ID" = '51147'
AND "OBJECT_TYPE" = 'INDEX'
AND "CREATED" =
TO_DATE ('2005-06-30 19:47:57', 'yyyy-mm-dd hh24:mi:ss')
AND "LAST_DDL_TIME" =
TO_DATE ('2005-06-30 19:47:57', 'yyyy-mm-dd hh24:mi:ss')
AND "TIMESTAMP" = '2005-06-30:19:47:57'
AND "STATUS" = 'VALID'
AND "TEMPORARY" = 'N'
AND "GENERATED" = 'N'
AND "SECONDARY" = 'N'
AND ROWID = 'AAANSeAABAAA+qmAAc';
SQL_UNDO:
/* Formatted on 2011/6/19 14:14:37 (QP5 v5.163.1008.3004) */
INSERT INTO "SYS"."HUAINING" ("OWNER",
"OBJECT_NAME",
"SUBOBJECT_NAME",
"OBJECT_ID",
"DATA_OBJECT_ID",
"OBJECT_TYPE",
"CREATED",
"LAST_DDL_TIME",
"TIMESTAMP",
"STATUS",
"TEMPORARY",
"GENERATED",
"SECONDARY")
VALUES ('SCOTT',
'PK_DEPT',
NULL,
'51147',
'51147',
'INDEX',
TO_DATE ('2005-06-30 19:47:57', 'yyyy-mm-dd hh24:mi:ss'),
TO_DATE ('2005-06-30 19:47:57', 'yyyy-mm-dd hh24:mi:ss'),
'2005-06-30:19:47:57',
'VALID',
'N',
'N',
'N');
Spool these SQL_UNDO statements into a SQL script and run it, and the corresponding DML is undone.
Now let's look at an update:
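Before spooling anything, it can help to see what was actually captured for the table. A small sketch against the saved copy, filtering on the segment columns rather than on the SQL text:

```sql
-- One line per operation type, with the SCN range it spans.
SELECT operation,
       COUNT(*) AS row_changes,
       MIN(scn) AS first_scn,
       MAX(scn) AS last_scn
  FROM hn_logmnr
 WHERE seg_owner = 'SYS'
   AND seg_name  = 'HUAINING'
 GROUP BY operation;
```

Filtering on seg_owner and seg_name avoids accidental matches that a LIKE on sql_redo can produce when another statement happens to mention the table name.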
/* Formatted on 2011/6/19 14:12:25 (QP5 v5.163.1008.3004) */
SELECT SCN,
timestamp,
session#,
sql_redo,
sql_undo
FROM hn_logmnr
WHERE sql_redo LIKE 'update %HUAINING%';
SQL_REDO:
/* Formatted on 2011/6/19 14:18:55 (QP5 v5.163.1008.3004) */
UPDATE "SYS"."HUAINING"
SET "OWNER" = 'DAVE'
WHERE "OWNER" = 'SYS' AND ROWID = 'AAANSeAABAAATIqAAD';
SQL_UNDO:
/* Formatted on 2011/6/19 14:18:58 (QP5 v5.163.1008.3004) */
UPDATE "SYS"."HUAINING"
SET "OWNER" = 'SYS'
WHERE "OWNER" = 'DAVE' AND ROWID = 'AAANSeAABAAATIqAAD';
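We can recover with SQL_UNDO, but note the ROWID in the statement. A ROWID normally does not change, but operations such as move and shrink do change it. In that case the ROWID has to be filtered out of SQL_UNDO before running it. Xiaohe's blog did this with a regular expression, because there the table had been rebuilt and the ROWIDs were certainly different.
The regular expression: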
spool hn.sql
select regexp_replace(SQL_UNDO,'and ROWID.+;',';')
from HN_logmnr
WHERE
table_name='HUAINING'
order by to_char(TIMESTAMP,'yyyy-mm-dd hh24:mi:ss') desc;
spool off
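After filtering, the statement can be run directly. Keep in mind that without the ROWID the predicate may match more than one row, so check that the remaining conditions still identify only the intended rows: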
/* Formatted on 2011/6/19 14:42:55 (QP5 v5.163.1008.3004) */
UPDATE "SYS"."HUAINING"
SET "OWNER" = 'SYS'
WHERE "OWNER" = 'DAVE';
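The spool step above relies on default SQL*Plus formatting, which can wrap long statements and add headings to the file. A sketch of a tidier version follows. The file name hn_undo.sql is hypothetical, and ordering by SCN descending is my assumption for replaying the undo in reverse order of the original changes.

```sql
SET HEADING OFF FEEDBACK OFF PAGESIZE 0 LINESIZE 4000 TRIMSPOOL ON
SPOOL hn_undo.sql
-- Strip the trailing ROWID predicate and emit the newest change first.
SELECT REGEXP_REPLACE(sql_undo, ' and ROWID.+;', ';')
  FROM hn_logmnr
 WHERE seg_owner = 'SYS'
   AND seg_name  = 'HUAINING'
   AND operation IN ('DELETE', 'UPDATE')
 ORDER BY scn DESC;
SPOOL OFF
```

Review the generated file before running it, and run it inside a transaction you can still roll back.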
3.10 End LogMiner
SQL> execute dbms_logmnr.end_logmnr;
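Once the undo script has been run and committed, the figures from 3.2 and 3.3 give a simple check: the table started with 50253 rows, 6 SCOTT rows were deleted, and 18 rows were changed to DAVE. A minimal sketch:

```sql
-- Expected after a full restore: 50253 total, 6 SCOTT, 0 DAVE
SELECT COUNT(*)                                    AS total_rows,
       COUNT(CASE WHEN owner = 'SCOTT' THEN 1 END) AS scott_rows,
       COUNT(CASE WHEN owner = 'DAVE'  THEN 1 END) AS dave_rows
  FROM huaining;
```

If supplemental logging was enabled only for this exercise, it can be switched off again with the DROP statement from 1.1.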