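Let's start with several questions that come up often in day-to-day work:
1. Why does TimesTen have two CheckPoint files?
2. How are the two CheckPoint files related to each other?
3. How are the two CheckPoint files related to the Trans Log?
4. How does TimesTen maintain the two CheckPoint files and the Trans Log?
5. Will the database go down if a CheckPoint file is deleted or damaged?
6. Can an in-memory database be recovered with only the CheckPoint files and the Trans Log?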
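I. TimesTen memory structure
To make the relationship between the two CheckPoint files and the Trans Log easier to follow, here is a brief overview of the TimesTen memory structure.
1. Data Store: the memory region into which all database data is loaded. All TimesTen operations are performed in the Data Store.
2. Log buffer (LogBuffer): a buffer that temporarily holds the log records describing changes to the Data Store. A background process writes the buffer contents to the Trans Log, and the replication process replicates them to the target in real time.
3. Temporary data region: a shared region that temporarily stores data such as execution plans and serves as scratch space for operations such as sorting.
4. CheckPoint files: two files, DsName.ds0 and DsName.ds1, that back each other up. They are comparable to Oracle data files and hold the on-disk image of the in-memory database.
5. Transaction Log files (DsName.logNNN): these persist changes made to the Data Store to physical storage.
As the memory structure diagram shows, TimesTen has two CheckPoint files. Customers and colleagues often ask why there are two and how they relate to each other. The relationship is simple: each is a mirror of the other.
This article covers only the relationship between the two CheckPoint files in the diagram above, and their relationship to the Trans Log.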
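II. Recovery from the CheckPoint files and the Trans Log
In real operations we almost never have to recover from the CheckPoint files and the Trans Log. Still, to make their relationship and the underlying mechanism clear, this section walks through a simple experiment that recovers a TimesTen in-memory database from them.
Below is a DSN that is running normally: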
$ ttadmin -ramload tyinfo
RAM Residence Policy : manual
Manually Loaded In RAM : True
Replication Agent Policy : manual
Replication Manually Started : False
Cache Agent Policy : manual
Cache Agent Manually Started : False
$ ttstatus
TimesTen status report as of Thu Mar 5 20:46:19 2015
Daemon pid 3171 port 53396 instance tt1122
TimesTen server pid 3182 started on port 53397
------------------------------------------------------------------------
Data store /ttchk/tt1122/DataStore/TYINFO/tyinfodata
There are 11 connections to the data store
Shared Memory KEY 0x0601f765 ID 32769
Type PID Context Connection Name ConnID
Subdaemon 3177 0x0000000008641d00 Manager 127
Subdaemon 3177 0x00000000086b8df0 Rollback 126
Subdaemon 3177 0x0000000008766fd0 Flusher 125
Subdaemon 3177 0x00000000087bbe80 Monitor 124
Subdaemon 3177 0x0000000008831520 AsyncMV 119
Subdaemon 3177 0x0000000008856950 Deadlock Detector 123
Subdaemon 3177 0x000000000886b5a0 HistGC 120
Subdaemon 3177 0x00000000088801f0 Log Marker 118
Subdaemon 3177 0x00002aaac00008c0 Aging 122
Subdaemon 3177 0x00002aaac40008c0 IndexGC 121
Subdaemon 3177 0x00002aaac4015510 Checkpoint 117
RAM residence policy: Manual
Data store is manually loaded into RAM
Replication policy : Manual
Cache Agent policy : Manual
------------------------------------------------------------------------
Accessible by group timesten
End of report
$
Next, we simulate an instance crash and then recover the TimesTen in-memory database from the CheckPoint files and the Trans Log.
1. Simulate an instance crash
$ kill -9 3171
[timesten@tony5 ~]$ ttstatus
ttStatus: Could not connect to the TimesTen daemon.
If the TimesTen daemon is not running, please start it
by running "ttDaemonAdmin -start".
2. Back up the CheckPoint files and the Trans Log file
$ ls -l
drwxrwxr-x 2 timesten timesten 4096 Jan 20 09:29 TYINFO
[timesten@tony5 DataStore]$ cp -r TYINFO TYINFO2
[timesten@tony5 DataStore]$ ls -l
total 8
drwxrwxr-x 2 timesten timesten 4096 Jan 20 09:29 TYINFO
drwxrwxr-x 2 timesten timesten 4096 Mar 5 20:59 TYINFO2
[timesten@tony5 DataStore]$ cd TYINFO2
[timesten@tony5 TYINFO2]$ ls
tyinfodata.ds0 tyinfodata.ds1 tyinfodata.inval
$ ls -l
total 397932
-rw-rw---- 1 timesten timesten 4411392 Mar 5 20:46 tyinfodata.log0
-rw-rw---- 1 timesten timesten 134217728 Dec 18 15:20 tyinfodata.res0
-rw-rw---- 1 timesten timesten 134217728 Dec 18 15:20 tyinfodata.res1
-rw-rw---- 1 timesten timesten 134217728 Dec 18 15:20 tyinfodata.res2
[timesten@tony5 TYINFO]$ cp tyinfodata.log0 tyinfodata.log0_b
[timesten@tony5 TYINFO]$ ls -l
total 402252
-rw-rw---- 1 timesten timesten 4411392 Mar 5 20:46 tyinfodata.log0
-rw-rw---- 1 timesten timesten 4411392 Mar 5 21:02 tyinfodata.log0_b
-rw-rw---- 1 timesten timesten 134217728 Dec 18 15:20 tyinfodata.res0
-rw-rw---- 1 timesten timesten 134217728 Dec 18 15:20 tyinfodata.res1
-rw-rw---- 1 timesten timesten 134217728 Dec 18 15:20 tyinfodata.res2
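The manual copy in step 2 can be wrapped in a small helper. The following is only a sketch: the function name `backup_store_files` and the `ckpt/` and `log/` layout of the backup directory are assumptions made for illustration, not anything TimesTen provides. It copies only the `.ds0`/`.ds1` checkpoint files and the numbered log files, and it should be run only while the daemon is down so that the files are not changing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the checkpoint files and transaction logs of one
# data store into a backup directory. Name and layout are illustrative only.
backup_store_files() {
  local ckpt_dir=$1 log_dir=$2 name=$3 dest=$4
  mkdir -p "$dest/ckpt" "$dest/log" || return 1
  # Checkpoint files: <name>.ds0 and <name>.ds1
  cp -p "$ckpt_dir/$name".ds[01] "$dest/ckpt/" || return 1
  # Transaction logs: every file starting with <name>.log followed by a digit
  cp -p "$log_dir/$name".log[0-9]* "$dest/log/" || return 1
}
```

With the paths used in this experiment, a call would look like `backup_store_files /ttchk/tt1122/DataStore/TYINFO /ttlog/tt1122/TYINFO tyinfodata /some/backup/dir`, where the last argument is a placeholder for a directory of your choice.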
3. Start the Daemon process and destroy the DSN
$ ttDaemonAdmin -start -force
/TimesTen/tt1122/info/timestend.pid file exists, attempt start due to -force option.
TimesTen Daemon startup OK.
[timesten@tony5 TYINFO]$ ttdestroy tyinfo
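4. Create a new DSN with the same name, then, with the Daemon process stopped, replace its CheckPoint files and Trans Log file with the backed-up originals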
$ ttisql tyinfo
Copyright (c) 1996, 2014, Oracle and/or its affiliates. All rights reserved.
Type ? or "help" for help, type "exit" to quit ttIsql.
connect "DSN=tyinfo";
Connection successful: DSN=TYINFO;UID=timesten;DataStore=/ttchk/tt1122/DataStore/TYINFO/tyinfodata;DatabaseCharacterSet=ZHS16GBK;ConnectionCharacterSet=ZHS16GBK;LogFileSize=128;DRIVER=/TimesTen/tt1122/lib/libtten.so;LogDir=/ttlog/tt1122/TYINFO;PermSize=128;TempSize=64;Connections=80;CkptFrequency=600;RecoveryThreads=4;TypeMode=0;PLSQL=0;CacheGridEnable=0;LogBufMB=64;ReceiverThreads=1;
(Default setting AutoCommit=1)
Command> exit
Disconnecting...
Done.
[timesten@tony5 ~]$ ttdaemonadmin -stop
TimesTen Daemon stopped.
$ cd /ttchk/tt1122/DataStore/
[timesten@tony5 DataStore]$ ls
TYINFO TYINFO2
[timesten@tony5 DataStore]$ rm -rf TYINFO
[timesten@tony5 DataStore]$ mv TYINFO2 TYINFO
[timesten@tony5 DataStore]$ ls
TYINFO
$ cd /ttlog/tt1122/TYINFO/
[timesten@tony5 TYINFO]$ ls
tyinfodata.log0 tyinfodata.log0_b tyinfodata.res0 tyinfodata.res1 tyinfodata.res2
[timesten@tony5 TYINFO]$ rm tyinfodata.log0
[timesten@tony5 TYINFO]$ mv tyinfodata.log0_b tyinfodata.log0
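The file swap in step 4 can also be written as a function. This is a sketch under assumptions: `restore_store_files` is a hypothetical name, and it expects a backup directory that contains `ckpt/<name>.ds0`, `ckpt/<name>.ds1` and `log/<name>.logN`, a layout chosen here for illustration. As in the transcript, it must only be run while the Daemon process is stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put backed-up checkpoint and log files back in place
# of the files that belong to the freshly created, empty data store.
restore_store_files() {
  local src=$1 ckpt_dir=$2 log_dir=$3 name=$4
  # Refuse to touch anything if the backup lacks either checkpoint file.
  [ -f "$src/ckpt/$name.ds0" ] && [ -f "$src/ckpt/$name.ds1" ] || return 1
  # Remove the checkpoint and log files of the new, empty store.
  rm -f "$ckpt_dir/$name".ds[01] "$log_dir/$name".log[0-9]*
  # Copy the originals back, preserving mode and timestamps.
  cp -p "$src/ckpt/$name".ds[01] "$ckpt_dir/" || return 1
  cp -p "$src/log/$name".log[0-9]* "$log_dir/" || return 1
}
```

Checking the backup before deleting anything is a deliberate choice: an incomplete backup then leaves the current files untouched instead of producing a store with no usable checkpoint.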
5. Start the DSN normally. This completes the recovery of the TimesTen in-memory database from the CheckPoint files and the Trans Log
$ ttdaemonadmin -start
TimesTen Daemon startup OK.
[timesten@tony5 TYINFO]$ ttadmin -rampolicy manual tyinfo
RAM Residence Policy : manual
Manually Loaded In RAM : False
Replication Agent Policy : manual
Replication Manually Started : False
Cache Agent Policy : manual
Cache Agent Manually Started : False
[timesten@tony5 TYINFO]$ ttadmin -ramload tyinfo
RAM Residence Policy : manual
Manually Loaded In RAM : True
Replication Agent Policy : manual
Replication Manually Started : False
Cache Agent Policy : manual
Cache Agent Manually Started : False
[timesten@tony5 TYINFO]$ ttstatus
TimesTen status report as of Thu Mar 5 23:35:06 2015
Daemon pid 16174 port 53396 instance tt1122
TimesTen server pid 16185 started on port 53397
------------------------------------------------------------------------
Data store /ttchk/tt1122/DataStore/TYINFO/tyinfodata
There are 11 connections to the data store
Shared Memory KEY 0x0401f765 ID 163841
Type PID Context Connection Name ConnID
Subdaemon 16181 0x0000000002476d00 Manager 127
Subdaemon 16181 0x00000000024cdba0 Rollback 126
Subdaemon 16181 0x000000000259bfc0 Flusher 125
Subdaemon 16181 0x0000000002611660 Monitor 124
Subdaemon 16181 0x0000000002666510 Deadlock Detector 123
Subdaemon 16181 0x00000000026bb3c0 Checkpoint 122
Subdaemon 16181 0x0000000002710270 Aging 121
Subdaemon 16181 0x0000000002765120 Log Marker 120
Subdaemon 16181 0x00000000027b9fd0 AsyncMV 119
Subdaemon 16181 0x000000000280ee80 HistGC 118
Subdaemon 16181 0x0000000002863d30 IndexGC 117
RAM residence policy: Manual
Data store is manually loaded into RAM
Replication policy : Manual
Cache Agent policy : Manual
------------------------------------------------------------------------
Accessible by group timesten
End of report
[timesten@tony5 TYINFO]$
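To test this, create new objects and insert data in the original database before the crash, then compare the data before and after the recovery to verify consistency.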
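III. Relationship between the two CheckPoint files and the Trans Log
Why does TimesTen have two CheckPoint files? The reason is simple: they mirror each other. Why two mirrored files cannot be placed on different paths or storage devices is a design question that is not discussed here.
To show the relationship between the two CheckPoint files and the Trans Log more directly, I have abstracted the CKPT process into the diagram below, which covers everything from creating the DSN through four CKPTs.
Creating the DSN: when a DSN is first created, TimesTen generates three files: DSName.log0, DSName.ds0, and DSName.ds1. The N in DSName.logN is a sequentially increasing number. To keep the description simple, assume that two new Trans Log files are produced between one CheckPoint and the next.
CKPT 1: because two new Trans Log files have been produced, the first CheckPoint writes the changes in the three transaction logs DSName.log0/1/2 to the checkpoint file DSName.ds0. DSName.ds1 is unchanged. Since DSName.log0/1/2 have not yet been written to DSName.ds1, no transaction log files are deleted at this point.
CKPT 2: the second CheckPoint writes the changes in the five transaction logs DSName.log0/1/2/3/4 to DSName.ds1. DSName.ds0 is unchanged. Since DSName.log0/1/2 have now been written to both DSName.ds0 and DSName.ds1, the SubDaemon process deletes the DSName.log0/1/2 files.
CKPT 3: the third CheckPoint writes the changes in the four transaction logs DSName.log3/4/5/6 to DSName.ds0. DSName.ds1 is unchanged. Since DSName.log3/4 have now been written to both DSName.ds1 and DSName.ds0, the SubDaemon process deletes the DSName.log3/4 files.
CKPT 4: the fourth CheckPoint writes the changes in the four transaction logs DSName.log5/6/7/8 to DSName.ds1. DSName.ds0 is unchanged. Since DSName.log5/6 have now been written to both DSName.ds0 and DSName.ds1, the SubDaemon process deletes the DSName.log5/6 files. Later CheckPoints write the checkpoint files and purge the Trans Log files in the same alternating pattern.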
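The bookkeeping in the four CKPT steps above can be replayed with a toy model. This is a sketch of the article's simplified description, not of TimesTen internals: it assumes exactly two new log files before each CheckPoint, and the function name `simulate_ckpt` and its variables are made up for illustration. A log file is purged once both checkpoint files contain it.

```shell
#!/usr/bin/env bash
# Toy model of the alternating checkpoint scheme described in the text.
# Assumption: two new log files appear before every checkpoint.
simulate_ckpt() {
  local rounds=$1
  local newest=0          # highest log number on disk; log0 exists from creation
  local oldest=0          # lowest log number not yet purged
  local covered=(-1 -1)   # highest log number already written to ds0 / ds1
  local n target safe first purged
  for ((n = 1; n <= rounds; n++)); do
    newest=$((newest + 2))          # two new log files since the last checkpoint
    target=$(((n + 1) % 2))         # odd rounds write ds0, even rounds write ds1
    first=$oldest
    covered[target]=$newest
    # A log is safe to purge once both checkpoint files contain it.
    safe=$((covered[0] < covered[1] ? covered[0] : covered[1]))
    if ((safe >= oldest)); then
      purged="log${oldest}..log${safe}"
      oldest=$((safe + 1))
    else
      purged="none"
    fi
    echo "ckpt ${n}: log${first}..log${newest} -> ds${target}; purge ${purged}"
  done
}

simulate_ckpt 4
```

Running it prints one line per CheckPoint, matching the four steps above:

```
ckpt 1: log0..log2 -> ds0; purge none
ckpt 2: log0..log4 -> ds1; purge log0..log2
ckpt 3: log3..log6 -> ds0; purge log3..log4
ckpt 4: log5..log8 -> ds1; purge log5..log6
```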
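The diagrams above describe the relationship between the two CheckPoint files and the Trans Log. To verify the mirroring role of the two CheckPoint files once more, we manually simulate a CheckPoint file being deleted by mistake or damaged.
IV. Simulating lost or damaged CheckPoint files
1. Simulate the loss of one CheckPoint file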
$ ttstatus
TimesTen status report as of Fri Mar 6 07:03:14 2015
Daemon pid 3138 port 53396 instance tt1122
TimesTen server pid 3156 started on port 53397
------------------------------------------------------------------------
Data store /ttchk/tt1122/DataStore/TYINFO/tyinfodata
There are 11 connections to the data store
Shared Memory KEY 0x0601f765 ID 32769
Type PID Context Connection Name ConnID
Subdaemon 3144 0x000000001bbfed00 Manager 127
Subdaemon 3144 0x000000001bc55ba0 Rollback 126
Subdaemon 3144 0x000000001bd23fd0 Flusher 125
Subdaemon 3144 0x000000001bd99670 Monitor 124
Subdaemon 3144 0x000000001bdee520 Deadlock Detector 123
Subdaemon 3144 0x000000001be433d0 HistGC 119
Subdaemon 3144 0x00002aaac00008c0 Checkpoint 122
Subdaemon 3144 0x00002aaac0015510 Aging 121
Subdaemon 3144 0x00002aaac40008c0 Log Marker 120
Subdaemon 3144 0x00002aaac4015510 AsyncMV 118
Subdaemon 3144 0x00002aaac80008c0 IndexGC 117
RAM residence policy: Manual
Data store is manually loaded into RAM
Replication policy : Manual
Cache Agent policy : Manual
------------------------------------------------------------------------
Accessible by group timesten
End of report
[timesten@tony5 TYINFO]$ ls
tyinfodata.ds0 tyinfodata.ds1 tyinfodata.inval
[timesten@tony5 TYINFO]$ rm tyinfodata.ds0
[timesten@tony5 TYINFO]$ ls
tyinfodata.ds1 tyinfodata.inval
[timesten@tony5 TYINFO]$ ttisql tyinfo
Copyright (c) 1996, 2014, Oracle and/or its affiliates. All rights reserved.
Type ? or "help" for help, type "exit" to quit ttIsql.
connect "DSN=tyinfo";
Connection successful: DSN=TYINFO;UID=timesten;DataStore=/ttchk/tt1122/DataStore/TYINFO/tyinfodata;DatabaseCharacterSet=ZHS16GBK;ConnectionCharacterSet=ZHS16GBK;LogFileSize=128;DRIVER=/TimesTen/tt1122/lib/libtten.so;LogDir=/ttlog/tt1122/TYINFO;PermSize=128;TempSize=64;Connections=80;CkptFrequency=600;RecoveryThreads=4;TypeMode=0;PLSQL=0;CacheGridEnable=0;LogBufMB=64;ReceiverThreads=1;
(Default setting AutoCommit=1)
Command> call ttckpt;
Command> exit
Disconnecting...
Done.
[timesten@tony5 TYINFO]$ ls
tyinfodata.ds0 tyinfodata.ds1 tyinfodata.inval
[timesten@tony5 TYINFO]$
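The simulation shows that when one CheckPoint file is lost, TimesTen does not crash, and the next CheckPoint automatically regenerates the missing file.
2. Simulate the loss of both CheckPoint files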
$ ls
tyinfodata.ds0 tyinfodata.ds1 tyinfodata.inval
[timesten@tony5 TYINFO]$ rm tyinfodata.ds*
[timesten@tony5 TYINFO]$ ls
tyinfodata.inval
[timesten@tony5 TYINFO]$ ttstatus
TimesTen status report as of Fri Mar 6 07:07:04 2015
Daemon pid 3138 port 53396 instance tt1122
TimesTen server pid 3156 started on port 53397
------------------------------------------------------------------------
Data store /ttchk/tt1122/DataStore/TYINFO/tyinfodata
There are 11 connections to the data store
Shared Memory KEY 0x0601f765 ID 32769
Type PID Context Connection Name ConnID
Subdaemon 3144 0x000000001bbfed00 Manager 127
Subdaemon 3144 0x000000001bc55ba0 Rollback 126
Subdaemon 3144 0x000000001bd23fd0 Flusher 125
Subdaemon 3144 0x000000001bd99670 Monitor 124
Subdaemon 3144 0x000000001bdee520 Deadlock Detector 123
Subdaemon 3144 0x000000001be433d0 HistGC 119
Subdaemon 3144 0x00002aaac00008c0 Checkpoint 122
Subdaemon 3144 0x00002aaac0015510 Aging 121
Subdaemon 3144 0x00002aaac40008c0 Log Marker 120
Subdaemon 3144 0x00002aaac4015510 AsyncMV 118
Subdaemon 3144 0x00002aaac80008c0 IndexGC 117
RAM residence policy: Manual
Data store is manually loaded into RAM
Replication policy : Manual
Cache Agent policy : Manual
------------------------------------------------------------------------
Accessible by group timesten
End of report
[timesten@tony5 TYINFO]$ ttisql tyinfo
Copyright (c) 1996, 2014, Oracle and/or its affiliates. All rights reserved.
Type ? or "help" for help, type "exit" to quit ttIsql.
connect "DSN=tyinfo";
Connection successful: DSN=TYINFO;UID=timesten;DataStore=/ttchk/tt1122/DataStore/TYINFO/tyinfodata;DatabaseCharacterSet=ZHS16GBK;ConnectionCharacterSet=ZHS16GBK;LogFileSize=128;DRIVER=/TimesTen/tt1122/lib/libtten.so;LogDir=/ttlog/tt1122/TYINFO;PermSize=128;TempSize=64;Connections=80;CkptFrequency=600;RecoveryThreads=4;TypeMode=0;PLSQL=0;CacheGridEnable=0;LogBufMB=64;ReceiverThreads=1;
(Default setting AutoCommit=1)
Command> call ttckpt;
Command> exit
Disconnecting...
Done.
[timesten@tony5 TYINFO]$ ls
tyinfodata.ds1 tyinfodata.inval
[timesten@tony5 TYINFO]$ ttisql tyinfo
Copyright (c) 1996, 2014, Oracle and/or its affiliates. All rights reserved.
Type ? or "help" for help, type "exit" to quit ttIsql.
connect "DSN=tyinfo";
Connection successful: DSN=TYINFO;UID=timesten;DataStore=/ttchk/tt1122/DataStore/TYINFO/tyinfodata;DatabaseCharacterSet=ZHS16GBK;ConnectionCharacterSet=ZHS16GBK;LogFileSize=128;DRIVER=/TimesTen/tt1122/lib/libtten.so;LogDir=/ttlog/tt1122/TYINFO;PermSize=128;TempSize=64;Connections=80;CkptFrequency=600;RecoveryThreads=4;TypeMode=0;PLSQL=0;CacheGridEnable=0;LogBufMB=64;ReceiverThreads=1;
(Default setting AutoCommit=1)
Command> call ttckpt;
Command> exit
Disconnecting...
Done.
[timesten@tony5 TYINFO]$ ls
tyinfodata.ds0 tyinfodata.ds1 tyinfodata.inval
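The test shows that when both CheckPoint files are lost, TimesTen does not crash, and two CheckPoints regenerate both files, one file per CheckPoint.
3. Simulate a damaged CheckPoint file
Damage to a CheckPoint file is hard to simulate, so it is not reproduced here. If you find a damaged CheckPoint file, you can turn the situation into the lost-file scenario above: delete the damaged file and run two manual CheckPoints so that the files are regenerated. Be aware that this consumes a large amount of I/O.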
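A quick way to see how many manual CheckPoints are needed is to count the missing files, since in the experiments above each `call ttckpt;` recreated one file. The sketch below does only that. `missing_ckpt_files` is a hypothetical helper name, and it treats an empty file as missing. It cannot detect a file that is present but damaged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of <name>.ds0 / <name>.ds1 are absent
# (or empty) and how many checkpoints are needed to recreate them.
missing_ckpt_files() {
  local dir=$1 name=$2 n missing=0
  for n in 0 1; do
    # -s is true only for a file that exists and is not empty.
    if [ ! -s "$dir/$name.ds$n" ]; then
      echo "missing: $name.ds$n"
      missing=$((missing + 1))
    fi
  done
  echo "checkpoints needed: $missing"
}
```

For the data store used in this article, the call would be `missing_ckpt_files /ttchk/tt1122/DataStore/TYINFO tyinfodata`.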
-----------------End-----------------