1. 时间戳增量回滚同步
假定在源数据表中有一个字段会记录数据的新增或修改时间,可以通过它对数据在时间维度上进行排序。通过中间表记录每次更新的时间戳,在下一个同步周期时,通过这个时间戳同步该时间戳以后的增量数据。这是时间戳增量同步。
但是时间戳增量同步不能对源数据库中历史数据的删除操作进行同步,我就使用orcale物化视图的方式进行删除更新操作
说明:
- 源数据表 需要被同步的数据表
- 目标数据表 同步至的数据表
- 中间表 存储时间戳的表
2. 前期准备
在两个数据库中分别创建数据表,并通过脚本在源数据表中插入500万条数据,完成后再以每秒一条的速度插入新数据,模拟生产环境。
源数据表结构如下:
CREATE TABLE "OIM"."YG_TQ_FD_BASICINFO" ( "ID" NUMBER(10,0) NOT NULL ENABLE, "DECLAREDATE" VARCHAR2(16), "UPDATEDATE" VARCHAR2(16), "SECODE" VARCHAR2(40) NOT NULL ENABLE, "FRONTSYMBOL" VARCHAR2(40), "BACKSYMBOL" VARCHAR2(40), "SYMBOLCOMP" VARCHAR2(40), "SNAMECOMP" VARCHAR2(100), "ENABLED" VARCHAR2(20) NOT NULL ENABLE, "FDSNAME" VARCHAR2(100) NOT NULL ENABLE, "FDTYPE" VARCHAR2(100), "FDNAME" VARCHAR2(200), "FDTOTUNIT" NUMBER(16,4), "FDNATURE" VARCHAR2(40), "INVESTSTYLE" VARCHAR2(40), "INVESTGOAL" CLOB, "INVRULE" CLOB, "FDSTYLE" VARCHAR2(20), "FOUNDDATE" VARCHAR2(16), "LISTDATE" VARCHAR2(16), "ENDDATE" VARCHAR2(16), "KEEPERCODE" VARCHAR2(16), "KEEPERNAME" VARCHAR2(200), "TRUSTEECODE" VARCHAR2(16), "TRUSTEENAME" VARCHAR2(200), "MANAGERNAME" VARCHAR2(400), "DECISIONRULE" CLOB, "DECISIONPROC" CLOB, "DISTRIBUTPRIN" CLOB, "INVESTRANGE" CLOB, "INVESTPOLICY" CLOB, "INVESTSTD" CLOB, "RISKTYPE" CLOB, "RISKINDEX" CLOB, "FDINTRO" CLOB, "FDEVOLUTION" CLOB, "DISCLOSUREPEOPLE" VARCHAR2(60), "DISCLOSUREPHONE" VARCHAR2(100), "EXISTBEGDATE" VARCHAR2(16), "EXISTENDDATE" VARCHAR2(16), "PRIEXISTENDDATE" VARCHAR2(16), "EXSITPERIOD" NUMBER(16,2), "EXCHANGE" VARCHAR2(20) NOT NULL ENABLE, "FDSERIESCODE" VARCHAR2(12), "FDSERIESNAME" VARCHAR2(200), "TRASTYPE" VARCHAR2(20), "EMONOVERDATE" VARCHAR2(20), "SUBCONFDATE" VARCHAR2(20), "REDCONFDATE" VARCHAR2(20), "REDPAYDATE" VARCHAR2(20), "OUTSUBBEGDATE" VARCHAR2(16), "OUTSUBENDDATE" VARCHAR2(16), "INSUBBEGDATE" VARCHAR2(16), "INREDENDDATE" VARCHAR2(16), "FDINVCATEGORY" VARCHAR2(20), "FDMETHOD" VARCHAR2(20) NOT NULL ENABLE, "MEMO" CLOB, "TOTSHARE" NUMBER(18,6), "ISSHARESTAT" NUMBER(10,0) NOT NULL ENABLE, "ISSTAT" NUMBER(10,0) NOT NULL ENABLE, "ISVALID" NUMBER(10,0) NOT NULL ENABLE, "TMSTAMP" RAW(8) NOT NULL ENABLE, "ENTRYDATE" DATE, "ENTRYTIME" VARCHAR2(16), "TRADPLACE" VARCHAR2(20), "SECURITYID" VARCHAR2(40) NOT NULL ENABLE, "NAVPUBTYPE" VARCHAR2(20), "PROFITPAYTYPE" VARCHAR2(20), "OPERATEPERIOD" NUMBER(10,0), "OPERATEPERIODUNIT" VARCHAR2(20), "FSYMBOL" VARCHAR2(40) NOT NULL ENABLE, "KEEPERSNAME" VARCHAR2(100) NOT NULL ENABLE, "KEEPERINITIALS" VARCHAR2(20) NOT NULL ENABLE, "LIQUBEGINDATE" VARCHAR2(16), "LIQUENDDATE" VARCHAR2(16), "CREDITLINEID" NUMBER(10,0), CONSTRAINT "PK_YG_TQ_FD_BASICINFO" PRIMARY KEY ("ID") USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "OIMTS" ENABLE ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) TABLESPACE "OIMTS" LOB ("INVESTGOAL") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("INVRULE") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("DECISIONRULE") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("DECISIONPROC") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("DISTRIBUTPRIN") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("INVESTRANGE") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("INVESTPOLICY") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("INVESTSTD") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("RISKTYPE") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("RISKINDEX") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("FDINTRO") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("FDEVOLUTION") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)) LOB ("MEMO") STORE AS BASICFILE ( TABLESPACE "OIMTS" ENABLE STORAGE IN ROW CHUNK 8192 RETENTION NOCACHE LOGGING STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT))
3. 作业流程
- 开始组件
- 建时间戳中间表
- 获取中间表的时间戳,并设置为全局变量
- 抽取两个数据表的时间戳及时间戳以后的数据进行比对,并根据比对结果进行删除、新增或修改操作
- 删除物化视图中的需要删除的数据
- 更新时间戳
4. 创建作业
作业的最终截图如下:
4.1 创建作业和DB连接
打开Spoon工具,新建作业,然后在左侧主对象树DB连接中新建DB连接。创建连接并测试通过后可以在左侧DB连接下右键共享出来。因为在单个作业或者转换中新建的DB连接都是局域数据源,在其他转换和作业中是不能使用的,即使属于同一个作业下的不同转换,所以需要把他们共享,这样DB连接就会成为全局数据源,不用多次编辑。
4.2 建时间戳中间表
这一步是为了在目标数据库建中间表ETL_POSITION,并插入初始的时间戳字段。因为该作业在生产环境是循环调用的,该步骤在每一个同步周期中都会调用,所以在建表时需要判断该表是否已经存在,如果不存在才建表。
SQL代码和组件配置截图如下:
create table ${SCHEMA_NAME}.ETL_POSITION ( table_name VARCHAR2(100), last_position_time TIMESTAMP(6), last_position number(30), progress_history VARCHAR2(200), current_progress VARCHAR2(30), target_url VARCHAR2(400), target_table_name VARCHAR2(100), schema_name VARCHAR2(100) ); -- Add comments to the table comment on table ${SCHEMA_NAME}.ETL_POSITION is 'ETL事件唯一标示'; -- Add comments to the columns comment on column ${SCHEMA_NAME}.ETL_POSITION.table_name is '表的名称'; comment on column ${SCHEMA_NAME}.ETL_POSITION.last_position_time is '最后一次同步的日期'; comment on column ${SCHEMA_NAME}.ETL_POSITION.last_position_time is '最后一次同步的位置ID'; comment on column ${SCHEMA_NAME}.ETL_POSITION.progress_history is ' UNKNOW, MARK, FULLING, INCING, CLEAR, FAILED, SUCCESS;'; comment on column ${SCHEMA_NAME}.ETL_POSITION.current_progress is '当前状态'; comment on column ${SCHEMA_NAME}.ETL_POSITION.target_url is '目标表的URL(为了多数据源复制)'; comment on column ${SCHEMA_NAME}.ETL_POSITION.target_table_name is '目标表表名(为了多数据源复制)'; comment on column ${SCHEMA_NAME}.ETL_POSITION.schema_name is '表所在空间'; -- Create/Recreate indexes create unique index ${SCHEMA_NAME}.ETL_POWEIYI on ${SCHEMA_NAME}.ETL_POSITION (TABLE_NAME, SCHEMA_NAME, TARGET_URL,TARGET_TABLE_NAME);
-- 判断是否有事件表
select * from all_tables where TABLE_NAME = 'ETL_POSITION' AND OWNER='${SCHEMA_NAME}'
创建物化视图
CREATE MATERIALIZED VIEW LOG ON ${SCHEMA_NAME}.${TABLE_NAME} with primary key, sequence;
判断是否有物化视图
select * from all_mview_logs where master = '${TABLE_NAME}' and log_owner = '${SCHEMA_NAME}'
在作业中设置变量
具体看代码吧