An approach to comparing DW and ODS column types and lengths in a data warehouse system

1. From the Excel mapping workbook, generate (with a Perl program) the statements that insert the mappings into the EDW_MAP table

INSERT_INTO_EDW_MAP.pl

    #!/usr/bin/perl -w
    ######################################################################
    use strict;                          # Declare using Perl strict syntax
    use POSIX qw(strftime);
    use Win32::OLE qw(in with);
    use Win32::OLE::Const 'Microsoft Excel';
    use Cwd;
    ######################################################################
    chdir('D:/EDW_VSS/贷款ETL/01-设计');
    #print cwd."\n";
    $Win32::OLE::Warn = 5;               # die on errors...

    # get already active Excel application or open new
    my $Excel = Win32::OLE->GetActiveObject("Excel.Application")
        || Win32::OLE->new("Excel.Application", "Quit");
    # Win32::OLE->new("Excel.Application")
    my $filename = q(D:/EDW_VSS/贷款ETL/01-设计/吉林银行数据仓库字段映射T05_交易.xls);
    my $Book = $Excel->Workbooks->Open("$filename");
    my $zt='T05_';
    open(F,'>D:/INSERT_INTO_EDW_MAP_t05.SQL') or die "cannot open output file: $!";
    my $indent = 1;                      # running counter used to build the ID

    # Sheets of the SDM workbook to process; 1..20 would mean sheets 1 through 20.
    # Here: from sheet 3 up to the last sheet ($Book->Worksheets->count).
    foreach my $Sheetnum (3..$Book->Worksheets->count) {
        my $Sheet = $Book->Worksheets($Sheetnum) || die "over";
        my $LastRow = $Sheet->UsedRange->Find({What=>"*",
                          SearchDirection=>xlPrevious,
                          SearchOrder=>xlByRows})->{Row};
        #my $LastRow = $Sheet->{MaxRow};
        #my $error_count = 0;

        # NOTE: as in the original, the loop stops before $LastRow;
        # use <= if the last used row can hold a mapping.
        for (my $ROW = 3; $ROW < $LastRow; $ROW++) {
            # skip rows whose source column cell (column 11) is empty
            (next) unless defined $Sheet->Cells($ROW,11)->{'Value'};

            my $Table_Name_Target = $Sheet->Cells($ROW,3)->{'Value'};
            my $Table_Name_Source = $Sheet->Cells($ROW,10)->{'Value'};
            $Table_Name_Target = uc($Table_Name_Target);
            $Table_Name_Source = uc($Table_Name_Source);

            my $Col_Src_Name = $Sheet->Cells($ROW,11)->{'Value'};
            my $Col_Tag_Name = $Sheet->Cells($ROW,5)->{'Value'};

            # For concatenation expressions (A||B), drop everything up to and
            # including the last '||' that is directly followed by a name
            if ($Col_Src_Name =~ /\|\|/) { $Col_Src_Name =~ s/.*\|\|(\w+)/$1/; }
            # skip header rows ("源字段名" is the "source column name" heading)
            if ($Col_Src_Name eq "源字段名") { next; }

            $Col_Src_Name = uc($Col_Src_Name);
            $Col_Tag_Name = uc($Col_Tag_Name);

            print F "INSERT INTO EDW_MAP (ID, SRC_TABLE_NAME, SRC_COLUMN_NAME, DEST_TABLE_NAME, DEST_COLUMN_NAME) VALUES ('${zt}${indent}', '${Table_Name_Source}', '${Col_Src_Name}', '${Table_Name_Target}', '${Col_Tag_Name}');";
            print F "\n";
            $indent = $indent + 1;
        }
    }
    close(F);

    # clean up after ourselves
    $Book->Close;

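Note: to generate the INSERT statements for a different subject area, change three places in the script:

    my $filename = q(D:/EDW_VSS/贷款ETL/01-设计/吉林银行数据仓库字段映射T05_交易.xls)
    my $zt='T05_'
    open(F,'>D:/INSERT_INTO_EDW_MAP_t05.SQL')

The program already filters out mappings whose source is empty. It also generates an incrementing number, prefixed with the subject code, as the ID primary key, so that the mapping tables can be joined back together after they are split.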
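Because the later steps join on ID, it may be worth confirming that the loaded IDs really are unique once the generated scripts for all subject areas have been run. The query below is only a sketch of such a check; it assumes nothing beyond the EDW_MAP table defined in step 2.

```sql
-- Sketch: list any ID that occurs more than once in EDW_MAP.
-- An empty result means ID can safely be used as the join key.
SELECT ID, COUNT(*) AS CNT
  FROM EDW_MAP
 GROUP BY ID
HAVING COUNT(*) > 1;
```

Duplicates would most likely come from running two workbooks with the same `$zt` prefix, since the counter restarts at 1 on every run.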

2. Table structures used by this approach

建表脚本.sql (table-creation script)

    ---------------------------------------------
    -- Export file for user EDW                --
    -- Created by nisj on 2010-09-06, 13:17:19 --
    ---------------------------------------------

    spool 建表脚本.log

    prompt
    prompt Creating table EDW_MAP
    prompt ======================
    prompt
    create table EDW_MAP
    (
      ID               VARCHAR2(32),
      SRC_TABLE_NAME   VARCHAR2(32),
      SRC_COLUMN_NAME  VARCHAR2(32),
      DEST_TABLE_NAME  VARCHAR2(32),
      DEST_COLUMN_NAME VARCHAR2(32),
      SRC_COLUMN_TYPE  VARCHAR2(32),
      DEST_COLUMN_TYPE VARCHAR2(32),
      SRC_LENGTH       VARCHAR2(32),
      DEST_LENGTH      VARCHAR2(32),
      REMARK           VARCHAR2(1800)
    )
    ;

    prompt
    prompt Creating table EDW_MAP_DEST
    prompt ===========================
    prompt
    create table EDW_MAP_DEST
    (
      ID               VARCHAR2(32),
      DEST_TABLE_NAME  VARCHAR2(32),
      DEST_COLUMN_NAME VARCHAR2(32)
    )
    ;

    prompt
    prompt Creating table EDW_MAP_OK
    prompt =========================
    prompt
    create table EDW_MAP_OK
    (
      ID               VARCHAR2(32),
      SRC_TABLE_NAME   VARCHAR2(32),
      SRC_COLUMN_NAME  VARCHAR2(32),
      DEST_TABLE_NAME  VARCHAR2(32),
      DEST_COLUMN_NAME VARCHAR2(32)
    )
    ;

    prompt
    prompt Creating table EDW_MAP_SRC
    prompt ==========================
    prompt
    create table EDW_MAP_SRC
    (
      ID              VARCHAR2(32),
      SRC_TABLE_NAME  VARCHAR2(32),
      SRC_COLUMN_NAME VARCHAR2(32)
    )
    ;

    spool off

3. Insert the valid mappings from EDW_MAP into the EDW_MAP_OK table

插入数据到OK.sql (load data into EDW_MAP_OK)

    DECLARE
      V_ID               VARCHAR2(32);
      V_SRC_TABLE_NAME   VARCHAR2(32);
      V_SRC_COLUMN_NAME  VARCHAR2(32);
      V_DEST_TABLE_NAME  VARCHAR2(32);
      V_DEST_COLUMN_NAME VARCHAR2(32);
      V_SRC_COLUMN_TYPE  VARCHAR2(32);
      V_DEST_COLUMN_TYPE VARCHAR2(32);
      V_SRC_LENGTH       VARCHAR2(32);
      V_DEST_LENGTH      VARCHAR2(32);

      CURSOR CUR_EDW_MAP IS
        SELECT ID
              ,SRC_TABLE_NAME
              ,SRC_COLUMN_NAME
              ,DEST_TABLE_NAME
              ,DEST_COLUMN_NAME
          FROM EDW_MAP;
    BEGIN
      EXECUTE IMMEDIATE 'TRUNCATE TABLE EDW_MAP_SRC';
      EXECUTE IMMEDIATE 'TRUNCATE TABLE EDW_MAP_DEST';
      EXECUTE IMMEDIATE 'TRUNCATE TABLE EDW_MAP_OK';

      OPEN CUR_EDW_MAP;
      LOOP
        FETCH CUR_EDW_MAP
          INTO V_ID
              ,V_SRC_TABLE_NAME
              ,V_SRC_COLUMN_NAME
              ,V_DEST_TABLE_NAME
              ,V_DEST_COLUMN_NAME;
        EXIT WHEN CUR_EDW_MAP%NOTFOUND;

        -- Insert mappings whose source column exists into EDW_MAP_SRC
        INSERT INTO EDW_MAP_SRC
          (ID
          ,SRC_TABLE_NAME
          ,SRC_COLUMN_NAME)
          SELECT DISTINCT M.ID, M.SRC_TABLE_NAME, M.SRC_COLUMN_NAME
            FROM EDW_MAP M
           INNER JOIN ALL_TAB_COLUMNS A
              ON M.SRC_TABLE_NAME = A.TABLE_NAME
             AND M.SRC_COLUMN_NAME = A.COLUMN_NAME
           WHERE A.TABLE_NAME = V_SRC_TABLE_NAME
             AND A.OWNER = 'EDW'
             AND A.COLUMN_NAME = V_SRC_COLUMN_NAME
             AND M.ID = V_ID;

        -- Insert mappings whose target column exists into EDW_MAP_DEST
        INSERT INTO EDW_MAP_DEST
          (ID
          ,DEST_TABLE_NAME
          ,DEST_COLUMN_NAME)
          SELECT DISTINCT M.ID, M.DEST_TABLE_NAME, M.DEST_COLUMN_NAME
            FROM EDW_MAP M
           INNER JOIN ALL_TAB_COLUMNS A
              ON M.DEST_TABLE_NAME = A.TABLE_NAME
             AND M.DEST_COLUMN_NAME = A.COLUMN_NAME
           WHERE A.TABLE_NAME = V_DEST_TABLE_NAME
             AND A.OWNER = 'EDW'
             AND A.COLUMN_NAME = V_DEST_COLUMN_NAME
             AND M.ID = V_ID;
      END LOOP;
      CLOSE CUR_EDW_MAP;

      -- Insert into the new mapping table: rows valid on both sides
      INSERT INTO EDW_MAP_OK
        (ID
        ,SRC_TABLE_NAME
        ,SRC_COLUMN_NAME
        ,DEST_TABLE_NAME
        ,DEST_COLUMN_NAME)
        SELECT S.ID
              ,S.SRC_TABLE_NAME
              ,S.SRC_COLUMN_NAME
              ,D.DEST_TABLE_NAME
              ,D.DEST_COLUMN_NAME
          FROM EDW_MAP_SRC S
         INNER JOIN EDW_MAP_DEST D
            ON S.ID = D.ID;

      -- Commit the transaction
      COMMIT;
    END;

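Note: the next step assigns values with SELECT ... INTO inside a cursor loop, and that assignment fails when the query finds no row. To prevent this, the mappings whose source and target columns both exist are first inserted into EDW_MAP_OK. Be sure to filter on the ID variable as well, so that no duplicate rows are produced.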
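The same filtering can also be expressed as a single set-based statement instead of a row-by-row cursor. The following is a sketch rather than the author's method: it assumes that ID is unique in EDW_MAP, and it fills only EDW_MAP_OK, leaving EDW_MAP_SRC and EDW_MAP_DEST untouched.

```sql
-- Sketch (assumes ID is unique in EDW_MAP): keep only the mappings whose
-- source column and target column both exist in schema EDW.
TRUNCATE TABLE EDW_MAP_OK;

INSERT INTO EDW_MAP_OK
  (ID, SRC_TABLE_NAME, SRC_COLUMN_NAME, DEST_TABLE_NAME, DEST_COLUMN_NAME)
SELECT M.ID, M.SRC_TABLE_NAME, M.SRC_COLUMN_NAME,
       M.DEST_TABLE_NAME, M.DEST_COLUMN_NAME
  FROM EDW_MAP M
 WHERE EXISTS (SELECT 1
                 FROM ALL_TAB_COLUMNS A
                WHERE A.OWNER = 'EDW'
                  AND A.TABLE_NAME = M.SRC_TABLE_NAME
                  AND A.COLUMN_NAME = M.SRC_COLUMN_NAME)
   AND EXISTS (SELECT 1
                 FROM ALL_TAB_COLUMNS A
                WHERE A.OWNER = 'EDW'
                  AND A.TABLE_NAME = M.DEST_TABLE_NAME
                  AND A.COLUMN_NAME = M.DEST_COLUMN_NAME);

COMMIT;
```

The cursor version has one advantage worth keeping in mind: its two intermediate tables record which side of a rejected mapping was missing.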

4. Update the data type and length columns in the EDW_MAP table

计算源和目标字段长度的代码.sql (code that computes source and target column lengths)

    DECLARE
      V_ID               VARCHAR2(32);
      V_SRC_TABLE_NAME   VARCHAR2(32);
      V_SRC_COLUMN_NAME  VARCHAR2(32);
      V_DEST_TABLE_NAME  VARCHAR2(32);
      V_DEST_COLUMN_NAME VARCHAR2(32);
      V_SRC_COLUMN_TYPE  VARCHAR2(32);
      V_DEST_COLUMN_TYPE VARCHAR2(32);
      V_SRC_LENGTH       VARCHAR2(32);
      V_DEST_LENGTH      VARCHAR2(32);

      CURSOR CUR_EDW_MAP IS
        SELECT ID
              ,SRC_TABLE_NAME
              ,SRC_COLUMN_NAME
              ,DEST_TABLE_NAME
              ,DEST_COLUMN_NAME
          FROM EDW_MAP_OK;
    BEGIN
      OPEN CUR_EDW_MAP;
      LOOP
        FETCH CUR_EDW_MAP
          INTO V_ID
              ,V_SRC_TABLE_NAME
              ,V_SRC_COLUMN_NAME
              ,V_DEST_TABLE_NAME
              ,V_DEST_COLUMN_NAME;
        EXIT WHEN CUR_EDW_MAP%NOTFOUND;

        -- Compute the source column's type and length
        SELECT DATA_TYPE || '(' ||
               (DECODE(DATA_TYPE, 'VARCHAR2', DATA_LENGTH, 'NUMBER', DATA_PRECISION)) ||
               (DECODE(DATA_TYPE, 'VARCHAR2', '', 'NUMBER', ',' || DATA_SCALE)) || ')'
              ,DECODE(DATA_TYPE, 'VARCHAR2', DATA_LENGTH, 'NUMBER', DATA_PRECISION)
          INTO V_SRC_COLUMN_TYPE, V_SRC_LENGTH
          FROM ALL_TAB_COLUMNS A
         INNER JOIN ALL_COL_COMMENTS B
            ON A.OWNER = B.OWNER
           AND A.TABLE_NAME = B.TABLE_NAME
           AND A.COLUMN_NAME = B.COLUMN_NAME
         WHERE A.TABLE_NAME = V_SRC_TABLE_NAME
           AND A.OWNER = 'EDW'
           AND A.COLUMN_NAME = V_SRC_COLUMN_NAME;

        -- Compute the target column's type and length
        SELECT DATA_TYPE || '(' ||
               (DECODE(DATA_TYPE, 'VARCHAR2', DATA_LENGTH, 'NUMBER', DATA_PRECISION)) ||
               (DECODE(DATA_TYPE, 'VARCHAR2', '', 'NUMBER', ',' || DATA_SCALE)) || ')'
              ,DECODE(DATA_TYPE, 'VARCHAR2', DATA_LENGTH, 'NUMBER', DATA_PRECISION)
          INTO V_DEST_COLUMN_TYPE, V_DEST_LENGTH
          FROM ALL_TAB_COLUMNS A
         INNER JOIN ALL_COL_COMMENTS B
            ON A.OWNER = B.OWNER
           AND A.TABLE_NAME = B.TABLE_NAME
           AND A.COLUMN_NAME = B.COLUMN_NAME
         WHERE A.TABLE_NAME = V_DEST_TABLE_NAME
           AND A.OWNER = 'EDW'
           AND A.COLUMN_NAME = V_DEST_COLUMN_NAME;

        -- Update the values in EDW_MAP
        UPDATE EDW_MAP
           SET SRC_COLUMN_TYPE  = V_SRC_COLUMN_TYPE,
               DEST_COLUMN_TYPE = V_DEST_COLUMN_TYPE,
               SRC_LENGTH       = V_SRC_LENGTH,
               DEST_LENGTH      = V_DEST_LENGTH
         WHERE ID = V_ID;

        -- Commit the transaction
        COMMIT;
      END LOOP;
      CLOSE CUR_EDW_MAP;
    END;
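The DECODE expressions above handle only VARCHAR2 and NUMBER and have no default branch, so for any other data type (DATE or CHAR, for example) they return NULL: the length columns stay empty and the type string comes out as something like DATE(). Before relying on the comparison, a query along the following lines could list the mapped columns that fall outside those two types. It is a sketch that uses only the tables and dictionary view already referenced in this article.

```sql
-- Sketch: mapped columns (source or target side) whose data type
-- is not covered by the DECODE expressions in step 4.
SELECT DISTINCT A.TABLE_NAME, A.COLUMN_NAME, A.DATA_TYPE
  FROM EDW_MAP_OK M
  JOIN ALL_TAB_COLUMNS A
    ON A.OWNER = 'EDW'
   AND (   (A.TABLE_NAME = M.SRC_TABLE_NAME  AND A.COLUMN_NAME = M.SRC_COLUMN_NAME)
        OR (A.TABLE_NAME = M.DEST_TABLE_NAME AND A.COLUMN_NAME = M.DEST_COLUMN_NAME))
 WHERE A.DATA_TYPE NOT IN ('VARCHAR2', 'NUMBER')
 ORDER BY A.TABLE_NAME, A.COLUMN_NAME;
```

Rows returned here are not covered by the length comparison in step 5, because a NULL length never satisfies the `>` test.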

5. Find the records where the target column is shorter than the source column

对结果查询分析.sql (query for analyzing the results)

    SELECT *
      FROM EDW_MAP
     WHERE DEST_COLUMN_NAME NOT LIKE '%CD'
       AND DEST_COLUMN_NAME NOT LIKE '%FLAG'
       AND TO_NUMBER(SRC_LENGTH) > TO_NUMBER(DEST_LENGTH)
     ORDER BY SUBSTR(ID, 1, 3), TO_NUMBER(SUBSTR(ID, 5));

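Note: columns whose names end in CD or FLAG are filtered out, because their values have already been converted and need not be considered here.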
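The query above covers length only. Since step 4 also stores the type strings, a similar query can flag mappings whose base data type differs between source and target. This is a sketch, not part of the original scripts; it assumes SRC_COLUMN_TYPE and DEST_COLUMN_TYPE have the form produced in step 4, that is, the type name followed by an opening parenthesis.

```sql
-- Sketch: mappings whose base type (the text before the first '(')
-- differs between source and target, e.g. VARCHAR2 versus NUMBER.
SELECT ID,
       SRC_TABLE_NAME,  SRC_COLUMN_NAME,  SRC_COLUMN_TYPE,
       DEST_TABLE_NAME, DEST_COLUMN_NAME, DEST_COLUMN_TYPE
  FROM EDW_MAP
 WHERE SUBSTR(SRC_COLUMN_TYPE, 1, INSTR(SRC_COLUMN_TYPE, '(') - 1)
    <> SUBSTR(DEST_COLUMN_TYPE, 1, INSTR(DEST_COLUMN_TYPE, '(') - 1)
 ORDER BY SUBSTR(ID, 1, 3), TO_NUMBER(SUBSTR(ID, 5));
```

Rows that step 4 never updated have NULL type strings and drop out of the comparison automatically.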

6. Summary

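Comparing source and target columns by hand would be tedious, inefficient, and error-prone. This approach is not trivial either, but it is considerably more efficient than doing the work manually.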

Also: comparing EDW_MAP with EDW_MAP_OK reveals the errors in the SDM mapping.
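One way to make that comparison, sketched here under the assumption that step 3 has just been run so that EDW_MAP_SRC and EDW_MAP_DEST are populated, is to list the mappings that did not reach EDW_MAP_OK together with the side that failed the lookup. The column aliases SRC_FOUND and DEST_FOUND are made up for this illustration.

```sql
-- Sketch: mappings rejected in step 3, with a flag showing whether the
-- source and/or target column was found in schema EDW.
SELECT M.ID,
       M.SRC_TABLE_NAME,  M.SRC_COLUMN_NAME,
       M.DEST_TABLE_NAME, M.DEST_COLUMN_NAME,
       CASE WHEN S.ID IS NULL THEN 'N' ELSE 'Y' END AS SRC_FOUND,
       CASE WHEN D.ID IS NULL THEN 'N' ELSE 'Y' END AS DEST_FOUND
  FROM EDW_MAP M
  LEFT JOIN EDW_MAP_SRC  S ON S.ID = M.ID
  LEFT JOIN EDW_MAP_DEST D ON D.ID = M.ID
 WHERE S.ID IS NULL
    OR D.ID IS NULL
 ORDER BY M.ID;
```

A row with SRC_FOUND = 'N' typically points to a misspelled or outdated source table or column name in the workbook, and likewise for the target side.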
