Scheduled Data Extraction from Oracle to Trafodion with Kettle

A previous article covered migrating data from MySQL to Trafodion with Kettle; see http://blog.csdn.net/post_yuan/article/details/52804105
This article goes a step further and shows how to combine Kettle's scheduling support with Linux's built-in cron to run scheduled data extraction from an Oracle database into a Trafodion database.
First, create the job and transformation in Kettle's Spoon designer. Here we design a simple job that performs an incremental extraction from an Oracle table into a Trafodion table, as shown below:

[Figure 1: the job designed in Spoon]
[Figure 2: the extraction transformation inside the job]

Next, use Kettle's Kitchen command-line tool to verify that the job runs correctly outside of Spoon. The exact command and its output follow:

[centos@cent-2 data-integration]$ /opt/install/pdi-ce-6.1.0.1-196/data-integration/kitchen.sh -norep \
> -file=/opt/install/pdi-ce-6.1.0.1-196/data-integration/test_job.kjb
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
05:27:16,652 INFO  [KarafInstance]
*******************************************************************************
*** Karaf Instance Number: 1 at /opt/install/pdi-ce-6.1.0.1-196/data-integr ***
***   ation/./system/karaf/caches/default/data-1                            ***
*** Karaf Port:8802                                                         ***
*** OSGI Service Port:9051                                                  ***
*******************************************************************************
05:27:16,653 INFO  [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled
Mar 21, 2017 5:27:17 AM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
2017/03/21 05:27:17 - Kitchen - Start of run.
2017/03/21 05:27:17 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
Mar 21, 2017 5:27:19 AM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess
INFO: New Caching Service registered
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/install/pdi-ce-6.1.0.1-196/data-integration/launcher/../lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/install/pdi-ce-6.1.0.1-196/data-integration/plugins/pentaho-big-data-plugin/lib/slf4j-log4j12-1.7.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017/03/21 05:27:21 - test_job - Start of job execution
2017/03/21 05:27:21 - test_job - Starting entry [转换]
2017/03/21 05:27:21 - 转换 - Loading transformation from XML file [file:///opt/install/pdi-ce-6.1.0.1-196/data-integration/test_transfer.ktr]
2017/03/21 05:27:21 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2017/03/21 05:27:21 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2017/03/21 05:27:21 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2017/03/21 05:27:21 - test_transfer - Dispatching started for transformation [test_transfer]
2017/03/21 05:27:22 - TrafTable Output.0 - Connected to database [trafodion] (commit=1000)
2017/03/21 05:28:15 - 表输入.0 - Finished reading query, closing connection.
2017/03/21 05:28:15 - 表输入.0 - Finished processing (I=3, O=0, R=0, W=3, U=0, E=0)
2017/03/21 05:28:15 - TrafTable Output.0 - Finished processing (I=0, O=3, R=3, W=3, U=0, E=0)
2017/03/21 05:28:15 - test_job - Starting entry [成功]
2017/03/21 05:28:15 - test_job - Finished job entry [成功] (result=[true])
2017/03/21 05:28:15 - test_job - Finished job entry [转换] (result=[true])
2017/03/21 05:28:15 - test_job - Job execution finished
2017/03/21 05:28:15 - Kitchen - Finished!
2017/03/21 05:28:15 - Kitchen - Start=2017/03/21 05:27:17.276, Stop=2017/03/21 05:28:15.778
2017/03/21 05:28:15 - Kitchen - Processing ended after 58 seconds.

For the full list of Kitchen options and their usage, see http://wiki.pentaho.com/display/EAI/Kitchen+User+Documentation#KitchenUserDocumentation-Commandlineoptions
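Beyond -norep and -file, Kitchen also accepts a log level and a log file, which become useful once the job runs unattended. A sketch of a more verbose invocation, reusing the paths from the test run above:

```shell
# Run the job with explicit logging; per the Kitchen documentation, -level
# accepts Error, Nothing, Minimal, Basic, Detailed, Debug or Rowlevel.
/opt/install/pdi-ce-6.1.0.1-196/data-integration/kitchen.sh -norep \
  -file=/opt/install/pdi-ce-6.1.0.1-196/data-integration/test_job.kjb \
  -level=Basic \
  -logfile=/opt/install/pdi-ce-6.1.0.1-196/data-integration/kitchen.log
```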
Once the command succeeds, verify that the rows actually arrived in the target Trafodion table. With the job confirmed working, wrap the command above in a shell script (here assumed to be named run_kettle_job.sh) and register it as a scheduled task with Linux's built-in cron, as follows.
(For an introduction to cron usage, see https://www.cyberciti.biz/faq/how-do-i-add-jobs-to-cron-under-linux-or-unix-oses/)
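A minimal sketch of run_kettle_job.sh, assuming the same installation path as above; the PDI_HOME/KITCHEN variables and the run_job function are conveniences added here for readability and testing, not part of any Kettle API:

```shell
#!/bin/sh
# run_kettle_job.sh - wrapper that cron can call. Logs the exit code with a
# timestamp so a failed nightly run is visible in the log. PDI_HOME and
# KITCHEN can be overridden from the environment.
PDI_HOME=${PDI_HOME:-/opt/install/pdi-ce-6.1.0.1-196/data-integration}
KITCHEN=${KITCHEN:-$PDI_HOME/kitchen.sh}

run_job() {
    # Same invocation as the manual test above.
    "$KITCHEN" -norep -file="$PDI_HOME/test_job.kjb"
    rc=$?
    echo "$(date '+%Y/%m/%d %H:%M:%S') - test_job exit code: $rc"
    return $rc
}
# In the real script, finish with: run_job
```

Kitchen exits non-zero when the job fails, so propagating its exit code is enough for simple cron-side failure detection.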

# Edit the current user's crontab
crontab -e

# Add an entry that runs the Kettle job every day at 06:00. Give
# run_kettle_job.sh by its absolute path (cron starts with a minimal
# environment) and redirect stderr as well so failures reach the log:
0 6 * * * run_kettle_job.sh >> /opt/install/pdi-ce-6.1.0.1-196/data-integration/kitchen.log 2>&1
