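Original: http://wiki.pentaho.com/display/EAI/Kitchen+User+Documentation#KitchenUserDocumentation-RunajobfromRepository
Kitchen is a program that launches jobs stored either as XML files or in a database repository. Jobs are usually scheduled to run automatically at regular intervals in batch mode.
The first step is to install JRE 1.5 or later, available from http://www.java.com/.
Then simply unzip the downloaded Kettle distribution zip file into any directory.
The Kettle directory contains many files, among them Kitchen.bat and Kitchen.sh.
On a Unix-like system, you first need to make the shell scripts executable.
The following commands make all the scripts executable: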
cd Kettle
chmod +x *.sh
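The step above can be sketched as a small function that marks the scripts executable and then checks the result. This is only an illustration: make_scripts_executable is a hypothetical helper name, and the directory you pass in is wherever you unzipped Kettle.

```shell
# Make every .sh file in a directory executable, then verify the result.
# make_scripts_executable is a hypothetical helper, not part of Kettle.
make_scripts_executable() {
    dir=$1
    # Fail early if the directory does not exist.
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for script in "$dir"/*.sh; do
        # Skip the unexpanded pattern when no .sh files are present.
        [ -e "$script" ] || continue
        chmod +x "$script"
    done
    # Report any script that is still not executable (normally none).
    for script in "$dir"/*.sh; do
        [ -e "$script" ] || continue
        [ -x "$script" ] || echo "not executable: $script"
    done
}
```

A typical call would be make_scripts_executable ./Kettle; no output means every script can now be run.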
A separate script is provided for each platform: Kitchen.bat for Windows and Kitchen.sh for Unix-like systems.
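Kitchen can run on any platform that has JRE 1.5 or later installed.
The available command-line options are described below.
Important:
On Windows, a minus sign "-" in a command-line option can cause problems, as can an equals sign "=". For this reason, since version 2.2.2 you can combine either / or - as the prefix with either : or = as the separator.
Fields shown in italics below represent option values.
If a value contains spaces, enclose it in double quotes.
Example: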
/option:value
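To make the two syntaxes concrete, here is a minimal sketch that formats one option either way and adds double quotes when the value contains a space. kitchen_opt is a hypothetical helper written for this illustration; it is not part of Kettle.

```shell
# Format one Kitchen option in either syntax.
# style "win" gives /option:value; anything else gives -option=value.
# kitchen_opt is a hypothetical helper, not part of Kettle.
kitchen_opt() {
    style=$1
    name=$2
    value=$3
    case $value in
        # Wrap values that contain a space in double quotes.
        *" "*) value="\"$value\"" ;;
    esac
    if [ "$style" = win ]; then
        printf '%s\n' "/${name}:${value}"
    else
        printf '%s\n' "-${name}=${value}"
    fi
}

kitchen_opt win rep "Production Repository"   # prints /rep:"Production Repository"
kitchen_opt unix level Minimal                # prints -level=Minimal
```

The printed text is meant to be pasted into a command line or batch file. When building arguments inside a shell script, pass the value as a single quoted argument instead of embedding quote characters.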
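The other valid options are described below.
-version
Displays the version of the Kettle core library (Kettle.jar).
The build number and build date are shown as well.
Parameters
-param
Sets the value of a custom parameter, for example: -param:FOO=value
-listparam
Lists all the parameters defined in the specified job (name, default value and description).
See also: Named Parameters.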
-file=filename
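Launches the job defined in the given XML file (.kjb: Kettle Job).
-param:key=value
Specifies the value of a parameter. For example: "-param:MASTER_HOST=192.168.1.3" "-param:MASTER_PORT=8181"
See also: Named Parameters.
-log=Logging Filename
Specifies the log file. By default, log output goes to standard output.
-level=Logging Level
Specifies the logging level for the job that is run.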
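When a job takes several named parameters, the option list can be assembled from KEY=value pairs. The sketch below is a dry run that only prints the resulting command. build_param_args is a hypothetical helper, the job path is borrowed from the examples later in this document, and values containing spaces are not handled.

```shell
# Build a Kitchen option string with one -param option per KEY=value pair.
# build_param_args is a hypothetical helper, not part of Kettle.
# Limitation of this sketch: values must not contain spaces.
build_param_args() {
    job_file=$1
    shift
    args="-file=$job_file"
    for pair in "$@"; do
        case $pair in
            # Accept only pairs that have something before the "=".
            ?*=*) args="$args -param:$pair" ;;
            *) echo "ignoring malformed parameter: $pair" >&2 ;;
        esac
    done
    printf '%s\n' "$args"
}

# Dry run: print the command instead of executing it.
echo "kitchen.sh $(build_param_args /PRD/updateWarehouse.kjb MASTER_HOST=192.168.1.3 MASTER_PORT=8181)"
```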
The valid values are: Nothing, Error, Minimal, Basic, Detailed, Debug and Rowlevel.
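-rep=Repository name
Connects to the repository named "Repository name".
You may also need to specify the -user, -pass, -dir and -job options.
Alternatively, you can specify the repository through the environment variable KETTLE_REPOSITORY.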
-user=Username
This is the username with which you want to connect to the repository.
You can also specify this option through the environment variable KETTLE_USER.
-pass=Password
The password to use to connect to the repository.
You can also specify this option through the environment variable KETTLE_PASSWORD.
-job=Job Name
Use this option to select the job to run from the repository. Please also select the directory with the "-dir" option.
-listdir=Y
Print a listing of all the sub-directories in the repository directory specified with the option "-dir".
-dir=directory
Specifies the directory in the repository to use. Repository directories are specified like this:
Since version 2.2.2, / is used as the directory separator on all platforms.
-listjobs=Y
Show a list of all the jobs in the repository directory specified with the option "-dir".
-listrep=Y
Print a listing of all the defined repositories.
-norep=Y
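Use this option when the environment variables KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD are defined but you do not want Kitchen to log in to the repository, for example because you want to launch a job from an XML file.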
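The interaction between the environment variables and -norep can be sketched as follows. The values are the sample ones used in the examples of this document, and the two commands are only printed as a dry run, not executed.

```shell
# Repository settings supplied through the environment.
# The values below are sample values, not real credentials.
export KETTLE_REPOSITORY="Production Repository"
export KETTLE_USER="matt"
export KETTLE_PASSWORD="somepassword123"

# With the variables set, -rep, -user and -pass can be left off (dry run).
echo 'kitchen.sh -dir=/Dimensions -job="Update dimensions" -level=Basic'

# To run a job from a file instead, -norep=Y keeps Kitchen from logging in.
echo 'kitchen.sh -norep=Y -file=/PRD/updateWarehouse.kjb -level=Basic'
```

Keeping the password in the environment avoids typing it on the command line, but it is still visible to anything that can read the environment of the process.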
Please make sure that you are positioned in the Kettle directory before running the samples below. If you put these scripts into a batch file or shell script, simply do a change directory to the installation directory:
If Kettle was installed on the D:\ drive on Windows:
D:
cd \Kettle
If Kettle was installed in the /product directory on a Unix system:
cd /product/Kettle/
Running a job from a file
This example runs a job from a file on a Windows platform:
kitchen.bat /file:D:\Jobs\updateWarehouse.kjb /level:Basic
This example runs a job from a file on a Linux box:
kitchen.sh -file=/PRD/updateWarehouse.kjb -level=Minimal
Running a job from the repository
This example runs a job from the repository on a Windows platform:
(Enter on a single line without returns...)
kitchen.bat /rep:"Production Repository" /job:"Update dimensions" /dir:/Dimensions /user:matt /pass:somepassword123 /level:Basic
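For comparison, a Unix-style version of the same call might look like the sketch below. It assumes the -option=value syntax used in the Linux examples of this document and only prints the arguments. print_repo_job_args is a hypothetical name chosen for this illustration.

```shell
# Unix-style equivalent of the Windows command above, as a dry run.
# print_repo_job_args is a hypothetical helper, not part of Kettle.
print_repo_job_args() {
    # Quoting keeps "Production Repository" and "Update dimensions"
    # as single arguments despite the spaces they contain.
    set -- \
        -rep="Production Repository" \
        -job="Update dimensions" \
        -dir=/Dimensions \
        -user=matt \
        -pass=somepassword123 \
        -level=Basic
    # Print one argument per line; a real run would call: ./kitchen.sh "$@"
    for arg in "$@"; do
        printf '%s\n' "$arg"
    done
}

print_repo_job_args
```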
Redirecting output
If you don't want the Kitchen output to appear on the screen but rather be written to a log file, you can use redirection.
This example appends the Kitchen output to an ever-growing log file:
kitchen.sh -file="/PRD/updateWarehouse.kjb" -level=Minimal >> /LOG/trans.log
This example writes the Kitchen output to a file that gets overwritten on every run:
kitchen.bat /file:C:\PRD\runAll.kjb /level:Basic > C:\LOG\trans.log
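The difference between the two redirection operators can be checked with a small experiment. In this sketch fake_kitchen is a stand-in for kitchen.sh so that it runs anywhere, and the log files go to a temporary directory.

```shell
# fake_kitchen is a stand-in for kitchen.sh; it prints two lines per run.
fake_kitchen() {
    echo "job started"
    echo "job finished"
}

log_dir=$(mktemp -d)

# ">>" appends: after two runs the file holds four lines.
fake_kitchen >> "$log_dir/append.log"
fake_kitchen >> "$log_dir/append.log"

# ">" truncates first: after two runs only the last two lines remain.
fake_kitchen > "$log_dir/overwrite.log"
fake_kitchen > "$log_dir/overwrite.log"

# Adding "2>&1" after the file redirection also captures standard error.
fake_kitchen >> "$log_dir/append.log" 2>&1
```

The last form is worth considering for scheduled runs, since anything a program writes to standard error would otherwise not reach the log file.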
Return status codes
Kitchen returns an error code based on how the execution went:
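A calling script can act on that status. The sketch below treats 0 as success, which is the usual shell convention and an assumption here; it does not interpret the individual non-zero codes. run_and_report is a hypothetical helper.

```shell
# Run a command, report whether it succeeded, and pass its status on.
# Assumption: status 0 means success (the usual shell convention).
# run_and_report is a hypothetical helper, not part of Kettle.
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED with status $status"
    fi
    return "$status"
}

# Intended use (not executed here):
#   run_and_report ./kitchen.sh -file=/PRD/updateWarehouse.kjb -level=Minimal
```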
The best approach is to test the command first at the DOS prompt.
Then you can use the Windows scheduler to launch this command.
Windows versions since Windows 2000 have a GUI for this, accessible through the Control Panel. However, it is also possible to do this from the command line:
at 23:30 /every:Monday,Wednesday,Friday "D:\updateWarehouse.bat"
To see a list of the scheduled commands simply type:
at
First create a shell script that runs all the jobs you need. Then you can schedule this script to run.
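Such a script might look like the following sketch. The installation path and the log file are taken from the examples in this document and can be overridden through the KITCHEN and LOG variables; run_jobs and the job list in the final comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper script that runs several jobs in sequence.
# Defaults come from the examples in this document; override as needed.
KITCHEN=${KITCHEN:-/product/Kettle/kitchen.sh}
LOG=${LOG:-/LOG/trans.log}

# Run each job file in turn and stop at the first one that fails.
run_jobs() {
    for job in "$@"; do
        echo "$(date) starting $job" >> "$LOG"
        if ! "$KITCHEN" -file="$job" -level=Minimal >> "$LOG" 2>&1; then
            echo "$(date) FAILED $job" >> "$LOG"
            return 1
        fi
    done
}

# Example call (paths are placeholders):
#   run_jobs /PRD/updateWarehouse.kjb /PRD/runAll.kjb
```

Stopping at the first failure is a design choice for jobs that depend on each other; independent jobs could instead continue and report all failures at the end.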
On Unix-like systems the easiest way to schedule a command is to use the "cron table". You can do this by entering the following command:
crontab -e
Then you can enter the time at which the command needs to be run as well as the command on a single line in the text file that is presented.
The first five fields are, in order: minute (0-59), hour (0-23), day of the month (1-31), month (1-12) and day of the week (0-6, where 0 is Sunday).
You can specify more than one number for each of these values. Two numbers separated by a hyphen (-) denote an inclusive range. Numbers separated by commas (,) denote distinct values. A * instead of a number means every possible minute, hour, day, month or weekday.
So, if you want to update the dimensions every hour, at 15 and 45 minutes past the hour during the weekdays, you might enter these lines in a crontab:
#
# Launches the update of the dimensions in the warehouse
#
15,45 * * * 1-5 /PROD/update_dimensions.sh
#
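To see what the range and list notation in the entry above stands for, the following sketch expands a single crontab field into the explicit values it covers. expand_cron_field is a hypothetical helper written for this illustration; it handles only plain numbers, ranges, comma lists and a lone *, and takes the field's minimum and maximum as arguments.

```shell
# Expand one crontab field into the list of values it matches.
# Usage: expand_cron_field FIELD MIN MAX
# expand_cron_field is a hypothetical helper, not part of cron.
expand_cron_field() {
    field=$1
    min=$2
    max=$3
    # A lone "*" means the whole range of the field.
    if [ "$field" = "*" ]; then
        field="$min-$max"
    fi
    result=""
    # Commas separate distinct values or ranges.
    for part in $(echo "$field" | tr ',' ' '); do
        case $part in
            # A hyphen denotes an inclusive range.
            *-*) lo=${part%-*}; hi=${part#*-} ;;
            *)   lo=$part; hi=$part ;;
        esac
        n=$lo
        while [ "$n" -le "$hi" ]; do
            result="$result $n"
            n=$((n + 1))
        done
    done
    printf '%s\n' "${result# }"
}

expand_cron_field 15,45 0 59   # prints: 15 45
expand_cron_field 1-5 0 6      # prints: 1 2 3 4 5
```

Read this way, the entry runs at minutes 15 and 45 of every hour on weekdays 1 through 5, which is Monday to Friday.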