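1. Package the working kettle installation as a zip archive
Note: to connect to Impala, this kettle installation adds extra JAR files to the official kettle distribution, under KETTLE_HOME/lib and KETTLE_HOME/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh6/lib.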
zip -r kettle.zip /opt/kettle/data-integration
2. Have the JDK package ready (the Dockerfile below uses jdk-8u241-linux-x64.tar.gz)
3. Pull the CentOS base image
docker search centos
docker pull docker.io/centos:7
1. Create an empty directory
mkdir /kettle_docker
2. Write the Dockerfile
cd /kettle_docker && vim Dockerfile
Contents are as follows (note: this kettle image does not configure a time zone):
FROM centos:7
MAINTAINER lhp
# Install the JDK (ADD unpacks a local tar.gz automatically)
RUN mkdir /home/java
ADD jdk-8u241-linux-x64.tar.gz /home/java
ENV JAVA_HOME=/home/java/jdk1.8.0_241
ENV CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
ENV PATH=$PATH:$JAVA_HOME/bin
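# Install the zip and unzip commands
RUN yum -y install zip unzip
# Unpack kettle. The archive stores paths as opt/kettle/data-integration/...,
# so extracting at / yields /opt/kettle/data-integration
RUN mkdir /opt/kettle
COPY kettle.zip /opt/kettle
RUN unzip /opt/kettle/kettle.zip -d /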
# Set the kettle environment variable
ENV KETTLE_HOME=/opt/kettle/data-integration
# Create the directory that holds ktr and job files
RUN mkdir /opt/kettle/kettle_job
# Expose the Carte port
EXPOSE 18080
# Start Carte when the container starts
CMD sh /opt/kettle/data-integration/carte.sh /opt/kettle/data-integration/carte.xml
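Since the Dockerfile above does not configure a time zone, the container keeps the base image's default. One common way to change that is to point /etc/localtime at the matching zoneinfo file. The sketch below wraps that step in a hypothetical helper, set_timezone; the helper name and the Asia/Shanghai zone are assumptions, not part of the original setup.

```shell
# set_timezone ZONE [ROOT]
# Hypothetical helper: links ROOT/etc/localtime to the zoneinfo file for ZONE.
# ROOT defaults to the empty string, i.e. the real filesystem root.
set_timezone() {
    zone=$1
    root=${2:-}
    mkdir -p "$root/etc"
    # -s: symbolic link, -n: do not follow an existing link, -f: replace it
    ln -snf "/usr/share/zoneinfo/$zone" "$root/etc/localtime"
}

# Example (would run inside the image, e.g. from a RUN instruction):
#   set_timezone Asia/Shanghai
```

Inside a Dockerfile this reduces to a single RUN line containing the ln command.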
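1. mkdir /opt/kettle/conf
This directory holds carte.xml, kettle.properties and repositories.xml.
2. mkdir /opt/kettle/kettle_job
This directory holds the job and ktr files. (The ktr locations referenced inside each job must be updated to match this path.)
3. Contents of carte.xml: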
<slave_config>
<slaveserver>
<name>master1</name>
<hostname>0.0.0.0</hostname>
<port>18080</port>
<username>newcluster</username>
<password>newcluster</password>
<master>Y</master>
</slaveserver>
</slave_config>
4. Contents of kettle.properties:
MYSQL_HOST = ***
MYSQL_DBNAME = ***
MYSQL_PORT = 53306
MYSQL_USER_NAME = root
MYSQL_PASSWORD = ***
HIVE_HOST = ***
HIVE_DBNAME = ***
HIVE_PORT = 22050
HIVE_USER_NAME = root
HIVE_PASSWORD = ***
5. repositories.xml: when a repository is created in Spoon during development, this XML file is generated automatically under KETTLE_HOME/.kettle and can be copied over as-is.
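Jobs refer to the variables in kettle.properties by name, so a missing key tends to surface only at run time. A small pre-check along the following lines can catch it earlier. This is a sketch: check_props is a hypothetical helper, and the keys passed to it simply mirror the kettle.properties file shown in step 4.

```shell
# check_props FILE KEY...
# Prints every KEY that has no "KEY = value" (or "KEY=value") line in FILE
# and returns non-zero if at least one key is missing.
check_props() {
    file=$1
    shift
    missing=0
    for key in "$@"; do
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_props /opt/kettle/conf/kettle.properties \
#       MYSQL_HOST MYSQL_DBNAME MYSQL_PORT HIVE_HOST HIVE_DBNAME HIVE_PORT
```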
1. Place the JDK package and kettle.zip in the /kettle_docker directory
2. Run:
docker build -t lhp/kettle /kettle_docker
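lhp/kettle is the image tag, and /kettle_docker is the directory containing the Dockerfile (the build context).
3. Create and start the kettle container:
When using local ktr and job files: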
docker run --name kettle -d -p 18080:18080 -v /opt/kettle/kettle_job:/opt/kettle/kettle_job -v /opt/kettle/conf/kettle.properties:/opt/kettle/data-integration/.kettle/kettle.properties -v /opt/kettle/conf/carte.xml:/opt/kettle/data-integration/carte.xml lhp/kettle
When using a remote repository:
docker run --name kettle -d -p 18080:18080 -v /opt/kettle/conf/repositories.xml:/opt/kettle/data-integration/.kettle/repositories.xml -v /opt/kettle/conf/kettle.properties:/opt/kettle/data-integration/.kettle/kettle.properties -v /opt/kettle/conf/carte.xml:/opt/kettle/data-integration/carte.xml lhp/kettle
Note: the container-side mount paths must match the paths set in the Dockerfile.
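Two small checks can help around the docker run commands above. First, when a single file is bind-mounted with -v and the host path does not exist, Docker creates it as an empty directory, so it is worth confirming the host files are present beforehand. Second, once the container is up, Carte's status page under /kettle/status/ can be queried with the credentials from carte.xml. The helper names require_files and carte_status_url are hypothetical, and localhost assumes the check runs on the Docker host.

```shell
# require_files FILE...
# Reports each FILE that is not an existing regular file and returns
# non-zero if any of them is missing.
require_files() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "not a regular file: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# carte_status_url [HOST] [PORT]
# Builds the URL of the Carte status page; xml=Y requests XML output.
carte_status_url() {
    printf 'http://%s:%s/kettle/status/?xml=Y\n' "${1:-localhost}" "${2:-18080}"
}

# Example (commented out because it needs Docker and a running container):
#   require_files /opt/kettle/conf/kettle.properties /opt/kettle/conf/carte.xml \
#       && docker run ... lhp/kettle
#   curl -sf -u newcluster:newcluster "$(carte_status_url)"
```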
1. The COPY and ADD instructions cannot copy local files from outside the build context. For details, see:
https://www.cnblogs.com/sparkdev/p/9573248.html
2. JDK environment variables must be set with the ENV instruction. (Exporting JAVA_HOME in /etc/profile and then running source does not work.)
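3. Open a shell in a new container started from an image (to enter the already running container instead, use docker exec -it kettle /bin/bash):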
docker run -it image_name /bin/bash
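On note 2 above: each RUN instruction executes in its own shell, so a variable exported in one step is gone by the next one. The effect can be reproduced outside Docker with separate sh processes, as in the sketch below. DEMO_HOME is a made-up variable name used only for this illustration.

```shell
# Make sure the demo variable is not inherited from the calling environment.
unset DEMO_HOME

# First shell: export a variable, then exit (comparable to one RUN step).
sh -c 'export DEMO_HOME=/home/java/jdk1.8.0_241'

# Second shell: the export did not survive (comparable to the next RUN step).
second=$(sh -c 'echo "${DEMO_HOME:-unset}"')
echo "second shell sees: $second"

# Supplying the variable through the process environment does work; this is
# the mechanism ENV relies on for later build steps and for the container.
third=$(DEMO_HOME=/home/java/jdk1.8.0_241 sh -c 'echo "${DEMO_HOME:-unset}"')
echo "third shell sees: $third"
```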