Key points:
- rm -rf /home/user/data-integration/system/karaf/caches
- rm -rf /home/user/data-integration/system/karaf/data
Karaf is the component Kettle uses to implement its plugins; for example, the big-data shims all count as Kettle plugins.
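The two cleanup commands can be wrapped in a small function with a sanity check, so a typo in the path cannot turn into an `rm -rf` on the wrong tree. This is a minimal sketch: `clear_karaf_state` is a hypothetical helper name, the PDI (data-integration) directory is passed in rather than hard-coded, and it is presumably safest to run it while no Kettle process (Spoon, Kitchen, Pan) is running.

```shell
#!/bin/sh
# Hypothetical helper: remove Karaf's caches and data directories under a
# PDI (data-integration) directory, forcing a clean Karaf start next time.
clear_karaf_state() {
    karaf_dir="$1/system/karaf"
    # Refuse to delete anything unless the argument is non-empty and
    # actually contains system/karaf
    if [ -z "$1" ] || [ ! -d "$karaf_dir" ]; then
        echo "not a PDI directory: '$1'" >&2
        return 1
    fi
    rm -rf "$karaf_dir/caches" "$karaf_dir/data"
}
```

Usage: `clear_karaf_state /home/user/data-integration`. Other directories under `system/karaf` are left untouched.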
In <Kettle install dir>/data-integration/plugins/pentaho-big-data-plugin/plugin.properties, find the setting:
active.hadoop.configuration=cdh513
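Here "cdh513" is the name of a subdirectory of <Kettle install dir>/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations. Such a subdirectory is called a shim; it acts as the driver for a particular Hadoop version.
There are 4 shims by default, one for each of 4 Hadoop distributions; set the one you use in plugin.properties as shown above.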
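Checking or switching the active shim can also be scripted. A minimal sketch, assuming plugin.properties contains a single line of the form `active.hadoop.configuration=<shim>` with no spaces around `=`; `get_active_shim` and `set_active_shim` are hypothetical helpers, not Kettle commands. The setter only accepts names that exist as subdirectories of hadoop-configurations and keeps a `plugin.properties.bak` copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper: print the active shim configured for a PDI directory.
get_active_shim() {
    props="$1/plugins/pentaho-big-data-plugin/plugin.properties"
    # Take the value after "=" on the first matching line, minus whitespace
    sed -n 's/^active\.hadoop\.configuration=//p' "$props" | head -n 1 | tr -d '[:space:]'
}

# Hypothetical helper: point active.hadoop.configuration at another shim.
set_active_shim() {
    plugin_dir="$1/plugins/pentaho-big-data-plugin"
    shim=$2
    # Only accept shims that exist under hadoop-configurations
    if [ -z "$shim" ] || [ ! -d "$plugin_dir/hadoop-configurations/$shim" ]; then
        echo "no such shim: '$shim'" >&2
        return 1
    fi
    # Rewrite the line in place; -i.bak works with both GNU and BSD sed
    sed -i.bak "s/^active\.hadoop\.configuration=.*/active.hadoop.configuration=$shim/" \
        "$plugin_dir/plugin.properties"
}
```

Usage: `get_active_shim /opt/data-integration` prints the current shim name; `set_active_shim /opt/data-integration cdh513` switches to cdh513 if that directory exists.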
Configure the shim's Hadoop settings files:
these are the 6 XML files under <Kettle install dir>/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh513/: core-site.xml, hbase-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml and mapred-site.xml.
Alternatively, copy them straight from the cluster and overwrite the existing ones (config file locations for the CDP components: Hadoop: /etc/hadoop/conf, HBase: /etc/hbase/conf, Hive: /etc/hive/conf).
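Copying the six files from the cluster can be done in one step. A sketch assuming it runs on a host that has the client configuration directories listed above; `copy_site_files` is a hypothetical helper, and the three source directories default to the CDP paths but can be overridden as optional arguments.

```shell
#!/bin/sh
# Hypothetical helper: copy the cluster's site files into a shim directory.
# Usage: copy_site_files SHIM_DIR [HADOOP_CONF] [HBASE_CONF] [HIVE_CONF]
copy_site_files() {
    shim_dir=$1
    hadoop_conf=${2:-/etc/hadoop/conf}
    hbase_conf=${3:-/etc/hbase/conf}
    hive_conf=${4:-/etc/hive/conf}
    if [ ! -d "$shim_dir" ]; then
        echo "missing shim dir: $shim_dir" >&2
        return 1
    fi
    # Core Hadoop files come from the Hadoop client configuration
    for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
        cp "$hadoop_conf/$f" "$shim_dir/" || return 1
    done
    # HBase and Hive each keep their own site file
    cp "$hbase_conf/hbase-site.xml" "$shim_dir/" || return 1
    cp "$hive_conf/hive-site.xml" "$shim_dir/" || return 1
}
```

Usage: `copy_site_files /opt/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh513`. The function stops at the first missing file instead of leaving a half-updated shim silently.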
If running kitchen or pan fails with: No suitable driver found for jdbc:hive2
copy the Hive jar files into both Kettle's lib directory and the active shim's lib directory.
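Before copying, it can help to see which of the two lib directories actually lacks the driver. The sketch below is a heuristic, assuming the jdbc:hive2 driver class is `org.apache.hive.jdbc.HiveDriver` (the usual HiveServer2 JDBC driver, shipped in the hive-jdbc jar); `has_hive_driver` is a hypothetical helper. Because zip/jar archives store entry names uncompressed, `grep -a` can spot the class entry without needing `unzip`.

```shell
#!/bin/sh
# Hypothetical helper: print the first jar in DIR that contains the Hive JDBC
# driver class entry and succeed; fail if no jar in DIR contains it.
has_hive_driver() {
    for jar in "$1"/*.jar; do
        # An unmatched glob stays literal, so skip anything that is not a file
        [ -f "$jar" ] || continue
        if grep -a -q 'org/apache/hive/jdbc/HiveDriver.class' "$jar"; then
            echo "$jar"
            return 0
        fi
    done
    return 1
}
```

Usage: `has_hive_driver /opt/data-integration/lib || echo "driver missing from Kettle lib"`, and the same for the active shim's lib directory.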
On CDH, Hive lives in /opt/cloudera/parcels/CDH/lib/hive.
You can copy every jar whose name starts with "hive" from /opt/cloudera/parcels/CDH/lib/hive/lib into both <Kettle install dir>/data-integration/lib and <Kettle install dir>/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh513/lib.
For example (assuming Kettle is installed under /opt and the active shim is cdh513):
- cp /opt/cloudera/parcels/CDH/lib/hive/lib/hive*.jar /opt/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh513/lib
- cp /opt/cloudera/parcels/CDH/lib/hive/lib/hive*.jar /opt/data-integration/lib
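The two commands above hard-code the shim name. A slightly more general sketch reads the active shim from plugin.properties first, so the jars land in whichever shim is actually configured. `copy_hive_jars` is a hypothetical helper; it assumes the property is written exactly as `active.hadoop.configuration=<shim>` and defaults the Hive jar directory to the CDH parcel path given earlier.

```shell
#!/bin/sh
# Hypothetical helper: copy hive*.jar into Kettle's lib and the active shim's lib.
# Usage: copy_hive_jars PDI_HOME [HIVE_LIB_DIR]
copy_hive_jars() {
    pdi_home=$1
    hive_lib=${2:-/opt/cloudera/parcels/CDH/lib/hive/lib}
    plugin_dir="$pdi_home/plugins/pentaho-big-data-plugin"
    # Resolve the active shim instead of assuming cdh513
    shim=$(sed -n 's/^active\.hadoop\.configuration=//p' "$plugin_dir/plugin.properties" | head -n 1 | tr -d '[:space:]')
    shim_lib="$plugin_dir/hadoop-configurations/$shim/lib"
    if [ -z "$shim" ] || [ ! -d "$shim_lib" ]; then
        echo "active shim lib not found: $shim_lib" >&2
        return 1
    fi
    # Same copy as the two cp commands, into both destinations
    for dest in "$pdi_home/lib" "$shim_lib"; do
        cp "$hive_lib"/hive*.jar "$dest/" || return 1
    done
}
```

Usage: `copy_hive_jars /opt/data-integration`. The Karaf cache still needs clearing afterwards.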
Then clear Kettle's cache again; otherwise Kettle may not recognize the jar files you just copied:
- rm -rf /home/fr-renjie.wei/data-integration/system/karaf/caches
- rm -rf /home/fr-renjie.wei/data-integration/system/karaf/data
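Kettle on Linux seems to have some bug in how it loads the Hive jars; the problem comes up often, and many people have asked about it online.
In my case, running a job with Kitchen raised this error, while Pan did not.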
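My solution: instead of clearing the cache by hand each time, there is a once-and-for-all fix: edit kitchen.sh directly and add the rm line (marked "add this!" in the script):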
#!/bin/sh

# *****************************************************************************
#
# Pentaho Data Integration
#
# Copyright (C) 2005-2018 by Hitachi Vantara : http://www.pentaho.com
#
# *****************************************************************************
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# *****************************************************************************

INITIALDIR="`pwd`"
BASEDIR="`dirname $0`"
cd "$BASEDIR"
DIR="`pwd`"
cd - > /dev/null

# add this! Clear Karaf's cache on every run so freshly copied jars are picked up
rm -rf "$BASEDIR/system/karaf/caches"

if [ "$1" = "-x" ];  then
  # The stock script has "set LD_LIBRARY_PATH=..." here; in sh, "set" would
  # overwrite the positional parameters, so assign the variable directly
  LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BASEDIR/lib
  export LD_LIBRARY_PATH
  export OPT="-Xruntracer $OPT"
  shift
fi

export IS_KITCHEN="true"

"$DIR/spoon.sh" -main org.pentaho.di.kitchen.Kitchen -initialDir "$INITIALDIR/" "$@"
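Editing kitchen.sh works, but the change is lost whenever PDI is upgraded or reinstalled. An alternative design, sketched here under the assumption that the wrapper is saved next to kitchen.sh, leaves the stock script untouched and does the cleanup in a separate file. `run_kitchen_clean` and the file name `kitchen-clean.sh` are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: clear Karaf's cache, then run the unmodified kitchen.sh.
# First argument: the data-integration directory containing kitchen.sh;
# all remaining arguments are passed to kitchen.sh unchanged.
run_kitchen_clean() {
    if [ $# -lt 1 ] || [ ! -x "$1/kitchen.sh" ]; then
        echo "kitchen.sh not found in: '${1:-}'" >&2
        return 1
    fi
    pdi_home=$1
    shift
    # Same cleanup as the line added to kitchen.sh
    rm -rf "$pdi_home/system/karaf/caches"
    # The function returns kitchen.sh's exit status
    "$pdi_home/kitchen.sh" "$@"
}
```

Usage: save the function as kitchen-clean.sh in the data-integration directory, append the line `run_kitchen_clean "$(dirname "$0")" "$@"`, make it executable, and call `./kitchen-clean.sh -file=/path/to/job.kjb` wherever you would have called kitchen.sh.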