Installing Kylin on a CDH cluster: code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Download the CDH build of the Kylin tarball: http://mirrors.tuna.tsinghua.edu.cn/apache/kylin/apache-kylin-2.6.1/apache-kylin-2.6.1-bin-cdh60.tar.gz
Note: this Kylin version requires JDK 8.

1. Configure environment variables

export JAVA_HOME=/usr/java/jdk1.8
export KYLIN_HOME=/opt/kylin
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export PATH=$PATH:$JAVA_HOME/bin:$KYLIN_HOME/bin
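
Assuming the exports above live in /etc/profile (the original does not say which file), a quick way to apply and verify them:

source /etc/profile           # or log in again so the exports take effect
java -version                 # should report 1.8.x, per the JDK 8 note above
echo $KYLIN_HOME $SPARK_HOME  # both should point at existing directories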

2. Start the service

  • check-env.sh: checks that the environment is configured correctly
  • kylin.sh start

Kylin fails to start: SPARK_HOME not found (2.6.1 does not bundle Spark; 2.1.0 did bundle it, so it never hit this error)

Error message: spark not found, set SPARK_HOME, or run bin/download-spark.sh. The dependency lookup chain in kylin.sh:
function retrieveDependency() {
    # retrieve $hive_dependency and $hbase_dependency
    source ${dir}/find-hive-dependency.sh
    source ${dir}/find-hbase-dependency.sh
    source ${dir}/find-hadoop-conf-dir.sh
    source ${dir}/find-kafka-dependency.sh
    source ${dir}/find-spark-dependency.sh
    ...
}
find-spark-dependency.sh ==> the check that fails:
if [ ! -d "$spark_home/jars" ]
  then
    quit "spark not found, set SPARK_HOME, or run bin/download-spark.sh"
fi
======> Fix: download a matching Spark with ./bin/download-spark.sh
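
Alternatively, since step 1 already points SPARK_HOME at the CDH parcel, you can first check whether that directory passes the script's test (a sketch assuming CDH 6's Spark 2.x layout, which keeps its jars under jars/):

ls $SPARK_HOME/jars | head    # must be non-empty, otherwise find-spark-dependency.sh quits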

Start the service: kylin.sh start ==> Web UI at http://host:7070/kylin, default login ADMIN/KYLIN
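
To confirm the server is up without a browser, hit the authentication endpoint of Kylin's REST API (QURNSU46S1lMSU4= is the base64 of the default ADMIN:KYLIN credentials; replace host as in the URL above):

curl -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" \
     http://host:7070/kylin/api/user/authentication    # HTTP 200 means Kylin is serving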

3. Test run: code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Load the sample cube: sample.sh / sample-streaming.sh, then restart the service: kylin.sh stop && kylin.sh start
Cube --> Action: Build fails with the following log:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Troubleshooting: adjust kylin_hive_conf.xml. The MapredLocalTask fails because available memory is too small for the map-side join, so turn off the map join optimization by setting the convert.join properties to false:

<property>
    <name>hive.auto.convert.join</name>
    <value>false</value>
</property>
<property>
    <name>hive.auto.convert.join.noconditionaltask</name>
    <value>false</value>
</property>
<property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>100000000</value>
</property>

Alternatively, pass the settings as Hive session parameters:
-- each -Xmx heap must stay below the matching mapreduce.*.memory.mb container size
SET hive.auto.convert.join=false;
SET mapreduce.map.memory.mb=16384;
SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
SET mapreduce.reduce.memory.mb=16384;
SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
SET hive.support.concurrency=false;
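
To confirm the overrides take effect for a one-off session, the standard Hive CLI flags can be used (a sketch, not part of the original setup):

hive --hiveconf hive.auto.convert.join=false \
     --hiveconf mapreduce.map.memory.mb=16384 \
     -e 'SET hive.auto.convert.join;'    # should print hive.auto.convert.join=false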

4. Configure kylin.properties

Key settings, grouped by section of kylin.properties:
  • METADATA / ENV: kylin.metadata.url=kylin_metadata@hbase, kylin.server.mode=all, kylin.server.cluster-servers=node1:7070
  • SOURCE: kylin.source.hive.client=cli, kylin.source.hive.enable-sparksql-for-table-ops=true
  • JOB: kylin.job.retry=4, kylin.job.max-concurrent-jobs=10
  • ENGINE: kylin.engine.mr.reduce-input-mb=200, kylin.engine.mr.max-reducer-number=100
  • SPARK ENGINE CONFIGS: kylin.env.hadoop-conf-dir=/etc/hadoop/conf, kylin.engine.spark-conf.spark.master=yarn, kylin.engine.spark-conf.spark.yarn.archive=hdfs://xxx/spark-libs.jar
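
After editing, a quick way to review every active (non-comment) line, assuming the default conf location under $KYLIN_HOME:

grep -Ev '^[[:space:]]*#|^[[:space:]]*$' $KYLIN_HOME/conf/kylin.properties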

5. Configure Kylin to use the Spark engine

  • Configure kylin.properties:
#### SOURCE ###
kylin.source.hive.enable-sparksql-for-table-ops=false
kylin.source.hive.sparksql-beeline-shell=/path/to/spark-client/bin/beeline
kylin.source.hive.sparksql-beeline-params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u jdbc:hive2://localhost:10000

#### SPARK ENGINE CONFIGS ###
kylin.env.hadoop-conf-dir=/etc/hadoop/conf
kylin.engine.spark-conf.spark.master=yarn
# Copy hive-site.xml into the Hadoop conf dir above so the Spark engine can find the Hive configuration
cp /etc/hive/conf/hive-site.xml    /etc/hadoop/conf
  • Package the Spark jars and upload them to HDFS:
jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put spark-libs.jar /kylin/spark/
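
A quick sanity check that the archive landed where the next property expects it:

hadoop fs -ls /kylin/spark/spark-libs.jar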

# Then point kylin.properties at the uploaded archive
kylin.engine.spark-conf.spark.yarn.archive=hdfs://node1:8020/kylin/spark/spark-libs.jar

6. Start Kylin at boot

[hdfs@node1 kylin]$ tail /etc/rc.d/rc.local
service mysqld start
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
/root/sh/cdh.sh start
sleep 100
#ssh root@localhost 'source /etc/profile; su -s /bin/sh -c "/opt/kylin/bin/kylin.sh start" hdfs'
/usr/local/bin/kylin start
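
The /usr/local/bin/kylin wrapper called above is not shown in the original; a minimal sketch consistent with the commented-out ssh line (load the profile, then run kylin.sh as the hdfs user) could be:

#!/bin/bash
# /usr/local/bin/kylin -- hypothetical wrapper for rc.local
# Usage: kylin start|stop
source /etc/profile
su -s /bin/sh -c "/opt/kylin/bin/kylin.sh $1" hdfs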
