1. Environment preparation
Kylin 2.3.0 supports Spark 2.1 by default; compatibility with Spark 2.2 is problematic.
2. Download the latest tar
Latest download:
apache-kylin-2.3.0-hbase1x-bin.tar.gz
3. Extract the archive and configure the Kylin environment
a) Configure the Kylin JVM size in setenv.sh
Cube builds in Kylin are memory-intensive, so pick a server with plenty of memory for the deployment.
export KYLIN_JVM_SETTINGS=" -XX:+DisableExplicitGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+CMSScavengeBeforeRemark -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 -Xms32g -Xmx32g -Xmn4g -XX:PermSize=256M -XX:MaxPermSize=256M -verbose:gc -Xloggc:$KYLIN_HOME/logs/kylin.gc.log-`date +'%Y%m%d%H%M'` -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps "
b) kylin_hive_conf.xml can be tuned as needed for your cluster.
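A minimal sketch of such an override, assuming the stock kylin_hive_conf.xml layout (the property values below are illustrative, not the values used in this deployment):

```xml
<?xml version="1.0"?>
<configuration>
    <!-- Lower replication for Kylin's intermediate flat table (illustrative value) -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Compress the flat-table output to save HDFS space -->
    <property>
        <name>hive.exec.compress.output</name>
        <value>true</value>
    </property>
</configuration>
```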
c) In kylin_job_conf.xml, configure the queue that build jobs run in.
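For example, to route the MapReduce build jobs to the same YARN queue this cluster uses for Spark (queue name is specific to this deployment; substitute your own):

```xml
<?xml version="1.0"?>
<configuration>
    <!-- YARN queue for Kylin's MapReduce build jobs -->
    <property>
        <name>mapreduce.job.queuename</name>
        <value>lx_kylin</value>
    </property>
</configuration>
```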
d) In kylin_job_conf_inmem.xml, configure the map and reduce memory.
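A sketch of the memory overrides, with illustrative sizes (in-memory cubing does most of its work in mappers, so the map side usually needs the larger allocation; size the containers to your YARN node capacity):

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx7168m</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx3584m</value>
    </property>
</configuration>
```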
e) Most important of all is kylin.properties, which holds the Spark and HBase settings:
kylin.metadata.url=kylin_metadata_new@hbase
kylin.metadata.sync-retries=3
kylin.env.hdfs-working-dir=/kylin
kylin.env=QA
kylin.env.zookeeper-base-path=/kylin_new
kylin.server.mode=all
kylin.server.cluster-servers=10.10.16.111:7070
kylin.web.timezone=GMT+8
kylin.web.query-timeout=300000
kylin.web.cross-domain-enabled=true
kylin.web.export-allow-admin=true
kylin.web.export-allow-other=true
kylin.web.hide-measures=RAW
kylin.restclient.connection.default-max-per-route=20
kylin.restclient.connection.max-total=200
kylin.engine.default=2
kylin.storage.default=2
kylin.web.hive-limit=20
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|http://kylin.apache.org/docs21/tutorial/kylin_sample.html
kylin.web.help.1=odbc|ODBC Driver|http://kylin.apache.org/docs21/tutorial/odbc.html
kylin.web.help.2=tableau|Tableau Guide|http://kylin.apache.org/docs21/tutorial/tableau_91.html
kylin.web.help.3=onboard|Cube Design Tutorial|http://kylin.apache.org/docs21/howto/howto_optimize_cubes.html
kylin.web.link-streaming-guide=http://kylin.apache.org/
kylin.htrace.show-gui-trace-toggle=false
kylin.web.link-hadoop=
kylin.web.link-diagnostic=
kylin.web.contact-mail=
kylin.server.external-acl-provider=
kylin.source.hive.client=cli
kylin.source.hive.beeline-shell=beeline
kylin.source.hive.enable-sparksql-for-table-ops=false
kylin.source.hive.keep-flat-table=false
kylin.source.hive.database-for-flat-table=dp_kylin
kylin.source.hive.redistribute-flat-table=true
kylin.storage.url=hbase
kylin.storage.hbase.table-name-prefix=KYLIN_
kylin.storage.hbase.namespace=default
kylin.storage.hbase.compression-codec=none
kylin.storage.hbase.region-cut-gb=5
kylin.storage.hbase.hfile-size-gb=2
kylin.storage.hbase.min-region-count=1
kylin.storage.hbase.max-region-count=500
kylin.storage.hbase.coprocessor-mem-gb=3
kylin.storage.partition.aggr-spill-enabled=true
kylin.storage.partition.max-scan-bytes=3221225472
kylin.job.retry=1
kylin.job.max-concurrent-jobs=10
kylin.job.sampling-percentage=100
kylin.engine.mr.yarn-check-interval-seconds=10
kylin.engine.mr.reduce-input-mb=2048
kylin.engine.mr.max-reducer-number=500
kylin.engine.mr.mapper-input-rows=1000000
kylin.engine.mr.build-dict-in-reducer=true
kylin.engine.mr.uhc-reducer-count=1
kylin.engine.mr.build-uhc-dict-in-additional-step=false
kylin.cube.cuboid-scheduler=org.apache.kylin.cube.cuboid.DefaultCuboidScheduler
kylin.cube.segment-advisor=org.apache.kylin.cube.CubeSegmentAdvisor
kylin.cube.algorithm=layer
kylin.cube.algorithm.layer-or-inmem-threshold=7
kylin.cube.aggrgroup.max-combination=4096
kylin.snapshot.max-mb=300
kylin.cube.cubeplanner.enabled=false
kylin.cube.cubeplanner.enabled-for-existing-cube=false
kylin.cube.cubeplanner.expansion-threshold=15.0
kylin.cube.cubeplanner.recommend-cache-max-size=200
kylin.cube.cubeplanner.mandatory-rollup-threshold=1000
kylin.cube.cubeplanner.algorithm-threshold-greedy=10
kylin.cube.cubeplanner.algorithm-threshold-genetic=23
kylin.query.max-scan-bytes=0
kylin.query.cache-enabled=false
kylin.query.security.table-acl-enabled=true
kylin.query.interceptors=org.apache.kylin.rest.security.TableInterceptor
kylin.query.escape-default-keyword=false
kylin.query.transformers=org.apache.kylin.query.util.DefaultQueryTransformer,org.apache.kylin.query.util.KeywordDefaultDirtyHack
kylin.security.profile=testing
kylin.security.acl.admin-role=admin
kylin.security.ldap.connection-server=ldap://ldap_server:389
kylin.security.ldap.connection-username=
kylin.security.ldap.connection-password=
kylin.security.ldap.user-search-base=
kylin.security.ldap.user-search-pattern=
kylin.security.ldap.user-group-search-base=
kylin.security.ldap.user-group-search-filter=(|(member={0})(memberUid={1}))
kylin.security.ldap.service-search-base=
kylin.security.ldap.service-search-pattern=
kylin.security.ldap.service-group-search-base=
kylin.security.saml.metadata-file=classpath:sso_metadata.xml
kylin.security.saml.metadata-entity-base-url=https://hostname/kylin
kylin.security.saml.keystore-file=classpath:samlKeystore.jks
kylin.security.saml.context-scheme=https
kylin.security.saml.context-server-name=hostname
kylin.security.saml.context-server-port=443
kylin.security.saml.context-path=/kylin
kylin.engine.spark.rdd-partition-cut-mb=10
kylin.engine.spark.min-partition=1
kylin.engine.spark.max-partition=5000
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.yarn.queue=lx_kylin
kylin.engine.spark-conf.spark.executor.memory=3G
kylin.engine.spark-conf.spark.executor.cores=2
kylin.engine.spark-conf.spark.executor.instances=12
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs://hacluster/sparklog
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs:///kylin/spark-history
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
kylin.storage.hbase.cluster-fs=hdfs://hacluster2/hbase
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.archive=hdfs://hacluster/system/spark-lib/2.1
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
kylin.env.hadoop-conf-dir=/usr/local/fqlhadoop/hadoop/conf
kylin.hadoop.conf.dir=/usr/local/fqlhadoop/hadoop/conf
Adjust every setting to match your own Hadoop and HBase clusters; once that is done you can start the Kylin service.
4. Import Kylin's bundled sample data
Running the sample.sh script imports the sample data into Hive and the metadata into HBase tables.
$KYLIN_HOME/bin/sample.sh
This step usually succeeds without trouble; if it does fail, the cause is almost always a wrong Hadoop or HBase configuration path.
5. Start the Kylin service
$KYLIN_HOME/bin/kylin.sh start
Watch the log files kylin.log and kylin.out, then open the UI in a browser: http://xxx.xxx.xx.xx:7070/kylin
If everything looks normal, deployment is complete and what remains is testing from the web UI. The default login is ADMIN/KYLIN.
Problem 1: building the bundled kylin_sales_cube
The build job stayed in PENDING on the Monitor page. kylin.log showed nothing abnormal, and reading through the source code turned up nothing either, but kylin.out contained exceptions (screenshots not reproduced here).
Solution:
The first exception says that kylin.env.zookeeper-connect-string must be configured.
1) Add to kylin.properties:
kylin.env.zookeeper-connect-string=1.hadoop2.com,2.hadoop2.com,3.hadoop2.com:2181
The second exception looks like a version conflict in Curator, the ZooKeeper client framework.
2) Add the following jars to $KYLIN_HOME/lib; these versions were copied out of the Spark 2.1 distribution:
curator-client-2.6.0.jar
curator-framework-2.6.0.jar
curator-recipes-2.6.0.jar
zookeeper-3.4.6.jar
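The copy can be scripted roughly as below. SPARK_HOME and KYLIN_HOME are assumed environment variables; so the sketch can run anywhere, it falls back to temporary directories with empty stand-in jars when they are unset.

```shell
#!/bin/sh
# Stage the Curator/ZooKeeper jars shipped with Spark 2.1 into Kylin's lib dir.
SPARK_JARS="${SPARK_HOME:-$(mktemp -d)}/jars"   # assumed Spark jar location
KYLIN_LIB="${KYLIN_HOME:-$(mktemp -d)}/lib"     # assumed Kylin lib location
mkdir -p "$SPARK_JARS" "$KYLIN_LIB"
for jar in curator-client-2.6.0 curator-framework-2.6.0 \
           curator-recipes-2.6.0 zookeeper-3.4.6; do
  # Create an empty stand-in when no real jar exists (demo only)
  [ -f "$SPARK_JARS/$jar.jar" ] || : > "$SPARK_JARS/$jar.jar"
  cp "$SPARK_JARS/$jar.jar" "$KYLIN_LIB/"
done
ls "$KYLIN_LIB"
```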
Then restart the Kylin service.
Problem 2: hive-site.xml not found during build
The build step failed with an error saying hive-site.xml could not be found (screenshot not reproduced here).
Solution: copy hive-site.xml from hive/conf into hadoop/conf.
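A sketch of that fix. The target is the Hadoop conf directory Kylin reads (the one named by kylin.env.hadoop-conf-dir); the paths are illustrative, and the demo substitutes temporary directories so it can run anywhere.

```shell
#!/bin/sh
# Copy hive-site.xml from Hive's conf dir into the Hadoop conf dir Kylin reads.
HIVE_CONF="${HIVE_CONF:-$(mktemp -d)}"      # e.g. /usr/local/hive/conf
HADOOP_CONF="${HADOOP_CONF:-$(mktemp -d)}"  # e.g. /usr/local/fqlhadoop/hadoop/conf
# Stand-in file for the demo when no real hive-site.xml exists
[ -f "$HIVE_CONF/hive-site.xml" ] || echo '<configuration/>' > "$HIVE_CONF/hive-site.xml"
cp "$HIVE_CONF/hive-site.xml" "$HADOOP_CONF/hive-site.xml"
```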