Kylin2.5.0环境搭建及操作记录

2019独角兽企业重金招聘Python工程师标准>>> hot3.png

Apache Kylin是一个开源的分布式分析引擎,提供Hadoop/Spark之上的SQL查询接口及多维分析(OLAP)能力以支持超大规模数据,最初由eBay Inc. 开发并贡献至开源社区。它能在亚秒内查询巨大的Hive表。

Kylin2.5.0环境搭建及操作记录_第1张图片

Kylin2.5.0环境搭建及操作记录_第2张图片

Kylin2.5.0环境搭建及操作记录_第3张图片

伪分布式环境搭建

hadoop-2.7.7安装
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

hive2.1.1
https://my.oschina.net/peakfang/blog/2236971

hbase1.2.6
http://hbase.apache.org/book.html#_introduction

kylin-2.2.0
http://kylin.apache.org/docs/install/index.html

如果所用的spark为(hive on spark)源码编译不带hive jar包,或者1.6.3版本时,因SPARK_HOME目录下无jars目录,启动kylin时会报如下错误

find: ‘/usr/local/spark-1.6.3/jars’: No such file or directory

[root@node222 local]# vi kylin-2.5.0/bin/find-spark-dependency.sh
# 38行 jars 改成lib
[root@node222 local]# kylin-2.5.0/bin/kylin.sh  start
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
Start to check whether we need to migrate acl tables
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
......
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/kylin-2.5.0/logs/kylin.log
Web UI is at http://:7070/kylin

快速入门

http://kylin.apache.org/docs/tutorial/kylin_sample.html

[root@node222 local]# kylin-2.5.0/bin/sample.sh
Retrieving hadoop conf dir...
Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
......
Loading data to table default.kylin_sales
OK
Time taken: 1.257 seconds
Loading data to table default.kylin_account
OK
Time taken: 0.455 seconds
Loading data to table default.kylin_country
OK
Time taken: 0.385 seconds
Loading data to table default.kylin_cal_dt
OK
Time taken: 0.579 seconds
Loading data to table default.kylin_category_groupings
OK
Time taken: 0.502 seconds
......
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect

通过web ui build kylin_sales_cube 如果提示如下错误,则需要启动historyserver服务

ll From node222/192.168.0.222 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
java.io.IOException: java.net.ConnectException: Call From node222/192.168.0.222 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:371)

#启动historyserver服务,再执行成功
[root@node222 ~]# /usr/local/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh  start  historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.7/logs/mapred-root-historyserver-node222.out

build 过程可通过monitor界面监控执行进度

Kylin2.5.0环境搭建及操作记录_第4张图片

通过insight界面执行SQL

Kylin2.5.0环境搭建及操作记录_第5张图片

kylin能执行的查询与model定义的连接类型一致,如model中定义的都是inner join 则insight中只能执行inner join 不能执行left join

结果可简单的通过可视化展示

Kylin2.5.0环境搭建及操作记录_第6张图片

Kylin2.5.0环境搭建及操作记录_第7张图片

安装系统模型

http://kylin.apache.org/docs/tutorial/setup_systemcube.html

在KYLIN_HOME目录下创建配置文件,SCSinkTools.json

[
  [
    "org.apache.kylin.tool.metrics.systemcube.util.HiveSinkTool",
    {
      "storage_type": 2,
      "cube_desc_override_properties": [
        "java.util.HashMap",
        {
          "kylin.cube.algorithm": "INMEM",
          "kylin.cube.max-building-segments": "1"
        }
      ]
    }
  ]
]

生成元数据

[root@node222 kylin-2.5.0]# ./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator -inputConfig SCSinkTools.json -output system_cube
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/kylin-2.5.0/tool/kylin-tool-2.5.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-12-14 10:56:36,096 INFO  [main] common.KylinConfig:332 : Loading kylin-defaults.properties from file:/usr/local/kylin-2.5.0/tool/kylin-tool-2.5.0.jar!/kylin-defaults.properties
2018-12-14 10:56:36,136 DEBUG [main] common.KylinConfig:291 : KYLIN_CONF property was not set, will seek KYLIN_HOME env variable
2018-12-14 10:56:36,144 INFO  [main] common.KylinConfig:99 : Initialized a new KylinConfig from getInstanceFromEnv : 1987083830
Running org.apache.kylin.tool.metrics.systemcube.SCCreator -inputConfig SCSinkTools.json -output system_cube
2018-12-14 10:56:36,931 INFO  [main] measure.MeasureTypeFactory:116 : Checking custom measure types from kylin config
2018-12-14 10:56:36,934 INFO  [main] measure.MeasureTypeFactory:145 : registering COUNT_DISTINCT(hllc), class org.apache.kylin.measure.hllc.HLLCMeasureType$Factory
2018-12-14 10:56:36,985 INFO  [main] measure.MeasureTypeFactory:145 : registering COUNT_DISTINCT(bitmap), class org.apache.kylin.measure.bitmap.BitmapMeasureType$Factory
2018-12-14 10:56:37,001 INFO  [main] measure.MeasureTypeFactory:145 : registering TOP_N(topn), class org.apache.kylin.measure.topn.TopNMeasureType$Factory
2018-12-14 10:56:37,006 INFO  [main] measure.MeasureTypeFactory:145 : registering RAW(raw), class org.apache.kylin.measure.raw.RawMeasureType$Factory
2018-12-14 10:56:37,009 INFO  [main] measure.MeasureTypeFactory:145 : registering EXTENDED_COLUMN(extendedcolumn), class org.apache.kylin.measure.extendedcolumn.ExtendedColumnMeasureType$Factory
2018-12-14 10:56:37,011 INFO  [main] measure.MeasureTypeFactory:145 : registering PERCENTILE_APPROX(percentile), class org.apache.kylin.measure.percentile.PercentileMeasureType$Factory
2018-12-14 10:56:37,014 INFO  [main] measure.MeasureTypeFactory:145 : registering COUNT_DISTINCT(dim_dc), class org.apache.kylin.measure.dim.DimCountDistinctMeasureType$Factory

[root@node222 kylin-2.5.0]# ll system_cube/
total 20
-rw-r--r-- 1 root root 3282 Dec 14 10:56 create_hive_tables_for_system_cubes.sql
drwxr-xr-x 2 root root 4096 Dec 14 10:56 cube
drwxr-xr-x 2 root root 4096 Dec 14 10:56 cube_desc
drwxr-xr-x 2 root root 4096 Dec 14 10:56 model_desc
drwxr-xr-x 2 root root   30 Dec 14 10:56 project
drwxr-xr-x 2 root root 4096 Dec 14 10:56 table

创建hive表

[root@node222 kylin-2.5.0]# hive -f system_cube/create_hive_tables_for_system_cubes.sql

Logging initialized using configuration in jar:file:/usr/local/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
OK
Time taken: 2.099 seconds
OK
Time taken: 0.162 seconds
OK
Time taken: 0.741 seconds
OK
Time taken: 0.028 seconds
OK
Time taken: 0.169 seconds
OK
Time taken: 0.027 seconds
OK
Time taken: 0.134 seconds
OK
Time taken: 0.033 seconds
OK
Time taken: 0.15 seconds
OK
Time taken: 0.026 seconds
OK
Time taken: 0.116 seconds

hive> use kylin;
OK
Time taken: 0.053 seconds
hive> show tables;
OK
hive_metrics_job_exception_qa
hive_metrics_job_qa
hive_metrics_query_cube_qa
hive_metrics_query_qa
hive_metrics_query_rpc_qa
Time taken: 0.11 seconds, Fetched: 5 row(s)

更新元数据

[root@node222 kylin-2.5.0]# ./bin/metastore.sh restore system_cube
Starting restoring system_cube
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
......
2018-12-14 11:02:40,126 INFO  [main-EventThread] zookeeper.ClientCnxn:512 : EventThread shut down

刷新元数据

在system页面reload metadata

构建system cube

直接通过webui或者脚本构建时都会报错

Kylin2.5.0环境搭建及操作记录_第8张图片
查看/usr/local/kylin-2.5.0/logs/system_cube_KYLIN_HIVE_METRICS_QUERY_QA_1544756400000.log
2018-12-14 11:17:51,783 ERROR [main] job.CubeBuildingCLI:134 : error start cube building
java.lang.RuntimeException: error execute org.apache.kylin.tool.job.CubeBuildingCLI. Root cause: Inconsistent cube desc signature for CubeDesc [name=KYLIN_HIVE
_METRICS_QUERY_QA]

在webui 上重新保存各个cube,再构建即可。

Kylin2.5.0环境搭建及操作记录_第9张图片

创建构建脚本

#!/bin/bash

dir=$(dirname ${0})
export KYLIN_HOME=${dir}/../

CUBE=$1
INTERVAL=$2
DELAY=$3
CURRENT_TIME_IN_SECOND=`date +%s`
CURRENT_TIME=$((CURRENT_TIME_IN_SECOND * 1000))
END_TIME=$((CURRENT_TIME-DELAY))
END=$((END_TIME - END_TIME%INTERVAL))

ID="$END"
echo "building for ${CUBE}_${ID}" >> ${KYLIN_HOME}/logs/build_trace.log
sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube ${CUBE} --endTime ${END} > ${KYLIN_HOME}/logs/system_cube_${CUBE}_${END}.log 2>&1 &

测试构建:
[root@node222 kylin-2.5.0]# ./bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000

配置成定时任务自动运行

[root@node222 kylin-2.5.0]# cat conf/schedule_system_cube_build.cron
0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000

20 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_CUBE_QA 3600000 1200000

40 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_RPC_QA 3600000 1200000

30 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_QA 3600000 1200000

50 */12 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA 3600000 12000
[root@node222 kylin-2.5.0]# crontab  conf/schedule_system_cube_build.cron
[root@node222 kylin-2.5.0]# crontab  -l

编译完成后

Kylin2.5.0环境搭建及操作记录_第10张图片

启用仪表板

http://kylin.apache.org/docs/tutorial/use_dashboard.html

Kylin2.5.0环境搭建及操作记录_第11张图片

整个过程可以通过KYLIN_HOME/logs/kylin.log文件查看执行日志信息

如果server关闭,再重启kylin服务时报如下错误

[root@node222 ~]# kylin.sh  start
Retrieving hadoop conf dir...

......

Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata@hbase
        at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:98)
        at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:110)
        at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:98)
        at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:41)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:92)
        ... 3 more

此时通过hive 命令进入hive提示如下错误

[root@node222 ~]# hive

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/root/c5870040-12ed-47ab-bbc2-84d6ff3f2d24. Name node is in safe mode.
The reported blocks 709 needs additional 410 blocks to reach the threshold 0.9990 of total blocks 1120.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1335)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3874)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:984)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:634)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

退出安全模式

[root@node222 ~]# hdfs dfsadmin -safemode leave
Safe mode is OFF

再重启即正常启动了

[root@node222 ~]# kylin.sh  start        
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/kylin-2.5.0/logs/kylin.log
Web UI is at http://:7070/kylin
        
jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put spark-libs.jar /kylin/spark/

转载于:https://my.oschina.net/peakfang/blog/2988545

你可能感兴趣的:(Kylin2.5.0环境搭建及操作记录)