Problems encountered when importing a JDBC data source with Kylin

1. Goal:
Use MySQL data directly as a Kylin data source.
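2. Reference: the official guide to configuring a JDBC data source
Prepare Sqoop
Kylin uses Apache Sqoop to load data from relational databases into HDFS. Download and install the latest version of Sqoop on the same machine as Kylin. This guide uses the SQOOP_HOME environment variable to refer to the Sqoop installation path.
Prepare the JDBC driver
Download the JDBC driver for your database to the Kylin server. The JDBC driver jar must be added to both the $KYLIN_HOME/ext and $SQOOP_HOME/lib directories.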
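The driver step can be sketched as a small shell function. This is only an illustration: the function name install_jdbc_driver and the jar file name in the example call are made up here, and it assumes that creating the two target directories when they are missing is acceptable.

```shell
# Copy a JDBC driver jar into the two directories named in the guide.
# Usage: install_jdbc_driver <driver.jar> <kylin_home> <sqoop_home>
# (function name is hypothetical, not part of Kylin or Sqoop)
install_jdbc_driver() {
    jar=$1
    kylin_home=$2
    sqoop_home=$3
    # Refuse to continue if the jar does not exist.
    [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
    # Assumption: it is fine to create the directories if they are missing.
    mkdir -p "$kylin_home/ext" "$sqoop_home/lib" || return 1
    cp "$jar" "$kylin_home/ext/" && cp "$jar" "$sqoop_home/lib/"
}

# Example call (the jar file name is a placeholder):
# install_jdbc_driver ./mysql-connector-java.jar "$KYLIN_HOME" "$SQOOP_HOME"
```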
Configure Kylin
Add the following settings to $KYLIN_HOME/conf/kylin.properties.
MySQL example:

kylin.source.default=8
kylin.source.jdbc.connection-url=jdbc:mysql://hostname:3306/employees
kylin.source.jdbc.driver=com.mysql.jdbc.Driver
kylin.source.jdbc.dialect=mysql
kylin.source.jdbc.user=your_username
kylin.source.jdbc.pass=your_password
kylin.source.jdbc.sqoop-home=/usr/hdp/current/sqoop-client/bin
kylin.source.jdbc.filed-delimiter=|
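
Before restarting Kylin it can help to confirm that none of the settings above were left out. The following is a minimal sketch, not a Kylin tool: the function name check_jdbc_source_config is invented for this post, and treating exactly these seven keys as required is an assumption based on the sample above.

```shell
# Print every expected key that is missing from a kylin.properties file.
# Returns non-zero if at least one key is missing.
# (function name and the choice of "required" keys are assumptions)
check_jdbc_source_config() {
    props=$1
    missing=0
    for key in \
        kylin.source.default \
        kylin.source.jdbc.connection-url \
        kylin.source.jdbc.driver \
        kylin.source.jdbc.dialect \
        kylin.source.jdbc.user \
        kylin.source.jdbc.pass \
        kylin.source.jdbc.sqoop-home
    do
        # Escape the dots so they match literally in the regex.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        # Only lines that start with the key count; commented lines do not.
        if ! grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$props"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return $missing
}

# Example call:
# check_jdbc_source_config "$KYLIN_HOME/conf/kylin.properties"
```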

3. The problem encountered:

exe cmd:/usr/hdp/2.5.5.0-157/sqoop/bin/sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true  -Dmapreduce.job.queuename=default --connect "jdbc:mysql://X.X.X.X:3306/XXX" --driver com.mysql.jdbc.Driver --username XXXXX --password XXXXX --query "SELECT SLICES.CHANGED_ON as SLICES_CHANGED_ON ,SLICES.ID as SLICES_ID ,SLICES.SLICE_NAME as SLICES_SLICE_NAME ,SLICES.DATASOURCE_TYPE as SLICES_DATASOURCE_TYPE ,SLICES.DATASOURCE_NAME as SLICES_DATASOURCE_NAME ,SLICES.VIZ_TYPE as SLICES_VIZ_TYPE ,SLICES.DESCRIPTION as SLICES_DESCRIPTION FROM SUPERSET.SLICES as SLICES  WHERE 1=1 AND \$CONDITIONS" --target-dir hdfs://master1.bigdata:8020/kylin/kylin_metadata_2.3/kylin-908401b6-a8aa-4879-a70d-fdefeefd833d/kylin_intermediate_superset_slice_1f3498b3_ed6f_47b2_bcaa_7fd449d93306 --split-by SLICES.SLICES.ID --boundary-query "SELECT min(SLICES.ID), max(SLICES.ID) FROM "SUPERSET".SLICES as SLICES" --null-string '' --fields-terminated-by '|' --num-mappers 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.5.5.0-157/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/insight/vdc1/software/apache-kylin-2.4.0-bin-hbase1x/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/08/14 16:19:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
18/08/14 16:19:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/14 16:19:55 WARN tool.BaseSqoopTool: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
18/08/14 16:19:55 ERROR tool.BaseSqoopTool: Got error creating database manager: You must specify --connection-manager when you specified --driver.
    at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:278)
    at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:89)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:610)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:243)

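4. Analysis of the exception:
The error says that whenever --driver is specified, --connection-manager must be specified as well. So I added connection-manager=org.apache.sqoop.manager.MySQLManager to kylin.properties, but the build still failed, and the generated sqoop command did not pick up the connection-manager parameter. I then added the connection manager parameter on the cube's Configuration Overwrites page and built again: same exception, and the parameter was still not picked up.
Next I commented out the driver setting and ran the build. It still failed: the generated command still passed --driver, only now with the value null.
So I tried a different approach: instead of configuring the Sqoop parameters through Kylin, I ran the sqoop import statement directly. Both of the following variants succeed:
1) pass both --driver com.mysql.jdbc.Driver and --connection-manager org.apache.sqoop.manager.MySQLManager;
2) pass neither parameter.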
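
The two variants that worked can be sketched as follows. This is a reconstruction for illustration, not the exact commands that were run: the function name run_sqoop_import, the SQOOP_BIN override, the shortened query, and the target directory are all placeholders, and the masked host, database, and credentials are kept as they appear in the log above.

```shell
# Run "sqoop import" in one of the two variants that worked.
# SQOOP_BIN is a hypothetical override so the command can be previewed
# with SQOOP_BIN=echo before running it for real.
run_sqoop_import() {
    variant=$1   # "explicit" = pass both flags, "auto" = pass neither
    set -- import --connect "jdbc:mysql://X.X.X.X:3306/XXX"
    case $variant in
        explicit)
            set -- "$@" \
                --driver com.mysql.jdbc.Driver \
                --connection-manager org.apache.sqoop.manager.MySQLManager
            ;;
        auto)
            # Let Sqoop pick the manager from the jdbc:mysql:// URL.
            ;;
        *)
            echo "unknown variant: $variant" >&2
            return 1
            ;;
    esac
    # Placeholder query and target directory; a free-form query must
    # contain $CONDITIONS, and one mapper avoids the need for --split-by.
    set -- "$@" \
        --username XXXXX --password XXXXX \
        --query 'SELECT ID, SLICE_NAME FROM SLICES WHERE $CONDITIONS' \
        --target-dir /tmp/sqoop_slices_test \
        --num-mappers 1
    "${SQOOP_BIN:-sqoop}" "$@"
}

# Preview without touching the database:
# SQOOP_BIN=echo run_sqoop_import explicit
# SQOOP_BIN=echo run_sqoop_import auto
```

Building the argument list with "set --" keeps the query as a single argument, so the spaces inside it are not split when the real sqoop binary is invoked.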

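5. Solution:
These experiments pointed to a likely cause: the Sqoop version. The source-jdbc module in Kylin 2.4 does not adapt to different Sqoop versions and appears to target the latest release. To confirm this, I did two things:
1. Read the source code of Kylin's source-jdbc module
JdbcExplorer.java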

package org.apache.kylin.source.jdbc;

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

import org.apache.commons.lang3.StringUtils;
import org.apache.kylin.common.KylinConfig;
import org.apache.kylin.common.util.DBUtils;
import org.apache.kylin.common.util.Pair;
import org.apache.kylin.metadata.datatype.DataType;
import org.apache.kylin.metadata.model.ColumnDesc;
import org.apache.kylin.metadata.model.ISourceAware;
import org.apache.kylin.metadata.model.TableDesc;
import org.apache.kylin.metadata.model.TableExtDesc;
import org.apache.kylin.source.ISampleDataDeployer;
import org.apache.kylin.source.ISourceMetadataExplorer;
import org.apache.kylin.source.hive.DBConnConf;
import org.apache.kylin.source.jdbc.metadata.IJdbcMetadata;
import org.apache.kylin.source.jdbc.metadata.JdbcMetadataFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class JdbcExplorer implements ISourceMetadataExplorer, ISampleDataDeployer {
    private static final Logger logger = LoggerFactory.getLogger(JdbcExplorer.class);

    private final KylinConfig config;
    private final String dialect;
    private final DBConnConf dbconf;
    private final IJdbcMetadata jdbcMetadataDialect;

    public JdbcExplorer() {
        config = KylinConfig.getInstanceFromEnv();
        String connectionUrl = config.getJdbcSourceConnectionUrl();
        String driverClass = config.getJdbcSourceDriver();
        String jdbcUser = config.getJdbcSourceUser();
        String jdbcPass = config.getJdbcSourcePass();
        this.dbconf = new DBConnConf(driverClass, connectionUrl, jdbcUser, jdbcPass);
        this.dialect = config.getJdbcSourceDialect();
        this.jdbcMetadataDialect = JdbcMetadataFactory.getJdbcMetadata(dialect, dbconf);
    }

There is no connection-manager parameter here, and --driver is always passed.

2. Download the latest Sqoop, 1.4.7, and reconfigure SQOOP_HOME
After restarting Kylin and rerunning the cube build, it completed successfully.
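
To check which Sqoop version a given SQOOP_HOME points at, the "Running Sqoop version:" line that appears in the log above can be parsed. The sketch below rests on several assumptions: both function names are invented here, the threshold 1.4.7 is simply the version that worked in this post, and the version string is assumed to have at least three dot-separated components.

```shell
# Extract the version from Sqoop's "Running Sqoop version:" log line,
# read from standard input. (function name is hypothetical)
sqoop_version_from_log() {
    sed -n 's/.*Running Sqoop version: \([0-9][0-9.]*[0-9]\).*/\1/p' | head -n 1
}

# Succeed when the first three components are at least 1.4.7.
# Vendor builds such as 1.4.6.2.5.3.0 are compared as 1.4.6.
sqoop_is_at_least_1_4_7() {
    v=$1
    major=${v%%.*}; rest=${v#*.}
    minor=${rest%%.*}; rest=${rest#*.}
    patch=${rest%%.*}
    [ "$major" -gt 1 ] && return 0
    [ "$major" -lt 1 ] && return 1
    [ "$minor" -gt 4 ] && return 0
    [ "$minor" -lt 4 ] && return 1
    [ "$patch" -ge 7 ]
}

# Example (assumes "sqoop version" logs the same line as "sqoop import"):
# v=$("$SQOOP_HOME/bin/sqoop" version 2>&1 | sqoop_version_from_log)
# sqoop_is_at_least_1_4_7 "$v" || echo "Sqoop $v is older than 1.4.7"
```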

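6. Summary:
The production environment runs Sqoop 1.4.6. Because it is on an internal network and shared by multiple tenants, upgrading components promptly is not realistic. I could not find any solution to this problem on Google; I do not know whether that is because nobody else uses a relational database as a source or because everyone is already on the latest Sqoop. It took a fair amount of effort to sort out, so I am sharing it here in the hope that it helps others stuck in the same pit.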
