Kylin pitfall notes: two pitfalls we hit when creating a cube

When building a cube, the step most likely to fail is Build Dimension Dictionary, the fourth step, as shown in the figure below:

[Figure 1: the cube build job, step 4, Build Dimension Dictionary]

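In this step the Kylin backend runs many checks on the table columns. Both pitfalls we hit occurred here, and both came from problems in the data itself: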

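First, the dimension table had a longtext column named description (holding long descriptive text) whose values were longer than Short.MAX_VALUE (a short ranges from -32768 to 32767). Even though this column was added to the dimensions of neither the model nor the cube, the build still failed. In other words, Kylin checks the value lengths of every column in the dimension table, whether or not the column was added as a dimension. The Pitfall 1 stack trace below suggests why: the exception is raised while Kylin takes a snapshot of the lookup table (SnapshotTable.takeSnapshot), and that snapshot covers the table's columns, not only the ones chosen as dimensions. The code that raises the error is: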

if (maxValueLength < 0) {
    throw new IllegalStateException("maxValueLength is negative (" + maxValueLength
            + "). Dict value is too long, whose length is larger than " + Short.MAX_VALUE);
}
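Because Kylin's message does not name the offending column, it can help to scan the dimension table yourself before building. Below is a minimal Java sketch. It assumes the column values have already been read into memory as strings, for example from a Hive export. The class and method names are hypothetical and are not part of Kylin. Lengths are measured in UTF-8 bytes, on the assumption that this is roughly what the dictionary builder measures.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-build check, not part of Kylin.
public class TooLongColumnCheck {

    // Returns column name -> longest value length (in UTF-8 bytes) for every column
    // whose longest value exceeds Short.MAX_VALUE (32767).
    // Assumption: byte length, not character count, is what matters to the dictionary.
    public static Map<String, Integer> findTooLongColumns(Map<String, List<String>> columns) {
        Map<String, Integer> tooLong = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : columns.entrySet()) {
            int maxLen = 0;
            for (String v : e.getValue()) {
                if (v != null) { // nulls are skipped
                    maxLen = Math.max(maxLen, v.getBytes(StandardCharsets.UTF_8).length);
                }
            }
            if (maxLen > Short.MAX_VALUE) {
                tooLong.put(e.getKey(), maxLen);
            }
        }
        return tooLong;
    }

    // Builds a string of n copies of c (Java 8 has no String.repeat).
    public static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        Map<String, List<String>> dimTable = new LinkedHashMap<>();
        dimTable.put("id", Arrays.asList("0001", "0002"));
        dimTable.put("description", Arrays.asList("short text", repeat('x', 40000)));
        System.out.println(findTooLongColumns(dimTable)); // prints {description=40000}
    }
}
```

Measuring bytes rather than characters matters for non-ASCII text. A column of about 11,000 Chinese characters is already over the limit in UTF-8, even though its character count is far below 32767.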

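Second, when building a model you normally join the fact table's foreign key to the dimension table's primary key. Hive, however, does not enforce primary or foreign keys, so Kylin itself checks that the dimension table's join column is unique, whether or not that column is really a key. Without uniqueness, one fact-table row could join to two or more dimension-table rows, which is clearly invalid. The dimension-table column joined to the fact table must therefore be unique and non-null, in effect a primary key (even though Hive does not enforce one).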
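The uniqueness requirement can be checked before a build in the same spirit. The following Java sketch is a simplified stand-in for Kylin's duplicate-key check. It is not Kylin's actual LookupTable code. The class name LookupKeyCheck and its method are hypothetical, and the dimension rows are assumed to be already loaded as string arrays.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-build check, not Kylin's implementation.
public class LookupKeyCheck {

    // Verifies that the join-key column at keyIndex is non-null and unique.
    // Throws IllegalStateException at the first violation, reporting the key
    // and both conflicting rows.
    public static void checkUniqueKey(String table, List<String[]> rows, int keyIndex) {
        Map<String, String[]> seen = new HashMap<>();
        for (String[] row : rows) {
            String key = row[keyIndex];
            if (key == null) {
                throw new IllegalStateException("The table: " + table
                        + " has a null join key in row " + Arrays.toString(row));
            }
            // put returns the row previously stored under this key, if any
            String[] previous = seen.put(key, row);
            if (previous != null) {
                throw new IllegalStateException("The table: " + table + " Dup key found, key=[" + key
                        + "], value1=" + Arrays.toString(previous) + ", value2=" + Arrays.toString(row));
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> score4 = Arrays.asList(
                new String[] {"0001", "99.0", "100"},
                new String[] {"0001", "98.0", "99"});
        try {
            checkUniqueKey("SCORE4", score4, 0);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the two SCORE4 rows from our log, main prints `The table: SCORE4 Dup key found, key=[0001], value1=[0001, 99.0, 100], value2=[0001, 98.0, 99]`. That is the same conflict Kylin reported, although Kylin formats the rows slightly differently.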

We recorded the error logs for both pitfalls below.


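Pitfall 1: when building the cube, a dimension-table column value was too long, producing the error below. After several days of trying one thing after another, we finally found the cause: the description column, of type longtext, was too long. We dropped this column from the Hive table and reloaded the data, which solved the problem.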

java.lang.IllegalStateException: stats.maxValueLength is not positive short, usually caused by too long dict value.
at org.apache.kylin.dict.TrieDictionaryBuilder.positiveShortPreCheck(TrieDictionaryBuilder.java:490)
at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:447)
at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:418)
at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:98)
at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:139)
at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:287)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:87)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
result code:2

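The error log does not say which column is too long. As a result, we tried many fixes and none of them worked, because we kept assuming that Kylin would not check the value lengths of columns that were not added to the model. We hope the Kylin developers will make this error message more detailed in future versions, ideally naming the column that is too long. Also, why check the length of columns that were never added to the dimensions? That seems unnecessary! We hope this can be improved as well.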


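Pitfall 2: when building the cube, the dimension-table column joined to the fact table was not unique, and Kylin's uniqueness check raised the error below: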

java.lang.IllegalStateException: The table: SCORE4 Dup key found, key=[0001], value1=[0001,99.0,100], value2=[0001,98.0,99]
at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:86)
at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:69)
at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:79)
at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:57)
at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:65)
at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:648)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:93)
at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:49)
at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:66)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:62)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:64)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:125)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:144)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

result code:2

Solution: redefine the join relationship so that the dimension table's join column is unique.
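When choosing a replacement join column, one option is to list the columns whose values are currently all unique and non-null. Below is a minimal Java sketch. The class, method and column names are all hypothetical, and the column names in main are made up for illustration because the real SCORE4 schema is not shown in the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for picking a join column; not part of Kylin.
public class JoinKeyCandidates {

    // Returns the names of columns whose values are all non-null and pairwise distinct.
    public static List<String> uniqueNonNullColumns(String[] header, List<String[]> rows) {
        List<String> candidates = new ArrayList<>();
        for (int col = 0; col < header.length; col++) {
            Set<String> seen = new HashSet<>();
            boolean ok = true;
            for (String[] row : rows) {
                String v = row[col];
                // Set.add returns false when the value is already present
                if (v == null || !seen.add(v)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                candidates.add(header[col]);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        String[] header = {"key", "score", "total", "row_id"}; // made-up column names
        List<String[]> rows = Arrays.asList(
                new String[] {"0001", "99.0", "100", "r1"},
                new String[] {"0001", "98.0", "99", "r2"});
        System.out.println(uniqueNonNullColumns(header, rows)); // prints [score, total, row_id]
    }
}
```

On these two sample rows, the sketch reports score and total as well as row_id, which shows why the result still needs a human check. Uniqueness in today's data is necessary but not sufficient. The column must also be the one the fact table's foreign key actually refers to, and it must stay unique as new data arrives.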
