Kylin Example

For the sample data, see: "Kylin: explaining the principles with an example" (kylin 用实例说明原理)

Environment

  • CentOS 6.5
  • CDH 5.15
  • Apache Kylin 2.1.0

Create the source table in Hive

create table if not exists chenzl.kylintest (
    year int,
    city string,
    price int
)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as textfile;

The table's HDFS path is

/user/hive/warehouse/chenzl.db/kylintest

Data file

$ vi kylintest.txt
1993|beijing|10
1993|beijing|30
1994|shanghai|20
1994|beijing|40
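
To make the link between this file and the table definition concrete, here is a minimal Python sketch (not part of the original setup) that splits the same lines on '|' into (year, city, price) rows, which is how the table's field delimiter is expected to split them. The names SAMPLE and parse_rows are illustrative only.

```python
# Illustrative only: mimic how the '|' field delimiter splits each line
# into the three columns declared in the Hive table (year, city, price).
SAMPLE = """1993|beijing|10
1993|beijing|30
1994|shanghai|20
1994|beijing|40
"""

def parse_rows(text):
    """Split '|'-delimited lines into (year, city, price) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        year, city, price = line.split("|")
        rows.append((int(year), city, int(price)))
    return rows

rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

A line that does not split into exactly three fields raises a ValueError here, whereas Hive would typically fill missing columns with NULL, so this is only a sanity check of the file layout.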

Upload to HDFS

sudo -u hdfs hadoop fs -put kylintest.txt /user/hive/warehouse/chenzl.db/kylintest/

Query in Hive

hive> select * from chenzl.kylintest;
year	city	price
1993	beijing	10
1993	beijing	30
1994	shanghai	20
1994	beijing	40

Kylin operations

Reload metadata: System->Reload Metadata
Create a project: Model->(+)Add Project->test
Add a data source: Model->Data Source->(↓) Load Table

Table Names: chenzl.kylintest

Create the model

Create a model: Model->(+ New)->New Model

Model Info

Model Name: M_test

Data Model

Fact Table: chenzl.kylintest

Dimensions

Columns: year,city

Measures

Columns: price

Click Next through the remaining steps, then Save.

Create the cube

Create a cube: Model->(+ New)->New Cube

Cube Info

Model Name: M_test
Cube Name: C_test

Dimensions

Add Dimensions->Select All

Measures
(+)Measure

Name: sum(price)
Expression: sum
Param Type: column
Param Value: kylintest.price

Refresh Setting: skip
Advanced Setting
Aggregation Groups section

Includes  kylintest.year,kylintest.city

Advanced ColumnFamily section

F1  _COUNT_,sum(price)

Then click Next, and finally Save.
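
Conceptually, a cube precomputes aggregates for combinations of its dimensions, and each combination is called a cuboid. The Python sketch below lists the candidate cuboids for the two dimensions in the aggregation group above. It assumes every non-empty combination is a candidate; which cuboids Kylin actually materializes depends on the aggregation group settings and the Kylin version. The names DIMENSIONS and enumerate_cuboids are illustrative.

```python
from itertools import combinations

# Dimensions included in the aggregation group above.
DIMENSIONS = ("year", "city")

def enumerate_cuboids(dims):
    """Return every non-empty combination of dimensions, largest first."""
    cuboids = []
    for size in range(len(dims), 0, -1):
        cuboids.extend(combinations(dims, size))
    return cuboids

cuboids = enumerate_cuboids(DIMENSIONS)
for cuboid in cuboids:
    print(cuboid)
```

With N dimensions the number of candidates grows as 2^N - 1, which is why larger cubes rely on aggregation groups to prune combinations.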

Build the cube

Build the cube: C_test->Actions->build
Monitor-> view the job; click (>) to see the build steps

Build steps

  1. Create Intermediate Flat Hive Table
  2. Redistribute Flat Hive Table
  3. Extract Fact Table Distinct Columns
  4. Build Dimension Dictionary
  5. Save Cuboid Statistics
  6. Create HTable
  7. Build Base Cuboid
  8. Build N-Dimension Cuboid : level 1
  9. Build Cube In-Mem
  10. Convert Cuboid Data to HFile
  11. Load HFile to HBase Table
  12. Update Cube Info
  13. Hive Cleanup
  14. Garbage Collection on HDFS

In step 11 you can see the name of the generated HBase table, for example "KYLIN_CBSVR3S7FK".
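
Steps 7 and 8 follow a layered idea: first aggregate the raw rows into the base cuboid (all dimensions), then derive lower-dimension cuboids from it. The Python sketch below is a simplified in-memory illustration of that idea on the sample data, not Kylin's actual MapReduce implementation. It keeps the two measures from the cube definition (row count and sum(price)); the function names are illustrative.

```python
from collections import defaultdict

# Sample rows from kylintest: (year, city, price).
ROWS = [
    (1993, "beijing", 10),
    (1993, "beijing", 30),
    (1994, "shanghai", 20),
    (1994, "beijing", 40),
]

def build_base_cuboid(rows):
    """Aggregate raw rows by (year, city) into (count, sum(price))."""
    base = defaultdict(lambda: [0, 0])
    for year, city, price in rows:
        base[(year, city)][0] += 1
        base[(year, city)][1] += price
    return {key: tuple(value) for key, value in base.items()}

def roll_up(cuboid, keep_index):
    """Derive a 1-dimension cuboid from the base cuboid by keeping one key column."""
    child = defaultdict(lambda: [0, 0])
    for key, (count, total) in cuboid.items():
        child[key[keep_index]][0] += count
        child[key[keep_index]][1] += total
    return {key: tuple(value) for key, value in child.items()}

base = build_base_cuboid(ROWS)
by_year = roll_up(base, 0)  # cuboid on (year)
by_city = roll_up(base, 1)  # cuboid on (city)
print(base)
print(by_year)
print(by_city)
```

Rolling up from the base cuboid instead of rescanning the raw rows works here because count and sum are additive measures.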

Query

Once the build finishes, click Insight and query the cube:

    select "YEAR", sum(price) from kylintest where city = 'beijing' group by "YEAR"

The result should match what the same query returns on the original Hive table.
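
As a quick way to know what result to expect, the query can be run against the four sample rows in an in-memory SQLite database. This is only a stand-in for the Hive table (SQLite is an assumption of this sketch and is not part of the Kylin setup); the query text is the same as the one above.

```python
import sqlite3

# Stand-in for the Hive table; SQLite is used only to check the expected result.
conn = sqlite3.connect(":memory:")
conn.execute("create table kylintest (year int, city text, price int)")
conn.executemany(
    "insert into kylintest values (?, ?, ?)",
    [
        (1993, "beijing", 10),
        (1993, "beijing", 30),
        (1994, "shanghai", 20),
        (1994, "beijing", 40),
    ],
)

# Same query as in the Insight page.
query = (
    'select "YEAR", sum(price) from kylintest '
    "where city = 'beijing' group by \"YEAR\""
)
result = sorted(conn.execute(query).fetchall())
print(result)
conn.close()
```

For the sample data, beijing sums to 40 in both 1993 (10 + 30) and 1994 (40), so Kylin and Hive should both report those two rows.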
