hive学习笔记之--hive index

Hive中可以创建Index

 

索引是标准的数据库技术,hive 0.7版本之后支持索引。hive索引采用的不是'one size fites all'的索引实现方式,而是提供插入式接口,并且提供一个具体的索引实现作为参考。

hive索引具有以下特点:

1.索引key冗余存储,提供基于key的数据视图

2.存储设计以优化查询&检索性能

3.对于某些查询减少IO,从而提高性能。

 

hive索引创建语句:

 

CREATE INDEX index_name  

ON TABLE base_table_name (col_name, ...) 

AS 'index.handler.class.name' 

[WITH DEFERRED REBUILD] 

[IDXPROPERTIES (property_name=property_value, ...)] 

[IN TABLE index_table_name] 

[PARTITIONED BY (col_name, ...)] 

   [ ROW FORMAT...] STORED AS ... 

   | STORED BY ...  

[LOCATION hdfs_path] 

[TBLPROPERTIES (...)] 

[COMMENT "index comment"] 

 

注意

1.indexpartition默认和数据表一致

2.视图上不能创建index

3. index可以通过stored as配置存储格式

 

重建索引

 

ALTER INDEX index_name ON table_name [PARTITION (...)]REBUILD 

 

假如在创建索引是我们使用“ WITH DEFERRED REBUILD”语句,则索引创建是为空,可以通过“Alter index ... REBUILD”在一个partition上或所有partition上构建索引。

 

注意:

1.hive数据更新时,必须调用该语句更新索引。

2. index rebuild操作时一个原子操作,因此,当rebuild失败时,先前构建的索引也无法使用

 

删除索引:

 

 

DROP INDEX index_name ON table_name 

 

示例:

创建表&索引

 

hive> create table index_test(id INT, nameSTRING) 

    >partitioned by (dt STRING) 

    > row formatdelimited fields terminated by ','; 

OK 

Time taken: 32.446 seconds 

hive> create table index_tmp(id INT, name STRING, dtSTRING) 

    > row formatdelimited fields terminated by ','; 

OK 

Time taken: 0.157 seconds 

hive> load data local inpath '/home/work/data/alter_test.txt'into table index_tmp; 

Copying data fromfile:/home/work/data/alter_test.txt 

Copying file: file:/home/work/data/alter_test.txt 

Loading data to table default.index_tmp 

OK 

Time taken: 1.587 seconds 

hive> set hive.exec.dynamic.partition.mode=nonstrict; 

hive> set hive.exec.dynamic.partition=true;   

hive> insert overwrite table index_testpartition(dt) 

    > selectid,name,dt 

    > fromindex_tmp; 

Total MapReduce jobs = 2 

...... 

OK 

Time taken: 56.103 seconds 

hive> create index index1_index_test on tableindex_test(id) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITHDEFERRED REBUILD ;    

OK 

Time taken: 0.764 seconds 

hive> alter index index1_index_test on index_testrebuild; 

Total MapReduce jobs = 3 

...... 

OK 

Time taken: 138.589 seconds 

hive> show index on index_test; 

OK 

index1_index_test      index_test              id                     default__index_test_index1_index_test__ compact              

Time taken: 0.163 seconds 

hive> show partitions index_test;  

OK 

dt=2012-08-10 

dt=2012-08-11 

dt=2012-08-12 

Time taken: 0.17 seconds 

 

举例2

--下面是给上节的stocks表创建索引

CREATE INDEX stocks_index

ON TABLE stocks (ymd)

AS'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'

WITH DEFERRED REBUILD

IDXPROPERTIES ('creator' = 'me', 'created_at' ='some_time')

IN TABLE stocks_index_table

COMMENT 'stocks indexed by ymd.';

 

alter index stocks_index on stocks rebuild;

 


 

查看索引数据:

$ hadoop fs -ls/user/hive/warehouse/default__index_test_index1_index_test__

Found 3 items

drwxr-xr-x   - worksupergroup          0 2012-08-16 18:27/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-10

drwxr-xr-x   - worksupergroup          0 2012-08-16 18:27/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-11

drwxr-xr-x   - worksupergroup          0 2012-08-16 18:28/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-12

$ hadoop fs -cat/user/hive/warehouse/default__index_test_index1_index_test__/dt=2012-08-10/000000_0

5,hdfs://localhost:9000/user/hive/warehouse/index_test/dt=2012-08-10/000000_0,0

6,hdfs://localhost:9000/user/hive/warehouse/index_test/dt=2012-08-10/000000_0,10

7,hdfs://localhost:9000/user/hive/warehouse/index_test/dt=2012-08-10/000000_0,20

 

 

删除索引:

 

hive> drop index index1_index_test on index_test; 

OK 

Time taken: 0.761 seconds 

hive> show index on index_test;                   

OK 

Time taken: 0.095 seconds 

 

 

索引数据也被删除

$ hadoop fs -ls/user/hive/warehouse/default__index_test_index1_index_test__

ls: Cannot access/user/hive/warehouse/default__index_test_index1_index_test__: No such file ordirectory.

你可能感兴趣的:(hive学习笔记之--hive index)