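The previous article covered query operations; next come views and indexes. In Hive, a view mainly serves to simplify query statements: it is a logical view, not a materialized one. Indexes are one of the more important ways to speed up queries. The earlier article on MySQL optimization also discussed indexes, and conceptually the operations in Hive are largely similar to those in MySQL.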
Please credit the source when reposting: Hive Data Warehouse -- HiveQL Views and Indexes
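Views
We will create a view of high earners.
I tried this out: the view does not store the data returned by the query it represents. It is best thought of as shorthand for a complex statement, a logical view rather than a materialized one, and it does not appear to improve performance. A view fixes which rows and columns are selected from Hive, but not the data itself, so dropping a column that the view references from the underlying table breaks the view.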
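The breakage described above can be reproduced on a throwaway table. This is a minimal sketch, not taken from the original session: the names view_demo and view_demo_names are hypothetical, and the table is assumed to use Hive's default text format, where REPLACE COLUMNS is permitted.

```sql
-- Hypothetical scratch table and view, used only for illustration.
CREATE TABLE view_demo (id INT, name STRING, salary DOUBLE);
CREATE VIEW view_demo_names AS SELECT id, name FROM view_demo;

-- Hive has no DROP COLUMN; a column is removed by re-declaring the schema.
ALTER TABLE view_demo REPLACE COLUMNS (id INT, salary DOUBLE);

-- The view still refers to `name`, so this query is expected to fail
-- at compile time with a semantic (invalid column) error.
SELECT * FROM view_demo_names;

-- Clean up the scratch objects.
DROP VIEW IF EXISTS view_demo_names;
DROP TABLE IF EXISTS view_demo;
```

Because only the view's definition is stored, nothing is checked when the table schema changes; the error surfaces only when the view is next queried.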
CREATE VIEW statement
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ...
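To see how the optional clauses in this syntax fit together, here is a sketch that uses all of them. The view name salaries_high_named, the output column names, and the property key are made up for illustration, and playerid is an assumed column of salaries_external (only yearid and salary are named in this article).

```sql
-- Hypothetical view exercising the optional clauses of CREATE VIEW.
CREATE VIEW IF NOT EXISTS salaries_high_named (
  season COMMENT 'season year',
  player COMMENT 'player identifier',   -- source column playerid is an assumption
  pay    COMMENT 'salary for that season'
)
COMMENT 'Players earning more than 500000'
TBLPROPERTIES ('creator' = 'example')
AS
SELECT yearid, playerid, salary
FROM salaries_external
WHERE salary > 500000;
```

The column list renames the query's output columns positionally, so it must contain as many names as the SELECT returns.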
Create the view
hive>
    > create view salaries_high as
    > select * from salaries_external where salary > 500000;
OK
Time taken: 1.227 seconds
hive> select * from salaries_high limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1475147088438_0007, Tracking URL = http://hadoopwy1:8088/proxy/application_1475147088438_0007/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1475147088438_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-09-29 05:37:02,617 Stage-1 map = 0%, reduce = 0%
2016-09-29 05:37:10,092 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec
MapReduce Total cumulative CPU time: 1 seconds 460 msec
Ended Job = job_1475147088438_0007
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.46 sec HDFS Read: 4422 HDFS Write: 310 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 460 msec
OK
1985 BAL AL murraed02 1472819.0
1985 BAL AL lynnfr01 1090000.0
1985 BAL AL ripkeca01 800000.0
1985 BAL AL lacyle01 725000.0
1985 BAL AL flanami01 641667.0
1985 BAL AL boddimi01 625000.0
1985 BAL AL stewasa01 581250.0
1985 BAL AL martide01 560000.0
1985 BAL AL roeniga01 558333.0
1985 BAL AL mcgresc01 547143.0
Time taken: 26.702 seconds, Fetched: 10 row(s)
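Note that querying the view launched a MapReduce job against the base table, which is consistent with the view being purely logical. One way to confirm this, sketched here rather than taken from the original session, is to look at the view's metadata and query plan:

```sql
-- The metadata should report the table type as VIRTUAL_VIEW and show the
-- stored query text; no storage location holds data for the view.
DESCRIBE FORMATTED salaries_high;

-- The plan should scan salaries_external with the salary filter applied,
-- showing that the view is expanded into its defining query.
EXPLAIN SELECT * FROM salaries_high LIMIT 10;
```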
Drop the view
hive> drop view if exists salaries_high;
OK
Time taken: 1.043 seconds
Indexes
CREATE INDEX statement
CREATE INDEX index_name
  ON TABLE base_table_name (col_name, ...)
  AS 'index.handler.class.name'
  [WITH DEFERRED REBUILD]
  [IDXPROPERTIES (property_name=property_value, ...)]
  [IN TABLE index_table_name]
  [PARTITIONED BY (col_name, ...)]
  [
     [ ROW FORMAT ...] STORED AS ...
     | STORED BY ...
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (...)]
  [COMMENT "index comment"]
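As a sketch of how a few of these optional clauses combine, the statement below uses the 'COMPACT' shorthand that Hive accepts in place of the full CompactIndexHandler class name. The index name salary_idx, the property key, and the comment are hypothetical. Keep in mind that this syntax applies only to Hive releases before 3.0, where index support was removed.

```sql
-- Hypothetical compact index on the salary column.
CREATE INDEX salary_idx
ON TABLE salaries_external (salary)
AS 'COMPACT'                          -- shorthand for the compact index handler
WITH DEFERRED REBUILD                 -- index stays empty until ALTER INDEX ... REBUILD
IDXPROPERTIES ('creator' = 'example')
COMMENT 'compact index on salary';

-- Remove the illustrative index again.
DROP INDEX IF EXISTS salary_idx ON salaries_external;
```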
Create an index
With an explicitly named index table
hive> create index yearindex on table salaries_external(yearid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild in table salaries_external_index;
OK
Time taken: 0.475 seconds
Index only (Hive names the index table itself)
hive> create index index_test on table salaries_external(yearid) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD ;
OK
Time taken: 0.278 seconds
Show indexes
hive>
    > show index on salaries_external;
OK
yearindex salaries_external yearid salaries_external_index compact
index_test salaries_external yearid default__salaries_external_index_test__ compact
Time taken: 0.077 seconds, Fetched: 2 row(s)
Alter (rebuild) the index
hive> alter index index_test on salaries_external rebuild;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1475147088438_0009, Tracking URL = http://hadoopwy1:8088/proxy/application_1475147088438_0009/
Kill Command = /usr/local/hadoop2/bin/hadoop job -kill job_1475147088438_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-09-29 06:44:34,287 Stage-1 map = 0%, reduce = 0%
2016-09-29 06:45:02,611 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.2 sec
2016-09-29 06:45:18,538 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.85 sec
MapReduce Total cumulative CPU time: 3 seconds 850 msec
Ended Job = job_1475147088438_0009
Loading data to table default.default__salaries_external_index_test__
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://hadoopnodeservice1/user/hive/warehouse/default__salaries_external_index_test__
Table default.default__salaries_external_index_test__ stats: [numFiles=1, numRows=58, totalSize=321107, rawDataSize=321049]
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.85 sec HDFS Read: 1354022 HDFS Write: 321214 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 850 msec
OK
Time taken: 58.187 seconds
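The rebuild job populates the index table named in the log, default__salaries_external_index_test__. The following sketch, which is not part of the original session, shows how one might inspect it and then ask the optimizer to consider the index. Whether the index is actually used depends on the Hive version and configuration, so treat the SET lines as assumptions to verify on your own cluster.

```sql
-- A compact index table holds the indexed column plus the file name and
-- offsets where each value occurs in the base table.
DESCRIBE default__salaries_external_index_test__;
SELECT * FROM default__salaries_external_index_test__ LIMIT 5;

-- Assumed settings for letting the optimizer use compact indexes.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
SET hive.optimize.index.filter=true;
SET hive.optimize.index.filter.compact.minsize=0;

-- A filter on the indexed column is the kind of query an index can help.
SELECT * FROM salaries_external WHERE yearid = 1985 LIMIT 10;
```

Because both indexes here were created WITH DEFERRED REBUILD, an index that has not been rebuilt stays empty and cannot speed anything up.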
Drop the index
hive>
    > drop index index_test on salaries_external;
OK
Time taken: 0.188 seconds
hive> show index on salaries_external;
OK
yearindex salaries_external yearid salaries_external_index compact
Time taken: 0.065 seconds, Fetched: 1 row(s)
Reference: https://www.yiibai.com/hive/hive_views_and_indexes.html