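This article briefly introduces some commonly used HQL statements in the Hive CLI, such as creating tables, dropping tables, and loading data.
Before running these statements, start the Hive CLI as shown below. This requires that Hadoop and Hive are already installed in your environment.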
[centos@cent-2 ~]$ hive
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
1 Create a simple table –CREATE TABLE
hive> create table test1(columna int, columnb string);
OK
Time taken: 0.349 seconds
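The statement above fails if test1 already exists. A minimal sketch of a more defensive variant, assuming you also want to document the table; the comment strings are purely illustrative:

```sql
-- IF NOT EXISTS turns the statement into a no-op when the table is already there
CREATE TABLE IF NOT EXISTS test1 (
  columna int COMMENT 'numeric id',
  columnb string COMMENT 'free text'
)
COMMENT 'demo table';
```

The column and table comments are stored in the metastore and appear in the output of DESCRIBE.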
2 List all tables –SHOW TABLES {regexp(tablename)}
hive> show tables;
OK
test1
ttest1
Time taken: 0.023 seconds, Fetched: 2 row(s)
hive> show tables 'tt*';
OK
ttest1
Time taken: 0.026 seconds, Fetched: 1 row(s)
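The pattern accepted by SHOW TABLES is more limited than a general regular expression: according to the Hive language manual, the only wildcards are `*` (any run of characters) and `|` (a choice between alternatives). A short sketch that reuses the table names from the transcript above:

```sql
-- Tables whose names start with "tt"
SHOW TABLES 'tt*';
-- Tables whose names start with "test" or with "tt"
SHOW TABLES 'test*|tt*';
-- Restrict the listing to one database (here the default one)
SHOW TABLES IN default 'tt*';
```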
3 List all databases –SHOW DATABASES
hive> show databases;
OK
default
Time taken: 0.015 seconds, Fetched: 1 row(s)
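Only the default database exists in this environment. As a sketch of how additional databases would show up in the listing, the statements below create and select one; the name demo_db is a hypothetical example and does not come from the original session:

```sql
-- Create a database only if it is not there yet, then make it the current one
CREATE DATABASE IF NOT EXISTS demo_db;
USE demo_db;
-- List databases whose names start with "d"
SHOW DATABASES LIKE 'd*';
-- Switch back to the database used in the rest of this article
USE default;
```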
4 View a table's structure –DESCRIBE/DESC tablename
hive> describe test1;
OK
columna int
columnb string
Time taken: 1.497 seconds, Fetched: 2 row(s)
5 Alter a table –ALTER TABLE
hive> alter table test1 rename to test2;
OK
Time taken: 0.177 seconds
hive> show tables;
OK
test2
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> alter table test2 add columns(columnc string);
OK
Time taken: 0.129 seconds
hive> desc test2;
OK
columna int
columnb string
columnc string
Time taken: 0.097 seconds, Fetched: 3 row(s)
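Besides renaming a table and adding columns, ALTER TABLE can also rename a column or replace the column list. The following is a sketch only, applied to the test2 table from the transcript; columnd is a hypothetical new column name:

```sql
-- Rename column columnc to columnd, keeping the string type
ALTER TABLE test2 CHANGE columnc columnd string;
-- Replace the whole column list, which effectively drops the third column
ALTER TABLE test2 REPLACE COLUMNS (columna int, columnb string);
```

Both statements change only the table metadata; the underlying data files are not rewritten.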
6 Drop a table –DROP TABLE
hive> drop table test2;
OK
Time taken: 0.351 seconds
hive> show tables;
OK
Time taken: 0.023 seconds
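7 Load data –LOAD DATA {LOCAL} INPATH … {OVERWRITE} INTO TABLE tablename
(Note: LOCAL loads the file from the local file system; without it, the path refers to HDFS. OVERWRITE replaces the table's existing data; without it, the new data is appended. The transcript below assumes that test1 exists again; sections 5 and 6 renamed and dropped it, so recreate it first if you are following along.)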
[hdfs@cent-2 ~]$ hadoop fs -cat /user/hive/test.txt
1,'AAA'
2,'BBB'
3,'CCC'
hive> load data inpath '/user/hive/test.txt' overwrite into table test1;
Loading data to table default.test1
Table default.test1 stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 1.74 seconds
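The two optional keywords can be combined freely. The sketch below shows both directions; the local path is taken from a later section of this article and is only an example:

```sql
-- Append a file from the local file system; the file is copied into the table directory
LOAD DATA LOCAL INPATH '/home/hdfs/test.txt' INTO TABLE test1;
-- Replace the table contents with a file that is already in HDFS
LOAD DATA INPATH '/user/hive/test.txt' OVERWRITE INTO TABLE test1;
```

Without LOCAL, Hive moves the source file into the table's directory instead of copying it, so it no longer exists at its original HDFS path afterwards.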
8 Query data (converted to a MapReduce job) –SELECT
hive> select count(*) from test1;
Query ID = hdfs_20161115120101_3de3af75-ca9c-4cec-892b-559f5af6f313
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1479180357223_0001, Tracking URL = http://cent-2.novalocal:8088/proxy/application_1479180357223_0001/
Kill Command = /usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop job -kill job_1479180357223_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-15 12:01:35,574 Stage-1 map = 0%, reduce = 0%
2016-11-15 12:01:44,187 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec
2016-11-15 12:01:50,553 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.77 sec
MapReduce Total cumulative CPU time: 3 seconds 770 msec
Ended Job = job_1479180357223_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.77 sec HDFS Read: 241 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 770 msec
OK
3
Time taken: 34.68 seconds, Fetched: 1 row(s)
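Not every query needs a MapReduce job. With the MapReduce execution engine seen in the transcript, aggregations such as count(*) are compiled into jobs, while a plain projection can often be served by a simple fetch task, depending on the hive.fetch.task.conversion setting. A sketch for comparing the two against the same test1 table:

```sql
-- Show the execution plan without running the query
EXPLAIN SELECT count(*) FROM test1;
-- A plain projection with LIMIT; often answered without launching a job
SELECT * FROM test1 LIMIT 3;
-- Grouping and aggregation; compiled into a job like count(*)
SELECT columnb, count(*) FROM test1 GROUP BY columnb;
```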
9 Create a table with a field delimiter –FIELDS TERMINATED BY
hive> create table test2(columna int, columnb string)
> row format delimited fields terminated by ',';
OK
Time taken: 0.115 seconds
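The delimiter matters as soon as data is queried. A table created without a ROW FORMAT clause uses Hive's default field delimiter, Ctrl-A (\001), so a comma-separated file like the one in section 7 would not be split into columns and its fields would typically read back as NULL. The sketch below loads such a file into test2 instead; the path /tmp/test.txt is a hypothetical local copy of that file:

```sql
-- test2 was created with FIELDS TERMINATED BY ',' above
LOAD DATA LOCAL INPATH '/tmp/test.txt' OVERWRITE INTO TABLE test2;
-- Each line is split on ','; quotes in the file are kept as part of columnb
SELECT columna, columnb FROM test2;
```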
10 Create a partitioned table –PARTITIONED BY
hive> create table test3(columna int, columnb string)
> partitioned by (dt string);
OK
Time taken: 0.216 seconds
hive> describe test3;
OK
columna int
columnb string
dt string
# Partition Information
# col_name data_type comment
dt string
Time taken: 0.097 seconds, Fetched: 8 row(s)
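11 Add partitions –ADD PARTITION
(Note: the transcript below uses a table named test4, which is assumed to be a partitioned table defined in the same way as test3 above.)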
hive> alter table test4 add partition(dt='201601');
OK
Time taken: 0.194 seconds
hive> alter table test4 add partition(dt='201602');
OK
Time taken: 0.09 seconds
hive> alter table test4 add partition(dt='201603');
OK
Time taken: 0.084 seconds
hive> show partitions test4;
OK
dt=201601
dt=201602
dt=201603
Time taken: 0.096 seconds, Fetched: 3 row(s)
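Adding a partition that already exists raises an error, and partitions can be removed again. A sketch of both cases on the same test4 table; the partition value 201604 is a hypothetical example:

```sql
-- Add a partition without failing if it already exists
ALTER TABLE test4 ADD IF NOT EXISTS PARTITION (dt='201604');
-- Remove a partition; for a managed (non-external) table this also deletes its data
ALTER TABLE test4 DROP IF EXISTS PARTITION (dt='201601');
-- Check the result
SHOW PARTITIONS test4;
```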
12 Load data into a specific partition –LOAD DATA …PARTITION …
hive> load data local inpath '/home/hdfs/test.txt' into table test4 partition(dt='201603');
Loading data to table default.test4 partition (dt=201603)
Partition default.test4{dt=201603} stats: [numFiles=1, totalSize=40]
OK
Time taken: 0.879 seconds
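Once data sits in a partition, the partition column can be used in queries like an ordinary column, and filtering on it lets Hive read only the matching partition directories. A minimal sketch against the test4 table loaded above:

```sql
-- Only the dt=201603 partition needs to be scanned
SELECT columna, columnb FROM test4 WHERE dt = '201603';
-- The partition column can also be selected and grouped on
SELECT dt, count(*) FROM test4 GROUP BY dt;
```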
13 Create an external table –CREATE EXTERNAL TABLE
hive> create external table ext_table(columna int, columnb string, columnc string)
> row format delimited
> fields terminated by ','
> location '/user/hive';
OK
Time taken: 0.103 seconds
hive> select * from ext_table;
OK
1 HHH 201601
2 JJJ 201602
3 KKK 201603
NULL NULL NULL
Time taken: 0.055 seconds, Fetched: 4 row(s)
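The practical difference between an external table and a managed one shows up when the table is dropped: for an external table Hive removes only the metadata and leaves the files in place. A sketch using the ext_table defined above:

```sql
-- The "Table Type" field in the output reads EXTERNAL_TABLE or MANAGED_TABLE
DESCRIBE FORMATTED ext_table;
-- Removes the table definition only; the files under /user/hive stay in HDFS
DROP TABLE ext_table;
```

Running the same CREATE EXTERNAL TABLE statement again would make the existing files queryable once more.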
14 Show a table's definition –SHOW CREATE TABLE
hive> show create table default.eboxdata;
OK
CREATE EXTERNAL TABLE `default.eboxdata`(
`ctime` string,
`mac` string,
`addr` int,
`title` string,
`o_c` smallint,
`enable_net_ctrl` smallint,
`alarm` int,
`model` string,
`specification` string,
`version` string,
`a_a` float,
`a_ld` float,
`a_t` float,
`a_v` float,
`a_w` float,
`power` float,
`mxdw` float,
`mxgg` float,
`mxgl` float,
`mxgw` float,
`mxgy` float,
`mxld` float,
`mxqy` float,
`control` smallint,
`visibility` smallint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://n11.trafodion.local:8020/bulkload/EBOXDATA'
TBLPROPERTIES (
'transient_lastDdlTime'='1479781899')
Time taken: 0.158 seconds, Fetched: 36 row(s)