内部表和外部表的区别
hive (hive1)> create table hive1.test1(id int,name string,age int) row format delimited fields terminated by '\t' lines terminated by '\n';
hive (hive1)> desc formatted test1; //显示格式化的信息
OK
col_name data_type comment
# col_name data_type comment
id int
name string
age int
# Detailed Table Information
Database: hive1
Owner: hadoop
CreateTime: Sat Jun 30 00:29:22 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://master:9000/user/hive/warehouse/hive1.db/test1
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 99
transient_lastDdlTime 1530289773
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
line.delim \n
serialization.format \t
Time taken: 0.139 seconds, Fetched: 33 row(s)
数据准备:
employees.txt
1201 Gopal 45000
1202 Manisha 45000
1203 Masthanvali 40000
1204 Krian 40000
1205 Kranthi 30000
向表中加载数据:
hive (hive1)> load data local inpath '/home/hadoop/employees.txt' into table hive1.test1;
Loading data to table hive1.test1
OK
Time taken: 0.725 seconds
查询表的信息:
hive (hive1)> select * from test1;
OK
test1.id test1.name test1.age
1201 Gopal 45000
1202 Manisha 45000
1203 Masthanvali 40000
1204 Krian 40000
1205 Kranthi 30000
查看表和删除表
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:29 /user/hive/warehouse/hive1.db/test1
-rwxrwxrwx 3 hadoop supergroup 99 2018-06-30 00:29 /user/hive/warehouse/hive1.db/test1/employees.txt
hive (hive1)> drop table test1;
OK
Time taken: 1.31 seconds
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db;
lsr: DEPRECATED: Please use 'ls -R' instead.
外部表
---------------------------------
hive控制元数据,删除外部表的时候,数据不会被删除
创建外部表:
hive (hive1)> create external table hive1.test2 like hive1.test1; //创建外部表 like 它只复制表结构没有数据 | as 复制表结构有数据
OK
Time taken: 0.163 second
查看表的属性:
hive (hive1)> desc formatted test2; //显示格式化的信息
OK
col_name data_type comment
# col_name data_type comment
id int
name string
age int
# Detailed Table Information
Database: hive1
Owner: hadoop
CreateTime: Sat Jun 30 00:35:50 CST 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://master:9000/user/hive/warehouse/hive1.db/test2
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
EXTERNAL TRUE
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1530290150
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
line.delim \n
serialization.format \t
Time taken: 0.101 seconds, Fetched: 35 row(s)
向表中加载数据:
hive (hive1)> load data local inpath '/home/hadoop/employees.txt' into table hive1.test2;
Loading data to table hive1.test2
OK
Time taken: 0.725 seconds
hive (hive1)> select * from test2;
OK
test2.id test2.name test2.age
1201 Gopal 45000
1202 Manisha 45000
1203 Masthanvali 40000
1204 Krian 40000
1205 Kranthi 30000
Time taken: 0.142 seconds, Fetched: 5 row(s)
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:35 /user/hive/warehouse/hive1.db/test1
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:36 /user/hive/warehouse/hive1.db/test2
-rwxrwxrwx 3 hadoop supergroup 99 2018-06-30 00:36 /user/hive/warehouse/hive1.db/test2/employees.txt
hive (hive1)> drop table test2;
OK
Time taken: 0.161 seconds
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:35 /user/hive/warehouse/hive1.db/test1
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:36 /user/hive/warehouse/hive1.db/test2
-rwxrwxrwx 3 hadoop supergroup 99 2018-06-30 00:36 /user/hive/warehouse/hive1.db/test2/employees.txt
分区表
----------------------------------------
创建分区表
hive (hive1)> create table hive1.test3(id int,name string,age int ) partitioned by (province string,city string) row format delimited fields terminated by '\t' lines terminated by '\n';
OK
Time taken: 0.098 second
数据准备:
employees2.txt
1206 Actualopal 45000
1207 SLF4JManisha 45000
1208 NMasthanv 40000
1206 Actualopal 45000
1207 SLF4JManisha 45000
1208 NMasthanv 40000
加载数据
hive (hive1)> load data local inpath '/home/hadoop/employees2.txt' into table hive1.test3 partition (province='henan',city ='zhengzhou');
Loading data to table hive1.test3 partition (province=henan, city=zhengzhou)
OK
Time taken: 1.115 seconds
通过shell命令进行查看
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db/test3;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:50 /user/hive/warehouse/hive1.db/test3/province=henan
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:46 /user/hive/warehouse/hive1.db/test3/province=henan/city=shangqiu
-rwxrwxrwx 3 hadoop supergroup 99 2018-06-30 00:46 /user/hive/warehouse/hive1.db/test3/province=henan/city=shangqiu/employees.txt
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:50 /user/hive/warehouse/hive1.db/test3/province=henan/city=zhengzhou
-rwxrwxrwx 3 hadoop supergroup 71 2018-06-30 00:50 /user/hive/warehouse/hive1.db/test3/province=henan/city=zhengzhou/employees2.txt
【查询分区表】
hive (hive1)> select * from hive1.test3 where province='henan' and city='shangqiu';
OK
test3.id test3.name test3.age test3.province test3.city
1201 Gopal 45000 henan shangqiu
1202 Manisha 45000 henan shangqiu
1203 Masthanvali 40000 henan shangqiu
1204 Krian 40000 henan shangqiu
1205 Kranthi 30000 henan shangqiu
[查看分区表有哪些分区]
hive (hive1)> show partitions hive1.test3;
OK
partition
province=henan/city=shangqiu
province=henan/city=zhengzhou
Time taken: 0.122 seconds, Fetched: 2 row(s)
hive (hive1)> show partitions hive1.test3 partition(province='henan'); //查看具体分区的细节信息
OK
partition
province=henan/city=shangqiu
province=henan/city=zhengzhou
Time taken: 0.149 seconds, Fetched: 2 row(s)
[手动增加一个分区]
hive (hive1)> alter table hive1.test3 add partition (province='hebei',city='baodong');
OK
Time taken: 0.201 seconds
hive (hive1)> dfs -lsr /user/hive/warehouse/hive1.db/test3;
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:55 /user/hive/warehouse/hive1.db/test3/province=hebei
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:55 /user/hive/warehouse/hive1.db/test3/province=hebei/city=baodong
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:50 /user/hive/warehouse/hive1.db/test3/province=henan
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:46 /user/hive/warehouse/hive1.db/test3/province=henan/city=shangqiu
-rwxrwxrwx 3 hadoop supergroup 99 2018-06-30 00:46 /user/hive/warehouse/hive1.db/test3/province=henan/city=shangqiu/employees.txt
drwxrwxrwx - hadoop supergroup 0 2018-06-30 00:50 /user/hive/warehouse/hive1.db/test3/province=henan/city=zhengzhou
-rwxrwxrwx 3 hadoop supergroup 71 2018-06-30 00:50 /user/hive/warehouse/hive1.db/test3/province=henan/city=zhengzhou/employees2.txt