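1. Overview
Hive is commonly used to store offline data in big-data systems and serves as the environment for building data warehouses. Warehouse columns usually use primitive types such as int, string, and double, but sometimes complex (compound) data structures are needed to store the data: array, map, and struct. The three complex types are introduced below: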
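Type | Definition | Description
--- | --- | ---
array | ARRAY<data_type> | All elements of an array have the same type, and indexing starts at 0. For example, if array A holds ['a','b','c'], then A[1] is 'b'.
map | MAP<primitive_type, data_type> | Stores data as key:value pairs. A value is accessed with column_name['key'], which returns the value for that key.
struct | STRUCT<col_name : data_type [COMMENT col_comment], ...> | Fields inside a struct are accessed with dot (.) notation. For example, if column a has type STRUCT<b:INT, c:INT>, field b is accessed as a.b.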
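To make the access syntax in the table concrete, here is a minimal sketch that builds one value of each type from literals and reads one element from it. The aliases t, a, m, and s are made up for illustration, and the query assumes a Hive version that allows SELECT without a FROM clause (0.13 or later):

```sql
-- Hypothetical aliases; no table is needed for this query.
SELECT t.a[1]   AS second_element,  -- array index is 0-based
       t.m['k'] AS value_for_k,     -- map lookup by key
       t.s.b    AS field_b          -- struct field via dot notation
FROM (
  SELECT array('a', 'b', 'c')         AS a,
         map('k', 'v')                AS m,
         named_struct('b', 1, 'c', 2) AS s
) t;
```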
2. Using Array
1). Create a student score table student1 with the fields id, name, and score, where score is an array holding the student's scores. The CREATE TABLE statement:
hive> create table student1(id int,name string,score array<double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|';
2). Prepare the data file student1.txt
[root@salver158 ~]# cat student1.txt
100,"student1",80|82|84
101,"student2",70|72|74
102,"student3",60|62|64
3). Load the data
hive>load data local inpath "/root/student1.txt" into table student1;
4). The load succeeded; query the table to check:
hive> select * from student1;
OK
100    "student1"    [80.0,82.0,84.0]
101    "student2"    [70.0,72.0,74.0]
102    "student3"    [60.0,62.0,64.0]
Time taken: 0.612 seconds, Fetched: 3 row(s)
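Once the data is loaded, the array column can be read element by element. The following is a sketch against the student1 table above; the column aliases and the lateral-view alias t are made up for illustration:

```sql
-- Read a single element by 0-based index and count the elements.
SELECT name,
       score[0]    AS first_score,
       size(score) AS score_count
FROM student1;

-- Flatten the array: one output row per (student, score) pair.
SELECT s.name, t.single_score
FROM student1 s
LATERAL VIEW explode(s.score) t AS single_score;
```

The second query is a common way to aggregate over array elements, since explode turns each element into its own row.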
3. Using Map
1). Create a table student2 with the fields id, name, and score, where score is of type Map. The CREATE TABLE statement:
hive> create table student2(id int,name string,score map<string,double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|' MAP KEYS TERMINATED BY ':';
2). Prepare the data file student2.txt
[root@salver158 ~]# cat student2.txt
100,"student1","yuwen":80|"shuxue":82|"yingyu":84
101,"student2","yuwen":70|"shuxue":72|"yingyu":74
102,"student3","yuwen":60|"shuxue":62|"yingyu":64
3). Load the data
hive> load data local inpath "/root/student2.txt" into table student2;
4). The load succeeded; query the table to check:
hive> select * from student2;
OK
100    "student1"    {"\"yuwen\"":80.0,"\"shuxue\"":82.0,"\"yingyu\"":84.0}
101    "student2"    {"\"yuwen\"":70.0,"\"shuxue\"":72.0,"\"yingyu\"":74.0}
102    "student3"    {"\"yuwen\"":60.0,"\"shuxue\"":62.0,"\"yingyu\"":64.0}
Time taken: 0.124 seconds, Fetched: 3 row(s)
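Values in the map column are looked up by key. Judging from the escaped quotes in the output above, the double quotes in student2.txt appear to have been stored as part of each key, so this sketch includes them in the key literal. If the file were written without quotes, score['yuwen'] would be the form to use. The column aliases are made up for illustration:

```sql
-- The key literal includes the double quotes that were loaded from the file.
SELECT name,
       score['"yuwen"'] AS yuwen_score
FROM student2;

-- List the keys and the values of each map as two arrays.
SELECT name,
       map_keys(score)   AS subjects,
       map_values(score) AS scores
FROM student2;
```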
4. Using Struct
1). Create a table student3 with the fields id, name, and score, where score is of type struct. The CREATE TABLE statement:
hive> create table student3(id int,name string,score struct<kecheng:string,score:double>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|';
2). Prepare the data file student3.txt
[root@salver158 ~]# cat student3.txt
100,"student1","yuwen"|80
101,"student2","yuwen"|70
102,"student3","yuwen"|60
3). Load the data
hive> load data local inpath "/root/student3.txt" into table student3;
4). The load succeeded; query the table to check:
hive> select * from student3;
OK
100    "student1"    {"kecheng":"\"yuwen\"","score":80.0}
101    "student2"    {"kecheng":"\"yuwen\"","score":70.0}
102    "student3"    {"kecheng":"\"yuwen\"","score":60.0}
Time taken: 0.091 seconds, Fetched: 3 row(s)
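Struct fields are read with dot notation and can be used anywhere an ordinary column can, including in a WHERE clause. This sketch assumes the field names kecheng and score shown in the output above; the column aliases and the threshold of 70 are made up for illustration:

```sql
-- Access struct fields with column.field and filter on one of them.
SELECT name,
       score.kecheng AS course,
       score.score   AS course_score
FROM student3
WHERE score.score >= 70;
```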
That completes the introduction to using the three complex data types.