hive内置UDTF函数

  • explode函数:hive内置的表生成函数,主要用于将一行输入拆分成多行输出。
官网解释:explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
①创建array数据类型表
hive> create table student_array(
    > name string,
    > course_score array
    > )
    > row format delimited fields terminated by '\t'
    > collection items terminated by ','
    > stored as textfile;
②准备数据student_array.txt
root@Master:/home/hadoop/dataset# cat student_array.txt
zs	语文:86,数学:87.5,英语:90
ls	语文:76,数学:93,英语:88
ww	语文:88,数学:90,英语:95
③将数据导入到hive表
hive> load data local inpath '/home/hadoop/dataset/student_array.txt' overwrite into table student_array;
④查看hive表数据
hive> select * from student_array;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
zs	["语文:86","数学:87.5","英语:90"]
ls	["语文:76","数学:93","英语:88"]
ww	["语文:88","数学:90","英语:95"]
对于array数据类型,explode函数将数组中的元素拆分,按行输出每个元素
hive> select explode(course_score) from student_array;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
语文:86
数学:87.5
英语:90
语文:76
数学:93
英语:88
语文:88
数学:90
英语:95
posexplode函数类似于explode函数,只是在返回数组中的元素同时也返回元素在数组中的索引位置
hive> select posexplode(course_score)   from  student_array;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
0	语文:86
1	数学:87.5
2	英语:90
0	语文:76
1	数学:93
2	英语:88
0	语文:88
1	数学:90
2	英语:95
对于map数据类型,explode函数会将一个map数据类型拆分成两列输出,键一列,值一列。
首先创建带map的表
hive> create table student_map(
    > name string,
    > course_score map
    > )
    > row format delimited fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':'
    > stored as textfile;
hive> select * from student_map;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
zs	{"语文":86.0,"数学":87.5,"英语":90.0}
ls	{"语文":76.0,"数学":93.0,"英语":88.0}
ww	{"语文":88.0,"数学":90.0,"英语":95.0}
explode函数将map数据类型的数据键值对拆分长两列
hive> select explode(course_score) from student_map;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
语文	86.0
数学	87.5
英语	90.0
语文	76.0
数学	93.0
英语	88.0
语文	88.0
数学	90.0
英语	95.0
posexplode函数的参数只能是数组,不能是map数据类型
hive> select posexplode(course_score) from student_map;
FAILED: UDFArgumentException posexplode() takes an array as a parameter
对于结构体数组,可以使用inline函数进行拆分
hive> select * from student_struct_array;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
zs	[{"course":"语文","score":86.0},{"course":"数学","score":87.5},{"course":"英语","score":90.0}]
ls	[{"course":"语文","score":76.0},{"course":"数学","score":93.0}]
ww	[{"course":"语文","score":88.0},{"course":"数学","score":90.0},{"course":"英语","score":95.0}]
Time taken: 0.19 seconds, Fetched: 3 row(s)
hive> select inline(course_score) from student_struct_array;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
语文	86.0
数学	87.5
英语	90.0
语文	76.0
数学	93.0
语文	88.0
数学	90.0
英语	95.0
在select语句中使用UDTF语法的限制:
①在select语句中只支持单个UDTF表达式
②不支持UDTF嵌套
③不支持GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY语句
hive> select explode(name) , explode(course_score) from student_array;
FAILED: SemanticException 1:23 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'course_score'
hive> select explode(name) , explode(course_score) from student_array;
FAILED: SemanticException 1:23 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'course_score'
hive> select name ,explode(course_score) from student_array;
FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
hive> select explode(course_score) from student_array sort by name;
FAILED: SemanticException [Error 10078]: SORT BY is not supported with a UDTF in the SELECT clause
explode函数和lateral view结合使用,就不存在上上语法限制
hive> select name ,course ,score from student_struct_array  lateral view inline(course_score)  course_scores as course , score ;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
zs	语文	86.0
zs	数学	87.5
zs	英语	90.0
ls	语文	76.0
ls	数学	93.0
ww	语文	88.0
ww	数学	90.0
ww	英语	95.0
lateral view outer 使用方法
hive> select * from student_array lateral view outer explode(array())  c as a;
-chgrp: '*' does not match expected pattern for group
Usage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...
OK
zs	["语文:86","数学:87.5","英语:90"]	NULL
ls	["语文:76","数学:93","英语:88"]	NULL
ww	["语文:88","数学:90","英语:95"]	NULL
Time taken: 0.166 seconds, Fetched: 3 row(s)









你可能感兴趣的:(hive)