LATERAL VIEW is the companion construct Hive provides for UDTFs; it solves the problem that a UDTF on its own cannot be combined with additional SELECT columns. For example:
select game_id, explode(split(user_ids,'\\[\\[\\[')) as user_id from login_game_log where dt='2014-05-15'
FAILED: Error in semantic analysis: UDTF's are not supported outside the SELECT clause, nor nested in expressions
Hive reports a semantic-analysis error: a UDTF is not supported outside the SELECT clause, nor nested in expressions. Frustrating...
So what if we want to do exactly that? This is where LATERAL VIEW comes in. The Hive documentation describes it as follows:
Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
In other words, LATERAL VIEW exists to be used together with UDTFs such as explode. The lateral view puts the rows generated by the UDTF into a virtual table, and that virtual table is then joined back to the input rows (each game_id in our case), which is how columns outside the UDTF can appear in the SELECT list.
Where does it go syntactically?
1. It is written immediately before the UDTF call.
2. It comes after FROM baseTable.
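Putting the two placement rules together, the general shape as documented in the Hive language manual is:

```sql
-- Generic LATERAL VIEW syntax: the clause follows the base table in FROM
-- and wraps the UDTF call; tableAlias names the virtual table, and each
-- columnAlias names one column generated by the UDTF.
SELECT select_expr, ...
FROM baseTable
LATERAL VIEW udtf(expression) tableAlias AS columnAlias [, columnAlias ...];
```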
An example:
1. First create a file with two columns, game_id and user_ids, separated by \t.
hive> create table test_lateral_view_shengli(game_id string,userl_ids string) row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 2.451 seconds
hive> load data local inpath '/home/hadoop/test_lateral' into table test_lateral_view_shengli;
Copying data from file:/home/hadoop/test_lateral
Copying file: file:/home/hadoop/test_lateral
Loading data to table dw.test_lateral_view_shengli
OK
Time taken: 6.716 seconds
hive> select * from test_lateral_view_shengli;
OK
game101 15358083654[[[ab33787873[[[zjy18052480603[[[shlg1881826[[[lxqab110
game66 winning1ren[[[13810537508
game101 hu330602003[[[hu330602004[[[hu330602005[[[15967506560
2. Now split user_ids with explode and join the result back via LATERAL VIEW:
hive> select game_id, user_id
    > from test_lateral_view_shengli lateral view explode(split(userl_ids,'\\[\\[\\[')) snTable as user_id
    > ;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403301416_445839, Tracking URL = http://10.1.9.10:50030/jobdetails.jsp?jobid=job_201403301416_445839
Kill Command = /app/home/hadoop/src/hadoop-0.20.2-cdh3u5/bin/../bin/hadoop job -Dmapred.job.tracker=10.1.9.10:9001 -kill job_201403301416_445839
2014-05-16 17:39:19,108 Stage-1 map = 0%, reduce = 0%
2014-05-16 17:39:28,157 Stage-1 map = 100%, reduce = 0%
2014-05-16 17:39:38,830 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403301416_445839
OK
game101 hu330602003
game101 hu330602004
game101 hu330602005
game101 15967506560
game101 15358083654
game101 ab33787873
game101 zjy18052480603
game101 shlg1881826
game101 lxqab110
game66 winning1ren
game66 13810537508
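After the join to the virtual table, user_id behaves like an ordinary column. As a sketch (not part of the original session), it can for instance be aggregated per game_id, reusing the snTable/user_id aliases from the query above:

```sql
-- Sketch: counting distinct users per game via the exploded column.
SELECT game_id, count(DISTINCT user_id) AS user_cnt
FROM test_lateral_view_shengli
LATERAL VIEW explode(split(userl_ids, '\\[\\[\\[')) snTable AS user_id
GROUP BY game_id;
```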
A FROM clause can also contain multiple LATERAL VIEW clauses. The Hive wiki illustrates this with a base table holding two array columns:

Array<int> col1 | Array<string> col2
[1, 2]          | ["a", "b", "c"]
[3, 4]          | ["d", "e", "f"]

SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;

This produces the cross product of the two exploded columns for each input row:

int myCol1 | string myCol2
1          | "a"
1          | "b"
1          | "c"
2          | "a"
2          | "b"
2          | "c"
3          | "d"
3          | "e"
3          | "f"
4          | "d"
4          | "e"
4          | "f"
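Per the Hive documentation, a later LATERAL VIEW may also reference columns generated by an earlier one in the same FROM clause. A sketch: if col1 were of type array<array<int>>, the inner arrays could be exploded in a second step:

```sql
-- Sketch (assumes col1 is array<array<int>>): the second lateral view
-- explodes myCol1, the column generated by the first lateral view.
SELECT myCol1, myCol2 FROM baseTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(myCol1) myTable2 AS myCol2;
```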
What if the UDTF generates zero rows?
hive> select * FROM test_lateral_view_shengli LATERAL VIEW explode(array()) C AS a;
This returns nothing at all: when the UDTF produces no output rows for an input row, that input row is dropped from the result. To keep such rows, use the OUTER keyword (available since Hive 0.12), which pads the generated columns with NULL:
SELECT * FROM src LATERAL VIEW OUTER explode(array()) C AS a limit 10;
238 val_238 NULL
86 val_86 NULL
311 val_311 NULL
27 val_27 NULL
165 val_165 NULL
409 val_409 NULL
255 val_255 NULL
278 val_278 NULL
98 val_98 NULL
...
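The same contrast can be seen on our test table. A sketch: with OUTER, every game_id row survives the empty explode, paired with a NULL in the generated column; without OUTER, the identical query returns nothing.

```sql
-- Sketch: LATERAL VIEW OUTER keeps input rows even when the UDTF
-- emits zero rows, filling the generated column `a` with NULL.
SELECT game_id, a
FROM test_lateral_view_shengli
LATERAL VIEW OUTER explode(array()) C AS a;
```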