功能:
语法:LATERAL VIEW udtf(expression) tableAlias AS columnAlias (‘,’ columnAlias)
lateral view函数会将lUDTF生成的结果放到一个虚拟表中,然后这个虚拟表和输入行进行join来达到连接UDTF外的select字段的目的。
使用方式:
举例1:与explode() 函数连用
--测试表的数据结构,可以看到第一列是id(string类型),第二列是rule_array(array<string>类型)
> desc test_2;
--结果:
+-------------+----------------+----------+
| col_name | data_type | comment |
+-------------+----------------+----------+
| id | string | |
| rule_array | array<string> | |
+-------------+----------------+----------+
--查看测试表的内容
> select * from test_2;
--结果:
+------------+----------------------------------------+
| test_2.id | test_2.rule_array |
+------------+----------------------------------------+
| a | ["501","502","503","501","512"] |
| b | ["511","512","513","511","512","511"] |
| c | ["512","513","511","512"] |
+------------+----------------------------------------+
--拆分rule_array
> select id, rule_explode
> from test_2
> lateral view explode(rule_array) adTable AS rule_explode;
--结果:
+-----+---------------+
| id | rule_explode |
+-----+---------------+
| a | 501 |
| a | 502 |
| a | 503 |
| a | 501 |
| a | 512 |
| b | 511 |
| b | 512 |
| b | 513 |
| b | 511 |
| b | 512 |
| b | 511 |
| c | 512 |
| c | 513 |
| c | 511 |
| c | 512 |
+-----+---------------+
举例2:与parse_url_tuple()函数连用
--测试表的数据结构,可以看到第一列是id(int类型),第二列是url(string类型)
> desc tb_url;
--结果:
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | int | |
| url | string | |
+-----------+------------+----------+
--查看测试表的内容
> select * from tb_url;
--结果:
+------------+-------------------------------------------------+
| tb_url.id | tb_url.url |
+------------+-------------------------------------------------+
| 1 | http://facebook.com/path/p1.php?query=1 |
| 2 | http://www.baidu.com/news/index.jsp?uuid=frank |
| 3 | http://www.jd.com/index?source=baidu |
+------------+-------------------------------------------------+
--提取tb_url的query部分
> select
> a.id as id,
> c.query as query
> from tb_url a
> lateral view parse_url_tuple(url,"QUERY") c as query
> ;
--结果:
+-----+---------------+
| id | query |
+-----+---------------+
| 1 | query=1 |
| 2 | uuid=frank |
| 3 | source=baidu |
+-----+---------------+
举例3:多个lateral view连用
--提取tb_url的HOST、PATH和QUERY部分
> select
> a.id as id,
> b.host as host,
> b.path as path,
> c.query as query
> from tb_url a
> lateral view parse_url_tuple(url,"HOST","PATH") b as host,path
> lateral view parse_url_tuple(url,"QUERY") c as query
> ;
--结果:
+-----+----------------+------------------+---------------+
| id | host | path | query |
+-----+----------------+------------------+---------------+
| 1 | facebook.com | /path/p1.php | query=1 |
| 2 | www.baidu.com | /news/index.jsp | uuid=frank |
| 3 | www.jd.com | /index | source=baidu |
+-----+----------------+------------------+---------------+
注意:
使用方式:
举例1:直接使用
--explode(array)
> select explode(array(11,22,33)) as item;
--结果:
+-------+
| item |
+-------+
| 11 |
| 22 |
| 33 |
+-------+
--explode(map)
> select explode(map("id",11,"name","lily","age",18)) as (k,v);
--结果:
+-------+-------+
| k | v |
+-------+-------+
| id | 11 |
| name | lily |
| age | 18 |
+-------+-------+
举例2:与lateral view一起使用
> select * from test_2;
--结果:
+------------+----------------------------------------+
| test_2.id | test_2.rule_array |
+------------+----------------------------------------+
| a | ["501","502","503","501","512"] |
| b | ["511","512","513","511","512","511"] |
| c | ["512","513","511","512"] |
+------------+----------------------------------------+
> select id,rule
> from test_2
> lateral view explode(rule_array) tbl as rule
> ;
--结果:
+-----+-------+
| id | rule |
+-----+-------+
| a | 501 |
| a | 502 |
| a | 503 |
| a | 501 |
| a | 512 |
| b | 511 |
| b | 512 |
| b | 513 |
| b | 511 |
| b | 512 |
| b | 511 |
| c | 512 |
| c | 513 |
| c | 511 |
| c | 512 |
+-----+-------+
举例3:使用局限性
> select id, explode(rule_array) as rule from test_2;
--报错:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
> select explode(rule_array) as rule from test_2 group by explode(rule_array);
--报错:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
> select explode(explode(rule_array)) from test_2 ;
--报错:
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
> select explode("1,2,3,4,5") from test_2 ;
--报错:
Error: Error while compiling statement: FAILED: UDFArgumentException explode() takes an array or a map as a parameter (state=42000,code=40000)
使用方式:
举例1:
> desc test_2;
--结果:
+-------------+----------------+----------+
| col_name | data_type | comment |
+-------------+----------------+----------+
| id | string | |
| dt_array | array<string> | |
| rule_array | array<string> | |
+-------------+----------------+----------+
> select * from test_2;
--结果:
+------------+----------------------------------------------------+----------------------------------------+
| test_2.id | test_2.dt_array | test_2.rule_array |
+------------+----------------------------------------------------+----------------------------------------+
| a | ["20230809","20230811","20230812","20230812","20230813"] | ["501","502","503","501","512"] |
| b | ["20230809","20230811","20230812","20230812","20230813","20230809"] | ["511","512","513","511","512","511"] |
| c | ["20230811","20230812","20230812","20230813"] | ["512","513","511","512"] |
+------------+----------------------------------------------------+----------------------------------------+
--注:语句中b.idxb和c.idxc分别是子视图的两个索引,我们where子句中使用b.idxb=c.idxc保证顺序的一致性。
> select a.id,b.cur_day,c.rule
> from test_2 a
> lateral view posexplode(dt_array) b as idxb,cur_day,
> lateral view posexplode(rule_array) c as idxc,rule where b.idxb=c.idxc
> ;
--结果:
+-------+------------+---------+
| a.id | b.cur_day | c.rule |
+-------+------------+---------+
| a | 20230809 | 501 |
| a | 20230811 | 502 |
| a | 20230812 | 503 |
| a | 20230812 | 501 |
| a | 20230813 | 512 |
| b | 20230809 | 511 |
| b | 20230811 | 512 |
| b | 20230812 | 513 |
| b | 20230812 | 511 |
| b | 20230813 | 512 |
| b | 20230809 | 511 |
| c | 20230811 | 512 |
| c | 20230812 | 513 |
| c | 20230812 | 511 |
| c | 20230813 | 512 |
+-------+------------+---------+
举例1:使用explode()+lateral view
> select t.id,a.cur_day
> from test_2 t
> lateral view explode(t.dt_array) a as cur_day
> ;
--结果:
+-------+------------+
| t.id | a.cur_day |
+-------+------------+
| b | 20230809 |
| b | 20230811 |
| b | 20230812 |
| b | 20230812 |
| b | 20230813 |
| b | 20230809 |
| c | 20230811 |
| c | 20230812 |
| c | 20230812 |
| c | 20230813 |
| a | 20230809 |
| a | 20230811 |
| a | 20230812 |
| a | 20230812 |
| a | 20230813 |
+-------+------------+
举例2:explode字段是string格式,先split()指定分隔符,如果省略split()则使用默认分隔符是英文逗号。
> desc test_3;
--结果:
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| id | string | |
| dt | string | |
| rule1 | string | |
+-----------+------------+----------+
> select * from test_3;
--结果:
+------------+----------------------------------------------------+--------------------------+
| test_3.id | test_3.dt | test_3.rule1 |
+------------+----------------------------------------------------+--------------------------+
| a | 20230809,20230811,20230812,20230812,20230813 | 501,502,503,501,512 |
| b | 20230809,20230811,20230812,20230812,20230813,20230809 | 511,512,513,511,512,511 |
| c | 20230811,20230812,20230812,20230813 | 512,513,511,512 |
+------------+----------------------------------------------------+--------------------------+
> select t.id,a.cur_day
> from test_3 t
> lateral view explode(split(t.dt,',')) a as cur_day
> ;
--结果:
+-------+------------+
| t.id | a.cur_day |
+-------+------------+
| a | 20230809 |
| a | 20230811 |
| a | 20230812 |
| a | 20230812 |
| a | 20230813 |
| b | 20230809 |
| b | 20230811 |
| b | 20230812 |
| b | 20230812 |
| b | 20230813 |
| b | 20230809 |
| c | 20230811 |
| c | 20230812 |
| c | 20230812 |
| c | 20230813 |
+-------+------------+
举例1:使用posexplode()+lateral view
先测试使用explode(), 看看效果:
> select a.id,b.cur_day,c.rule
> from test_2 a
> lateral view explode(dt_array) b as cur_day,
> lateral view explode(rule_array) c as rule
> ;
--结果:
+-------+------------+---------+
| a.id | b.cur_day | c.rule |
+-------+------------+---------+
| b | 20230809 | 511 |
| b | 20230809 | 512 |
| b | 20230809 | 513 |
| b | 20230809 | 511 |
| b | 20230809 | 512 |
| b | 20230809 | 511 |
| b | 20230811 | 511 |
| b | 20230811 | 512 |
| b | 20230811 | 513 |
| b | 20230811 | 511 |
| b | 20230811 | 512 |
| b | 20230811 | 511 |
| b | 20230812 | 511 |
| b | 20230812 | 512 |
| b | 20230812 | 513 |
| b | 20230812 | 511 |
| b | 20230812 | 512 |
| b | 20230812 | 511 |
| b | 20230812 | 511 |
| b | 20230812 | 512 |
| b | 20230812 | 513 |
| b | 20230812 | 511 |
| b | 20230812 | 512 |
| b | 20230812 | 511 |
| b | 20230813 | 511 |
| b | 20230813 | 512 |
| b | 20230813 | 513 |
| b | 20230813 | 511 |
| b | 20230813 | 512 |
| b | 20230813 | 511 |
| b | 20230809 | 511 |
| b | 20230809 | 512 |
| b | 20230809 | 513 |
| b | 20230809 | 511 |
| b | 20230809 | 512 |
| b | 20230809 | 511 |
| c | 20230811 | 512 |
| c | 20230811 | 513 |
| c | 20230811 | 511 |
| c | 20230811 | 512 |
| c | 20230812 | 512 |
| c | 20230812 | 513 |
| c | 20230812 | 511 |
| c | 20230812 | 512 |
| c | 20230812 | 512 |
| c | 20230812 | 513 |
| c | 20230812 | 511 |
| c | 20230812 | 512 |
| c | 20230813 | 512 |
| c | 20230813 | 513 |
| c | 20230813 | 511 |
| c | 20230813 | 512 |
| a | 20230809 | 501 |
| a | 20230809 | 502 |
| a | 20230809 | 503 |
| a | 20230809 | 501 |
| a | 20230809 | 512 |
| a | 20230811 | 501 |
| a | 20230811 | 502 |
| a | 20230811 | 503 |
| a | 20230811 | 501 |
| a | 20230811 | 512 |
| a | 20230812 | 501 |
| a | 20230812 | 502 |
| a | 20230812 | 503 |
| a | 20230812 | 501 |
| a | 20230812 | 512 |
| a | 20230812 | 501 |
| a | 20230812 | 502 |
| a | 20230812 | 503 |
| a | 20230812 | 501 |
| a | 20230812 | 512 |
| a | 20230813 | 501 |
| a | 20230813 | 502 |
| a | 20230813 | 503 |
| a | 20230813 | 501 |
| a | 20230813 | 512 |
+-------+------------+---------+
出现这种情况,是因为两个并列的explode()的hql没办法识别cur_day对应的rule是什么,对于多个数组的行转列可以使用posexplode()函数。
例如使用如下查询语句:
> select a.id,b.cur_day,c.rule
> from test_2 a
> lateral view posexplode(dt_array) b as idxb,cur_day,
> lateral view posexplode(rule_array) c as idxc,rule where b.idxb=c.idxc
> ;
--结果:
+-------+------------+---------+
| a.id | b.cur_day | c.rule |
+-------+------------+---------+
| b | 20230809 | 511 |
| b | 20230811 | 512 |
| b | 20230812 | 513 |
| b | 20230812 | 511 |
| b | 20230813 | 512 |
| b | 20230809 | 511 |
| c | 20230811 | 512 |
| c | 20230812 | 513 |
| c | 20230812 | 511 |
| c | 20230813 | 512 |
| a | 20230809 | 501 |
| a | 20230811 | 502 |
| a | 20230812 | 503 |
| a | 20230812 | 501 |
| a | 20230813 | 512 |
+-------+------------+---------+