Today I'm writing up a few Hive functions: find_in_set, get_json_object, and lateral view explode, so they are easy to look up later. Without further ado, let's get started.
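find_in_set: two points
1. Usage
FIND_IN_SET(str, strlist)
Returns the 1-based position of str in the comma-separated string strlist, or 0 if str is not found. The result is NULL if either argument is NULL, and 0 if str itself contains a comma.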
SELECT FIND_IN_SET('b','a,b,c,d');
-> 2
SELECT FIND_IN_SET('e','a,b,c,d');
-> 0
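2. Recommendation: whenever the lookup can be written as a join, use the join instead of find_in_set(). find_in_set() is very inefficient, because the list string has to be scanned again for every row it is evaluated on.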
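To make point 2 concrete, here is a rough sketch of rewriting a find_in_set() match as an equi-join. The tables users(user_id, tag_ids) and tags(tag_id, tag_name) are hypothetical and not from any real schema; tag_ids is assumed to be a comma-separated string such as 'a,b,c' and tag_id a STRING.

```sql
-- Hypothetical tables (assumptions for illustration only):
--   users(user_id STRING, tag_ids STRING)   e.g. tag_ids = 'a,b,c'
--   tags(tag_id STRING, tag_name STRING)

-- Slow pattern: every users row is paired with every tags row (a Cartesian
-- product, which Hive may reject in strict mode) and find_in_set() is
-- evaluated on each pair.
SELECT u.user_id, t.tag_name
FROM users u
CROSS JOIN tags t
WHERE find_in_set(t.tag_id, u.tag_ids) > 0;

-- Join-friendly pattern: first split the list into one row per tag,
-- then join on plain equality.
SELECT u.user_id, t.tag_name
FROM (
  SELECT user_id, tag_id
  FROM users
  LATERAL VIEW explode(split(tag_ids, ',')) x AS tag_id
) u
JOIN tags t
  ON u.tag_id = t.tag_id;
```

One difference to keep in mind: if the same tag appears twice in tag_ids, the second query returns that match twice, while the find_in_set() version returns it once. Add a DISTINCT if that matters.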
get_json_object
Use case: querying and breaking apart log data with Hive, where the log stores data as JSON:
request_uri_dmalog_show_post_id --> {"id":"3227823","type":"guanzhu","ca_kw":18,"pn":1}
select request_uri_dmalog_atype, get_json_object(request_uri_dmalog_show_post_id,'$.pn')
from ods.ods_rtm_app_ev where request_uri_dmalog_atype='show' limit 9 -- get the value of a JSON key
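For trying this out without access to the log table, the following sketch runs the same kind of lookup against the sample JSON above as a string literal. The column aliases are my own, and the json_tuple variant is an alternative I'm adding here, not something the query above uses.

```sql
-- Extract single keys from the sample JSON; js is just a local alias.
SELECT
  get_json_object(js, '$.id')   AS id,          -- 3227823
  get_json_object(js, '$.type') AS post_type,   -- guanzhu
  get_json_object(js, '$.pn')   AS pn,          -- 1
  get_json_object(js, '$.nope') AS missing_key  -- NULL, the key does not exist
FROM (
  SELECT '{"id":"3227823","type":"guanzhu","ca_kw":18,"pn":1}' AS js
) t;

-- When several top-level keys are needed, json_tuple pulls them out
-- in a single call instead of one get_json_object call per key.
SELECT j.id, j.post_type, j.pn
FROM (
  SELECT '{"id":"3227823","type":"guanzhu","ca_kw":18,"pn":1}' AS js
) t
LATERAL VIEW json_tuple(t.js, 'id', 'type', 'pn') j AS id, post_type, pn;
```

Note that get_json_object always returns a string, so numeric values such as pn need a CAST before doing arithmetic on them.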
Handling multi-level (nested) JSON
https://blog.csdn.net/qq_31573519/article/details/55104822
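As a quick illustration of the nested case, here is a minimal sketch using a made-up JSON document (the keys user, tags, and items are invented for this example and are not taken from the linked post). Nested objects are reached with dots and array elements with a zero-based [index].

```sql
SELECT
  get_json_object(js, '$.user.name')    AS user_name,   -- tom
  get_json_object(js, '$.tags[0]')      AS first_tag,   -- a
  get_json_object(js, '$.items[1].sku') AS second_sku   -- x2
FROM (
  -- Made-up sample document, for illustration only
  SELECT '{"user":{"name":"tom","age":20},"tags":["a","b"],"items":[{"sku":"x1"},{"sku":"x2"}]}' AS js
) t;
```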
lateral view explode
Usage
Sample data
1d66fa6eeaba3abdc57ae9cd5909eabb [26,87,27,39,32,56,50,69,29,46,59,63,60,28,62,72,36]
SELECT uuid, tagid tag_id
FROM ods.ods_remymartin_data_rtm_unlogin_user_follow_tag
lateral view explode(split(substring(tag_id,2,length(tag_id)-2),',')) adtable AS tagid -- strip the surrounding [ ] and turn one row into multiple rows
Result
1d66fa6eeaba3abdc57ae9cd5909eabb 26
1d66fa6eeaba3abdc57ae9cd5909eabb 87
1d66fa6eeaba3abdc57ae9cd5909eabb 27
1d66fa6eeaba3abdc57ae9cd5909eabb 39
1d66fa6eeaba3abdc57ae9cd5909eabb 32
1d66fa6eeaba3abdc57ae9cd5909eabb 56
1d66fa6eeaba3abdc57ae9cd5909eabb 50
1d66fa6eeaba3abdc57ae9cd5909eabb 69
1d66fa6eeaba3abdc57ae9cd5909eabb 29
1d66fa6eeaba3abdc57ae9cd5909eabb 46
1d66fa6eeaba3abdc57ae9cd5909eabb 59
1d66fa6eeaba3abdc57ae9cd5909eabb 63
1d66fa6eeaba3abdc57ae9cd5909eabb 60
1d66fa6eeaba3abdc57ae9cd5909eabb 28
1d66fa6eeaba3abdc57ae9cd5909eabb 62
1d66fa6eeaba3abdc57ae9cd5909eabb 72
1d66fa6eeaba3abdc57ae9cd5909eabb 36
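A small variation on the query above, as a sketch that runs on a literal row instead of the real table. It uses posexplode to also keep each element's position, LATERAL VIEW OUTER so that a row with a NULL tag_id is not silently dropped, and a CAST to get integers. The shortened tag list and the aliases pos and tag_id_int are my own choices for illustration.

```sql
SELECT t.uuid, x.pos, CAST(x.tagid AS INT) AS tag_id_int
FROM (
  -- One literal row shaped like the sample data, with a shortened list
  SELECT '1d66fa6eeaba3abdc57ae9cd5909eabb' AS uuid, '[26,87,27]' AS tag_id
) t
LATERAL VIEW OUTER posexplode(
  split(substring(t.tag_id, 2, length(t.tag_id) - 2), ',')
) x AS pos, tagid;
-- 1d66fa6eeaba3abdc57ae9cd5909eabb  0  26
-- 1d66fa6eeaba3abdc57ae9cd5909eabb  1  87
-- 1d66fa6eeaba3abdc57ae9cd5909eabb  2  27
```

With a plain LATERAL VIEW, a row whose exploded array is NULL or empty produces no output at all; the OUTER keyword keeps that row and fills the generated columns with NULL.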