Hive中对json处理

应用场景:使用Hive对日志信息进行查询分解,log里面记录的是json形式的数据:

{"logid":"5d40e1af-19f7-4aad-af8f-c7247e322e5c","souc":"4","devi":"OPPO R7sm","sys":"22,5.1.1","dname":"Dalvik/2.1.0 (Linux; U; 
Android 5.1.1; OPPO R7sm Build/LMY47V)","chan":"canary","vers":"638.1","mac":"dc:6d:cd:16:46:0f","imei":"869410022554506","ifa":"","city_id":"384",
"reso":"1800*1080","euid":"a341c57d6abcdd969b6dc4c7a564a15f","dpi":"3.0","host":"192.168.1.212:80","ip":"223.100.138.62","uid":"9587355",
"qtype":"view_caipu_detail","qtype_sub":"","obj":"861920","uri":"recipe/detail","qnum":"1","refer":"","lat":"39.68861","lon":"122.978162",
"srctype":"2800","agentid":"2066b417879f467164fd1fb91b9d04c0",
"ext":"{\"query\":{\"kw\":\"培根披萨\",\"src\":\"2801\",\"idx\":\"3\",\"type\":\"13\",\"id\":\"861920\"}}","page_num":"","android_id":"","pseudo_id":""}

在这部分日志里面,对于:

"ext":"{\"query\":{\"kw\":\"培根披萨\",\"src\":\"2801\",\"idx\":\"3\",\"type\":\"13\",\"id\":\"861920\"}}","
部分,是json里面又包含了2重json。

接下来,我们使用Hive来取得kw、src、idx、type等信息:

写法一:

select tmp.kw,tmp.src,tmp.idx,count(*) as num 
from (
	select get_json_object(tt.query,'$.kw') as kw,get_json_object(tt.query,'$.src') as src,get_json_object(tt.query,'$.idx') as idx 
	from (
		select get_json_object(t.ext,'$.query') as query 
		from (
			select req["ext"] as ext
			from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' 
		) t 
	) tt 
) tmp group by tmp.kw,tmp.src,tmp.idx 
这种写法使用了Hive内置函数get_json_object,而且使用了多层次来分解,可以实现对json的分解。

写法二:

select json_tuple(get_json_object(req["ext"],"$.query"),"kw","id","idx","type","src") 
from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' 
limit 20

这种写法使用了json_tuple函数配合get_json_object来取得json里的数据,取得了''query'':'' ''这第二层json的数据,但是如果对于"kw","id","idx","type","src"这个几个字段只需要其中几个的话,则需要使用lateral view来处理了:

select nt.a,count(nt.a) as num from (
select req from dh_server_log where p_day=20170213 and req["qtype"]='view_caipu_detail' and req["ext"] <> '' limit 10 
) dsl lateral view json_tuple(get_json_object(req["ext"],"$.query"),"kw","id","idx","type","src") nt as a,b,c,d,e 





你可能感兴趣的:(Hive)