Hive: parsing JSON whose keys (names and count) and values (contents and count) are not known in advance

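  • The JSON string looks like the sample below. The repayment object holds several maps, and the number of keys in each map is not fixed: half a year of data yields 7 keys (six months plus sum), while two months yields 3.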
    "repayment":{
        "credit_rpy_amt":{
            "2018-11":"0.00",
            "2018-10":"0.00",
            "2018-09":"0.00",
            "2018-08":"0.00",
            "2018-07":"0.00",
            "2018-06":"0.00",
            "sum":"0.00"
        },
        "credit_rpy_cnt":{
            "2018-11":"0",
            "2018-10":"0",
            "2018-09":"0",
            "2018-08":"0",
            "2018-07":"0",
            "2018-06":"0",
            "sum":"0"
        },
        "huabei_rpy_amt":{},
        "huabei_rpy_cnt":{},
        "jiebei_rpy_amt":{},
        "jiebei_rpy_cnt":{},
        "other_rpy_amt":{},
        "other_rpy_cnt":{}
    },
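Since the key names are not known up front, one way to hold such an object in Hive is a map rather than a set of fixed columns. The following is a minimal sketch of that idea, run on a literal string instead of a table. The JSON literal is a shortened, hypothetical copy of the sample, the path starts at repayment only because the literal does, and the query assumes a Hive version that accepts select without a from clause (0.13 or later).

```sql
-- Sketch only: the JSON literal is a shortened, hypothetical copy of the sample.
-- get_json_object returns the nested object as a string,
-- regexp_replace drops the quotes and braces,
-- str_to_map splits the rest on its default delimiters ',' and ':'.
select str_to_map(
         regexp_replace(
           get_json_object(
             '{"repayment":{"credit_rpy_amt":{"2018-11":"0.00","2018-10":"0.00","sum":"0.00"}}}',
             '$.repayment.credit_rpy_amt'),
           '\\"|\\{|\\}', '')
       ) as a1;
```

Because the months end up as map keys rather than column names, the same expression applies whether a record carries 3 keys or 7. It does assume that no key or value contains a comma, colon, quote, or brace.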
  • The parsing approach:
  • add jar /tmp/json-serde-1.3.8-jar-with-dependencies.jar; -- add the JSON SerDe jar to the Hive session

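  • 1 First find the level that contains credit_rpy_amt. Once the JSON path reaches that level, the path-based part of the parsing is done.
  • 2 Use lateral view explode to emulate Oracle's LEVEL, producing one row per key.
  • 3 Use that level (the row number) to pick out each value.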
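Step 2 can be tried in isolation. The sketch below is an illustration under assumptions, not part of the original query: a literal map stands in for the parsed credit_rpy_amt column, the aliases s, x, k and v are made up for the example, and a Hive version that accepts select without a from clause is assumed.

```sql
-- Sketch only: a literal map stands in for the parsed credit_rpy_amt column.
-- explode on a map emits one row per entry, with the key and the value
-- as two separate columns, however many entries the map holds.
select k, v
from (select map('2018-11', '0.00', '2018-10', '0.00', 'sum', '0.00') as m) s
lateral view explode(m) x as k, v;
```

The result has one row per key, which is what makes a row_number() over those rows usable as a per-record level in step 3.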

 

-- For each exploded key, pick the matching amount and count out of the value arrays.
-- The index arithmetic assumes every map lists its months newest-first with "sum" last
-- (as in the sample), that credit_rpy_cnt has the same keys as credit_rpy_amt,
-- and that (`timestamp`, customerid) identifies one record.
-- `timestamp` and `values` are backticked because both are reserved words in newer Hive versions.
select
  customerid, `timestamp`,
  keys,
  a1_values[case when t.keys = 'sum' then mrn - 1 else mrn - rn - 1 end],
  a2_values[case when t.keys = 'sum' then mrn - 1 else mrn - rn - 1 end]
from (
  -- mrn: number of keys in the record
  select t.*, max(rn) over (partition by `timestamp`, customerid) mrn from (
    -- rn: rank of the key in ascending order; "sum" sorts after the yyyy-MM keys
    select t.*, keys, row_number() over (partition by `timestamp`, customerid order by keys) rn
    from (
      select subtype,
        `timestamp`,
        topic,
        t.customerid,
        t.data.httpStatusCode,
        -- strip quotes and braces so str_to_map can split on its default ',' and ':'
        str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.credit_rpy_amt'),'\\"|\\{|\\}','')) a1,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.credit_rpy_amt'),'\\"|\\{|\\}',''))) a1_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.credit_rpy_cnt'),'\\"|\\{|\\}',''))) a2_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.huabei_rpy_amt'),'\\"|\\{|\\}',''))) a3_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.huabei_rpy_cnt'),'\\"|\\{|\\}',''))) a4_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.jiebei_rpy_amt'),'\\"|\\{|\\}',''))) a5_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.jiebei_rpy_cnt'),'\\"|\\{|\\}',''))) a6_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.other_rpy_amt'),'\\"|\\{|\\}',''))) a7_values,
        map_values(str_to_map(regexp_replace(get_json_object(t.data.data,'$.major_expenditure.repayment.other_rpy_cnt'),'\\"|\\{|\\}',''))) a8_values
      -- table alias t added so that the t.customerid and t.data references resolve
      from tmp.tmp_json_201801 t
      where `timestamp` in ('1542643258438','1542644587781','1542643388916')
    ) t
    lateral view explode(a1) table1 as keys, `values`
  ) t
) t;
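The query above finds each value by array position, which depends on the order of the keys inside the JSON. As a possible alternative (not the method used above), the second map can be looked up by the exploded key itself, so key order stops mattering. This is a minimal sketch under assumptions: the JSON literal is a shortened, hypothetical record, the names js, amt, cnt, k and v are invented for the example, and a Hive version that accepts select without a from clause is assumed.

```sql
-- Sketch only: the literal record and all aliases are hypothetical.
-- Both objects are parsed into maps; the amount map is exploded into rows,
-- and the count for the same key is fetched with a map lookup, cnt[k].
select k      as month_key,
       v      as credit_rpy_amt,
       cnt[k] as credit_rpy_cnt
from (
  select str_to_map(regexp_replace(get_json_object(js, '$.repayment.credit_rpy_amt'), '\\"|\\{|\\}', '')) as amt,
         str_to_map(regexp_replace(get_json_object(js, '$.repayment.credit_rpy_cnt'), '\\"|\\{|\\}', '')) as cnt
  from (
    select '{"repayment":{"credit_rpy_amt":{"2018-11":"0.00","sum":"0.00"},"credit_rpy_cnt":{"2018-11":"0","sum":"0"}}}' as js
  ) j
) s
lateral view explode(amt) x as k, v;
```

A key that is missing from the count map simply yields NULL for credit_rpy_cnt, and no window functions are needed. The trade-off is that each additional map (huabei, jiebei, other) needs its own lookup column.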

 
