Parsing JSON, Including JSON Arrays, with Hive's Built-in JSON Functions

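Sample data
This is a JSON string holding a repayment plan. Its "plan" array contains many installments, each with a principal, an interest amount, and a due date.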

{"plan":[{"principal":"1114.09","interest":"489.14","date":"2018-11-02"},{"principal":"1124.30","interest":"423.03","date":"2018-12-02"},{"principal":"1134.61","interest":"412.72","date":"2019-01-02"},{"principal":"1145.01","interest":"402.32","date":"2019-02-02"},{"principal":"1155.50","interest":"391.83","date":"2019-03-02"},{"principal":"1166.10","interest":"381.23","date":"2019-04-02"},{"principal":"1176.78","interest":"370.55","date":"2019-05-02"},{"principal":"1187.57","interest":"359.76","date":"2019-06-02"},{"principal":"1198.46","interest":"348.87","date":"2019-07-02"},{"principal":"1209.44","interest":"337.89","date":"2019-08-02"},{"principal":"1220.53","interest":"326.80","date":"2019-09-02"},{"principal":"1231.72","interest":"315.61","date":"2019-10-02"},{"principal":"1243.01","interest":"304.32","date":"2019-11-02"},{"principal":"1254.40","interest":"292.93","date":"2019-12-02"},{"principal":"1265.90","interest":"281.43","date":"2020-01-02"},{"principal":"1277.51","interest":"269.82","date":"2020-02-02"},{"principal":"1289.22","interest":"258.11","date":"2020-03-02"},{"principal":"1301.03","interest":"246.30","date":"2020-04-02"},{"principal":"1312.96","interest":"234.37","date":"2020-05-02"},{"principal":"1325.00","interest":"222.33","date":"2020-06-02"},{"principal":"1337.14","interest":"210.19","date":"2020-07-02"},{"principal":"1349.40","interest":"197.93","date":"2020-08-02"},{"principal":"1361.77","interest":"185.56","date":"2020-09-02"},{"principal":"1374.25","interest":"173.08","date":"2020-10-02"},{"principal":"1386.85","interest":"160.48","date":"2020-11-02"},{"principal":"1399.56","interest":"147.77","date":"2020-12-02"},{"principal":"1412.39","interest":"134.94","date":"2021-01-02"},{"principal":"1425.34","interest":"121.99","date":"2021-02-02"},{"principal":"1438.40","interest":"108.93","date":"2021-03-02"},{"principal":"1451.59","interest":"95.74","date":"2021-04-02"},{"principal":"1464.90","interest":"82.43","date":"2021-05-02"},{"principal":"1478.32","interest":"69.01","date":"2021-06-02"},{"principal":"1491.87","interest":"55.46","date":"2021-07-02"},{"principal":"1505.55","interest":"41.78","date":"2021-08-02"},{"principal":"1519.35","interest":"27.98","date":"2021-09-02"},{"principal":"1533.28","interest":"14.05","date":"2021-10-02"}]}

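Below is the DDL for the target table. The requirement is to turn each JSON string into multiple rows, one per installment, and to add a repayment period number to each row.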

CREATE TABLE `app.app_cpdji_repayment_plan`
  (
    `platform_no` string COMMENT 'Social credit code'
    , `project_id` string COMMENT 'Project number'
    , `contract_id` string COMMENT 'Contract number'
    , `repayment_periods` INT COMMENT 'Repayment period number'
    , `repayment_date` string COMMENT 'Repayment date'
    , `principal` DECIMAL(20,2) COMMENT 'Principal due'
    , `interest` DECIMAL(20,2) COMMENT 'Interest due'
  )
  COMMENT 'Repayment plan' STORED AS PARQUET;
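  1. Extract the content of the "plan" key from the JSON
 -- repay_plan is the JSON column of the source table
 SELECT get_json_object(repay_plan, '$.plan')
 FROM app.app_cpdji_view_ods_prodc_inv_contract;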
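To try the call without touching the real table, it can be run against an inline literal. The following is a minimal sketch: the alias sample_plan and the output column names are made up for illustration, and the literal is just the first two installments of the sample data above. It also uses the [n] subscript that get_json_object accepts in its path argument to reach a single element of the array.

```sql
-- Hypothetical inline row standing in for one record of the source table.
SELECT
  get_json_object(repay_plan, '$.plan')              AS plan_array,       -- the whole array, returned as a string
  get_json_object(repay_plan, '$.plan[0].date')      AS first_date,       -- one field of the first element
  get_json_object(repay_plan, '$.plan[1].principal') AS second_principal  -- one field of the second element
FROM (
  SELECT '{"plan":[{"principal":"1114.09","interest":"489.14","date":"2018-11-02"},{"principal":"1124.30","interest":"423.03","date":"2018-12-02"}]}' AS repay_plan
) sample_plan;
```

The subscript form is handy for spot checks, but it needs a fixed index, which is why the steps below split the array instead.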
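  2. Separate the elements of the JSON array so that they can become multiple rows
 -- Strip the surrounding [ and ], turn every "},{" boundary into "};{",
 -- then split on ";" to get an array with one JSON object per element.
 SELECT split(regexp_replace( regexp_replace( regexp_replace(get_json_object(repay_plan, '$.plan'),'\\[','') , '\\]','') ,'\\}\\,\\{' ,'\\}\\;\\{') ,'\\;')
 FROM app.app_cpdji_view_ods_prodc_inv_contract;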
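The nested regexp_replace calls are easier to follow when laid out one per line and run on a small input. This is only a sketch under assumptions: sample_plan, split_plan, and items are hypothetical names, and the literal is the first two installments of the sample data. It relies on the array string containing no ";" and no "[" or "]" inside the values, which holds for this data.

```sql
SELECT
  size(items) AS item_count,  -- number of JSON objects produced by the split
  items[0]    AS first_item   -- the first object, still a JSON string
FROM (
  SELECT split(
           regexp_replace(
             regexp_replace(
               regexp_replace(get_json_object(repay_plan, '$.plan'), '\\[', ''),  -- drop the opening bracket
               '\\]', ''),                                                        -- drop the closing bracket
             '\\}\\,\\{', '\\}\\;\\{'),                                           -- mark object boundaries with ;
           '\\;') AS items                                                        -- split on that marker
  FROM (
    SELECT '{"plan":[{"principal":"1114.09","interest":"489.14","date":"2018-11-02"},{"principal":"1124.30","interest":"423.03","date":"2018-12-02"}]}' AS repay_plan
  ) sample_plan
) split_plan;
```

Splitting directly on "," would not work here, because commas also separate the fields inside each object; swapping only the "},{" boundaries for a different character avoids that.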
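  3. Turn the array into rows
 -- explode() emits one row per array element; json_tuple() then reads the three fields of each element.
 SELECT DATE1
   , principal
   ,interest
 FROM ( SELECT json
   FROM app.app_cpdji_view_ods_prodc_inv_contract LATERAL VIEW explode( split(regexp_replace( regexp_replace( regexp_replace(get_json_object(repay_plan, '$.plan'),'\\[','') , '\\]','') ,'\\}\\,\\{' ,'\\}\\;\\{') ,'\\;')) course_scores AS json) AS table1 LATERAL VIEW json_tuple(json,'date','principal','interest') d AS DATE1
   , principal
   ,interest;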
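The same row expansion can be tried on inline data. The sketch below is a variation rather than the query used in this post: it chains the two LATERAL VIEW clauses in a single FROM instead of nesting a subquery, and the names sample_plan, items, and repayment_date are hypothetical. The literal is again the first two installments of the sample data.

```sql
SELECT d.repayment_date, d.principal, d.interest
FROM (
  SELECT '{"plan":[{"principal":"1114.09","interest":"489.14","date":"2018-11-02"},{"principal":"1124.30","interest":"423.03","date":"2018-12-02"}]}' AS repay_plan
) sample_plan
-- one output row per element of the split array
LATERAL VIEW explode(
  split(
    regexp_replace(
      regexp_replace(
        regexp_replace(get_json_object(repay_plan, '$.plan'), '\\[', ''),
        '\\]', ''),
      '\\}\\,\\{', '\\}\\;\\{'),
    '\\;')
) items AS json
-- read three keys from each element in one pass
LATERAL VIEW json_tuple(json, 'date', 'principal', 'interest') d
  AS repayment_date, principal, interest;
```

json_tuple takes several keys at once, so each element is handed to it a single time, whereas three separate get_json_object calls would each receive the string.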
  4. Finish the processing, numbering the installments in date order
INSERT INTO app.app_cpdji_repayment_plan
SELECT platform_no
  ,project_no
  ,contract_no
  -- repayment period number: position of the installment within its plan, ordered by date
  ,ROW_NUMBER() over(
                   PARTITION BY project_no
                     ,contract_no
                     ,platform_no
                   ORDER BY DATE1 ) rt
  ,DATE1
  ,principal
  ,interest
FROM ( SELECT json
    ,project_no
    ,contract_no
    ,platform_no
  FROM app.app_cpdji_view_ods_prodc_inv_contract
  -- one row per element of the plan array
  LATERAL VIEW explode( split(regexp_replace( regexp_replace( regexp_replace(get_json_object(repay_plan, '$.plan'),'\\[','') , '\\]','') ,'\\}\\,\\{' ,'\\}\\;\\{') ,'\\;')) course_scores AS json) AS table1
-- extract the three fields from each element
LATERAL VIEW json_tuple(json,'date','principal','interest') d AS DATE1
  , principal
  ,interest;
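After the load, a quick sanity check on the period numbers can be useful. The query below is a sketch, not part of the original job. It assumes that the target table contains only rows from this load and that platform_no, project_id, and contract_id together identify one repayment plan. Since ROW_NUMBER() assigns 1 to n within each plan, the row count of a plan should equal its highest period number.

```sql
-- Lists plans whose row count does not match their highest period number,
-- for example because the INSERT was run more than once.
SELECT platform_no
  ,project_id
  ,contract_id
  ,COUNT(*)               AS row_count
  ,MAX(repayment_periods) AS max_period
FROM app.app_cpdji_repayment_plan
GROUP BY platform_no
  ,project_id
  ,contract_id
HAVING COUNT(*) <> MAX(repayment_periods);
```

Under those assumptions, an empty result means every plan is numbered without gaps or duplicates.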

That completes a simple JSON parsing job. Hive's built-in functions handle this kind of JSON parsing very well.
The job took a little over 1,100 seconds.

With a total data volume of 20 billion, that is excellent throughput.
