SQL: Parsing Deeply Nested JSON with LATERAL VIEW

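Whenever JSON shows up in data processing, my head starts to hurt. For a long time I knew that lateral view was a useful thing, yet I stubbornly refused to learn it and just asked colleagues on the data warehouse team to sort out the fields so I could use them directly. Along the way, as a rookie, I both pitied and admired the warehouse folks buried in cleaning dirty data. Thanks for the hard work, you legends.

A while ago I joined a data-building project as the vanguard, so there was no escaping it: I ran into JSON, gritted my teeth, and finally learned how to use lateral view. My takeaway: the difficulty was only a mountain in my own head, and it was not such a big deal after all!

I'm sure I'll forget it all within a few days (in fact I've already forgotten a bit...), and the palest ink beats the best memory, so I'm writing it down here.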

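Basic syntax (each [] stands for a key or path to extract from the JSON, and each alias after "as" names the matching output column):

select 
*
from T t
lateral view json_tuple(t.json_txt,[],[],...) q as item1,item2,...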

Suppose table T has a json_txt column whose values look like this:

{
"student_no":"0001",
"student_name":"zhangxiaoxiao",
"class":"Grade 12 Class 1",
"score_detail":{
            "scoreList":[{"scores":[
                                    {"course":"Chinese","score":100,"rank":2}
                                    ,{"course":"Math","score":120,"rank":9}
                                    ,{"course":"English","score":110,"rank":6}
                                    ,{"course":"Chemistry","score":90,"rank":4}
                                    ,{"course":"Physics","score":90,"rank":3}
                                    ,{"course":"Biology","score":90,"rank":2}
                                    ]
                        }]
            },
"total_score":"600",
"overal_rank":"3"
}

To get every field for each student, the information inside the JSON has to be parsed out:

select 
t.* ---- keep the table's other original columns
,q.student_no
,q.student_name
,q.class
,q.total_score
,q.overal_rank
,q.course,q.score,q.rank
from T t
lateral view json_tuple(t.json_txt,
                     "student_no",
                     "student_name",
                     "class",
                     "total_score",
                     "overal_rank",
                     "score_detail.scoreList.[*].scores.[*].course",
                     "score_detail.scoreList.[*].scores.[*].score",
                     "score_detail.scoreList.[*].scores.[*].rank"
) q as student_no,student_name,class,total_score,overal_rank,course,score,rank ---- one alias per key, in the same order
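For anyone who finds the path strings hard to picture, here is a rough Python sketch of what each path in the query is asking for. It is only a mental model, not how json_tuple is implemented: extract_path is a hypothetical helper I made up for illustration, and the record is a trimmed, English-labelled copy of the sample (three courses only).

```python
import json

def extract_path(doc, path):
    """Hypothetical helper: walk a dotted path, where '[*]' fans out over a list."""
    nodes = [doc]
    for part in path.split("."):
        if part == "[*]":
            # Flatten one level: every element of every list reached so far.
            nodes = [item for node in nodes for item in node]
        else:
            # Plain key: descend into each object reached so far.
            nodes = [node[part] for node in nodes]
    return nodes

# Trimmed copy of the sample record (assumption: three courses are enough to show the idea).
json_txt = """
{
  "student_no": "0001",
  "student_name": "zhangxiaoxiao",
  "score_detail": {"scoreList": [{"scores": [
      {"course": "Chinese", "score": 100, "rank": 2},
      {"course": "Math", "score": 120, "rank": 9},
      {"course": "English", "score": 110, "rank": 6}
  ]}]}
}
"""
doc = json.loads(json_txt)

student_no = extract_path(doc, "student_no")[0]
courses = extract_path(doc, "score_detail.scoreList.[*].scores.[*].course")
scores = extract_path(doc, "score_detail.scoreList.[*].scores.[*].score")
ranks = extract_path(doc, "score_detail.scoreList.[*].scores.[*].rank")

print(student_no, courses, scores, ranks)
```

The point to notice is that a top-level key gives one value per student, while a path containing [*] gives a whole list per student, all of it still sitting in one row.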

The result looks like this:

result

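However, the per-course values such as score and rank come back as arrays stored within a single row, which is inconvenient for calculation. The trans_array() function takes care of that: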

select 
trans_array(5,',',student_no,student_name,class,total_score,overal_rank,course,score,rank) as (student_no,student_name,class,total_score,overal_rank,course,score,rank) ---- the first 5 columns are keys; the rest are split on ','
from (
        select           
               student_no,student_name,class,total_score,overal_rank
              ,regexp_replace(course,'(\\[)|(\\])|(")','') as course ---- strip the [ ] " characters
              ,regexp_replace(score,'(\\[)|(\\])|(")','') as score ---- strip the [ ] " characters
              ,regexp_replace(rank,'(\\[)|(\\])|(")','') as rank ---- strip the [ ] " characters
        from result ---- the output of the previous query
) t
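The two steps in this query (strip the brackets and quotes, then turn the comma-separated lists into rows) can be sketched in Python as follows. This is a sketch under assumptions, not the real function: trans_array_sketch is a hypothetical stand-in for trans_array(), the raw array strings are my guess at what the previous step returns (inferred from the characters the cleanup removes), and the sketch assumes all lists have the same length.

```python
import re

def strip_array_chars(raw):
    """Remove [, ] and " so '["a","b"]' becomes 'a,b' (mirrors the regexp_replace step)."""
    return re.sub(r'(\[)|(\])|(")', "", raw)

def trans_array_sketch(num_keys, sep, row):
    """Hypothetical stand-in for trans_array: keep the first num_keys columns
    as keys and turn the remaining delimited columns into one row per element."""
    keys = tuple(row[:num_keys])
    split_cols = [str(value).split(sep) for value in row[num_keys:]]
    # zip pairs the i-th element of every list; assumes equal lengths.
    return [keys + parts for parts in zip(*split_cols)]

# One wide row as it might look before cleanup (array formatting is an assumption).
wide_row = (
    "0001", "zhangxiaoxiao", "Grade 12 Class 1", "600", "3",
    strip_array_chars('["Chinese","Math","English"]'),
    strip_array_chars("[100,120,110]"),
    strip_array_chars("[2,9,6]"),
)

long_rows = trans_array_sketch(5, ",", wide_row)
for r in long_rows:
    print(r)
```

Each output row repeats the five key columns and carries exactly one course with its score and rank, which is the shape you want for aggregation.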

The result is the score detail laid out vertically, one row per course:

result2
