现有如下json字符串数组,需要把其拆分多行
[{
"tabid": "新人专享",
"id": 100015129300,
"channelid": "APP",
"ass_rule": {
"lv1": "ass",
"lv2": "new1"
}
}, {
"tabid": "新人专享",
"id": 100006377942,
"channelid": "APP",
"uas_rule": {
"lv1": "ass",
"lv2": "new2"
}
}, {
"tabid": "新人专享",
"id": 100014352353,
"channelid": "APP",
"ass_rule": {
"lv1": "ass",
"lv2": "new3"
}
}]
分析:没有别的好办法,肯定提取“[]”之间的json字符串之后、后进行分割,逗号分割不行,逗号太多,而我们只想两个大括号之间的逗号做分割“ },{”,所以要把逗号替换成json数组中不存在的其它字符,之后再进行分割,如下,五步走:
1、regexp_extract提取[]内的json字符串
2、regexp_replace替换为逗号为自定义分隔符
3、split根据自定义分隔符分割
4、explode行转列函数
5、get_json_object解析json取字段值
SELECT
get_json_object(str_json, '$.id') AS sid
FROM
(
SELECT
event_param_json,
split(regexp_replace(regexp_extract(event_param_json, '(\\[)(.*?)(\\])', 2), '\\},\\{', '\\}#\\{'), '\\#') AS json_list
FROM
abm.abm_wireless_exposure_log
WHERE
dt = '2020-09-06'
AND event_id = 'ProExpo'
AND event_param_json LIKE '%ass%'
)
a lateral VIEW explode(json_list) list_tab AS str_json
或者如下SQL都可以(两种SQL只是提取中括号[]间json字符串的正则不一样)
SELECT
GET_JSON_OBJECT(json_str, '$.id') AS sid
FROM
(
SELECT
json_str
FROM
(
SELECT
split(regexp_replace(regexp_extract(event_param_json -- 获取data数组,格式[{json},{json}]
, '^\\[(.+)\\]$', 1) -- 删除字符串前后的[],格式{json},{json}
, '\\}\\,\\{', '\\}\\|\\|\\{') -- 将josn字符串中的分隔符代换成||,格式{json}||{json}
, '\\|\\|') AS json_list
FROM
abm.abm_wireless_exposure_log
WHERE
dt = '2020-09-06'
AND page_id = 'ManChannel'
-- and event_id = 'RecProExpo'
AND event_param_json LIKE '%ass%'
)
a lateral VIEW explode(json_list) list_tab AS json_str
)
t
如图,运行中间sql,已经分割为多行单个json:
是(replace(replace(replace(event_param_json, '[', ''), ']', ''), '},{', '}#{'), '#')
分析:上面hive的正则在presto提取不到,所以json提起就换种方法。两次替换掉json中括号,后进行分割,逗号分割不行,逗号太多,而我们只想两个大括号之间的逗号做分割“ },{”,所以要把逗号替换成json数组中不存在的其它字符,之后再进行分割,如下,五步走:
1、 两次replace替换掉左右中括号[]
2、replace替换为逗号为自定义分隔符
3、split根据自定义分隔符分割
4、unnest行转列函数
5、json_extract_scalar解析json取字段值
SELECT
str_json,
json_extract_scalar(str_json, '$.id') AS sid
FROM
(
SELECT
event_param_json
FROM
abm.abm_wireless_exposure_log
WHERE
dt = '2020-09-06'
AND event_id = 'NecProExpo'
AND event_param_json LIKE '%ass%'
)
CROSS JOIN unnest(SPLIT(REPLACE(REPLACE(REPLACE(event_param_json, '[', ''), ']', ''), '},{', '}#{'), '#')) AS t(str_json)
运行结果如下图,json数组就分割为单个json,提取其中某个值了
参考:SELECT — Presto 0.273.3 Documentation