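My previous post showed how to parse JSON data from Pulsar with Flink SQL. This post describes how to handle more complex, nested JSON. The Flink version used here is v1.12. The sample JSON is: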
{
    "cityId": "1",
    "cityCode": "1",
    "values": [
        {
            "id": "1_0",
            "deviceId": "1_0",
            "value": {
                "createTime": "1658368497581",
                "value": "1.1",
                "desc": "value is 1.1"
            }
        },
        {
            "id": "1_1",
            "deviceId": "1_1",
            "value": {
                "createTime": "1658368497582",
                "value": "1.1",
                "desc": "value is 1.1"
            }
        }
    ]
}
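This JSON is only an example. Leave aside whether its structure is well designed; the focus here is how to parse it.
Pay close attention to the structure declared in the SQL: it must match the structure of the JSON. A nested JSON object is declared as ROW<...>, a JSON array as ARRAY<...>, and field names that are SQL reserved words (`values`, `value`, `desc`) must be quoted with backticks.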
CREATE TABLE t_in
(
    cityId string,
    cityCode string,
    `values` ARRAY<ROW<
        id STRING,
        deviceId STRING,
        `value` ROW<
            createTime STRING,
            `value` STRING,
            `desc` STRING>>>
) WITH (
    'connector' = 'pulsar',
    'generic' = 'true',
    'topic' = 'persistent://public/default/test',
    'service-url' = 'pulsar://127.0.0.1:6650',
    'admin-url' = 'http://127.0.0.1:8080',
    'scan.startup.mode' = 'external-subscription',
    'scan.startup.sub-name' = 'test',
    'scan.startup.sub-start-offset' = 'earliest',
    'format' = 'json');
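The "structure must match" rule is easier to see with a concrete check. The following plain-Python sketch is not part of Flink. It mirrors the t_in declaration as nested Python values: a dict stands for ROW, a one-element list for ARRAY, and str for STRING. It then reports every path where a sample message deviates. The `SCHEMA` encoding and the `mismatches` helper are hypothetical and used only for illustration. This kind of offline check can be useful because Flink's JSON format defaults 'json.fail-on-missing-field' to false. With that default, a field the message does not contain is typically read as NULL rather than causing an error, so a misnamed or misplaced field can go unnoticed.

```python
import json

# Hypothetical encoding of the t_in DDL, for illustration only:
#   dict       -> ROW<...>        (a JSON object)
#   [element]  -> ARRAY<element>  (a JSON array)
#   str        -> STRING
SCHEMA = {
    "cityId": str,
    "cityCode": str,
    "values": [{
        "id": str,
        "deviceId": str,
        "value": {"createTime": str, "value": str, "desc": str},
    }],
}


def mismatches(data, schema, path="$"):
    """Return the paths where `data` does not fit `schema` (empty list = match)."""
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{path}: expected object"]
        problems = []
        for key, sub in schema.items():
            if key not in data:
                problems.append(f"{path}.{key}: missing")
            else:
                problems.extend(mismatches(data[key], sub, f"{path}.{key}"))
        return problems
    if isinstance(schema, list):
        if not isinstance(data, list):
            return [f"{path}: expected array"]
        problems = []
        for i, item in enumerate(data):
            # every array element is checked against the single element type
            problems.extend(mismatches(item, schema[0], f"{path}[{i}]"))
        return problems
    # leaf: a scalar type such as str
    return [] if isinstance(data, schema) else [f"{path}: expected {schema.__name__}"]


sample = json.loads("""
{"cityId": "1", "cityCode": "1",
 "values": [{"id": "1_0", "deviceId": "1_0",
             "value": {"createTime": "1658368497581", "value": "1.1", "desc": "value is 1.1"}}]}
""")
print(mismatches(sample, SCHEMA))  # → []

# "values" sent as an object instead of an array no longer fits ARRAY<ROW<...>>
print(mismatches({"cityId": "1", "cityCode": "1", "values": {"id": "1_0"}}, SCHEMA))
# → ['$.values: expected array']
```

Encoding the schema as data keeps the check generic. The same `mismatches` function works for any nesting depth of ROW and ARRAY.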
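Fields of the objects nested inside the list can be accessed with object.field syntax, for example `value`.createTime. Set 'connector' and its options to match your own environment.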
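For the usage of specific SQL functions, see the official Flink documentation. The sink table below uses the print connector, which writes each row to standard output: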
CREATE TABLE t_out
(
    cityId string,
    cityCode string,
    id string,
    deviceId string,
    createTime string,
    `value` string,
    `desc` string
) WITH (
    'connector' = 'print'
);
INSERT INTO t_out
SELECT cityId,
       cityCode,
       id,
       deviceId,
       `value`.createTime AS createTime,
       `value`.`value` AS `value`,
       `value`.`desc` AS `desc`
FROM t_in
CROSS JOIN UNNEST(`values`) AS t(id, deviceId, `value`);
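The query turns one Pulsar message into one output row per element of the `values` array, and it repeats `cityId` and `cityCode` on every row. The plain-Python sketch below models that flattening for the sample message so the expected output is easy to check by hand. It is not how Flink executes UNNEST internally. `flatten` is a hypothetical helper, and the sketch assumes `values` is always present and is a list.

```python
import json

SAMPLE = """
{"cityId": "1", "cityCode": "1",
 "values": [
   {"id": "1_0", "deviceId": "1_0",
    "value": {"createTime": "1658368497581", "value": "1.1", "desc": "value is 1.1"}},
   {"id": "1_1", "deviceId": "1_1",
    "value": {"createTime": "1658368497582", "value": "1.1", "desc": "value is 1.1"}}
 ]}
"""


def flatten(record):
    """Plain-Python model of the INSERT INTO t_out query for one message.

    Each element of `values` becomes one output row, mirroring
    CROSS JOIN UNNEST(`values`) AS t(id, deviceId, `value`).
    Assumes `values` is present and is a list.
    """
    rows = []
    for element in record["values"]:
        nested = element["value"]  # the inner ROW aliased as `value`
        rows.append({
            "cityId": record["cityId"],          # parent columns repeat for every element
            "cityCode": record["cityCode"],
            "id": element["id"],
            "deviceId": element["deviceId"],
            "createTime": nested["createTime"],  # `value`.createTime
            "value": nested["value"],            # `value`.`value`
            "desc": nested["desc"],              # `value`.`desc`
        })
    return rows


for row in flatten(json.loads(SAMPLE)):
    print(row)
```

For the sample message this yields two rows, one for device `1_0` and one for `1_1`. A message whose `values` array is empty produces no rows at all in this model, just as a cross join with an empty set produces nothing.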