CLICKHOUSE中JSON解析

CLICKHOUSE中JSON解析

Clickhouse一般使用字符串行式保存JSON数据,clickhouse提供了JSON函数,方便我们提取JSON数据。
如要提取CONTENT字符串中保存的JSON数据,DATA的所有字段。
在这里插入图片描述

{
    "SENDER":"operation",
    "TABLE":"team_goal",
    "TIME":1675844952009,
    "DATA":{
        "data":{
            "_id":"63d78bb319cef40007220f45",
            "appId":"62e9d317c653b800076fc089",
            "createTime":"2023-01-30T09:19:47.924Z",
            "creator":{
                "_id":"5c9c6562c9c631ec0899fbe8",
                "name":"李雷",
                "status":1,
                "type":0,
                "username":"1551345709528115"
            },
            "deleteTime":null,
            "deleter":null,
            "entryId":"6350bed4ff61f1000a167f33",
            "formName":"任务",
            "goal_amount":8200000,
            "goal_channel":"销售",
            "goal_date":"2023-11-30T16:00:00.000Z",
            "team_code":"100009",
            "team_name":"北京分公司",
            "updateTime":"2023-02-08T08:29:11.892Z",
            "updater":{
                "_id":"628bd2ca2aff520007a6b39d",
                "name":"韩梅梅",
                "status":1,
                "type":0,
                "username":"264125661026134153"
            }
        },
        "op":"data_update"
    }
}

该JSON嵌套了4层,我们应该怎么样去提取?从结构上看,可以利用JSONExtractRaw函数将DATA提取出来。
JSONExtractRaw(json[, indices_or_keys]…)
JSONExtractRaw函数可以将一部分 JSON 作为未分析的字符串返回。如果该部件不存在或类型错误,则将返回空字符串。

WITH JSONExtractRaw(replaceAll(CONTENT, '\n', '
'
), 'DATA') as DATA SELECT DATA FROM default.test;

提取结果:
在这里插入图片描述

{
    "data":{
        "_id":"63d78bb319cef40007220f45",
        "appId":"62e9d317c653b800076fc089",
        "createTime":"2023-01-30T09:19:47.924Z",
        "creator":{
            "_id":"5c9c6562c9c631ec0899fbe8",
            "name":"李雷",
            "status":1,
            "type":0,
            "username":"1551345709528115"
        },
        "deleteTime":null,
        "deleter":null,
        "entryId":"6350bed4ff61f1000a167f33",
        "formName":"任务",
        "goal_amount":8200000,
        "goal_channel":"销售",
        "goal_date":"2023-11-30T16:00:00.000Z",
        "team_code":"100009",
        "team_name":"北京分公司",
        "updateTime":"2023-02-08T08:29:11.892Z",
        "updater":{
            "_id":"628bd2ca2aff520007a6b39d",
            "name":"韩梅梅",
            "status":1,
            "type":0,
            "username":"264125661026134153"
        }
    },
    "op":"data_update"
}

此时可以看到已经将DATA从CONTENT中提取出来,同理可以将data从DATA中提取出来
在这里插入图片描述

{
    "_id":"63d78bb319cef40007220f45",
    "appId":"62e9d317c653b800076fc089",
    "createTime":"2023-01-30T09:19:47.924Z",
    "creator":{
        "_id":"5c9c6562c9c631ec0899fbe8",
        "name":"李雷",
        "status":1,
        "type":0,
        "username":"1551345709528115"
    },
    "deleteTime":null,
    "deleter":null,
    "entryId":"6350bed4ff61f1000a167f33",
    "formName":"任务",
    "goal_amount":8200000,
    "goal_channel":"销售",
    "goal_date":"2023-11-30T16:00:00.000Z",
    "team_code":"100009",
    "team_name":"北京分公司",
    "updateTime":"2023-02-08T08:29:11.892Z",
    "updater":{
        "_id":"628bd2ca2aff520007a6b39d",
        "name":"韩梅梅",
        "status":1,
        "type":0,
        "username":"264125661026134153"
    }
}

接下来可以对data中的每个字段进行提取。在此之前先来说说CLICKHOUSE中常见的几个JSON提取函数。

–1.抽取JSON中返回值为String类型的值

SELECT
    JSONExtractString('{"a": "hello", "b": [-100, 200.0, 300]}', 'a') AS a;

返回
CLICKHOUSE中JSON解析_第1张图片
–2.抽取JSON中返回值为Float类型的值

SELECT JSONExtractFloat('{"a": "hello", "b": [-100, 200.0, 300],"f":1.01}', 'f') as f;

返回
在这里插入图片描述
–3.抽取JSON中返回值为Int类型的值

SELECT JSONExtractInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}','i') as i;

返回
CLICKHOUSE中JSON解析_第2张图片

–4.抽取JSON中返回值为UInt类型的值

SELECT JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":-100}','i') as i0,
            JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}','i') as i;

返回
CLICKHOUSE中JSON解析_第3张图片
这些函数也可以换个写法,例如
–提取字段类型为String

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300]}', 'a', 'String') AS a

返回
CLICKHOUSE中JSON解析_第4张图片
–提取字段类型为Float

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"f":1.01}', 'f','Float') as f

返回CLICKHOUSE中JSON解析_第5张图片
–提取字段类型为UInt

SELECT
    JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"i":-100}', 'i', 'UInt64') AS i0,
    JSONExtractUInt('{"a": "hello", "b": [-100, 200.0, 300],"i":100}', 'i') AS i

返回
CLICKHOUSE中JSON解析_第6张图片
CLICKHOUSE中,没有提供Decimal类型的JSON提取函数,当要提取的字段类型为Decimal时,可以这样写

SELECT JSONExtract('{"a": "hello", "b": [-100, 200.0, 300],"f":1.0111111}', 'f', 'Decimal(18,2)') AS f

返回
在这里插入图片描述
即第三个参数直接指定该字段的类型。

下面开始提取data中的字段。

WITH JSONExtractRaw(JSONExtractRaw(replaceAll(CONTENT, '\n', '
'
), 'DATA'), 'data') AS data SELECT JSONExtractString(data, '_id') AS _id, JSONExtractString(JSONExtractRaw(replaceAll(CONTENT, '\n', '
'
), 'DATA'), 'op') AS _op, addHours(toDateTime64(replaceAll(JSONExtractString(data, 'deleteTime'), 'Z', ''), 3), 8) AS _delete_time, if(toYear(_delete_time) > 1970, _delete_time, addHours(toDateTime64(replaceAll(JSONExtractString(data, 'updateTime'), 'Z', ''), 3), 8)) AS _update_time, if(empty(JSONExtract(data, 'team_name', 'Nullable(String)')) = 0, JSONExtract(data, 'team_name', 'Nullable(String)'), NULL) AS team_name, toInt64(JSONExtractString(data, 'team_code')) AS team_code, addHours(toDateTime64(replaceAll(JSONExtractString(data, 'goal_date'), 'Z', ''), 3), 8) AS goal_date, JSONExtractString(data, 'goal_channel') AS goal_channel, toDecimal64(JSONExtractFloat(data, 'goal_amount'), 2) AS goal_amount, JSONExtractRaw(data, 'creator') AS creator, JSONExtractString(creator, '_id') AS fill_id, JSONExtractString(creator, 'name') AS fill_name, JSONExtractString(data, 'formName') AS formName, addTime FROM default.test

提取结果
CLICKHOUSE中JSON解析_第7张图片

你可能感兴趣的:(json,clickhouse,数据库)