Computing Derived Measures in ClickHouse

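The core job of a metric is to turn raw data into information an expert can use. A metric is made up of elements such as dimensions and measures. To make computation more efficient, derived measures are sometimes defined on top of a metric's base measure so that several values are computed in a single pass. For example, from the current month's sales one can derive last month's sales and sales for the same month last year. This article uses examples to show how derived measures can be computed in ClickHouse.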

Sample data

First, create a sample table and insert some data:

CREATE TABLE events
(
    timestamp DateTime,
    key LowCardinality(String),
    value UInt32
) engine=MergeTree()
ORDER BY (key, timestamp)

-- Insert data for the same period last year
INSERT INTO events SELECT
    now() - toIntervalYear(1),
    toString(number % 100),
    rand()
FROM numbers(1000)

-- Insert data for the current month
INSERT INTO events SELECT
    now(),
    toString((number % 100)),
    rand()
FROM numbers(1000)

-- Insert data for last month
INSERT INTO events SELECT
    now()- toIntervalMonth(1),
    toString((number % 100)),
    rand()
FROM numbers(1000)
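
Before running the queries that follow, it can help to confirm that the three inserts landed in the expected months. A minimal sanity check, assuming the `events` table defined above (the aliases `month`, `row_count`, and `total` are only illustrative names):

```sql
-- Count rows and sum values per calendar month.
-- With the three inserts above, three distinct months should appear,
-- each with 1000 rows (assuming the table was empty beforehand).
SELECT
    toStartOfMonth(timestamp) AS month,
    count() AS row_count,
    sum(value) AS total
FROM events
GROUP BY month
ORDER BY month
```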

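Computing year-over-year (YoY) growth

The query below sums value for the current and the previous calendar year in one pass using sumIf. Note that yoy is the ratio of the two totals, so 1.964 means this year's total is 1.964 times last year's.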

WITH toYear(timestamp) AS year
SELECT
    key,
    sumIf(value, year = toYear(now())) AS this_year,
    sumIf(value, year = (toYear(now()) - 1)) AS past_year,
    round(this_year / past_year, 3) AS yoy
FROM events
WHERE key = '1'
GROUP BY key


Query id: 612ff273-101d-4e58-b576-08a914e30087

key|this_year  |past_year  |yoy  |
---+-----------+-----------+-----+
1  |42081909896|21430127835|1.964|

1 row in set. Elapsed: 0.006 sec. Processed 16.38 thousand rows, 147.68 KB (2.96 million rows/s., 26.66 MB/s.)
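
The `yoy` column above is a ratio. If a growth percentage for every key is preferred, one possible variant is sketched below. The alias `yoy_pct` is a name chosen here for illustration, and `nullIf` is used so that a key with no data last year yields NULL rather than inf or nan.

```sql
WITH toYear(timestamp) AS year
SELECT
    key,
    sumIf(value, year = toYear(now())) AS this_year,
    sumIf(value, year = toYear(now()) - 1) AS past_year,
    -- (this - past) / past as a percentage; NULL when past_year is 0
    round((this_year - past_year) / nullIf(past_year, 0) * 100, 1) AS yoy_pct
FROM events
GROUP BY key
ORDER BY key
```

For the key '1' figures shown above, a ratio of 1.964 corresponds to a growth of about 96.4%.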

Computing monthly year-over-year and month-over-month growth

WITH
    toStartOfMonth(timestamp) AS month,
    toStartOfMonth(now()) AS cur_month
SELECT
    key,
    sumIf(value, month = cur_month) AS this_month,
    -- Subtract an interval instead of using toMonth(now()) - 1,
    -- which would yield month 0 (and the wrong year) in January
    sumIf(value, month = cur_month - INTERVAL 1 MONTH) AS last_month,
    sumIf(value, month = cur_month - INTERVAL 1 YEAR) AS past_ym,
    round((this_month - last_month) / last_month * 100, 1) AS mgr, -- month-over-month growth, %
    round((this_month - past_ym) / past_ym * 100, 1) AS ygr        -- year-over-year growth, %
FROM events
WHERE key = '1'
GROUP BY key

key|this_month |last_month |past_ym    |mgr  |ygr |
---+-----------+-----------+-----------+-----+----+
1  |19761379814|22320530082|21430127835|-11.5|-7.8|

Moving average

The moving average is another derived measure. It is computed with the following function:

groupArrayMovingAvg(numbers_for_summing)
groupArrayMovingAvg(window_size)(numbers_for_summing)

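The function can take the window size as a parameter. If it is not specified, the window size equals the number of rows in the column.

Arguments

  • numbers_for_summing: an expression that yields a numeric value.
  • window_size: size of the calculation window.

Returned value

  • An array of the same size and data type as the input column. For integer and Decimal inputs the average is truncated to the input type, which is why the integer results in the example below contain values such as 0.

Example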

CREATE TABLE t
(
    `int` UInt8,
    `float` Float32,
    `dec` Decimal32(2)
)
ENGINE = TinyLog

-- Sample data
┌─int─┬─float─┬──dec─┐
│   1 │   1.1 │ 1.10 │
│   2 │   2.2 │ 2.20 │
│   4 │   4.4 │ 4.40 │
│   7 │  7.77 │ 7.77 │
└─────┴───────┴──────┘
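
The listing above shows the contents of `t`, but not the statement that loads it. A minimal sketch that would populate the table with the same four rows:

```sql
-- Load the four sample rows shown above into t
INSERT INTO t VALUES (1, 1.1, 1.10), (2, 2.2, 2.20), (4, 4.4, 4.40), (7, 7.77, 7.77);
```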

-- Query the moving average (no window size given)
SELECT
    groupArrayMovingAvg(int) AS I,
    groupArrayMovingAvg(float) AS F,
    groupArrayMovingAvg(dec) AS D
FROM t

-- Result
┌─I─────────┬─F───────────────────────────────────┬─D─────────────────────┐
│ [0,0,1,3] │ [0.275,0.82500005,1.9250001,3.8675] │ [0.27,0.82,1.92,3.86] │
└───────────┴─────────────────────────────────────┴───────────────────────┘

-- Window size of 2
SELECT
    groupArrayMovingAvg(2)(int) AS I,
    groupArrayMovingAvg(2)(float) AS F,
    groupArrayMovingAvg(2)(dec) AS D
FROM t

-- Result
┌─I─────────┬─F────────────────────────────────┬─D─────────────────────┐
│ [0,1,3,5] │ [0.55,1.6500001,3.3000002,6.085] │ [0.55,1.65,3.30,6.08] │
└───────────┴──────────────────────────────────┴───────────────────────┘
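
The results above suggest that each element is the windowed sum divided by the window size, even where fewer rows than the window are available: with a window of 2, the first element is 1.1 / 2 = 0.55 rather than 1.1. One way to check this reading against table `t` is sketched below; the aliases `S`, `S_div_window`, and `F` are illustrative.

```sql
SELECT
    groupArrayMovingSum(2)(float) AS S,           -- windowed sums
    arrayMap(x -> x / 2, S) AS S_div_window,      -- sums divided by the window size
    groupArrayMovingAvg(2)(float) AS F            -- built-in moving average
FROM t
```

Division in ClickHouse returns Float64 while `F` stays Float32, so the last digits of `S_div_window` and `F` may differ slightly.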

Moving sum

The moving sum is computed with the following function:

groupArrayMovingSum(numbers_for_summing)
groupArrayMovingSum(window_size)(numbers_for_summing)

The function can take the window size as a parameter. If it is not specified, the window size equals the number of rows in the column.

Arguments

  • numbers_for_summing: an expression that yields a numeric value.
  • window_size: size of the calculation window.

Returned value

  • An array of the same size and data type as the input data.

Example

CREATE TABLE t
(
    `int` UInt8,
    `float` Float32,
    `dec` Decimal32(2)
)
ENGINE = TinyLog

-- Sample data
┌─int─┬─float─┬──dec─┐
│   1 │   1.1 │ 1.10 │
│   2 │   2.2 │ 2.20 │
│   4 │   4.4 │ 4.40 │
│   7 │  7.77 │ 7.77 │
└─────┴───────┴──────┘

-- Without specifying a window size
SELECT
    groupArrayMovingSum(int) AS I,
    groupArrayMovingSum(float) AS F,
    groupArrayMovingSum(dec) AS D
FROM t

-- Result
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
│ [1,3,7,14] │ [1.1,3.3000002,7.7000003,15.47] │ [1.10,3.30,7.70,15.47] │
└────────────┴─────────────────────────────────┴────────────────────────┘

-- Window size of 2
SELECT
    groupArrayMovingSum(2)(int) AS I,
    groupArrayMovingSum(2)(float) AS F,
    groupArrayMovingSum(2)(dec) AS D
FROM t

-- Result
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
│ [1,3,6,11] │ [1.1,3.3000002,6.6000004,12.17] │ [1.10,3.30,6.60,12.17] │
└────────────┴─────────────────────────────────┴────────────────────────┘
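
groupArrayMovingSum builds its array in the order in which rows reach the aggregate function. When applying it to the `events` table from earlier, sorting in a subquery first is a common way to get a time-ordered result. A sketch under that assumption (the aliases `ts` and `sum_of_last_2` are illustrative):

```sql
SELECT
    key,
    groupArray(timestamp) AS ts,                     -- timestamps in the same order as the sums
    groupArrayMovingSum(2)(value) AS sum_of_last_2   -- each value plus the one before it
FROM
(
    -- Sort first so the aggregate sees rows in time order
    SELECT key, timestamp, value
    FROM events
    WHERE key = '1'
    ORDER BY timestamp
)
GROUP BY key
```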

Exponential moving average

This function computes the exponential moving average over time-series data. The syntax is:

exponentialMovingAverage(x)(value, timeunit)

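Each value corresponds to a given timeunit. The half-life x is the time lag at which the exponential weight decays by one half. The function returns a weighted average: the older the time point, the less weight its value carries.

Arguments

  • value: Integer, Float or Decimal.
  • timeunit: Integer, Float or Decimal. timeunit is not a timestamp in seconds but the index of a time interval; it can be computed with intDiv.

Parameters

  • x: half-life. Integer, Float or Decimal.

Returned value

  • The exponentially smoothed moving average of the values over the past x time units, evaluated at the latest point in time.

Type: Float64.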
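
Because timeunit is an interval index rather than a timestamp, a DateTime column has to be bucketed first. A sketch on the `events` table from earlier, assuming one-day buckets (86400 seconds) and a half-life of 30 buckets; both numbers and the alias `ema_30d` are arbitrary choices for illustration:

```sql
SELECT
    key,
    -- toUInt32 converts the DateTime to a Unix timestamp;
    -- intDiv by 86400 turns it into a day index
    exponentialMovingAverage(30)(value, intDiv(toUInt32(timestamp), 86400)) AS ema_30d
FROM events
WHERE key = '1'
GROUP BY key
```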

Example

Input table:

┌──temperature─┬─timestamp──┐
│          95  │         1  │
│          95  │         2  │
│          95  │         3  │
│          96  │         4  │
│          96  │         5  │
│          96  │         6  │
│          96  │         7  │
│          97  │         8  │
│          97  │         9  │
│          97  │        10  │
│          97  │        11  │
│          98  │        12  │
│          98  │        13  │
│          98  │        14  │
│          98  │        15  │
│          99  │        16  │
│          99  │        17  │
│          99  │        18  │
│         100  │        19  │
│         100  │        20  │
└──────────────┴────────────┘

SELECT exponentialMovingAverage(5)(temperature, timestamp);

-- Result
┌──exponentialMovingAverage(5)(temperature, timestamp)──┐
│                                    92.25779635374204  │
└───────────────────────────────────────────────────────┘
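
The result is lower than every input value, which a plain weighted average could not be. One reading that reproduces the figure is that each value gets the weight 2^(-(t_max - t) / x) and the weighted sum is then scaled by (1 - 2^(-1 / x)), i.e. normalised as if the series extended infinitely into the past. This is an assumption about the computation, not something the text above states. The self-contained sketch below rebuilds the 20 sample rows inline and compares the two; the aliases `ema_builtin` and `ema_by_hand` are illustrative.

```sql
SELECT
    exponentialMovingAverage(5)(temperature, timestamp) AS ema_builtin,
    -- weight = 2^(-(age) / half_life), scaled by (1 - 2^(-1 / half_life))
    sum(temperature * exp2(-(20 - timestamp) / 5)) * (1 - exp2(-1 / 5)) AS ema_by_hand
FROM
(
    -- Rebuild the sample table: timestamps 1..20 with the temperatures listed above
    SELECT
        arrayElement([95, 95, 95, 96, 96, 96, 96, 97, 97, 97,
                      97, 98, 98, 98, 98, 99, 99, 99, 100, 100], number + 1) AS temperature,
        number + 1 AS timestamp
    FROM numbers(20)
)
```

If this reading is right, both columns should come out close to 92.2578.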

Reference (official documentation): exponentialmovingaverage | ClickHouse Docs
