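The core job of a metric is to turn raw data into information that domain experts can use. A metric is made up of elements such as dimensions and measures. To make computation more efficient, derived measures are sometimes defined on top of a metric's base measure so that several values are computed in a single pass. For example, last month's sales and the sales for the same period last year can be derived from the current month's sales. This article uses examples to show how to compute derived measures in ClickHouse.
First, create a sample table and insert some data: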
CREATE TABLE events
(
timestamp DateTime,
key LowCardinality(String),
value UInt32
) engine=MergeTree()
ORDER BY (key, timestamp);
-- Insert data for the same period last year
INSERT INTO events SELECT
now() - toIntervalYear(1),
toString(number % 100),
rand()
FROM numbers(1000);
-- Insert data for the current month
INSERT INTO events SELECT
now(),
toString((number % 100)),
rand()
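FROM numbers(1000);
-- Insert data for last month
INSERT INTO events SELECT
now() - toIntervalMonth(1),
toString(number % 100),
rand()
FROM numbers(1000);
The following query sums this year's and last year's values for a single key in one pass and computes their ratio: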
WITH toYear(timestamp) AS year
SELECT
key,
sumIf(value, year = toYear(now())) AS this_year,
sumIf(value, year = (toYear(now()) - 1)) AS past_year,
round(this_year / past_year, 3) AS yoy
FROM events
WHERE key = '1'
GROUP BY key
Query id: 612ff273-101d-4e58-b576-08a914e30087
key|this_year |past_year |yoy |
---+-----------+-----------+-----+
1 |42081909896|21430127835|1.964|
1 row in set. Elapsed: 0.006 sec. Processed 16.38 thousand rows, 147.68 KB (2.96 million rows/s., 26.66 MB/s.)
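To make the idea behind sumIf concrete, here is a minimal Python sketch of the same conditional aggregation: one pass over the rows, with each value routed into a "this year" or "last year" bucket. The function name sum_if_by_year and the three sample rows are hypothetical and are not taken from the events table above.

```python
from collections import defaultdict
from datetime import datetime


def sum_if_by_year(rows, this_year):
    """Scan (timestamp, key, value) rows once, filling two conditional sums per key."""
    totals = defaultdict(lambda: {"this_year": 0, "past_year": 0})
    for ts, key, value in rows:
        if ts.year == this_year:
            totals[key]["this_year"] += value
        elif ts.year == this_year - 1:
            totals[key]["past_year"] += value
    return {
        key: {**t, "yoy": round(t["this_year"] / t["past_year"], 3)}
        for key, t in totals.items()
        if t["past_year"]  # skip keys without prior-year data to avoid dividing by zero
    }


# Hypothetical rows for illustration only.
rows = [
    (datetime(2023, 5, 1), "1", 100),
    (datetime(2024, 4, 1), "1", 120),
    (datetime(2024, 5, 1), "1", 80),
]
result = sum_if_by_year(rows, this_year=2024)
print(result)
```

Both sums come out of the same scan, which is the efficiency argument for defining derived measures alongside the base measure.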
The next query computes, for the current month, the month-over-month growth rate (mgr) and the year-over-year growth rate (ygr), both as percentages. Months are compared through toStartOfMonth so that the previous month is still matched when the current month is January:
WITH toStartOfMonth(timestamp) AS month, toStartOfMonth(now()) AS cur_month
SELECT
key,
sumIf(value, month = cur_month) AS this_month,
sumIf(value, month = cur_month - INTERVAL 1 MONTH) AS last_month,
sumIf(value, month = cur_month - INTERVAL 1 YEAR) AS past_ym,
round((this_month - last_month) / last_month, 3) * 100 AS mgr,
round((this_month - past_ym) / past_ym, 3) * 100 AS ygr
FROM events
WHERE key = '1'
GROUP BY key
key|this_month |last_month |past_ym |mgr |ygr |
---+-----------+-----------+-----------+-----+----+
1 |19761379814|22320530082|21430127835|-11.5|-7.8|
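As a quick check of the arithmetic, the two growth rates can be recomputed from the three sums in the result row. This is only a sketch: growth_pct is a hypothetical helper that mirrors the rounding used in the query (round to three decimals, then multiply by 100).

```python
def growth_pct(current, base):
    """Percentage change of current relative to base, rounded the same way as the query."""
    return round((current - base) / base, 3) * 100


# Sums copied from the result row above.
this_month = 19761379814
last_month = 22320530082
past_ym = 21430127835

mgr = growth_pct(this_month, last_month)  # month-over-month growth, in percent
ygr = growth_pct(this_month, past_ym)     # year-over-year growth, in percent
print(mgr, ygr)
```

Up to floating-point noise, the two values agree with the mgr and ygr columns.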
A moving average is another derived measure. It is computed with the following function:
groupArrayMovingAvg(numbers_for_summing)
groupArrayMovingAvg(window_size)(numbers_for_summing)
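The function can take the window size as a parameter. If it is not specified, the window size equals the number of rows.
Arguments
numbers_for_summing: an expression that returns a numeric value.
window_size: size of the calculation window.
Returned value
An array of the same size and type as the input data. Averages are rounded towards zero to fit the result type.
Example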
CREATE TABLE t
(
`int` UInt8,
`float` Float32,
`dec` Decimal32(2)
)
ENGINE = TinyLog
-- Sample data
┌─int─┬─float─┬──dec─┐
│ 1 │ 1.1 │ 1.10 │
│ 2 │ 2.2 │ 2.20 │
│ 4 │ 4.4 │ 4.40 │
│ 7 │ 7.77 │ 7.77 │
└─────┴───────┴──────┘
-- Query the moving average
SELECT
groupArrayMovingAvg(int) AS I,
groupArrayMovingAvg(float) AS F,
groupArrayMovingAvg(dec) AS D
FROM t
-- Result
┌─I─────────┬─F───────────────────────────────────┬─D─────────────────────┐
│ [0,0,1,3] │ [0.275,0.82500005,1.9250001,3.8675] │ [0.27,0.82,1.92,3.86] │
└───────────┴─────────────────────────────────────┴───────────────────────┘
-- Window size of 2
SELECT
groupArrayMovingAvg(2)(int) AS I,
groupArrayMovingAvg(2)(float) AS F,
groupArrayMovingAvg(2)(dec) AS D
FROM t
-- Result
┌─I─────────┬─F────────────────────────────────┬─D─────────────────────┐
│ [0,1,3,5] │ [0.55,1.6500001,3.3000002,6.085] │ [0.55,1.65,3.30,6.08] │
└───────────┴──────────────────────────────────┴───────────────────────┘
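The integer column can look surprising at first: the first elements are 0 rather than 1. The Python sketch below reproduces the I column under two assumptions read off the results above: each window sum is always divided by the full window size, even when fewer values are available, and the quotient is truncated to an integer. The function name is hypothetical, and the sketch only handles non-negative integers.

```python
def group_array_moving_avg(values, window_size=None):
    """Emulate the integer results shown above for non-negative integers."""
    # Without an explicit window, the window spans every row.
    size = len(values) if window_size is None else window_size
    result = []
    for i in range(len(values)):
        window = values[max(0, i - size + 1): i + 1]
        # Divide by the full window size, not by len(window); for non-negative
        # integers floor division is the same as truncation towards zero.
        result.append(sum(window) // size)
    return result


ints = [1, 2, 4, 7]
print(group_array_moving_avg(ints))     # compare with the I column, no window
print(group_array_moving_avg(ints, 2))  # compare with the I column, window of 2
```

With no window the running sums 1, 3, 7, 14 are each divided by 4, which explains the leading zeros.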
The moving cumulative sum is computed with the following function:
groupArrayMovingSum(numbers_for_summing)
groupArrayMovingSum(window_size)(numbers_for_summing)
The function can take the window size as a parameter. If it is not specified, the window size equals the number of rows in the column.
Arguments
numbers_for_summing: an expression that returns a numeric value.
window_size: size of the calculation window.
Returned value
An array of the same size and type as the input data.
Example
CREATE TABLE t
(
`int` UInt8,
`float` Float32,
`dec` Decimal32(2)
)
ENGINE = TinyLog
-- Sample data
┌─int─┬─float─┬──dec─┐
│ 1 │ 1.1 │ 1.10 │
│ 2 │ 2.2 │ 2.20 │
│ 4 │ 4.4 │ 4.40 │
│ 7 │ 7.77 │ 7.77 │
└─────┴───────┴──────┘
-- Without a window size
SELECT
groupArrayMovingSum(int) AS I,
groupArrayMovingSum(float) AS F,
groupArrayMovingSum(dec) AS D
FROM t
-- Result
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
│ [1,3,7,14] │ [1.1,3.3000002,7.7000003,15.47] │ [1.10,3.30,7.70,15.47] │
└────────────┴─────────────────────────────────┴────────────────────────┘
-- Window size of 2
SELECT
groupArrayMovingSum(2)(int) AS I,
groupArrayMovingSum(2)(float) AS F,
groupArrayMovingSum(2)(dec) AS D
FROM t
-- Result
┌─I──────────┬─F───────────────────────────────┬─D──────────────────────┐
│ [1,3,6,11] │ [1.1,3.3000002,6.6000004,12.17] │ [1.10,3.30,6.60,12.17] │
└────────────┴─────────────────────────────────┴────────────────────────┘
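The windowed sum can be sketched in a few lines of Python: keep a running total and subtract the value that falls out of the window. This is an illustration of the semantics visible in the results above, not ClickHouse's implementation, and the function name is hypothetical.

```python
from collections import deque
from decimal import Decimal


def group_array_moving_sum(values, window_size=None):
    """Running sum over the last window_size values (all rows if no window is given)."""
    size = len(values) if window_size is None else window_size
    window = deque()
    total = 0
    result = []
    for v in values:
        window.append(v)
        total += v
        if len(window) > size:
            total -= window.popleft()  # drop the value that left the window
        result.append(total)
    return result


ints = [1, 2, 4, 7]
decs = [Decimal("1.10"), Decimal("2.20"), Decimal("4.40"), Decimal("7.77")]
print(group_array_moving_sum(ints))     # compare with the I column, no window
print(group_array_moving_sum(ints, 2))  # compare with the I column, window of 2
print(group_array_moving_sum(decs, 2))  # compare with the D column, window of 2
```

Using Decimal for the third column avoids the floating-point noise visible in the F column.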
To compute an exponential moving average over time-series data, use the following syntax:
exponentialMovingAverage(x)(value, timeunit)
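Each value corresponds to a given timeunit. The half-life x is the time lag at which the exponential weights decay by one half. The function returns a weighted average: the older the time point, the smaller the weight of the corresponding value.
Arguments
value: the value. Integer, Float or Decimal.
timeunit: the time unit. Integer, Float or Decimal. A timeunit is not a timestamp in seconds; it is the index of a time interval and can be computed with the intDiv function.
Parameters
x: the half-life. Integer, Float or Decimal.
Returned value
Type: Float64.
Example
Input data: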
┌──temperature─┬─timestamp──┐
│ 95 │ 1 │
│ 95 │ 2 │
│ 95 │ 3 │
│ 96 │ 4 │
│ 96 │ 5 │
│ 96 │ 6 │
│ 96 │ 7 │
│ 97 │ 8 │
│ 97 │ 9 │
│ 97 │ 10 │
│ 97 │ 11 │
│ 98 │ 12 │
│ 98 │ 13 │
│ 98 │ 14 │
│ 98 │ 15 │
│ 99 │ 16 │
│ 99 │ 17 │
│ 99 │ 18 │
│ 100 │ 19 │
│ 100 │ 20 │
└──────────────┴────────────┘
SELECT exponentialMovingAverage(5)(temperature, timestamp);
-- Result
┌──exponentialMovingAverage(5)(temperature, timestamp)──┐
│ 92.25779635374204 │
└───────────────────────────────────────────────────────┘
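To see where a value of about 92.26 comes from, the weighting can be sketched in Python. The formula below is an assumption, not something stated in this article: each value is weighted by 2^(-(t_latest - t) / x), and the weighted sum is normalised by the total weight of an infinitely long series, 1 / (1 - 2^(-1/x)). Under that assumption the sketch reproduces the result above for the sample data; the function name is hypothetical.

```python
def exponential_moving_average(half_life, values, timeunits):
    """Assumed formula: half-life weights, normalised by the infinite-series weight sum."""
    latest = max(timeunits)
    weighted_sum = sum(
        v * 2 ** (-(latest - t) / half_life)  # weight halves every half_life time units
        for v, t in zip(values, timeunits)
    )
    # The sum of 2^(-k / half_life) over k = 0, 1, 2, ... is 1 / (1 - 2^(-1 / half_life)).
    return weighted_sum * (1 - 2 ** (-1 / half_life))


# The 20 sample rows from the input table above.
temperature = [95] * 3 + [96] * 4 + [97] * 4 + [98] * 4 + [99] * 3 + [100] * 2
timestamp = list(range(1, 21))

ema = exponential_moving_average(5, temperature, timestamp)
print(ema)
```

Because the normalisation assumes an infinite history, a short series is pulled towards zero: the result is below every input temperature even though all inputs lie between 95 and 100.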
Reference (official documentation): exponentialmovingaverage | ClickHouse Docs