ClickHouse (Traffic Analysis, Part 1: Funnel Analysis Examples)

1. Definition of funnel analysis

Sensors Data user analysis models: how to use funnel analysis

2. Ordered funnel: windowFunnel

ClickHouse data models: ordered funnel analysis

Hologres funnel analysis functions

Java UDF StarRocks Docs

windowFunnel(window, [mode])(timestamp, cond1, cond2, ..., condN)
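Parameters

window: length of the sliding window, measured from the first matched event of the chain. The unit follows the timestamp column; with a DateTime column, as in this article, it is seconds.
mode: an optional parameter.
'strict': when 'strict' is set, windowFunnel() applies the matching conditions only to unique values. Newer ClickHouse versions document this mode as 'strict_deduplication', alongside 'strict_order' and 'strict_increase'.
timestamp: the column containing the time. Supported data types: Date, DateTime and other unsigned integer types (note that even though timestamp supports UInt64, its value cannot exceed the Int64 maximum, 2^63 - 1).
cond: the conditions that describe the chain of events. UInt8 type.

If the data contains several event chains at different points of completion, the function outputs only the size of the longest chain.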

Data
CREATE TABLE nonodb.log_action_tf
( uid Int32,
event_type String,
eventTime DateTime )
ENGINE = MergeTree
PARTITION BY toYYYYMM(eventTime) -- partition by month; partitioning by uid would create one partition per user
ORDER BY (uid, eventTime)
SETTINGS index_granularity = 8192;



insert into nonodb.log_action_tf values(1,'浏览','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(1,'点击','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(1,'下单','2022-01-02 11:20:00');
insert into nonodb.log_action_tf values(1,'支付','2022-01-02 11:30:00');
insert into nonodb.log_action_tf values(2,'下单','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(2,'支付','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(1,'浏览','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(3,'浏览','2022-01-02 11:20:00');
insert into nonodb.log_action_tf values(3,'点击','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(4,'浏览','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(4,'点击','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(5,'浏览','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(5,'点击','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(5,'下单','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(6,'浏览','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(6,'点击','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(6,'下单','2022-01-02 12:10:00');
Example
SELECT
    uid,
    windowFunnel(1800)(eventTime, event_type = '浏览', event_type = '点击', event_type = '下单', event_type = '支付') AS level
FROM
(
    SELECT
        eventTime,
        event_type,
        uid
    FROM nonodb.log_action_tf
)
GROUP BY uid

uid  level
4   2
3   1
2   0
5   2
1   4
6   3
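
To make the result above easier to check by hand, here is a minimal Python sketch of how windowFunnel behaves in its default mode. It is a simplified model, not ClickHouse's implementation: it assumes each event matches at most one condition, and the names window_funnel, STEPS and ROWS are made up for this illustration. The rows are the ones inserted into nonodb.log_action_tf earlier.

```python
from collections import defaultdict
from datetime import datetime

STEPS = ["浏览", "点击", "下单", "支付"]  # view, click, order, pay

def window_funnel(events, window_seconds, steps=STEPS):
    """Simplified default-mode windowFunnel for a single user.

    events: iterable of (timestamp, event_type).
    Returns the length of the longest chain that starts with a step-1 event
    and whose later steps fall within window_seconds of that first event.
    """
    # Sort by time, then by step index, and ignore events outside the funnel.
    indexed = sorted((ts, steps.index(e)) for ts, e in events if e in steps)
    first_ts = [None] * len(steps)  # first_ts[i]: start time of a chain that reached step i
    for ts, i in indexed:
        if i == 0:
            first_ts[0] = ts
        elif first_ts[i - 1] is not None and (ts - first_ts[i - 1]).total_seconds() <= window_seconds:
            first_ts[i] = first_ts[i - 1]  # carry the chain's start time forward
    reached = [i + 1 for i, t in enumerate(first_ts) if t is not None]
    return max(reached, default=0)

# (uid, event_type, time on 2022-01-02), same rows as the INSERT statements.
ROWS = [
    (1, "浏览", "11:00"), (1, "点击", "11:10"), (1, "下单", "11:20"), (1, "支付", "11:30"),
    (2, "下单", "11:00"), (2, "支付", "11:10"), (1, "浏览", "11:00"),
    (3, "浏览", "11:20"), (3, "点击", "12:00"),
    (4, "浏览", "11:50"), (4, "点击", "12:00"),
    (5, "浏览", "11:50"), (5, "点击", "12:00"), (5, "下单", "11:10"),
    (6, "浏览", "11:50"), (6, "点击", "12:00"), (6, "下单", "12:10"),
]

by_uid = defaultdict(list)
for uid, event_type, hhmm in ROWS:
    ts = datetime.strptime("2022-01-02 " + hhmm, "%Y-%m-%d %H:%M")
    by_uid[uid].append((ts, event_type))

levels = {uid: window_funnel(evts, 1800) for uid, evts in sorted(by_uid.items())}
print(levels)
```

Two cases are worth noting: uid 3 stops at level 1 because the click comes 40 minutes after the view, outside the 1800-second window, and uid 5 stops at level 2 because its order event happened before the view.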

ClickHouse array functions

Funnel analysis model

Analyze the conversion along the path 浏览 -> 点击 -> 下单 -> 支付 (view -> click -> order -> pay) on 2022-01-02.

SELECT
    uid,
    arrayWithConstant(level, 1) AS levels,            -- array of `level` ones, e.g. [1,1,1]
    arrayJoin(arrayEnumerate(levels)) AS level_index  -- one row per step reached: 1..level
FROM
(
    SELECT
        uid,
        windowFunnel(1800)(
            eventTime,
            event_type = '浏览',
            event_type = '点击',
            event_type = '下单',
            event_type = '支付'
        ) AS level
    FROM
    (
        SELECT eventTime, event_type, uid
        FROM nonodb.log_action_tf
        WHERE toDate(eventTime) = '2022-01-02'
    ) t1
    GROUP BY uid
)

4   [1,1]   1
4   [1,1]   2
3   [1] 1
5   [1,1]   1
5   [1,1]   2
1   [1,1,1,1]   1
1   [1,1,1,1]   2
1   [1,1,1,1]   3
1   [1,1,1,1]   4
6   [1,1,1] 1
6   [1,1,1] 2
6   [1,1,1] 3
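
The expansion above can be mimicked in a few lines of Python, which may help if the arrayWithConstant / arrayEnumerate / arrayJoin combination is unfamiliar. This is only an illustration: the levels dictionary is copied from the windowFunnel result shown earlier, and explode is a hypothetical helper name.

```python
from collections import Counter

# uid -> level, as returned by the windowFunnel query on the sample data.
levels = {1: 4, 2: 0, 3: 1, 4: 2, 5: 2, 6: 3}

def explode(levels):
    """Yield (uid, levels_array, level_index) rows, one per step the user reached."""
    for uid, level in levels.items():
        ones = [1] * level                            # arrayWithConstant(level, 1)
        for level_index in range(1, len(ones) + 1):   # arrayEnumerate(levels) -> 1..level
            yield uid, ones, level_index              # arrayJoin emits one row per index

rows = list(explode(levels))

# Counting rows per level_index gives the number of users that reached each step.
step_counts = Counter(level_index for _, _, level_index in rows)
print(sorted(step_counts.items()))
```

A user with level 0 (uid 2 here) produces an empty array and therefore no rows at all, which is why uid 2 is absent from the output above.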

-- Number of users that reached each funnel step
SELECT
    level_index,
    count(1) AS ct
FROM
(
    SELECT
        uid,
        arrayWithConstant(level, 1) AS levels,
        arrayJoin(arrayEnumerate(levels)) AS level_index
    FROM
    (
        SELECT
            uid,
            windowFunnel(1800)(
                eventTime,
                event_type = '浏览',
                event_type = '点击',
                event_type = '下单',
                event_type = '支付'
            ) AS level
        FROM
        (
            SELECT eventTime, event_type, uid
            FROM nonodb.log_action_tf
            WHERE toDate(eventTime) = '2022-01-02'
        ) t1
        GROUP BY uid
    ) t2
) t3
GROUP BY level_index
ORDER BY level_index

-- Same counts, with the step index mapped back to the event name ('其他' = other)
SELECT
    transform(level_index, [1, 2, 3, 4], ['浏览', '点击', '下单', '支付'], '其他') AS event,
    count(1) AS ct
FROM
(
    SELECT
        uid,
        arrayWithConstant(level, 1) AS levels,
        arrayJoin(arrayEnumerate(levels)) AS level_index
    FROM
    (
        SELECT
            uid,
            windowFunnel(1800)(
                eventTime,
                event_type = '浏览',
                event_type = '点击',
                event_type = '下单',
                event_type = '支付'
            ) AS level
        FROM
        (
            SELECT eventTime, event_type, uid
            FROM nonodb.log_action_tf
            WHERE toDate(eventTime) = '2022-01-02'
        ) t1
        GROUP BY uid
    ) t2
)
GROUP BY level_index
ORDER BY level_index;

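This function looks powerful, but something is missing: its sliding window is not quite what I understand a sliding window to mean in traffic analysis.

Pain point: once the data volume grows past roughly 10 billion records, ClickHouse will probably struggle with this kind of query. A better approach is to combine it with bitmap encoding; the following article is a useful reference:

Tens of billions of user-behavior records per day: how does Meituan-Dianping achieve second-level conversion analysis?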
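
As a rough illustration of the bitmap idea (not the implementation described in the referenced article), the sketch below encodes the set of users per event as a bitmap and intersects the bitmaps to find users who performed every step so far. It uses plain Python integers as bitmaps, whereas production systems typically rely on compressed bitmaps. The helper names to_bitmap and cardinality are made up for this example, and the user lists are derived from the sample rows in this article.

```python
def to_bitmap(uids):
    """Encode a collection of small integer user ids as one integer bitmap."""
    bitmap = 0
    for uid in uids:
        bitmap |= 1 << uid
    return bitmap

def cardinality(bitmap):
    """Number of set bits, i.e. number of users in the bitmap."""
    return bin(bitmap).count("1")

# Users that performed each event at least once on 2022-01-02 (sample data).
step_users = {
    "浏览": [1, 3, 4, 5, 6],   # view
    "点击": [1, 3, 4, 5, 6],   # click
    "下单": [1, 2, 5, 6],      # order
    "支付": [1, 2],            # pay
}

candidates = None
candidate_counts = []
for step in ["浏览", "点击", "下单", "支付"]:
    bitmap = to_bitmap(step_users[step])
    # Intersect with the users that survived all previous steps.
    candidates = bitmap if candidates is None else candidates & bitmap
    candidate_counts.append(cardinality(candidates))

print(candidate_counts)
```

The intersection ignores event order and the time window, so these counts are only upper bounds on the ordered funnel. Their value is as a cheap pre-filter: the more expensive sequence matching only needs to run for users that are still in the candidate bitmap.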

Conversion rate calculation

neighbor
uniqCombined | ClickHouse Docs
housepower/olap2018: 易观 (Analysys) 2nd OLAP funnel algorithm contest

SELECT
    level,
    pv,
    neighbor(pv, -1) AS prev_pv,  -- pv of the previous level; 0 on the first row
    if(prev_pv = 0, -1, round(pv / prev_pv * 100, 3)) AS rate
FROM
(
    -- Aggregate and sort first: neighbor depends on the row order inside the block.
    SELECT
        level,
        uniqCombined(uid) AS pv
    FROM
    (
        SELECT
            uid,
            windowFunnel(1800)(
                eventTime,
                event_type = '浏览',
                event_type = '点击',
                event_type = '下单',
                event_type = '支付'
            ) AS level
        FROM
        (
            SELECT eventTime, event_type, uid
            FROM nonodb.log_action_tf
            WHERE toDate(eventTime) = '2022-01-02'
        ) t1
        GROUP BY uid
    )
    WHERE level > 0
    GROUP BY level
    ORDER BY level
)
ORDER BY level;

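The query above counts each user only at the deepest level reached. A uid whose path reaches level 4 has of course also passed level 1, so which number is correct depends on how the product defines the metric. If a user who completed only the first step counts toward level 1 and a user who completed all four steps counts only toward level 4, the query above is the one to use. The query below instead counts a user at every level they passed.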

SELECT
    level,
    pv,
    neighbor(pv, -1) AS prev_pv,  -- offset -1 reads the previous level (the original offset 1 read the next one)
    if(prev_pv = 0, -1, round(pv / prev_pv * 100, 3)) AS rate
FROM
(
    -- Users per step, sorted so that neighbor sees the levels in order.
    SELECT
        level_index AS level,
        count(1) AS pv
    FROM
    (
        SELECT
            uid,
            arrayWithConstant(level, 1) AS levels,
            arrayJoin(arrayEnumerate(levels)) AS level_index
        FROM
        (
            SELECT
                uid,
                windowFunnel(1800)(
                    eventTime,
                    event_type = '浏览',
                    event_type = '点击',
                    event_type = '下单',
                    event_type = '支付'
                ) AS level
            FROM
            (
                SELECT eventTime, event_type, uid
                FROM nonodb.log_action_tf
                WHERE toDate(eventTime) = '2022-01-02'
            ) t1
            GROUP BY uid
        ) t2
    ) t3
    GROUP BY level_index
    ORDER BY level_index
)
ORDER BY level;
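
The difference between the two counting conventions is easy to see on the sample data. The following Python sketch is an illustration only: levels is copied from the windowFunnel result earlier in the article, and step_rates is a hypothetical helper that applies the same "divide by the previous step, -1 if there is none" rule as the SQL.

```python
from collections import Counter

# uid -> level, as returned by windowFunnel on the sample data.
levels = {1: 4, 2: 0, 3: 1, 4: 2, 5: 2, 6: 3}
N_STEPS = 4

# Convention A: a user counts only at the deepest step reached.
exact = Counter(level for level in levels.values() if level > 0)

# Convention B: a user counts at every step up to the deepest one.
cumulative = {
    step: sum(1 for level in levels.values() if level >= step)
    for step in range(1, N_STEPS + 1)
}

def step_rates(pv_by_step):
    """Return {step: rate in %} relative to the previous step; -1 if there is no previous step."""
    rates = {}
    prev = 0
    for step in sorted(pv_by_step):
        pv = pv_by_step[step]
        rates[step] = -1 if prev == 0 else round(pv / prev * 100, 3)
        prev = pv
    return rates

rates_exact = step_rates(exact)
rates_cumulative = step_rates(cumulative)
print(dict(exact), rates_exact)
print(cumulative, rates_cumulative)
```

Under convention A a rate can exceed 100% (two users stopped at step 2 but only one stopped at step 1), so it does not read as a conversion rate in the usual sense. Convention B yields the familiar monotonically shrinking funnel.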

3. Unordered funnel analysis

groupArray | ClickHouse Docs
Array Functions | ClickHouse Docs
The Array type in ClickHouse

Examples:
select  groupArray(num)
from (
     select 1 as num union all  select  2 union all  select  3
         )

select  arrayCount(x -> x=1,[1,1,2,2] )

SELECT day           AS day,
       sum(level1_pv) AS level1_pv,
       sum(level2_pv) AS level2_pv,
       sum(level1_uv) as level1_uv,
       sum(level2_uv) as level2_uv
from (
select toDate(eventTime) as day,
       uid,
       groupArray(event_type) as events,
       arrayCount(x-> x = '浏览', events)  as level1_pv,
       if(has(events, '浏览'), arrayCount(x-> x = '点击', events),0) as level2_pv,
       hasAll(events, ['浏览'])  as level1_uv,
       hasAll(events, ['浏览','点击'])  as level2_uv
from nonodb.log_action_tf
where toDate(eventTime) >= '2021-01-01'
group by uid,toDate(eventTime) ) group by day order by  day;
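
For readers who want to verify the unordered query by hand, here is a small Python sketch of the same logic on the sample rows. It is an approximation for illustration: ROWS repeats the inserted data (all on 2022-01-02), and the variable names are chosen to mirror the SQL aliases.

```python
from collections import defaultdict

DAY = "2022-01-02"
# (uid, event_type), same rows as the INSERT statements; all fall on DAY.
ROWS = [
    (1, "浏览"), (1, "点击"), (1, "下单"), (1, "支付"),
    (2, "下单"), (2, "支付"), (1, "浏览"),
    (3, "浏览"), (3, "点击"),
    (4, "浏览"), (4, "点击"),
    (5, "浏览"), (5, "点击"), (5, "下单"),
    (6, "浏览"), (6, "点击"), (6, "下单"),
]

# groupArray(event_type) ... GROUP BY uid, toDate(eventTime)
events_by_key = defaultdict(list)
for uid, event_type in ROWS:
    events_by_key[(DAY, uid)].append(event_type)

totals = defaultdict(lambda: {"level1_pv": 0, "level2_pv": 0, "level1_uv": 0, "level2_uv": 0})
for (day, uid), events in events_by_key.items():
    t = totals[day]
    t["level1_pv"] += events.count("浏览")                               # arrayCount(x -> x = '浏览', events)
    t["level2_pv"] += events.count("点击") if "浏览" in events else 0    # clicks count only if the user also viewed
    t["level1_uv"] += int("浏览" in events)                              # hasAll(events, ['浏览'])
    t["level2_uv"] += int({"浏览", "点击"} <= set(events))               # hasAll(events, ['浏览','点击'])

print(dict(totals))
```

Because only membership in the array is checked, neither event order nor a time window plays any role here. That is the difference from the windowFunnel version in section 2.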
