SQL-KPI的线上分析处理(OLAP)

案例 MovieNow

一家虚构的在线影视租赁公司 MovieNow,它拥有电影信息(类型和主演)以及(用户信息,用户评分)。它需要决定:投资于哪些新电影?

BI(商业智能目标)

  • 运营决策的数据
    • 根据演员的流行度,决定投资哪一部电影。
    • 根据上月的收入,辅助决定,短期的投资预算
  • 策略决策的数据
    • 在哪国更受欢迎,用来决定市场扩张
    • 长线的投资及收入

KPI(关键绩效指标)

  • 租赁总收入
  • 用户满意度:所有电影的平均评分
  • 活跃用户总数:用户参与度

OLAP(联机分析处理)

联机分析处理(英语:Online analytical processing),简称OLAP (/ˈoʊlæp/),是计算机技术中快速解决多维分析问题(英语:multi-dimensional analytical)(MDA)的一种方法。[1] OLAP是更广泛的商业智能范畴的一部分,它还包括关系数据库、报告编写和数据挖掘。[2] OLAP的典型应用包括销售业务报告(英语:business reporting)、市场营销、管理报告、业务流程管理(BPM)[3]、预算和预测、财务报表以及类似领域,新的应用正在出现,如农业。[4] 术语“OLAP”是对传统数据库术语“联机事务处理”(OLTP)稍微修改而成的。[5]

下面主要是上卷(roll-up)和切片(slicing)的例子。

上卷:汇集所有的电影租赁数据、用户评分,查询该租赁网站用户偏好的趋势。
切片:根据不同维度,国家、性别对演员的评分进行筛选查看。

ROLLUP运算符

SELECT 
    c.country, 
    m.genre, 
    AVG(r.rating) AS avg_rating, 
    COUNT(*) AS num_rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY ROLLUP(c.country, m.genre)
ORDER BY c.country, m.genre;

按照租赁用户所属国家、电影的类型来查看平均评分及评分数目。

相比与 GOURP BY , ROLLUP 包含了所有电影的平均评分,不同国家不限类型电影的平均评分,不同类型电影的平均评分。

CUBE运算符

SELECT 
    c.country, 
    m.genre, 
    AVG(r.rating) AS avg_rating 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY CUBE(country, genre);

按照租赁用户所属国家、电影的类型来查看平均评分及评分数目。

ROLLUP 结果是 88条结果,CUBE 得到 96 条结果,多出来的结果,是因为 CUBE 生成所选列的所有合计组合。而 ROLLUP 是生成所选列的层级聚合结果。

ROLLUP (YEAR, MONTH, DAY)

With a ROLLUP, it will have the following outputs:

YEAR, MONTH, DAY
YEAR, MONTH
YEAR
()
With CUBE, it will have the following:

YEAR, MONTH, DAY
YEAR, MONTH
YEAR, DAY
YEAR
MONTH, DAY
MONTH
DAY
()

GROUPING SET运算符

SELECT 
    nationality, 
    gender, 
    COUNT(*) 
FROM actors
GROUP BY GROUPING SETS ((nationality), (gender), ()); 

GROUP SET 最灵活

可以返回不同的 GROUP BY 检索的集合

完整BI例子

问题: 投资哪些新电影?需要考虑,新电影比旧电影更贵。

  • 用户对新旧电影的评分差异?
  • 不同国家用户之间,对新旧电影的评分,有差异吗?

表联结(Join data)

联结三张表,分别是 renting/customers/movies

SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id;

选择相关记录(Select relevant records)

SELECT *
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01';

聚集数据(Aggregation)

SELECT c.country,
m.year_of_release,
COUNT(*) AS n_rentals,
COUNT(DISTINCT r.movie_id) AS n_movies,
AVG(rating) AS avg_rating
FROM renting AS r
LEFT JOIN customers AS c
ON c.customer_id = r.customer_id
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN (
SELECT movie_id
FROM renting
GROUP BY movie_id
HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
GROUP BY ROLLUP (m.year_of_release, c.country)
ORDER BY c.country, m.year_of_release;

按照不同的年份、国家来查看租赁次数、电影总数和平均评分

SELECT genre,
       AVG(rating) AS avg_rating,
       COUNT(rating) AS n_rating,
       COUNT(*) AS n_rentals,     
       COUNT(DISTINCT m.movie_id) AS n_movies 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 3 )
AND r.date_renting >= '2018-01-01'
GROUP BY genre
ORDER BY avg_rating DESC; 

筛选条件是 2018 年以后的租赁记录,评分超过 3 次的电影。
根据类型 genre 分组,按照平均评分降序排序

SELECT a.nationality,
       a.gender,
       AVG(r.rating) AS avg_rating,
       COUNT(r.rating) AS n_rating,
       COUNT(*) AS n_rentals,
       COUNT(DISTINCT a.actor_id) AS n_actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
GROUP BY CUBE(nationality, gender); 

哪些国家/性别的演员更受欢迎?
根据国籍、性别,查看演员参演的电影之中,所获得的平均评分、租赁字数、演员数目等等。

参考资料:

DataCamp
SQL中ROLLUP、CUBE的用法和区别
understanding-the-differences-between-cube-and-rollup

你可能感兴趣的:(SQL-KPI的线上分析处理(OLAP))