Two SQL Syntaxes for Computing Percentage Share (Date Function Edition)

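An earlier post on this blog, 《SQL实现占比、同比、环比指标分析》 (computing share, year-over-year, and period-over-period metrics in SQL), listed two ways to compute a percentage share in MySQL and Oracle: forming a Cartesian product with ON 1 = 1, or with CROSS JOIN.

The basic syntax is as follows: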

SELECT
  `status`,
  number,
  concat(round(number / total * 100.00, 2), '%') percent
FROM
  (
    SELECT
      *
    FROM
      (
        SELECT
          `status`,
          COUNT(1) number
        FROM
          `user_tasks`
        GROUP BY
          `status`
      ) t1
      INNER JOIN(
        SELECT
          COUNT(1) total
        FROM
          `user_tasks`
      ) t2 ON 1 = 1
  ) t3;
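
The text mentions CROSS JOIN as the other way to build the Cartesian product, but only the ON 1 = 1 form is shown. A minimal sketch of the CROSS JOIN form, assuming the same user_tasks table and MySQL syntax (the flattened layout without the extra wrapping subquery is a choice made here, not taken from the earlier post):

```sql
-- Share of each status in user_tasks, using CROSS JOIN instead of ON 1 = 1.
SELECT
  t1.`status`,
  t1.number,
  concat(round(t1.number / t2.total * 100.00, 2), '%') AS percent
FROM
  -- one row per status
  (SELECT `status`, COUNT(1) AS number FROM `user_tasks` GROUP BY `status`) t1
  CROSS JOIN
  -- exactly one row holding the grand total
  (SELECT COUNT(1) AS total FROM `user_tasks`) t2;
```

Because t2 has exactly one row, the Cartesian product simply attaches the grand total to every status row.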

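This pattern handles most share calculations, but care is needed when the share is computed over groups defined by time.

Take the following example.

Create the table: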

CREATE TABLE `order` (
  `order_id` int(11) NOT NULL,
  `order_time` datetime DEFAULT NULL,
  `order_num` int(11) DEFAULT NULL,
  PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert the data:

INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (1, '2019-01-02 15:02:42', 100);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (2, '2019-01-24 15:03:18', 200);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (3, '2018-01-04 15:03:37', 50);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (4, '2018-01-26 15:12:12', 120);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (5, '2019-02-01 15:12:48', 300);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (6, '2018-02-20 15:12:58', 180);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (7, '2019-03-12 15:13:08', 260);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (8, '2018-03-22 15:13:14', 220);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (9, '2019-04-17 15:13:27', 350);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (10, '2018-04-19 15:13:59', 280);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (11, '2019-04-17 15:21:45', 260);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (12, '2019-05-21 15:21:54', 200);
INSERT INTO `order`(`order_id`, `order_time`, `order_num`) VALUES (13, '2018-05-10 15:22:03', 220);

The table records order volumes for January through May of 2018 and of 2019.

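2. Pitfall: not grouping by month and by year separately

For this summary, the monthly figures must be grouped by month and the yearly totals by year. Once both are available, we join them with a Cartesian product and look at the result.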
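
As a sketch of what "grouped separately" means here, the two aggregations can be run on their own before any join is attempted. The queries below are the same subqueries used in the joins later in this post, only executed standalone:

```sql
-- Monthly totals: one row per year-month.
SELECT DATE_FORMAT( order_time, '%Y-%m' ) AS MONTH, sum( order_num ) AS number
FROM `order`
GROUP BY MONTH;

-- Yearly totals: one row per year.
SELECT DATE_FORMAT( order_time, '%Y' ) AS YEAR, sum( order_num ) AS total
FROM `order`
GROUP BY YEAR;
```

With the sample rows inserted above, the yearly totals work out to 1070 for 2018 and 1670 for 2019, and the monthly query returns ten rows (five months in each year).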

3. Pitfall: misaligned Cartesian product

SELECT
	* 
FROM
	( SELECT DATE_FORMAT( order_time, '%Y-%m' ) AS MONTH, 
	sum( order_num ) AS number FROM `order` 
	GROUP BY MONTH ) t1
	JOIN 
	( SELECT DATE_FORMAT( order_time, '%Y' ) AS YEAR,
	sum( order_num ) AS total FROM `order` 
	GROUP BY YEAR ) t2 
	ON 1 = 1

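In the result, the aggregated number and total values are themselves correct: one is grouped by month and the other by year. The Cartesian join, however, pairs them up incorrectly, so each month appears against both years.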

This happens because the join is missing a real join condition.
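
One quick way to see the misalignment is to count the joined rows. This is only an illustrative check (the joined_rows alias is made up for this sketch):

```sql
-- Count the rows produced by the join that has no real condition.
SELECT COUNT(*) AS joined_rows
FROM
	( SELECT DATE_FORMAT( order_time, '%Y-%m' ) AS MONTH, sum( order_num ) AS number
	  FROM `order` GROUP BY MONTH ) t1
	JOIN
	( SELECT DATE_FORMAT( order_time, '%Y' ) AS YEAR, sum( order_num ) AS total
	  FROM `order` GROUP BY YEAR ) t2
	ON 1 = 1;
```

With the sample data there are 10 monthly rows and 2 yearly rows, so the count is 20 rather than the 10 rows a correctly aligned result would have.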

4. Pitfall: the added condition fails

SELECT
	* 
FROM
	( SELECT DATE_FORMAT( order_time, '%Y-%m' ) AS MONTH, sum( order_num ) AS number FROM `order` 
	GROUP BY MONTH ) t1
	JOIN 
	( SELECT DATE_FORMAT( order_time, '%Y' ) AS YEAR,sum( order_num ) AS total FROM `order` 
	GROUP BY YEAR ) t2 
	ON 1 = 1 AND date_format( t1.MONTH, '%Y' ) = t2.YEAR 

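Here we add a condition requiring the year on both sides to match, yet the result comes back empty. Why?

Let us look at date_format( t1.MONTH, '%Y' ) next to the other values to see what is wrong and why the rows cannot be joined.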

SELECT
	t1.month,date_format( t1.MONTH, '%Y' ),t2.YEAR 
FROM
	( SELECT DATE_FORMAT( order_time, '%Y-%m' ) AS MONTH, 
	sum( order_num ) AS number FROM `order` 
	GROUP BY MONTH ) t1
	JOIN 
	( SELECT DATE_FORMAT( order_time, '%Y' ) AS YEAR,
	sum( order_num ) AS total FROM `order` 
	GROUP BY YEAR ) t2 
	ON 1 = 1 

So the culprit is date_format( t1.MONTH, '%Y' ): it does not return the year.

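The reason is that once the timestamp has been formatted down to a year-month string, a second DATE_FORMAT call no longer recognizes the value as a date. The fix is string concatenation: append a day part to the month, in the same format that the first formatting produced.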
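
The difference can be checked in isolation with literal values. A small sketch, assuming MySQL (the column aliases are made up for this illustration):

```sql
SELECT
	-- '2019-01' has no day part, so it is not a complete date;
	-- this is the form that failed to yield the year in the join above
	DATE_FORMAT( '2019-01', '%Y' ) AS without_day,
	-- appending '-01' gives the complete date '2019-01-01', which formats to '2019'
	DATE_FORMAT( CONCAT( '2019-01', '-01' ), '%Y' ) AS with_day;
```

Any valid day would do; '-01' is simply the one day that exists in every month.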

For example, date_format( concat( t1.MONTH, '-01' ), '%Y' ) works, because the value is once again recognizable as a date.
The correct syntax is therefore:

SELECT 
	MONTH,
	YEAR,
	number,
	concat( round( number / total * 100.00, 2 ), '%' ) percent 
FROM
	(
SELECT
	* 
FROM
	( SELECT DATE_FORMAT( order_time, '%Y-%m' ) AS MONTH, 
	sum( order_num ) AS number FROM `order` 
	GROUP BY MONTH ) t1
	JOIN 
	( SELECT DATE_FORMAT( order_time, '%Y' ) AS YEAR,
	sum( order_num ) AS total FROM `order` 
	GROUP BY YEAR ) t2 
	ON 1 = 1
	AND date_format( concat( t1.MONTH, '-01' ), '%Y' ) = t2.YEAR 
	) t3;

In the result, each month is now matched with its own year.

Second approach

SELECT
	b.YEAR,
	a.MONTH,
	a.num / b.num AS ratio  -- share of the yearly total (originally aliased 完成率)
FROM
	-- a: one row per year-month; b: one row per year
	( SELECT sum( order_num ) num, LEFT ( order_time, 7 ) AS MONTH FROM `order` GROUP BY LEFT ( order_time, 7 ) ) a,
	( SELECT sum( order_num ) num, LEFT ( order_time, 4 ) AS YEAR FROM `order` GROUP BY LEFT ( order_time, 4 ) ) b 
WHERE
	LEFT ( a.MONTH, 4 ) = b.YEAR
-- no outer GROUP BY is needed: each month already matches exactly one year

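The effect is the same, except that the share is returned as a raw ratio rather than a formatted percentage string.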
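
To line this approach up with the first one, here is a sketch of the same LEFT-based join that also formats the share as a percentage string. The explicit JOIN ... ON form, the yr alias, and the ORDER BY are choices made for this illustration, not part of the original query:

```sql
SELECT
	b.yr AS YEAR,
	a.MONTH,
	a.num AS number,
	concat( round( a.num / b.num * 100.00, 2 ), '%' ) AS percent
FROM
	-- one row per year-month, e.g. '2019-04'
	( SELECT sum( order_num ) AS num, LEFT ( order_time, 7 ) AS MONTH
	  FROM `order` GROUP BY LEFT ( order_time, 7 ) ) a
	JOIN
	-- one row per year, e.g. '2019'
	( SELECT sum( order_num ) AS num, LEFT ( order_time, 4 ) AS yr
	  FROM `order` GROUP BY LEFT ( order_time, 4 ) ) b
	ON LEFT ( a.MONTH, 4 ) = b.yr
ORDER BY
	a.MONTH;
```

With the sample data, April 2019 totals 350 + 260 = 610 against a yearly total of 1670, so its share should come out as 36.53%.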

Hive syntax

SELECT 
	month_order,
	year_order,
	number,
	concat( round( number / total * 100.00, 2 ), '%' ) percent 
FROM
	(
SELECT
	* 
FROM
	( SELECT substr( order_time, 1,7 ) AS month_order, sum( order_num ) AS number FROM `order` 
	GROUP BY substr( order_time, 1,7 ) ) t1
	JOIN 
	-- 'yyyy' is the calendar-year pattern; uppercase 'Y' is the week-based year
	( SELECT DATE_FORMAT( order_time, 'yyyy' ) AS year_order,sum( order_num ) AS total FROM `order` 
	GROUP BY  DATE_FORMAT( order_time, 'yyyy' )) t2 
	ON 1 = 1
	AND date_format( concat( t1.month_order, '-01' ), 'yyyy' ) = t2.year_order 
	) t3;
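
Since the month is already obtained with substr, the year can be obtained the same way, which avoids formatting a date twice. A hedged sketch for Hive, assuming order_time is stored as (or implicitly converts to) a 'yyyy-MM-dd HH:mm:ss' string; this is an alternative layout, not the original author's query:

```sql
SELECT
	t1.month_order,
	t2.year_order,
	t1.number,
	concat( round( t1.number / t2.total * 100.00, 2 ), '%' ) AS percent
FROM
	-- first 7 characters: 'yyyy-MM'
	( SELECT substr( order_time, 1, 7 ) AS month_order, sum( order_num ) AS number
	  FROM `order` GROUP BY substr( order_time, 1, 7 ) ) t1
	JOIN
	-- first 4 characters: 'yyyy'
	( SELECT substr( order_time, 1, 4 ) AS year_order, sum( order_num ) AS total
	  FROM `order` GROUP BY substr( order_time, 1, 4 ) ) t2
	-- compare the year prefix of the month string with the year string
	ON substr( t1.month_order, 1, 4 ) = t2.year_order;
```

Joining on plain string prefixes keeps the condition independent of any date-pattern rules.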
	
