Hive 练习 简单任务处理

1、2018年4月份的用户数、订单量、销量、GMV (不局限与这些统计量,你也可以自己想一些)

-- -- -- 2018年4月份的用户数量
select
	count(a.user_id) as user_nums
from
	(
		select
			user_id
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-04-01'
			and sale_ord_dt <= '2018-04-30'
			and sale_ord_dt >= '2018-04-01'
		group by
			user_id
	)
	a;
-- 2018年4月份的订单量
select
	count(a.sale_ord_id) as sale_nums
from
	(
		select
			sale_ord_id
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-04-01'
			and sale_ord_dt <= '2018-04-30'
			and sale_ord_dt >= '2018-04-01'
		group by
			sale_ord_id
	)
	a;
-- -- 2018年4月份的销量
select
	sum(COALESCE(sale_qtty, 0)) as xiaoliang
from
	app_jypt_m04_ord_det_di
where
	dt >= '2018-04-01'
	and sale_ord_dt <= '2018-04-30'
	and sale_ord_dt >= '2018-04-01';
-- -- -- 2018年4月份的销售额GMV
-- user_payable_pay_amount 用户应付金额
select
	sum(user_payable_pay_amount) as xiaoshoujine
from
	app_jypt_m04_ord_det_di
where
	dt >= '2018-04-01'
	and sale_ord_dt <= '2018-04-30'
	and sale_ord_dt >= '2018-04-01';

  

 

PS: 

  • 订单数就是卖了几单 ;
  • 销量就是卖了多少件,一个订单中可能卖出一件或多件;
  • GMV: Gross Merchandise Volume,是成交总额(一定时间段内)的意思。
  • 在电商网站定义里面是网站成交金额。这个实际指的是拍下订单金额, 包含付款和未付款的部分。

 

2、上述这些变化量相对3月份的变化


3、计算2018年4月1号的新用户数量(之前半年未购买的用户为新用户)

 

-- 计算2018年4月1号的新用户数量(之前半年未购买的用户为新用户)
-- 首先找出4月1号的用户的xxx,然后统计半年内有过购买记录的用户yyy。


-- select distinct user_id as xxx from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01';
-- select distinct user_id as yyy from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01';

-- 用xxx-yyy,然后count()计算数量;



-- 两种方法,一种用not in ,一种用not exists

-- not in 方法
select distinct user_id from gdm_m04_ord_det_sum 
where user_id not in (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01');



-- not exists 方法
select distinct user_id from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01' where not exists (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01' where gdm_m04_ord_det_sum.user_id=gdm_m04_ord_det_sum.user_id);



-- 另一种 left outer join 这样效率更高 语法有问题??

select distinct user_id from gdm_m04_ord_det_sum where dt>='2018-04-01' and sale_ord_dt='2018-04-01' a left outer join (select distinct user_id from gdm_m04_ord_det_sum where dt>='2017-10-01' and sale_ord_dt<='2018-03-31' and sale_ord_dt>='2017-10-01' b) on a.user_id=b.user_id where b.user_id is null;

 

正确方法:

select
	count(a.id1) as user_new_nums
from
	(
		select distinct
			user_id as id1
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2018-04-01'
			and sale_ord_dt = '2018-04-01'
	)
	a
left outer join
	(
		select distinct
			user_id as id2
		from
			app_jypt_m04_ord_det_di
		where
			dt >= '2017-10-01'
			and sale_ord_dt <= '2018-03-31'
			and sale_ord_dt >= '2017-10-01'
	)
	b
on
	a.id1 = b.id2
where
	b.id2 is null;

  

你可能感兴趣的:(Hive 练习 简单任务处理)