一、常用日期函数
1. unix_timestamp:返回当前或指定时间的时间戳
select unix_timestamp();
select unix_timestamp("2020-10-28",'yyyy-MM-dd');
2. from_unixtime:将时间戳转为日期格式
select from_unixtime(1603843200);
3. current_date:当前日期
select current_date;
4. current_timestamp:当前的日期加时间
select current_timestamp;
5. to_date:抽取日期部分
select to_date('2020-10-28 12:12:12');
6. year:获取年
select year('2020-10-28 12:12:12');
7. month:获取月
select month('2020-10-28 12:12:12');
8. day:获取日
select day('2020-10-28 12:12:12');
9. hour:获取时
select hour('2020-10-28 12:13:14');
10. minute:获取分
select minute('2020-10-28 12:13:14');
11. second:获取秒
select second('2020-10-28 12:13:14');
12. weekofyear:当前时间是一年中的第几周
select weekofyear('2020-10-28 12:12:12');
13. dayofmonth:当前时间是一个月中的第几天
select dayofmonth('2020-10-28 12:12:12');
14. months_between: 两个日期间的月份
select months_between('2020-04-01','2020-10-28');
15. add_months:日期加减月
select add_months('2020-10-28',-3);
16. datediff:两个日期相差的天数
select datediff('2020-11-04','2020-10-28');
17. date_add:日期加天数
select date_add('2020-10-28',4);
18. date_sub:日期减天数
select date_sub('2020-10-28',-4);
19. last_day:日期的当月的最后一天
select last_day('2020-02-30');
20. date_format(): 格式化日期
select date_format('2020-10-28 12:12:12','yyyy/MM/dd HH:mm:ss');
二、常用取整函数
21. round: 四舍五入
select round(3.14);
select round(3.54);
22. ceil: 向上取整
select ceil(3.14);
select ceil(3.54);
floor: 向下取整
23. select floor(3.14);
select floor(3.54);
相除后取小数点两位
cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
sum(if(action_id='cart_add',1,0))
三、常用字符串操作函数
24. upper: 转大写
select upper('low');
25. lower: 转小写
select lower('low');
26. length: 长度
select length("atguigu");
27. trim: 前后去空格
select trim(" atguigu ");
28. lpad: 向左补齐,到指定长度
select lpad('atguigu',9,'g');
29. rpad: 向右补齐,到指定长度
select rpad('atguigu',9,'g');
30. regexp_replace:使用正则表达式匹配目标字符串,匹配成功后替换!
SELECT regexp_replace('2020/10/25', '/', '-');
replace 替换
四、集合操作
31. size: 集合中元素的个数
select size(friends) from test3;
32. map_keys: 返回map中的key
select map_keys(children) from test3;
33. map_values: 返回map中的value
select map_values(children) from test3;
34. array_contains: 判断array中是否包含某个元素
select array_contains(friends,'bingbing') from test3;
sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
if(array_contains(collect_set(is_new),'0'),'0','1') is_new,--ods_page_log中,同一天内,同一设备的is_new字段,可能全部为1,可能全部为0,也可能部分为0,部分为1(卸载重装),故做该处理
35. sort_array: 将array中的元素排序
select sort_array(friends) from test3;
37、nvl 条件判断是否为空
nvl (isnll1,isnull2) 如果isnll1是null,就用null2
hive (gmall)> select nvl(1,0);1
hive (gmall)> select nvl(null,"hello");hello
38、coalesce等同于nvl功能
used_time为空第二个expire_time,第二个为空就用最后一个
coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
39、get_json_object 操作字符串
SELECT get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]',"$[0].age");
40、转成json字符串
str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=')
-- 将=连接起来,字符串按,分割开,最后以,分隔开,等号转:{“”:""}
41、sum
sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count
sum(if(appraise='1201',1,0))
sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_using_count,
42、last_value
首先,按照课程c_id进行分组,按照成绩s_core进行排序 按照 c_id分组,s_score从小到大进行排序的话,数据不会聚合丢失
select *, row_number() over( partition by c_id order by s_score) rank_show from score
43 、FIRST_VALUE()
接着查看FIRST_VALUE()和LAST_VALUE(),它们的字面意思很直观了,就是取首尾记录值。同样sum()也应该是取对应分组后的值
select * ,first_value(s_id) over (partition by c_id order by s_score) first_show from score;
其实可以这样去理解:last_value()默认统计范围是 rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的比较。
那么,如何像**first_value()**那样直接在每行数据中显示最后的那个数据呢?
在order by 条件的后面加上语句:rows between unbounded preceding and unbounded following
select * ,last_value(s_id) over (partition by c_id order by s_score rows between unbounded preceding and unbounded following
)last_show from score;
原文链接:https://blog.csdn.net/yilulvxing/article/details/86627969
总结
1:first_value()的结果容易理解,直接在结果的所有行记录中输出同一个满足条件的首个记录;
2:last_value()默认统计范围是 rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的比较,如果需要在结果的所有行记录中输出同一个满足条件的最后一个记录,在order by 条件的后面加上语句:rows between unbounded preceding and unbounded following。
3:sum( ) over(…):默认统计范围是 rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的求和,如果需要在结果的所有行记录中输出同一个满足条件的所有记录之和,在order by 条件的后面加上语句:rows between unbounded preceding and unbounded following。
select * ,sum(s_id) over (partition by c_id order by s_score)sum_show from score;
select * ,sum(s_id) over (partition by c_id order by s_score rows between unbounded preceding and unbounded following)sum_show from score;
44、上次购买时间和下次购买时间lag、lead
lag(orderdate ,1 ,'1970-01-01') over(partition by name order by orderdate) p_orderdate,
lead(orderdate ,1 ,'9999-01-01') over(partition by name order by orderdate) p_orderdate
链接表
full outer join、union all、join、left join 、union
t1 join t3 on t1.mid_id=t3.mid_id and t1.brand=t3.brand and t1.model=t3.model;
(列转行)collect_list / collect_set/ explode
Hive中collect相关的函数有collect_list和collect_set。
它们都是将分组中的某列转为一个数组返回,不同的是collect_list(列传行)不去重而collect_set去重。一般和group by一起使用
1 ["string1" ,"string2"]
explode(): 将数组或者map拆分成多行 炸开
行转列
1) 相关函数:
concat(order_status,'=',operate_time)
sa sb ==》 "sa=sb"
concat(user_id,'-',last_value(session_start_point,true) over(partition by user_id order by ts)) session_id
concat(): 字符串拼接
concat_ws(“,”,concat(order_status,'=',operate_time)) //将拼接的字符串一,隔开
`order_detail_stats` array sku_num:bigint,order_count:bigint,activity_reduce_amount:decimal(16,2),coupon_reduce_amount:decimal(16,2),original_amount:decimal(16,2),final_amount:decimal(16,2)>> COMMENT '下单明细统计' collect_set(named_struct('sku_id',sku_id, 'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats 五、多维分析 36. grouping sets:多维分析 create table testgrouping ( id int, name string, sex string, deptno int ) row format delimited fields terminated by ','; 1001,zhangsan,man,10 1002,xiaohua,female,10 1003,lisi,man,20 1004,xiaohong,female,20 需求: 统计每个部门各多少人, 男女各多少人, 每个部门中男女各多少人 select deptno, sex ,count(id) from testgrouping group by deptno,sex grouping sets( (deptno,sex), sex , deptno ) 5. case when then else end 1) 数据 +---------------+------------------+--------------+ | emp_sex.name | emp_sex.dept_id | emp_sex.sex | +---------------+------------------+--------------+ | 悟空 | A | 男 | | 大海 | A | 男 | | 宋宋 | B | 男 | | 凤姐 | A | 女 | | 婷姐 | B | 女 | | 婷婷 | B | 女 | +---------------+------------------+--------------+ 2) 需求 dept_Id 男 女 A 2 1 B 1 2 3) SQL select dept_Id, sum(case sex when '男' then 1 else 0 end ) man, sum(case sex when '女' then 1 else 0 end ) female from emp_sex group by dept_Id ; select dept_Id, sum(if(sex='男',1,0)) man, sum(if(sex='女',1,0)) female from emp_sex group by dept_Id 1. 行转列 1) 相关函数: concat(order_status,'=',operate_time) concat(): 字符串拼接 concat_ws(): 字符串拼接 , CONCAT_WS must be "string or array collect_set(): 去重汇总 collect_list(): 汇总 2) 数据 +-------------------+----------------------------+-------------------------+ | person_info.name | person_info.constellation | person_info.blood_type | +-------------------+----------------------------+-------------------------+ | 孙悟空 | 白羊座 | A | | 大海 | 射手座 | A | | 宋宋 | 白羊座 | B | | 猪八戒 | 白羊座 | A | | 凤姐 | 射手座 | A | | 苍老师 | 白羊座 | B | +-------------------+----------------------------+-------------------------+ 3)需求 射手座,A 大海|凤姐 白羊座,A 孙悟空|猪八戒 白羊座,B 宋宋|苍老师 4)SQL ① 将 constellation 和 blood_type 拼接到一起 select name, concat_ws(',',constellation,blood_type) c_b from person_info =>t1 ② 按照 c_b分组,在组内将name进行汇总 select t1.c_b, concat_ws('|',collect_set(t1.name)) names from (select name, concat_ws(',',constellation,blood_type) c_b from person_info)t1 group by t1.c_b ======================================================================= 1) 相关函数 explode(): 将数组或者map拆分成多行 LATERAL VIEW : 侧写表(虚拟表) 2) 数据 +-------------------+----------------------+ | movie_info.movie | movie_info.category | +-------------------+----------------------+ | 《疑犯追踪》 | 悬疑,动作,科幻,剧情 | | 《Lie to me》 | 悬疑,警匪,动作,心理,剧情 | | 《战狼2》 | 战争,动作,灾难 | +-------------------+----------------------+ 3) 结果 《疑犯追踪》 悬疑 《疑犯追踪》 动作 《疑犯追踪》 科幻 《疑犯追踪》 剧情 《Lie to me》 悬疑 《Lie to me》 警匪 《Lie to me》 动作 《Lie to me》 心理 《Lie to me》 剧情 《战狼2》 战争 《战狼2》 动作 《战狼2》 灾难 4) SQL select movie , category_name from movie_info LATERAL view explode(split(category,',')) movie_info_tmp as category_name ; ========================================================================= 3. 窗口函数 1) 数据 +----------------+---------------------+----------------+ | business.name | business.orderdate | business.cost | +----------------+---------------------+----------------+ | jack | 2017-01-01 | 10 | | tony | 2017-01-02 | 15 | | jack | 2017-02-03 | 23 | | tony | 2017-01-04 | 29 | | jack | 2017-01-05 | 46 | | jack | 2017-04-06 | 42 | | tony | 2017-01-07 | 50 | | jack | 2017-01-08 | 55 | | mart | 2017-04-08 | 62 | | mart | 2017-04-09 | 68 | | neil | 2017-05-10 | 12 | | mart | 2017-04-11 | 75 | | neil | 2017-06-12 | 80 | | mart | 2017-04-13 | 94 | +----------------+---------------------+----------------+ 2)需求 需求一: 查询在2017年4月份购买过的顾客及总人数 ① 查出2017年4月份购买过的顾客 select name, orderdate, cost from business where month(orderdate) = '4' select name, orderdate, cost from business where substring(orderdate,0,7) = '2017-04' =>t1 +-------+-------------+-------+ | name | orderdate | cost | +-------+-------------+-------+ | jack | 2017-04-06 | 42 | | mart | 2017-04-08 | 62 | | mart | 2017-04-09 | 68 | | mart | 2017-04-11 | 75 | | mart | 2017-04-13 | 94 | +-------+-------------+-------+ ② 按照name分组,求count select t1.name, count(t1.name) over() from (select name, orderdate, cost from business where substring(orderdate,0,7) = '2017-04')t1 group by t1.name ; 第二种方式: select distinct(t1.name) from (select name, orderdate, cost from business where substring(orderdate,0,7) = '2017-04')t1 +----------+ | t1.name | +----------+ | jack | | mart | +----------+ select t2.name, count(t2.name) over() from (select distinct(t1.name) from (select name, orderdate, cost from business where substring(orderdate,0,7) = '2017-04')t1) t2 +----------+-----------------+ | t2.name | count_window_0 | +----------+-----------------+ | mart | 2 | | jack | 2 | +----------+-----------------+ 需求二: 查询顾客的购买明细及所有顾客的月购买总额 select name, orderdate, cost , sum(cost) over(partition by substring(orderdate,0,7)) month_cost from business +-------+-------------+-------+-------------+ | name | orderdate | cost | month_cost | +-------+-------------+-------+-------------+ | jack | 2017-01-01 | 10 | 205 | | jack | 2017-01-08 | 55 | 205 | | tony | 2017-01-07 | 50 | 205 | | jack | 2017-01-05 | 46 | 205 | | tony | 2017-01-04 | 29 | 205 | | tony | 2017-01-02 | 15 | 205 | | jack | 2017-02-03 | 23 | 23 | | mart | 2017-04-13 | 94 | 341 | | jack | 2017-04-06 | 42 | 341 | | mart | 2017-04-11 | 75 | 341 | | mart | 2017-04-09 | 68 | 341 | | mart | 2017-04-08 | 62 | 341 | | neil | 2017-05-10 | 12 | 12 | | neil | 2017-06-12 | 80 | 80 | +-------+-------------+-------+-------------+ 需求二: 查询顾客的购买明细及每个顾客的月购买总额 select name, orderdate, cost, sum(cost) over(partition by name, substring(orderdate,0,7)) name_month_cost from business +-------+-------------+-------+------------------+ | name | orderdate | cost | name_month_cost | +-------+-------------+-------+------------------+ | jack | 2017-01-05 | 46 | 111 | | jack | 2017-01-08 | 55 | 111 | | jack | 2017-01-01 | 10 | 111 | | jack | 2017-02-03 | 23 | 23 | | jack | 2017-04-06 | 42 | 42 | | mart | 2017-04-13 | 94 | 299 | | mart | 2017-04-11 | 75 | 299 | | mart | 2017-04-09 | 68 | 299 | | mart | 2017-04-08 | 62 | 299 | | neil | 2017-05-10 | 12 | 12 | | neil | 2017-06-12 | 80 | 80 | | tony | 2017-01-04 | 29 | 94 | | tony | 2017-01-02 | 15 | 94 | | tony | 2017-01-07 | 50 | 94 | +-------+-------------+-------+------------------+ 需求三: 将每个顾客的cost按照日期进行累加 select name, orderdate, cost, sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and CURRENT ROW) lj from business select name, orderdate, cost, sum(cost) over(partition by name order by orderdate ) lj from business +-------+-------------+-------+------+ | name | orderdate | cost | lj | +-------+-------------+-------+------+ | jack | 2017-01-01 | 10 | 10 | | jack | 2017-01-05 | 46 | 56 | | jack | 2017-01-08 | 55 | 111 | | jack | 2017-02-03 | 23 | 134 | | jack | 2017-04-06 | 42 | 176 | | mart | 2017-04-08 | 62 | 62 | | mart | 2017-04-09 | 68 | 130 | | mart | 2017-04-11 | 75 | 205 | | mart | 2017-04-13 | 94 | 299 | | neil | 2017-05-10 | 12 | 12 | | neil | 2017-06-12 | 80 | 92 | | tony | 2017-01-02 | 15 | 15 | | tony | 2017-01-04 | 29 | 44 | | tony | 2017-01-07 | 50 | 94 | +-------+-------------+-------+------+ 需求三: 将所有顾客的cost按照日期进行累加 select name, orderdate, cost, sum(cost) over(order by orderdate ) lj from business +-------+-------------+-------+------+ | name | orderdate | cost | lj | +-------+-------------+-------+------+ | jack | 2017-01-01 | 10 | 10 | | tony | 2017-01-02 | 15 | 25 | | tony | 2017-01-04 | 29 | 54 | | jack | 2017-01-05 | 46 | 100 | | tony | 2017-01-07 | 50 | 150 | | jack | 2017-01-08 | 55 | 205 | | jack | 2017-02-03 | 23 | 228 | | jack | 2017-04-06 | 42 | 270 | | mart | 2017-04-08 | 62 | 332 | | mart | 2017-04-09 | 68 | 400 | | mart | 2017-04-11 | 75 | 475 | | mart | 2017-04-13 | 94 | 569 | | neil | 2017-05-10 | 12 | 581 | | neil | 2017-06-12 | 80 | 661 | +-------+-------------+-------+------+ 需求三: 求所有顾客的购买明细及按照日期进行排序后 求 所有顾客的cost 第一行 到 当前行 累加 所有顾客的cost 上一行 到 当前行 的累加和 所有顾客的cost 上一行 到 下一行 的累加和 所有顾客的cost 当前行 到 下一行 的累加和 所有顾客的cost 当前行 到 最后一行的累加和 select name, orderdate, cost, sum(cost) over(order by orderdate rows between UNBOUNDED PRECEDING and CURRENT ROW) f_c, sum(cost) over(order by orderdate rows between 1 PRECEDING and CURRENT ROW ) p_c, sum(cost) over(order by orderdate rows between 1 PRECEDING and 1 FOLLOWING ) p_n, sum(cost) over(order by orderdate rows between CURRENT ROW and 1 FOLLOWING ) c_n, sum(cost) over(order by orderdate rows between CURRENT ROW and UNBOUNDED FOLLOWING ) c_l from business +-------+-------------+-------+------+------+------+------+------+ | name | orderdate | cost | f_c | p_c | p_n | c_n | c_l | +-------+-------------+-------+------+------+------+------+------+ | jack | 2017-01-01 | 10 | 10 | 10 | 25 | 25 | 661 | | tony | 2017-01-02 | 15 | 25 | 25 | 54 | 44 | 651 | | tony | 2017-01-04 | 29 | 54 | 44 | 90 | 75 | 636 | | jack | 2017-01-05 | 46 | 100 | 75 | 125 | 96 | 607 | | tony | 2017-01-07 | 50 | 150 | 96 | 151 | 105 | 561 | | jack | 2017-01-08 | 55 | 205 | 105 | 128 | 78 | 511 | | jack | 2017-02-03 | 23 | 228 | 78 | 120 | 65 | 456 | | jack | 2017-04-06 | 42 | 270 | 65 | 127 | 104 | 433 | | mart | 2017-04-08 | 62 | 332 | 104 | 172 | 130 | 391 | | mart | 2017-04-09 | 68 | 400 | 130 | 205 | 143 | 329 | | mart | 2017-04-11 | 75 | 475 | 143 | 237 | 169 | 261 | | mart | 2017-04-13 | 94 | 569 | 169 | 181 | 106 | 186 | | neil | 2017-05-10 | 12 | 581 | 106 | 186 | 92 | 92 | | neil | 2017-06-12 | 80 | 661 | 92 | 92 | 80 | 80 | +-------+-------------+-------+------+------+------+------+------+ 需求四: 查询每个顾客上次的购买时间 及 下一次的购买时间 select name, cost, orderdate c_orderdate, lag(orderdate ,1 ,'1970-01-01') over(partition by name order by orderdate) p_orderdate, lead(orderdate ,1 ,'9999-01-01') over(partition by name order by orderdate) p_orderdate from business +-------+-------+--------------+--------------+--------------+ | name | cost | c_orderdate | p_orderdate | p_orderdate | +-------+-------+--------------+--------------+--------------+ | jack | 10 | 2017-01-01 | 1970-01-01 | 2017-01-05 | | jack | 46 | 2017-01-05 | 2017-01-01 | 2017-01-08 | | jack | 55 | 2017-01-08 | 2017-01-05 | 2017-02-03 | | jack | 23 | 2017-02-03 | 2017-01-08 | 2017-04-06 | | jack | 42 | 2017-04-06 | 2017-02-03 | 9999-01-01 | | mart | 62 | 2017-04-08 | 1970-01-01 | 2017-04-09 | | mart | 68 | 2017-04-09 | 2017-04-08 | 2017-04-11 | | mart | 75 | 2017-04-11 | 2017-04-09 | 2017-04-13 | | mart | 94 | 2017-04-13 | 2017-04-11 | 9999-01-01 | | neil | 12 | 2017-05-10 | 1970-01-01 | 2017-06-12 | | neil | 80 | 2017-06-12 | 2017-05-10 | 9999-01-01 | | tony | 15 | 2017-01-02 | 1970-01-01 | 2017-01-04 | | tony | 29 | 2017-01-04 | 2017-01-02 | 2017-01-07 | | tony | 50 | 2017-01-07 | 2017-01-04 | 9999-01-01 | +-------+-------+--------------+--------------+--------------+ 需求五: 查询前20%时间的订单信息 select t1.name, t1.orderdate, t1.cost , t1.gid from (select name, orderdate, cost, ntile(5) over(order by orderdate ) gid from business) t1 where t1.gid = 1 ; +----------+---------------+----------+---------+ | t1.name | t1.orderdate | t1.cost | t1.gid | +----------+---------------+----------+---------+ | jack | 2017-01-01 | 10 | 1 | | tony | 2017-01-02 | 15 | 1 | | tony | 2017-01-04 | 29 | 1 | +----------+---------------+----------+---------+ 总结: over(): 会为每条数据都开启一个窗口. 默认的窗口大小就是当前数据集的大小. over(partition by ....) : 会按照指定的字段进行分区, 将分区字段的值相同的数据划分到相同的区。 每个区中的每条数据都会开启一个窗口.每条数据的窗口大小默认为当前分区数据集的大小. over(order by ....) : 会在窗口中按照指定的字段对数据进行排序. 会为每条数据都开启一个窗口,默认的窗口大小为从数据集开始到当前行. over(partition by .... order by ....) :会按照指定的字段进行分区, 将分区字段的值相同的数据划分到相同的区, 在每个区中会按照指定的字段进行排序. 会为每条数据都开启一个窗口,默认的窗口大小为当前分区中从数据集开始到当前行. over(partition by ... order by ... rows between ... and ....) : 指定每条数据的窗口大小. 关键字: order by : 全局排序 或者 窗口函数中排序. distribute by : 分区 sort by : 区内排序 cluster by : 分区排序 partition by : 窗口函数中分区 partitioned by : 建表指定分区字段 clustered by : 建表指定分桶字段 4. 排名函数 1) 相关函数 rank() dense_rank() row_number() 2) 数据 +-------------+----------------+--------------+ | score.name | score.subject | score.score | +-------------+----------------+--------------+ | 孙悟空 | 语文 | 87 | | 孙悟空 | 数学 | 95 | | 孙悟空 | 英语 | 68 | | 大海 | 语文 | 94 | | 大海 | 数学 | 56 | | 大海 | 英语 | 84 | | 宋宋 | 语文 | 64 | | 宋宋 | 数学 | 86 | | 宋宋 | 英语 | 84 | | 婷婷 | 语文 | 65 | | 婷婷 | 数学 | 85 | | 婷婷 | 英语 | 78 | +-------------+----------------+--------------+ 3)需求: 按照学科进行排名 select name, subject, score, rank() over(partition by subject order by score desc ) rk, dense_rank() over(partition by subject order by score desc ) drk , row_number() over(partition by subject order by score desc ) rn from score +-------+----------+--------+-----+------+-----+ | name | subject | score | rk | drk | rn | +-------+----------+--------+-----+------+-----+ | 孙悟空 | 数学 | 95 | 1 | 1 | 1 | | 宋宋 | 数学 | 86 | 2 | 2 | 2 | | 婷婷 | 数学 | 85 | 3 | 3 | 3 | | 大海 | 数学 | 56 | 4 | 4 | 4 | | 宋宋 | 英语 | 84 | 1 | 1 | 1 | | 大海 | 英语 | 84 | 1 | 1 | 2 | | 婷婷 | 英语 | 78 | 3 | 2 | 3 | | 孙悟空 | 英语 | 68 | 4 | 3 | 4 | | 大海 | 语文 | 94 | 1 | 1 | 1 | | 孙悟空 | 语文 | 87 | 2 | 2 | 2 | | 婷婷 | 语文 | 65 | 3 | 3 | 3 | | 宋宋 | 语文 | 64 | 4 | 4 | 4 | +-------+----------+--------+-----+------+-----+ ============================================================== 4. 排名函数 1) 相关函数 rank() dense_rank() row_number() 2) 数据 +-------------+----------------+--------------+ | score.name | score.subject | score.score | +-------------+----------------+--------------+ | 孙悟空 | 语文 | 87 | | 孙悟空 | 数学 | 95 | | 孙悟空 | 英语 | 68 | | 大海 | 语文 | 94 | | 大海 | 数学 | 56 | | 大海 | 英语 | 84 | | 宋宋 | 语文 | 64 | | 宋宋 | 数学 | 86 | | 宋宋 | 英语 | 84 | | 婷婷 | 语文 | 65 | | 婷婷 | 数学 | 85 | | 婷婷 | 英语 | 78 | +-------------+----------------+--------------+ 3)需求: 按照学科进行排名 select name, subject, score, rank() over(partition by subject order by score desc ) rk, dense_rank() over(partition by subject order by score desc ) drk , row_number() over(partition by subject order by score desc ) rn from score +-------+----------+--------+-----+------+-----+ | name | subject | score | rk | drk | rn | +-------+----------+--------+-----+------+-----+ | 孙悟空 | 数学 | 95 | 1 | 1 | 1 | | 宋宋 | 数学 | 86 | 2 | 2 | 2 | | 婷婷 | 数学 | 85 | 3 | 3 | 3 | | 大海 | 数学 | 56 | 4 | 4 | 4 | | 宋宋 | 英语 | 84 | 1 | 1 | 1 | | 大海 | 英语 | 84 | 1 | 1 | 2 | | 婷婷 | 英语 | 78 | 3 | 2 | 3 | | 孙悟空 | 英语 | 68 | 4 | 3 | 4 | | 大海 | 语文 | 94 | 1 | 1 | 1 | | 孙悟空 | 语文 | 87 | 2 | 2 | 2 | | 婷婷 | 语文 | 65 | 3 | 3 | 3 | | 宋宋 | 语文 | 64 | 4 | 4 | 4 | +-------+----------+--------+-----+------+-----+ 用户在2017年,连续三天(或以上)的天数里, 思路二 :推荐 select t6.user_id,t6.data_dt,t7.low_carbon,t6.lx_day from ( -- 求连续n天 select t5.user_id,t5.data_dt,t5.lx_day from( -- 开窗 按照人和 lx_data 分区 求统计 select t4.user_id,t4.data_dt,count(t4.lx_data) over(partition by t4.user_id,t4.lx_data) lx_day from ( -- 计算日期和 rn的差值 select t3.user_id,t3.data_dt,date_sub(t3.data_dt,t3.rn) lx_data from ( -- 开窗 按人分区, 内按时间排序,求row_number select t2.user_id,t2.data_dt,row_number() over(partition by t2.user_id order by t2.data_dt) rn from ( -- 2 每个人每天的能量 select t1.user_id,t1.data_dt,sum(t1.low_carbon) day_sum from ( -- 日期转换 select user_id,date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt,low_carbon from user_low_carbon ) t1 group by user_id,data_dt having day_sum>=100 ) t2 )t3 )t4 )t5 where t5.lx_day >=3 )t6 join (select user_id,date_format(regexp_replace(data_dt,'/','-'),'yyyy-MM-dd') data_dt,low_carbon from user_low_carbon) t7 on t6.user_id=t7.user_id and t6.data_dt=t7.data_dt;2. 列转行