Hive--开窗函数--窗口位置函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE

窗口位置函数

  • 窗口函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE
    • 总结
    • FIRST_VALUE
    • LAST_VALUE
    • LAG
    • LEAD

窗口函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE

总结

  • 不指定窗口时,不排序默认第一行到最后一行,排序默认第一行到当前行

  • 指定窗口时 --rows between 起始位置 and 结束位置

    • N preceding:往前多少行
    • N following:往后多少行
    • current row:当前行
    • unbounded:起点或者终点,没有边界
    • unbounded preceding 表示从前面的起点
    • unbounded following:表示到后面的终点
  • FIRST_VALUE

    • 功能:取每个分区内某列的第一个值
    • 语法:FIRST_VALUE(col,true/false) over (partition by col1 order by col2)
    • 第二个参数为true,跳过空值(默认为false)
  • LAST_VALUE

    • 功能:取每个分区内某列的最后一个值
    • 语法:LAST_VALUE(col,true/false) over (partition by col1 order by col2)
    • 第二个参数为true,跳过空值(默认为false)
  • LAG

    • 功能:取每个分区内某列的前面第N个值
    • 语法:LAG(col,N,defaultValue) over (partition by col1 order by col2)
  • LEAD

    • 功能:取每个分区内某列的后面第N个值
    • 语法:LEAD(col,N,defaultValue) over (partition by col1 order by col2)

FIRST_VALUE

  • 功能:取每个分区内某列的第一个值
  • 语法:FIRST_VALUE(col) over (partition by col1 order by col2)
  • 示例:取每个部门薪资最高的员工姓名
select
  empno,
  ename,
  salary,
  deptno,
  FIRST_VALUE(ename) over (partition by deptno order by salary desc ) as first
from
  db_emp.tb_emp;
  
  
+--------+---------+---------+---------+---------+--+
| empno  |  ename  | salary  | deptno  | maxsal  |
+--------+---------+---------+---------+---------+--+
| 7839   | KING    | 5000.0  | 10      | 5000.0  |
| 7782   | CLARK   | 2450.0  | 10      | 5000.0  |
| 7934   | MILLER  | 1300.0  | 10      | 5000.0  |
| 7788   | SCOTT   | 3000.0  | 20      | 3000.0  |
| 7902   | FORD    | 3000.0  | 20      | 3000.0  |
| 7566   | JONES   | 2975.0  | 20      | 3000.0  |
| 7876   | ADAMS   | 1100.0  | 20      | 3000.0  |
| 7369   | SMITH   | 800.0   | 20      | 3000.0  |
| 7698   | BLAKE   | 2850.0  | 30      | 2850.0  |
| 7499   | ALLEN   | 1600.0  | 30      | 2850.0  |
| 7844   | TURNER  | 1500.0  | 30      | 2850.0  |
| 7654   | MARTIN  | 1250.0  | 30      | 2850.0  |
| 7521   | WARD    | 1250.0  | 30      | 2850.0  |
| 7900   | JAMES   | 950.0   | 30      | 2850.0  |
+--------+---------+---------+---------+---------+--+

LAST_VALUE

  • 功能:取每个分区内某列的最后一个值
  • 语法:LAST_VALUE() over (partition by col1 order by col2)
  • 注意:默认窗口是从第一条到当前条
  • 示例:取每个部门薪资最低的员工编号
select
  empno,
  ename,
  salary,
  deptno,
  LAST_VALUE(empno) over (partition by deptno order by salary desc) as last
from
  db_emp.tb_emp; 
  
 --上面的语句实现不了,默认窗口是从每个分区的第一行到当前行

select
  empno,
  ename,
  salary,
  deptno,
  LAST_VALUE(empno) over (partition by deptno order by salary desc rows between unbounded preceding and unbounded following) as last
from
  db_emp.tb_emp;

+--------+---------+---------+---------+-------+--+
| empno  |  ename  | salary  | deptno  | last  |
+--------+---------+---------+---------+-------+--+
| 7839   | KING    | 5000.0  | 10      | 7934  |
| 7782   | CLARK   | 2450.0  | 10      | 7934  |
| 7934   | MILLER  | 1300.0  | 10      | 7934  |
| 7788   | SCOTT   | 3000.0  | 20      | 7369  |
| 7902   | FORD    | 3000.0  | 20      | 7369  |
| 7566   | JONES   | 2975.0  | 20      | 7369  |
| 7876   | ADAMS   | 1100.0  | 20      | 7369  |
| 7369   | SMITH   | 800.0   | 20      | 7369  |
| 7698   | BLAKE   | 2850.0  | 30      | 7900  |
| 7499   | ALLEN   | 1600.0  | 30      | 7900  |
| 7844   | TURNER  | 1500.0  | 30      | 7900  |
| 7654   | MARTIN  | 1250.0  | 30      | 7900  |
| 7521   | WARD    | 1250.0  | 30      | 7900  |
| 7900   | JAMES   | 950.0   | 30      | 7900  |
+--------+---------+---------+---------+-------+--+

LAG

  • 功能:取每个分区内某列的前N个值
  • 语法:LAG(col,N,defaultValue) over (partition by col1 order by col2)
  • 示例:
select
  empno,
  ename,
  salary,
  deptno,
  LAG(salary,1,0) over (partition by deptno order by salary) as lagvalue
from
  db_emp.tb_emp;

+--------+---------+---------+---------+-----------+--+
| empno  |  ename  | salary  | deptno  | lagvalue  |
+--------+---------+---------+---------+-----------+--+
| 7934   | MILLER  | 1300.0  | 10      | 0.0       |
| 7782   | CLARK   | 2450.0  | 10      | 1300.0    |
| 7839   | KING    | 5000.0  | 10      | 2450.0    |
| 7369   | SMITH   | 800.0   | 20      | 0.0       |
| 7876   | ADAMS   | 1100.0  | 20      | 800.0     |
| 7566   | JONES   | 2975.0  | 20      | 1100.0    |
| 7788   | SCOTT   | 3000.0  | 20      | 2975.0    |
| 7902   | FORD    | 3000.0  | 20      | 3000.0    |
| 7900   | JAMES   | 950.0   | 30      | 0.0       |
| 7654   | MARTIN  | 1250.0  | 30      | 950.0     |
| 7521   | WARD    | 1250.0  | 30      | 1250.0    |
| 7844   | TURNER  | 1500.0  | 30      | 1250.0    |
| 7499   | ALLEN   | 1600.0  | 30      | 1500.0    |
| 7698   | BLAKE   | 2850.0  | 30      | 1600.0    |
+--------+---------+---------+---------+-----------+--+

LEAD

  • 功能:取每个分区内某列的后N个值
  • 语法:LEAD(col,N,defaultValue) over (partition by col1 order by col2)
  • 示例:
select
  empno,
  ename,
  salary,
  deptno,
  LEAD(salary,1,0) over (partition by deptno order by salary) as leadvalue
from
  db_emp.tb_emp;
  
+--------+---------+---------+---------+------------+--+
| empno  |  ename  | salary  | deptno  | leadvalue  |
+--------+---------+---------+---------+------------+--+
| 7934   | MILLER  | 1300.0  | 10      | 2450.0     |
| 7782   | CLARK   | 2450.0  | 10      | 5000.0     |
| 7839   | KING    | 5000.0  | 10      | 0.0        |
| 7369   | SMITH   | 800.0   | 20      | 1100.0     |
| 7876   | ADAMS   | 1100.0  | 20      | 2975.0     |
| 7566   | JONES   | 2975.0  | 20      | 3000.0     |
| 7788   | SCOTT   | 3000.0  | 20      | 3000.0     |
| 7902   | FORD    | 3000.0  | 20      | 0.0        |
| 7900   | JAMES   | 950.0   | 30      | 1250.0     |
| 7654   | MARTIN  | 1250.0  | 30      | 1250.0     |
| 7521   | WARD    | 1250.0  | 30      | 1500.0     |
| 7844   | TURNER  | 1500.0  | 30      | 1600.0     |
| 7499   | ALLEN   | 1600.0  | 30      | 2850.0     |
| 7698   | BLAKE   | 2850.0  | 30      | 0.0        |
+--------+---------+---------+---------+------------+--+

你可能感兴趣的:(#,Hive函数,hive)