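LAG and LEAD are two offset-based analytic (window) functions. Within a single query they return the value of a column from the row N rows before the current row (LAG) or N rows after it (LEAD) as a separate column, which makes it easier to filter on neighbouring rows. They can replace a self-join of the table, and they are usually more efficient because the table only has to be read once.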
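To illustrate the self-join claim, here is a minimal sketch that computes "previous value" both ways and checks that the results agree. It runs the SQL through Python's built-in sqlite3 module as a stand-in for the database used in this article (window functions need SQLite 3.25 or later); the table `readings` and its columns are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: seq is a gap-free sequence number, val is the measured value.
conn.execute("CREATE TABLE readings (seq INTEGER, val INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, 15), (3, 12)])

# Window-function version: one pass over the table.
with_lag = conn.execute("""
    SELECT seq, val, LAG(val) OVER (ORDER BY seq) AS prev_val
    FROM readings
    ORDER BY seq
""").fetchall()

# Self-join version: the table is joined to itself on "previous sequence number".
with_join = conn.execute("""
    SELECT cur.seq, cur.val, prev.val AS prev_val
    FROM readings cur
    LEFT JOIN readings prev ON prev.seq = cur.seq - 1
    ORDER BY cur.seq
""").fetchall()

print(with_lag)
print(with_lag == with_join)
```

Note that the self-join only works here because `seq` has no gaps; LAG needs no such assumption, since it counts rows in the sorted order.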
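The OVER() clause defines the set of rows that LAG() and LEAD() operate on. Inside it you can use PARTITION BY (to group rows) and ORDER BY (to sort the rows within each group).
PARTITION BY a ORDER BY b splits the rows into groups by column a and sorts each group by column b; the offset is then counted within that group.
For example, in lead(field, num, defaultvalue), field is the column to read, num is how many rows ahead to look, and defaultvalue is the value returned when no such row exists (for instance, past the last row of the partition). lag() takes the same arguments and looks backwards.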
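The following sketch shows the three arguments and the effect of PARTITION BY on a tiny data set. It uses Python's sqlite3 module (SQLite 3.25 or later) as a convenient stand-in; the table `t`, its columns `a`, `b`, `v` and the default value 'none' are placeholders chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("x", 1, "p"), ("x", 2, "q"), ("x", 3, "r"),
    ("y", 1, "s"), ("y", 2, "t"),
])

rows = conn.execute("""
    SELECT a, b, v,
           LAG(v, 1, 'none')  OVER (PARTITION BY a ORDER BY b) AS prev_v,  -- one row back
           LEAD(v, 1, 'none') OVER (PARTITION BY a ORDER BY b) AS next_v   -- one row ahead
    FROM t
    ORDER BY a, b
""").fetchall()

for row in rows:
    print(row)
# ('x', 1, 'p', 'none', 'q')
# ('x', 2, 'q', 'p', 'r')
# ('x', 3, 'r', 'q', 'none')
# ('y', 1, 's', 'none', 't')
# ('y', 2, 't', 's', 'none')
```

The default value appears at the first and last row of each partition, and values never leak from group x into group y.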
Example:
Account | Overdue date | Overdue flag
acc01 | 20180101 | Y
acc01 | 20180201 | Y
acc01 | 20180301 | Y
acc01 | 20180401 | Y
acc01 | 20180501 | N
acc01 | 20180601 | N
acc01 | 20180701 | Y
acc01 | 20180801 | Y
acc01 | 20180901 | N
acc01 | 20181001 | Y
acc01 | 20181101 | N
acc01 | 20181201 | N
acc02 | 20180115 | Y
acc02 | 20180215 | N
acc02 | 20180315 | Y
acc02 | 20180415 | Y
acc02 | 20180515 | N
acc02 | 20180615 | Y
acc02 | 20180715 | Y
acc02 | 20180815 | N
acc02 | 20180915 | N
acc02 | 20181015 | Y
acc02 | 20181115 | Y
acc02 | 20181215 | Y
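1. Use lag() to fetch the previous row's overdue flag; the default value "ROW_FIRST" marks the first row of each account.
   Use lead() to fetch the next row's overdue flag; the default value "ROW_LAST" marks the last row of each account.
2. Derive the consecutive-overdue flag for the current period:
   When the current row is both the first and the last row of the account and its overdue flag = Y, consecutive-overdue flag = ONE_OVER.
   When the current row is the first row: if its overdue flag = Y, consecutive-overdue flag = START_OVER; otherwise no flag.
   When the current row is the last row and its overdue flag = Y: if the previous overdue flag = Y, consecutive-overdue flag = END_OVER; if the previous overdue flag = N, consecutive-overdue flag = ONE_OVER.
   For every other row (including a last row whose overdue flag = N): if the current overdue flag = Y and the previous overdue flag = N, consecutive-overdue flag = START_OVER; if the current overdue flag = N and the previous overdue flag = Y, consecutive-overdue flag = END_OVER; otherwise no flag.
3. Filter out the rows that received no flag, then use lag() ordered by date to bring the overdue start date onto the same row as the overdue end date.
4. When the consecutive-overdue flag = ONE_OVER, number of overdue periods = 1.
   When the consecutive-overdue flag = END_OVER, number of overdue periods = overdue end date - overdue start date.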
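As a cross-check of the flagging rules in step 2, here is a minimal sketch in plain Python, without SQL. The function name `classify` is made up for this example, and `None` stands for "no flag". It assumes that a single-row account is only flagged when its flag is Y, and that a final N row directly after a Y row closes the streak with END_OVER.

```python
def classify(flags):
    """Return the consecutive-overdue marker for each row of one account.

    flags: list of 'Y'/'N' values already sorted by overdue date.
    """
    marks = []
    last = len(flags) - 1
    for i, cur in enumerate(flags):
        prev = flags[i - 1] if i > 0 else None  # None plays the role of ROW_FIRST
        is_first, is_last = i == 0, i == last
        if is_first and is_last:
            mark = "ONE_OVER" if cur == "Y" else None
        elif is_first:
            mark = "START_OVER" if cur == "Y" else None
        elif cur == "Y" and prev == "N":
            # A streak begins here; on the last row it is also where it ends.
            mark = "ONE_OVER" if is_last else "START_OVER"
        elif cur == "Y" and is_last:
            mark = "END_OVER"  # prev is 'Y': the streak runs to the final row
        elif cur == "N" and prev == "Y":
            mark = "END_OVER"  # first non-overdue row after a streak
        else:
            mark = None
        marks.append(mark)
    return marks


# acc01 from the sample data, January to December 2018.
acc01_marks = classify(list("YYYYNNYYNYNN"))
print(acc01_marks)
```

For acc01 this yields START_OVER on the 1st, 7th and 10th rows and END_OVER on the 5th, 9th and 11th rows, with every other row unflagged.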
Intermediate result after steps 1 and 2:
Account | Overdue date | Overdue flag | Previous overdue date | Previous overdue flag | Current-previous | Consecutive-overdue flag
acc01 | 20180101 | Y | | | Y- | START_OVER
acc01 | 20180201 | Y | 20180101 | Y | Y-Y |
acc01 | 20180301 | Y | 20180201 | Y | Y-Y |
acc01 | 20180401 | Y | 20180301 | Y | Y-Y |
acc01 | 20180501 | N | 20180401 | Y | N-Y | END_OVER
acc01 | 20180601 | N | 20180501 | N | N-N |
acc01 | 20180701 | Y | 20180601 | N | Y-N | START_OVER
acc01 | 20180801 | Y | 20180701 | Y | Y-Y |
acc01 | 20180901 | N | 20180801 | Y | N-Y | END_OVER
acc01 | 20181001 | Y | 20180901 | N | Y-N | START_OVER
acc01 | 20181101 | N | 20181001 | Y | N-Y | END_OVER
acc01 | 20181201 | N | 20181101 | N | N-N |
acc02 | 20180115 | Y | | | Y- | START_OVER
acc02 | 20180215 | N | 20180115 | Y | N-Y | END_OVER
acc02 | 20180315 | Y | 20180215 | N | Y-N | START_OVER
acc02 | 20180415 | Y | 20180315 | Y | Y-Y |
acc02 | 20180515 | N | 20180415 | Y | N-Y | END_OVER
acc02 | 20180615 | Y | 20180515 | N | Y-N | START_OVER
acc02 | 20180715 | Y | 20180615 | Y | Y-Y |
acc02 | 20180815 | N | 20180715 | Y | N-Y | END_OVER
acc02 | 20180915 | N | 20180815 | N | N-N |
acc02 | 20181015 | Y | 20180915 | N | Y-N | START_OVER
acc02 | 20181115 | Y | 20181015 | Y | Y-Y |
acc02 | 20181215 | Y | 20181115 | Y | Y-Y | END_OVER
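SELECT
    ACC_NO,
    -- Overdue start date; a one-period streak starts and ends on the same date
    CASE WHEN SERIAL_OVER_FLAG = 'ONE_OVER' THEN OVER_DT ELSE OVER_DT_F END AS OVER_DT_F,
    OVER_DT,  -- overdue end date
    CASE WHEN SERIAL_OVER_FLAG = 'ONE_OVER' THEN 1
         -- Assumes OVER_DT is a DATE, so the difference is a day count;
         -- in Oracle, MONTHS_BETWEEN(OVER_DT, OVER_DT_F) would give months instead
         WHEN SERIAL_OVER_FLAG = 'END_OVER' THEN OVER_DT - OVER_DT_F
         ELSE NULL  -- all CASE branches must share one datatype, so NULL replaces the string 'ERROR'
    END AS OVER_CNT
FROM (
    -- Step 3: pull the start date of the streak onto the row that ends it
    SELECT
        ACC_NO,
        OVER_DT,
        SERIAL_OVER_FLAG,
        LAG(OVER_DT, 1, OVER_DT) OVER (PARTITION BY ACC_NO ORDER BY OVER_DT) AS OVER_DT_F
    FROM (
        -- Step 2: flag the rows where a streak starts or ends
        SELECT
            ACC_NO,
            OVER_DT,
            CASE WHEN OVER_FLAG_F = 'ROW_FIRST' AND OVER_FLAG_B = 'ROW_LAST' AND OVER_FLAG = 'Y' THEN 'ONE_OVER'
                 WHEN OVER_FLAG_F = 'ROW_FIRST' AND OVER_FLAG = 'Y' THEN 'START_OVER'
                 WHEN OVER_FLAG_B = 'ROW_LAST' AND OVER_FLAG = 'Y' AND OVER_FLAG_F = 'Y' THEN 'END_OVER'
                 WHEN OVER_FLAG_B = 'ROW_LAST' AND OVER_FLAG = 'Y' AND OVER_FLAG_F = 'N' THEN 'ONE_OVER'
                 WHEN OVER_FLAG = 'Y' AND OVER_FLAG_F = 'N' THEN 'START_OVER'
                 WHEN OVER_FLAG = 'N' AND OVER_FLAG_F = 'Y' THEN 'END_OVER'
                 ELSE 'NO_FLAG'
            END AS SERIAL_OVER_FLAG
        FROM (
            -- Step 1: attach the neighbouring rows' flags
            SELECT
                ACC_NO,
                OVER_DT,
                OVER_FLAG,
                LAG(OVER_FLAG, 1, 'ROW_FIRST') OVER (PARTITION BY ACC_NO ORDER BY OVER_DT) AS OVER_FLAG_F,  -- previous row's flag; 'ROW_FIRST' marks the first row
                LEAD(OVER_FLAG, 1, 'ROW_LAST') OVER (PARTITION BY ACC_NO ORDER BY OVER_DT) AS OVER_FLAG_B   -- next row's flag; 'ROW_LAST' marks the last row
            FROM ACC_TAB
        )
    )
    WHERE SERIAL_OVER_FLAG <> 'NO_FLAG'
)
WHERE SERIAL_OVER_FLAG <> 'START_OVER'
;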
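To try the approach end to end, the sketch below loads the sample data into an in-memory SQLite database through Python's sqlite3 module and runs the same three steps as CTEs. Several things here are assumptions made for the demo, not part of the original query: SQLite (3.25 or later) stands in for the target database, dates are stored as YYYYMMDD integers as in the sample table, the flag for unmarked rows is spelled 'NO_FLAG', and the period count is left out because it depends on the real date type.

```python
import sqlite3

# Sample data from the article: one flag per month of 2018,
# acc01 on day 01 and acc02 on day 15 of each month.
sample = {"acc01": ("YYYYNNYYNYNN", 1), "acc02": ("YNYYNYYNNYYY", 15)}
rows = [
    (acc, 20180000 + month * 100 + day, flag)
    for acc, (seq, day) in sample.items()
    for month, flag in enumerate(seq, start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACC_TAB (ACC_NO TEXT, OVER_DT INTEGER, OVER_FLAG TEXT)")
conn.executemany("INSERT INTO ACC_TAB VALUES (?, ?, ?)", rows)

streaks = conn.execute("""
    WITH step1 AS (   -- neighbouring flags, with sentinels at the partition edges
        SELECT ACC_NO, OVER_DT, OVER_FLAG,
               LAG(OVER_FLAG, 1, 'ROW_FIRST') OVER (PARTITION BY ACC_NO ORDER BY OVER_DT) AS OVER_FLAG_F,
               LEAD(OVER_FLAG, 1, 'ROW_LAST') OVER (PARTITION BY ACC_NO ORDER BY OVER_DT) AS OVER_FLAG_B
        FROM ACC_TAB
    ), step2 AS (     -- mark where each overdue streak starts and ends
        SELECT ACC_NO, OVER_DT,
               CASE WHEN OVER_FLAG_F = 'ROW_FIRST' AND OVER_FLAG_B = 'ROW_LAST' AND OVER_FLAG = 'Y' THEN 'ONE_OVER'
                    WHEN OVER_FLAG_F = 'ROW_FIRST' AND OVER_FLAG = 'Y' THEN 'START_OVER'
                    WHEN OVER_FLAG_B = 'ROW_LAST' AND OVER_FLAG = 'Y' AND OVER_FLAG_F = 'Y' THEN 'END_OVER'
                    WHEN OVER_FLAG_B = 'ROW_LAST' AND OVER_FLAG = 'Y' AND OVER_FLAG_F = 'N' THEN 'ONE_OVER'
                    WHEN OVER_FLAG = 'Y' AND OVER_FLAG_F = 'N' THEN 'START_OVER'
                    WHEN OVER_FLAG = 'N' AND OVER_FLAG_F = 'Y' THEN 'END_OVER'
                    ELSE 'NO_FLAG'
               END AS SERIAL_OVER_FLAG
        FROM step1
    ), step3 AS (     -- put the streak's start date next to its end date
        SELECT ACC_NO, OVER_DT, SERIAL_OVER_FLAG,
               LAG(OVER_DT) OVER (PARTITION BY ACC_NO ORDER BY OVER_DT) AS OVER_DT_F
        FROM step2
        WHERE SERIAL_OVER_FLAG <> 'NO_FLAG'
    )
    SELECT ACC_NO,
           CASE WHEN SERIAL_OVER_FLAG = 'ONE_OVER' THEN OVER_DT ELSE OVER_DT_F END AS START_DT,
           OVER_DT AS END_DT
    FROM step3
    WHERE SERIAL_OVER_FLAG <> 'START_OVER'
    ORDER BY ACC_NO, OVER_DT
""").fetchall()

for streak in streaks:
    print(streak)
```

Each output row is one overdue streak as (account, start date, end date). With these rules, a streak that ends in the middle of the data is closed on the first non-overdue date, while a streak that runs to the last row is closed on the last overdue date.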