2020-12-30 Counting duplicate records

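Normally, counting duplicate records is just a matter of deduplicating directly.
The problem here is different: we treat records whose report times are less than 1 minute apart, and whose other parameters are all identical, as duplicates.
How do we detect them? This calls for the LEAD function.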

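# [SQL LEAD() and LAG() functions](https://www.cnblogs.com/jasonlai2016/p/10166842.html)
LAG looks backward (at preceding rows) and LEAD looks forward (at following rows).
LAG and LEAD each take three arguments: the first is the column, the second is the offset, and the third is the default value returned when the offset falls outside the window.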
LEAD ( scalar_expression [ ,offset ] , [ default ] )     OVER ( [ partition_by_clause ] order_by_clause )
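
To make the direction of the two functions concrete, here is a minimal sketch that runs them through Python's built-in `sqlite3` module (SQLite supports window functions from version 3.25). The `events` table, its rows, and the default value `-1` are made up for illustration and are not from the original data.

```python
import sqlite3

# In-memory database with a hypothetical table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (imei TEXT, event_time INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 100), ("a", 200), ("a", 350), ("b", 500)],
)

# LAG reads the previous row in the partition, LEAD reads the next one.
# The third argument (-1) is returned when there is no such row.
rows = conn.execute(
    """
    SELECT imei,
           event_time,
           LAG(event_time, 1, -1)  OVER (PARTITION BY imei ORDER BY event_time) AS prev_time,
           LEAD(event_time, 1, -1) OVER (PARTITION BY imei ORDER BY event_time) AS next_time
      FROM events
     ORDER BY imei, event_time
    """
).fetchall()

for row in rows:
    print(row)
```

Each row is printed as `(imei, event_time, prev_time, next_time)`; the first row of a partition has no previous row and the last row has no next row, so those positions fall back to the default.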
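Applied to our problem, the query looks like this:
SELECT COUNT(1) AS repeat_num
  FROM (
        SELECT imei
               ,params
               ,event_time
               ,event_time_compare
          FROM (
                SELECT imei
                       ,params
                       ,event_time
                       -- next event_time within the same imei + params group
                       ,LEAD(event_time, 1) OVER (PARTITION BY imei, map_to_string(params) ORDER BY event_time ASC) AS event_time_compare
                  FROM tabel
                 WHERE day = '2020-12-28'
                   AND hour = 3
               ) tmp0
         -- the last row of each group has a NULL event_time_compare and is dropped by this filter
         -- the threshold is in the units of event_time; 1000 is 1 second if event_time is in milliseconds,
         -- so a 1-minute window would need 60000 in that case
         WHERE ABS(event_time - event_time_compare) < 1000
       ) tmp1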
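
The same counting logic can be tried on a few rows with Python's built-in `sqlite3` module. This is only a sketch under assumptions: the `events` table and its sample rows are invented, `params` is stored as a plain string in place of `map_to_string(params)` (which SQLite does not provide), `event_time` is assumed to be in milliseconds, and `count_repeats` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (imei TEXT, params TEXT, event_time INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", "k=1", 0),
        ("a", "k=1", 500),    # 500 ms after the previous row
        ("a", "k=1", 5000),   # 4500 ms after the previous row
        ("a", "k=2", 600),    # different params, so a separate group
        ("b", "k=1", 100),
        ("b", "k=1", 900),    # 800 ms after the previous row
        ("b", "k=1", 1500),   # 600 ms after the previous row
    ],
)


def count_repeats(conn, threshold_ms):
    """Count rows whose next row in the same (imei, params) group
    arrives less than threshold_ms later."""
    sql = """
        SELECT COUNT(1) AS repeat_num
          FROM (
                SELECT event_time,
                       LEAD(event_time, 1) OVER (
                           PARTITION BY imei, params
                           ORDER BY event_time ASC
                       ) AS event_time_compare
                  FROM events
               ) tmp0
         WHERE ABS(event_time - event_time_compare) < ?
    """
    return conn.execute(sql, (threshold_ms,)).fetchone()[0]


print(count_repeats(conn, 1000))    # 1-second window
print(count_repeats(conn, 60000))   # 1-minute window
```

With the sample rows, the 1-second window counts three pairs (0/500 for device `a`, 100/900 and 900/1500 for device `b`), while the 1-minute window also picks up the 500/5000 pair, which shows how much the result depends on the threshold matching the unit of `event_time`.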
