删除一个表中的重复数据同时保留第一次插入那一条以及sql优化

       业务:一个表中有很多数据(id为自增主键),在这些数据中有个别数据出现了重复的数据。

       目标:需要把这些重复数据删除同时保留第一次插入的那一条数据,还要保持其它的数据不受影响。

       解题过程:

       第一步:查出所有要保留的下来的数据的id(save_id)

 

SELECT id as save_id

  FROM yujing.alarm_event_info_snapshot aeis

 where aeis.event_id in

       (SELECT ae.id

          FROM yujing.alarm_event ae

         where ae.event_uuid like 'yuanwtj_%')

 group by (aeis.event_id)

 

优化后:

 

SELECT aeis.id as save_id

  FROM yujing.alarm_event ae

 right join yujing.alarm_event_info_snapshot aeis

    on aeis.event_id = ae.id

 where ae.event_uuid like 'yuanwtj_%'

 group by (aeis.event_id)

  

 

       第二步:获取所有相关数据的id(all_id)

 

SELECT aeis.id as all_id

  FROM yujing.alarm_event_info_snapshot aeis

 where aeis.event_id in

       (SELECT ae.id

          FROM yujing.alarm_event ae

         where ae.event_uuid like 'yuanwtj_%')

 order by aeis.event_id

 

优化后:       

 

SELECT aeis.id as all_id

  FROM yujing.alarm_event ae

 right join yujing.alarm_event_info_snapshot aeis

    on aeis.event_id = ae.id

 where ae.event_uuid like 'yuanwtj_%'


         第三步:获取要删除的数据的 id(del_id)

select ad.all_id as del_id

  from (SELECT aeis.id as all_id

          FROM yujing.alarm_event_info_snapshot aeis

         where aeis.event_id in

               (SELECT ae.id

                  FROM yujing.alarm_event ae

                 where ae.event_uuid like 'yuanwtj_%')) as ad

 where ad.all_id not in (SELECT id as save_id

                           FROM yujing.alarm_event_info_snapshot aeis

                          where aeis.event_id in

                                (SELECT ae.id

                                   FROM yujing.alarm_event ae

                                  where ae.event_uuid like 'yuanwtj_%')

                          group by (aeis.event_id))

优化后:

select ad.all_id as del_id

  from (SELECT aeis.id as all_id

          FROM yujing.alarm_event ae

         right join yujing.alarm_event_info_snapshot aeis

            on aeis.event_id = ae.id

         where ae.event_uuid like 'yuanwtj_%') as ad

  left join (SELECT aeis.id as save_id

               FROM yujing.alarm_event ae

              right join yujing.alarm_event_info_snapshot aeis

                 on aeis.event_id = ae.id

              where ae.event_uuid like 'yuanwtj_%'

              group by (aeis.event_id)) as sd

    on ad.all_id = sd.save_id

 where sd.save_id is null

       

 

       第四步:根据id删除所有节点,注意mysql中如果有大量数据时需要批量删除,我最后使用了ETL工具进行的批量删除

       总结:在mysql数据库中,sql语句中最好不要在in或not in关键字的查询里动态获取匹配的值,数据量大的情况下使用它们效率很低,可以使用左右连接来代替in操作,这样效率会提高很多倍,大数据量下尤为明显。


 

你可能感兴趣的:(SQL优化)