题目:
编写一个 SQL 查询,来删除 Person 表中所有重复的电子邮箱,重复的邮箱里只保留 Id 最小 的那个。
±—±-----------------+
| Id | Email |
±—±-----------------+
| 1 | [email protected] |
| 2 | [email protected] |
| 3 | [email protected] |
±—±-----------------+
Id 是这个表的主键。
例如,在运行你的查询语句之后,上面的 Person 表应返回以下几行:
±—±-----------------+
| Id | Email |
±—±-----------------+
| 1 | [email protected] |
| 2 | [email protected] |
±—±-----------------+
题解:
hive自带函数row_number可以实现
mysql中没有row_number函数,需要使用delete和where子句
select Id, Email
from (
select Id ,Email, row_number() over (partition by Email order by Id ) ranking
from Person
) t
where ranking =1
DELETE p1 FROM Person p1,
Person p2
WHERE
p1.Email = p2.Email AND p1.Id > p2.Id