Finding Duplicate Records Using GROUP BY in SQL Server

There are various times when we need to find duplicate records in SQL Server. It is possible to find duplicates using DISTINCT, ROW NUMBER as well as the GROUP BY approach.

Duplicate records can create problems sometimes when displaying reports or performing a Multiple Insert update. Finding duplicate records in a database needs further investigation. In some cases, duplicate records are positive, but it all depends on the data and the database design as well.

For example, if a customer has ordered the same product twice on the same date with the the same shipping and billing address, then this may result in a duplicate record.

Let us create a table Customer with First Name, Last Name, and Mobile Number fields.

CREATE DATABASE IF NOT EXISTS mysql_test;
USE mysql_test;

DROP TABLE IF EXISTS CUSTOMER;
CREATE TABLE CUSTOMER
(
FirstName VARCHAR(50),
LastName  VARCHAR(50),
MobileNo  VARCHAR(15)
);

INSERT INTO CUSTOMER VALUES ('Niraj','Yadav',989898);

INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);

INSERT INTO CUSTOMER VALUES ('Atul','Kokam2',42424242);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam3',42424242);

INSERT INTO CUSTOMER VALUES ('Vishal','Parte3',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte3',9394454);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte5',9394455);

INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);

Using the DISTINCT approach, we can quickly get unique rows in a table.

SELECT DISTINCT FirstName, LastName, MobileNo FROM CUSTOMER;

However, this does not show how many times a row has been duplicated. Using the GROUP BY approach, we can find this.

Finding Duplicates Using GROUP BY

Adding grouping and a counting field to our display of FirstName, LastName and MobileNo combination shows how many times each customer’s name appears.

SELECT FirstName, LastName, MobileNo, COUNT(1) as CNT FROM CUSTOMER GROUP BY FirstName, LastName, MobileNo;

GROUP BY will show just one record for each combination of FirstName, LastName and MobileNo.

The count CNT shows how many times the row has been duplicated.
CNT = 1 indicates that row appears only once.

Let us filter out using the Having clause to exclude rows that appear only once.

SELECT FirstName, LastName, MobileNo, COUNT(1) as CNT FROM CUSTOMER GROUP BY FirstName, LastName, MobileNo HAVING COUNT(1) > 1;

Let us filter out records which FirstName is duplicate and MobileNo is the max value.

SELECT FirstName, LastName, Max(MobileNo) AS MobileNo, COUNT(1) AS CNT FROM CUSTOMER GROUP BY FirstName HAVING CNT>1;

or

SELECT * FROM (SELECT FirstName, LastName, Max(MobileNo) AS MobileNo, COUNT(1) AS CNT FROM CUSTOMER GROUP BY FirstName)TMP WHERE CNT>1;

Reference: http://www.tech-recipes.com/rx/49959/finding-duplicate-records-using-group-by-in-sql-server/

你可能感兴趣的:(SQL)