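Operations involved in handling duplicate rows include preventing duplicates from being created, dealing with duplicates when new rows are inserted, counting and identifying existing duplicates, and removing them. Several tools are at your disposal for dealing with duplicate rows; choose them according to the objective that you want to achieve.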
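Problem:
You want to prevent a table from ever containing duplicate rows.
Solution:
Use a PRIMARY KEY or a UNIQUE index on the column or columns that must be unique. For example, the following table has no index, so it permits duplicate rows: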
mysql> CREATE TABLE person
-> (
-> last_name char(20),
-> first_name char(20),
-> address char(40)
-> );
Query OK, 0 rows affected (0.01 sec)
To prevent duplicate rows, add a PRIMARY KEY to the table definition. A primary key prohibits NULL values, so the indexed columns are also declared NOT NULL:
mysql> CREATE TABLE person
-> (
-> last_name char(20) not null,
-> first_name char(20) not null,
-> address char(40),
-> PRIMARY KEY (last_name,first_name)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql>
Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table. The two types of indexes are similar, but a UNIQUE index can be created on columns that permit NULL values:
mysql> CREATE TABLE person (
-> last_name char(20) NOT NULL,
-> first_name char(20) NOT NULL,
-> address char(40),
-> UNIQUE (last_name,first_name)
-> );
Query OK, 0 rows affected (0.02 sec)
mysql>
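The difference in NULL handling shows up when the key columns are nullable. The sketch below uses a hypothetical table, person_nullable, and assumes the usual MySQL behavior that a UNIQUE index permits multiple NULL values, because NULL is never considered equal to another NULL:

```sql
-- person_nullable is a hypothetical table used only for this illustration.
DROP TABLE IF EXISTS person_nullable;
CREATE TABLE person_nullable
(
  last_name  CHAR(20),
  first_name CHAR(20),
  address    CHAR(40),
  UNIQUE (last_name, first_name)  -- no NOT NULL needed for a UNIQUE index
);
-- Both inserts succeed: rows whose key contains NULL never count as duplicates.
INSERT INTO person_nullable (last_name, first_name) VALUES ('X1', NULL);
INSERT INTO person_nullable (last_name, first_name) VALUES ('X1', NULL);
-- A fully non-NULL key is still checked: repeating this INSERT
-- would fail with ERROR 1062 (duplicate entry).
INSERT INTO person_nullable (last_name, first_name) VALUES ('X1', 'Y1');
SELECT * FROM person_nullable;
```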
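If different people can have the same name, an index on the name columns cannot be used to enforce uniqueness. Instead, each person must be assigned some sort of unique identifier, which becomes the value that distinguishes one row from another. In MySQL, it's common to accomplish this by using an AUTO_INCREMENT column: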
mysql> CREATE TABLE person
-> (
-> id INT UNSIGNED NOT NULL AUTO_INCREMENT,
-> last_name CHAR(20),
-> first_name CHAR(20),
-> address CHAR(40),
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql>
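With a surrogate key, two people who share a name can both be stored. This sketch uses a hypothetical table, person_auto, with the same layout as the AUTO_INCREMENT example above, and calls LAST_INSERT_ID() to retrieve the most recently generated id:

```sql
-- person_auto is a hypothetical table with the same columns as person above.
DROP TABLE IF EXISTS person_auto;
CREATE TABLE person_auto
(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  last_name CHAR(20),
  first_name CHAR(20),
  address CHAR(40),
  PRIMARY KEY (id)
);
-- Two different people with the same name: both rows are accepted
-- because only id has to be unique.
INSERT INTO person_auto (last_name, first_name, address) VALUES ('X1', 'Y1', '1 Elm St.');
INSERT INTO person_auto (last_name, first_name, address) VALUES ('X1', 'Y1', '9 Oak Ave.');
-- Returns the id generated by the most recent INSERT in this session
-- (the id of the second row).
SELECT LAST_INSERT_ID();
SELECT * FROM person_auto;
```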
Problem:
You have created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate row, and you want to avoid having to deal with such errors.
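Solution:
Use INSERT IGNORE, REPLACE, or INSERT ... ON DUPLICATE KEY UPDATE instead of a plain INSERT. To see the problem, create a table with a primary key and insert the same row twice; the second INSERT fails: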
mysql> Drop table if exists person;
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE person
-> (
-> last_name CHAR(20) NOT NULL,
-> first_name CHAR(20) NOT NULL,
-> address CHAR(40),
-> PRIMARY KEY (last_name, first_name)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO person (last_name, first_name)
-> values('X1','Y1');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO person (last_name, first_name)
-> values('X1','Y1');
ERROR 1062 (23000): Duplicate entry 'X1-Y1' for key 'PRIMARY'
mysql>
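MySQL provides three single-statement solutions to the problem of handling duplicate rows. Choose from among them depending on the duplicate-handling behavior you want: INSERT IGNORE keeps the existing row and discards the new one, producing a warning rather than an error; REPLACE deletes the existing row and inserts the new one in its place; INSERT ... ON DUPLICATE KEY UPDATE keeps the existing row and modifies it as the UPDATE clause specifies. The following examples show each in turn: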
mysql> INSERT IGNORE INTO person (last_name, first_name)
-> VALUES('X2','Y2');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person (last_name, first_name)
-> VALUES('X2','Y2');
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql>
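The warning that INSERT IGNORE produces can be inspected with SHOW WARNINGS, which lists the conditions raised by the most recent statement. A minimal sketch, using a hypothetical table person_ignore_demo so that it can be run on its own:

```sql
DROP TABLE IF EXISTS person_ignore_demo;  -- hypothetical table name
CREATE TABLE person_ignore_demo
(
  last_name  CHAR(20) NOT NULL,
  first_name CHAR(20) NOT NULL,
  address    CHAR(40),
  PRIMARY KEY (last_name, first_name)
);
INSERT IGNORE INTO person_ignore_demo (last_name, first_name) VALUES ('X2', 'Y2');
-- The duplicate is discarded: 0 rows affected, 1 warning.
INSERT IGNORE INTO person_ignore_demo (last_name, first_name) VALUES ('X2', 'Y2');
-- Run immediately afterward: lists the duplicate-entry condition
-- (code 1062) that IGNORE downgraded from an error to a warning.
SHOW WARNINGS;
```

Note that IGNORE also downgrades some other errors, such as invalid data values, to warnings, so it can hide problems other than duplicates.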
mysql> REPLACE INTO person (last_name,first_name)
-> VALUES ('X3','Y3');
Query OK, 1 row affected (0.00 sec)
mysql> REPLACE INTO person (last_name,first_name)
-> VALUES ('X3','Y3');
Query OK, 1 row affected (0.00 sec)
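Because REPLACE deletes the old row rather than merging the new values into it, any column not named in the statement takes its default value in the new row. A sketch with a hypothetical table person_replace_demo:

```sql
DROP TABLE IF EXISTS person_replace_demo;  -- hypothetical table name
CREATE TABLE person_replace_demo
(
  last_name  CHAR(20) NOT NULL,
  first_name CHAR(20) NOT NULL,
  address    CHAR(40),
  PRIMARY KEY (last_name, first_name)
);
INSERT INTO person_replace_demo VALUES ('X3', 'Y3', '1 Old Rd.');
-- Same key, different address: the old row is replaced by the new one.
REPLACE INTO person_replace_demo VALUES ('X3', 'Y3', '2 New Rd.');
-- address is omitted here, so the replacement row gets the column
-- default (NULL); the previous address is not carried over.
REPLACE INTO person_replace_demo (last_name, first_name) VALUES ('X3', 'Y3');
SELECT * FROM person_replace_demo;  -- one row, address is NULL
```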
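To keep the existing row but modify it, use INSERT ... ON DUPLICATE KEY UPDATE. The following table records the votes each candidate receives in a poll; each new vote either inserts a row with a count of 1 or increments the existing count. The affected-rows value is 1 when a new row is inserted and 2 when an existing row is updated:
mysql> CREATE TABLE poll_vote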
-> (
-> poll_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
-> candidate_id INT UNSIGNED,
-> vote_count INT UNSIGNED,
-> PRIMARY KEY (poll_id, candidate_id)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> select * from poll_vote;
Empty set (0.00 sec)
mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,3,1)
-> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1;
Query OK, 1 row affected (0.00 sec)
mysql> select * from poll_vote;
+---------+--------------+------------+
| poll_id | candidate_id | vote_count |
+---------+--------------+------------+
| 14 | 3 | 1 |
+---------+--------------+------------+
1 row in set (0.00 sec)
mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,3,1)
-> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1;
Query OK, 2 rows affected (0.00 sec)
mysql> select * from poll_vote;
+---------+--------------+------------+
| poll_id | candidate_id | vote_count |
+---------+--------------+------------+
| 14 | 3 | 2 |
+---------+--------------+------------+
1 row in set (0.00 sec)
mysql>
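The UPDATE clause can also refer to the value that the INSERT tried to add, by using the VALUES() function, so one statement can add an arbitrary number of votes. This is a sketch with a hypothetical table poll_vote_demo. In MySQL 8.0.20 and later, VALUES() in this position is deprecated in favor of a row alias (available from 8.0.19), for example VALUES (14, 3, 5) AS new ON DUPLICATE KEY UPDATE vote_count = vote_count + new.vote_count.

```sql
DROP TABLE IF EXISTS poll_vote_demo;  -- hypothetical table name
CREATE TABLE poll_vote_demo
(
  poll_id      INT UNSIGNED NOT NULL,
  candidate_id INT UNSIGNED NOT NULL,
  vote_count   INT UNSIGNED,
  PRIMARY KEY (poll_id, candidate_id)
);
-- No row for (14, 3) yet, so this inserts one with vote_count = 1.
INSERT INTO poll_vote_demo (poll_id, candidate_id, vote_count) VALUES (14, 3, 1)
  ON DUPLICATE KEY UPDATE vote_count = vote_count + VALUES(vote_count);
-- The key now exists, so the stored count is increased by the
-- vote_count this statement tried to insert: 1 + 5 = 6.
INSERT INTO poll_vote_demo (poll_id, candidate_id, vote_count) VALUES (14, 3, 5)
  ON DUPLICATE KEY UPDATE vote_count = vote_count + VALUES(vote_count);
SELECT * FROM poll_vote_demo;
```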
Problem:
You want to determine whether a table contains duplicates, and to what extent they occur. Or you want to see the rows that contain the duplicated values.
Solution:
Use a counting summary that displays duplicated values. To see the rows in which the duplicated values occur, join the summary to the original table to display the matching rows.
mysql> CREATE TABLE catalog_list (
-> last_name CHAR(20) NOT NULL,
-> first_name CHAR(20) NOT NULL,
-> street CHAR(40),
-> PRIMARY KEY (last_name, first_name)
-> );
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE catalog_list ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, street CHAR(40));
ERROR 1050 (42S01): Table 'catalog_list' already exists
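This table must not have a primary key, because it needs to hold the duplicate rows that this recipe looks for. Drop it and re-create it without one:
mysql> Drop table if exists catalog_list;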
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE catalog_list ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, street CHAR(40));
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Isaacson','Jim','515 Fordam St., Apt. 917')
-> ;
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave.');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave.');
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('McTavish','Taylor','432 River Run');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Pinter','Marlene','9 Sunset Trail');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Pinter','Marlene','9 Sunset Trail');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('BAXTER','WALLACE','57 3rd Ave.');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Brown','Bartholowmew','432 River Run');
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave., Apt 102');
Query OK, 1 row affected (0.01 sec)
mysql> select count(*) as `rows` from catalog_list;
+------+
| rows |
+------+
| 9 |
+------+
1 row in set (0.00 sec)
mysql> select count(*) - count(distinct last_name,first_name)
-> as 'duplicate names'
-> from catalog_list;
+-----------------+
| duplicate names |
+-----------------+
| 4 |
+-----------------+
1 row in set (0.00 sec)
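The count of 4 includes the 'BAXTER','WALLACE' row: the table uses a case-insensitive collation, as MySQL's default collations are, so that name is treated as a duplicate of 'Baxter','Wallace'. If letter case should matter, one option is to compare the names as binary strings. This sketch assumes the nine-row catalog_list table above:

```sql
-- CAST(... AS BINARY) compares the names byte by byte, so letter case
-- matters and 'BAXTER','WALLACE' is no longer counted as a duplicate.
SELECT COUNT(*) - COUNT(DISTINCT CAST(last_name AS BINARY),
                                 CAST(first_name AS BINARY))
  AS 'case-sensitive duplicate names'
FROM catalog_list;
```

With the data above this gives 3 instead of 4, because six of the nine rows have distinct names when case is significant.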
mysql> select count(distinct last_name,first_name)/ count(*)
-> as 'unique',
-> 1 - (count(distinct last_name,first_name) / count(*))
-> as 'nonunique'
-> from catalog_list;
+--------+-----------+
| unique | nonunique |
+--------+-----------+
| 0.5556 | 0.4444 |
+--------+-----------+
1 row in set (0.00 sec)
mysql> select count(*),last_name,first_name
-> from catalog_list
-> group by last_name,first_name
-> having count(*) > 1;
+----------+-----------+------------+
| count(*) | last_name | first_name |
+----------+-----------+------------+
| 4 | Baxter | Wallace |
| 2 | Pinter | Marlene |
+----------+-----------+------------+
2 rows in set (0.00 sec)
mysql>
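The summary shows which names are duplicated but not the rows themselves. As the solution suggests, joining the summary back to the original table retrieves them. This sketch writes the summary as a derived table named dups (an arbitrary name) and assumes the nine-row catalog_list table above:

```sql
-- dups is the counting summary; the join returns every catalog_list row
-- whose name occurs more than once, together with its occurrence count.
SELECT cl.last_name, cl.first_name, cl.street, dups.n
FROM catalog_list AS cl
INNER JOIN
(
  SELECT last_name, first_name, COUNT(*) AS n
  FROM catalog_list
  GROUP BY last_name, first_name
  HAVING COUNT(*) > 1
) AS dups
  ON cl.last_name = dups.last_name AND cl.first_name = dups.first_name
ORDER BY cl.last_name, cl.first_name, cl.street;
```

Because the join comparison is case-insensitive like the grouping, the result includes the 'BAXTER','WALLACE' row along with the other Baxter rows, six rows in all.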
Problem:
You want to remove duplicate rows from a table, leaving only unique rows.
Solution:
Select the unique rows from the table into a second table, then use it to replace the original one. Or use DELETE ... LIMIT n to remove all but one instance of a specific set of duplicate rows.
mysql> select * from catalog_list order by last_name,first_name;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Baxter | Wallace | 57 3rd Ave. |
| Baxter | Wallace | 57 3rd Ave. |
| BAXTER | WALLACE | 57 3rd Ave. |
| Baxter | Wallace | 57 3rd Ave., Apt 102 |
| Brown | Bartholowmew | 432 River Run |
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
| Pinter | Marlene | 9 Sunset Trail |
+-----------+--------------+--------------------------+
9 rows in set (0.00 sec)
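SELECT DISTINCT treats a row as a duplicate only when the entire row is the same (with a case-insensitive collation, 'BAXTER','WALLACE','57 3rd Ave.' also counts as identical to 'Baxter','Wallace','57 3rd Ave.'). To remove such duplicates, select the distinct rows into a new table with the same structure, keep a copy of the original table as a backup, and then replace the original table with the new one: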
mysql> CREATE TABLE tmp LIKE catalog_list;
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO tmp SELECT DISTINCT * FROM catalog_list;
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from tmp order by last_name,first_name;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Baxter | Wallace | 57 3rd Ave. |
| Baxter | Wallace | 57 3rd Ave., Apt 102 |
| Brown | Bartholowmew | 432 River Run |
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
+-----------+--------------+--------------------------+
6 rows in set (0.00 sec)
mysql> CREATE TABLE catalog_list_original LIKE catalog_list;
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO catalog_list_original SELECT * FROM catalog_list;
Query OK, 9 rows affected (0.00 sec)
Records: 9 Duplicates: 0 Warnings: 0
mysql> drop table catalog_list;
Query OK, 0 rows affected (0.01 sec)
mysql> rename table tmp to catalog_list;
Query OK, 0 rows affected (0.00 sec)
mysql>
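Between DROP TABLE and RENAME TABLE there is a moment when catalog_list does not exist. One way to avoid that is to swap the tables in a single RENAME TABLE statement, which MySQL performs atomically. A sketch, with catalog_list_old as a hypothetical name for the retired table:

```sql
DROP TABLE IF EXISTS tmp, catalog_list_old;
CREATE TABLE tmp LIKE catalog_list;
INSERT INTO tmp SELECT DISTINCT * FROM catalog_list;
-- Both renames happen as one atomic operation; other sessions cannot
-- access either table while it runs, so they never find catalog_list missing.
RENAME TABLE catalog_list TO catalog_list_old, tmp TO catalog_list;
-- Drop the old table once you are satisfied with the result,
-- or keep it as a backup.
DROP TABLE catalog_list_old;
```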
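Removing duplicates on a subset of columns using table replacement
SELECT DISTINCT keeps two Baxter Wallace rows because their street values differ. To treat rows as duplicates whenever the name columns match, restore the original table, create an empty copy with a PRIMARY KEY on last_name and first_name, and copy the rows into it with INSERT IGNORE, which keeps the first row inserted for each name and discards the rest with warnings: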
mysql> drop table catalog_list;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from catalog_list;
ERROR 1146 (42S02): Table 'cookbook.catalog_list' doesn't exist
mysql> create table catalog_list like catalog_list_original;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from catalog_list;
Empty set (0.00 sec)
mysql> insert into catalog_list select * from catalog_list_original;
Query OK, 9 rows affected (0.01 sec)
Records: 9 Duplicates: 0 Warnings: 0
mysql> CREATE TABLE tmp like catalog_list;
Query OK, 0 rows affected (0.02 sec)
mysql> ALTER TABLE tmp ADD PRIMARY KEY (last_name,first_name);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> INSERT IGNORE INTO tmp SELECT * FROM catalog_list;
Query OK, 5 rows affected, 4 warnings (0.00 sec)
Records: 9 Duplicates: 4 Warnings: 4
mysql> select * from tmp order by last_name,first_name;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Baxter | Wallace | 57 3rd Ave. |
| Brown | Bartholowmew | 432 River Run |
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
+-----------+--------------+--------------------------+
5 rows in set (0.00 sec)
mysql> drop table catalog_list;
Query OK, 0 rows affected (0.01 sec)
mysql> RENAME TABLE tmp to catalog_list;
Query OK, 0 rows affected (0.01 sec)
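Removing duplicates of a particular row
Use LIMIT to restrict the effect of a DELETE statement to a subset of the rows that it otherwise would delete. If a name occurs n times, DELETE ... LIMIT n-1 removes all but one instance of it. In the following session, the original nine-row table is restored first, and the counting summary shows how many instances of each duplicated name there are.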
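Ordinary SQL requires the LIMIT value to be a literal integer, so you cannot write LIMIT n-1 with a computed n. Inside a prepared statement, however, LIMIT accepts a ? placeholder, so the count can be computed first. A sketch, assuming catalog_list holds the original nine rows; the user variable @extra and the statement name del_dups are hypothetical:

```sql
-- Number of extra copies to remove: n - 1, but never negative.
SET @extra = (SELECT GREATEST(COUNT(*) - 1, 0)
              FROM catalog_list
              WHERE last_name = 'Pinter' AND first_name = 'Marlene');
-- Quotes inside the statement text are doubled to escape them.
PREPARE del_dups FROM
  'DELETE FROM catalog_list WHERE last_name = ''Pinter'' AND first_name = ''Marlene'' LIMIT ?';
-- Deletes all but one of the matching rows.
EXECUTE del_dups USING @extra;
DEALLOCATE PREPARE del_dups;
```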
mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Baxter | Wallace | 57 3rd Ave. |
| Brown | Bartholowmew | 432 River Run |
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
+-----------+--------------+--------------------------+
5 rows in set (0.00 sec)
mysql> drop table catalog_list;
Query OK, 0 rows affected (0.02 sec)
mysql> create table catalog_list like catalog_list_original;
Query OK, 0 rows affected (0.01 sec)
mysql> select * from catalog_list;
Empty set (0.00 sec)
mysql> insert into catalog_list select * from catalog_list_original;
Query OK, 9 rows affected (0.01 sec)
Records: 9 Duplicates: 0 Warnings: 0
mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| Baxter | Wallace | 57 3rd Ave. |
| Baxter | Wallace | 57 3rd Ave. |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
| Pinter | Marlene | 9 Sunset Trail |
| BAXTER | WALLACE | 57 3rd Ave. |
| Brown | Bartholowmew | 432 River Run |
| Baxter | Wallace | 57 3rd Ave., Apt 102 |
+-----------+--------------+--------------------------+
9 rows in set (0.00 sec)
mysql> select count(*),last_name,first_name
-> from catalog_list
-> group by last_name,first_name
-> having count(*) > 1;
+----------+-----------+------------+
| count(*) | last_name | first_name |
+----------+-----------+------------+
| 4 | Baxter | Wallace |
| 2 | Pinter | Marlene |
+----------+-----------+------------+
2 rows in set (0.00 sec)
mysql> DELETE FROM catalog_list where last_name = 'Baxter'
-> AND first_name = 'Wallace' limit 2;
Query OK, 2 rows affected (0.00 sec)
mysql> DELETE FROM catalog_list where last_name = 'Pinter'
-> AND first_name = 'Marlene' limit 1;
Query OK, 1 row affected (0.00 sec)
mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
| BAXTER | WALLACE | 57 3rd Ave. |
| Brown | Bartholowmew | 432 River Run |
| Baxter | Wallace | 57 3rd Ave., Apt 102 |
+-----------+--------------+--------------------------+
6 rows in set (0.00 sec)
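LIMIT 2 removed only two of the four Baxter Wallace rows, so two remain. Because the comparison is case-insensitive, the WHERE clause also matches 'BAXTER','WALLACE'. Delete one more row to leave a single instance:
mysql> DELETE FROM catalog_list where last_name = 'Baxter' AND first_name = 'Wallace' limit 1;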
Query OK, 1 row affected (0.01 sec)
mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name | street |
+-----------+--------------+--------------------------+
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
| Brown | Bartholowmew | 432 River Run |
| Baxter | Wallace | 57 3rd Ave., Apt 102 |
+-----------+--------------+--------------------------+
5 rows in set (0.00 sec)
mysql>
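Without ORDER BY, DELETE ... LIMIT removes whichever matching rows the server happens to reach first. That is harmless when the matching rows are identical, but the Baxter Wallace rows are not, so which of them survives in the session above depends on row order. A single-table DELETE also accepts ORDER BY, which makes the choice explicit. A sketch that first restores the original nine rows from catalog_list_original, as earlier in this section:

```sql
DROP TABLE IF EXISTS catalog_list;
CREATE TABLE catalog_list LIKE catalog_list_original;
INSERT INTO catalog_list SELECT * FROM catalog_list_original;
-- Four rows match (including 'BAXTER','WALLACE', since the comparison is
-- case-insensitive). Sorting by street puts the three '57 3rd Ave.' rows
-- ahead of '57 3rd Ave., Apt 102', so LIMIT 3 deletes those three and
-- keeps the Apt 102 row.
DELETE FROM catalog_list
WHERE last_name = 'Baxter' AND first_name = 'Wallace'
ORDER BY street
LIMIT 3;
SELECT * FROM catalog_list WHERE last_name = 'Baxter';
```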