How to handling duplicates in MySQL

Operations involved in handling duplicate rows include the following:

  • Preventing duplicates from being created in the frist place.
  • Counting the number of duplicates to determine whether they are present and to what extent.
  • Identifying duplicated values(or the rows containing them) so you can see where they occur.
  • Eliminating duplicates to ensure that each row is unique.

Several tools are at your disposal for dealing with duplicate rows.choose them according to the objective that you want to achieve:

  • When you create a table, include a primary key or unique index to prevent duplicates from being added to the table.MySQL uses the index as a constraint to enforce the requirement that each row in the table contains a unique key in the indexed column or columns.
  • In conjunction with a unique index,the INSERT IGNORE and REPLACE statements enable you to handle insertion of duplicate rows gracefully without generating errors.
  • To determine whether a table contains duplicates ,use group by to categorize rows into groups,and count() to see how many rows are in each group.
  • SELECT DISTINCT removes duplicate row from a result set.

1.How to preventing duplicates from occurring in a table?

Problem

  • prevent  a table from ever containing duplicates.

Solution:

  • Use a PRIMARY KEY or a UNIQUE index

permits duplicate rows:

mysql> CREATE TABLE person
    -> (
    ->   last_name char(20),
    ->   first_name char(20),
    ->   address  char(40)
    -> );
Query OK, 0 rows affected (0.01 sec)

prevent multiple rows, add a primary key to its fefinition. and then a primary key prohibits NULL values:

mysql> CREATE TABLE person
    -> (
    ->   last_name char(20) not null,
    ->   first_name char(20) not null,
    ->   address char(40),
    ->   PRIMARY KEY (last_name,first_name)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> 

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table.The two types of indexes are similar,but a UNIQUE index can be created on columns that permit NULL values 

mysql> CREATE TABLE person (    
    ->   last_name char(20) NOT NULL,                           
    ->   first_name char(20) NOT NULL,
    ->   address   char(40),
    ->   UNIQUE  (last_name,first_name)
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> 

Each person must be assigned some sort of unique identifier,which becomes the value that distingishes one row from another.In MySQL,it's common to accomplish this by using an AUTO_INCREMENT column:

mysql> 
mysql> CREATE TABLE person
    -> (
    ->   id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ->   last_name  CHAR(20),
    ->   first_name CHAR(20),
    ->   address    CHAR(40),
    ->   PRIMARY KEY (id)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql> 

2.How to dealing with duplicates when loading rows into a table.

Problem:

You have created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate row, and you want to avoid having to deal with such errors.

Solution:

  • One approach is to just ignore the error.
  • Another is to use an INSERT IGNORE,REPLACE or INSERT ... ON DUPLICATE KEY UPDATE statement

mysql> Drop table if exists person;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE person
    -> (
    ->  last_name CHAR(20) NOT NULL,
    ->  first_name CHAR(20) NOT NULL,
    ->  address CHAR(40),
    ->  PRIMARY KEY (last_name, first_name)
    -> );

Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO person (last_name, first_name)
    -> values('X1','Y1');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO person (last_name, first_name) 
    -> values('X1','Y1');                                           
ERROR 1062 (23000): Duplicate entry 'X1-Y1' for key 'PRIMARY'
mysql> 

MySQL Provides three single-query solutions to the problem of handling duplicate row.Choose from among them depending on the duplicate-handling behavior you want:

  • To keep the original row when a duplicate occurs,use INSERT IGNORE than INSERT.

mysql> INSERT IGNORE INTO person (last_name, first_name) 
    -> VALUES('X2','Y2');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT IGNORE INTO person (last_name, first_name) 
    -> VALUES('X2','Y2');                                                  
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql>

  • To replace the original row with the new one when a duplicate occurs,use REPLACE rather than INSERT.

mysql> 
mysql> REPLACE INTO person (last_name,first_name)
    -> VALUES ('X3','Y3');
Query OK, 1 row affected (0.00 sec)

mysql> REPLACE INTO person (last_name,first_name) 
    -> VALUES ('X3','Y3');                                           
Query OK, 1 row affected (0.00 sec)

  • To modify columns of an existing row when a duplicate occurs,use INSERT ... ON DUPLICATE KEY UPDATE.

mysql> CREATE TABLE poll_vote
    -> (
    ->  poll_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ->  candidate_id INT UNSIGNED,
    ->  vote_count INT UNSIGNED,
    ->  PRIMARY KEY (poll_id, candidate_id)
    -> );

Query OK, 0 rows affected (0.01 sec)

mysql> select * from poll_vote;
Empty set (0.00 sec)

mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,3,1)
    -> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1;

Query OK, 1 row affected (0.00 sec)

mysql> select * from poll_vote;
+---------+--------------+------------+
| poll_id | candidate_id | vote_count |
+---------+--------------+------------+
|      14 |            3 |          1 |
+---------+--------------+------------+
1 row in set (0.00 sec)

mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,3,1)
    -> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1;

Query OK, 2 rows affected (0.00 sec)

mysql> select * from poll_vote;
+---------+--------------+------------+
| poll_id | candidate_id | vote_count |
+---------+--------------+------------+
|      14 |            3 |          2 |
+---------+--------------+------------+
1 row in set (0.00 sec)

mysql> 

3.How to counting and Identifying duplicates

Problem

You want to determine whether a table contains duplicates,and to what extent they occur.Or you want to see the rows that contain the duplicated values.

Solution

Use a counting summary that displays duplicated values.To see the rows in which the duplicated values occur.join the summary to the original table to display the matching rows.

mysql> CREATE TABLE catalog_list (
    -> last_name CHAR(20) NOT NULL,
    -> first_name CHAR(20) NOT NULL,
    -> street CHAR(40),        
    -> PRIMARY KEY (last_name, first_name)
    -> );

Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE catalog_list ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, street CHAR(40));
ERROR 1050 (42S01): Table 'catalog_list' already exists
mysql> Drop table if exists catalog_list;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE catalog_list ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, street CHAR(40));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Isaacson','Jim','515 Fordam St., Apt. 917')
    -> ;
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave.');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave.');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('McTavish','Taylor','432 River Run');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Pinter','Marlene','9 Sunset Trail');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Pinter','Marlene','9 Sunset Trail');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('BAXTER','WALLACE','57 3rd Ave.');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Brown','Bartholowmew','432 River Run');
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO catalog_list (last_name,first_name,street) values ('Baxter','Wallace','57 3rd Ave., Apt 102');
Query OK, 1 row affected (0.01 sec)

mysql> select count(*) as rows from catalog_list;
+------+
| rows |
+------+
|    9 |
+------+
1 row in set (0.00 sec)

mysql> select count(*) - count(distinct last_name,first_name) 
    -> as 'duplicate names'
    -> from catalog_list;

+-----------------+
| duplicate names |
+-----------------+
|               4 |
+-----------------+
1 row in set (0.00 sec)

mysql> select count(distinct last_name,first_name)/ count(*)
    -> as 'unique',
    -> 1 - (count(distinct last_name,first_name) / count(*))
    -> as 'nonunique'
    -> from catalog_list;

+--------+-----------+
| unique | nonunique |
+--------+-----------+
| 0.5556 |    0.4444 |
+--------+-----------+
1 row in set (0.00 sec)

mysql> select count(*),last_name,first_name
    -> from catalog_list
    -> group by last_name,first_name
    -> having count(*) > 1;

+----------+-----------+------------+
| count(*) | last_name | first_name |
+----------+-----------+------------+
|        4 | Baxter    | Wallace    |
|        2 | Pinter    | Marlene    |
+----------+-----------+------------+
2 rows in set (0.00 sec)

mysql> 

3.How to eliminating duplicates from a Table.

Problem

You want to remove duplicate rows from a table, leaving only unique rows.

Solution:

Select the unique rows from the table into a second table, then use it to replace the original one.Or use DELETE ...LIMIT n to remove all but one instance of a specific set of duplicates rows.

mysql> select * from catalog_list order by last_name,first_name;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Baxter    | Wallace      | 57 3rd Ave.              |
| Baxter    | Wallace      | 57 3rd Ave.              |
| BAXTER    | WALLACE      | 57 3rd Ave.              |
| Baxter    | Wallace      | 57 3rd Ave., Apt 102     |
| Brown     | Bartholowmew | 432 River Run            |
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
| Pinter    | Marlene      | 9 Sunset Trail           |
+-----------+--------------+--------------------------+
9 rows in set (0.00 sec)
mysql> 
mysql> CREATE TABLE tmp LIKE catalog_list;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO tmp SELECT DISTINCT * FROM catalog_list;
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from tmp order by last_name,first_name;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Baxter    | Wallace      | 57 3rd Ave.              |
| Baxter    | Wallace      | 57 3rd Ave., Apt 102     |
| Brown     | Bartholowmew | 432 River Run            |
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
+-----------+--------------+--------------------------+
6 rows in set (0.00 sec)

mysql> CREATE TABLE catalog_list_original LIKE catalog_list;                    
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO catalog_list_original SELECT * FROM catalog_list;             
Query OK, 9 rows affected (0.00 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql> drop table catalog_list;
Query OK, 0 rows affected (0.01 sec)

mysql> rename table tmp to catalog_list;
Query OK, 0 rows affected (0.00 sec)

mysql> 

Removing duplicates using table replacement

mysql> drop table catalog_list;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from catalog_list;
ERROR 1146 (42S02): Table 'cookbook.catalog_list' doesn't exist
mysql> create table catalog_list like catalog_list_original;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from catalog_list;
Empty set (0.00 sec)
mysql> insert into catalog_list select * from catalog_list_original;    
Query OK, 9 rows affected (0.01 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE tmp like catalog_list;
Query OK, 0 rows affected (0.02 sec)

mysql> ALTER TABLE tmp ADD PRIMARY KEY (last_name,first_name);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> INSERT IGNORE INTO tmp SELECT * FROM catalog_list;
Query OK, 5 rows affected, 4 warnings (0.00 sec)
Records: 9  Duplicates: 4  Warnings: 4

mysql> select * from tmp order by last_name,first_name;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Baxter    | Wallace      | 57 3rd Ave.              |
| Brown     | Bartholowmew | 432 River Run            |
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
+-----------+--------------+--------------------------+
5 rows in set (0.00 sec)

mysql> drop table catalog_list;
Query OK, 0 rows affected (0.01 sec)
mysql> RENAME TABLE tmp to catalog_list;
Query OK, 0 rows affected (0.01 sec)

Removing duplicates of a particular row

Use Limit to restrict the effect of a delete statement to a subset of the rows that it otherwise would delete.

mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Baxter    | Wallace      | 57 3rd Ave.              |
| Brown     | Bartholowmew | 432 River Run            |
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
+-----------+--------------+--------------------------+
5 rows in set (0.00 sec)

mysql> drop table catalog_list;
Query OK, 0 rows affected (0.02 sec)

mysql> create table catalog_list like catalog_list_original;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from catalog_list;
Empty set (0.00 sec)

mysql> insert into catalog_list select * from catalog_list_original;
Query OK, 9 rows affected (0.01 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| Baxter    | Wallace      | 57 3rd Ave.              |
| Baxter    | Wallace      | 57 3rd Ave.              |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
| Pinter    | Marlene      | 9 Sunset Trail           |
| BAXTER    | WALLACE      | 57 3rd Ave.              |
| Brown     | Bartholowmew | 432 River Run            |
| Baxter    | Wallace      | 57 3rd Ave., Apt 102     |
+-----------+--------------+--------------------------+
9 rows in set (0.00 sec)

mysql> select count(*),last_name,first_name
    -> from catalog_list

    -> group by last_name,first_name
    -> having count(*) > 1;
+----------+-----------+------------+
| count(*) | last_name | first_name |
+----------+-----------+------------+
|        4 | Baxter    | Wallace    |
|        2 | Pinter    | Marlene    |
+----------+-----------+------------+
2 rows in set (0.00 sec)

mysql> DELETE FROM catalog_list where last_name = 'Baxter'
    -> AND first_name = 'Wallace' limit 2;
Query OK, 2 rows affected (0.00 sec)

mysql> DELETE FROM catalog_list where last_name = 'Pinter'
    -> AND first_name = 'Marlene' limit 1;
Query OK, 1 row affected (0.00 sec)

mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
| BAXTER    | WALLACE      | 57 3rd Ave.              |
| Brown     | Bartholowmew | 432 River Run            |
| Baxter    | Wallace      | 57 3rd Ave., Apt 102     |
+-----------+--------------+--------------------------+
6 rows in set (0.00 sec)

mysql> DELETE FROM catalog_list where last_name = 'Baxter' AND first_name = 'Wallace' limit 1;
Query OK, 1 row affected (0.01 sec)

mysql> select * from catalog_list;
+-----------+--------------+--------------------------+
| last_name | first_name   | street                   |
+-----------+--------------+--------------------------+
| Isaacson  | Jim          | 515 Fordam St., Apt. 917 |
| McTavish  | Taylor       | 432 River Run            |
| Pinter    | Marlene      | 9 Sunset Trail           |
| Brown     | Bartholowmew | 432 River Run            |
| Baxter    | Wallace      | 57 3rd Ave., Apt 102     |
+-----------+--------------+--------------------------+
5 rows in set (0.00 sec)

mysql> 

你可能感兴趣的:(深入mysql的世界,mysql)