On Duplicate Data in MySQL

Keywords: MySQL SQL Database

When processing data with SQL, you will inevitably run into duplicate data. How should it be handled?

Let's see how to deal with it in MySQL.

Handling duplicate data in MySQL

Some MySQL tables may contain duplicate records. In some cases duplicates are acceptable, but sometimes they need to be deleted.

In this article, we will look at how to prevent duplicate data from appearing in tables and how to delete duplicate data that is already there.

Prevent duplicate data from appearing in tables

To guarantee that data is unique, declare the relevant columns of a MySQL table as a PRIMARY KEY or add a UNIQUE index on them.

Let's try an example: the table below has no indexes and no primary key, so it allows any number of duplicate records.

CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
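As a minimal sketch of what "allows duplicate records" means in practice, the snippet below creates the same key-less table and inserts one row twice. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so it runs without a server; the row values are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person_tbl (first_name CHAR(20), last_name CHAR(20), sex CHAR(10))"
)

# With no primary key or unique index, nothing stops the same row going in twice.
for _ in range(2):
    conn.execute(
        "INSERT INTO person_tbl (last_name, first_name) VALUES (?, ?)",
        ("Jay", "Thomas"),
    )

row_count = conn.execute("SELECT COUNT(*) FROM person_tbl").fetchone()[0]
print(row_count)  # 2
```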

If you want to ensure that the combination of first_name and last_name cannot be duplicated, define a composite (two-column) primary key on those fields. Primary key columns cannot contain NULL, so declare them NOT NULL, as follows:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);
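The effect of the composite key can be sketched as follows, again assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. Where MySQL reports a duplicate-entry error, sqlite3 raises sqlite3.IntegrityError. Only the combination of the two columns has to be unique, so a different first_name with the same last_name is still accepted.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE person_tbl (
           first_name CHAR(20) NOT NULL,
           last_name  CHAR(20) NOT NULL,
           sex        CHAR(10),
           PRIMARY KEY (last_name, first_name)
       )"""
)
conn.execute("INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas')")

duplicate_rejected = False
try:
    # Same (last_name, first_name) pair: violates the composite primary key.
    conn.execute("INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A different first_name is accepted: only the pair must be unique.
conn.execute("INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Anna')")
row_count = conn.execute("SELECT COUNT(*) FROM person_tbl").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```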

Once a primary key or unique index is in place, an INSERT that tries to add duplicate data fails with a duplicate-entry error (MySQL error 1062).

The difference between INSERT IGNORE INTO and INSERT INTO is how they treat rows that already exist. INSERT IGNORE inserts the new row if no matching row exists in the database, and silently skips it if one does, where "matching" means a row with the same PRIMARY KEY or UNIQUE index value. This keeps the existing data intact while still filling in any rows that are missing.
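Here is a minimal sketch of that skip-if-present behavior, assuming SQLite (through Python's sqlite3) as a stand-in for MySQL. SQLite spells the statement INSERT OR IGNORE rather than INSERT IGNORE. As in MySQL, the IGNORE only has something to compare against because the table defines a primary key; the sketch records the table's row count after each attempt.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE person_tbl (
           first_name CHAR(20) NOT NULL,
           last_name  CHAR(20) NOT NULL,
           sex        CHAR(10),
           PRIMARY KEY (last_name, first_name)
       )"""
)

counts = []
for _ in range(2):
    # SQLite's equivalent of MySQL's INSERT IGNORE INTO: a row that would
    # violate the primary key is skipped instead of raising an error.
    conn.execute(
        "INSERT OR IGNORE INTO person_tbl (last_name, first_name) VALUES (?, ?)",
        ("Jay", "Thomas"),
    )
    counts.append(conn.execute("SELECT COUNT(*) FROM person_tbl").fetchone()[0])

print(counts)  # [1, 1]
```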

In the following example, INSERT IGNORE INTO runs without an error and does not insert the duplicate row into the table:

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

When uniqueness is enforced and INSERT IGNORE INTO meets a duplicate row, it does not return an error; it only issues a warning. REPLACE INTO behaves differently: if a row with the same PRIMARY KEY or UNIQUE value already exists, it deletes that row first and then inserts the new one.
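Because REPLACE INTO deletes and re-inserts rather than updating in place, any column left out of the REPLACE falls back to its default. The sketch below shows this, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; SQLite also accepts the REPLACE INTO spelling. The 'M' value is illustrative.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE person_tbl (
           first_name CHAR(20) NOT NULL,
           last_name  CHAR(20) NOT NULL,
           sex        CHAR(10),
           PRIMARY KEY (last_name, first_name)
       )"""
)
conn.execute(
    "INSERT INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'M')"
)

# Same key, but sex is not listed: the old row is deleted and a fresh row is
# inserted, so sex becomes its default (NULL) instead of keeping 'M'.
conn.execute("REPLACE INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas')")

rows = conn.execute("SELECT last_name, first_name, sex FROM person_tbl").fetchall()
print(rows)  # [('Jay', 'Thomas', None)]
```

If you need to keep the other columns, list them in the REPLACE statement as well.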

Another way to enforce uniqueness is to add a UNIQUE index, as follows:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
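A UNIQUE index rejects duplicates just as the primary key does, but unlike a primary key it does not forbid NULL on its own, and NULL values never compare equal. That is why the NOT NULL declarations matter. The sketch below shows both cases, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL; person_nullable is a hypothetical table added only for the comparison.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")

# The table from the text: UNIQUE over two NOT NULL columns.
conn.execute(
    """CREATE TABLE person_tbl (
           first_name CHAR(20) NOT NULL,
           last_name  CHAR(20) NOT NULL,
           sex        CHAR(10),
           UNIQUE (last_name, first_name)
       )"""
)
conn.execute("INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas')")
try:
    conn.execute("INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Hypothetical variant without NOT NULL: rows with a NULL first_name are
# never considered equal, so the "same" row gets in twice.
conn.execute(
    """CREATE TABLE person_nullable (
           first_name CHAR(20),
           last_name  CHAR(20),
           UNIQUE (last_name, first_name)
       )"""
)
for _ in range(2):
    conn.execute("INSERT INTO person_nullable (last_name) VALUES ('Jay')")
nullable_count = conn.execute("SELECT COUNT(*) FROM person_nullable").fetchone()[0]

print(duplicate_rejected, nullable_count)  # True 2
```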

Counting duplicate data

The following query counts the duplicate first_name and last_name combinations in the table:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

The query above returns each (last_name, first_name) pair that appears more than once in person_tbl, together with how many times it appears. In general, to find duplicate values:

  • Determine which columns may contain duplicate values.
  • List those columns, along with COUNT(*), in the select list.
  • List the same columns in the GROUP BY clause.
  • Add a HAVING clause that keeps only groups whose count is greater than 1.
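The four steps above can be sketched end to end as follows, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and a handful of made-up sample rows. The sketch writes HAVING COUNT(*) > 1; MySQL also accepts the alias form used in the query above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_tbl (first_name CHAR(20), last_name CHAR(20), sex CHAR(10))")
# Hypothetical sample rows: Jay/Thomas three times, Lee/Bo twice, Smith/Ann once.
conn.executemany(
    "INSERT INTO person_tbl (last_name, first_name) VALUES (?, ?)",
    [("Jay", "Thomas"), ("Jay", "Thomas"), ("Jay", "Thomas"),
     ("Smith", "Ann"), ("Lee", "Bo"), ("Lee", "Bo")],
)

# Select the candidate columns with COUNT(*), group by the same columns,
# and keep only groups that occur more than once.
duplicates = conn.execute(
    """SELECT COUNT(*) AS repetitions, last_name, first_name
       FROM person_tbl
       GROUP BY last_name, first_name
       HAVING COUNT(*) > 1
       ORDER BY last_name"""
).fetchall()
print(duplicates)  # [(3, 'Jay', 'Thomas'), (2, 'Lee', 'Bo')]
```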

Filtering duplicate data

If you need to read the data without duplicates, use the DISTINCT keyword in the SELECT statement to filter them out:

mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl
    -> ORDER BY last_name;
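A minimal sketch of the DISTINCT query, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and made-up sample rows in which ('Jay', 'Thomas') appears twice:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_tbl (first_name CHAR(20), last_name CHAR(20), sex CHAR(10))")
# Hypothetical sample rows; ('Jay', 'Thomas') is duplicated.
conn.executemany(
    "INSERT INTO person_tbl (last_name, first_name) VALUES (?, ?)",
    [("Jay", "Thomas"), ("Smith", "Ann"), ("Jay", "Thomas"), ("Lee", "Bo")],
)

# DISTINCT collapses identical (last_name, first_name) rows into one.
unique_rows = conn.execute(
    "SELECT DISTINCT last_name, first_name FROM person_tbl ORDER BY last_name"
).fetchall()
print(unique_rows)  # [('Jay', 'Thomas'), ('Lee', 'Bo'), ('Smith', 'Ann')]
```

DISTINCT filters only the query result; the duplicate rows are still stored in the table.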

You can also use GROUP BY to read the rows of the table without duplicates:

mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;
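Grouping by every selected column returns the same set of rows as DISTINCT. The sketch below checks this on made-up sample rows, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. Both queries include an explicit ORDER BY, since MySQL 8.0 no longer sorts GROUP BY results implicitly.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_tbl (first_name CHAR(20), last_name CHAR(20), sex CHAR(10))")
# Hypothetical sample rows; ('Jay', 'Thomas') is duplicated.
conn.executemany(
    "INSERT INTO person_tbl (last_name, first_name) VALUES (?, ?)",
    [("Jay", "Thomas"), ("Smith", "Ann"), ("Jay", "Thomas"), ("Lee", "Bo")],
)

# One output row per distinct (last_name, first_name) group.
grouped = conn.execute(
    """SELECT last_name, first_name FROM person_tbl
       GROUP BY last_name, first_name
       ORDER BY last_name"""
).fetchall()
distinct = conn.execute(
    "SELECT DISTINCT last_name, first_name FROM person_tbl ORDER BY last_name"
).fetchall()

print(grouped == distinct)  # True
```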

Delete duplicate data

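To delete duplicate rows from the table, copy one row per (last_name, first_name) pair into a new table, drop the original, and rename the copy. ANY_VALUE() (available since MySQL 5.7) picks one sex value from each group; without it, the query is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default in MySQL 5.7 and later:

mysql> CREATE TABLE tmp SELECT last_name, first_name, ANY_VALUE(sex) AS sex
    ->                  FROM person_tbl
    ->                  GROUP BY last_name, first_name;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;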
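The copy, drop, and rename sequence can be sketched as follows, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL and made-up sample rows. SQLite requires the AS keyword in CREATE TABLE ... AS SELECT, and the sketch uses MAX(sex) in place of ANY_VALUE(sex) to pick one value per group.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_tbl (first_name CHAR(20), last_name CHAR(20), sex CHAR(10))")
# Hypothetical sample rows; ('Jay', 'Thomas', 'M') is duplicated.
conn.executemany(
    "INSERT INTO person_tbl (last_name, first_name, sex) VALUES (?, ?, ?)",
    [("Jay", "Thomas", "M"), ("Jay", "Thomas", "M"), ("Lee", "Bo", "F")],
)

# 1. Copy one row per name pair into a new table. MAX(sex) plays the role
#    of ANY_VALUE(sex) from the MySQL version.
conn.execute(
    """CREATE TABLE tmp AS
       SELECT last_name, first_name, MAX(sex) AS sex
       FROM person_tbl
       GROUP BY last_name, first_name"""
)
# 2. Drop the original table, then 3. rename the copy into its place.
conn.execute("DROP TABLE person_tbl")
conn.execute("ALTER TABLE tmp RENAME TO person_tbl")

rows = conn.execute(
    "SELECT last_name, first_name, sex FROM person_tbl ORDER BY last_name"
).fetchall()
print(rows)  # [('Jay', 'Thomas', 'M'), ('Lee', 'Bo', 'F')]
```

In MySQL, CREATE TABLE ... SELECT does not copy the original table's indexes, so it is worth adding a PRIMARY KEY or UNIQUE index to the new table afterwards to keep duplicates from coming back.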

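Alternatively, you can remove duplicate records by adding an INDEX or a PRIMARY KEY to the table with ALTER IGNORE TABLE, which keeps the first row for each key value and deletes the conflicting rows. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7, so this method works only on MySQL 5.6 and earlier: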

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
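On servers where the IGNORE form is unavailable, one option is to delete the extra copies first and then add the key. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in and relies on SQLite's implicit rowid to decide which copy to keep; uq_person_name is a hypothetical index name. In MySQL you would need an explicit id column instead of rowid, and MySQL restricts subqueries that read from the table being deleted from, so the DELETE would need adjusting there.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_tbl (first_name CHAR(20), last_name CHAR(20), sex CHAR(10))")
# Hypothetical sample rows; ('Jay', 'Thomas', 'M') is duplicated.
conn.executemany(
    "INSERT INTO person_tbl (last_name, first_name, sex) VALUES (?, ?, ?)",
    [("Jay", "Thomas", "M"), ("Jay", "Thomas", "M"), ("Lee", "Bo", "F")],
)

# Keep the first stored copy of each (last_name, first_name) pair, identified
# by SQLite's implicit rowid, and delete the rest.
conn.execute(
    """DELETE FROM person_tbl
       WHERE rowid NOT IN (
           SELECT MIN(rowid) FROM person_tbl GROUP BY last_name, first_name
       )"""
)

# With the duplicates gone, the unique index can be created without error.
conn.execute("CREATE UNIQUE INDEX uq_person_name ON person_tbl (last_name, first_name)")

# The new index rejects future duplicates.
try:
    conn.execute("INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas')")
    future_duplicate_rejected = False
except sqlite3.IntegrityError:
    future_duplicate_rejected = True

rows = conn.execute(
    "SELECT last_name, first_name, sex FROM person_tbl ORDER BY last_name"
).fetchall()
print(rows, future_duplicate_rejected)  # [('Jay', 'Thomas', 'M'), ('Lee', 'Bo', 'F')] True
```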

Posted by oeb on Fri, 05 Apr 2019 14:27:31 -0700