How to eliminate duplicate rows in SQL

Learn how to eliminate duplicate rows in SQL with examples and explanations from SQLPad.

Problem

You have a table in your SQL database that contains duplicate rows, and you want to remove these duplicates to clean up your data.

Sample Data

Let's consider a sample table users with the following data:

id	name	email
1	John	[email protected]
2	Jane	[email protected]
3	John	[email protected]
4	Alice	[email protected]
5	John	[email protected]

In the above table, we have duplicate rows with id 1, 3, and 5.

MySQL Solution

DELETE t1 FROM users t1
INNER JOIN users t2 
WHERE t1.id > t2.id AND t1.email = t2.email;

Explanation (MySQL)

In MySQL, we use the DELETE statement to remove rows from a table. Here we are using a self join, where the table users is joined with itself. The condition t1.id > t2.id ensures that we keep one copy of the duplicate rows (with the lowest id) and delete the rest. The condition t1.email = t2.email specifies that we want to consider rows as duplicates if they have the same email.

PostgreSQL Solution

DELETE FROM users
WHERE id IN (
  SELECT id
  FROM (
    SELECT id, ROW_NUMBER() OVER( PARTITION BY email ORDER BY id ) AS rnum
    FROM users ) t
  WHERE t.rnum > 1 );

Explanation (PostgreSQL)

In PostgreSQL, we can use the ROW_NUMBER() function to assign a unique row number to each row within the partition of email. The rows with a row number greater than 1 are the duplicate rows. These duplicate rows are selected in the subquery and the outer DELETE FROM statement removes these rows from the users table.