SQL Cookbook Lesson
How to eliminate duplicate rows in SQL
Learn how to eliminate duplicate rows in SQL with examples and explanations from SQLPad.
Problem
You have a table in your SQL database that contains duplicate rows, and you want to remove these duplicates to clean up your data.
Sample Data
Let's consider a sample table users with the following data:
| id | name | |
|---|---|---|
| 1 | John | [email protected] |
| 2 | Jane | [email protected] |
| 3 | John | [email protected] |
| 4 | Alice | [email protected] |
| 5 | John | [email protected] |
In the above table, we have duplicate rows with id 1, 3, and 5.
MySQL Solution
DELETE t1 FROM users t1
INNER JOIN users t2
WHERE t1.id > t2.id AND t1.email = t2.email;
Explanation (MySQL)
In MySQL, we use the DELETE statement to remove rows from a table. Here we are using a self join, where the table users is joined with itself. The condition t1.id > t2.id ensures that we keep one copy of the duplicate rows (with the lowest id) and delete the rest. The condition t1.email = t2.email specifies that we want to consider rows as duplicates if they have the same email.
PostgreSQL Solution
DELETE FROM users
WHERE id IN (
SELECT id
FROM (
SELECT id, ROW_NUMBER() OVER( PARTITION BY email ORDER BY id ) AS rnum
FROM users ) t
WHERE t.rnum > 1 );
Explanation (PostgreSQL)
In PostgreSQL, we can use the ROW_NUMBER() function to assign a unique row number to each row within the partition of email. The rows with a row number greater than 1 are the duplicate rows. These duplicate rows are selected in the subquery and the outer DELETE FROM statement removes these rows from the users table.