SQL Cookbook Lesson

How to eliminate duplicate rows in SQL

Learn how to eliminate duplicate rows in SQL with examples and explanations from SQLPad.

Problem

You have a table in your SQL database that contains duplicate rows, and you want to remove these duplicates to clean up your data.

Sample Data

Let's consider a sample table users with the following data:

id name email
1 John [email protected]
2 Jane [email protected]
3 John [email protected]
4 Alice [email protected]
5 John [email protected]

In the above table, we have duplicate rows with id 1, 3, and 5.

MySQL Solution

DELETE t1 FROM users t1
INNER JOIN users t2 
WHERE t1.id > t2.id AND t1.email = t2.email;

Explanation (MySQL)

In MySQL, we use the DELETE statement to remove rows from a table. Here we are using a self join, where the table users is joined with itself. The condition t1.id > t2.id ensures that we keep one copy of the duplicate rows (with the lowest id) and delete the rest. The condition t1.email = t2.email specifies that we want to consider rows as duplicates if they have the same email.

PostgreSQL Solution

DELETE FROM users
WHERE id IN (
  SELECT id
  FROM (
    SELECT id, ROW_NUMBER() OVER( PARTITION BY email ORDER BY id ) AS rnum
    FROM users ) t
  WHERE t.rnum > 1 );

Explanation (PostgreSQL)

In PostgreSQL, we can use the ROW_NUMBER() function to assign a unique row number to each row within the partition of email. The rows with a row number greater than 1 are the duplicate rows. These duplicate rows are selected in the subquery and the outer DELETE FROM statement removes these rows from the users table.

Keep Practicing

Related Cookbook Lessons

View all lessons

Interview Prep

Begin Your SQL, Python, and R Journey

Master 230 interview-style coding questions and build the data skills needed for analyst, scientist, and engineering roles.