No more must you think about how to find a clean way to delete duplicate records in your database table. The following article will provide a single SQL query that will delete rows that are duplicated in your table. Before you begin, be sure you’ve tracked down the bad code that is causing this; otherwise, you will need to run it multiple times!
To start, I will create a basic table structure that will help understand the SQL query:
- id (Primary Key)
After running with this table for a period of time, users began registering with the same email multiple times. A business logic decision has now been made that determines the email must be unique. Logical solution is to alter the database and add a unique index on the email field.
Uh oh! Now we’ve just received a nice database error indicating that the column contains duplicate records and you must clean them up before you can apply the unique index to that column.
Bring on the SQL! The following query is going to group the duplicate records together and then using the MIN or MAX function, the duplicate rows will be deleted, leaving the id with the minimum or maximum id; depending on the group function you choose.
DELETE Users FROM Users LEFT OUTER JOIN ( SELECT MIN(id) as UniqueId, email FROM Users GROUP BY email ) as RowsToKeep ON Users.id = RowsToKeep.UniqueId WHERE RowsToKeep.UniqueId IS NULL
In the above SQL, only the minimum id will be selected from the users table which means that all of the other rows that contain the same email will be NULL allowing the delete statement to only delete those rows and keeping, in this example, the record with the minimum id. Swapping out the MIN function with the MAX function, would then keep the record with the maximum id instead; typically meaning the last record created versus the first record created.
Leave a Reply
You must be logged in to post a comment.