Removing duplicates from a table

On 12 May, 2015

T-SQL

By : Dave Jenkins


So in my day job I was asked to investigate why some data was not matching up. After a bit of digging I found a table that had 90-95% of its records duplicated, so I needed a way of removing them quickly. That’s where the following code snippet came in handy:

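-- Number the rows within each (FirstCol, SecondCol) group, oldest DateValue first
;WITH Duplicates AS (
	SELECT
		FirstCol,
		SecondCol,
		ROW_NUMBER() OVER(PARTITION BY FirstCol, SecondCol ORDER BY DateValue) AS RN
	FROM
		MyTable
)
-- Deleting through the CTE removes the underlying rows from MyTable
DELETE	Duplicates
WHERE	RN > 1;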

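It’s a simple CTE that assigns a row number to each record within its group of matching FirstCol and SecondCol values, ordered by DateValue. The DELETE then removes every row numbered above 1, leaving exactly one row per group. Because the rows are ordered by DateValue ascending, the earliest record in each group is the one that survives. If you have a DATETIME column and would rather keep the latest value, order by it descending instead.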
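As a rough sketch of the "keep the latest" variant, assuming the same MyTable, FirstCol, SecondCol and DateValue names as the snippet above, the only change needed is the sort direction inside ROW_NUMBER():

```sql
-- Sketch only: same assumed table and column names as the original snippet.
-- Ordering by DateValue DESC gives the newest row in each group RN = 1,
-- so the newest row is the one that is kept.
;WITH Duplicates AS (
	SELECT
		FirstCol,
		SecondCol,
		ROW_NUMBER() OVER(PARTITION BY FirstCol, SecondCol ORDER BY DateValue DESC) AS RN
	FROM
		MyTable
)
DELETE	Duplicates
WHERE	RN > 1;
```

If you want to see what would go before committing to it, swapping the DELETE for SELECT * FROM Duplicates WHERE RN > 1 lists the rows that would be removed. One thing to watch: if two rows in a group share the same DateValue, which of them gets RN = 1 is not guaranteed, so add a tie-breaker column to the ORDER BY if that matters to you.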

Tags : CTE Duplicates T-SQL


About The Author

Dave Jenkins

Hello! I'm Dave Jenkins and I have been working with MS SQL Server for two years. I'm an MCP (70-461) working towards an MCSA. I love working with MS SQL Server and the BI Stack.
