Login

I need to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows).

The rows, of course, will not be perfect duplicates because of the existence of the `RowID` identity field.

**MyTable**

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

How can I do this?

Here is another good article on [removing duplicates][1].

It discusses why its hard: "*SQL is based on relational algebra, and duplicates cannot occur in relational algebra, because duplicates are not allowed in a set.*"

The temp table solution, and two mysql examples.

In the future are you going to prevent it at a database level, or from an application perspective. I would suggest the database level because your database should be responsible for maintaining referential integrity, developers just will cause problems ;)

[1]:

[To see links please register here]

I had a table where I needed to preserve non-duplicate rows.
I'm not sure on the speed or efficiency.

DELETE FROM myTable WHERE RowID IN (
SELECT MIN(RowID) AS IDNo FROM myTable
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) = 2 )

By useing below query we can able to delete duplicate records based on the single column or multiple column. below query is deleting based on two columns. table name is: `testing` and column names `empno,empname`

DELETE FROM testing WHERE empno not IN (SELECT empno FROM (SELECT empno, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY empno)
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)
or empname not in
(select empname from (select empname,row_number() over(PARTITION BY empno ORDER BY empno)
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)

CREATE TABLE car(Id int identity(1,1), PersonId int, CarId int)

INSERT INTO car(PersonId,CarId)
VALUES(1,2),(1,3),(1,2),(2,4)

--SELECT * FROM car

;WITH CTE as(
SELECT ROW_NUMBER() over (PARTITION BY personid,carid order by personid,carid) as rn,Id,PersonID,CarId from car)

DELETE FROM car where Id in(SELECT Id FROM CTE WHERE rn>1)

SELECT DISTINCT *
INTO tempdb.dbo.tmpTable
FROM myTable

TRUNCATE TABLE myTable
INSERT INTO myTable SELECT * FROM tempdb.dbo.tmpTable
DROP TABLE tempdb.dbo.tmpTable

The following query is useful to delete duplicate rows. The table in this example has `ID` as an identity column and the columns which have duplicate data are `Column1`, `Column2` and `Column3`.

DELETE FROM TableName
WHERE ID NOT IN (SELECT MAX(ID)
FROM TableName
GROUP BY Column1,
Column2,
Column3
/*Even if ID is not null-able SQL Server treats MAX(ID) as potentially
nullable. Because of semantics of NOT IN (NULL) including the clause
below can simplify the plan*/
HAVING MAX(ID) IS NOT NULL)

The following script shows usage of `GROUP BY`, `HAVING`, `ORDER BY` in one query, and returns the results with duplicate column and its count.

SELECT YourColumnName,
COUNT(*) TotalCount
FROM YourTableName
GROUP BY YourColumnName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

I would mention this approach as well as it can be helpful, and works in all SQL servers:
Pretty often there is only one - two duplicates, and Ids and count of duplicates are known. In this case:

SET ROWCOUNT 1 -- or set to number of rows to be deleted
delete from myTable where RowId = DuplicatedID
SET ROWCOUNT 0

Quick and Dirty to delete exact duplicated rows (for small tables):

select distinct * into t2 from t1;
delete from t1;
insert into t1 select * from t2;
drop table t2;

Oh sure. Use a temp table. If you want a single, not-very-performant statement that "works" you can go with:

DELETE FROM MyTable WHERE NOT RowID IN
(SELECT
(SELECT TOP 1 RowID FROM MyTable mt2
WHERE mt2.Col1 = mt.Col1
AND mt2.Col2 = mt.Col2
AND mt2.Col3 = mt.Col3)
FROM MyTable mt)

Basically, for each row in the table, the sub-select finds the top RowID of all rows that are exactly like the row under consideration. So you end up with a list of RowIDs that represent the "original" non-duplicated rows.

sheliasheline768

mohammadmohammed194

inconsolable59224

pulicides860973

kirigami3233

Mrwoodborer763

laparohysterotomy677361

nodding954685

erroneously481497

fernandinasym