Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 419 Vote(s) - 3.57 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How can I remove duplicate rows?

#1
I need to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows).

The rows, of course, will not be perfect duplicates because of the existence of the `RowID` identity field.


**MyTable**

RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

How can I do this?
Reply

#2
Here is another good article on [removing duplicates][1].

It discusses why its hard: "*SQL is based on relational algebra, and duplicates cannot occur in relational algebra, because duplicates are not allowed in a set.*"

The temp table solution, and two mysql examples.

In the future are you going to prevent it at a database level, or from an application perspective. I would suggest the database level because your database should be responsible for maintaining referential integrity, developers just will cause problems ;)

[1]:

[To see links please register here]

Reply

#3
I had a table where I needed to preserve non-duplicate rows.
I'm not sure on the speed or efficiency.

DELETE FROM myTable WHERE RowID IN (
SELECT MIN(RowID) AS IDNo FROM myTable
GROUP BY Col1, Col2, Col3
HAVING COUNT(*) = 2 )
Reply

#4
By useing below query we can able to delete duplicate records based on the single column or multiple column. below query is deleting based on two columns. table name is: `testing` and column names `empno,empname`

DELETE FROM testing WHERE empno not IN (SELECT empno FROM (SELECT empno, ROW_NUMBER() OVER (PARTITION BY empno ORDER BY empno)
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)
or empname not in
(select empname from (select empname,row_number() over(PARTITION BY empno ORDER BY empno)
AS [ItemNumber] FROM testing) a WHERE ItemNumber > 1)
Reply

#5


CREATE TABLE car(Id int identity(1,1), PersonId int, CarId int)

INSERT INTO car(PersonId,CarId)
VALUES(1,2),(1,3),(1,2),(2,4)

--SELECT * FROM car

;WITH CTE as(
SELECT ROW_NUMBER() over (PARTITION BY personid,carid order by personid,carid) as rn,Id,PersonID,CarId from car)

DELETE FROM car where Id in(SELECT Id FROM CTE WHERE rn>1)
Reply

#6
SELECT DISTINCT *
INTO tempdb.dbo.tmpTable
FROM myTable

TRUNCATE TABLE myTable
INSERT INTO myTable SELECT * FROM tempdb.dbo.tmpTable
DROP TABLE tempdb.dbo.tmpTable
Reply

#7
The following query is useful to delete duplicate rows. The table in this example has `ID` as an identity column and the columns which have duplicate data are `Column1`, `Column2` and `Column3`.

DELETE FROM TableName
WHERE ID NOT IN (SELECT MAX(ID)
FROM TableName
GROUP BY Column1,
Column2,
Column3
/*Even if ID is not null-able SQL Server treats MAX(ID) as potentially
nullable. Because of semantics of NOT IN (NULL) including the clause
below can simplify the plan*/
HAVING MAX(ID) IS NOT NULL)


The following script shows usage of `GROUP BY`, `HAVING`, `ORDER BY` in one query, and returns the results with duplicate column and its count.

SELECT YourColumnName,
COUNT(*) TotalCount
FROM YourTableName
GROUP BY YourColumnName
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

Reply

#8
I would mention this approach as well as it can be helpful, and works in all SQL servers:
Pretty often there is only one - two duplicates, and Ids and count of duplicates are known. In this case:

SET ROWCOUNT 1 -- or set to number of rows to be deleted
delete from myTable where RowId = DuplicatedID
SET ROWCOUNT 0

Reply

#9
Quick and Dirty to delete exact duplicated rows (for small tables):

select distinct * into t2 from t1;
delete from t1;
insert into t1 select * from t2;
drop table t2;

Reply

#10
Oh sure. Use a temp table. If you want a single, not-very-performant statement that "works" you can go with:

DELETE FROM MyTable WHERE NOT RowID IN
(SELECT
(SELECT TOP 1 RowID FROM MyTable mt2
WHERE mt2.Col1 = mt.Col1
AND mt2.Col2 = mt.Col2
AND mt2.Col3 = mt.Col3)
FROM MyTable mt)

Basically, for each row in the table, the sub-select finds the top RowID of all rows that are exactly like the row under consideration. So you end up with a list of RowIDs that represent the "original" non-duplicated rows.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through