Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 492 Vote(s) - 3.52 Average
  • 1
  • 2
  • 3
  • 4
  • 5
T-SQL: Deleting all duplicate rows but keeping one

#1
I have a table with a very large amount of rows. Duplicates are not allowed but due to a problem with how the rows were created I know there are some duplicates in this table.
I need to eliminate the extra rows from the perspective of the key columns. Some other columns may have *slightly* different data but I do not care about that. I still need to keep one of these rows however. SELECT DISTINCT won't work because it operates on all columns and I need to suppress duplicates based on the key columns.

How can I delete the extra rows but still keep one efficiently?
Reply

#2
Example query:

DELETE FROM Table
WHERE ID NOT IN
(
SELECT MIN(ID)
FROM Table
GROUP BY Field1, Field2, Field3, ...
)
Here `fields` are column on which you want to group the duplicate rows.
Reply

#3
You didn't say what version you were using, but in SQL 2005 and above, you can use a common table expression with the [OVER Clause][1]. It goes a little something like this:

WITH cte AS (
SELECT[foo], [bar],
row_number() OVER(PARTITION BY foo, bar ORDER BY baz) AS [rn]
FROM TABLE
)
DELETE cte WHERE [rn] > 1

Play around with it and see what you get.

(Edit: In an attempt to be helpful, someone edited the `ORDER BY` clause within the CTE. To be clear, you can order by anything you want here, it needn't be one of the columns returned by the cte. In fact, a common use-case here is that "foo, bar" are the group identifier and "baz" is some sort of time stamp. In order to keep the latest, you'd do `ORDER BY baz desc`)


[1]:

[To see links please register here]

Reply

#4
Here's my twist on it, with a runnable example. **Note** this will only work in the situation where `Id` is unique, and you have duplicate values in other columns.

DECLARE @SampleData AS TABLE (Id int, Duplicate varchar(20))

INSERT INTO @SampleData
SELECT 1, 'ABC' UNION ALL
SELECT 2, 'ABC' UNION ALL
SELECT 3, 'LMN' UNION ALL
SELECT 4, 'XYZ' UNION ALL
SELECT 5, 'XYZ'

DELETE FROM @SampleData WHERE Id IN (
SELECT Id FROM (
SELECT
Id
,ROW_NUMBER() OVER (PARTITION BY [Duplicate] ORDER BY Id) AS [ItemNumber]
-- Change the partition columns to include the ones that make the row distinct
FROM
@SampleData
) a WHERE ItemNumber > 1 -- Keep only the first unique item
)

SELECT * FROM @SampleData

And the results:

Id Duplicate
----------- ---------
1 ABC
3 LMN
4 XYZ

Not sure why that's what I thought of first... definitely not the simplest way to go but it works.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through