
Yet another easy solution can be found at the link pasted [here][1]. It is easy to grasp and seems to be effective for most similar problems. It is written for SQL Server, but the concept it uses is sound.

Here are the relevant portions from the linked page:

Consider this data:

EMPLOYEE_ID ATTENDANCE_DATE
A001 2011-01-01
A001 2011-01-01
A002 2011-01-01
A002 2011-01-01
A002 2011-01-01
A003 2011-01-01

So how can we delete the duplicate rows?

First, insert an identity column in that table by using the following code:

ALTER TABLE dbo.ATTENDANCE ADD AUTOID INT IDENTITY(1,1)

Use the following code to resolve it:

DELETE FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID)
FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID, ATTENDANCE_DATE)
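
Before running the DELETE, it can help to preview which rows it would remove, and to drop the helper column once the cleanup is done. A minimal sketch, assuming the `dbo.ATTENDANCE` table and the `AUTOID` column added above:

```sql
-- Preview: the rows the DELETE above would remove
-- (every row except the lowest AUTOID in each EMPLOYEE_ID/ATTENDANCE_DATE group)
SELECT *
FROM dbo.ATTENDANCE
WHERE AUTOID NOT IN (SELECT MIN(AUTOID)
                     FROM dbo.ATTENDANCE
                     GROUP BY EMPLOYEE_ID, ATTENDANCE_DATE);

-- After the DELETE has run, the helper column is no longer needed
ALTER TABLE dbo.ATTENDANCE DROP COLUMN AUTOID;
```

With the sample data shown earlier, the preview should list three rows: one extra A001 row and two extra A002 rows.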

[1]:


I thought I'd share my solution since it works under special circumstances.
In my case the table with duplicate values did not have a foreign key (because the values were duplicated from another db).

begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2

-- insert distinct values into temp
insert into #temp
select distinct *
from tableName

-- delete from source
delete from tableName

-- insert into source from temp
insert into tableName
select *
from #temp

rollback transaction
-- if this works, change rollback to commit and execute again to keep your changes!!

PS: when working on things like this I always use a transaction. This not only ensures everything is executed as a whole, but also allows me to test without risking anything. But of course you should take a backup anyway, just to be sure...
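
To check the result before changing the rollback to a commit, a duplicate count can be run inside the same transaction, right after the final insert. This is only a sketch: `col1`, `col2` and `col3` are placeholder names standing in for every column of `tableName`.

```sql
-- Run between the last INSERT and the ROLLBACK/COMMIT.
-- col1, col2, col3 are hypothetical; list all columns of tableName here.
-- Any row returned is still duplicated; an empty result means the cleanup worked.
SELECT col1, col2, col3, COUNT(*) AS copies
FROM tableName
GROUP BY col1, col2, col3
HAVING COUNT(*) > 1;
```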

Another way is to **create a new table** with the same fields and **a unique index**, then **move all data from the old table to the new table**. SQL Server can ignore duplicate values automatically (there is an option that controls what happens on a duplicate value, such as ignoring it or interrupting the insert). The result is the same table without duplicate rows. **If you don't want the unique index, you can drop it after the data transfer**.

Especially **for larger tables**, you can use DTS (an SSIS package to import/export data) to transfer all data rapidly to your new uniquely indexed table. For 7 million rows it takes just a few minutes.
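
The unique-index approach described above might look like the following in plain T-SQL. This is a sketch under assumptions: `OldTable`, `NewTable`, `Col1`, `Col2` and the index name are hypothetical, the duplicate key is assumed to be `(Col1, Col2)`, and the table is assumed to have no identity column (otherwise the `INSERT ... SELECT *` would need an explicit column list).

```sql
-- Hypothetical names throughout: OldTable, NewTable, Col1, Col2, UX_NewTable_Col1_Col2

-- Create an empty table with the same columns as the source
SELECT * INTO NewTable FROM OldTable WHERE 1 = 2;

-- Unique index that skips duplicate keys (with a warning) instead of failing the insert
CREATE UNIQUE INDEX UX_NewTable_Col1_Col2
    ON NewTable (Col1, Col2)
    WITH (IGNORE_DUP_KEY = ON);

-- Rows whose (Col1, Col2) value is already present in NewTable are ignored
INSERT INTO NewTable
SELECT * FROM OldTable;

-- Optional: drop the index once the data has been transferred
DROP INDEX UX_NewTable_Col1_Col2 ON NewTable;
```

Note that this gives no control over which of the duplicate rows survives, so it fits best when the duplicates are identical in every column that matters.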

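I prefer the subquery/having count(*) > 1 solution to the inner join because I found it easier to read, and it was very easy to turn into a SELECT statement to verify what would be deleted before running it. Note that each run deletes only one row (the lowest id) per duplicated group, so if a group can contain more than two copies, repeat the DELETE until it affects no rows.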

--DELETE FROM table1
--WHERE id IN (
SELECT MIN(id) FROM table1
GROUP BY col1, col2, col3
-- could add a WHERE clause here to further filter
HAVING count(*) > 1
--)

> From the application level (unfortunately). I agree that the proper way to prevent duplication is at the database level through the use of a unique index, but in SQL Server 2005, an index is allowed to be only 900 bytes, and my varchar(2048) field blows that away.

I dunno how well it would perform, but I think you could write a trigger to enforce this, even if you couldn't do it directly with an index. Something like:

-- given a table stories(story_id int not null primary key, story varchar(max) not null)
CREATE TRIGGER prevent_plagiarism
ON stories
after INSERT, UPDATE
AS
DECLARE @cnt AS INT

SELECT @cnt = Count(*)
FROM stories
INNER JOIN inserted
ON ( stories.story = inserted.story
AND stories.story_id != inserted.story_id )

IF @cnt > 0
BEGIN
RAISERROR('plagiarism detected',16,1)

ROLLBACK TRANSACTION
END

Also, varchar(2048) sounds fishy to me (some things in life are 2048 bytes, but it's pretty uncommon); should it really not be varchar(max)?

DELETE LU
FROM (SELECT *,
Row_number()
OVER (
partition BY col1, col2, col3
ORDER BY rowid DESC) [Row]
FROM mytable) LU
WHERE [row] > 1

If you want to preview the rows you are about to remove and keep control over which of the duplicate rows is kept, number the rows in a CTE and delete from it:

with MYCTE as (
SELECT ROW_NUMBER() OVER (
PARTITION BY DuplicateKey1
,DuplicateKey2 -- optional
ORDER BY CreatedAt -- the first row among duplicates will be kept, other rows will be removed
) RN
FROM MyTable
)
DELETE FROM MYCTE
WHERE RN > 1
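
To preview the rows first, the same CTE can feed a SELECT instead of the DELETE. A minimal sketch, reusing the placeholder names `MyTable`, `DuplicateKey1`, `DuplicateKey2` and `CreatedAt` from the query above:

```sql
WITH MYCTE AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY DuplicateKey1, DuplicateKey2
               ORDER BY CreatedAt  -- the earliest row in each group gets RN = 1 and is kept
           ) AS RN
    FROM MyTable
)
SELECT *
FROM MYCTE
WHERE RN > 1;  -- these are the rows the DELETE would remove
```

Changing the `ORDER BY` (for example to `CreatedAt DESC`) changes which row in each group is kept.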

Use this

WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name)
As RowNumber,* FROM <table_name>
)
DELETE FROM tblTemp where RowNumber >1

-- keep the row with the lowest RowID in each group, delete the rest
DELETE
FROM MyTable
WHERE RowID NOT IN (
SELECT MIN(RowID)
FROM MyTable
GROUP BY Col1, Col2, Col3
);

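1. Create a new blank table with the same structure.

2. Execute a query like this to copy one row from each group of duplicates:

-- Note: SELECT * combined with GROUP BY is accepted only by engines that allow
-- non-aggregated columns (e.g. MySQL without ONLY_FULL_GROUP_BY); SQL Server rejects it.
INSERT INTO tc_category1
SELECT *
FROM tc_category
GROUP BY category_id, application_id
HAVING count(*) > 1

3. Then execute this query to copy the rows that have no duplicates: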

INSERT INTO tc_category1
SELECT *
FROM tc_category
GROUP BY category_id, application_id
HAVING count(*) = 1
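
On SQL Server, every selected column has to appear in the GROUP BY or inside an aggregate, so the two queries above can be written with an explicit column list and merged into one. This is a sketch that assumes `category_id` and `application_id` are the only columns of `tc_category`; any further column would need an aggregate such as `MIN()`.

```sql
-- One row per (category_id, application_id) pair, whether it was duplicated or not.
-- Assumes tc_category1 has the same two columns as tc_category.
INSERT INTO tc_category1 (category_id, application_id)
SELECT category_id, application_id
FROM tc_category
GROUP BY category_id, application_id;
```

Dropping the HAVING clause covers both cases at once, because groups with `count(*) > 1` and groups with `count(*) = 1` together make up every group.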
