Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 419 Vote(s) - 3.57 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How can I remove duplicate rows?

#11
Yet another easy solution can be found at the link pasted [here][1]. This one easy to grasp and seems to be effective for most of the similar problems. It is for SQL Server though but the concept used is more than acceptable.

Here are the relevant portions from the linked page:

Consider this data:

EMPLOYEE_ID ATTENDANCE_DATE
A001 2011-01-01
A001 2011-01-01
A002 2011-01-01
A002 2011-01-01
A002 2011-01-01
A003 2011-01-01

So how can we delete those duplicate data?

First, insert an identity column in that table by using the following code:

ALTER TABLE dbo.ATTENDANCE ADD AUTOID INT IDENTITY(1,1)

Use the following code to resolve it:

DELETE FROM dbo.ATTENDANCE WHERE AUTOID NOT IN (SELECT MIN(AUTOID) _
FROM dbo.ATTENDANCE GROUP BY EMPLOYEE_ID,ATTENDANCE_DATE)


[1]:

[To see links please register here]

Reply

#12
I thought I'd share my solution since it works under special circumstances.
I my case the table with duplicate values did not have a foreign key (because the values were duplicated from another db).


begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2

-- insert distinct values into temp
insert into #temp
select distinct *
from tableName

-- delete from source
delete from tableName

-- insert into source from temp
insert into tableName
select *
from #temp

rollback transaction
-- if this works, change rollback to commit and execute again to keep you changes!!

PS: when working on things like this I always use a transaction, this not only ensures everything is executed as a whole, but also allows me to test without risking anything. But off course you should take a backup anyway just to be sure...
Reply

#13
The other way is **Create a new** table with same fields and **with Unique Index**. Then **move all data from old table to new table**. Automatically SQL SERVER ignore (there is also an option about what to do if there will be a duplicate value: ignore, interrupt or sth) duplicate values. So we have the same table without duplicate rows. **If you don't want Unique Index, after the transfer data you can drop it**.

Especially **for larger tables** you may use DTS (SSIS package to import/export data) in order to transfer all data rapidly to your new uniquely indexed table. For 7 million row it takes just a few minute.
Reply

#14
I prefer the subquery\having count(*) > 1 solution to the inner join because I found it easier to read and it was very easy to turn into a SELECT statement to verify what would be deleted before you run it.

--DELETE FROM table1
--WHERE id IN (
SELECT MIN(id) FROM table1
GROUP BY col1, col2, col3
-- could add a WHERE clause here to further filter
HAVING count(*) > 1
--)
Reply

#15
> From the application level (unfortunately). I agree that the proper way to prevent duplication is at the database level through the use of a unique index, but in SQL Server 2005, an index is allowed to be only 900 bytes, and my varchar(2048) field blows that away.

I dunno how well it would perform, but I think you could write a trigger to enforce this, even if you couldn't do it directly with an index. Something like:

-- given a table stories(story_id int not null primary key, story varchar(max) not null)
CREATE TRIGGER prevent_plagiarism
ON stories
after INSERT, UPDATE
AS
DECLARE @cnt AS INT

SELECT @cnt = Count(*)
FROM stories
INNER JOIN inserted
ON ( stories.story = inserted.story
AND stories.story_id != inserted.story_id )

IF @cnt > 0
BEGIN
RAISERROR('plagiarism detected',16,1)

ROLLBACK TRANSACTION
END


Also, varchar(2048) sounds fishy to me (some things in life are 2048 bytes, but it's pretty uncommon); should it really not be varchar(max)?
Reply

#16
DELETE LU
FROM (SELECT *,
Row_number()
OVER (
partition BY col1, col1, col3
ORDER BY rowid DESC) [Row]
FROM mytable) LU
WHERE [row] > 1
Reply

#17
I you want to preview the rows you are about to remove and keep control over which of the duplicate rows to keep. See

[To see links please register here]


with MYCTE as (
SELECT ROW_NUMBER() OVER (
PARTITION BY DuplicateKey1
,DuplicateKey2 -- optional
ORDER BY CreatedAt -- the first row among duplicates will be kept, other rows will be removed
) RN
FROM MyTable
)
DELETE FROM MYCTE
WHERE RN > 1

Reply

#18
Use this

WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Name,Department ORDER BY Name)
As RowNumber,* FROM <table_name>
)
DELETE FROM tblTemp where RowNumber >1
Reply

#19
DELETE
FROM MyTable
WHERE NOT EXISTS (
SELECT min(RowID)
FROM Mytable
WHERE (SELECT RowID
FROM Mytable
GROUP BY Col1, Col2, Col3
))
);
Reply

#20
1. Create new blank table with the same structure

2. Execute query like this

INSERT INTO tc_category1
SELECT *
FROM tc_category
GROUP BY category_id, application_id
HAVING count(*) > 1

3. Then execute this query

INSERT INTO tc_category1
SELECT *
FROM tc_category
GROUP BY category_id, application_id
HAVING count(*) = 1
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through