Saturday, February 25, 2012

Removing duplicate dat

Hi
I have a 1.5 million row table with a text field. Can
anybody think of a way to remove any duplicate rows where the text field
contains the same data (and a
way which will not take days to run!)
For example if I have two rows with "SQL SERVER IS GREAT", I want to remove
one of them.
Thanks
here is an example of something that works for me:
create table textstuff
(pk int not null primary key,
textcol text)
go
--insert statements
declare @.int int
select @.int=max(datalength(textcol)) from textstuff
select pk, checksum=checksum(substring(textcol,1, @.int)) into holding from
textstuff order by 2
select pk, holding.checksum from holding,
(select checksum, test=count(checksum) from holding group by checksum having
count(checksum) >1) as a
where holding.checksum=a.checksum
rows which have identical checksums will show up here.
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
"Paul" <xx@.nospam.com> wrote in message
news:e1O%23GfsuEHA.3416@.TK2MSFTNGP09.phx.gbl...
> Hi
> I have a 1.5 million row table with a text field. Can
> anybody think of a way to remove any duplicate rows where the text field
> contains the same data (and a
> way which will not take days to run!)
> For example if I have two rows with "SQL SERVER IS GREAT", I want to
remove
> one of them.
> Thanks
>
>
|||That's awsome, thanks Hilary
"Hilary Cotter" <hilary.cotter@.gmail.com> wrote in message
news:u4beTGwuEHA.3808@.TK2MSFTNGP15.phx.gbl...
> here is an example of something that works for me:
> create table textstuff
> (pk int not null primary key,
> textcol text)
> go
> --insert statements
> declare @.int int
> select @.int=max(datalength(textcol)) from textstuff
> select pk, checksum=checksum(substring(textcol,1, @.int)) into holding from
> textstuff order by 2
> select pk, holding.checksum from holding,
> (select checksum, test=count(checksum) from holding group by checksum
having
> count(checksum) >1) as a
> where holding.checksum=a.checksum
>
> rows which have identical checksums will show up here.
> --
> Hilary Cotter
> Looking for a SQL Server replication book?
> http://www.nwsu.com/0974973602.html
>
> "Paul" <xx@.nospam.com> wrote in message
> news:e1O%23GfsuEHA.3416@.TK2MSFTNGP09.phx.gbl...
> remove
>

No comments:

Post a Comment