Wednesday, March 7, 2012

removing duplicate rows

I've been given a table that has hundreds of duplicate rows but I'm
having a bit of trouble trying to remove them, leaving just one unique
row, so maybe someone here can shed some light on it...
OK so here's the table spec:
VFEATURE
{
idVFeature unique identity
idFeature integer
idVehicle integer
}
There should only be ONE instance of each idFeature, idVehicle pair.
At the moment there are 100s.
I run the following query to identify which 'idFeature' values are
appearing more than once for each 'idVehicle'...
SELECT idFeature, idVehicle, COUNT(*) AS Expr1
FROM VFEATURE
GROUP BY idFeature, idVehicle
HAVING (COUNT(*) > 1)
Now that I have that... I've tried extracting all rows except one for
each duplicate, but with no success yet.
Anyone done something like this before?
Thanks,
Peter
--
"I hear ma train a comin'
... hear freedom comin"Check out: http://support.microsoft.com/?kbid=139444 for some techniques on
the same.
--
HTH,
SriSamp
Email: srisamp@.gmail.com
Blog: http://blogs.sqlxml.org/srinivassampath
URL: http://www32.brinkster.com/srisamp
"Stimp" <ren@.spumco.com> wrote in message
news:slrndngtd0.ipj.ren@.carbon.redbrick.dcu.ie...
> I've been given a table that has hundreds of duplicate rows but I'm
> having a bit of trouble trying to remove them, leaving just one unique
> row, so maybe someone here can shed some light on it...
> OK so here's the table spec:
> VFEATURE
> {
> idVFeature unique identity
> idFeature integer
> idVehicle integer
> }
> There should only be ONE instance of each idFeature, idVehicle pair.
> At the moment there are 100s.
> I run the following query to identify which 'idFeature' values are
> appearing more than once for each 'idVehicle'...
> SELECT idFeature, idVehicle, COUNT(*) AS Expr1
> FROM VFEATURE
> GROUP BY idFeature, idVehicle
> HAVING (COUNT(*) > 1)
> Now that I have that... I've tried extracting all rows except one for
> each duplicate, but with no success yet.
> Anyone done something like this before?
> Thanks,
> Peter
> --
> "I hear ma train a comin'
> ... hear freedom comin"
>|||On Mon, 14 Nov 2005 Stimp <ren@.spumco.com> wrote:
> Anyone done something like this before?
I actually figured it out after posting.. don't you hate it when that
happens...
DELETE FROM VFeature
WHERE idVFeature IN
(SELECT MIN(idVFeature)
FROM VFeature
GROUP BY idFeature, idVehicle
HAVING COUNT(*) > 1)
"I hear ma train a comin'
... hear freedom comin"|||IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp
GO
CREATE TABLE #tmp (
idVFeature INT IDENTITY(1,1)
, idFeature INT NOT NULL
, idVehicle INT NOT NULL
, CONSTRAINT PK_#tmp PRIMARY KEY CLUSTERED ( idVFeature )
)
/* create sample data */
INSERT INTO
#tmp ( idFeature , idVehicle )
SELECT
idFeature
, idVehicle
FROM
( SELECT N AS idFeature FROM tblNumbers WHERE N BETWEEN 1 AND 10 ) Features
CROSS JOIN
( SELECT N AS idVehicle FROM tblNumbers WHERE N BETWEEN 1 AND 10 ) Vehicles
CROSS JOIN
( SELECT N AS N FROM tblNumbers WHERE N BETWEEN 1 AND 5 ) Duplicates
ORDER BY
idFeature
, n
, idVehicle
/* remove non-lowest idVFeature for unique idVehicle,idFeature combinations
*/
DELETE FROM
#tmp
FROM
#tmp
LEFT OUTER JOIN
( /* lowest idVFeature for unique idVehicle,idFeature combinations */
SELECT
MIN( idVFeature ) idVFeature
, idVehicle
, idFeature
FROM
#tmp
GROUP BY
idVehicle
, idFeature
) LowestVehicleFeatures
ON
#tmp.idVFeature = LowestVehicleFeatures.idVFeature
AND
#tmp.idVehicle = LowestVehicleFeatures.idVehicle
AND
#tmp.idFeature = LowestVehicleFeatures.idFeature
WHERE
LowestVehicleFeatures.idVFeature IS NULL
GO
/* add unique index to idVehicle , idFeature */
CREATE UNIQUE NONCLUSTERED INDEX UIX_#tmp_VehicleFeatures ON #tmp (
idVehicle , idFeature )
GO
SELECT * FROM #tmp WITH(INDEX(UIX_#tmp_VehicleFeatures)) ORDER BY idVehicle
, idFeature
"Stimp" <ren@.spumco.com> wrote in message
news:slrndngtd0.ipj.ren@.carbon.redbrick.dcu.ie...
> I've been given a table that has hundreds of duplicate rows but I'm
> having a bit of trouble trying to remove them, leaving just one unique
> row, so maybe someone here can shed some light on it...
> OK so here's the table spec:
> VFEATURE
> {
> idVFeature unique identity
> idFeature integer
> idVehicle integer
> }
> There should only be ONE instance of each idFeature, idVehicle pair.
> At the moment there are 100s.
> I run the following query to identify which 'idFeature' values are
> appearing more than once for each 'idVehicle'...
> SELECT idFeature, idVehicle, COUNT(*) AS Expr1
> FROM VFEATURE
> GROUP BY idFeature, idVehicle
> HAVING (COUNT(*) > 1)
> Now that I have that... I've tried extracting all rows except one for
> each duplicate, but with no success yet.
> Anyone done something like this before?
> Thanks,
> Peter
> --
> "I hear ma train a comin'
> ... hear freedom comin"
>|||That would just delete the first instance of the duplicate.
Change "IN (...)" to "NOT IN ( ... )"
and you're sorted.
"Stimp" <ren@.spumco.com> wrote in message
news:slrndngvu1.b55.ren@.carbon.redbrick.dcu.ie...
> On Mon, 14 Nov 2005 Stimp <ren@.spumco.com> wrote:
> I actually figured it out after posting.. don't you hate it when that
> happens...
> DELETE FROM VFeature
> WHERE idVFeature IN
> (SELECT MIN(idVFeature)
> FROM VFeature
> GROUP BY idFeature, idVehicle
> HAVING COUNT(*) > 1)
> --
> "I hear ma train a comin'
> ... hear freedom comin"
>|||On Mon, 14 Nov 2005 Rebecca York <rebecca.york> wrote:
> That would just delete the first instance of the duplicate.
> Change "IN (...)" to "NOT IN ( ... )"
> and you're sorted.
yeah I just ran it 10 times and it cleared out the table :)
Thanks,
Peter

> "Stimp" <ren@.spumco.com> wrote in message
> news:slrndngvu1.b55.ren@.carbon.redbrick.dcu.ie...
>
"I hear ma train a comin'
... hear freedom comin"|||Here is the process:
1. SELECT * INTO #Temp1 FROM [Table1]
2. TRUNCATE TABLE [Table1]
3. CREATE UNIQUE INDEX [Index1] ON [Table1] (Unique Column Names) WITH
IGNORE_DUP_KEY
4. INSERT INTO [Table1] (Column Names) SELECT (Column Names) FROM [Table1]
5. DROP INDEX [Table1].[Index1]
Substitute your own names for those within the [].
Unique Column Names are those columns that uniquely identify each row.
Column Names is the full list of columns in your table.
HTH,
Mike
"Stimp" wrote:

> I've been given a table that has hundreds of duplicate rows but I'm
> having a bit of trouble trying to remove them, leaving just one unique
> row, so maybe someone here can shed some light on it...
> OK so here's the table spec:
> VFEATURE
> {
> idVFeature unique identity
> idFeature integer
> idVehicle integer
> }
> There should only be ONE instance of each idFeature, idVehicle pair.
> At the moment there are 100s.
> I run the following query to identify which 'idFeature' values are
> appearing more than once for each 'idVehicle'...
> SELECT idFeature, idVehicle, COUNT(*) AS Expr1
> FROM VFEATURE
> GROUP BY idFeature, idVehicle
> HAVING (COUNT(*) > 1)
> Now that I have that... I've tried extracting all rows except one for
> each duplicate, but with no success yet.
> Anyone done something like this before?
> Thanks,
> Peter
> --
> "I hear ma train a comin'
> ... hear freedom comin"
>

No comments:

Post a Comment