There are two kinds of duplicate data: rows that repeat only in certain fields, and two or more rows that are exactly the same in every field.
- Duplicates in some fields:
SQL to query the non-duplicated data:
select field1, field2, count(*) from table_name group by field1, field2 having count(*) = 1
Deleting data duplicated in some fields:
SQL to query the duplicated data:
select field1, field2, count(*) from table_name group by field1, field2 having count(*) > 1
Delete the duplicated data found above:
delete from table_name a where (a.field1, a.field2) in
(select field1, field2 from table_name group by field1, field2 having count(*) > 1)
The statement above deletes the queried data directly. This kind of deletion is inefficient and is not suitable when the amount of data is large.
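The direct delete can be tried end to end with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for Oracle, and the table, the `note` column, and the sample rows are hypothetical. It only illustrates the logic and says nothing about Oracle performance.

```python
import sqlite3

def delete_all_duplicates(rows):
    """Delete every row whose (field1, field2) pair occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (field1 TEXT, field2 TEXT, note TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    # The subquery returns only the key columns, so it can be compared
    # with the (field1, field2) pair of each row.
    conn.execute("""
        DELETE FROM table_name
        WHERE (field1, field2) IN (
            SELECT field1, field2 FROM table_name
            GROUP BY field1, field2
            HAVING COUNT(*) > 1
        )
    """)
    remaining = conn.execute(
        "SELECT field1, field2, note FROM table_name ORDER BY note"
    ).fetchall()
    conn.close()
    return remaining

sample = [("a", "x", "n1"), ("a", "x", "n2"), ("b", "y", "n3")]
print(delete_all_duplicates(sample))
```

Both rows of the duplicated pair are removed, so only the row whose key pair was unique remains.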
Another method is to insert the duplicated data found by the query into a temporary table and then delete against that table, so the delete does not have to run the grouping query again.
create table temp_table as
(select field1, field2, count(*) as cnt from table_name group by field1, field2 having count(*) > 1);
-- Creates the temporary table and inserts the query results into it.
Then delete:
delete from table_name a where (a.field1, a.field2) in (select field1, field2 from temp_table);
Creating the temporary table first and then deleting against it is much more efficient than deleting directly with a single statement.
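The two-step version can be sketched the same way. This again assumes SQLite via Python's sqlite3 module rather than Oracle, with hypothetical sample data. It shows the order of operations only and does not measure the efficiency gain.

```python
import sqlite3

def delete_duplicates_via_temp_table(rows):
    """Stage the duplicated key pairs in a temporary table, then delete against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (field1 TEXT, field2 TEXT, note TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    # Step 1: run the grouping query once and keep its result.
    conn.execute("""
        CREATE TEMP TABLE temp_table AS
        SELECT field1, field2, COUNT(*) AS cnt
        FROM table_name
        GROUP BY field1, field2
        HAVING COUNT(*) > 1
    """)
    staged = conn.execute("SELECT COUNT(*) FROM temp_table").fetchone()[0]
    # Step 2: delete by looking up the staged key pairs.
    conn.execute("""
        DELETE FROM table_name
        WHERE (field1, field2) IN (SELECT field1, field2 FROM temp_table)
    """)
    conn.execute("DROP TABLE temp_table")
    remaining = conn.execute(
        "SELECT field1, field2, note FROM table_name ORDER BY note"
    ).fetchall()
    conn.close()
    return staged, remaining

sample = [("a", "x", "n1"), ("a", "x", "n2"), ("b", "y", "n3")]
print(delete_duplicates_via_temp_table(sample))
```

The function returns the number of duplicated key pairs that were staged, along with the rows left in the table.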
The statements above delete every row of the duplicated data, without keeping a single record of each duplicate.
How do you keep one record of the duplicated data?
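In Oracle, every table has a hidden pseudocolumn, rowid, and each record in the table has a unique rowid.
To keep one record from each group of duplicates, you can use this field: keeping the record with the largest rowid in each group is enough. The largest rowid is usually, though not always, the most recently inserted record, because Oracle does not guarantee that rowid order follows insertion order.
In the query below, the subquery inside the parentheses finds the largest rowid in each group of duplicates, and the outer query returns every duplicate row except the one with that largest rowid: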
select a.rowid, a.* from table_name a
where a.rowid !=
(
select max(b.rowid) from table_name b
where a.field1 = b.field1 and
a.field2 = b.field2
)
To remove the duplicated data and keep only the latest record:
delete from table_name a
where a.rowid !=
(
select max(b.rowid) from table_name b
where a.field1 = b.field1 and
a.field2 = b.field2
)
The statement above deletes the duplicated data and keeps the latest record, but it is inefficient and is not suitable when the amount of data is large.
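A small sketch of the keep-one delete follows. It assumes SQLite via Python's sqlite3 module as a stand-in for Oracle. SQLite's rowid is an integer that increases with each insert into a fresh table, so it behaves differently from Oracle's ROWID, and the sample rows are hypothetical.

```python
import sqlite3

def keep_largest_rowid_per_group(rows):
    """Delete duplicates on (field1, field2), keeping the row with the largest rowid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (field1 TEXT, field2 TEXT, note TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    # For each row, the correlated subquery finds the largest rowid that
    # shares its key pair; every other row in that group is deleted.
    conn.execute("""
        DELETE FROM table_name
        WHERE rowid != (
            SELECT MAX(b.rowid) FROM table_name b
            WHERE b.field1 = table_name.field1
              AND b.field2 = table_name.field2
        )
    """)
    remaining = conn.execute(
        "SELECT field1, field2, note FROM table_name ORDER BY rowid"
    ).fetchall()
    conn.close()
    return remaining

sample = [("a", "x", "first"), ("a", "x", "second"), ("b", "y", "only")]
print(keep_largest_rowid_per_group(sample))
```

In this sketch the row inserted last within each group survives, because SQLite assigns ascending rowids here.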
Consider creating a temporary table instead: insert the fields that determine duplication, together with the rowid to keep, into the temporary table, then compare against it when deleting.
create table temp_table as
select a.field1, a.field2, max(a.rowid) dataid from table_name a group by a.field1, a.field2;
delete from table_name a
where a.rowid !=
(
select b.dataid from temp_table b
where a.field1 = b.field1 and
a.field2 = b.field2
);
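The temporary-table variant of the keep-one delete can be sketched as follows, again assuming SQLite via Python's sqlite3 module in place of Oracle and using hypothetical sample rows. The `dataid` column name follows the statement above.

```python
import sqlite3

def keep_one_via_temp_table(rows):
    """Stage the rowid to keep per key pair, then delete every other row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (field1 TEXT, field2 TEXT, note TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
    # Step 1: one row per key pair, holding the largest rowid of that pair.
    conn.execute("""
        CREATE TEMP TABLE temp_table AS
        SELECT field1, field2, MAX(rowid) AS dataid
        FROM table_name
        GROUP BY field1, field2
    """)
    # Step 2: delete rows whose rowid is not the staged one for their pair.
    conn.execute("""
        DELETE FROM table_name
        WHERE rowid != (
            SELECT b.dataid FROM temp_table b
            WHERE b.field1 = table_name.field1
              AND b.field2 = table_name.field2
        )
    """)
    conn.execute("DROP TABLE temp_table")
    remaining = conn.execute(
        "SELECT field1, field2, note FROM table_name ORDER BY rowid"
    ).fetchall()
    conn.close()
    return remaining

sample = [("a", "x", "first"), ("a", "x", "second"), ("b", "y", "only")]
print(keep_one_via_temp_table(sample))
```

Unlike the temporary table used for deleting all duplicates, this one holds a row for every key pair, including pairs that are not duplicated, so rows with a unique pair match their own rowid and are kept.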
- Completely duplicate records:
When two or more rows in a table are exactly the same, the following statement returns the records with the duplicates removed:
select distinct * from table_name
Put the query results into a temporary table, delete the records from the original table, and finally import the temporary table's data back into the original table.
create table temp_table as (select distinct * from table_name);
truncate table table_name;
insert into table_name (select * from temp_table);
drop table temp_table;
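The four steps above can be sketched as below, assuming SQLite via Python's sqlite3 module rather than Oracle. SQLite has no TRUNCATE statement, so the sketch substitutes an unqualified DELETE for that step. The sample rows are hypothetical.

```python
import sqlite3

def rebuild_without_exact_duplicates(rows):
    """Replace the table contents with its distinct rows via a temporary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (field1 TEXT, field2 TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    # 1. Copy the distinct rows into a temporary table.
    conn.execute("CREATE TEMP TABLE temp_table AS SELECT DISTINCT * FROM table_name")
    # 2. Empty the original table (stand-in for TRUNCATE TABLE).
    conn.execute("DELETE FROM table_name")
    # 3. Load the distinct rows back into the original table.
    conn.execute("INSERT INTO table_name SELECT * FROM temp_table")
    # 4. Drop the temporary table.
    conn.execute("DROP TABLE temp_table")
    remaining = conn.execute(
        "SELECT field1, field2 FROM table_name ORDER BY field1, field2"
    ).fetchall()
    conn.close()
    return remaining

sample = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
print(rebuild_without_exact_duplicates(sample))
```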