首页 > 代码库 > 重复记录(duplicate records)数据的相关操作
重复记录(duplicate records)数据的相关操作
- 创建表和测试数据
/* 表结构 */
DROPTABLEIFEXISTS `t1`;
CREATETABLEIFNOTEXISTS `t1`(
`id` INT(1)NOTNULL AUTO_INCREMENT,
`name` VARCHAR(20)NOTNULL,
`add`VARCHAR(20)NOTNULL,
PRIMARYKEY(`id`)
)Engine=InnoDB;
/* 插入测试数据 */
INSERTINTO `t1`(`name`,`add`)VALUES
(‘abc‘,"123"),
(‘abc‘,"123"),
(‘abc‘,"321"),
(‘abc‘,"123"),
(‘xzy‘,"123"),
(‘xzy‘,"456"),
(‘xzy‘,"456"),
(‘xzy‘,"456"),
(‘xzy‘,"789"),
(‘xzy‘,"987"),
(‘xzy‘,"789"),
(‘ijk‘,"147"),
(‘ijk‘,"147"),
(‘ijk‘,"852"),
(‘opq‘,"852"),
(‘opq‘,"963"),
(‘opq‘,"741"),
(‘tpk‘,"741"),
(‘tpk‘,"963"),
(‘tpk‘,"963"),
(‘wer‘,"546"),
(‘wer‘,"546"),
(‘once‘,"546");
- 查找所有重复的数据
重复数据只显示其中一条:
/* 查找所有重复数据,重复数据只显示一次 */ SELECT * FROM `t1` GROUP BY `name` HAVING COUNT(1) > 1 #HAVING COUNT(1) = 1 #这条语句表示显示其中不重复的数据
效果如下:
------ ------ --------
1 abc 123
12 ijk 147
15 opq 852
18 tpk 741
21 wer 546
5 xzy 123
重复数据每条都显示
SELECT t1.* FROM t1, (SELECT t1.name, t1.add FROM t1 GROUP BY `name`, `add` HAVING COUNT(1) > 1) AS `t2` WHERE `t1`.`name` = `t2`.`name` AND `t1`.`add` = `t2`.`add`
------ ------ --------
1 abc 123
2 abc 123
4 abc 123
6 xzy 456
7 xzy 456
8 xzy 456
9 xzy 789
11 xzy 789
12 ijk 147
13 ijk 147
19 tpk 963
20 tpk 963
21 wer 546
22 wer 546
- 显示所有数据(重复数据只显示一次)
下面的SQL简单快速有效,但不能保证重复字段的records显示哪一个record。
SELECT * FROM t1 GROUP BY NAME #having count(*) > 1 #加上这条语句,重复字段的records只显示一次
- 删除重复数据:
1 最简单快速 的方法:
1. 新建一个一样表结构的表,经重复内容字段设置为unique
2. 然后执行语句:INSERT IGNORE INTO new_table SELECT * FROM old_table
对于本题就是:
1. 新建t1_copy,然后将name,add设置为主键
2.执行INSERT IGNORE INTO t1_copy SELECT * FROM t1 ORDER BY t1.`id`, OK
2 笨方法UNION(数据量大时很慢):
SELECT * FROM t1 GROUP BY `name`, `add` HAVING COUNT(1) > 1 UNION SELECT * FROM t1 GROUP BY `name`, `add` HAVING COUNT(1) = 1 ORDER BY id
注意:
这里表中有一个主键id,若没有主键id,数据量大时会很慢,这样可以人为插入一个主键id。如