首页 > 代码库 > Postgresql 数据在一张表中存在,另一张表不满足完整性的查找

Postgresql 数据在一张表中存在,另一张表不满足完整性的查找

有两张表T1,T2,表结构和数据如下:
create table t1 (
  id int
);

create table t2 (
  id int,
  finished int
);

insert into t1 values (1);
insert into t1 values (2);
insert into t1 values (3);
insert into t1 values (4);
insert into t1 values (5);
insert into t1 values (6);

insert into t2 values (1,1);
insert into t2 values (2,0);
insert into t2 values (3,1);
insert into t2 values (4,0);
SQL Fiddle 可以测试SQL语句的执行。
想要实现T1中存在前提下,T2中不存在或者finished=0,也就是查询结果是:2, 4, 5, 6.

一、性能测试:

1. not in:
explain (analyze,verbose,costs,buffers) select ID from T1 where ID not in (select ID from T2 where finished=1);
Total runtime: 0.128 ms

2. not exists:
explain (analyze,verbose,costs,buffers) select ID from T1 where not exists (select 1 from T2 where T1.ID=T2.ID and T2.finished=1);
Total runtime: 0.105 ms

3. left join:
explain (analyze,verbose,costs,buffers) select T1.ID from T1 left join T2 on T1.ID=T2.ID and T2.finished=1 where T2.ID is null;
Total runtime: 0.096 ms

4. 网上还看到一种更快方法,但测试下来此方法不对,所以不讨论:
select ID from T2 where (select count(1) from T1 where T1.ID=T2.ID) = 0; 这条语句查询结果为空


因此在postgresql 9.3 上语句执行速度 left join > not exists > not in
当T1和T2表中ID出现null时,not in 语句会有不同的表现,所以推荐总是用not exists 代替 not in.



二、大数据量性能测试:

在大量数据的时候,not in有严重性能下降的问题,下面是我在i5 2.4GHz MAC pro 13吋上的测试。
department(T1) 为59280条数据,数据长度29字符;dept(T2) 为23633条数据,数据长度29字符。

1. explain analyze select department.id from department where department.id not in (select id from dept where finished=true);
Total runtime: 447073.065 ms

2. explain analyze select department.id from department where not exists (select 1 from dept where department.id=dept.id and finished=true);
Total runtime: 325.732 ms

3. explain analyze select department.id from department left join dept on department.id=dept.id and dept.finished=true where dept.id is null;
Total runtime: 319.869 ms


三、总结:

在Postgresql 9.3上:
not in 不仅性能差,而且逻辑可能有问题。
not exists 性能不错,思考起来比较容易。
left join 性能最好,但总体跟not exists 比也快不了多少,思考稍微有点绕。


下面是一张网上的left join 的图,但找不到出处,有助于理解 left join 的过程:

技术分享

Postgresql 数据在一张表中存在,另一张表不满足完整性的查找