首页 > 代码库 > 灌入大量数据后手工采集统计信息的重要性
灌入大量数据后手工采集统计信息的重要性
1. 创建测试表TBL_STAT,及索引,但不插入记录
SQL> create table TBL_STAT as select * from dba_objects where 1<>1;
Table created.
SQL> create index idx_tbl_stat on tbl_stat (object_id);
Index created.
SQL> select count(*) from tbl_stat;
COUNT(*)
----------
0
Table created.
SQL> create index idx_tbl_stat on tbl_stat (object_id);
Index created.
SQL> select count(*) from tbl_stat;
COUNT(*)
----------
0
2. 检索TBL_STAT的执行计划
SQL> explain plan for select object_name from tbl_stat where object_id = 1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2448091186
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 79 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL_STAT | 1 | 79 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("OBJECT_ID"=1)
Note
-----
- dynamic sampling used for this statement
17 rows selected.
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2448091186
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 79 | 2 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TBL_STAT | 1 | 79 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter("OBJECT_ID"=1)
Note
-----
- dynamic sampling used for this statement
17 rows selected.
发现按照索引字段查询使用的是全表扫描。
3. 手工收集TBL_STAT表的统计信息
SQL> exec dbms_stats.gather_table_stats(ownname=>‘DCSOPEN‘, tabname=>‘TBL_STAT‘, estimate_percent=>100);
PL/SQL procedure successfully completed.
4. 再次检索TBL_STAT表
SQL> explain plan for select object_name from tbl_stat where object_id = 1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3529113932
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
PL/SQL procedure successfully completed.
4. 再次检索TBL_STAT表
SQL> explain plan for select object_name from tbl_stat where object_id = 1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3529113932
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 79 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT | 1 | 79 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TBL_STAT | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=1)
14 rows selected.
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 79 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT | 1 | 79 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TBL_STAT | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=1)
14 rows selected.
发现这次用到了索引范围扫描,说明收集统计信息让Oracle可以选择正确的执行计划路径。
5. 插入100万的测试记录
SQL> begin
2 for i in 1 .. 10 loop
3 insert into tbl_stat select * from dba_objects;
4 commit;
5 end loop;
6 end;
7 /
PL/SQL procedure successfully completed.
2 for i in 1 .. 10 loop
3 insert into tbl_stat select * from dba_objects;
4 commit;
5 end loop;
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> select count(*) from tbl_stat;
COUNT(*)
----------
1190725
COUNT(*)
----------
1190725
6. 查看检索TBL_STAT表的执行计划
SQL> explain plan for select object_name from tbl_stat where object_id = 1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3529113932
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3529113932
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 79 | 1 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT | 1 | 79 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TBL_STAT | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=1)
14 rows selected.
插入100万记录后,发现仍是索引范围扫描。
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT | 1 | 79 | 1 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TBL_STAT | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OBJECT_ID"=1)
14 rows selected.
插入100万记录后,发现仍是索引范围扫描。
7. 创建第二个测试表TBL_STAT_2,以及索引
SQL> create table tbl_stat_2 as select * from tbl_stat;
Table created.
SQL> create index idx_tbl_stat_2 on tbl_stat_2 (object_id);
Index created.
SQL> select count(*) from tbl_stat_2;
COUNT(*)
----------
1190725
Table created.
SQL> create index idx_tbl_stat_2 on tbl_stat_2 (object_id);
Index created.
SQL> select count(*) from tbl_stat_2;
COUNT(*)
----------
1190725
8. 检索TBL_STAT和TBL_STAT_2关联查询的执行计划
SQL> explain plan for select a.object_name, b.object_name from tbl_stat a, tbl_stat_2 b where a.object_Id = b.object_id;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 752230886
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 158 | 27 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT_2 | 25 | 1975 | 25 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 158 | 27 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TBL_STAT | 1 | 79 | 2 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|* 4 | INDEX RANGE SCAN | IDX_TBL_STAT_2 | 25 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
20 rows selected.
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 752230886
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 158 | 27 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT_2 | 25 | 1975 | 25 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 158 | 27 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TBL_STAT | 1 | 79 | 2 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|* 4 | INDEX RANGE SCAN | IDX_TBL_STAT_2 | 25 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
20 rows selected.
可以看到这里对TBl_STAT使用的是全表扫描,对TBL_STAT_2使用的是索引扫描,表之间是嵌套循环连接。
SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 752230886
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 752230886
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 158 | 27 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT_2 | 25 | 1975 | 25 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 158 | 27 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TBL_STAT | 1 | 79 | 2 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| TBL_STAT_2 | 25 | 1975 | 25 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 158 | 27 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | TBL_STAT | 1 | 79 | 2 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|* 4 | INDEX RANGE SCAN | IDX_TBL_STAT_2 | 25 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
20 rows selected.
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
4 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
20 rows selected.
即使置换两个表的连接顺序,依旧选择TBL_STAT表是全表扫描,TBL_STAT_2是索引范围扫描,但由于插入记录后未采集过统计信息,两张表的预估记录数现在都是和实际相差较多。
9. 手工采集TBL_STAT的统计信息
SQL> exec dbms_stats.gather_table_stats(ownname=>‘DCSOPEN‘, tabname=>‘TBL_STAT‘, estimate_percent=>100);
PL/SQL procedure successfully completed.
SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
PL/SQL procedure successfully completed.
SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1789047457
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29M| 3038M| | 15552 (2)| 00:03:07 |
|* 1 | HASH JOIN | | 29M| 3038M| 47M| 15552 (2)| 00:03:07 |
| 2 | TABLE ACCESS FULL| TBL_STAT | 1190K| 34M| | 3790 (1)| 00:00:46 |
| 3 | TABLE ACCESS FULL| TBL_STAT_2 | 1299K| 97M| | 3645 (1)| 00:00:44 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
- dynamic sampling used for this statement
19 rows selected.
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29M| 3038M| | 15552 (2)| 00:03:07 |
|* 1 | HASH JOIN | | 29M| 3038M| 47M| 15552 (2)| 00:03:07 |
| 2 | TABLE ACCESS FULL| TBL_STAT | 1190K| 34M| | 3790 (1)| 00:00:46 |
| 3 | TABLE ACCESS FULL| TBL_STAT_2 | 1299K| 97M| | 3645 (1)| 00:00:44 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
- dynamic sampling used for this statement
19 rows selected.
发现此时TBL_STAT和TBL_STAT_2的预估行数已经不是1了,而且表之间采用的是全表扫描的哈希连接。
10. 手工采集TBL_STAT_2表的统计信息
SQL> exec dbms_stats.gather_table_stats(ownname=>‘DCSOPEN‘, tabname=>‘TBL_STAT_2‘, estimate_percent=>100);
PL/SQL procedure successfully completed.
SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
PL/SQL procedure successfully completed.
SQL> explain plan for select a.object_name, b.object_name from tbl_stat_2 a, tbl_stat b where a.object_Id = b.object_id;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2620555949
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29M| 1703M| | 12327 (2)| 00:02:28 |
|* 1 | HASH JOIN | | 29M| 1703M| 47M| 12327 (2)| 00:02:28 |
| 2 | TABLE ACCESS FULL| TBL_STAT_2 | 1190K| 34M| | 3644 (1)| 00:00:44 |
| 3 | TABLE ACCESS FULL| TBL_STAT | 1190K| 34M| | 3790 (1)| 00:00:46 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
15 rows selected.
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 29M| 1703M| | 12327 (2)| 00:02:28 |
|* 1 | HASH JOIN | | 29M| 1703M| 47M| 12327 (2)| 00:02:28 |
| 2 | TABLE ACCESS FULL| TBL_STAT_2 | 1190K| 34M| | 3644 (1)| 00:00:44 |
| 3 | TABLE ACCESS FULL| TBL_STAT | 1190K| 34M| | 3790 (1)| 00:00:46 |
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
15 rows selected.
此时TBL_STAT_2表的记录也趋于和实际一致,两表的连接仍是哈希连接。
总结:
1. 表的统计信息收集还是比较重要的一项工作,除了Oracle 10g以后会有自动收集的作业外,也可以手工进行统计信息的收集。
2. 本例中,由于TBL_STAT表灌入100万数据后,未收集统计信息,和TBL_STAT_2表连接采用的是嵌套循环连接,这种连接适用于大表和小表的关联场景,但实际这的两张表数据量相当,且都超过了100万,这样相当于100万*100万次关联,当收集统计信息后,两表连接改为了哈希连接,说明此时Oracle已经知道了表的实际数据量,执行计划也是依据表的实际数据量来做的判断,因此当表灌入大量数据后,建议手工采集统计信息,否则在系统自动采集统计信息之前,可能得到的执行计划就是错的。
灌入大量数据后手工采集统计信息的重要性
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。