首页 > 代码库 > 一条sql导致数据库整体性能下降的诊断和解决的全过程

一条sql导致数据库整体性能下降的诊断和解决的全过程

今天早上一来,数据库load就比往常高了许多。想想数据库唯一的变化是昨天早上我曾经重新分析过数据库对象。

[@more@]

发现数据库load很高,首先看top发现没有特别异常的进程,在数据库中适时抓取正在运行的sql也没发现异常(通常运行时间非常短的sql是不能被抓取到的)。询问相关应用程序人员,最近没有变动。检查应用程序日志发现今天早上跟往常也没有过多登陆和操作。基本上可以圈定是在数据库服务器本身上面。

但是这个时候我还没有办法确定到底是哪个应用的哪个查询的问题,因为数百个进程的几十台server连着,我不能去及时的追踪。打算等到10点过去后,抽取8/9/10高峰期的整点的statspack出来,跟上星期的这个时间产生的报告对比看看。

通过对比报告我们发现 CPU TIME 今天一小时内增加了大约1200秒(2,341  - 1,175    )。这是一个重大的变化,很显然是两种可能

1:今天过多地执行了某些sql

2:某些sql的执行计划发生变化导致cpu使用过多

 

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time                                                        2,341    42.60
db file sequential read                           387,534       2,255    41.04
global cache cr request                           745,170         231     4.21
log file sync                                      98,041         229     4.17
log file parallel write                            96,264         158     2.88

 

Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read                           346,851       1,606    47.60
CPU time                                                        1,175    34.83
global cache cr request                           731,368         206     6.10
log file sync                                      90,556          91     2.71
db file scattered read                             37,746          90     2.66

 

接下来我对比了 sql 部分内容,发现

  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     17,899,606       47,667          375.5   55.6  1161.27   1170.22 3481369999
Module: /home/oracle/AlitalkSrv/config/../../AlitalkSrv/
SELECT login_id, to_char(gmt_create, ‘YYYY-MM-DD HH24:MI:SS‘)
  from IM_BlackList     where black_id = :b1

这条sql出现在了今天报告的前列,而以往的报告中该sql根本不排在 buffer  gets 前面位置,显然这条sql消耗了大约  1161.27 秒 cpu  time .检查原来的报告

Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------

        107,937       47,128            2.3    0.8     7.39      6.94 3481369999
Module: /home/oracle/AlitalkSrv/config/../../AlitalkSrv/
SELECT login_id, to_char(gmt_create, ‘YYYY-MM-DD HH24:MI:SS‘)
  from IM_BlackList     where black_id = :b1

我们发现只消耗了  7.39 秒的 cpu time 。

到这个时候我基本可以断定,是由于这个sql没有走索引而走了全表扫描。但是为什么会走全表扫描呢,这是一个问题,接下来我检查了表的索引:

SQL>      select  index_name,column_name from  user_ind_columns where  table_name = ‘IM_BLACKLIST‘;

IM_BLACKLIST_PK                LOGIN_ID
IM_BLACKLIST_PK                BLACK_ID
IM_BLACKLIST_LID_IND           BLACK_ID

很显然存在着 black_id 的单独的索引,应该正常使用才对。于是我在生产库上执行这个sql一看,却发现走了全表扫描。为此我到一个周六的standby的opren  read  only 的数据库上查询了一下该索引字段的histogram(这个时候昨天早上分析对象的日志还没有被应用过去)

sys@OCN> select  COLUMN_NAME ,ENDPOINT_NUMBER, ENDPOINT_VALUE  , ENDPOINT_ACTUAL_VALUE  from  dba_histograms
  2   where  table_name =  ‘IM_BLACKLIST‘ and  column_name = ‘BLACK_ID‘;

COLUMN_NAME                    ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
------------------------------ --------------- -------------- ------------------------------
BLACK_ID                                     0     2.5031E+35
BLACK_ID                                     1     2.6065E+35
BLACK_ID                                     2     2.8661E+35
BLACK_ID                                     3     5.0579E+35
BLACK_ID                                     4     5.0585E+35
BLACK_ID                                     5     5.0585E+35
BLACK_ID                                     6     5.0589E+35
BLACK_ID                                     7     5.0601E+35
BLACK_ID                                     8     5.1082E+35
BLACK_ID                                     9     5.1119E+35
BLACK_ID                                    10     5.1615E+35
BLACK_ID                                    11     5.1616E+35
BLACK_ID                                    12     5.1628E+35
BLACK_ID                                    13     5.1646E+35
BLACK_ID                                    14     5.2121E+35
BLACK_ID                                    15     5.2133E+35
BLACK_ID                                    16     5.2155E+35
BLACK_ID                                    17     5.2662E+35
BLACK_ID                                    18     5.3169E+35
BLACK_ID                                    19     5.3193E+35
BLACK_ID                                    20     5.3686E+35
BLACK_ID                                    21     5.3719E+35
BLACK_ID                                    22     5.4198E+35
BLACK_ID                                    23     5.4206E+35
BLACK_ID                                    24     5.4214E+35
BLACK_ID                                    25     5.4224E+35
BLACK_ID                                    26     5.4238E+35
BLACK_ID                                    27     5.4246E+35
BLACK_ID                                    28     5.4743E+35
BLACK_ID                                    29     5.5244E+35
BLACK_ID                                    30     5.5252E+35
BLACK_ID                                    31     5.5252E+35
BLACK_ID                                    32     5.5272E+35
BLACK_ID                                    33     5.5277E+35
BLACK_ID                                    34     5.5285E+35
BLACK_ID                                    35     5.5763E+35
BLACK_ID                                    36     5.6274E+35
BLACK_ID                                    37     5.6291E+35
BLACK_ID                                    38     5.6291E+35
BLACK_ID                                    39     5.6291E+35
BLACK_ID                                    40     5.6291E+35
BLACK_ID                                    42     5.6311E+35
BLACK_ID                                    43     5.6794E+35
BLACK_ID                                    44     5.6810E+35
BLACK_ID                                    45     5.6842E+35
BLACK_ID                                    46     5.7351E+35
BLACK_ID                                    47     5.8359E+35
BLACK_ID                                    48     5.8887E+35
BLACK_ID                                    49     5.8921E+35
BLACK_ID                                    50     5.9430E+35
BLACK_ID                                    51     5.9913E+35
BLACK_ID                                    52     5.9923E+35
BLACK_ID                                    53     5.9923E+35
BLACK_ID                                    54     5.9931E+35
BLACK_ID                                    55     5.9947E+35
BLACK_ID                                    56     5.9959E+35
BLACK_ID                                    57     6.0428E+35
BLACK_ID                                    58     6.0457E+35
BLACK_ID                                    59     6.0477E+35
BLACK_ID                                    60     6.0479E+35
BLACK_ID                                    61     6.1986E+35
BLACK_ID                                    62     6.1986E+35
BLACK_ID                                    63     6.1994E+35
BLACK_ID                                    64     6.2024E+35
BLACK_ID                                    65     6.2037E+35
BLACK_ID                                    66     6.2521E+35
BLACK_ID                                    67     6.2546E+35
BLACK_ID                                    68     6.3033E+35
BLACK_ID                                    69     6.3053E+35
BLACK_ID                                    70     6.3069E+35
BLACK_ID                                    71     6.3553E+35
BLACK_ID                                    72     6.3558E+35
BLACK_ID                                    73     6.3562E+35
BLACK_ID                                    74     6.3580E+35
BLACK_ID                                    75     1.1051E+36

然后对比了一下当前的histograms


COLUMN_NAME                    ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
------------------------------ --------------- -------------- ------------------------------
BLACK_ID                                     0     1.6715E+35
BLACK_ID                                     1     2.5558E+35
BLACK_ID                                     2     2.7619E+35
BLACK_ID                                     3     2.9185E+35
BLACK_ID                                     4     5.0579E+35
BLACK_ID                                     5     5.0589E+35
BLACK_ID                                     6     5.0601E+35
BLACK_ID                                     7     5.1100E+35
BLACK_ID                                     8     5.1601E+35
BLACK_ID                                     9     5.1615E+35
BLACK_ID                                    10     5.1624E+35
BLACK_ID                                    11     5.1628E+35
BLACK_ID                                    12     5.1642E+35
BLACK_ID                                    13     5.2121E+35
BLACK_ID                                    14     5.2131E+35
BLACK_ID                                    15     5.2155E+35
BLACK_ID                                    16     5.2676E+35
BLACK_ID                                    17     5.3175E+35
BLACK_ID                                    18     5.3684E+35
BLACK_ID                                    19     5.3727E+35
BLACK_ID                                    20     5.4197E+35
BLACK_ID                                    21     5.4200E+35
BLACK_ID                                    22     5.4217E+35
BLACK_ID                                    23     5.4238E+35
BLACK_ID                                    24     5.4244E+35
BLACK_ID                                    25     5.4755E+35
BLACK_ID                                    26     5.5252E+35
BLACK_ID                                    27     5.5252E+35
BLACK_ID                                    28     5.5252E+35
BLACK_ID                                    29     5.5283E+35
BLACK_ID                                    30     5.5771E+35
BLACK_ID                                    31     5.6282E+35
BLACK_ID                                    32     5.6291E+35
BLACK_ID                                    33     5.6291E+35
BLACK_ID                                    34     5.6291E+35
BLACK_ID                                    35     5.6299E+35
BLACK_ID                                    36     5.6315E+35
BLACK_ID                                    37     5.6794E+35
BLACK_ID                                    39     5.6816E+35
BLACK_ID                                    40     5.6842E+35
BLACK_ID                                    41     5.7838E+35
BLACK_ID                                    42     5.8877E+35
BLACK_ID                                    43     5.8917E+35
BLACK_ID                                    44     5.9406E+35
BLACK_ID                                    45     5.9909E+35
BLACK_ID                                    46     5.9923E+35
BLACK_ID                                    47     5.9923E+35
BLACK_ID                                    48     5.9946E+35
BLACK_ID                                    49     5.9950E+35
BLACK_ID                                    50     5.9960E+35
BLACK_ID                                    51     5.9960E+35
BLACK_ID                                    52     5.9960E+35
BLACK_ID                                    53     5.9960E+35
BLACK_ID                                    54     5.9960E+35
BLACK_ID                                    55     5.9960E+35
BLACK_ID                                    56     5.9960E+35
BLACK_ID                                    57     6.0436E+35
BLACK_ID                                    58     6.0451E+35
BLACK_ID                                    59     6.0471E+35
BLACK_ID                                    60     6.1986E+35
BLACK_ID                                    61     6.1998E+35
BLACK_ID                                    62     6.2014E+35
BLACK_ID                                    63     6.2037E+35
BLACK_ID                                    64     6.2521E+35
BLACK_ID                                    65     6.2544E+35
BLACK_ID                                    66     6.3024E+35
BLACK_ID                                    67     6.3041E+35
BLACK_ID                                    68     6.3053E+35
BLACK_ID                                    69     6.3073E+35
BLACK_ID                                    70     6.3558E+35
BLACK_ID                                    71     6.3558E+35
BLACK_ID                                    72     6.3558E+35
BLACK_ID                                    73     6.3558E+35
BLACK_ID                                    74     6.3580E+35
BLACK_ID                                    75     1.1160E+36

我们发现原来的histograms值分布比较均匀,而昨天分析后的值分布就有一些地方是集中的,参考上面红色部分。

于是我再做了个 10053  dump对比,昨天分析之前

通过  alter session set events ‘10053 trace name context forever‘;

然后执行相关的sql 再去看trace文件

 

Table stats    Table: IM_BLACKLIST   Alias: IM_BLACKLIST
  TOTAL ::  CDN: 57477  NBLKS:  374  AVG_ROW_LEN:  38
-- Index stats
  INDEX NAME: IM_BLACKLIST_LID_IND  COL#: 2 
    TOTAL ::  LVLS: 1   #LB: 219  #DK: 17181  LB/K: 1  DB/K: 2  CLUF: 44331
  INDEX NAME: IM_BLACKLIST_PK  COL#: 1 2 
    TOTAL ::  LVLS: 1   #LB: 304  #DK: 57477  LB/K: 1  DB/K: 1  CLUF: 55141
_OPTIMIZER_PERCENT_PARALLEL = 0
***************************************
SINGLE TABLE ACCESS PATH
Column:   BLACK_ID  Col#: 2      Table: IM_BLACKLIST   Alias: IM_BLACKLIST
    NDV: 17181     NULLS: 0         DENS: 5.8204e-05
    NO HISTOGRAM: #BKT: 1 #VAL: 2
  TABLE: IM_BLACKLIST     ORIG CDN: 57477  ROUNDED CDN: 3  CMPTD CDN: 3
  Access path: tsc  Resc:  38  Resp:  38
  Access path: index (equal)
      Index: IM_BLACKLIST_LID_IND
  TABLE: IM_BLACKLIST
      RSC_CPU: 0   RSC_IO: 4
  IX_SEL:  0.0000e+00  TB_SEL:  5.8204e-05
  Skip scan: ss-sel 0  andv 27259  
    ss cost 27259 
    table io scan cost 38 
  Access path: index (no sta/stp keys)
      Index: IM_BLACKLIST_PK
  TABLE: IM_BLACKLIST
      RSC_CPU: 0   RSC_IO: 309
  IX_SEL:  1.0000e+00  TB_SEL:  5.8204e-05
  BEST_CST: 4.00  PATH: 4  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: IM_BLACKLIST [IM_BLACKLIST] 
Best so far: TABLE#: 0  CST:          4  CDN:          3  BYTES:         75
Final:
  CST: 4  CDN: 3  RSC: 4  RSP: 4  BYTES: 75
  IO-RSC: 4  IO-RSP: 4  CPU-RSC: 0  CPU-RSP: 0

这是昨天分析之后的

SINGLE TABLE ACCESS PATH
Column:   BLACK_ID  Col#: 2      Table: IM_BLACKLIST   Alias: IM_BLACKLIST
    NDV: 17069     NULLS: 0         DENS: 1.4470e-03
    HEIGHT BALANCED HISTOGRAM: #BKT: 75 #VAL: 75
  TABLE: IM_BLACKLIST     ORIG CDN: 57267  ROUNDED CDN: 83  CMPTD CDN: 83
  Access path: tsc  Resc:  38  Resp:  38
  Access path: index (equal)
      Index: IM_BLACKLIST_LID_IND
  TABLE: IM_BLACKLIST
      RSC_CPU: 0   RSC_IO: 65
  IX_SEL:  0.0000e+00  TB_SEL:  1.4470e-03
  Skip scan: ss-sel 0  andv 27151  
    ss cost 27151 
    table io scan cost 38 
  Access path: index (no sta/stp keys)
      Index: IM_BLACKLIST_PK
  TABLE: IM_BLACKLIST
      RSC_CPU: 0   RSC_IO: 384
  IX_SEL:  1.0000e+00  TB_SEL:  1.4470e-03
  BEST_CST: 38.00  PATH: 2  Degree:  1
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***********************
Join order[1]: IM_BLACKLIST [IM_BLACKLIST] 
Best so far: TABLE#: 0  CST:         38  CDN:         83  BYTES:       2407
Final:
  CST: 38  CDN: 83  RSC: 38  RSP: 38  BYTES: 2407
  IO-RSC: 38  IO-RSP: 38  CPU-RSC: 0  CPU-RSP: 0

我发现分析之前之后全表扫描cost都是38,但是分析之后的根据索引扫描却成为了 65 ,而分析之前是4。

很显然是由于这个查询导致昨天早上分析之后走了全表扫描。于是我再对表进行了分析,只不过这次我没有分析索引字段,而是

analyze table im_blacklist compute  statistics;

这样之后,dbms_histograms 中信息只剩下

COLUMN_NAME                    ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
------------------------------ --------------- -------------- ------------------------------
GMT_CREATE                                   0     2452842.68
GMT_MODIFIED                                 0     2452842.68
LOGIN_ID                                     0     2.5021E+35
BLACK_ID                                     0     1.6715E+35
GMT_CREATE                                   1     2453269.44
GMT_MODIFIED                                 1     2453269.44
LOGIN_ID                                     1     6.3594E+35
BLACK_ID                                     1     1.1160E+36

再执行该sql,就走了索引,从而使得数据库的load降了下来。

 

 

分析这整个过程中,我无法知道oracle的走索引cost  65 是怎么计算出来的,当然是跟 histograms有关,但计算方法我却是不清楚的。

这条sql是 bind  var,但是却走了全表扫描,这是由于920中数据库在对bind  var 的sql进行第一次解析的时候去histograms中窥视了数据分布从而根据cost选择了FTS。然后后面继续执行的sql呢,则不论是否该走索引,都走了FTS。这是920这个版本的特性的弊病。也就是说,这有偶然性因素的存在。 但是对于这个表,我做了分析(不分析索引字段)之后不存在histograms,则sql无论如何都走了索引扫描。

转:http://blog.itpub.net/7509771/viewspace-778742/

一条sql导致数据库整体性能下降的诊断和解决的全过程