首页 > 代码库 > 数据库优化(十二)-tpch中Q3-Q5仅仅在sql优化方面的分析

数据库优化(十二)-tpch中Q3-Q5仅仅在sql优化方面的分析

--Q3;

1.1执行计划

EXPLAIN EXTENDED SELECT

L_ORDERKEY, 

SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE,

O_ORDERDATE,

O_SHIPPRIORITY

FROM 

CUSTOMER, 

ORDERS, 

LINEITEM 

WHERE

C_MKTSEGMENT = ‘BUILDING‘

AND C_CUSTKEY = O_CUSTKEY

AND L_ORDERKEY = O_ORDERKEY

AND O_ORDERDATE < DATE ‘1995-03-15‘

AND L_SHIPDATE > DATE ‘1995-03-15‘ 

GROUP BY

L_ORDERKEY,

O_ORDERDATE,

O_SHIPPRIORITY

ORDER BY

REVENUE DESC,

O_ORDERDATE;

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

| id | select_type | table    | type   | possible_keys                              | key     | key_len | ref                    | rows    | filtered | Extra                                        |

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

|  1 | SIMPLE      | ORDERS   | ALL    | PRIMARY,i_o_orderdate,i_o_custkey          | NULL    | NULL    | NULL                   | 1522266 |    50.00 | Using where; Using temporary; Using filesort |

|  1 | SIMPLE      | CUSTOMER | eq_ref | PRIMARY                                    | PRIMARY | 4       | tpch.ORDERS.o_custkey  |       1 |   100.00 | Using where                                  |

|  1 | SIMPLE      | LINEITEM | ref    | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4       | tpch.ORDERS.o_orderkey |       1 |   100.00 | Using where                                  |

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

3 rows in set, 1 warning (0.00 sec)

1.2分析

该语句带有分组、排序、聚集操作并存的三表查询操作。对于orders表执行了全表扫描,使用了临时表和文件排序。

1.2.1查看ordrs表的索引情况

mysql> show indexes from orders;

+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| Table  | Non_unique | Key_name           | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |

+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| orders |          0 | PRIMARY            |            1 | o_orderkey     | A         |     1522266 |     NULL | NULL   |      | BTREE      |         |               |

| orders |          1 | i_o_orderdate      |            1 | o_orderDATE    | A         |        5722 |     NULL | NULL   | YES  | BTREE      |         |               |

| orders |          1 | i_o_custkey        |            1 | o_custkey      | A         |      217466 |     NULL | NULL   | YES  | BTREE      |         |               |

|+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

3 rows in set (0.00 sec)

1.3groupby的优化

为字段列o_shippriority添加索引

MySQL对于GROUPBY的处理,通常采用的方式是扫描整个表、创建一个临时表用以执行分组操作。查询执行计划中出现“Using temporary”字样表示MySQL采用了常规的处理方式。现在为列o_shippriority添加索引,看添加索引后的查询执行情况。

mysql> alter table orders drop index idx_O_SHIPPRIORITY,add index idx_O_SHIPPRIORITY(O_SHIPPRIORITY);

Query OK, 0 rows affected (3.02 sec)

Records: 0  Duplicates: 0  Warnings: 0


再次查看执行计划

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

| id | select_type | table    | type   | possible_keys                              | key     | key_len | ref                    | rows    | filtered | Extra                                        |

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

|  1 | SIMPLE      | ORDERS   | ALL    | PRIMARY,i_o_orderdate,i_o_custkey          | NULL    | NULL    | NULL                   | 1522266 |    50.00 | Using where; Using temporary; Using filesort |

|  1 | SIMPLE      | CUSTOMER | eq_ref | PRIMARY                                    | PRIMARY | 4       | tpch.ORDERS.o_custkey  |       1 |   100.00 | Using where                                  |

|  1 | SIMPLE      | LINEITEM | ref    | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4       | tpch.ORDERS.o_orderkey |       1 |   100.00 | Using where                                  |

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

3 rows in set, 1 warning (0.00 sec)

查看表orders的索引情况:

mysql> show indexes from orders;

+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| Table  | Non_unique | Key_name           | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |

+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| orders |          0 | PRIMARY            |            1 | o_orderkey     | A         |     1522266 |     NULL | NULL   |      | BTREE      |         |               |

| orders |          1 | i_o_orderdate      |            1 | o_orderDATE    | A         |        5722 |     NULL | NULL   | YES  | BTREE      |         |               |

| orders |          1 | i_o_custkey        |            1 | o_custkey      | A         |      217466 |     NULL | NULL   | YES  | BTREE      |         |               |

| orders |          1 | idx_O_SHIPPRIORITY |            1 | o_shippriority | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |

+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

字段列o_shippriority中添加索引idx_O_SHIPPRIORITY后,Cardinality列的值最小,也就是数据的唯一值的数应该是最少的,查询优化器认为没有必要使用该索引,或许这就是不能使用该索引的原因?

对于group by 的优化的可能性应该是没有了。

1.4order by的优化

查询语句中O_ORDERDATE列上已经存在索引,对于聚集函数sum的优化可以转化为对sum列的优化。查看lineitem表中的是索引。

mysql> show indexes from lineitem;

+----------+------------+-----------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| Table    | Non_unique | Key_name              | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |

+----------+------------+-----------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| lineitem |          0 | PRIMARY               |            1 | l_orderkey      | A         |     5976465 |     NULL | NULL   |      | BTREE      |         |               |

| lineitem |          0 | PRIMARY               |            2 | l_linenumber    | A         |     5976465 |     NULL | NULL   |      | BTREE      |         |               |

| lineitem |          1 | i_l_partkey           |            1 | l_partkey       | A         |      398431 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_orderkey          |            1 | l_orderkey      | A         |     5976465 |     NULL | NULL   |      | BTREE      |         |               |

| lineitem |          1 | i_l_commitdate        |            1 | l_commitDATE    | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_suppkey_partkey   |            1 | l_partkey       | A         |      351556 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_suppkey_partkey   |            2 | l_suppkey       | A         |     1494116 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_suppkey           |            1 | l_suppkey       | A         |       21972 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_receiptdate       |            1 | l_receiptDATE   | A         |        2584 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_orderkey_quantity |            1 | l_orderkey      | A         |     5976465 |     NULL | NULL   |      | BTREE      |         |               |

| lineitem |          1 | i_l_orderkey_quantity |            2 | l_quantity      | A         |     5976465 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | i_l_extendedprice     |            1 | l_extendedprice | A         |     1992155 |     NULL | NULL   | YES  | BTREE      |         |               |

| lineitem |          1 | idx_l_discount        |            1 | l_discount      | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |

+----------+------------+-----------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

13 rows in set (0.00 sec)

可以看到列l_extendedprice和l_discount(新添加)都有索引,并把sum函数的表达式改为SUM(L_EXTENDEDPRICE * 1 - L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

| id | select_type | table    | type   | possible_keys                              | key     | key_len | ref                    | rows    | filtered | Extra                                        |

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

|  1 | SIMPLE      | ORDERS   | ALL    | PRIMARY,i_o_orderdate,i_o_custkey          | NULL    | NULL    | NULL                   | 1522266 |    50.00 | Using where; Using temporary; Using filesort |

|  1 | SIMPLE      | CUSTOMER | eq_ref | PRIMARY                                    | PRIMARY | 4       | tpch.ORDERS.o_custkey  |       1 |   100.00 | Using where                                  |

|  1 | SIMPLE      | LINEITEM | ref    | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4       | tpch.ORDERS.o_orderkey |       1 |   100.00 | Using where                                  |

+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+

3 rows in set, 1 warning (0.00 sec)


mysql> show warnings\G;

*************************** 1. row ***************************

  Level: Note

   Code: 1003

Message: /* select#1 */ select `tpch`.`lineitem`.`l_orderkey` AS `L_ORDERKEY`,sum((`tpch`.`lineitem`.`l_extendedprice` - (`tpch`.`lineitem`.`l_extendedprice` * `tpch`.`lineitem`.`l_discount`))) AS `REVENUE`,`tpch`.`orders`.`o_orderDATE` AS `O_ORDERDATE`,`tpch`.`orders`.`o_shippriority` AS `O_SHIPPRIORITY` from `tpch`.`customer` join `tpch`.`orders` join `tpch`.`lineitem` where ((`tpch`.`lineitem`.`l_orderkey` = `tpch`.`orders`.`o_orderkey`) and (`tpch`.`customer`.`c_custkey` = `tpch`.`orders`.`o_custkey`) and (`tpch`.`customer`.`c_mktsegment` = ‘BUILDING‘) and (`tpch`.`orders`.`o_orderDATE` < DATE‘1995-03-15‘) and (`tpch`.`lineitem`.`l_shipDATE` > DATE‘1995-03-15‘)) group by `tpch`.`lineitem`.`l_orderkey`,`tpch`.`orders`.`o_orderDATE`,`tpch`.`orders`.`o_shippriority` order by `REVENUE` desc,`tpch`.`orders`.`o_orderDATE`

1 row in set (0.00 sec)

1.5综述

综上,通过该语句的分析,仅仅从sql语句优化方面考虑,暂时没有优化的方法。


--Q4;

2.1执行计划

EXPLAIN EXTENDED SELECT

O_ORDERPRIORITY, 

COUNT(*) AS ORDER_COUNT

FROM

ORDERS

WHERE

O_ORDERDATE >= DATE ‘1993-07-01‘

AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

AND EXISTS ( 

SELECT

FROM

LINEITEM

WHERE

L_ORDERKEY = O_ORDERKEY

AND L_COMMITDATE < L_RECEIPTDATE 

)

GROUP BY

O_ORDERPRIORITY 

ORDER BY

O_ORDERPRIORITY;

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+

| id | select_type        | table    | type  | possible_keys                              | key           | key_len | ref                    | rows   | filtered | Extra                                                               |

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+

|  1 | PRIMARY            | ORDERS   | range | i_o_orderdate                              | i_o_orderdate | 4       | NULL                   | 117768 |   100.00 | Using index condition; Using where; Using temporary; Using filesort |

|  2 | DEPENDENT SUBQUERY | LINEITEM | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | tpch.ORDERS.o_orderkey |      1 |   100.00 | Using where                                                         |

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+

2 rows in set, 2 warnings (0.00 sec)


mysql> show warnings\G;

*************************** 1. row ***************************

  Level: Note

   Code: 1276

Message: Field or reference ‘tpch.ORDERS.o_orderkey‘ of SELECT #2 was resolved in SELECT #1

*************************** 2. row ***************************

  Level: Note

   Code: 1003

Message: /* select#1 */ select `tpch`.`orders`.`o_orderpriority` AS `O_ORDERPRIORITY`,count(0) AS `ORDER_COUNT` from `tpch`.`orders` where ((`tpch`.`orders`.`o_orderDATE` >= DATE‘1993-07-01‘) and (`tpch`.`orders`.`o_orderDATE` < (DATE‘1993-07-01‘ + interval ‘3‘ month)) and exists(/* select#2 */ select 1 from `tpch`.`lineitem` where ((`tpch`.`lineitem`.`l_orderkey` = `tpch`.`orders`.`o_orderkey`) and (`tpch`.`lineitem`.`l_commitDATE` < `tpch`.`lineitem`.`l_receiptDATE`)))) group by `tpch`.`orders`.`o_orderpriority` order by `tpch`.`orders`.`o_orderpriority`

2 rows in set (0.00 sec)

2.2分析

该语句带有分组、排序、聚集操作、子查询并存的单表查询操作。子查询是相关子查询。

2.2.1对于分组和排序的优化

对于分组和排序的优化,首先想到的是给涉及的列添加索引。

mysql> alter table orders add index idx_o_orderpriority(o_orderpriority);

Query OK, 0 rows affected (1 min 9.11 sec)

Records: 0  Duplicates: 0  Warnings: 0

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+

| id | select_type        | table    | type  | possible_keys                              | key           | key_len | ref                    | rows   | filtered | Extra                                                               |

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+

|  1 | PRIMARY            | ORDERS   | range | i_o_orderdate,idx_o_orderpriority          | i_o_orderdate | 4       | NULL                   | 117768 |   100.00 | Using index condition; Using where; Using temporary; Using filesort |

|  2 | DEPENDENT SUBQUERY | LINEITEM | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | tpch.ORDERS.o_orderkey |      1 |   100.00 | Using where                                                         |

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+

2 rows in set, 2 warnings (0.00 sec)


mysql> show warnings\G;                                                                                                                                                                                                                                              

*************************** 1. row ***************************

  Level: Note

   Code: 1276

Message: Field or reference ‘tpch.ORDERS.o_orderkey‘ of SELECT #2 was resolved in SELECT #1

*************************** 2. row ***************************

  Level: Note

   Code: 1003

Message: /* select#1 */ select `tpch`.`orders`.`o_orderpriority` AS `O_ORDERPRIORITY`,count(0) AS `ORDER_COUNT` from `tpch`.`orders` where ((`tpch`.`orders`.`o_orderDATE` >= DATE‘1993-07-01‘) and (`tpch`.`orders`.`o_orderDATE` < (DATE‘1993-07-01‘ + interval ‘3‘ month)) and exists(/* select#2 */ select 1 from `tpch`.`lineitem` where ((`tpch`.`lineitem`.`l_orderkey` = `tpch`.`orders`.`o_orderkey`) and (`tpch`.`lineitem`.`l_commitDATE` < `tpch`.`lineitem`.`l_receiptDATE`)))) group by `tpch`.`orders`.`o_orderpriority` order by `tpch`.`orders`.`o_orderpriority`

2 rows in set (0.00 sec)

从执行计划可以看出对于查询没有任何改观。再查看一下该表的索引情况:

mysql> show indexes from orders;

+--------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| Table  | Non_unique | Key_name            | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |

+--------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

| orders |          0 | PRIMARY             |            1 | o_orderkey      | A         |     1445997 |     NULL | NULL   |      | BTREE      |         |               |

| orders |          1 | i_o_orderdate       |            1 | o_orderDATE     | A         |        5738 |     NULL | NULL   | YES  | BTREE      |         |               |

| orders |          1 | i_o_custkey         |            1 | o_custkey       | A         |      206571 |     NULL | NULL   | YES  | BTREE      |         |               |

| orders |          1 | idx_O_SHIPPRIORITY  |            1 | o_shippriority  | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |

| orders |          1 | idx_o_orderpriority |            1 | o_orderpriority | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |

+--------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

5 rows in set (0.00 sec)

字段列o_orderpriority中添加索引idx_o_orderpriority后,Cardinality列的值依然是最小,也就是数据的唯一值的数应该是最少的,查询优化器依旧认为没有必要使用该索引。

如果强制性使用idx_o_orderpriority,看看执行计划的情况:

EXPLAIN EXTENDED SELECT

O_ORDERPRIORITY, 

COUNT(*) AS ORDER_COUNT

FROM

#ORDERS

ORDERS force index(idx_o_orderpriority)

WHERE

O_ORDERDATE >= DATE ‘1993-07-01‘

AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

AND EXISTS ( 

SELECT

FROM

LINEITEM

WHERE

L_ORDERKEY = O_ORDERKEY

AND L_COMMITDATE < L_RECEIPTDATE 

)

GROUP BY

O_ORDERPRIORITY 

ORDER BY

O_ORDERPRIORITY;

+----+--------------------+----------+-------+--------------------------------------------+---------------------+---------+------------------------+---------+----------+-------------+

| id | select_type        | table    | type  | possible_keys                              | key                 | key_len | ref                    | rows    | filtered | Extra       |

+----+--------------------+----------+-------+--------------------------------------------+---------------------+---------+------------------------+---------+----------+-------------+

|  1 | PRIMARY            | ORDERS   | index | idx_o_orderpriority                        | idx_o_orderpriority | 16      | NULL                   | 1445997 |   100.00 | Using where |

|  2 | DEPENDENT SUBQUERY | LINEITEM | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY             | 4       | tpch.ORDERS.o_orderkey |       1 |   100.00 | Using where |

+----+--------------------+----------+-------+--------------------------------------------+---------------------+---------+------------------------+---------+----------+-------------+

2 rows in set, 2 warnings (0.01 sec)

可以看到,扫描的行数(1445997)却比不适用该索引多扫描(117768 )了100多万行。

因此对于分组和排序的优化余地应该不存在。

2.2.2尝试使用mrr算法的优化

mysql> show variables like ‘optimizer_switch‘\G;

*************************** 1. row ***************************

Variable_name: optimizer_switch

        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=off,mrr_cost_based=on,block_nested_loop=on,batched_key_access=on,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on

mysql> set optimizer_switch=‘mrr=on,mrr_cost_based=off‘;

Query OK, 0 rows affected (0.00 sec)

再次查看执行计划:

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+--------------------------------------------------------------------------------+

| id | select_type        | table    | type  | possible_keys                              | key           | key_len | ref                    | rows   | Extra                                                                          |

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+--------------------------------------------------------------------------------+

|  1 | PRIMARY            | ORDERS   | range | i_o_orderdate,idx_o_orderpriority          | i_o_orderdate | 4       | NULL                   | 117768 | Using index condition; Using where; Using MRR; Using temporary; Using filesort |

|  2 | DEPENDENT SUBQUERY | LINEITEM | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY       | 4       | tpch.ORDERS.o_orderkey |      1 | Using where                                                                    |

+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+--------------------------------------------------------------------------------+

2 rows in set (0.00 sec)

优化目的是减少磁盘的随机访问,其前提是使用索引(i_o_orderdate),找到所需的行,并按照主键进行排序,并将随机访问转换为较为顺序的数据访问,其优化使用于range、ref和eq_ref类型的查询,也就是说非这些类型的查询,优化也就失去了作用:

2.3综述

通过该语句的分析,仅仅从sql语句优化方面考虑,增加不必要的索引,除了维护上不必要的开销外,对优化没有任何意义;

可以通过mrr算法进行优化,以减少对IO的访问,提高查询效率。

--Q5;

3.1执行计划

EXPLAIN EXTENDED SELECT

N_NAME,

SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE

FROM

CUSTOMER,

ORDERS,

LINEITEM,

SUPPLIER,

NATION,

REGION

WHERE

C_CUSTKEY = O_CUSTKEY

AND L_ORDERKEY = O_ORDERKEY

AND L_SUPPKEY = S_SUPPKEY

AND C_NATIONKEY = S_NATIONKEY

AND S_NATIONKEY = N_NATIONKEY

AND N_REGIONKEY = R_REGIONKEY

AND R_NAME = ‘ASIA‘

AND O_ORDERDATE >= DATE ‘1994-01-01‘

AND O_ORDERDATE < DATE ‘1994-01-01‘ + INTERVAL ‘1‘ YEAR

GROUP BY 

N_NAME

ORDER BY

REVENUE DESC;

+----+-------------+----------+--------+--------------------------------------------------------+---------------+---------+--------------------------+------+----------+-----------------------------------------------------+

| id | select_type | table    | type   | possible_keys                                          | key           | key_len | ref                      | rows | filtered | Extra                                               |

+----+-------------+----------+--------+--------------------------------------------------------+---------------+---------+--------------------------+------+----------+-----------------------------------------------------+

|  1 | SIMPLE      | REGION   | ALL    | PRIMARY                                                | NULL          | NULL    | NULL                     |    5 |   100.00 | Using where; Using temporary; Using filesort        |

|  1 | SIMPLE      | NATION   | ref    | PRIMARY,i_n_regionkey,idx_n_name                       | i_n_regionkey | 5       | tpch.REGION.r_regionkey  |    2 |   100.00 | Using join buffer (Batched Key Access)              |

|  1 | SIMPLE      | SUPPLIER | ref    | i_s_nationkey                                          | i_s_nationkey | 5       | tpch.NATION.n_nationkey  |  221 |   100.00 | Using where; Using join buffer (Batched Key Access) |

|  1 | SIMPLE      | LINEITEM | ref    | PRIMARY,i_l_orderkey,i_l_suppkey,i_l_orderkey_quantity | i_l_suppkey   | 5       | tpch.SUPPLIER.s_suppkey  |  272 |   100.00 | Using join buffer (Batched Key Access)              |

|  1 | SIMPLE      | ORDERS   | eq_ref | PRIMARY,i_o_orderdate,i_o_custkey                      | PRIMARY       | 4       | tpch.LINEITEM.l_orderkey |    1 |   100.00 | Using where                                         |

|  1 | SIMPLE      | CUSTOMER | eq_ref | PRIMARY,i_c_nationkey                                  | PRIMARY       | 4       | tpch.ORDERS.o_custkey    |    1 |   100.00 | Using where                                         |

+----+-------------+----------+--------+--------------------------------------------------------+---------------+---------+--------------------------+------+----------+-----------------------------------------------------+

6 rows in set, 1 warning (0.01 sec)

mysql> show warnings\G;

*************************** 1. row ***************************

  Level: Note

   Code: 1003

Message: /* select#1 */ select `tpch`.`nation`.`n_name` AS `N_NAME`,sum((`tpch`.`lineitem`.`l_extendedprice` * (1 - `tpch`.`lineitem`.`l_discount`))) AS `REVENUE` from `tpch`.`customer` join `tpch`.`orders` join `tpch`.`lineitem` join `tpch`.`supplier` join `tpch`.`nation` join `tpch`.`region` where ((`tpch`.`region`.`r_name` = ‘ASIA‘) and (`tpch`.`nation`.`n_regionkey` = `tpch`.`region`.`r_regionkey`) and (`tpch`.`supplier`.`s_nationkey` = `tpch`.`nation`.`n_nationkey`) and (`tpch`.`customer`.`c_nationkey` = `tpch`.`nation`.`n_nationkey`) and (`tpch`.`lineitem`.`l_suppkey` = `tpch`.`supplier`.`s_suppkey`) and (`tpch`.`orders`.`o_orderkey` = `tpch`.`lineitem`.`l_orderkey`) and (`tpch`.`customer`.`c_custkey` = `tpch`.`orders`.`o_custkey`) and (`tpch`.`orders`.`o_orderDATE` >= DATE‘1994-01-01‘) and (`tpch`.`orders`.`o_orderDATE` < <cache>((DATE‘1994-01-01‘ + interval ‘1‘ year)))) group by `tpch`.`nation`.`n_name` order by `REVENUE` desc

1 row in set (0.00 sec)

3.2分析

优化使用了Batched Key Access算法和<cache>方法,另外,通过对表region的全表扫描,使用了临时表和文件排序。通过对前面Q3、Q4的分析,order by无优化的可能,而对于n_name列,由于其数据量很少,添加索引完全没有必要。

尝试去掉表连接:

where条件:

AND C_NATIONKEY = S_NATIONKEY

AND S_NATIONKEY = N_NATIONKEY

更改为AND C_NATIONKEY = N_NATIONKEY

查看执行计划:

+----+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------------+

| id | select_type | table    | type   | possible_keys                                          | key         | key_len | ref                       | rows  | filtered | Extra                                              |

+----+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------------+

|  1 | SIMPLE      | SUPPLIER | ALL    | NULL                                                   | NULL        | NULL    | NULL                      | 10176 |   100.00 | Using where; Using temporary; Using filesort       |

|  1 | SIMPLE      | LINEITEM | ref    | PRIMARY,i_l_orderkey,i_l_suppkey,i_l_orderkey_quantity | i_l_suppkey | 5       | tpch.SUPPLIER.s_suppkey   |   272 |   100.00 | Using join buffer (Batched Key Access)             |

|  1 | SIMPLE      | ORDERS   | eq_ref | PRIMARY,i_o_orderdate,i_o_custkey                      | PRIMARY     | 4       | tpch.LINEITEM.l_orderkey  |     1 |   100.00 | Using where                                        |

|  1 | SIMPLE      | CUSTOMER | eq_ref | PRIMARY,i_c_nationkey                                  | PRIMARY     | 4       | tpch.ORDERS.o_custkey     |     1 |   100.00 | Using where                                        |

|  1 | SIMPLE      | NATION   | eq_ref | PRIMARY,i_n_regionkey,idx_n_name                       | PRIMARY     | 4       | tpch.CUSTOMER.c_nationkey |     1 |   100.00 | NULL                                               |

|  1 | SIMPLE      | REGION   | ALL    | PRIMARY                                                | NULL        | NULL    | NULL                      |     5 |    80.00 | Using where; Using join buffer (Block Nested Loop) |

+----+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------------+

6 rows in set, 1 warning (0.11 sec)

mysql> show warnings\G;

*************************** 1. row ***************************

  Level: Note

   Code: 1003

Message: /* select#1 */ select `tpch`.`nation`.`n_name` AS `N_NAME`,sum((`tpch`.`lineitem`.`l_extendedprice` * (1 - `tpch`.`lineitem`.`l_discount`))) AS `REVENUE` from `tpch`.`customer` join `tpch`.`orders` join `tpch`.`lineitem` join `tpch`.`supplier` join `tpch`.`nation` join `tpch`.`region` where ((`tpch`.`region`.`r_name` = ‘ASIA‘) and (`tpch`.`region`.`r_regionkey` = `tpch`.`nation`.`n_regionkey`) and (`tpch`.`nation`.`n_nationkey` = `tpch`.`customer`.`c_nationkey`) and (`tpch`.`lineitem`.`l_suppkey` = `tpch`.`supplier`.`s_suppkey`) and (`tpch`.`orders`.`o_orderkey` = `tpch`.`lineitem`.`l_orderkey`) and (`tpch`.`customer`.`c_custkey` = `tpch`.`orders`.`o_custkey`) and (`tpch`.`orders`.`o_orderDATE` >= DATE‘1994-01-01‘) and (`tpch`.`orders`.`o_orderDATE` < <cache>((DATE‘1994-01-01‘ + interval ‘1‘ year)))) group by `tpch`.`nation`.`n_name` order by `REVENUE` desc

1 row in set (0.00 sec)

源语句对于表REGION全表扫描,影响的行数为5,而改造后的语句对于表supplier的全表扫描,影响的行数为10176。   

3.3综述

对于SQL语句来说,仅仅从语句方面来说,优化的空间很小,怎么与物理设备和数据库的参数结合起来共同考虑,确实值得探讨。


数据库优化(十二)-tpch中Q3-Q5仅仅在sql优化方面的分析