首页 > 代码库 > 数据库优化(十二)-tpch中Q3-Q5仅仅在sql优化方面的分析
数据库优化(十二)-tpch中Q3-Q5仅仅在sql优化方面的分析
--Q3;
1.1执行计划
EXPLAIN EXTENDED SELECT
L_ORDERKEY,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE,
O_ORDERDATE,
O_SHIPPRIORITY
FROM
CUSTOMER,
ORDERS,
LINEITEM
WHERE
C_MKTSEGMENT = ‘BUILDING‘
AND C_CUSTKEY = O_CUSTKEY
AND L_ORDERKEY = O_ORDERKEY
AND O_ORDERDATE < DATE ‘1995-03-15‘
AND L_SHIPDATE > DATE ‘1995-03-15‘
GROUP BY
L_ORDERKEY,
O_ORDERDATE,
O_SHIPPRIORITY
ORDER BY
REVENUE DESC,
O_ORDERDATE;
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | ORDERS | ALL | PRIMARY,i_o_orderdate,i_o_custkey | NULL | NULL | NULL | 1522266 | 50.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | CUSTOMER | eq_ref | PRIMARY | PRIMARY | 4 | tpch.ORDERS.o_custkey | 1 | 100.00 | Using where |
| 1 | SIMPLE | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | 100.00 | Using where |
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
1.2分析
该语句带有分组、排序、聚集操作并存的三表查询操作。对于orders表执行了全表扫描,使用了临时表和文件排序。
1.2.1查看ordrs表的索引情况
mysql> show indexes from orders;
+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| orders | 0 | PRIMARY | 1 | o_orderkey | A | 1522266 | NULL | NULL | | BTREE | | |
| orders | 1 | i_o_orderdate | 1 | o_orderDATE | A | 5722 | NULL | NULL | YES | BTREE | | |
| orders | 1 | i_o_custkey | 1 | o_custkey | A | 217466 | NULL | NULL | YES | BTREE | | |
|+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
1.3groupby的优化
为字段列o_shippriority添加索引
MySQL对于GROUPBY的处理,通常采用的方式是扫描整个表、创建一个临时表用以执行分组操作。查询执行计划中出现“Using temporary”字样表示MySQL采用了常规的处理方式。现在为列o_shippriority添加索引,看添加索引后的查询执行情况。
mysql> alter table orders drop index idx_O_SHIPPRIORITY,add index idx_O_SHIPPRIORITY(O_SHIPPRIORITY);
Query OK, 0 rows affected (3.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
再次查看执行计划
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | ORDERS | ALL | PRIMARY,i_o_orderdate,i_o_custkey | NULL | NULL | NULL | 1522266 | 50.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | CUSTOMER | eq_ref | PRIMARY | PRIMARY | 4 | tpch.ORDERS.o_custkey | 1 | 100.00 | Using where |
| 1 | SIMPLE | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | 100.00 | Using where |
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
查看表orders的索引情况:
mysql> show indexes from orders;
+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| orders | 0 | PRIMARY | 1 | o_orderkey | A | 1522266 | NULL | NULL | | BTREE | | |
| orders | 1 | i_o_orderdate | 1 | o_orderDATE | A | 5722 | NULL | NULL | YES | BTREE | | |
| orders | 1 | i_o_custkey | 1 | o_custkey | A | 217466 | NULL | NULL | YES | BTREE | | |
| orders | 1 | idx_O_SHIPPRIORITY | 1 | o_shippriority | A | 18 | NULL | NULL | YES | BTREE | | |
+--------+------------+--------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
字段列o_shippriority中添加索引idx_O_SHIPPRIORITY后,Cardinality列的值最小,也就是数据的唯一值的数应该是最少的,查询优化器认为没有必要使用该索引,或许这就是不能使用该索引的原因?
对于group by 的优化的可能性应该是没有了。
1.4order by的优化
查询语句中O_ORDERDATE列上已经存在索引,对于聚集函数sum的优化可以转化为对sum列的优化。查看lineitem表中的是索引。
mysql> show indexes from lineitem;
+----------+------------+-----------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lineitem | 0 | PRIMARY | 1 | l_orderkey | A | 5976465 | NULL | NULL | | BTREE | | |
| lineitem | 0 | PRIMARY | 2 | l_linenumber | A | 5976465 | NULL | NULL | | BTREE | | |
| lineitem | 1 | i_l_partkey | 1 | l_partkey | A | 398431 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_orderkey | 1 | l_orderkey | A | 5976465 | NULL | NULL | | BTREE | | |
| lineitem | 1 | i_l_commitdate | 1 | l_commitDATE | A | 18 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_suppkey_partkey | 1 | l_partkey | A | 351556 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_suppkey_partkey | 2 | l_suppkey | A | 1494116 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_suppkey | 1 | l_suppkey | A | 21972 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_receiptdate | 1 | l_receiptDATE | A | 2584 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_orderkey_quantity | 1 | l_orderkey | A | 5976465 | NULL | NULL | | BTREE | | |
| lineitem | 1 | i_l_orderkey_quantity | 2 | l_quantity | A | 5976465 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | i_l_extendedprice | 1 | l_extendedprice | A | 1992155 | NULL | NULL | YES | BTREE | | |
| lineitem | 1 | idx_l_discount | 1 | l_discount | A | 18 | NULL | NULL | YES | BTREE | | |
+----------+------------+-----------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
13 rows in set (0.00 sec)
可以看到列l_extendedprice和l_discount(新添加)都有索引,并把sum函数的表达式改为SUM(L_EXTENDEDPRICE * 1 - L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | ORDERS | ALL | PRIMARY,i_o_orderdate,i_o_custkey | NULL | NULL | NULL | 1522266 | 50.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | CUSTOMER | eq_ref | PRIMARY | PRIMARY | 4 | tpch.ORDERS.o_custkey | 1 | 100.00 | Using where |
| 1 | SIMPLE | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | 100.00 | Using where |
+----+-------------+----------+--------+--------------------------------------------+---------+---------+------------------------+---------+----------+----------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
mysql> show warnings\G;
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `tpch`.`lineitem`.`l_orderkey` AS `L_ORDERKEY`,sum((`tpch`.`lineitem`.`l_extendedprice` - (`tpch`.`lineitem`.`l_extendedprice` * `tpch`.`lineitem`.`l_discount`))) AS `REVENUE`,`tpch`.`orders`.`o_orderDATE` AS `O_ORDERDATE`,`tpch`.`orders`.`o_shippriority` AS `O_SHIPPRIORITY` from `tpch`.`customer` join `tpch`.`orders` join `tpch`.`lineitem` where ((`tpch`.`lineitem`.`l_orderkey` = `tpch`.`orders`.`o_orderkey`) and (`tpch`.`customer`.`c_custkey` = `tpch`.`orders`.`o_custkey`) and (`tpch`.`customer`.`c_mktsegment` = ‘BUILDING‘) and (`tpch`.`orders`.`o_orderDATE` < DATE‘1995-03-15‘) and (`tpch`.`lineitem`.`l_shipDATE` > DATE‘1995-03-15‘)) group by `tpch`.`lineitem`.`l_orderkey`,`tpch`.`orders`.`o_orderDATE`,`tpch`.`orders`.`o_shippriority` order by `REVENUE` desc,`tpch`.`orders`.`o_orderDATE`
1 row in set (0.00 sec)
1.5综述
综上,通过该语句的分析,仅仅从sql语句优化方面考虑,暂时没有优化的方法。
--Q4;
2.1执行计划
EXPLAIN EXTENDED SELECT
O_ORDERPRIORITY,
COUNT(*) AS ORDER_COUNT
FROM
ORDERS
WHERE
O_ORDERDATE >= DATE ‘1993-07-01‘
AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
AND EXISTS (
SELECT
*
FROM
LINEITEM
WHERE
L_ORDERKEY = O_ORDERKEY
AND L_COMMITDATE < L_RECEIPTDATE
)
GROUP BY
O_ORDERPRIORITY
ORDER BY
O_ORDERPRIORITY;
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | ORDERS | range | i_o_orderdate | i_o_orderdate | 4 | NULL | 117768 | 100.00 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | 100.00 | Using where |
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> show warnings\G;
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference ‘tpch.ORDERS.o_orderkey‘ of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `tpch`.`orders`.`o_orderpriority` AS `O_ORDERPRIORITY`,count(0) AS `ORDER_COUNT` from `tpch`.`orders` where ((`tpch`.`orders`.`o_orderDATE` >= DATE‘1993-07-01‘) and (`tpch`.`orders`.`o_orderDATE` < (DATE‘1993-07-01‘ + interval ‘3‘ month)) and exists(/* select#2 */ select 1 from `tpch`.`lineitem` where ((`tpch`.`lineitem`.`l_orderkey` = `tpch`.`orders`.`o_orderkey`) and (`tpch`.`lineitem`.`l_commitDATE` < `tpch`.`lineitem`.`l_receiptDATE`)))) group by `tpch`.`orders`.`o_orderpriority` order by `tpch`.`orders`.`o_orderpriority`
2 rows in set (0.00 sec)
2.2分析
该语句带有分组、排序、聚集操作、子查询并存的单表查询操作。子查询是相关子查询。
2.2.1对于分组和排序的优化
对于分组和排序的优化,首先想到的是给涉及的列添加索引。
mysql> alter table orders add index idx_o_orderpriority(o_orderpriority);
Query OK, 0 rows affected (1 min 9.11 sec)
Records: 0 Duplicates: 0 Warnings: 0
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+
| 1 | PRIMARY | ORDERS | range | i_o_orderdate,idx_o_orderpriority | i_o_orderdate | 4 | NULL | 117768 | 100.00 | Using index condition; Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | 100.00 | Using where |
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+----------+---------------------------------------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> show warnings\G;
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference ‘tpch.ORDERS.o_orderkey‘ of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `tpch`.`orders`.`o_orderpriority` AS `O_ORDERPRIORITY`,count(0) AS `ORDER_COUNT` from `tpch`.`orders` where ((`tpch`.`orders`.`o_orderDATE` >= DATE‘1993-07-01‘) and (`tpch`.`orders`.`o_orderDATE` < (DATE‘1993-07-01‘ + interval ‘3‘ month)) and exists(/* select#2 */ select 1 from `tpch`.`lineitem` where ((`tpch`.`lineitem`.`l_orderkey` = `tpch`.`orders`.`o_orderkey`) and (`tpch`.`lineitem`.`l_commitDATE` < `tpch`.`lineitem`.`l_receiptDATE`)))) group by `tpch`.`orders`.`o_orderpriority` order by `tpch`.`orders`.`o_orderpriority`
2 rows in set (0.00 sec)
从执行计划可以看出对于查询没有任何改观。再查看一下该表的索引情况:
mysql> show indexes from orders;
+--------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| orders | 0 | PRIMARY | 1 | o_orderkey | A | 1445997 | NULL | NULL | | BTREE | | |
| orders | 1 | i_o_orderdate | 1 | o_orderDATE | A | 5738 | NULL | NULL | YES | BTREE | | |
| orders | 1 | i_o_custkey | 1 | o_custkey | A | 206571 | NULL | NULL | YES | BTREE | | |
| orders | 1 | idx_O_SHIPPRIORITY | 1 | o_shippriority | A | 18 | NULL | NULL | YES | BTREE | | |
| orders | 1 | idx_o_orderpriority | 1 | o_orderpriority | A | 18 | NULL | NULL | YES | BTREE | | |
+--------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
字段列o_orderpriority中添加索引idx_o_orderpriority后,Cardinality列的值依然是最小,也就是数据的唯一值的数应该是最少的,查询优化器依旧认为没有必要使用该索引。
如果强制性使用idx_o_orderpriority,看看执行计划的情况:
EXPLAIN EXTENDED SELECT
O_ORDERPRIORITY,
COUNT(*) AS ORDER_COUNT
FROM
#ORDERS
ORDERS force index(idx_o_orderpriority)
WHERE
O_ORDERDATE >= DATE ‘1993-07-01‘
AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
AND EXISTS (
SELECT
*
FROM
LINEITEM
WHERE
L_ORDERKEY = O_ORDERKEY
AND L_COMMITDATE < L_RECEIPTDATE
)
GROUP BY
O_ORDERPRIORITY
ORDER BY
O_ORDERPRIORITY;
+----+--------------------+----------+-------+--------------------------------------------+---------------------+---------+------------------------+---------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+----------+-------+--------------------------------------------+---------------------+---------+------------------------+---------+----------+-------------+
| 1 | PRIMARY | ORDERS | index | idx_o_orderpriority | idx_o_orderpriority | 16 | NULL | 1445997 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | 100.00 | Using where |
+----+--------------------+----------+-------+--------------------------------------------+---------------------+---------+------------------------+---------+----------+-------------+
2 rows in set, 2 warnings (0.01 sec)
可以看到,扫描的行数(1445997)却比不适用该索引多扫描(117768 )了100多万行。
因此对于分组和排序的优化余地应该不存在。
2.2.2尝试使用mrr算法的优化
mysql> show variables like ‘optimizer_switch‘\G;
*************************** 1. row ***************************
Variable_name: optimizer_switch
Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=off,mrr_cost_based=on,block_nested_loop=on,batched_key_access=on,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
mysql> set optimizer_switch=‘mrr=on,mrr_cost_based=off‘;
Query OK, 0 rows affected (0.00 sec)
再次查看执行计划:
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+--------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+--------------------------------------------------------------------------------+
| 1 | PRIMARY | ORDERS | range | i_o_orderdate,idx_o_orderpriority | i_o_orderdate | 4 | NULL | 117768 | Using index condition; Using where; Using MRR; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | PRIMARY | 4 | tpch.ORDERS.o_orderkey | 1 | Using where |
+----+--------------------+----------+-------+--------------------------------------------+---------------+---------+------------------------+--------+--------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
优化目的是减少磁盘的随机访问,其前提是使用索引(i_o_orderdate),找到所需的行,并按照主键进行排序,并将随机访问转换为较为顺序的数据访问,其优化使用于range、ref和eq_ref类型的查询,也就是说非这些类型的查询,优化也就失去了作用:
2.3综述
通过该语句的分析,仅仅从sql语句优化方面考虑,增加不必要的索引,除了维护上不必要的开销外,对优化没有任何意义;
可以通过mrr算法进行优化,以减少对IO的访问,提高查询效率。
--Q5;
3.1执行计划
EXPLAIN EXTENDED SELECT
N_NAME,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
FROM
CUSTOMER,
ORDERS,
LINEITEM,
SUPPLIER,
NATION,
REGION
WHERE
C_CUSTKEY = O_CUSTKEY
AND L_ORDERKEY = O_ORDERKEY
AND L_SUPPKEY = S_SUPPKEY
AND C_NATIONKEY = S_NATIONKEY
AND S_NATIONKEY = N_NATIONKEY
AND N_REGIONKEY = R_REGIONKEY
AND R_NAME = ‘ASIA‘
AND O_ORDERDATE >= DATE ‘1994-01-01‘
AND O_ORDERDATE < DATE ‘1994-01-01‘ + INTERVAL ‘1‘ YEAR
GROUP BY
N_NAME
ORDER BY
REVENUE DESC;
+----+-------------+----------+--------+--------------------------------------------------------+---------------+---------+--------------------------+------+----------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+--------------------------------------------------------+---------------+---------+--------------------------+------+----------+-----------------------------------------------------+
| 1 | SIMPLE | REGION | ALL | PRIMARY | NULL | NULL | NULL | 5 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | NATION | ref | PRIMARY,i_n_regionkey,idx_n_name | i_n_regionkey | 5 | tpch.REGION.r_regionkey | 2 | 100.00 | Using join buffer (Batched Key Access) |
| 1 | SIMPLE | SUPPLIER | ref | i_s_nationkey | i_s_nationkey | 5 | tpch.NATION.n_nationkey | 221 | 100.00 | Using where; Using join buffer (Batched Key Access) |
| 1 | SIMPLE | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_suppkey,i_l_orderkey_quantity | i_l_suppkey | 5 | tpch.SUPPLIER.s_suppkey | 272 | 100.00 | Using join buffer (Batched Key Access) |
| 1 | SIMPLE | ORDERS | eq_ref | PRIMARY,i_o_orderdate,i_o_custkey | PRIMARY | 4 | tpch.LINEITEM.l_orderkey | 1 | 100.00 | Using where |
| 1 | SIMPLE | CUSTOMER | eq_ref | PRIMARY,i_c_nationkey | PRIMARY | 4 | tpch.ORDERS.o_custkey | 1 | 100.00 | Using where |
+----+-------------+----------+--------+--------------------------------------------------------+---------------+---------+--------------------------+------+----------+-----------------------------------------------------+
6 rows in set, 1 warning (0.01 sec)
mysql> show warnings\G;
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `tpch`.`nation`.`n_name` AS `N_NAME`,sum((`tpch`.`lineitem`.`l_extendedprice` * (1 - `tpch`.`lineitem`.`l_discount`))) AS `REVENUE` from `tpch`.`customer` join `tpch`.`orders` join `tpch`.`lineitem` join `tpch`.`supplier` join `tpch`.`nation` join `tpch`.`region` where ((`tpch`.`region`.`r_name` = ‘ASIA‘) and (`tpch`.`nation`.`n_regionkey` = `tpch`.`region`.`r_regionkey`) and (`tpch`.`supplier`.`s_nationkey` = `tpch`.`nation`.`n_nationkey`) and (`tpch`.`customer`.`c_nationkey` = `tpch`.`nation`.`n_nationkey`) and (`tpch`.`lineitem`.`l_suppkey` = `tpch`.`supplier`.`s_suppkey`) and (`tpch`.`orders`.`o_orderkey` = `tpch`.`lineitem`.`l_orderkey`) and (`tpch`.`customer`.`c_custkey` = `tpch`.`orders`.`o_custkey`) and (`tpch`.`orders`.`o_orderDATE` >= DATE‘1994-01-01‘) and (`tpch`.`orders`.`o_orderDATE` < <cache>((DATE‘1994-01-01‘ + interval ‘1‘ year)))) group by `tpch`.`nation`.`n_name` order by `REVENUE` desc
1 row in set (0.00 sec)
3.2分析
优化使用了Batched Key Access算法和<cache>方法,另外,通过对表region的全表扫描,使用了临时表和文件排序。通过对前面Q3、Q4的分析,order by无优化的可能,而对于n_name列,由于其数据量很少,添加索引完全没有必要。
尝试去掉表连接:
where条件:
AND C_NATIONKEY = S_NATIONKEY
AND S_NATIONKEY = N_NATIONKEY
更改为AND C_NATIONKEY = N_NATIONKEY
查看执行计划:
+----+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | SUPPLIER | ALL | NULL | NULL | NULL | NULL | 10176 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | LINEITEM | ref | PRIMARY,i_l_orderkey,i_l_suppkey,i_l_orderkey_quantity | i_l_suppkey | 5 | tpch.SUPPLIER.s_suppkey | 272 | 100.00 | Using join buffer (Batched Key Access) |
| 1 | SIMPLE | ORDERS | eq_ref | PRIMARY,i_o_orderdate,i_o_custkey | PRIMARY | 4 | tpch.LINEITEM.l_orderkey | 1 | 100.00 | Using where |
| 1 | SIMPLE | CUSTOMER | eq_ref | PRIMARY,i_c_nationkey | PRIMARY | 4 | tpch.ORDERS.o_custkey | 1 | 100.00 | Using where |
| 1 | SIMPLE | NATION | eq_ref | PRIMARY,i_n_regionkey,idx_n_name | PRIMARY | 4 | tpch.CUSTOMER.c_nationkey | 1 | 100.00 | NULL |
| 1 | SIMPLE | REGION | ALL | PRIMARY | NULL | NULL | NULL | 5 | 80.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+----------+--------+--------------------------------------------------------+-------------+---------+---------------------------+-------+----------+----------------------------------------------------+
6 rows in set, 1 warning (0.11 sec)
mysql> show warnings\G;
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `tpch`.`nation`.`n_name` AS `N_NAME`,sum((`tpch`.`lineitem`.`l_extendedprice` * (1 - `tpch`.`lineitem`.`l_discount`))) AS `REVENUE` from `tpch`.`customer` join `tpch`.`orders` join `tpch`.`lineitem` join `tpch`.`supplier` join `tpch`.`nation` join `tpch`.`region` where ((`tpch`.`region`.`r_name` = ‘ASIA‘) and (`tpch`.`region`.`r_regionkey` = `tpch`.`nation`.`n_regionkey`) and (`tpch`.`nation`.`n_nationkey` = `tpch`.`customer`.`c_nationkey`) and (`tpch`.`lineitem`.`l_suppkey` = `tpch`.`supplier`.`s_suppkey`) and (`tpch`.`orders`.`o_orderkey` = `tpch`.`lineitem`.`l_orderkey`) and (`tpch`.`customer`.`c_custkey` = `tpch`.`orders`.`o_custkey`) and (`tpch`.`orders`.`o_orderDATE` >= DATE‘1994-01-01‘) and (`tpch`.`orders`.`o_orderDATE` < <cache>((DATE‘1994-01-01‘ + interval ‘1‘ year)))) group by `tpch`.`nation`.`n_name` order by `REVENUE` desc
1 row in set (0.00 sec)
源语句对于表REGION全表扫描,影响的行数为5,而改造后的语句对于表supplier的全表扫描,影响的行数为10176。
3.3综述
对于SQL语句来说,仅仅从语句方面来说,优化的空间很小,怎么与物理设备和数据库的参数结合起来共同考虑,确实值得探讨。
数据库优化(十二)-tpch中Q3-Q5仅仅在sql优化方面的分析