首页 > 代码库 > 浅析TPCH对查询Q4的优化-正确改写子查询与强制使用索引优化
浅析TPCH对查询Q4的优化-正确改写子查询与强制使用索引优化
1、源查询语句执行计划
EXPLAIN EXTENDED SELECT sql_no_cache
O_ORDERPRIORITY,
COUNT(*) AS ORDER_COUNT
FROM ORDERS
WHERE
O_ORDERDATE >= DATE ‘1993-07-01‘
AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
AND EXISTS (
SELECT *
FROM
LINEITEM
WHERE
L_ORDERKEY = O_ORDERKEY
AND L_COMMITDATE < L_RECEIPTDATE
)
GROUP BY O_ORDERPRIORITY
ORDER BY O_ORDERPRIORITY;
1.1、查询执行计划
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: ORDERS
type: range
possible_keys: i_o_orderdate
key: i_o_orderdate
key_len: 4
ref: NULL
rows: 110418
filtered: 100.00
Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: LINEITEM
type: ref
possible_keys: PRIMARY,i_l_orderkey,i_l_orderkey_quantity
key: i_l_orderkey_quantity
key_len: 4
ref: tpch.ORDERS.o_orderkey
rows: 1
filtered: 100.00
Extra: Using where
2 rows in set, 2 warnings (0.00 sec)
1.2、查询执行结果
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |
+-----------------+-------------+
5 rows in set (3.03 sec)
2、不该写SQL语句优化子查询
2.1、查询orders和lineitem表中字段的唯一值
mysql> select count(distinct O_ORDERDATE),count(distinct O_ORDERKEY),count(distinct O_ORDERPRIORITY) from ORDERS\G;
*************************** 1. row ***************************
count(distinct O_ORDERDATE): 2406
count(distinct O_ORDERKEY): 1500000
count(distinct O_ORDERPRIORITY): 5
1 row in set (6.23 sec)
mysql> alter table ORDERS add index idx_merge(O_ORDERDATE,O_ORDERKEY,O_ORDERPRIORITY);
mysql> select count(distinct L_ORDERKEY),count(distinct L_COMMITDATE),count(distinct L_RECEIPTDATE) from lineitem\G;
*************************** 1. row ***************************
count(distinct L_ORDERKEY): 1500000
count(distinct L_COMMITDATE): 2466
count(distinct L_RECEIPTDATE): 2554
1 row in set (20.82 sec)
2.1、为表orders和lineitem增加索引
mysql> alter table ORDERS add index idx_merge(O_ORDERDATE,O_ORDERKEY,O_ORDERPRIORITY);
Query OK, 0 rows affected (21.49 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> alter table lineitem add index idx_merge2(L_ORDERKEY,L_COMMITDATE,L_RECEIPTDATE);
Query OK, 0 rows affected (1 min 47.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
2.3、查看执行计划及执行时间
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: ORDERS
type: range
possible_keys: i_o_orderdate,idx_merge
key: idx_merge
key_len: 4
ref: NULL
rows: 118320
filtered: 93.32
Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: LINEITEM
type: ref
possible_keys: PRIMARY,i_l_orderkey,i_l_orderkey_quantity,idx_merge2
key: i_l_orderkey_quantity
key_len: 4
ref: tpch.ORDERS.o_orderkey
rows: 1
filtered: 100.00
Extra: Using where
2 rows in set, 2 warnings (0.00 sec)
LINEITEM表没有用到所建立的索引,看看执行时间:
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |
+-----------------+-------------+
5 rows in set (1.09 sec)
2.4、使用强制索引查看执行计划及执行时间
mysql> EXPLAIN EXTENDED SELECT sql_no_cache
-> O_ORDERPRIORITY,
-> COUNT(*) AS ORDER_COUNT
-> FROM ORDERS
-> WHERE
-> O_ORDERDATE >= DATE ‘1993-07-01‘
-> AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
-> AND EXISTS (
-> SELECT *
-> FROM
-> -- LINEITEM
-> LINEITEM force index(idx_merge2)
-> WHERE
-> L_ORDERKEY = O_ORDERKEY
-> AND L_COMMITDATE < L_RECEIPTDATE
-> )
-> GROUP BY O_ORDERPRIORITY
-> ORDER BY O_ORDERPRIORITY\G;
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: ORDERS
type: range
possible_keys: i_o_orderdate,idx_merge
key: idx_merge
key_len: 4
ref: NULL
rows: 118320
filtered: 93.32
Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: LINEITEM
type: ref
possible_keys: idx_merge2
key: idx_merge2
key_len: 4
ref: tpch.ORDERS.o_orderkey
rows: 1
filtered: 100.00
Extra: Using where; Using index
2 rows in set, 2 warnings (0.00 sec)
查看执行时间:
mysql> SELECT sql_no_cache
-> O_ORDERPRIORITY,
-> COUNT(*) AS ORDER_COUNT
-> FROM ORDERS
-> WHERE
-> O_ORDERDATE >= DATE ‘1993-07-01‘
-> AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
-> AND EXISTS (
-> SELECT *
-> FROM
-> -- LINEITEM
-> LINEITEM force index(idx_merge2)
-> WHERE
-> L_ORDERKEY = O_ORDERKEY
-> AND L_COMMITDATE < L_RECEIPTDATE
-> )
-> GROUP BY O_ORDERPRIORITY
-> ORDER BY O_ORDERPRIORITY;
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |
+-----------------+-------------+
5 rows in set (0.74 sec)
执行时间缩短为0.74s,优化效果明显。
3、改写子查询的优化
3.1、改写exists语句的优化
mysql> EXPLAIN EXTENDED SELECT
-> O_ORDERPRIORITY,
-> -- COUNT(*) AS ORDER_COUNT
-> COUNT(distinct o_orderkey) AS ORDER_COUNT
-> -- FROM ORDERS,LINEITEM
-> FROM ORDERS,LINEITEM force index(idx_merge2)
-> WHERE
-> O_ORDERDATE >= DATE ‘1993-07-01‘
-> AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
-> AND L_ORDERKEY = O_ORDERKEY
-> AND L_COMMITDATE < L_RECEIPTDATE
->
-> GROUP BY O_ORDERPRIORITY
-> ORDER BY O_ORDERPRIORITY\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ORDERS
type: range
possible_keys: PRIMARY,i_o_orderdate,idx_merge
key: idx_merge
key_len: 4
ref: NULL
rows: 118320
filtered: 93.32
Extra: Using where; Using index; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: LINEITEM
type: ref
possible_keys: idx_merge2
key: idx_merge2
key_len: 4
ref: tpch.ORDERS.o_orderkey
rows: 1
filtered: 100.00
Extra: Using where; Using index
2 rows in set, 1 warning (0.00 sec)
查看执行时间:
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |
+-----------------+-------------+
5 rows in set (1.07 sec)
改写SQL的结果并没有改写前效果好。
3.2、改写exists语句易产生的误区
通常容易改为:
EXPLAIN EXTENDED SELECT
O_ORDERPRIORITY,
COUNT(*) AS ORDER_COUNT
-- COUNT(distinct o_orderkey) AS ORDER_COUNT
FROM ORDERS,LINEITEM
-- FROM ORDERS,LINEITEM force index(idx_merge2)
WHERE
O_ORDERDATE >= DATE ‘1993-07-01‘
AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH
AND L_ORDERKEY = O_ORDERKEY
AND L_COMMITDATE < L_RECEIPTDATE
GROUP BY O_ORDERPRIORITY
ORDER BY O_ORDERPRIORITY;
这样产生的结果与改写前的结果是不同的:
+-----------------+-------------+
| O_ORDERPRIORITY | ORDER_COUNT |
+-----------------+-------------+
| 1-URGENT | 29215 |
| 2-HIGH | 29020 |
| 3-MEDIUM | 28616 |
| 4-NOT SPECIFIED | 29253 |
| 5-LOW | 28765 |
+-----------------+-------------+
5 rows in set (1.35 sec)
原因是exists子句的结果只是true或false,然后和其他结果做“与”操作;而改写后,它的结果直接参与“与”操作,这样只要符合 L_ORDERKEY = O_ORDERKEY AND L_COMMITDATE < L_RECEIPTDATE条件的结果都会在结果集中出现,这样count(*)时,就会计算orders表中重复的行。如果这时加上distinct关建字,则可消除,即COUNT(distinct o_orderkey),从而得到正确的结果。
4、综述
从Q4的优化得到如下总结:
(1)增加索引,尤其是联合索引提高查询执行效率;
(2)使用索引提示,可以提高查询效率;
(3)改写子查询的前提是不可改变他的查询结果,改写子查询不一定能提高查询效率。具体查询要具体分析。
浅析TPCH对查询Q4的优化-正确改写子查询与强制使用索引优化