首页 > 代码库 > 浅析TPCH对查询Q4的优化-正确改写子查询与强制使用索引优化

浅析TPCH对查询Q4的优化-正确改写子查询与强制使用索引优化

1、源查询语句执行计划

EXPLAIN EXTENDED SELECT sql_no_cache

O_ORDERPRIORITY, 

  COUNT(*) AS ORDER_COUNT

FROM ORDERS

WHERE

O_ORDERDATE >= DATE ‘1993-07-01‘

AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

AND EXISTS ( 

SELECT

FROM

LINEITEM

WHERE

L_ORDERKEY = O_ORDERKEY

AND L_COMMITDATE < L_RECEIPTDATE 

)

GROUP BY O_ORDERPRIORITY 

ORDER BY O_ORDERPRIORITY;

1.1、查询执行计划

*************************** 1. row ***************************

           id: 1

  select_type: PRIMARY

        table: ORDERS

         type: range

possible_keys: i_o_orderdate

          key: i_o_orderdate

      key_len: 4

          ref: NULL

         rows: 110418

     filtered: 100.00

        Extra: Using index condition; Using where; Using temporary; Using filesort

*************************** 2. row ***************************

           id: 2

  select_type: DEPENDENT SUBQUERY

        table: LINEITEM

         type: ref

possible_keys: PRIMARY,i_l_orderkey,i_l_orderkey_quantity

          key: i_l_orderkey_quantity

      key_len: 4

          ref: tpch.ORDERS.o_orderkey

         rows: 1

     filtered: 100.00

        Extra: Using where

2 rows in set, 2 warnings (0.00 sec)

1.2、查询执行结果


+-----------------+-------------+

| O_ORDERPRIORITY | ORDER_COUNT |

+-----------------+-------------+

| 1-URGENT        |       10594 |

| 2-HIGH          |       10476 |

| 3-MEDIUM        |       10410 |

| 4-NOT SPECIFIED |       10556 |

| 5-LOW           |       10487 |

+-----------------+-------------+

5 rows in set (3.03 sec)

2、不该写SQL语句优化子查询

2.1、查询orders和lineitem表中字段的唯一值

mysql> select count(distinct O_ORDERDATE),count(distinct O_ORDERKEY),count(distinct O_ORDERPRIORITY) from ORDERS\G;

*************************** 1. row ***************************

    count(distinct O_ORDERDATE): 2406

     count(distinct O_ORDERKEY): 1500000

count(distinct O_ORDERPRIORITY): 5

1 row in set (6.23 sec)

mysql> alter table ORDERS add index idx_merge(O_ORDERDATE,O_ORDERKEY,O_ORDERPRIORITY);


mysql> select count(distinct L_ORDERKEY),count(distinct L_COMMITDATE),count(distinct L_RECEIPTDATE) from lineitem\G;

*************************** 1. row ***************************

   count(distinct L_ORDERKEY): 1500000

 count(distinct L_COMMITDATE): 2466

count(distinct L_RECEIPTDATE): 2554

1 row in set (20.82 sec)

2.1、为表orders和lineitem增加索引

mysql> alter table ORDERS add index idx_merge(O_ORDERDATE,O_ORDERKEY,O_ORDERPRIORITY);

Query OK, 0 rows affected (21.49 sec)

Records: 0  Duplicates: 0  Warnings: 0


mysql> alter table lineitem add index idx_merge2(L_ORDERKEY,L_COMMITDATE,L_RECEIPTDATE);

Query OK, 0 rows affected (1 min 47.14 sec)

Records: 0  Duplicates: 0  Warnings: 0

2.3、查看执行计划及执行时间

*************************** 1. row ***************************

           id: 1

  select_type: PRIMARY

        table: ORDERS

         type: range

possible_keys: i_o_orderdate,idx_merge

          key: idx_merge

      key_len: 4

          ref: NULL

         rows: 118320

     filtered: 93.32

        Extra: Using where; Using index; Using temporary; Using filesort

*************************** 2. row ***************************

           id: 2

  select_type: DEPENDENT SUBQUERY

        table: LINEITEM

         type: ref

possible_keys: PRIMARY,i_l_orderkey,i_l_orderkey_quantity,idx_merge2

          key: i_l_orderkey_quantity

      key_len: 4

          ref: tpch.ORDERS.o_orderkey

         rows: 1

     filtered: 100.00

        Extra: Using where

2 rows in set, 2 warnings (0.00 sec)

    LINEITEM表没有用到所建立的索引,看看执行时间:

+-----------------+-------------+

| O_ORDERPRIORITY | ORDER_COUNT |

+-----------------+-------------+

| 1-URGENT        |       10594 | 

| 2-HIGH          |       10476 | 

| 3-MEDIUM        |       10410 | 

| 4-NOT SPECIFIED |       10556 | 

| 5-LOW           |       10487 | 

+-----------------+-------------+

5 rows in set (1.09 sec)

2.4、使用强制索引查看执行计划及执行时间

mysql> EXPLAIN EXTENDED SELECT sql_no_cache

    -> O_ORDERPRIORITY, 

    ->  COUNT(*) AS ORDER_COUNT

    -> FROM ORDERS

    -> WHERE

    -> O_ORDERDATE >= DATE ‘1993-07-01‘

    -> AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

    -> AND EXISTS ( 

    -> SELECT * 

    -> FROM

    -> -- LINEITEM

    ->  LINEITEM force index(idx_merge2)

    -> WHERE

    -> L_ORDERKEY = O_ORDERKEY

    -> AND L_COMMITDATE < L_RECEIPTDATE 

    -> )

    -> GROUP BY O_ORDERPRIORITY 

    -> ORDER BY O_ORDERPRIORITY\G;

*************************** 1. row ***************************

           id: 1

  select_type: PRIMARY

        table: ORDERS

         type: range

possible_keys: i_o_orderdate,idx_merge

          key: idx_merge

      key_len: 4

          ref: NULL

         rows: 118320

     filtered: 93.32

        Extra: Using where; Using index; Using temporary; Using filesort

*************************** 2. row ***************************

           id: 2

  select_type: DEPENDENT SUBQUERY

        table: LINEITEM

         type: ref

possible_keys: idx_merge2

          key: idx_merge2

      key_len: 4

          ref: tpch.ORDERS.o_orderkey

         rows: 1

     filtered: 100.00

        Extra: Using where; Using index

2 rows in set, 2 warnings (0.00 sec)

    查看执行时间:

mysql> SELECT sql_no_cache

    -> O_ORDERPRIORITY, 

    ->  COUNT(*) AS ORDER_COUNT

    -> FROM ORDERS

    -> WHERE

    -> O_ORDERDATE >= DATE ‘1993-07-01‘

    -> AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

    -> AND EXISTS ( 

    -> SELECT * 

    -> FROM

    -> -- LINEITEM

    ->  LINEITEM force index(idx_merge2)

    -> WHERE

    -> L_ORDERKEY = O_ORDERKEY

    -> AND L_COMMITDATE < L_RECEIPTDATE 

    -> )

    -> GROUP BY O_ORDERPRIORITY 

    -> ORDER BY O_ORDERPRIORITY;

+-----------------+-------------+

| O_ORDERPRIORITY | ORDER_COUNT |

+-----------------+-------------+

| 1-URGENT        |       10594 | 

| 2-HIGH          |       10476 | 

| 3-MEDIUM        |       10410 | 

| 4-NOT SPECIFIED |       10556 | 

| 5-LOW           |       10487 | 

+-----------------+-------------+

5 rows in set (0.74 sec)

    执行时间缩短为0.74s,优化效果明显。

3、改写子查询的优化

3.1、改写exists语句的优化

mysql> EXPLAIN EXTENDED SELECT

    -> O_ORDERPRIORITY, 

    -> --  COUNT(*) AS ORDER_COUNT

    -> COUNT(distinct o_orderkey) AS ORDER_COUNT

    -> -- FROM ORDERS,LINEITEM

    -> FROM ORDERS,LINEITEM force index(idx_merge2)

    -> WHERE

    -> O_ORDERDATE >= DATE ‘1993-07-01‘

    -> AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

    -> AND L_ORDERKEY = O_ORDERKEY

    -> AND L_COMMITDATE < L_RECEIPTDATE

    -> 

    -> GROUP BY O_ORDERPRIORITY 

    -> ORDER BY O_ORDERPRIORITY\G;

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: ORDERS

         type: range

possible_keys: PRIMARY,i_o_orderdate,idx_merge

          key: idx_merge

      key_len: 4

          ref: NULL

         rows: 118320

     filtered: 93.32

        Extra: Using where; Using index; Using filesort

*************************** 2. row ***************************

           id: 1

  select_type: SIMPLE

        table: LINEITEM

         type: ref

possible_keys: idx_merge2

          key: idx_merge2

      key_len: 4

          ref: tpch.ORDERS.o_orderkey

         rows: 1

     filtered: 100.00

        Extra: Using where; Using index

2 rows in set, 1 warning (0.00 sec)

查看执行时间:

+-----------------+-------------+

| O_ORDERPRIORITY | ORDER_COUNT |

+-----------------+-------------+

| 1-URGENT        |       10594 | 

| 2-HIGH          |       10476 | 

| 3-MEDIUM        |       10410 | 

| 4-NOT SPECIFIED |       10556 | 

| 5-LOW           |       10487 | 

+-----------------+-------------+

5 rows in set (1.07 sec)

    改写SQL的结果并没有改写前效果好。

3.2、改写exists语句易产生的误区

    通常容易改为:

EXPLAIN EXTENDED SELECT

O_ORDERPRIORITY, 

   COUNT(*) AS ORDER_COUNT

-- COUNT(distinct o_orderkey) AS ORDER_COUNT

FROM ORDERS,LINEITEM

-- FROM ORDERS,LINEITEM force index(idx_merge2)

WHERE

O_ORDERDATE >= DATE ‘1993-07-01‘

AND O_ORDERDATE < DATE ‘1993-07-01‘ + INTERVAL ‘3‘ MONTH

AND L_ORDERKEY = O_ORDERKEY

AND L_COMMITDATE < L_RECEIPTDATE

GROUP BY O_ORDERPRIORITY 

ORDER BY O_ORDERPRIORITY;

    这样产生的结果与改写前的结果是不同的:

+-----------------+-------------+

| O_ORDERPRIORITY | ORDER_COUNT |

+-----------------+-------------+

| 1-URGENT        |       29215

| 2-HIGH          |       29020

| 3-MEDIUM        |       28616

| 4-NOT SPECIFIED |       29253

| 5-LOW           |       28765

+-----------------+-------------+

5 rows in set (1.35 sec)

    原因是exists子句的结果只是true或false,然后和其他结果做“与”操作;而改写后,它的结果直接参与“与”操作,这样只要符合 L_ORDERKEY = O_ORDERKEY AND L_COMMITDATE < L_RECEIPTDATE条件的结果都会在结果集中出现,这样count(*)时,就会计算orders表中重复的行。如果这时加上distinct关建字,则可消除,即COUNT(distinct o_orderkey),从而得到正确的结果。

4、综述

    从Q4的优化得到如下总结:

    (1)增加索引,尤其是联合索引提高查询执行效率;

    (2)使用索引提示,可以提高查询效率;

    (3)改写子查询的前提是不可改变他的查询结果,改写子查询不一定能提高查询效率。具体查询要具体分析。





浅析TPCH对查询Q4的优化-正确改写子查询与强制使用索引优化