5.5 Stupid Query Tricks

We can't end a chapter on query optimization without looking at some common tricks that can increase performance of some queries. While these are all rather specific, you may find techniques that can be applied in other circumstances.

5.5.1 Two Is Better Than One

Sometimes MySQL doesn't optimize a seemingly simple query the way you'd expect. A good example of this behavior occurred in a database used to track historical stock prices. There are two tables involved: SymbolHistory and Symbols.

As far as we're concerned, the Symbols table contains two important fields: Id and Symbol. The Id is an auto_increment primary key. Here's the PriceHistory table:

mysql> DESCRIBE PriceHistory;

+----------+---------+------+-----+------------+-------+

| Field    | Type    | Null | Key | Default    | Extra |

+----------+---------+------+-----+------------+-------+

| SymbolID | int(11) |      | PRI | 0          |       |

| Date     | date    |      | PRI | 0000-00-00 |       |

| Open     | float   |      |     | 0          |       |

| High     | float   |      |     | 0          |       |

| Low      | float   |      |     | 0          |       |

| Close    | float   |      |     | 0          |       |

| Volume   | float   |      |     | 0          |       |

+----------+---------+------+-----+------------+-------+

8 rows in set (0.01 sec)

It has a two-part index on (SymbolID, Date).

The Symbols table maps stock tickers to numeric identifiers. It also contains various other bits of metadata about each security. The PriceHistory table contains the historical price data. One of the most common queries run against the data is, "Show me all closing prices for a given stock sorted from newest to oldest."

To fetch the price history for IBM, the query looks like this:

mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close

    -> FROM Symbols, PriceHistory

    -> WHERE Symbols.ID=PriceHistory.SymbolID AND Symbols.Symbol = 'ibm'

    -> ORDER BY Date DESC \G

*************************** 1. row ***************************

        table: Symbols

         type: const

possible_keys: PRIMARY,Symbols_SymbolIDX

          key: Symbols_SymbolIDX

      key_len: 20

          ref: const

         rows: 1

        Extra: Using filesort

*************************** 2. row ***************************

        table: PriceHistory

         type: ref

possible_keys: PriceHistory_IDX

          key: PriceHistory_IDX

      key_len: 4

          ref: const

         rows: 471

        Extra: Using where

2 rows in set (0.01 sec)

Notice the Using filesort in the EXPLAIN output, which means MySQL will need to sort all the records based on the date. It turns out that the Date column is in the index, but MySQL can't use it directly for sorting because it's not the first part of a composite index. The result is a second pass over the rows to return them in the correct order. That sorting process can be slow when the query is run hundreds of times each minute on a large variety of stocks, some of which have thousands of records.

To improve the performance, we need to arrange it so that MySQL can query the PriceHistory and use the index on the Date column. The easiest way to do so is to break it up into two queries using a temporary variable, just like we did earlier to work around the lack of subselects:

mysql> SELECT @sid := Id FROM Symbols WHERE Symbol = 'ibm';

+------------+

| @sid := Id |

+------------+

|     459378 |

+------------+

1 row in set (0.02 sec)



mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close

    -> FROM PriceHistory WHERE SymbolID = @sid ORDER BY Date DESC \G

*************************** 1. row ***************************

        table: PriceHistory

         type: ref

possible_keys: PriceHistory_IDX

          key: PriceHistory_IDX

      key_len: 4

          ref: const

         rows: 7234

        Extra: Using where

1 row in set (0.00 sec)

An improvement like this can often mean the difference between a CPU-bound server handling 200 queries per second and a partially idle server handling 700 queries per second. The overhead associated with performing two queries instead of one is still usually less than that extra sorting pass.

5.5.2 Unions Instead of ORs

Earlier we used a query like this to demonstrate that MySQL handles the situation efficiently:

mysql> EXPLAIN SELECT COUNT(*) FROM Headline

    -> WHERE ExpireTime >= 1112201600 AND Id <= 5000000 \G

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: Headline

         type: range

possible_keys: PRIMARY,ExpireTime

          key: ExpireTime

      key_len: 4

          ref: NULL

         rows: 12009

        Extra: Using where

1 row in set (0.00 sec)

In this example, MySQL uses the ExpireTime index to fetch a set of rows. It then applies the rest of the WHERE clause to eliminate those rows with ID values less than or equal to 5,000,000.

But what if the AND is changed to an OR condition, and we change it from a COUNT(*) to something a bit more meaningful?

mysql> EXPLAIN SELECT * FROM Headline

    -> WHERE ExpireTime >= 1012201600 OR Id <= 5000000

    -> ORDER BY ExpireTime ASC LIMIT 10\G

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: Headline

         type: ALL

possible_keys: PRIMARY,ExpireTime

          key: NULL

      key_len: NULL

          ref: NULL

         rows: 302116

        Extra: Using where

1 row in set (0.00 sec)

Uh oh. MySQL has decided to perform a full table scan. Actually executing the full query (rather than just explaining it) takes almost three seconds. Let's think about why MySQL made this choice.

We know that MySQL will use only one index per table per query, and the Headline table has an index on Id as well as one on ExpireTime. So why didn't it pick either one?

No matter which index MySQL selects, it has to perform a full table scan to satisfy the other condition. Queries using OR conditions prevent MySQL from easily eliminating candidate rows. So rather than use one index to find some of the rows and then perform the table scan, MySQL decides that it's faster to simply use a table scan. This is slated to be fixed in MySQL 5.0.

In a well-normalized database, queries like the previous one tend not be very common. But when they do occur, they can be real performance killers. Luckily we can sometimes rewrite them using a UNION.

To do this, we'll break the query into two queries that can each use a single index. Then we'll merge and sort the results. The result looks like this:

(SELECT * FROM Headline WHERE ExpireTime >= 1081020749

ORDER BY ExpireTime ASC LIMIT 10)



UNION



(SELECT * FROM Headline WHERE Id <= 50000

ORDER BY ExpireTime ASC LIMIT 10)



ORDER BY ExpireTime ASC LIMIT 10

The first query should be able to use the ExpireTime index while the second one uses the Id index. We must make sure to ask for the total number of rows desired (10) in both queries. The outer ORDER BY and LIMIT clauses will take care of the final sorting and counting.

It turns out that the UNION-based query runs in 0.02 seconds. That's far faster than the query it replaces. Just to make sure we understand what MySQL does, let's explain it:

mysql> EXPLAIN (SELECT * FROM Headline WHERE ExpireTime >= 1081020749

    -> ORDER BY ExpireTime ASC LIMIT 10)

    -> UNION

    -> (SELECT * FROM Headline WHERE Id <= 50000

    -> ORDER BY ExpireTime ASC LIMIT 10)

    -> ORDER BY ExpireTime ASC LIMIT 10 \G

*************************** 1. row ***************************

           id: 1

  select_type: PRIMARY

        table: Headline

         type: range

possible_keys: ExpireTime

          key: ExpireTime

      key_len: 4

          ref: NULL

         rows: 40306

        Extra: Using where

*************************** 2. row ***************************

           id: 2

  select_type: UNION

        table: Headline

         type: range

possible_keys: PRIMARY

          key: PRIMARY

      key_len: 4

          ref: NULL

         rows: 1

        Extra: Using where; Using filesort

2 rows in set (0.00 sec)

Not bad at all. The second query needs a file sort operation, but at least it will use an index to locate all the rows.

< Day Day Up >