5.5 Stupid Query Tricks
We can't end a
chapter on query optimization without looking at some common tricks
that can increase performance of some queries. While these are all
rather specific, you may find techniques that can be applied in other
circumstances.
5.5.1 Two Is Better Than One
Sometimes MySQL doesn't
optimize a seemingly simple query the way you'd
expect. A good example of this behavior occurred in a database used
to track historical stock prices. There are two tables involved:
SymbolHistory and Symbols.
As far as we're concerned, the
Symbols table contains two important fields:
Id and Symbol. The
Id is an auto_increment primary
key. Here's the PriceHistory
table:
mysql> DESCRIBE PriceHistory;
+----------+---------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+------------+-------+
| SymbolID | int(11) | | PRI | 0 | |
| Date | date | | PRI | 0000-00-00 | |
| Open | float | | | 0 | |
| High | float | | | 0 | |
| Low | float | | | 0 | |
| Close | float | | | 0 | |
| Volume | float | | | 0 | |
+----------+---------+------+-----+------------+-------+
8 rows in set (0.01 sec)
It has a two-part index on (SymbolID,
Date).
The Symbols table maps stock tickers to numeric
identifiers. It also contains various other bits of metadata about
each security. The PriceHistory table contains the
historical price data. One of the most common queries run against the
data is, "Show me all closing prices for a given
stock sorted from newest to oldest."
To fetch the price history for IBM, the query looks like this:
mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close
-> FROM Symbols, PriceHistory
-> WHERE Symbols.ID=PriceHistory.SymbolID AND Symbols.Symbol = 'ibm'
-> ORDER BY Date DESC \G
*************************** 1. row ***************************
table: Symbols
type: const
possible_keys: PRIMARY,Symbols_SymbolIDX
key: Symbols_SymbolIDX
key_len: 20
ref: const
rows: 1
Extra: Using filesort
*************************** 2. row ***************************
table: PriceHistory
type: ref
possible_keys: PriceHistory_IDX
key: PriceHistory_IDX
key_len: 4
ref: const
rows: 471
Extra: Using where
2 rows in set (0.01 sec)
Notice the Using filesort in the
EXPLAIN output, which means MySQL will need to
sort all the records based on the date. It turns out that the
Date column is in the index, but MySQL
can't use it directly for sorting because
it's not the first part of a composite index. The
result is a second pass over the rows to return them in the correct
order. That sorting process can be slow when the query is run
hundreds of times each minute on a large variety of stocks, some of
which have thousands of records.
To improve the performance, we need to arrange it so that MySQL can
query the PriceHistory and use the index on the
Date column. The easiest way to do so is to break
it up into two queries using a temporary variable, just like we did
earlier to work around the lack of subselects:
mysql> SELECT @sid := Id FROM Symbols WHERE Symbol = 'ibm';
+------------+
| @sid := Id |
+------------+
| 459378 |
+------------+
1 row in set (0.02 sec)
mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close
-> FROM PriceHistory WHERE SymbolID = @sid ORDER BY Date DESC \G
*************************** 1. row ***************************
table: PriceHistory
type: ref
possible_keys: PriceHistory_IDX
key: PriceHistory_IDX
key_len: 4
ref: const
rows: 7234
Extra: Using where
1 row in set (0.00 sec)
An improvement like this can often mean the difference between a
CPU-bound server handling 200 queries per second and a partially idle
server handling 700 queries per second. The overhead associated with
performing two queries instead of one is still usually less than that
extra sorting pass.
5.5.2 Unions Instead of ORs
Earlier we used a query like this to
demonstrate that MySQL handles the situation efficiently:
mysql> EXPLAIN SELECT COUNT(*) FROM Headline
-> WHERE ExpireTime >= 1112201600 AND Id <= 5000000 \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Headline
type: range
possible_keys: PRIMARY,ExpireTime
key: ExpireTime
key_len: 4
ref: NULL
rows: 12009
Extra: Using where
1 row in set (0.00 sec)
In this example, MySQL uses the ExpireTime index
to fetch a set of rows. It then applies the rest of the
WHERE clause to eliminate those rows with ID
values less than or equal to 5,000,000.
But what if the AND is changed to an
OR condition, and we change it from a
COUNT(*) to something a bit more meaningful?
mysql> EXPLAIN SELECT * FROM Headline
-> WHERE ExpireTime >= 1012201600 OR Id <= 5000000
-> ORDER BY ExpireTime ASC LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: Headline
type: ALL
possible_keys: PRIMARY,ExpireTime
key: NULL
key_len: NULL
ref: NULL
rows: 302116
Extra: Using where
1 row in set (0.00 sec)
Uh oh. MySQL has decided to perform a full table scan. Actually
executing the full query (rather than just explaining it) takes
almost three seconds. Let's think about why MySQL
made this choice.
We know that MySQL will use only one index per table per query, and
the Headline table has an index on
Id as well as one on
ExpireTime. So why didn't it pick
either one?
No matter which index MySQL selects, it has to perform a full table
scan to satisfy the other condition. Queries using
OR conditions prevent MySQL from easily
eliminating candidate rows. So rather than use one index to find some
of the rows and then perform the table scan, MySQL decides that
it's faster to simply use a table scan. This is
slated to be fixed in MySQL 5.0.
In a well-normalized database, queries like the previous one tend not
be very common. But when they do occur, they can be real performance
killers. Luckily we can sometimes rewrite them using a
UNION.
To do this, we'll break the query into two queries
that can each use a single index. Then we'll merge
and sort the results. The result looks like this:
(SELECT * FROM Headline WHERE ExpireTime >= 1081020749
ORDER BY ExpireTime ASC LIMIT 10)
UNION
(SELECT * FROM Headline WHERE Id <= 50000
ORDER BY ExpireTime ASC LIMIT 10)
ORDER BY ExpireTime ASC LIMIT 10
The first query should be able to use the
ExpireTime index while the second one uses the
Id index. We must make sure to ask for the total
number of rows desired (10) in both queries. The
outer ORDER BY and LIMIT
clauses will take care of the final sorting and counting.
It turns out that the UNION-based query runs in
0.02 seconds. That's far faster than the query it
replaces. Just to make sure we understand what MySQL does,
let's explain it:
mysql> EXPLAIN (SELECT * FROM Headline WHERE ExpireTime >= 1081020749
-> ORDER BY ExpireTime ASC LIMIT 10)
-> UNION
-> (SELECT * FROM Headline WHERE Id <= 50000
-> ORDER BY ExpireTime ASC LIMIT 10)
-> ORDER BY ExpireTime ASC LIMIT 10 \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: Headline
type: range
possible_keys: ExpireTime
key: ExpireTime
key_len: 4
ref: NULL
rows: 40306
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: UNION
table: Headline
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 1
Extra: Using where; Using filesort
2 rows in set (0.00 sec)
Not bad at all. The second query needs a file sort operation, but at
least it will use an index to locate all the rows.
|