< Day Day Up > |
13.2 Using EXPLAIN to Analyze QueriesWhen a SELECT query does not run as quickly as you think it should, use the EXPLAIN statement to ask the MySQL server for information about how the query optimizer processes the query. This information is useful in several ways:
When using EXPLAIN to analyze a query, it's helpful to have a good understanding of the tables involved. If you need to determine a table's structure, remember that you can use DESCRIBE to obtain information about a table's columns, and SHOW INDEX for information about its indexes. This section describes how EXPLAIN works. Later in the chapter, section 13.3, "General Query Enhancement," discusses some general query-writing principles that help MySQL use indexes more effectively. You can apply those principles in conjunction with EXPLAIN to determine the best way of writing a query. 13.2.1 Identifying Candidates for Query AnalysisEXPLAIN can be used to analyze any SELECT query, but some query performance characteristics make it especially likely that EXPLAIN will be helpful:
Recognize that "slow" can be a relative term. You don't want to waste time trying to optimize a query that seems slow but is so only for external reasons and is not inherently slow:
Keeping in mind the preceding considerations, you have a good indicator that a query might be in need of being optimized if you find that it is consistently slow in comparison to other queries no matter when you run it, and you know the machine isn't just generally bogged down all the time. Another factor to recognize is that the mere presence of a query in the slow query log does not necessarily mean that the query is slow. If the server is run with the --long-log-format option, the slow query log also will contain queries that execute without using any index. In some cases, such a query may indeed be a prime candidate for optimization (for example, by adding an index). But in other cases, MySQL might elect not to use an existing index simply because a table is so small that scanning all of its rows is just as fast as using an index. The SHOW PROCESSLIST statement is another useful source of information about query execution. Use it periodically to get information about what queries currently are running. If you notice that a particular query often seems to be causing a backlog by making other queries block, see whether you can optimize it. If you're successful, it will alleviate the backlog. To get the most information from SHOW PROCESSLIST, you should have the PROCESS privilege. Then the statement will display queries being run by all clients, not just your own queries. 13.2.2 How EXPLAIN WorksTo use EXPLAIN, write your SELECT query as you normally would, but place the keyword EXPLAIN in front of it. As a very simple example, take the following statement: SELECT 1; To see what EXPLAIN will do with it, issue the statement like this:
mysql> EXPLAIN SELECT 1;
+----------------+
| Comment |
+----------------+
| No tables used |
+----------------+
In practice, it's unlikely that you'd use EXPLAIN very often for a query like that because the output doesn't tell you anything particularly interesting. However, the example does illustrate an important principle: EXPLAIN can be applied to any SELECT query. One of the implications of this principle is that you can use EXPLAIN with simple queries while you're learning how to use it and how to interpret its results. You don't have to begin with a complicated multiple-table join. With that in mind, consider these two simple single-table queries: SELECT * FROM Country WHERE Name = 'France'; SELECT * FROM Country WHERE Code = 'FRA'; Both queries produce the same output (information about the country of France), but they are not equally efficient. How do you know? Because EXPLAIN tells you so. When you use EXPLAIN with each of the two queries, it provides the following information about how the MySQL optimizer views them: mysql> EXPLAIN SELECT * FROM Country WHERE Name = 'France'\G *************************** 1. row *************************** table: Country type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 239 Extra: Using where mysql> EXPLAIN SELECT * FROM Country WHERE Code = 'FRA'\G *************************** 1. row *************************** table: Country type: const possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: const rows: 1 Extra: EXPLAIN produces several columns of information. In the example just shown, NULL in the possible_keys and key columns shows for the first query that no index is considered available or usable for processing the query. For the second query, the table's PRIMARY KEY column (the Code column that contains three-letter country codes) can be used, and is in fact the one that the optimizer would choose. The rows column of the EXPLAIN output shows the effect of this difference. Its value indicates the number of rows that MySQL estimates it will need to examine while processing the query:
This example briefly indicates the kind of useful information that EXPLAIN can provide, even for simple queries. The conclusion to draw is that, if possible, you should use the Code column rather than the Name column to look up Country table records. However, the real power of EXPLAIN lies in what it can tell you about joins—SELECT queries that use multiple tables. EXPLAIN is especially important for join analysis because they have such enormous potential to increase the amount of processing the server must do. If you select from a table with a thousand rows, the server might need to scan all one thousand rows in the worst case. But if you perform a join between two tables with a thousand rows each, the server might need to examine every possible combination of rows, which is one million combinations. That's a much worse worst case. EXPLAIN can help you reduce the work the server must do to process such a query, so it's well worth using. 13.2.3 Analyzing a QueryThe following example demonstrates how to use EXPLAIN to analyze and optimize a sample query. The purpose of the query is to answer the question, "Which cities have a population of more than eight million?" and to display for each city its name and population, along with the country name. This question could be answered using only city information, except that to get each country's name rather than its code, city information must be joined to country information. The example uses tables created from world database information. Initially, these tables will have no indexes, so EXPLAIN will show that the query is not optimal. The example then adds indexes and uses EXPLAIN to determine the effect of indexing on query performance. Begin by creating the initial tables, CountryList and CityList. These are derived from the Country and City tables, but need contain only the columns involved in the query: mysql> CREATE TABLE CountryList -> SELECT Code, Name FROM Country; Query OK, 239 rows affected (0.04 sec) mysql> CREATE TABLE CityList -> SELECT CountryCode, Name, Population FROM City; Query OK, 4079 rows affected (0.04 sec) The query that retrieves the desired information in the required format looks like this: mysql> SELECT CountryList.Name, CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000; +--------------------+------------------+------------+ | Name | Name | Population | +--------------------+------------------+------------+ | Brazil | Saõ Paulo | 9968485 | | Indonesia | Jakarta | 9604900 | | India | Mumbai (Bombay) | 10500000 | | China | Shanghai | 9696300 | | South Korea | Seoul | 9981619 | | Mexico | Ciudad de México | 8591309 | | Pakistan | Karachi | 9269265 | | Turkey | Istanbul | 8787958 | | Russian Federation | Moscow | 8389200 | | United States | New York | 8008278 | +--------------------+------------------+------------+ While the tables are in their initial unindexed state, applying EXPLAIN to the query yields the following result: mysql> EXPLAIN SELECT CountryList.Name, CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000\G *************************** 1. row *************************** table: CountryList type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 239 Extra: *************************** 2. row *************************** table: CityList type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4079 Extra: Using where The information displayed by EXPLAIN shows that no optimizations could be made:
EXPLAIN shows that MySQL would need to check nearly a million row combinations to produce a query result that contains only 10 rows. Clearly, this query would benefit from the creation of indexes that allow the server to look up information faster. Good columns to index typically are those that you use for searching, grouping, or sorting records. The query does not have any GROUP BY or ORDER BY clauses, but it does use columns for searching. Specifically:
To see the effect of indexing, try creating indexes on the columns used to join the tables. In the CountryList table, Code is a primary key that uniquely identifies each row. Add the index using ALTER TABLE:
mysql> ALTER TABLE CountryList ADD PRIMARY KEY (Code);
In the CityList table, CountryCode is a nonunique index because multiple cities can share the same country code:
mysql> ALTER TABLE CityList ADD INDEX (CountryCode);
After creating the indexes, EXPLAIN reports a somewhat different result: mysql> EXPLAIN SELECT CountryList.Name, CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000\G *************************** 1. row *************************** table: CityList type: ALL possible_keys: CountryCode key: NULL key_len: NULL ref: NULL rows: 4079 Extra: Using where *************************** 2. row *************************** table: CountryList type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: CityList.CountryCode rows: 1 Extra: Observe that EXPLAIN now lists the tables in a different order. CityList appears first, which indicates that MySQL will read rows from that table first and use them to search for matches in the second table, CountryList. The change in table processing order reflects the optimizer's use of the index information that is now available for executing the query. MySQL still will scan all rows of the CityList table (its type value is ALL), but now the server can use each of those rows to directly look up the corresponding CountryList row. This is seen by the information displayed for the CountryList table:
The result from EXPLAIN shows that indexing CountryList.Code as a primary key improves query performance. However, it still indicates a full scan of the CityList table. The optimizer sees that the index on CountryCode is available, but the key value of NULL indicates that it will not be used. Does that mean the index on the CountryCode column is of no value? It depends. For this query, the index is not used. In general, however, it's good to index joined columns, so you likely would find for other queries on the CityList table that the index does help. The product of the rows now is just 4,079. That's much better than 974,881, but perhaps further improvement is possible. The WHERE clause of the query restricts CityList rows based on their Population values, so try creating an index on that column:
mysql> ALTER TABLE CityList ADD INDEX (Population);
After creating the index, run EXPLAIN again: mysql> EXPLAIN SELECT CountryList.Name, CityList.Name, CityList.Population -> FROM CountryList, CityList -> WHERE CountryList.Code = CityList.CountryCode -> AND CityList.Population > 8000000\G *************************** 1. row *************************** table: CityList type: range possible_keys: CountryCode,Population key: Population key_len: 4 ref: NULL rows: 78 Extra: Using where *************************** 2. row *************************** table: CountryList type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: CityList.CountryCode rows: 1 Extra: The output for the CountryList table is unchanged compared to the previous step. That is not a surprise; MySQL already found that it could use a primary key for lookups, which is very efficient. On the other hand, the result for the CityList table is different. The optimizer now sees two indexes in the table as candidates. Furthermore, the key value shows that it will use the index on Population to look up records. This results in an improvement over a full scan, as seen in the change of the rows value from 4,079 to 78. The query now is optimized. Note that the product of the rows values, 78, still is larger than the actual number of rows produced by the query (10 rows). This is because the rows values are only estimates. The optimizer cannot give an exact count without actually executing the query. To summarize:
The example shows that using indexes effectively can substantially reduce the work required by the server to execute a query, and that EXPLAIN is a useful tool for assessing the effect of indexing. 13.2.4 EXPLAIN Output ColumnsTo use EXPLAIN productively, it's important to know the meaning of the columns in each row of output that it produces:
The type value indicates the join type, but joins may be performed with varying degrees of efficiency. The type value provides a measure of this efficiency by indicating the basis on which rows are selected from each table. The following list shows the possible values, from the best type to the worst:
The Extra column provides additional information about how the table is processed. Some values indicate that the query is efficient:
By contrast, some Extra values indicate that the query is not efficient:
Using filesort and Using temporary generally are the two indicators of worst performance. To use EXPLAIN for query analysis, examine its output for clues to ways the query might be improved. Make the change, and then run EXPLAIN again to see how its output changes. Changes might involve rewriting the query or changing the structure of your tables. The following query rewriting techniques can be useful:
Another way to provide the optimizer with better information on which to base its decisions is to change the structure of your tables:
|
< Day Day Up > |