6.2. Optimizing SELECT and Other StatementsFirst, one factor affects all statements: The more complex your permissions setup, the more overhead you have. Using simpler permissions when you issue GRANT statements enables MySQL to reduce permission-checking overhead when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the tables_priv and columns_priv tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, it may be worth the time to use a simplified grant structure to reduce permission-checking overhead. If your problem is with a specific MySQL expression or function, you can perform a timing test by invoking the BENCHMARK() function using the mysql client program. Its syntax is BENCHMARK(loop_count,expression). The return value is always zero, but mysql prints a line displaying approximately how long the statement took to execute. For example: mysql> SELECT BENCHMARK(1000000,1+1);
+------------------------+
| BENCHMARK(1000000,1+1) |
+------------------------+
| 0 |
+------------------------+
1 row in set (0.32 sec)
This result was obtained on a Pentium II 400MHz system. It shows that MySQL can execute 1,000,000 simple addition expressions in 0.32 seconds on that system. All MySQL functions should be highly optimized, but there may be some exceptions. BENCHMARK() is an excellent tool for finding out if some function is a problem for your queries. 6.2.1. Optimizing Queries with EXPLAINEXPLAIN tbl_name
Or: EXPLAIN [EXTENDED] SELECT select_options
The EXPLAIN statement can be used either as a synonym for DESCRIBE or as a way to obtain information about how MySQL executes a SELECT statement:
This section describes the second use of EXPLAIN for obtaining query execution plan information. For a description of the DESCRIBE and SHOW COLUMNS statements, see the "MySQL Language Reference." With the help of EXPLAIN, you can see where you should add indexes to tables to get a faster SELECT that uses indexes to find rows. You can also use EXPLAIN to check whether the optimizer joins the tables in an optimal order. To force the optimizer to use a join order corresponding to the order in which the tables are named in the SELECT statement, begin the statement with SELECT STRAIGHT_JOIN rather than just SELECT. If you have a problem with indexes not being used when you believe that they should be, you should run ANALYZE TABLE to update table statistics, such as cardinality of keys, that can affect the choices the optimizer makes. EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the output in the order that MySQL would read them while processing the query. MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table. When the EXTENDED keyword is used, EXPLAIN produces extra information that can be viewed by issuing a SHOW WARNINGS statement following the EXPLAIN statement. This information displays how the optimizer qualifies table and column names in the SELECT statement, what the SELECT looks like after the application of rewriting and optimization rules, and possibly other notes about the optimization process. Each output row from EXPLAIN provides information about one table, and each row contains the following columns:
Condition pushdown, Using where with pushed condition, and engine_condition_ pushdown were all introduced in MySQL 5.0 Cluster. You can get a good indication of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query. If you restrict queries with the max_join_size system variable, this row product also is used to determine which multiple-table SELECT statements to execute and which to abort. See Section 6.5.2, "Tuning Server Parameters." The following example shows how a multiple-table join can be optimized progressively based on the information provided by EXPLAIN. Suppose that you have the SELECT statement shown here and that you plan to examine it using EXPLAIN: EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, tt.ProjectReference, tt.EstimatedShipDate, tt.ActualShipDate, tt.ClientID, tt.ServiceCodes, tt.RepetitiveID, tt.CurrentProcess, tt.CurrentDPPerson, tt.RecordVolume, tt.DPPrinted, et.COUNTRY, et_1.COUNTRY, do.CUSTNAME FROM tt, et, et AS et_1, do WHERE tt.SubmitTime IS NULL AND tt.ActualPC = et.EMPLOYID AND tt.AssignedPC = et_1.EMPLOYID AND tt.ClientID = do.CUSTNMBR; For this example, make the following assumptions:
Initially, before any optimizations have been performed, the EXPLAIN statement produces the following information: table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 do ALL PRIMARY NULL NULL NULL 2135 et_1 ALL PRIMARY NULL NULL NULL 74 tt ALL AssignedPC, NULL NULL NULL 3872 ClientID, ActualPC range checked for each record (key map: 35) Because type is ALL for each table, this output indicates that MySQL is generating a Cartesian product of all the tables; that is, every combination of rows. This takes quite a long time, because the product of the number of rows in each table must be examined. For the case at hand, this product is 74 x 2135 x 74 x 3872 = 45,268,558,720 rows. If the tables were bigger, you can only imagine how long it would take. One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is CHAR(15), so there is a length mismatch. To fix this disparity between column lengths, use ALTER TABLE to lengthen ActualPC from 10 characters to 15 characters: mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);
Now tt.ActualPC and et.EMPLOYID are both VARCHAR(15). Executing the EXPLAIN statement again produces this result: table type possible_keys key key_len ref rows Extra tt ALL AssignedPC, NULL NULL NULL 3872 Using ClientID, where ActualPC do ALL PRIMARY NULL NULL NULL 2135 range checked for each record (key map: 1) et_1 ALL PRIMARY NULL NULL NULL 74 range checked for each record (key map: 1) et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 This is not perfect, but is much better: The product of the rows values is less by a factor of 74. This version executes in a couple of seconds. A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons: mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), -> MODIFY ClientID VARCHAR(15); After that modification, EXPLAIN produces the output shown here: table type possible_keys key key_len ref rows Extra et ALL PRIMARY NULL NULL NULL 74 tt ref AssignedPC, ActualPC 15 et.EMPLOYID 52 Using ClientID, where ActualPC et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 At this point, the query is optimized almost as well as possible. The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that is not the case for the tt table. Fortunately, it is easy to tell MySQL to analyze the key distribution: mysql> ANALYZE TABLE tt;
With the additional index information, the join is perfect and EXPLAIN produces this result: table type possible_keys key key_len ref rows Extra tt ALL AssignedPC NULL NULL NULL 3872 Using ClientID, where ActualPC et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 Note that the rows column in the output from EXPLAIN is an educated guess from the MySQL join optimizer. You should check whether the numbers are even close to the truth by comparing the rows product with the actual number of rows that the query returns. If the numbers are quite different, you might get better performance by using STRAIGHT_JOIN in your SELECT statement and trying to list the tables in a different order in the FROM clause. 6.2.2. Estimating Query PerformanceIn most cases, you can estimate query performance by counting disk seeks. For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row: log(row_count) / log(index_block_length / 3 x 2 / (index_length +data_pointer_length)) + 1. In MySQL, an index block is usually 1,024 bytes and the data pointer is usually 4 bytes. For a 500,000-row table with an index length of three bytes (the size of MEDIUMINT), the formula indicates log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks. This index would require storage of about 500,000 x 7 x 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and so need only one or two calls to read data to find the row. For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row. Note that the preceding discussion does not mean that your application performance slowly degenerates by log N. As long as everything is cached by the OS or the MySQL server, things become only marginally slower as the table gets bigger. After the data gets too big to be cached, things start to go much slower until your applications are bound only by disk seeks (which increase by log N). To avoid this, increase the key cache size as the data grows. For MyISAM tables, the key cache size is controlled by the key_buffer_size system variable. See Section 6.5.2, "Tuning Server Parameters." 6.2.3. Speed of SELECT QueriesIn general, when you want to make a slow SELECT ... WHERE query faster, the first thing to check is whether you can add an index. All references between different tables should usually be done with indexes. You can use the EXPLAIN statement to determine which indexes are used for a SELECT. See Section 6.2.1, "Optimizing Queries with EXPLAIN," and Section 6.4.5, "How MySQL Uses Indexes" Some general tips for speeding up queries on MyISAM tables:
6.2.4. WHERE Clause OptimizationThis section discusses optimizations that can be made for processing WHERE clauses. The examples use SELECT statements, but the same optimizations apply for WHERE clauses in DELETE and UPDATE statements. Work on the MySQL optimizer is ongoing, so this section is incomplete. MySQL performs a great many optimizations, not all of which are documented here. Some of the optimizations performed by MySQL follow:
Some examples of queries that are very fast: SELECT COUNT(*) FROM tbl_name; SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name; SELECT MAX(key_part2) FROM tbl_name WHERE key_part1=constant; SELECT ... FROM tbl_name ORDER BY key_part1,key_part2,... LIMIT 10; SELECT ... FROM tbl_name ORDER BY key_part1 DESC, key_part2 DESC, ... LIMIT 10; MySQL resolves the following queries using only the index tree, assuming that the indexed columns are numeric: SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val; SELECT COUNT(*) FROM tbl_name WHERE key_part1=val1 AND key_part2=val2; SELECT key_part2 FROM tbl_name GROUP BY key_part1; The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass: SELECT ... FROM tbl_name ORDER BY key_part1,key_part2,... ; SELECT ... FROM tbl_name ORDER BY key_part1 DESC, key_part2 DESC, ... ; 6.2.5. Range OptimizationThe range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections give a detailed description of how intervals are extracted from the WHERE clause. 6.2.5.1. The Range Access Method for Single-Part IndexesFor a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE clause, so we speak of range conditions rather than "intervals." The definition of a range condition for a single-part index is as follows:
"Constant value" in the preceding descriptions means one of the following:
Here are some examples of queries with range conditions in the WHERE clause: SELECT * FROM t1 WHERE key_col > 1 AND key_col < 10; SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15,18,20); SELECT * FROM t1 WHERE key_col LIKE ' ab%' OR key_col BETWEEN ' bar' AND 'foo' ; Note that some non-constant values may be converted to constants during the constant propagation phase. MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed. Consider the following statement, where key1 is an indexed column and nonkey is not indexed: SELECT * FROM t1 WHERE (key1 < 'abc' AND (key1 LIKE' abcde% OR key1 LIKE' %b )) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z' ); The extraction process for key key1 is as follows:
In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause. The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in WHERE clause. 6.2.5.2. The Range Access Method for Multiple-Part IndexesRange conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index. For example, consider a multiple-part index defined as key1(key_part1,key_part2,key_part3), and the following set of key tuples listed in key order: key_part1 key_part2 key_part3 NULL 1 'abc' NULL 1 'xyz' NULL 2 'foo' 1 1 'abc' 1 1 'xyz' 1 2 'abc' 2 1 'aaa' The condition key_part1 = 1 defines this interval: (1,-inf,-inf) <= ( key_part1, key_part2, key_part3) < (1,+inf,+inf) The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method. By contrast, the condition key_part3 = 'abc' does not define a single interval and cannot be used by the range access method. The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.
Section 6.2.5.1, "The Range Access Method for Single-Part Indexes," describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes. 6.2.6. Index Merge OptimizationThe Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans. Note: If you have upgraded from a previous version of MySQL, you should be aware that this type of join optimization is first introduced in MySQL 5.0, and represents a significant change in behavior with regard to indexes. (Formerly, MySQL was able to use at most only one index for each referenced table.) In EXPLAIN output, the Index Merge method appears as index_merge in the type column. In this case, the key column contains a list of indexes used, and key_len contains a list of the longest key parts for those indexes. Examples: SELECT * FROM tbl_name WHERE key_part1 = 10 OR key_part2 = 20; SELECT * FROM tbl_name WHERE (key_part1 = 10 OR key_part2 = 20) AND non_key_part=30; SELECT * FROM t1, t2 WHERE (t1.key1 IN (1,2) OR t1.key2 LIKE 'value%') AND t2.key1=t1.some_col; SELECT * FROM t1, t2 WHERE t1.key1=1 AND (t2.key1=t1.some_col OR t2.key2=t1.some_col2); The Index Merge method has several access algorithms (seen in the Extra field of EXPLAIN output):
The following sections describe these methods in greater detail. Note: The Index Merge optimization algorithm has the following known deficiencies:
The choice between different possible variants of the Index Merge access method and other access methods is based on cost estimates of various available options. 6.2.6.1. The Index Merge Intersection Access AlgorithmThis access algorithm can be employed when a WHERE clause was converted to several range conditions on different keys combined with AND, and each condition is one of the following:
Examples: SELECT * FROM innodb_table WHERE primary_key < 10 AND key_col1=20; SELECT * FROM tbl_name WHERE (key1_part1=1 AND key1_part2=2) AND key2=2; The Index Merge intersection algorithm performs simultaneous scans on all used indexes and produces the intersection of row sequences that it receives from the merged index scans. If all columns used in the query are covered by the used indexes, full table rows are not retrieved (EXPLAIN output contains Using index in ExTRa field in this case). Here is an example of such a query: SELECT COUNT(*) FROM t1 WHERE key1=1 AND key2=1; If the used indexes don't cover all columns used in the query, full rows are retrieved only when the range conditions for all used keys are satisfied. If one of the merged conditions is a condition over a primary key of an InnoDB or BDB table, it is not used for row retrieval, but is used to filter out rows retrieved using other conditions. 6.2.6.2. The Index Merge Union Access AlgorithmThe applicability criteria for this algorithm are similar to those for the Index Merge method intersection algorithm. The algorithm can be employed when the table's WHERE clause was converted to several range conditions on different keys combined with OR, and each condition is one of the following:
Examples: SELECT * FROM t1 WHERE key1=1 OR key2=2 OR key3=3; SELECT * FROM innodb_table WHERE (key1=1 AND key2=2) OR (key3= 'foo' AND key4= 'bar' ) AND key5=5; 6.2.6.3. The Index Merge Sort-Union Access AlgorithmThis access algorithm is employed when the WHERE clause was converted to several range conditions combined by OR, but for which the Index Merge method union algorithm is not applicable. Examples: SELECT * FROM tbl_name WHERE key_col1 < 10 OR key_col2 < 20; SELECT * FROM tbl_name WHERE (key_col1 > 10 OR key_col2 = 20) AND nonkey_col=30; The difference between the sort-union algorithm and the union algorithm is that the sort-union algorithm must first fetch row IDs for all rows and sort them before returning any rows. 6.2.7. IS NULL OptimizationMySQL can perform the same optimization on col_name IS NULL that it can use for col_name=constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL. Examples: SELECT * FROM tbl_name WHERE key_col IS NULL; SELECT * FROM tbl_name WHERE key_col <=> NULL; SELECT * FROM tbl_name WHERE key_col=const1 OR key_col=const2 OR key_col IS NULL; If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away. This optimization does not occur in cases when the column might produce NULL anyway; for example, if it comes from a table on the right side of a LEFT JOIN. MySQL can also optimize the combination col_name = expr AND col_name IS NULL, a form that is common in resolved subqueries. EXPLAIN shows ref_or_null when this optimization is used. This optimization can handle one IS NULL for any key part. Some examples of queries that are optimized, assuming that there is an index on columns a and b of table t2: SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL; SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL; SELECT * FROM t1, t2 WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b; SELECT * FROM t1, t2 WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL); SELECT * FROM t1, t2 WHERE (t1.a=t2.a AND t2.a IS NULL AND ...) OR (t1.a=t2.a AND t2.a IS NULL AND ...); ref_or_null works by first doing a read on the reference key, and then a separate search for rows with a NULL key value. Note that the optimization can handle only one IS NULL level. In the following query, MySQL uses key lookups only on the expression (t1.a=t2.a AND t2.a IS NULL) and is not able to use the key part on b: SELECT * FROM t1, t2 WHERE (t1.a=t2.a AND t2.a IS NULL) OR (t1.b=t2.b AND t2.b IS NULL); 6.2.8. DISTINCT OptimizationDISTINCT combined with ORDER BY needs a temporary table in many cases. Because DISTINCT may use GROUP BY, you should be aware of how MySQL works with columns in ORDER BY or HAVING clauses that are not part of the selected columns. In most cases, a DISTINCT clause can be considered as a special case of GROUP BY. For example, the following two queries are equivalent: SELECT DISTINCT c1, c2, c3 FROM t1 WHERE c1 > const; SELECT c1, c2, c3 FROM t1 WHERE c1 > const GROUP BY c1, c2, c3; Due to this equivalence, the optimizations applicable to GROUP BY queries can be also applied to queries with a DISTINCT clause. Thus, for more details on the optimization possibilities for DISTINCT queries, see Section 6.2.13, "GROUP BY Optimization." When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows. If you do not use columns from all tables named in a query, MySQL stops scanning any unused tables as soon as it finds the first match. In the following case, assuming that t1 is used before t2 (which you can check with EXPLAIN), MySQL stops reading from t2 (for any particular row in t1) when it finds the first row in t2: SELECT DISTINCT t1.a FROM t1, t2 where t1.a=t2.a; 6.2.9. LEFT JOIN and RIGHT JOIN OptimizationMySQL implements an ALEFT JOIN B join_condition as follows:
The implementation of RIGHT JOIN is analogous to that of LEFT JOIN with the roles of the tables reversed. The join optimizer calculates the order in which tables should be joined. The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check. Note that this means that if you do a query of the following type, MySQL does a full scan on b because the LEFT JOIN forces it to be read before d: SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key; The fix in this case is to reverse the order in which a and b are listed in the FROM clause: SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key; For a LEFT JOIN, if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join. For example, the WHERE clause would be false in the following query if t2.column1 were NULL: SELECT * FROM t1 LEFT JOIN t2 ON (column1) WHERE t2.column2=5; Therefore, it is safe to convert the query to a normal join: SELECT * FROM t1, t2 WHERE t2.column2=5 AND t1.column1=t2.column1; This can be made faster because MySQL can use table t2 before table t1 if doing so would result in a better query plan. To force a specific table order, use STRAIGHT_JOIN. 6.2.10. Nested Join OptimizationAs of MySQL 5.0.1, the syntax for expressing joins allows nested joins. The syntax of table_factor is extended in comparison with the SQL Standard. The latter accepts only table_reference, not a list of them inside a pair of parentheses. This is a conservative extension if we consider each comma in a list of table_reference items as equivalent to an inner join. For example: SELECT * FROM t1 LEFT JOIN (t2, t3, t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c) is equivalent to: SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4) ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c) In MySQL, CROSS JOIN is a syntactic equivalent to INNER JOIN (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause; CROSS JOIN is used otherwise. In versions of MySQL prior to 5.0.1, parentheses in table_references were just omitted and all join operations were grouped to the left. In general, parentheses can be ignored in join expressions containing only inner join operations. After removing parentheses and grouping operations to the left, the join expression: t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL) ON t1.a=t2.a transforms into the expression: (t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL Yet, the two expressions are not equivalent. To see this, suppose that the tables t1, t2, and t3 have the following state:
In this case, the first expression returns a result set including the rows (1,1,101,101), (2,NULL,NULL,NULL), whereas the second expression returns the rows (1,1,101,101), (2,NULL,NULL,101): mysql> SELECT * -> FROM t1 -> LEFT JOIN -> (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL) -> ON t1.a=t2.a; +------+------+------+------ + | a | a | b | b | +------+------+------+------ + | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | NULL | +------+------+------+------ + mysql> SELECT * -> FROM (t1 LEFT JOIN t2 ON t1.a=t2.a) -> LEFT JOIN t3 -> ON t2.b=t3.b OR t2.b IS NULL; +------+------+------+------ + | a | a | b | b | +------+------+------+------ + | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | 101 | +------+------+------+------ + In the following example, an outer join operation is used together with an inner join operation: t1 LEFT JOIN (t2, t3) ON t1.a=t2.a That expression cannot be transformed into the following expression: t1 LEFT JOIN t2 ON t1.a=t2.a, t3. For the given table states, the two expressions return different sets of rows: mysql> SELECT * -> FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a; +------+------+------+------ + | a | a | b | b | +------+------+------+------ + | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | NULL | +------+------+------+------ + mysql> SELECT * -> FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3; +------+------+------+------ + | a | a | b | b | +------+------+------+------ + | 1 | 1 | 101 | 101 | | 2 | NULL | NULL | 101 | +------+------+------+------ + Therefore, if we omit parentheses in a join expression with outer join operators, we might change the result set for the original expression. More exactly, we cannot ignore parentheses in the right operand of the left outer join operation and in the left operand of a right join operation. In other words, we cannot ignore parentheses for the inner table expressions of outer join operations. Parentheses for the other operand (operand for the outer table) can be ignored. The following expression: (t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b) is equivalent to this expression: t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b) for any tables t1,t2,t3 and any condition P over attributes t2.b and t3.b. Whenever the order of execution of the join operations in a join expression (join_table) is not from left to right, we talk about nested joins. Consider the following queries: SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a WHERE t1.a > 1 SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1 Those queries are considered to contain these nested joins: t2 LEFT JOIN t3 ON t2.b=t3.b t2, t3 The nested join is formed in the first query with a left join operation, whereas in the second query it is formed with an inner join operation. In the first query, the parentheses can be omitted: The grammatical structure of the join expression will dictate the same order of execution for join operations. For the second query, the parentheses cannot be omitted, although the join expression here can be interpreted unambiguously without them. (In our extended syntax the parentheses in (t2, t3) of the second query are required, although theoretically the query could be parsed without them: We still would have unambiguous syntactical structure for the query because LEFT JOIN and ON would play the role of the left and right delimiters for the expression (t2,t3).) The preceding examples demonstrate these points:
Queries with nested outer joins are executed in the same pipeline manner as queries with inner joins. More exactly, a variation of the nested-loop join algorithm is exploited. Recall by what algorithmic schema the nested-loop join executes a query. Suppose that we have a join query over 3 tables T1,T2,T3 of the form: SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2) INNER JOIN T3 ON P2(T2,T3) WHERE P(T1,T2,T3). Here, P1(T1,T2) and P2(T3,T3) are some join conditions (on expressions), whereas P(t1,t2,t3) is a condition over columns of tables T1,T2,T3. The nested-loop join algorithm would execute this query in the following manner: FOR each row t1 in T1 { FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t2,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } } } The notation t1||t2||t3 means "a row constructed by concatenating the columns of rows t1, t2, and t3." In some of the following examples, NULL where a row name appears means that NULL is used for each column of that row. For example, t1||t2||NULL means "a row constructed by concatenating the columns of rows t1 and t2, and NULL for each column of t3." Now let's consider a query with nested outer joins: SELECT * FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON P2(T2,T3)) ON P1(T1,T2) WHERE P(T1,T2,T3). For this query, we modify the nested-loop pattern to get: FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f2=TRUE; f1=TRUE; } IF (!f2) { IF P(t1,t2,NULL) { t:=t1||t2||NULL; OUTPUT t; } f1=TRUE; } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } } } In general, for any nested loop for the first inner table in an outer join operation, a flag is introduced that is turned off before the loop and is checked after the loop. The flag is turned on when for the current row from the outer table a match from the table representing the inner operand is found. If at the end of the loop cycle the flag is still off, no match has been found for the current row of the outer table. In this case, the row is complemented by NULL values for the columns of the inner tables. The result row is passed to the final check for the output or into the next nested loop, but only if the row satisfies the join condition of all embedded outer joins. In our example, the outer join table expressed by the following expression is embedded: (T2 LEFT JOIN T3 ON P2(T2,T3)) Note that for the query with inner joins, the optimizer could choose a different order of nested loops, such as this one: FOR each row t3 in T3 { FOR each row t2 in T2 such that P2(t2,t3) { FOR each row t1 in T1 such that P1(t1,t2) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } } } For the queries with outer joins, the optimizer can choose only such an order where loops for outer tables precede loops for inner tables. Thus, for our query with outer joins, only one nesting order is possible. For the following query, the optimizer will evaluate two different nestings: SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3) WHERE P(T1,T2,T3) The nestings are these: FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) { FOR each row t3 in T3 such that P2(t1,t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f1:=TRUE } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } } } and: FOR each row t1 in T1 { BOOL f1:=FALSE; FOR each row t3 in T3 such that P2(t1,t3) { FOR each row t2 in T2 such that P1(t1,t2) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } f1:=TRUE } } IF (!f1) { IF P(t1,NULL,NULL) { t:=t1||NULL||NULL; OUTPUT t; } } } In both nestings, T1 must be processed in the outer loop because it is used in an outer join. T2 and T3 are used in an inner join, so that join must be processed in the inner loop. However, because the join is an inner join, T2 and T3 can be processed in either order. When discussing the nested-loop algorithm for inner joins, we omitted some details whose impact on the performance of query execution may be huge. We did not mention so-called "pushed-down" conditions. Suppose that our WHERE condition P(T1,T2,T3) can be represented by a conjunctive formula: P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3). In this case, MySQL actually uses the following nested-loop schema for the execution of the query with inner joins: FOR each row t1 in T1 such that C1(t1) { FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2) { FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) { IF P(t1,t2,t3) { t:=t1||t2||t3; OUTPUT t; } } } } You see that each of the conjuncts C1(T1), C2(T2), C3(T3) is pushed out of the most inner loop to the most outer loop where it can be evaluated. If C1(T1) is a very restrictive condition, this condition pushdown may greatly reduce the number of rows from table T1 passed to the inner loops. As a result, the execution time for the query may improve immensely. For a query with outer joins, the WHERE condition is to be checked only after it has been found that the current row from the outer table has a match in the inner tables. Thus, the optimization of pushing conditions out of the inner nested loops cannot be applied directly to queries with outer joins. Here we have to introduce conditional pushed-down predicates guarded by the flags that are turned on when a match has been encountered. For our example with outer joins with: P(T1,T2,T3)=C1(T1) AND C(T2) AND C3(T3) the nested-loop schema using guarded pushed-down conditions looks like this: FOR each row t1 in T1 such that C1(t1) { BOOL f1:=FALSE; FOR each row t2 in T2 such that P1(t1,t2) AND (f1?C2(t2):TRUE) { BOOL f2:=FALSE; FOR each row t3 in T3 such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) { IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) { t:=t1||t2||t3; OUTPUT t; } f2=TRUE; f1=TRUE; } IF (!f2) { IF (f1?TRUE:C2(t2) && P(t1,t2,NULL)) { t:=t1||t2||NULL; OUTPUT t; } f1=TRUE; } } IF (!f1 && P(t1,NULL,NULL)) { t:=t1||NULL||NULL; OUTPUT t; } } In general, pushed-down predicates can be extracted from join conditions such as P1(T1,T2) and P(T2,T3). In this case, a pushed-down predicate is guarded also by a flag that prevents checking the predicate for the NULL-complemented row generated by the corresponding outer join operation. Note that access by key from one inner table to another in the same nested join is prohibited if it is induced by a predicate from the WHERE condition. (We could use conditional key access in this case, but this technique is not employed yet in MySQL 5.0.) 6.2.11. Outer Join SimplificationTable expressions in the FROM clause of a query are simplified in many cases. At the parser stage, queries with right outer join operations are converted to equivalent queries containing only left join operations. In the general case, the conversion is performed according to the following rule: (T1, ...) RIGHT JOIN (T2,...) ON P(T1,...,T2,...) = (T2, ...) LEFT JOIN (T1,...) ON P(T1,...,T2,...) All inner join expressions of the form T1 INNER JOIN T2 ON P(T1,T2) are replaced by the list T1,T2, P(T1,T2) being joined as a conjunct to the WHERE condition (or to the join condition of the embedding join, if there is any). When the optimizer evaluates plans for join queries with outer join operation, it takes into consideration only the plans where, for each such operation, the outer tables are accessed before the inner tables. The optimizer options are limited because only such plans enable us to execute queries with outer joins operations by the nested loop schema. Suppose that we have a query of the form: SELECT * T1 LEFT JOIN T2 ON P1(T1,T2) WHERE P(T1,T2) AND R(T2) with R(T2) narrowing greatly the number of matching rows from table T2. If we executed the query as it is, the optimizer would have no other choice than to access table T1 before table T2, which may lead to a very inefficient execution plan. Fortunately, MySQL converts such a query into a query without an outer join operation if the WHERE condition is null-rejected. A condition is called null-rejected for an outer join operation if it evaluates to FALSE or to UNKNOWN for any NULL-complemented row built for the operation. Thus, for this outer join: T1 LEFT JOIN T2 ON T1.A=T2.A Conditions such as these are null-rejected: T2.B IS NOT NULL, T2.B >3, T2.C <= T1.C, T2.B < 2 OR T2.C > 1 Conditions such as these are not null-rejected: T2.B IS NULL, T1.B < 3 OR T2.B IS NOT NULL, T1.B < 3 OR T2.B > 3 The general rules for checking whether a condition is null-rejected for an outer join operation are simple. A condition is null-rejected in the following cases:
A condition can be null-rejected for one outer join operation in a query and not null-rejected for another. In the query: SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A LEFT JOIN T3 ON T3.B=T1.B WHERE T3.C > 0 the WHERE condition is null-rejected for the second outer join operation but is not null-rejected for the first one. If the WHERE condition is null-rejected for an outer join operation in a query, the outer join operation is replaced by an inner join operation. For example, the preceding query is replaced with the query: SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A INNER JOIN T3 ON T3.B=T1.B WHERE T3.C > 0 For the original query, the optimizer would evaluate plans compatible with only one access order T1,T2,T3. For the replacing query, it additionally considers the access sequence T3,T1,T2. A conversion of one outer join operation may trigger a conversion of another. Thus, the query: SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A LEFT JOIN T3 ON T3.B=T2.B WHERE T3.C > 0 will be first converted to the query: SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A INNER JOIN T3 ON T3.B=T2.B WHERE T3.C > 0 which is equivalent to the query: SELECT * FROM (T1 LEFT JOIN T2 ON T2.A=T1.A), T3 WHERE T3.C > 0 AND T3.B=T2.B Now the remaining outer join operation can be replaced by an inner join, too, because the condition T3.B=T2.B is null-rejected and we get a query without outer joins at all: SELECT * FROM (T1 INNER JOIN T2 ON T2.A=T1.A), T3 WHERE T3.C > 0 AND T3.B=T2.B Sometimes we succeed in replacing an embedded outer join operation, but cannot convert the embedding outer join. The following query: SELECT * FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON T3.B=T2.B) ON T2.A=T1.A WHERE T3.C > 0 is converted to: SELECT * FROM T1 LEFT JOIN (T2 INNER JOIN T3 ON T3.B=T2.B) ON T2.A=T1.A WHERE T3.C > 0, That can be rewritten only to the form still containing the embedding outer join operation: SELECT * FROM T1 LEFT JOIN (T2,T3) ON (T2.A=T1.A AND T3.B=T2.B) WHERE T3.C > 0. When trying to convert an embedded outer join operation in a query, we must take into account the join condition for the embedding outer join together with the WHERE condition. In the query: SELECT * FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON T3.B=T2.B) ON T2.A=T1.A AND T3.C=T1.C WHERE T3.D > 0 OR T1.D > 0 the WHERE condition is not null-rejected for the embedded outer join, but the join condition of the embedding outer join T2.A=T1.A AND T3.C=T1.C is null-rejected. So the query can be converted to: SELECT * FROM T1 LEFT JOIN (T2, T3) ON T2.A=T1.A AND T3.C=T1.C AND T3.B=T2.B WHERE T3.D > 0 OR T1.D > 0 The algorithm that converts outer join operations into inner joins was implemented in full measure, as it has been described here, in MySQL 5.0.1. MySQL 4.1 performs only some simple conversions. 6.2.12. ORDER BY OptimizationIn some cases, MySQL can use an index to satisfy an ORDER BY clause without doing any extra sorting. The index can also be used even if the ORDER BY does not match the index exactly, as long as all of the unused portions of the index and all the extra ORDER BY columns are constants in the WHERE clause. The following queries use the index to resolve the ORDER BY part: SELECT * FROM t1 ORDER BY key_part1,key_part2,... ; SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2; SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 DESC; SELECT * FROM t1 WHERE key_part1=1 ORDER BY key_part1 DESC, key_part2 DESC; In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column. See Section 6.2.1, "Optimizing Queries with EXPLAIN." A filesort optimization is used that records not only the sort key value and row position, but the columns required for the query as well. This avoids reading the rows twice. The filesort algorithm works like this:
This algorithm represents a significant improvement over that used in some older versions of MySQL. To avoid a slowdown, this optimization is used only if the total size of the extra columns in the sort tuple does not exceed the value of the max_length_for_sort_data system variable. (A symptom of setting the value of this variable too high is that you should see high disk activity and low CPU activity.) If you want to increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well. If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs. If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL. For example: INSERT INTO foo SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL; 6.2.13. GROUP BY OptimizationThe most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any). In some cases, MySQL is able to do much better than that and to avoid creation of temporary tables by using index access. The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order (for example, this is a BTREE index and not a HASH index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions. There are two ways to execute a GROUP BY query via index access, as detailed in the following sections. In the first method, the grouping operation is applied together with all range predicates (if any). The second method first performs a range scan, and then groups the resulting tuples. 6.2.13.1. Loose Index ScanThe most efficient way to process GROUP BY is when the index is used to directly retrieve the group fields. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, BTREE). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all WHERE conditions. This access method considers only a fraction of the keys in an index, so it is called a loose index scan. When there is no WHERE clause, a loose index scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. If the WHERE clause contains range predicates (see the discussion of the range join type in Section 6.2.1, "Optimizing Queries with EXPLAIN"), a loose index scan looks up the first key of each group that satisfies the range conditions, and again reads the least possible number of keys. This is possible under the following conditions:
The EXPLAIN output for such queries shows Using index for group-by in the ExTRa column. The following queries fall into this category, assuming that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4): SELECT c1, c2 FROM t1 GROUP BY c1, c2; SELECT DISTINCT c1, c2 FROM t1; SELECT c1, MIN(c2) FROM t1 GROUP BY c1; SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2; SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2; SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2; SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2; The following queries cannot be executed with this quick select method, for the reasons given:
6.2.13.2. Tight Index ScanA tight index scan may be either a full index scan or a range index scan, depending on the query conditions. When the conditions for a loose index scan are not met, it is still possible to avoid creation of temporary tables for GROUP BY queries. If there are range conditions in the WHERE clause, this method reads only the keys that satisfy these conditions. Otherwise, it performs an index scan. Because this method reads all keys in each range defined by the WHERE clause, or scans the whole index if there are no range conditions, we term it a tight index scan. Notice that with a tight index scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found. For this method to work, it is sufficient that there is a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the GROUP BY key. The constants from the equality conditions fill in any "gaps" in the search keys so that it is possible to form complete prefixes of the index. These index prefixes then can be used for index lookups. If we require sorting of the GROUP BY result, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order. The following queries do not work with the loose index scan access method described earlier, but still work with the tight index scan access method (assuming that there is an index idx(c1,c2,c3) on table t1(c1,c2,c3,c4)).
6.2.14. LIMIT OptimizationIn some cases, MySQL handles a query differently when you are using LIMIT row_count and not using HAVING:
6.2.15. How to Avoid Table ScansThe output from EXPLAIN shows ALL in the type column when MySQL uses a table scan to resolve a query. This usually happens under the following conditions:
For small tables, a table scan often is appropriate and the performance impact is negligible. For large tables, try the following techniques to avoid having the optimizer incorrectly choose a table scan:
6.2.16. Speed of INSERT StatementsThe time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query. The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes. You can use the following methods to speed up inserts:
6.2.17. Speed of UPDATE StatementsAn update statement is optimized like a SELECT query with the additional overhead of a write. The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated. Another way to get fast updates is to delay updates and then do many updates in a row later. Performing multiple updates together is much quicker than doing one at a time if you lock the table. For a MyISAM table that uses dynamic row format, updating a row to a longer total length may split the row. If you do this often, it is very important to use OPTIMIZE TABLE occasionally. 6.2.18. Speed of DELETE StatementsThe time required to delete individual rows is exactly proportional to the number of indexes. To delete rows more quickly, you can increase the size of the key cache by increasing the key_buffer_size system variable. See Section 6.5.2, "Tuning Server Parameters." To delete all rows from a table, trUNCATE TABLE tbl_name is faster than DELETE FROM tbl_name. 6.2.19. Other Optimization TipsThis section lists a number of miscellaneous tips for improving query processing speed:
|