< Day Day Up > |
13.4 Optimizing the Logical Database StructureThis section describes techniques for organizing data that can help you achieve better query performance. 13.4.1 Choosing Appropriate Table TypesWhen creating a table, ask yourself what types of queries you'll use it for. Then choose a table type that uses a locking level appropriate for the anticipated query mix. MyISAM table-level locking works best for a query mix that is heavily skewed toward retrievals and includes few updates. Use InnoDB if you must process a query mix containing many updates. InnoDB's use of row-level locking and multi-versioning provides good concurrency for a mix of retrievals and updates. One query can update rows while other queries read or update different rows of the table. If you're using MyISAM tables, choose their structure to reflect whether you consider efficiency of processing speed or disk usage to be more important. Different MyISAM storage formats have different performance characteristics. This influences whether you choose fixed-length or variable-length columns to store string data:
For InnoDB tables, it is also true that CHAR columns take more space on average than VARCHAR. But there is no retrieval speed advantage for InnoDB as there is with MyISAM, because the InnoDB engine implements storage for both CHAR and VARCHAR in a similar way. In fact, retrieval of CHAR values might be slower because on average they require more information to be read from disk. If a MyISAM table contains a mix of fixed-length and variable-length columns, the table format will be dynamic. However, if many of the queries on the table access only its fixed-length columns, it is sometimes possible to gain advantages both of static tables (faster retrieval) and of dynamic tables (lower storage requirements) by splitting the table into two tables. Use a fixed-format table to hold the fixed-length columns and a dynamic-format table to hold the variable-length columns. To split the table into two, use this procedure:
After modifying the table structure this way, queries that retrieve only fixed-width columns can use the static table, and will be quicker. For queries that retrieve both fixed-width and variable-width columns, join the two tables using the primary key values to match up rows. Another option with MyISAM tables is to use compressed read-only tables. For more information about MyISAM table structure, see section 14.2.1, "MyISAM Storage Formats." MERGE tables can use a mix of compressed and uncompressed tables. This can be useful for time-based records. For example, if you log records each year to a different log file, you can use an uncompressed log table for the current year so that you can update it, but compress the tables for past years to save space. If you then create a MERGE table from the collection, you can easily run queries that search all tables together. 13.4.2 Using Summary TablesSuppose that you run an analysis consisting of a set of retrievals that each perform a complex SELECT of a set of records (perhaps using an expensive join), and that differ only in the way they summarize the records. That's inefficient because it unnecessarily does the work of selecting the records repeatedly. A better technique is to select the records once, and then use them to generate the summaries. In such a situation, consider the following strategy:
The following example creates a summary table containing the average GNP value of countries in each continent. Then it compares the summary information to individual countries to find those countries with a GNP much less than the average and much more than the average. First, create the summary table: mysql> CREATE TABLE ContinentGNP -> SELECT Continent, AVG(GNP) AS AvgGNP -> FROM Country GROUP BY Continent; mysql> SELECT * FROM ContinentGNP; +---------------+---------------+ | Continent | AvgGNP | +---------------+---------------+ | Asia | 150105.725490 | | Europe | 206497.065217 | | North America | 261854.789189 | | Africa | 10006.465517 | | Oceania | 14991.953571 | | Antarctica | 0.000000 | | South America | 107991.000000 | +---------------+---------------+ Next, compare the summary table to the original table to find countries that have a GNP less than 1% of the continental average: mysql> SELECT -> Country.Continent, Country.Name, -> Country.GNP AS CountryGNP, -> ContinentGNP.AvgGNP AS ContinentAvgGNP -> FROM Country, ContinentGNP -> WHERE -> Country.Continent = ContinentGNP.Continent -> AND Country.GNP < ContinentGNP.AvgGNP * .01 -> ORDER BY Country.Continent, Country.Name; +-----------+-------------------------------+------------+-----------------+ | Continent | Name | CountryGNP | ContinentAvgGNP | +-----------+-------------------------------+------------+-----------------+ | Asia | Bhutan | 372.00 | 150105.725490 | | Asia | East Timor | 0.00 | 150105.725490 | | Asia | Laos | 1292.00 | 150105.725490 | | Asia | Maldives | 199.00 | 150105.725490 | | Asia | Mongolia | 1043.00 | 150105.725490 | | Europe | Andorra | 1630.00 | 206497.065217 | | Europe | Faroe Islands | 0.00 | 206497.065217 | | Europe | Gibraltar | 258.00 | 206497.065217 | | Europe | Holy See (Vatican City State) | 9.00 | 206497.065217 | | Europe | Liechtenstein | 1119.00 | 206497.065217 | ... Use the summary table again to find countries that have a GNP more than 10 times the continental average: mysql> SELECT -> Country.Continent, Country.Name, -> Country.GNP AS CountryGNP, -> ContinentGNP.AvgGNP AS ContinentAvgGNP -> FROM Country, ContinentGNP -> WHERE -> Country.Continent = ContinentGNP.Continent -> AND Country.GNP > ContinentGNP.AvgGNP * 10 -> ORDER BY Country.Continent, Country.Name; +---------------+---------------+------------+-----------------+ | Continent | Name | CountryGNP | ContinentAvgGNP | +---------------+---------------+------------+-----------------+ | Asia | Japan | 3787042.00 | 150105.725490 | | Europe | Germany | 2133367.00 | 206497.065217 | | North America | United States | 8510700.00 | 261854.789189 | | Africa | South Africa | 116729.00 | 10006.465517 | | Oceania | Australia | 351182.00 | 14991.953571 | +---------------+---------------+------------+-----------------+ The technique of using a summary table has several benefits:
Use of summary tables has the disadvantage that the records they contain are up-to-date only as long as the original values remain unchanged, and thus so are any summaries calculated from them. If the original table rarely or never changes, this might be only a minor concern. For many applications, summaries that are close approximations are sufficiently accurate. The summary table technique can be applied at multiple levels. Create a summary table that holds the results of an initial summary, and then summarize that table in different ways to produce secondary summaries. This avoids the computational expense of generating the initial summary repeatedly. When a summary consists of a single value, you need not create a table at all. Use a SQL variable to hold the value. Then you can use the value for comparison purposes in subsequent queries without having to calculate it again. |
< Day Day Up > |