Chapter 4. Query Optimization

The world of relational database theory is a world dominated by tables and sets, and operations on tables and sets. A database is a set of tables, and a table is a set of rows and columns. When you issue a SELECT statement to retrieve rows from a table, you get back another set of rows and columns; that is, another table. These are abstract notions that make no reference to the underlying representation a database system uses to operate on the data in your tables. Another abstraction is that operations on tables happen all at once; queries are conceptualized as set operations and there is no concept of time in set theory.

The real world, of course, is quite different. Database management systems implement abstract concepts but do so on real hardware bound by real physical constraints. As a result, queries take time, sometimes an annoyingly long time. And we, being impatient creatures, don't like to wait, so we leave the abstract world of instantaneous mathematical operations on sets and look around for ways to speed up our queries. Fortunately, there are several techniques for doing so:

We index tables to allow the database server to look up rows more quickly.
We consider how to write queries to take advantage of those indexes to the fullest extent, and use the EXPLAIN statement to check whether the MySQL server really is doing so.
We write queries to affect the server's scheduling mechanism so that queries arriving from multiple clients cooperate better.
We modify the server's operating parameters to get it to perform more efficiently.
We think about what's going on with the underlying hardware and how we can work around its physical constraints to improve performance.

Those are the kinds of issues that this chapter focuses on, with the goal of assisting you in optimizing the performance of your database system so that it processes your queries as quickly as possible. MySQL is already quite fast, but even the fastest database can run queries more quickly if you help it do so.