Using Subqueries As Calculated FieldsAnother way to use subqueries is in creating calculated fields. Suppose you want to display the total number of orders placed by every customer in your customers table. Orders are stored in the orders table along with the appropriate customer ID. To perform this operation, follow these steps:
As you learned in the previous two chapters, you can use SELECT COUNT(*) to count rows in a table, and by providing a WHERE clause to filter a specific customer ID, you can count just that customer's orders. For example, the following code counts the number of orders placed by customer 10001: • Input SELECT COUNT(*) AS orders FROM orders WHERE cust_id = 10001; To perform that COUNT(*) calculation for each customer, use COUNT* as a subquery. Look at the following code: • Input SELECT cust_name, cust_state, (SELECT COUNT(*) FROM orders WHERE orders.cust_id = customers.cust_id) AS orders FROM customers ORDER BY cust_name; • Output +----------------+------------+--------+ | cust_name | cust_state | orders | +----------------+------------+--------+ | Coyote Inc. | MI | 2 | | E Fudd | IL | 1 | | Mouse House | OH | 0 | | Wascals | IN | 1 | | Yosemite Place | AZ | 1 | +----------------+------------+--------+ • Analysis This SELECT statement returns three columns for every customer in the customers table: cust_name, cust_state, and orders. orders is a calculated field that is set by a subquery provided in parentheses. That subquery is executed once for every customer retrieved. In this example, the subquery is executed five times because five customers were retrieved. The WHERE clause in the subquery is a little different from the WHERE clauses used previously because it uses fully qualified column names (first mentioned in Chapter 4, "Retrieving Data"). The following clause tells SQL to compare the cust_id in the orders table to the one currently being retrieved from the customers table: WHERE orders.cust_id = customers.cust_id New Term Correlated Subquery A subquery that refers to the outer query. The type of subquery is called a correlated subquery. This syntaxthe table name and the column name separated by a periodmust be used whenever there is possible ambiguity about column names. Why? Well, let's look at what happens if fully qualified column names are not used: • Input SELECT cust_name, cust_state, (SELECT COUNT(*) FROM orders WHERE cust_id = cust_id) AS orders FROM customers ORDER BY cust_name; • Output +----------------+------------+--------+ | cust_name | cust_state | orders | +----------------+------------+--------+ | Coyote Inc. | MI | 5 | | E Fudd | IL | 5 | | Mouse House | OH | 5 | | Wascals | IN | 5 | | Yosemite Place | AZ | 5 | +----------------+------------+--------+ • Analysis Obviously the returned results are incorrect (compare them to the previous results), but why did this happen? There are two cust_id columns, one in customers and one in orders, and those two columns need to be compared to correctly match orders with their appropriate customers. Without fully qualifying the column names, MySQL assumes you are comparing the cust_id in the orders table to itself. And SELECT COUNT(*) FROM orders WHERE cust_id = cust_id; always returns the total number of orders in the orders table (because MySQL checks to see that every order's cust_id matches itself, which it always does, of course). Although subqueries are extremely useful in constructing this type of SELECT statement, care must be taken to properly qualify ambiguous column names. Note Always More Than One Solution As explained earlier in this chapter, although the sample code shown here works, it is often not the most efficient way to perform this type of data retrieval. You will revisit this example in a later chapter. Tip Build Queries with Subqueries Incrementally Testing and debugging queries with subqueries can be tricky, particularly as these statements grow in complexity. The safest way to build (and test) queries with subqueries is to do so incrementally, in much the same way as MySQL processes them. Build and test the innermost query first. Then build and test the outer query with hard-coded data, and only after you have verified that it is working embed the subquery. Then test it again. And keep repeating these steps as for each additional query. This will take just a little longer to construct your queries, but doing so saves you lots of time later (when you try to figure out why queries are not working) and significantly increases the likelihood of them working the first time. |