Filtering GroupsIn addition to being able to group data using GROUP BY, MySQL also allows you to filter which groups to include and which to exclude. For example, you might want a list of all customers who have made at least two orders. To obtain this data you must filter based on the complete group, not on individual rows. You've already seen the WHERE clause in action (introduced in Chapter 6, "Filtering Data"). But WHERE does not work here because WHERE filters specific rows, not groups. As a matter of fact, WHERE has no idea what a group is. So what do you use instead of WHERE? MySQL provides yet another clause for this purpose: the HAVING clause. HAVING is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can also be used with HAVING. The only difference is that WHERE filters rows and HAVING filters groups. Tip HAVING Supports All of WHERE's Operators In Chapter 6 and Chapter 7, "Advanced Data Filtering," you learned about WHERE clause conditions (including wildcard conditions and clauses with multiple operators). All the techniques and options you learned about WHERE can be applied to HAVING. The syntax is identical; just the keyword is different. So how do you filter rows? Look at the following example: • Input SELECT cust_id, COUNT(*) AS orders FROM orders GROUP BY cust_id HAVING COUNT(*) >= 2; • Output j115+---------+--------+ | cust_id | orders | +---------+--------+ | 10001 | 2 | +---------+--------+ • Analysis The first three lines of this SELECT statement are similar to the statements seen previously. The final line adds a HAVING clause that filters on those groups with a COUNT(*) >= 2two or more orders. As you can see, a WHERE clause does not work here because the filtering is based on the group aggregate value, not on the values of specific rows. Note The Difference Between HAVING and WHERE Here's another way to look at it: WHERE filters before data is grouped, and HAVING filters after data is grouped. This is an important distinction; rows that are eliminated by a WHERE clause are not included in the group. This could change the calculated values, which in turn could affect which groups are filtered based on the use of those values in the HAVING clause. So is there ever a need to use both WHERE and HAVING clauses in one statement? Actually, yes, there is. Suppose you want to further filter the previous statement so it returns any customers who placed two or more orders in the past 12 months. To do that, you can add a WHERE clause that filters out just the orders placed in the past 12 months. You then add a HAVING clause to filter just the groups with two or more rows in them. To better demonstrate this, look at the following example that lists all vendors who have 2 or more products priced at 10 or more: • Input SELECT vend_id, COUNT(*) AS num_prods FROM products WHERE prod_price >= 10 GROUP BY vend_id HAVING COUNT(*) >= 2; • Output +---------+-----------+ | vend_id | num_prods | +---------+-----------+ | 1003 | 4 | | 1005 | 2 | +---------+-----------+ • Analysis This statement warrants an explanation. The first line is a basic SELECT using an aggregate functionmuch like the examples thus far. The WHERE clause filters all rows with a prod_price of at least 10. Data is then grouped by vend_id, and then a HAVING clause filters just those groups with a count of 2 or more. Without the WHERE clause two extra rows would have been retrieved (vendor 1002 who only sells products all priced under 10, and vendor 1001 who sells three products but only one of them is priced greater or equal to 10) as seen here: • Input SELECT vend_id, COUNT(*) AS num_prods FROM products GROUP BY vend_id HAVING COUNT(*) >= 2; • Output +---------+-----------+ | vend_id | num_prods | +---------+-----------+ | 1001 | 3 | | 1002 | 2 | | 1003 | 7 | | 1005 | 2 | +---------+-----------+ |