Using Full-Text SearchingIn order to perform full-text searches, the columns to be searched must be indexed and constantly re-indexed as data changes. MySQL handles all indexing and re-indexing automatically after table columns have been appropriately designated. After indexing, SELECT can be used with Match() and Against() to actually perform the searches. Enabling Full-Text Searching SupportGenerally, full-text searching is enabled when a table is created. The CREATE TABLE statement (which will be introduced in Chapter 21) accepts a FULLTEXT clause, which is a comma-delimited list of the columns to be indexed. The following CREATE statement demonstrates the use of the FULLTEXT clause: • Input CREATE TABLE productnotes ( note_id int NOT NULL AUTO_INCREMENT, prod_id char(10) NOT NULL, note_date datetime NOT NULL, note_text text NULL , PRIMARY KEY(note_id), FULLTEXT(note_text) ) ENGINE=MyISAM; • Analysis We'll look at the CREATE TABLE statement in detail in Chapter 21. For now, just note that this CREATE TABLE statement defines table productnotes and lists the columns that it is to contain. One of those columns is named note_text, and it is indexed by MySQL for full-text searching as instructed by the clause FULLTEXT(note_text). Here FULLTEXT indexes a single column, but multiple columns may be specified if needed. Once defined, MySQL automatically maintains the index. When rows are added, updated, or deleted, the index is automatically updated accordingly. FULLTEXT may be specified at table creation time, or later on (in which case all existing data would have to be immediately indexed). Tip Don't Use FULLTEXT When Importing Data Updating indexes takes timenot a lot of time, but time nonetheless. If you are importing data into a new table, you should not enable FULLTEXT indexing at that time. Rather, first import all of the data, and then modify the table to define FULLTEXT. This makes for a much faster data import (and the total time needed to index all data will be less than the sum of the time needed to index each row individually). Performing Full-Text SearchesAfter indexing, full-text searches are performed using two functions: Match() to specify the columns to be searched and Against() to specify the search expression to be used. Here is a basic example: • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('rabbit'); • Output +-------------------------------------------------------------------+ | note_text | +-------------------------------------------------------------------| | Customer complaint: rabbit has been able to detect trap, food | | apparently less effective now. | | Quantity varies, sold by the sack load. All guaranteed to be | | bright and orange, and suitable for use as rabbit bait. | +-------------------------------------------------------------------+ • Analysis The SELECT statement retrieves a single row, note_text. For the WHERE clause, a full-text search is performed. Match(note_text) instructs MySQL to perform the search against that named column, and Against('rabbit') specifies the word rabbit as the search text. As two rows contained the word rabbit, those two rows were returned. Note Use Full Match() Specification The value passed to Match() must be the same as the one used in the FULLTEXT() definition. If multiple columns are specified, all of them must be listed (and in the correct order). Note Searches Are Not Case Sensitive Full-text searches are not case sensitive, unless BINARY mode (not covered in this chapter) is used. The truth is that the search just performed could just as easily have used a LIKE clause, as seen here: • Input SELECT note_text FROM productnotes WHERE note_text LIKE '%rabbit%'; • Output +-------------------------------------------------------------------+ | note_text | +-------------------------------------------------------------------| | Quantity varies, sold by the sack load. All guaranteed to be | | bright and orange, and suitable for use as rabbit bait. | | Customer complaint: rabbit has been able to detect trap, food | | apparently less effective now. | +-------------------------------------------------------------------+ • Analysis This SELECT retrieves the same two rows, but the order is different (although that may not always be the case). Neither of the two SELECT statements contained an ORDER BY clause. The latter (using LIKE) returns data in no particularly useful order. But the former (using full-text searching) returns data ordered by how well the text matched. Both rows contained the word rabbit, but the row that contained the word rabbit as the third word ranked higher than the row that contained it as the twentieth word. This is important. An important part of full-text searching is the ranking of results. Rows with a higher rank are returned first (as there is a higher degree of likelihood that those are the ones you really wanted). To demonstrate how ranking works, look at this example: • Input SELECT note_text, Match(note_text) Against('rabbit') AS rank FROM productnotes; • Output +-------------------------------------------------+-----------------+ | note_text | rank | +-------------------------------------------------+-----------------+ | Customer complaint: Sticks not individually | 0 | | wrapped, too easy to mistakenly detonate all | | | at once. Recommend individual wrapping. | | | Can shipped full, refills not available. Need | 0 | | to order new can if refill needed. | | | Safe is combination locked, combination not | 0 | | provided with safe. This is rarely a problem | | | as safes are typically blown up or dropped by | | | customers. | | | Quantity varies, sold by the sack load. All | 1.5905543170914 | | guaranteed to be bright and orange, and | | | suitable for as rabbit bait. | | | Included fuses are short and have been known to | 0 | | detonate too quickly for some customers. Longer | | | fuses are available (item FU1) and should be | | | recommended. | | | Matches not included, recommend purchase of | 0 | | matches or detonator (item DTNTR). | | | Please note that no returns will be accepted if | 0 | | safe opened using explosives. | | | Multiple customer returns, anvils failing to | 0 | | drop fast enough or falling backwards on | | | purchaser. Recommend that customer considers | | | using heavier anvils. | | | Item is extremely heavy. Designed for dropping, | 0 | | not recommended for use with slings, ropes, | | | pulleys, or tightropes. | | | Customer complaint: rabbit has been able to | 1.6408053837485 | | detect trap, food apparently less effective | | | now. | | | Shipped unassembled, requires common tools | 0 | | (including oversized hammer). | | | Customer complaint: Circular hole in safe floor | 0 | | can apparently be easily cut with handsaw. | | | Customer complaint: Not heavy enough to | 0 | | generate flying stars around head of victim. | | | If being purchased for dropping, recommend | | | ANV02 or ANV03 instead. | | | Call from individual trapped in safe plummeting | 0 | | to the ground, suggests an escape hatch be | | | added. Comment forwarded to vendor. | | +-------------------------------------------------+-----------------+ • Analysis Here Match() and Against() are used in the SELECT instead of the WHERE clause. This causes all rows to be returned (as there is no WHERE clause). Match() and Against() are used to create a calculated column (with the alias rank) which contains the ranking value calculated by the full-text search. The ranking is calculated by MySQL based on the number of words in the row, the number of unique words, the total number of words in the entire index, and the number of rows that contain the word. As you can see, the rows that do not contain the word rabbit have a rank of 0 (and were therefore not selected by the WHERE clause in the previous example). The two rows that do contain the word rabbit each have a rank value, and the one with the word earlier in the text has a higher rank value than the one in which the word appeared later. This helps demonstrate how full-text searching eliminates rows (those with a rank of 0), and how it sorts results (by rank in descending order). Note Ranking Multiple Search Terms If multiple search terms are specified, those that contain the most matching words will be ranked higher than those with less (or just a single match). As you can see, full-text searching offers functionality not available with simple LIKE searches. And as data is indexed, full-text searches are considerably faster, too. Using Query ExpansionQuery expansion is used to try to widen the range of returned full-text search results. Consider the following scenario. You want to find all notes with references to anvils in them. Only one note contains the word anvils, but you also want any other rows that may be related to your search, even if the specific word anvils is not contained within them. This is a job for query expansion. When query expansion is used, MySQL makes two passes through the data and indexes to perform your search:
Using query expansion you can therefore find results that might be relevant, even if they don't contain the exact words for which you were looking. Note MySQL Version 4.1.1 or Later Only Query expansion functionality was introduced in MySQL 4.1.1, and can therefore not be used in prior versions. Here is an example. First, a simple full-text search, without query expansion: • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('anvils'); • Output +-------------------------------------------------------------------+ | note_text | +-------------------------------------------------------------------+ | Multiple customer returns, anvils failing to drop fast enough or | | falling backwards on purchaser. Recommend that customer considers | | using heavier anvils. | +-------------------------------------------------------------------+ • Analysis Only one row contains the word anvils, so only one row is returned. Here is the same search, this time using query expansion: • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('anvils' WITH QUERY EXPANSION); • Output +-------------------------------------------------------------------+ | note_text | +-------------------------------------------------------------------+ | Multiple customer returns, anvils failing to drop fast enough or | | falling backwards on purchaser. Recommend that customer considers | | using heavier anvils. | | Customer complaint: Sticks not individually wrapped, too easy to | | mistakenly detonate all at once. Recommend individual wrapping. | | Customer complaint: Not heavy enough to generate flying stars | | around head of victim. If being purchased for dropping, recommend | | ANV02 or ANV03 instead. | | Please note that no returns will be accepted if safe opened using | | explosives. | | Customer complaint: rabbit has been able to detect trap, food | | apparently less effective now. | | Customer complaint: Circular hole in safe floor can apparently be | | easily cut with handsaw. | | Matches not included, recommend purchase of matches or detonator | | (item DTNTR). | +-------------------------------------------------------------------+ • Analysis This time seven rows were returned. The first contains the word anvils and is thus ranked highest. The second row has nothing to do with anvils, but as it contains two words that are also in the first row (customer and recommend) it was retrieved, too. The third row also contains those same two words, but they are further into the text and further apart, and so it was included, but ranked third. And this third row does indeed refer to anvils (by their product name). As you can see, query expansion greatly increases the number of rows returned, but in doing so also increases the number of returns that you might not actually want. Tip The More Rows the Better The more rows in your table (and the more text within those rows), the better the results returned when using query expansion. Boolean Text SearchesMySQL supports an additional form of full-text searching called boolean mode. In Boolean mode you may provide specifics as to
Tip Useable Even Without a FULLTEXT Index Boolean mode differs from the full-text search syntax used thus far in that it may be used even if no FULLTEXT index is defined. However, this would be a very slow operation (and the performance would degrade further as data volume increased). To demonstrate what IN BOOLEAN MODE does, here is a simple example: • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('heavy' IN BOOLEAN MODE); • Output +-------------------------------------------------------------------+ | note_text | +-------------------------------------------------------------------+ | Item is extremely heavy. Designed for dropping, not recommended | | for use with slings, ropes, pulleys, or tightropes. | | Customer complaint: Not heavy enough to generate flying stars | | around head of victim. If being purchased for dropping, recommend | | ANV02 or ANV03 instead. | +-------------------------------------------------------------------+ • Analysis This full-text search retrieves all rows containing the word heavy (there are two of them). The keywords IN BOOLEAN MODE are specified, but no boolean operators are actually specified and so the results are just as if boolean mode had not been specified. Note IN BOOLEAN MODE Behaves Differently Although the results in this example are the same as they would be without IN BOOLEAN MODE, there is an important difference in behavior (even if it did not manifest itself in this particular example). I'll point these out in the use notes later in this chapter. To match the rows that contain heavy but not any word beginning with rope, the following can be used: • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('heavy rope*' IN BOOLEAN MODE); • Output +-------------------------------------------------------------------+ | note_text | +-------------------------------------------------------------------+ | Customer complaint: Not heavy enough to generate flying stars | | around head of victim. If being purchased for dropping, recommend | | ANV02 or ANV03 instead. | +-------------------------------------------------------------------+ • Analysis This time only one row is returned. Again, the word heavy is matched, but this time rope* instructs MySQL to explicitly exclude any row that contains rope* (any word beginning with rope, including ropes, which is why one of the rows was excluded). Note Code Change Needed in MySQL 4.x If you are using MySQL 4.x, the previous example might not have returned any rows at all. This is the result of a bug in the processing of the * operator. To use this example in MySQL 4.x, use -ropes instead of -rope* (exclude ropes instead of any word beginning with rope). You have now seen two full-text search boolean operators: - excludes a word and * is the truncation operator (think of it as a wildcard used at the end of a word). Table 18.1 lists all of the supported boolean operators.
Here are some more examples to demonstrate the use of some of these operators: • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('+rabbit +bait"' IN BOOLEAN MODE); • Analysis This search matches rows that contain both the words rabbit and bait. • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('rabbit bait' IN BOOLEAN MODE); • Analysis Without operators specified, this search matches rows that contain at least one of rabbit or bait. • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('"rabbit bait"' IN BOOLEAN MODE); • Analysis This search matches the phrase rabbit bait instead of the two words rabbit and bait. • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('>rabbit <carrot' IN BOOLEAN MODE); • Analysis Match both rabbit and carrot, increasing the rank of the former and decreasing the rank of the latter. • Input SELECT note_text FROM productnotes WHERE Match(note_text) Against('+safe +(<combination)' IN BOOLEAN MODE); • Analysis This search matches the words safe and combination, lowering the ranking of the latter. Note Ranked, but Not Sorted In boolean mode, rows will not be returned sorted descending by ranking score. Full-Text Search Usage NotesBefore finishing this chapter, here are some important notes pertaining to the use of full-text searching:
Note No Proximity Operators One feature supported by many full-text search engines is proximity searching, the ability to search for words that are near each other (in the same sentence, in the same paragraph, or no more than a specific number of words apart, and so on). Proximity operators are not yet supported by MySQL full-text searching, although this is planned for a future release. |