Hack 38. Perform Proximity Searches
GAPS performs a proximity check between two words.
Sometimes it would be advantageous to search both forward and backward. For example, if you're doing genealogy research, you might find your uncle John Smith listed as either "John Smith" or "Smith John." Similarly, some pages might include John's middle initial: "John Q Smith" or "Smith John Q."
You might also need to find concepts that appear near each other but don't form a phrase. For example, you might want to learn about keeping squirrels out of your bird feeder. Attempts to express this idea as an exact phrase might fail, while simply searching for the individual words might return results that aren't specific enough.
GAPS, created by Kevin Shay, allows you to run searches both forward and backward, with the words within a certain number of words of each other. GAPS stands for Google API Proximity Search, and that's exactly what this application is: a way to search for topics within a few words of each other without having to run several queries in a row yourself. The program runs the queries and automatically organizes the results.
You enter two terms (there is an option to add more terms that will not be searched for in proximity) and specify how far apart you want them (1, 2, or 3 words). You can specify that the words be found only in the order you request (wordA, wordB) or in either order (wordA, wordB, and wordB, wordA). You can also specify how many results you want and how they are sorted (by title, URL, ranking, or proximity).
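To make the option list concrete, here is a minimal sketch of how the separate queries could be generated. It assumes the queries rely on Google's full-word wildcard (`*`) inside a quoted phrase, and that distance 1 means the words are adjacent, with each further step adding one wildcard word between them; GAPS itself may count distance or build queries differently. The function name `build_proximity_queries` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_proximity_queries(
    word_a: str,
    word_b: str,
    max_distance: int = 3,
    either_order: bool = True,
    extra_terms: Sequence[str] = (),
) -> List[Tuple[int, str]]:
    """Return (distance, query) pairs for a GAPS-style proximity search.

    Assumption (hypothetical, not confirmed for GAPS): distance 1 means the
    words are adjacent; each extra step adds one full-word wildcard (*).
    """
    if not 1 <= max_distance <= 3:
        raise ValueError("max_distance must be 1, 2, or 3")

    # Word orders to try: requested order only, or both directions.
    orders = [(word_a, word_b)]
    if either_order:
        orders.append((word_b, word_a))

    # Extra terms are appended outside the quoted phrase, so they are
    # searched for on the page but not in proximity to the two words.
    suffix = "".join(" " + term for term in extra_terms)

    queries = []
    for distance in range(1, max_distance + 1):
        wildcards = ["*"] * (distance - 1)
        for first, second in orders:
            phrase = " ".join([first, *wildcards, second])
            queries.append((distance, f'"{phrase}"{suffix}'))
    return queries
```

For example, `build_proximity_queries("google", "hacks", 2, either_order=False)` yields `[(1, '"google hacks"'), (2, '"google * hacks"')]`. Allowing either order doubles the number of queries, which is the repetitive work a tool like GAPS saves you from typing by hand.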
Search results are formatted much like regular Google results, except that a distance rating appears beside each title. The distance rating, between one and three, indicates how far apart the two query words were on the page. Figure 2-12 shows a GAPS search for google and hacks within two words of one another, order intact.
Figure 2-12. GAPS search for "google" and "hacks" within two words of one another
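Because several queries run behind the scenes, the same page can come back more than once. The sketch below shows one plausible way to organize the combined results: keep each URL once, preferring the hit with the smallest distance (then the best rank), and sort by one of the four orderings listed earlier. The `Result` fields, the `merge_results` function, and the tie-breaking rule are all assumptions for illustration, not GAPS's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Result:
    """One hit returned by a single proximity query (hypothetical shape)."""
    title: str
    url: str
    rank: int      # position in that query's own result list (1 = top)
    distance: int  # distance of the query that produced this hit


def merge_results(hits: Iterable[Result], sort_by: str = "proximity") -> List[Result]:
    """Deduplicate hits by URL, keeping the closest (then best-ranked) one, then sort."""
    best: Dict[str, Result] = {}
    for hit in hits:
        current = best.get(hit.url)
        # Prefer a smaller distance; break ties with the better (lower) rank.
        if current is None or (hit.distance, hit.rank) < (current.distance, current.rank):
            best[hit.url] = hit

    # The four sort orders mentioned in the text; unknown names raise KeyError.
    keys = {
        "title": lambda r: r.title.lower(),
        "url": lambda r: r.url,
        "ranking": lambda r: r.rank,
        "proximity": lambda r: (r.distance, r.rank),
    }
    return sorted(best.values(), key=keys[sort_by])
```

Keeping the closest hit per URL means each page's displayed rating reflects the tightest proximity at which any query found it.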
Click the distance rating link to pass the generated query on to Google directly.
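A link like this can be as simple as the generated query placed in the `q` parameter of a standard Google search URL. This is a minimal sketch assuming that URL format; the helper name `google_search_url` is hypothetical, and GAPS may construct its links differently.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Build a google.com search link for a generated proximity query.

    urlencode percent-encodes the quotes and wildcard and turns spaces
    into '+', so the phrase survives intact in the link.
    """
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For instance, the query `"google * hacks"` becomes `https://www.google.com/search?q=%22google+%2A+hacks%22`.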
2.20.1. Making the Most of GAPS
GAPS works best when you're looking for words that may appear on the same page while being related only ambiguously or not at all. For example, if you're looking for information on Google and search engine optimization (SEO), you might find that searching for the words Google and SEO doesn't find the results you want, while using GAPS to search for Google and SEO within three words of each other finds material focused much more on search engine optimization for Google.
GAPS also works well when you're searching for information about two famous people who often appear on the same page, though not necessarily near each other. For example, you might want information on Bill Clinton and Alan Greenspan but find that you're getting too many pages that merely list the two of them. By searching for their names in proximity to each other, you're more likely to find pages that discuss them together.
Finally, you might find GAPS useful in medical research. Search results often include index pages that list many symptoms. Searching for symptoms or other medical terms within a few words of each other can help you find more relevant results. Note that this technique takes some experimentation: many pages about medical conditions contain long lists of symptoms and effects, and there's no guarantee that one symptom will appear within a few words of another.
2.20.2. The Code
The GAPS source code is rather lengthy, so we're not making it available here. You can, however, get it online at http://www.staggernation.com/gaps/readme.html.
2.20.3. See Also
If you like GAPS, you might want to try a couple of other scripts from Staggernation: