1.5. Special Syntax

In addition to the basic AND, OR, and phrase searches, Google offers some rather extensive special syntax for narrowing your searches.

As a full-text search engine, Google indexes entire web pages instead of just titles and descriptions. Additional commands, called special syntax or advanced operators, let Google users search specific parts of web pages for specific types of information. This comes in handy when you're dealing with more than eight billion web pages and need every opportunity to narrow your search results. Requiring that your query words appear in the title or URL of a returned web page is a great way to make your results very specific without making the keywords themselves too specific. Following are descriptions of the special syntax elements, ordered by common usage and function.

Some of these syntax elements work well in combination. Others fare not quite as well. Still others do not work at all. For a detailed discussion of what does and does not mix, see "Mixing Syntax," below.

intitle:

intitle: restricts your search to the titles of web pages. The variation allintitle: finds pages in which all the specified words appear in the title. Using allintitle: is basically the same as putting intitle: before each keyword.

intitle:"george bush"

allintitle:"money supply" economics

You may wish to avoid the allintitle: variation, because it doesn't mix well with some of the other syntax elements.
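If you'd rather sidestep the all*: variations, you can spell a query out with one prefix per keyword. Here is a minimal sketch in Python; the function name expand_all_prefix is hypothetical, and it assumes the "basically the same" equivalence described above holds for your query:

```python
import re

def expand_all_prefix(query: str) -> str:
    """Rewrite an all<op>: query as one <op>: term per keyword."""
    prefix, sep, rest = query.partition(":")
    if not sep or not prefix.startswith("all"):
        return query  # not an all*: query; leave it untouched
    single = prefix[len("all"):] + ":"  # "allintitle" -> "intitle:"
    # Split on whitespace, but keep "quoted phrases" together as one term
    terms = re.findall(r'"[^"]*"|\S+', rest)
    return " ".join(single + term for term in terms)

print(expand_all_prefix('allintitle:"money supply" economics'))
```

The printed query is intitle:"money supply" intitle:economics, which you can then combine with other syntax elements more freely.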

intext:

intext: searches only body text (i.e., ignores link text, URLs, and titles). While its uses are limited, it's perfect for finding query words that might be too common in URLs or link titles.

intext:"yahoo.com"

intext:html

There's an allintext: variation, but again, this doesn't play well with others.

inanchor:

inanchor: searches for text in a page's link anchors. A link anchor is the descriptive text of a link. For example, the link anchor in the HTML code <a href="http://www.oreilly.com">O'Reilly Media</a> is "O'Reilly Media."

inanchor:"tom peters"

As with other in*: syntax elements, there's an allinanchor: variation, which works in a similar way (i.e., all the keywords specified must appear in a page's link anchors).
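To see exactly which text counts as a link anchor, you can pull the anchors out of a snippet of HTML yourself. The sketch below uses Python's standard html.parser module; the AnchorTextCollector class is a hypothetical helper for illustration and says nothing about how Google itself processes pages:

```python
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []     # finished anchor strings
        self._buffer = None   # text of the <a> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.anchors.append("".join(self._buffer).strip())
            self._buffer = None

collector = AnchorTextCollector()
collector.feed('<a href="http://www.oreilly.com">O\'Reilly Media</a>')
collector.close()
print(collector.anchors)
```

For the example link, the only anchor collected is O'Reilly Media; the URL in the href attribute is not part of the anchor.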

site:

site: allows you to narrow your search by either a site or a top-level domain. The AltaVista search engine, by contrast, has two syntax elements for this function (host: and domain:), but Google has only the one.

site:loc.gov

site:thomas.loc.gov

site:edu

site:nc.us

Be aware that site: is no good for trying to search for a page that exists beneath the main or default site (i.e., in a subdirectory such as /~sam/album/). For example, if you're looking for something below the main GeoCities site, you can't use site: to find all the pages in http://www.geocities.com/Heartland/Meadows/6485/; Google returns no results. Use inurl: instead.

inurl:

inurl: restricts your search to the URLs of web pages. This syntax tends to work well for finding search and help pages, because they tend to be rather regular in composition. An allinurl: variation finds all the words listed in a URL but doesn't mix well with some other special syntax.

inurl:help

allinurl:search help

You'll see that using the inurl: query instead of the site: query has one immediate advantage: you can use it to search subdirectories.

While Google ignores the http:// prefix in a URL used with site:, search results come up short when you include it in an inurl: query. Be sure to remove the prefix from any inurl: query for the best (read: any) results.
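If you build inurl: queries from URLs copied out of a browser, it's easy to strip the prefix automatically. A minimal sketch, with a hypothetical helper named inurl_query; dropping https:// as well as http:// is an assumption, since the text above mentions only http://:

```python
import re

def inurl_query(url: str) -> str:
    """Build an inurl: query from a URL, dropping any scheme prefix."""
    # Remove a leading "http://" or "https://" (case-insensitive), if present
    bare = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    return "inurl:" + bare

print(inurl_query("http://www.geocities.com/Heartland/Meadows/6485/"))
```

This prints inurl:www.geocities.com/Heartland/Meadows/6485/, with the subdirectory path intact and the prefix gone.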

You can also use inurl: in combination with the site: syntax to draw out information on subdomains. For example, how many subdomains does oreilly.com really have? A quick query will help you figure that out:

site:oreilly.com -inurl:www.oreilly.com

This query asks Google to list all pages from the oreilly.com domain but leave out pages from the common www subdomain, since you already know about that one.
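You can repeat the trick, excluding each subdomain as you discover it. The sketch below builds such a query; the function name unknown_subdomains_query is hypothetical, and stacking several -inurl: terms in one query is an assumption about how well the syntax combines rather than something guaranteed here:

```python
from typing import Iterable

def unknown_subdomains_query(domain: str, known: Iterable[str]) -> str:
    """Search a domain while excluding subdomains you already know about."""
    parts = [f"site:{domain}"]
    for sub in known:
        # One exclusion per known subdomain, e.g. -inurl:www.oreilly.com
        parts.append(f"-inurl:{sub}.{domain}")
    return " ".join(parts)

print(unknown_subdomains_query("oreilly.com", ["www"]))
```

With only www in the known list, the output is the query shown above: site:oreilly.com -inurl:www.oreilly.com. Add each new subdomain to the list and run the search again until nothing new turns up.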

link:

link: returns a list of pages linking to the specified URL. Enter link:www.google.com and you'll get a list of pages that link to the Google home page, www.google.com (not anywhere in the google.com domain). Don't worry about including the http:// bit; you don't need it and, indeed, Google appears to ignore it even if you do put it in. link: works just as well with "deep" URLs—http://www.raelity.org/apps/blosxom/, for instance—as with top-level URLs such as raelity.org.

cache:

cache: finds a copy of the page that Google indexed even if that page is no longer available at its original URL or has since changed its content completely.

cache:www.yahoo.com

If Google returns a result that appears to have little to do with your query, you're almost sure to find what you're looking for in the latest cached version of the page at Google.

The Google cache is particularly useful for retrieving a previous version of a page that changes often.

daterange:

daterange: limits your search to a particular date or range of dates on which a page was indexed. It's important to note that a daterange: search has nothing to do with when a page was created, only with when it was indexed by Google. So a page created on February 2 but not indexed by Google until April 11 would turn up in a daterange: search for April 11.

"Geri Halliwell" "Spice Girls" daterange:2450958-2450968

For an in-depth treatment of finding content either by the date it was created or when it was first noticed by Google, see [Hack #16].
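The numbers in the example query are Julian day numbers, a running count of days, rather than calendar dates. Converting by hand is tedious, so here is a small sketch in Python. The helper names and the JDN_OFFSET constant are not part of any Google tool; the constant is simply the difference between Python's date ordinal and the Julian day number of the same calendar date:

```python
from datetime import date

# Difference between date.toordinal() and the Julian day number
# of the same (Gregorian) calendar date.
JDN_OFFSET = 1721425

def julian_day(d: date) -> int:
    """Calendar date -> Julian day number."""
    return d.toordinal() + JDN_OFFSET

def from_julian_day(jdn: int) -> date:
    """Julian day number -> calendar date."""
    return date.fromordinal(jdn - JDN_OFFSET)

def daterange(start: date, end: date) -> str:
    """Build a daterange: term covering start through end."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"

print(from_julian_day(2450958), from_julian_day(2450968))
print(daterange(date(1998, 5, 24), date(1998, 6, 3)))
```

Run against the example above, the first line prints 1998-05-24 1998-06-03, and the second rebuilds the original term, daterange:2450958-2450968.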

filetype:

filetype: searches by filename suffix, or extension. Different extensions usually, but not necessarily, mean different file types; filetype:htm and filetype:html will give you different result counts, even though they're the same file type. You can even search for different page generators (such as ASP, PHP, CGI, and so forth), presuming the site isn't hiding them behind redirection and proxying. Google indexes several different Microsoft formats, including PowerPoint (.ppt), Excel (.xls), and Word (.doc).

homeschooling filetype:pdf

"leading economic indicators" filetype:ppt

related:

related:, as you might expect, finds pages that are related to the specified page. This is a good way to find categories of pages; a search for related:google.com returns a variety of search engines, including Lycos, Yahoo!, and Northern Light.

related:www.yahoo.com

related:www.cnn.com

Though it happens less and less often, you'll find that some pages have no related pages at all.

info:

info: provides a page of links to more information about a specified URL. This information includes a link to the URL's cache, a list of pages that link to the URL, pages that are related to the URL, and pages that contain the URL.

info:www.oreilly.com

info:www.nytimes.com/technology

Note that this information is dependent on whether Google has indexed the specified URL; if not, information will obviously be far more limited.

phonebook:

phonebook:, as you might expect, looks up phone numbers.

phonebook:John Doe CA

phonebook:(510) 555-1212

The phonebook is covered in detail in [Hack #6].
