As a full-text search engine, Google indexes entire web pages instead
of just titles and descriptions. Additional commands, called
special syntax or advanced
operators, let Google users search specific parts of web
pages for specific types of information. This comes in handy when
you're dealing with more than eight billion web
pages and need every opportunity to narrow your search results.
Requiring that your query words appear in the title or URL
of a returned web page is a great way to narrow your results
without making the keywords themselves too specific.
Following are descriptions of the special syntax elements, ordered by
common usage and function.
- intitle:
intitle: restricts your
search to the titles of web pages. The variation
allintitle: finds pages
wherein all the words specified appear in the title of the web page.
Using allintitle: is essentially the same as putting
intitle: before each keyword.
intitle:"george bush"
allintitle:"money supply" economics
You may wish to avoid the allintitle: variation,
because it doesn't mix well with some of the other
syntax elements.
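If you'd rather sidestep allintitle: altogether, the per-keyword form is easy to generate. The following is a minimal Python sketch, not anything Google provides; the function name intitle_query is hypothetical, and quoting any term that contains a space is an assumption about how you'd want phrases handled.

```python
def intitle_query(*terms: str) -> str:
    """Prefix every term with intitle:, quoting multi-word phrases."""
    parts = []
    for term in terms:
        term = term.strip()
        if " " in term:
            # Keep a multi-word phrase together as a single quoted unit.
            term = f'"{term}"'
        parts.append(f"intitle:{term}")
    return " ".join(parts)


if __name__ == "__main__":
    # Per-keyword equivalent of: allintitle:"money supply" economics
    print(intitle_query("money supply", "economics"))
```

Because each keyword carries its own intitle: prefix, the resulting query can be combined with other syntax elements where allintitle: might not cooperate.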
- intext:
intext: searches only
body text (i.e., ignores link text, URLs, and titles). While its uses
are limited, it's perfect for finding query words
that might be too common in URLs or link titles.
intext:"yahoo.com"
intext:html
There's an
allintext:
variation, but again, it doesn't mix well with
other syntax elements.
- inanchor:
inanchor: searches for
text in a page's link anchors. A link anchor is the
descriptive text of a link. For example, the link anchor in the HTML
code <a
href="http://www.oreilly.com">O'Reilly
Media</a> is
"O'Reilly Media."
inanchor:"tom peters"
As with other in*: syntax elements,
there's an allinanchor:
variation, which works in a similar way (i.e., all the keywords
specified must appear in a page's link anchors).
- site:
site: allows
you to narrow your search by either a site or a top-level domain. The
AltaVista search engine, by contrast, has two syntax elements for
this function (host: and
domain:), but Google has only the one.
site:loc.gov
site:thomas.loc.gov
site:edu
site:nc.us
Be aware that site: won't help you find
pages that sit beneath the main or default site (i.e.,
in a subdirectory such as /~sam/album/). For
example, if you're looking for something below the
main GeoCities site, you can't use
site: to find all the pages in http://www.geocities.com/Heartland/Meadows/6485/;
Google returns no results. Use inurl: instead.
- inurl:
inurl:
restricts your search to the URLs of web pages. This syntax tends to
work well for finding search and help pages, because they tend to be
rather regular in composition. An
allinurl:
variation finds all the words listed in a URL but
doesn't mix well with some other special syntax.
inurl:help
allinurl:search help
You'll see that using the inurl:
query instead of the site: query has one immediate
advantage: you can use it to search subdirectories.
While Google ignores the http:// prefix in a URL
used with site:, search results come
up short when you include it in an inurl: query. Be
sure to remove prefixes from any inurl: query for
the best (read: any) results.
You can also use inurl: in combination with the
site: syntax to draw out information on
subdomains. For example, how many subdomains does
oreilly.com really have? A quick query will help
you figure that out:
site:oreilly.com -inurl:www.oreilly.com
This query asks Google to list all pages from the
oreilly.com domain but leave out those pages
that are from the common subdomain www, since
you already know about that one.
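Both inurl: tips, dropping the http:// prefix and excluding a known subdomain, are simple string manipulation. Here is a rough Python sketch under stated assumptions: the helper names strip_scheme, inurl_query, and other_subdomains_query are hypothetical, and the regular expression assumes the prefix is an ordinary URL scheme followed by ://.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def strip_scheme(url: str) -> str:
    """Remove a leading scheme prefix, which inurl: handles poorly."""
    return _SCHEME.sub("", url.strip())


def inurl_query(url: str) -> str:
    """Build an inurl: query from a URL, with or without a prefix."""
    return f"inurl:{strip_scheme(url)}"


def other_subdomains_query(domain: str, known: str = "www") -> str:
    """List pages in a domain while excluding one known subdomain."""
    return f"site:{domain} -inurl:{known}.{domain}"


if __name__ == "__main__":
    print(inurl_query("http://www.geocities.com/Heartland/Meadows/6485/"))
    print(other_subdomains_query("oreilly.com"))
```

The second call reproduces the oreilly.com query shown above; pass a different value for known to exclude some other subdomain you've already explored.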
- link:
link: returns a
list of pages linking to the specified URL. Enter
link:www.google.com and you'll
get a list of pages that link to the Google home page, www.google.com (not anywhere in the
google.com domain). Don't worry about including the
http:// bit; you don't need it
and, indeed, Google appears to ignore it even if you do put it in.
link: works just as well with
"deep" URLs—http://www.raelity.org/apps/blosxom/, for
instance—as with top-level URLs such as
raelity.org.
- cache:
cache: finds a
copy of the page that Google indexed even if that page is no longer
available at its original URL or has since changed its content
completely.
cache:www.yahoo.com
If Google returns a result that appears to have little to do with
your query, you're almost sure to find what
you're looking for in the latest cached version of
the page at Google.
The Google cache is particularly useful for retrieving a previous
version of a page that changes often.
- daterange:
daterange:
limits your search to a particular date or range of dates on which a
page was indexed. It's important to note that a
daterange: search reflects when Google indexed a
page, not when the page was created. So a page
created on February 2 but not indexed by Google until April 11 would
turn up in a daterange: search for April 11.
"Geri Halliwell" "Spice Girls" daterange:2450958-2450968
For an in-depth treatment of finding content either by the date it
was created or when it was first noticed by Google, see [Hack #16].
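The numbers in the example query are Julian day numbers rather than calendar dates, so a small conversion helper saves some arithmetic. The Python sketch below is one way to do it, not an official tool; the names julian_day and daterange_query are hypothetical, and the calendar dates in the usage example were worked backward from the numbers in the query above.

```python
from datetime import date

# Python's date.toordinal() counts days from 0001-01-01 (ordinal 1).
# Adding this offset yields the Julian day number for that calendar date.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Convert a calendar date to its integer Julian day number."""
    return d.toordinal() + JDN_OFFSET


def daterange_query(terms: str, start: date, end: date) -> str:
    """Append a daterange: restriction covering start through end."""
    if end < start:
        raise ValueError("end date must not precede start date")
    return f"{terms} daterange:{julian_day(start)}-{julian_day(end)}"


if __name__ == "__main__":
    print(daterange_query('"Geri Halliwell" "Spice Girls"',
                          date(1998, 5, 24), date(1998, 6, 3)))
```

With those two dates, the helper reproduces the 2450958-2450968 range used in the example query.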
- filetype:
filetype:
searches filename suffixes, or extensions. Different suffixes usually, but
not necessarily, indicate different file types;
filetype:htm and filetype:html
will give you different result counts, even though
they're the same file type. You can even search for
different page generators—such as ASP, PHP, CGI, and so
forth—presuming the site isn't hiding them
behind redirection and proxying. Google indexes several different
Microsoft formats, including PowerPoint (.ppt),
Excel (.xls), and Word
(.doc).
homeschooling filetype:pdf
"leading economic indicators" filetype:ppt
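Since filetype:htm and filetype:html return different results, covering a file type thoroughly means running one query per suffix. A minimal Python sketch of that idea follows; the function name filetype_queries is hypothetical, and normalizing suffixes by dropping a leading dot and lowercasing them is an assumption for convenience.

```python
from typing import Iterable, List


def filetype_queries(terms: str, extensions: Iterable[str]) -> List[str]:
    """Return one filetype: query per extension for the same keywords."""
    queries = []
    for ext in extensions:
        # Accept ".PDF", "pdf", and similar spellings of a suffix.
        suffix = ext.strip().lstrip(".").lower()
        queries.append(f"{terms} filetype:{suffix}")
    return queries


if __name__ == "__main__":
    for query in filetype_queries("homeschooling", ["htm", "html"]):
        print(query)
```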
- related:
related:, as
you might expect, finds pages that are related to the specified page.
This is a good way to find categories of pages; a search for
related:google.com returns a variety of search
engines, including Lycos, Yahoo!, and Northern Light.
related:www.yahoo.com
related:www.cnn.com
Although it is increasingly rare, you'll find
that some pages have no related pages at all.
- info:
info: provides
a page of links to more information about a specified URL. This
information includes a link to the URL's cache, a
list of pages that link to the URL, pages that are related to the
URL, and pages that contain the URL.
info:www.oreilly.com
info:www.nytimes.com/technology
Note that this information depends on whether Google has indexed
the specified URL; if it hasn't, the information will be far more
limited.
- phonebook:
phonebook:, as
you might expect, looks up phone numbers.
phonebook:John Doe CA
phonebook:(510) 555-1212
The phonebook is covered in detail in [Hack #6].