Hack 89. Be a Good Search Engine Citizen
Five don'ts and one do for
getting your site indexed by Google.
A high ranking in Google can mean a great deal of traffic. Because of
that, there are lots of people spending lots of time trying to figure
out the infallible way to get a high ranking from Google. Add this.
Remove that. Get a link from this. Don't post a link
to that.
Submitting your site to Google to be indexed is simple enough.
Google's got a site submission form (http://www.google.com/addurl.html), though
they say that, if your site has at least a few inbound links (other
sites that link to you), they should find you that way. In fact,
Google encourages URL submitters to get listed on The Open Directory
Project (ODP, http://www.dmoz.org) or Yahoo! (http://www.yahoo.com).
Nobody knows the secret of achieving high PageRank without effort.
Google uses a variety of elements, including page popularity, to
determine PageRank. PageRank is one of the factors determining how
high up a page appears in search results. But there are several
things that you should not be doing and one big thing that you
absolutely should.
Does breaking one of these rules mean that you're
automatically going to be thrown out of Google's
index? No. There are over four billion pages in
Google's index at this writing, and
it's unlikely that they'll find out
about your violation immediately. But there's a good
chance that they'll find out eventually. Is it worth
having your site removed from the most popular search engine on
the Internet?
8.10.1. Thou Shalt Not:
Cloak
Cloaking is when your web site is set up such that search engine
spiders get different pages from those that human surfers get. How
does the web site know which are the spiders and which are the
humans? By identifying the spider's User Agent or
IP, the latter being the more reliable method.
An Internet Protocol (IP) address is the computer address from which
a spider comes. Everything that connects to the Internet has an IP
address. Sometimes the IP address is always the same, as with web
sites. Sometimes the IP address changes; that's
called a dynamic address. (If you use a dial-up
modem, chances are good that every time you log onto the Internet
your IP address is different. That's a dynamic IP
address.)
A User Agent is a way a program that surfs the
Web identifies itself. Internet browsers like Mozilla use User
Agents, as do search engine spiders. There are literally dozens of
different kinds of User Agents; see the Web Robots Database
(http://www.robotstxt.org/wc/active.html) for
an extensive list.
Advocates of cloaking claim that it is useful for fully
optimizing content for spiders. Critics claim that
cloaking is an easy way to misrepresent site content: feeding a
spider a page that's designed to get the site hits
for pudding cups when actually it's all about
baseball bats.
You can get more details about cloaking and different perspectives on it
at http://pandecta.com/,
http://www.apromotionguide.com/cloaking.html,
and http://www.webopedia.com/TERM/C/cloaking.html.
Hide text
Text is hidden by
putting words or links in a web page that are the same color as the
page's background: putting white words on a
white background, for example. This is also called
fontmatching. Why would you do this? Because a
search engine spider could read the words you've
hidden on the page while a human visitor couldn't.
Again, doing this and getting caught could get you banned from
Google's index, so don't.
That goes for other page content tricks too, such as title
stacking (putting multiple copies of a title tag on one
page), putting keywords in comment tags, keyword
stuffing (putting multiple copies of keywords in a very
small font on a page), putting keywords not relevant to your site in
your META tags, and so on. Google
doesn't provide an exhaustive list of these types of
tricks on their site, but any attempt to circumvent or fool their
ranking system is likely to be frowned upon. Their attitude is more
like: "You can do anything you want to with your
pages, and we can do anything we want to with our index, such as
excluding your pages."
Use doorway pages
Sometimes, doorway pages are called
gateway pages. These are pages that are aimed
specifically at one topic, which don't have a lot of
their own original content and which lead to the main page of a site
(thus the name doorway pages).
For example, say you have a page devoted to cooking. You create
doorway pages for several genres of cooking: French cooking,
Chinese cooking, vegetarian cooking, etc. The pages contain terms and
META tags relevant to each genre, but most of the
text is a copy of all the other doorway pages, and all it does is
point to your main site.
Doorway pages are against Google's rules and annoying to the Google user,
so don't use them. You can learn more about
doorway pages at http://searchenginewatch.com/webmasters/bridge.html or http://www.searchengineguide.com/whalen/2002/0530_jw1.html.
Check your link rank with automated queries
Using automated queries (except for the sanctioned Google API) is
against Google's Terms of Service anyway. Using an
automated query to check your PageRank every 12 seconds is
triple-bad; it's not what the search engine was
built for and Google probably considers it a waste of their time and
resources.
Link to "bad neighborhoods"
Bad neighborhoods are those sites that exist
only to propagate links. Because link popularity is one aspect of how
Google determines PageRank, some sites have set up link
farms—sites that exist only for the purpose of
building site popularity with bunches of links. The links are not
topical, like a specialty subject index, and they're
not well-reviewed, like Yahoo!; they're just a pile
of links. Another example of a bad
neighborhood is a general FFA page. FFA stands for
free for all; it's a page where
anyone can add their link. Linking to pages like that is grounds for
a penalty from Google.
Now, what happens if a page like that links to
you? Will Google penalize your page? No. Google
accepts that you have no control over who links to your site.
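
If you want to check your own pages for two of the tricks described above, title stacking and fontmatching, a short script can do a first pass. What follows is only a rough sketch in Python, not a description of how Google detects anything. The names `TrickAudit` and `audit` are made up for this example, and it assumes old-style markup (a `bgcolor` attribute on `body` and `font` tags with a `color` attribute); text hidden through style sheets would slip past it.

```python
from html.parser import HTMLParser


class TrickAudit(HTMLParser):
    """Collect simple signs of title stacking and fontmatching in one page."""

    def __init__(self):
        super().__init__()
        self.title_count = 0      # number of <title> start tags seen
        self.background = None    # bgcolor attribute of <body>, if any
        self.matching_fonts = 0   # <font color=...> tags equal to the background

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.title_count += 1
        elif tag == "body":
            color = attrs.get("bgcolor")
            self.background = color.lower() if color else None
        elif tag == "font":
            color = attrs.get("color")
            if color and self.background and color.lower() == self.background:
                self.matching_fonts += 1


def audit(html):
    """Return counts of title tags and background-colored font tags."""
    parser = TrickAudit()
    parser.feed(html)
    parser.close()
    return {"titles": parser.title_count, "matching_fonts": parser.matching_fonts}


# Hypothetical page used only to exercise the sketch.
sample = """<html><head><title>Bats</title><title>Baseball bats</title></head>
<body bgcolor="#FFFFFF"><font color="#ffffff">pudding cups</font>
<font color="#000000">Welcome</font></body></html>"""

print(audit(sample))
```

A page with more than one title, or with any font tag colored to match the background, deserves a closer look by hand.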
8.10.2. Thou Shalt:
Create great content
All the HTML contortions in the world will do you little good if you
have lousy, old, or limited content. If you create great content and
promote it without playing search engine games, you will get noticed
and you will get links. Remember Sturgeon's Law:
"Ninety percent of everything is
crud." Why not make your web site an exception?
8.10.3. What Happens If You Reform?
Maybe you have a site that is not exactly the work of a good search
engine citizen. Maybe you have 500 doorway pages, 10
title tags per page, and enough hidden text to
make an O'Reilly Pocket Guide. But maybe now you
want to reform. You want to have a clean, lovely site and leave the
doorway pages to Better Homes and Gardens. Are
you doomed? Will Google ban your site for the rest of its life?
No. The first thing you need to do is clean up your site—remove
all traces of rule breaking. Next, send a note about your site
changes and the URL to help@google.com. Note that Google really
doesn't have the resources to answer every email
about why they did or didn't index a
site—otherwise, they'd be answering emails all
day—and there's no guarantee that they will
reindex your kinder, gentler site. But they will look at your
message.
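
When you are cleaning up hundreds of pages, it can help to find the doorway-style pages mechanically, since they are mostly copies of one another. The following Python sketch compares the text of each pair of pages and flags pairs that share most of their words. The function name `near_duplicates`, the sample page names, and the 0.8 threshold are all assumptions chosen for illustration; tune the threshold against your own site before trusting it.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(pages, threshold=0.8):
    """Return (name_a, name_b, ratio) for page pairs whose text is mostly shared.

    pages maps a page name to its visible text. The threshold is an
    arbitrary cutoff picked for this example.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(pages.items()), 2):
        # Compare word by word rather than character by character.
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio >= threshold:
            flagged.append((name_a, name_b, round(ratio, 2)))
    return flagged


# Hypothetical pages: two doorway pages sharing boilerplate, one original page.
boiler = ("Visit our main cooking site for recipes, tips, and reviews "
          "from real home cooks every week")
pages = {
    "french.html": "French cooking " + boiler,
    "chinese.html": "Chinese cooking " + boiler,
    "about.html": ("We are a small team of editors who test every recipe "
                   "before it is published"),
}

for name_a, name_b, ratio in near_duplicates(pages):
    print(name_a, name_b, ratio)
```

Pairs that come back flagged are candidates for merging into one page with real content or for removal.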
8.10.4. What Happens If You Spot Google Abusers in the Index?
What if some other site that you come across in your Google searching
is abusing Google's spider and
PageRank mechanism? You have two
options. You can send an email to spamreport@google.com or fill out the form at
http://www.google.com/contact/spamreport.html.
(I'd fill out the form; it reports the abuse in a
standard format that Google is used to seeing.)