Hack 91. Remove Your Materials from Google

Remove your content from Google's various web properties.

Some people are more than thrilled to have Google index their sites. Other folks don't want the GoogleBot anywhere near them. If you fall into the latter category and the bot's already done its worst, there are several things you can do to remove your materials from Google's index. Each part of Google (Web Search, Google Images, and Google Groups) has its own set of methodologies.

8.12.1. Google Web Search

Here are several tips to avoid being listed.

8.12.1.1 Making sure your pages never get there to begin with

While you can take steps to remove your content from the Google index after the fact, it's always much easier to make sure the content is never found and indexed in the first place.

Google's crawler obeys the robots exclusion protocol, a set of instructions you put on your web site that tells the crawler how to behave when it comes to your content. You can implement these instructions in two ways: via a META tag that you put on each page (handy when you want to restrict access to only certain pages or certain types of content) or via a robots.txt file that you insert in your root directory (handy when you want to block some spiders completely or want to restrict access to kinds or directories of content).

You can get more information about the robots exclusion protocol and how to implement it at http://www.robotstxt.org/.

8.12.1.2 Removing your pages after they're indexed

There are several things you can have removed from Google's results.
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

8.12.1.3 Removing that content now

Once you implement these changes, the next time GoogleBot crawls your web site (usually within a few weeks), it will remove or limit your content according to your META tags and robots.txt file.

If you want your materials removed right away, you can use the automatic remover at http://services.google.com:8882/urlconsole/controller. You'll have to sign in with an account (requires an email address and a password). Using the remover, you can request that Google crawl your newly created robots.txt file, or you can enter the URL of a page that contains exclusionary META tags.
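Before you submit anything to the remover, it can help to confirm that your robots.txt rules and META tags say what you intend. The following is a minimal sketch using only Python's standard library. The robots.txt contents, the sample page, the example.com URLs, and the helper names (googlebot_allowed, googlebot_directives) are all illustrative assumptions, not anything Google provides, and Python's parser is not guaranteed to interpret every rule exactly as GoogleBot does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps Googlebot out of one directory.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
"""

# Hypothetical page carrying one of the exclusionary META tags shown earlier.
PAGE_HTML = """\
<html><head>
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
</head><body>Not for indexing.</body></html>
"""


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


class GooglebotMetaParser(HTMLParser):
    """Collect directives from META tags whose NAME is GOOGLEBOT."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "googlebot":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().upper())


def googlebot_directives(html):
    """Return the set of GOOGLEBOT META directives found in an HTML string."""
    parser = GooglebotMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(googlebot_allowed(ROBOTS_TXT, "http://www.example.com/private/page.html"))
    print(googlebot_allowed(ROBOTS_TXT, "http://www.example.com/index.html"))
    print(sorted(googlebot_directives(PAGE_HTML)))
```

To check a live site instead of the inline samples, you could fetch your own robots.txt and page source and pass the text to the same two helpers.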
8.12.1.4 Reporting pages with inappropriate content

While you may like your own content fine, you might find that, even if you have filtering activated, you're getting search results with explicit content. Or you might find a site with a misleading title tag and content completely unrelated to your search.

You have two options for reporting these sites to Google. Bear in mind that there's no guarantee that Google will remove the sites from the index, but they will investigate them. At the bottom of each page of search results, you'll see a "Dissatisfied? Help Us Improve" link; follow it to a form for reporting inappropriate sites. You can also email safesearch@google.com the URLs of explicit sites that show up in a SafeSearch but probably shouldn't. If you have more general complaints about a search result, you can send an email to search-quality@google.com.

8.12.2. Google Images

Google's image database is separate from the main search index. To remove items from Google Images, use robots.txt to specify that the GoogleBot Image crawler should stay away from your site. Add these lines to your robots.txt file:

User-agent: Googlebot-Image
Disallow: /

You can use the automatic remover mentioned in the web search section to have Google remove the images from its index database quickly.

There may be cases where someone has put images for which you own the copyright on their server. In other words, you don't have access to their server to add a robots.txt file, but you need to stop Google from indexing your content there. In this case, you need to contact Google directly. Google has instructions for situations just like this at http://www.google.com/remove.html; look at Option 2, "If you do not have any access to the server that hosts your image."

8.12.3. Google Groups

As with the Google web index, you can both prevent material from being archived on Google and remove it after the fact.
8.12.3.1 Preventing your material from being archived

To prevent your material from being archived on Google, add the following line to the headers of your Usenet posts:

X-No-Archive: yes

If you do not have the option to edit the headers of your post, make that line the first line of the post itself.

8.12.3.2 Removing materials after the fact

If you want materials removed after the fact, you have a couple of options:
8.12.4. Google Phonebook

You might not want to have your contact information made available via the phonebook searches on Google. You'll have to follow one of two procedures, depending on whether the listing you want removed is for a business or for a residential number.

If you want to remove a business phone number, you'll need to send a request on your business letterhead to:
Be sure to include a phone number so that Google can reach you to verify your request.

Removing a residential phone number is much simpler. Fill out the form at http://www.google.com/help/pbremoval.html. The form asks for your name, city and state, phone number, email address, and a reason for removal (multiple choice: incorrect number, privacy issue, or "other").