
How Google Blog Search Works

Before Google Blog Search, finding information in the blogosphere was a bit of a crapshoot. There is no single organized directory of blog sites, nor of the frequently updated content of all the blogs that exist today. The blogosphere is chaotic and constantly changing; Google's traditional method of crawling the web, which normally takes a few weeks to pick up new information, was simply too slow to index blog content.

The solution to this problem came in the form of site feeds. A site feed is an automatically updated stream of a blog's contents, enabled by a special XML file format called RSS (Really Simple Syndication). When a blog has an RSS feed enabled, any updated content is automatically published in an XML file that contains the feed. The syndicated feed is then normally picked up by RSS feed reader programs and by RSS aggregators for websites.
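
To make the idea of a site feed more concrete, here is a minimal sketch in Python of what a feed reader does with an RSS file. The feed text, blog name, and URLs are made up for illustration, and the read_items function is a hypothetical helper, not part of any Google tool; a real reader would also download the feed and cope with malformed XML.

```python
import xml.etree.ElementTree as ET

# A hypothetical, hand-written RSS 2.0 feed. The blog name and URLs are
# invented; only the element names (channel, item, title, link, pubDate,
# description) come from the RSS 2.0 format itself.
SAMPLE_RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.com/</link>
    <description>Posts from a made-up blog</description>
    <item>
      <title>Second post</title>
      <link>http://blog.example.com/second-post</link>
      <pubDate>Tue, 04 Oct 2005 09:30:00 GMT</pubDate>
      <description>Notes on feed readers.</description>
    </item>
    <item>
      <title>First post</title>
      <link>http://blog.example.com/first-post</link>
      <pubDate>Mon, 12 Sep 2005 17:00:00 GMT</pubDate>
      <description>Hello, blogosphere.</description>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return one dictionary per <item> element found in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for post in read_items(SAMPLE_RSS):
        print(post["pubDate"], "|", post["title"], "|", post["link"])
```

Because every post arrives with the same labeled fields, a program does not have to guess where a title ends or when a post was written.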

Google hit upon the idea of using these RSS feeds to seed its blog search index. By aggregating RSS feeds into its index, Google Blog Search is constantly (and almost immediately) updated with new blog content. The structured format of the RSS files also makes it relatively easy to accurately search for specific information and date ranges within the blog index.
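
Google has not published how its blog index is built, so the following Python sketch only illustrates the general principle: once posts arrive as structured feed items, a small keyword index with a date filter is easy to put together. The posts, the URLs, and the build_index and search functions are all assumptions made for this example.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical feed items, as they might look after several blogs' feeds
# have been read. The dates use the RFC 822 style that RSS 2.0 calls for.
POSTS = [
    {"title": "Choosing a feed reader",
     "link": "http://blog.example.com/feed-reader",
     "pubDate": "Tue, 04 Oct 2005 09:30:00 GMT",
     "description": "Notes on RSS feed readers and aggregators."},
    {"title": "Why site feeds matter",
     "link": "http://other.example.org/site-feeds",
     "pubDate": "Mon, 12 Sep 2005 17:00:00 GMT",
     "description": "Feeds let search engines see new posts quickly."},
    {"title": "Weekend hiking",
     "link": "http://trail.example.net/hiking",
     "pubDate": "Sat, 15 Oct 2005 08:00:00 GMT",
     "description": "Trail photos, nothing about feeds."},
]


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map each word to the positions of the posts that contain it."""
    index = defaultdict(set)
    entries = []
    for position, post in enumerate(posts):
        entries.append({
            "title": post["title"],
            "link": post["link"],
            "published": parsedate_to_datetime(post["pubDate"]),
        })
        for word in tokenize(post["title"] + " " + post["description"]):
            index[word].add(position)
    return index, entries


def search(index, entries, word, start, end):
    """Return titles of posts containing word and published from start to end, newest first."""
    matches = [entries[i] for i in index.get(word.lower(), set())
               if start <= entries[i]["published"] <= end]
    matches.sort(key=lambda entry: entry["published"], reverse=True)
    return [entry["title"] for entry in matches]


if __name__ == "__main__":
    index, entries = build_index(POSTS)
    september_start = datetime(2005, 9, 1, tzinfo=timezone.utc)
    september_end = datetime(2005, 9, 30, 23, 59, 59, tzinfo=timezone.utc)
    print(search(index, entries, "feeds", september_start, september_end))
```

The date filter works only because each feed item carries its own publication date; a page found by an ordinary web crawl offers no such guarantee.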

Some users think that Google Blog Search only searches blogs hosted by Google's Blogger service, but that isn't true. Google Blog Search searches every blog on the Internet that publishes a site feed, in either RSS or Atom format. Google's blog index only holds posts created since the launch of Google Blog Search, however; for most blogs, that means posts made before June 2005 aren't available for searching.

Note

Atom is a feed format similar to RSS, with a few extra features.


