Hack 98. Program Google in Python

Programming the Google Web API with Python is simple and clean, as these scripts and interactive examples demonstrate.

Programming to the Google Web API from Python is a piece of cake, thanks to Mark Pilgrim's PyGoogle wrapper module (http://pygoogle.sourceforge.net/), now maintained by Brian Landers. PyGoogle abstracts away much of the underlying SOAP, XML, and request/response layers, leaving you free to spend your time with the data itself.

9.15.1. PyGoogle Installation

Download a copy of PyGoogle (http://sourceforge.net/project/showfiles.php?group_id=99616) and follow the installation instructions (http://pygoogle.sourceforge.net/dist/readme.txt). Assuming all goes to plan, this should be nothing more complex than:

    % python setup.py install

Alternatively, if you want to give this a whirl without installing PyGoogle or don't have permissions to install it globally on your system, simply put the included SOAP.py and google.py files into the same directory as the googly.py script itself.

9.15.2. The Code

Save this code to a text file called googly.py. Be sure to replace 'insert key here' with your own Google API key.

    #!/usr/bin/python
    # googly.py
    # A typical Google Web API Python script using Mark Pilgrim's
    # PyGoogle Google Web API wrapper
    # [http://diveintomark.org/projects/pygoogle/].
    # Usage: python googly.py <query>

    import sys, string, codecs

    # Use the PyGoogle module.
    import google

    # Grab the query from the command line
    if sys.argv[1:]:
        query = sys.argv[1]
    else:
        sys.exit('Usage: python googly.py <query>')

    # Your Google API developer's key.
    google.LICENSE_KEY = 'insert key here'

    # Query Google.
    data = google.doGoogleSearch(query)

    # Teach standard output to deal with utf-8 encoding in the results.
    sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)

    # Output.
    for result in data.results:
        print string.join( (result.title, result.URL, result.snippet), "\n"), "\n"

9.15.3. Running the Hack

Invoke the script on the command line ["How to Run the Hacks" in the Preface] as follows:

    % python googly.py "query words"

9.15.4. The Results

Here's a sample run, searching for "learning python":

    % python googly.py "learning python"
    oreilly.com -- Online Catalog: <b>Learning</b> <b>Python</b>
    http://www.oreilly.com/catalog/lpython/
    <b>Learning</b> <b>Python</b> is an introduction to the increasingly popular interpreted programming language that's portable, powerful, and remarkably easy to use in both <b>...</b>

    ...

    Book Review: <b>Learning</b> <b>Python</b>
    http://www2.linuxjournal.com/lj-issues/issue66/3541.html
    <b>...</b> Issue 66: Book Review: <b>Learning</b> <b>Python</b> <b>...</b> Enter <b>Learning</b> <b>Python</b>. My executive summary is that this is the right book for me and probably for many others as well. <b>...</b>

9.15.5. Hacking the Hack

Python has a marvelous interface for working interactively with the interpreter. It's a good place to experiment with modules such as PyGoogle, querying the Google API on the fly and digging through the data structures it returns.

Here's a sample interactive PyGoogle session demonstrating the use of the doGoogleSearch, doGetCachedPage, and doSpellingSuggestion functions:

    % python
    Python 2.2 (#1, 07/14/02, 23:25:09)
    [GCC Apple cpp-precomp 6.14] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import google
    >>> google.LICENSE_KEY = 'insert key here'
    >>> data = google.doGoogleSearch("Learning Python")
    >>> dir(data.meta)
    ['__doc__', '__init__', '__module__', 'directoryCategories', 'documentFiltering', 'endIndex', 'estimateIsExact', 'estimatedTotalResultsCount', 'searchComments', 'searchQuery', 'searchTime', 'searchTips', 'startIndex']
    >>> data.meta.estimatedTotalResultsCount
    115000
    >>> data.meta.directoryCategories
    [{u'specialEncoding': '', u'fullViewableName': "Top/Business/Industries/Publishing/Publishers/Nonfiction/Business/O'Reilly_and_Associates/Technical_Books/Python"}]
    >>> dir(data.results[5])
    ['URL', '__doc__', '__init__', '__module__', 'cachedSize', 'directoryCategory', 'directoryTitle', 'hostName', 'relatedInformationPresent', 'snippet', 'summary', 'title']
    >>> data.results[0].title
    'oreilly.com -- Online Catalog: <b>Learning</b> <b>Python'
    >>> data.results[0].URL
    'http://www.oreilly.com/catalog/lpython/'
    >>> google.doGetCachedPage(data.results[0].URL)
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">\n <BASE HREF="http://www.oreilly.com/catalog/lpython/"><table border=1 ...
    >>> google.doSpellingSuggestion('lurn piethon')
    'learn python'
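The session above leans on dir() to find out what a result object holds, then reads the interesting attributes one at a time. If you want the whole picture at once, a small helper can collect every attribute whose name does not start with an underscore. What follows is only a rough sketch under a few assumptions: public_attrs and FakeResult are hypothetical names invented for this illustration and are not part of PyGoogle; the stand-in object carries two values copied from the session above rather than data from a live query; and the code is written for a current Python interpreter rather than the Python 2.2 shown in the session.

```python
def public_attrs(obj):
    """Return {name: value} for attributes whose names do not start with '_'.

    Hypothetical helper for this illustration; not part of PyGoogle.
    """
    return {
        name: getattr(obj, name)
        for name in dir(obj)
        if not name.startswith("_")
    }


class FakeResult(object):
    """Stand-in for one search result (assumption: a real result object
    exposes its fields as plain attributes, as the dir() output suggests)."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


# Values copied from the interactive session; no query is sent anywhere.
result = FakeResult(
    title="oreilly.com -- Online Catalog: <b>Learning</b> <b>Python",
    URL="http://www.oreilly.com/catalog/lpython/",
)

# Print each public attribute with its repr, one per line.
for name, value in sorted(public_attrs(result).items()):
    print("%s: %r" % (name, value))
```

With a working PyGoogle installation, the same helper could in principle be pointed at data.meta or data.results[0] in place of the stand-in object.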