11.3. Berkeley DB Interfacing

Python comes with the bsddb package, which wraps the Berkeley Database (also known as BSD DB) library if that library is installed on your system and your Python installation is built to support it. With the BSD DB library, you can create hash, binary-tree, or record-based files that generally behave like persistent dictionaries. On Windows, Python includes a port of the BSD DB library, thus ensuring that module bsddb is always usable. To download BSD DB sources, binaries for other platforms, and detailed documentation on BSD DB itself, see http://www.sleepycat.com.

11.3.1. Simplified and Complete BSD DB Python Interfaces

Module bsddb itself provides a simplified, backward-compatible interface to a subset of BSD DB's functionality, as covered by the Python online documentation at http://www.python.org/doc/2.4/lib/module-bsddb.html. However, the standard Python library also comes with many modules in package bsddb, starting with bsddb.db. This set of modules closely mimics BSD DB's current rich, complex functionality and interfaces, and is documented at http://pybsddb.sourceforge.net/bsddb3.html. At this URL, you'll see the package documented under the slightly different name bsddb3, which is the name of a package you can separately download and install even on very old versions of Python. However, to use the version of this package that comes as part of the Python standard library, what you need to import are modules named bsddb.db and the like, not bsddb3.db and the like. Apart from this naming detail, the SourceForge documentation fully applies to the modules in package bsddb in the Python standard library (db, dbshelve, dbtables, dbutil, dbobj, dbrecio).

Entire books can be (and have been) written about the full interface to BSD DB and its functionality, so I do not cover this rich, complete, and complex interface in this book.
(If you need to exploit BSD DB's complete functionality, I suggest, in addition to studying the URLs mentioned above, the book Berkeley DB, by Sleepycat Software [New Riders].) However, in Python you can also access a small but important subset of BSD DB's functionality in a much simpler way, through the simplified interface provided by module bsddb and covered in the following sections.

11.3.2. Module bsddb

Module bsddb supplies three factory functions: btopen, hashopen, and rnopen.
11.3.3. Examples of Berkeley DB Use

The Berkeley DB is suited to tasks similar to those for which DBM-like files are appropriate. Indeed, anydbm uses dbhash, the DBM-like interface to BSD DB, to create new DBM-like files. In addition, BSD DB allows other file formats when you use module bsddb directly. The binary tree format is not as fast as the hashed format for keyed access, but it is excellent when you also need to access keys in alphabetical order. The following example handles the same task as the DBM example shown earlier, but uses bsddb rather than anydbm:

    import fileinput, os, bsddb

    # map each word to the list of 'filename<sep>lineno' positions where it occurs
    wordPos = {}
    sep = os.pathsep
    for line in fileinput.input():
        pos = '%s%s%s' % (fileinput.filename(), sep, fileinput.filelineno())
        for word in line.split():
            wordPos.setdefault(word, []).append(pos)

    # write a new binary tree file, joining each word's positions with a doubled separator
    btOut = bsddb.btopen('btindex', 'n')
    sep2 = sep * 2
    for word in wordPos:
        btOut[word] = sep2.join(wordPos[word])
    btOut.close()

The differences between this example and the DBM one are minimal: writing a new binary tree format file with bsddb is basically the same task as writing a new DBM-like file with anydbm. Reading back the data using bsddb.btopen('btindex') rather than anydbm.open('indexfile') is also similar. To illustrate the extra features of binary trees regarding access to keys in alphabetical order, let's tackle a slightly more general task.
The following example treats its command-line arguments as specifying the beginning of words, and prints the lines in which any word with such a beginning appears:

    import sys, os, bsddb, linecache

    btIn = bsddb.btopen('btindex')
    sep = os.pathsep
    sep2 = sep * 2
    for word in sys.argv[1:]:
        # position on word itself, or on the smallest key larger than word
        key, pos = btIn.set_location(word)
        if not key.startswith(word):
            sys.stderr.write('Word-start %r not found in index file\n' % word)
        while key.startswith(word):
            places = pos.split(sep2)
            for place in places:
                fname, lineno = place.split(sep)
                print "%r occurs in line %s of file %s:" % (word, lineno, fname)
                print linecache.getline(fname, int(lineno)),
            # advance to the next key in alphabetical order; stop at the end of
            # the file (KeyError is caught as well, in case the underlying
            # library reports a missing next record that way)
            try: key, pos = btIn.next()
            except (IndexError, KeyError): break

This example exploits the fact that btIn.set_location sets btIn's current position to the smallest key larger than word, when word itself is not a key in btIn. When word is the start of a word, and the keys are words, this means that set_location sets the current position to the first word, in alphabetical order, that begins with word. The tests with key.startswith(word) check that we're still scanning words with that beginning, and terminate the while loop when that is no longer the case. We perform the first such test in an if statement, right before the while, because we want to single out the case where no word at all starts with the desired beginning, and output an error message in that specific case.
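The positioning logic described above can be tried out without a Berkeley DB file. The following is only a rough sketch of the idea, not of how bsddb works internally: it uses the standard bisect module on a sorted list of keys to mimic set_location on a binary tree file (find the first key not smaller than the word-start, then walk forward while keys still share that beginning). The function name keys_with_prefix and the sample keys are hypothetical, chosen purely for illustration.

```python
import bisect

def keys_with_prefix(sorted_keys, prefix):
    """Return all keys in sorted_keys (an ascending list) that start with prefix."""
    # Analogous to set_location on a binary tree file:
    # index of the first key that is >= prefix.
    i = bisect.bisect_left(sorted_keys, prefix)
    found = []
    # In sorted order, keys sharing a prefix are contiguous, so the scan
    # can stop at the first key that no longer starts with it.
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        found.append(sorted_keys[i])
        i += 1
    return found

# Hypothetical sample keys standing in for the words of an index file.
keys = sorted(['bar', 'baz', 'foo', 'food', 'fool', 'zip'])
print(keys_with_prefix(keys, 'foo'))
print(keys_with_prefix(keys, 'q'))
```

An empty result here corresponds to the case that the bsddb example singles out with its if statement: no key at all starts with the requested beginning.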