16.5 Shelve Files

Pickling allows you to store arbitrary objects on files and file-like objects, but it's still a fairly unstructured medium; it doesn't directly support easy access to members of collections of pickled objects. Higher-level structures can be added, but they are not inherent to the pickling model itself.
Shelves provide some structure to collections of pickled objects. They are a type of file that stores arbitrary Python objects by key for later retrieval, and they are a standard part of the Python system. Really, they are not much of a new topic -- shelves are simply a combination of DBM files and object pickling.
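To make that combination concrete, here is a rough sketch of what a shelve does for you behind the scenes, done by hand with the pickle and anydbm modules; the file name manual-dbase and the record stored are made up for illustration.

    # roughly what shelve automates: pickle turns an object into a string,
    # and a DBM file stores that string under a key for later retrieval
    # (the file name 'manual-dbase' and the record here are illustrative only)
    import pickle, anydbm

    record = {'name': 'Brian', 'age': 33}        # any pickleable object
    dbfile = anydbm.open('manual-dbase', 'c')    # create or open a DBM file
    dbfile['brian'] = pickle.dumps(record)       # store object as a pickled string
    copy = pickle.loads(dbfile['brian'])         # fetch string and rebuild object
    print copy['age']
    dbfile.close()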
Because shelve uses pickle internally, it can store any object that pickle can: strings, numbers, lists, dictionaries, cyclic objects, class instances, and more.

16.5.1 Using Shelves

In other words, shelve is just a go-between; it serializes and deserializes objects so that they can be placed in DBM files. The net effect is that shelves let you store nearly arbitrary Python objects on a file by key and fetch them back later with the same key.

Your scripts never see all this interfacing, though. Like DBM files, shelves provide an interface that looks like a dictionary that must be opened. To gain access to a shelve, import the module and open your file:

    import shelve
    dbase = shelve.open("mydbase")

Internally, Python opens a DBM file with the name mydbase, or creates it if it does not yet exist. Assigning to a shelve key stores an object:

    dbase['key'] = object

Internally, this assignment converts the object to a serialized byte stream and stores it by key on a DBM file. Indexing a shelve fetches a stored object:

    value = dbase['key']

Internally, this index operation loads a string by key from a DBM file and unpickles it into an in-memory object that is the same as the object originally stored. Most dictionary operations are supported here, too:

    len(dbase)          # number of items stored
    dbase.keys()        # stored item key index

And except for a few fine points, that's really all there is to using a shelve. Shelves are processed with normal Python dictionary syntax, so there is no new database API to learn. Moreover, objects stored and fetched from shelves are normal Python objects; they do not need to be instances of special classes or types to be stored away. That is, Python's persistence system is external to the persistent objects themselves. Table 16-2 summarizes these and other commonly used shelve operations.
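For quick reference, the short sketch below runs through the dictionary-style operations most commonly applied to shelves (in the spirit of the operations Table 16-2 summarizes); the file name mydbase echoes the example above, and the records stored are placeholders.

    import shelve

    dbase = shelve.open("mydbase")                        # create or open shelve
    dbase['brian']  = {'name': 'Brian', 'age': 33}        # store: pickle + DBM write
    dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}

    print len(dbase)                                      # number of items stored
    print dbase.keys()                                    # list of stored keys
    print dbase.has_key('brian')                          # key membership test
    print dbase['knight']['motto']                        # fetch: DBM read + unpickle

    del dbase['knight']                                   # delete an entry by key
    dbase.close()                                         # close explicitly to be safe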
Because shelves export a dictionary-like interface, too, this table is almost identical to the DBM operation table. Here, though, the module name anydbm is replaced by shelve, open calls do not require a second 'c' argument, and stored values can be nearly arbitrary kinds of objects, not just strings. You still should close shelves explicitly after making changes to be safe, though; shelves use anydbm internally, and some underlying DBMs require closes to avoid data loss or damage.

16.5.2 Storing Built-in Object Types

Let's run an interactive session to experiment with shelve interfaces:

    % python
    >>> import shelve
    >>> dbase = shelve.open("mydbase")
    >>> object1 = ['The', 'bright', ('side', 'of'), ['life']]
    >>> object2 = {'name': 'Brian', 'age': 33, 'motto': object1}
    >>> dbase['brian']  = object2
    >>> dbase['knight'] = {'name': 'Knight', 'motto': 'Ni!'}
    >>> dbase.close()

Here, we open a shelve and store two fairly complex dictionary and list data structures away permanently by simply assigning them to shelve keys. Because shelve uses pickle internally, almost anything goes here -- the trees of nested objects are automatically serialized into strings for storage. To fetch them back, just reopen the shelve and index:

    % python
    >>> import shelve
    >>> dbase = shelve.open("mydbase")
    >>> len(dbase)                     # entries
    2
    >>> dbase.keys()                   # index
    ['knight', 'brian']
    >>> dbase['knight']                # fetch
    {'motto': 'Ni!', 'name': 'Knight'}
    >>> for row in dbase.keys():
    ...     print row, '=>'
    ...     for field in dbase[row].keys():
    ...         print ' ', field, '=', dbase[row][field]
    ...
    knight =>
      motto = Ni!
      name = Knight
    brian =>
      motto = ['The', 'bright', ('side', 'of'), ['life']]
      age = 33
      name = Brian

The nested loops at the end of this session step through nested dictionaries -- the outer scans the shelve, and the inner scans the objects stored in the shelve. The crucial point to notice is that we're using normal Python syntax both to store and to fetch these persistent objects, as well as to process them after loading.

16.5.3 Storing Class Instances

One of the more useful kinds of objects to store in a shelve is a class instance. Because its attributes record state and its inherited methods define behavior, persistent class objects effectively serve the roles of both database records and database-processing programs. For instance, consider the simple class shown in Example 16-2, which is used to model people.
Example 16-2. PP2E\Dbase\person.py (version 1)

    # a person object: fields + behavior

    class Person:
        def __init__(self, name, job, pay=0):
            self.name = name
            self.job  = job
            self.pay  = pay                    # real instance data
        def tax(self):
            return self.pay * 0.25             # computed on call
        def info(self):
            return self.name, self.job, self.pay, self.tax()

We can make some persistent objects from this class by simply creating instances as usual, and storing them by key on an opened shelve:

    C:\...\PP2E\Dbase>python
    >>> from person import Person
    >>> bob   = Person('bob',   'psychologist', 70000)
    >>> emily = Person('emily', 'teacher',      40000)
    >>>
    >>> import shelve
    >>> dbase = shelve.open('cast')            # make new shelve
    >>> for obj in (bob, emily):               # store objects
    ...     dbase[obj.name] = obj              # use name for key
    ...
    >>> dbase.close()                          # need for bsddb

When we come back and fetch these objects in a later Python session or script, they are recreated in memory as they were when they were stored:

    C:\...\PP2E\Dbase>python
    >>> import shelve
    >>> dbase = shelve.open('cast')            # reopen shelve
    >>>
    >>> dbase.keys()                           # both objects are here
    ['emily', 'bob']
    >>> print dbase['emily']
    <person.Person instance at 799940>
    >>>
    >>> print dbase['bob'].tax()               # call: bob's tax
    17500.0

Notice that calling Bob's tax method works even though we didn't import the Person class here. Python is smart enough to link this object back to its original class when unpickled, such that all the original methods are available through fetched objects.

16.5.4 Changing Classes of Stored Objects

Technically, Python reimports a class to recreate its stored instances as they are fetched and unpickled. Here's how this works:

Store: When an instance object is pickled, only its attribute dictionary is saved, along with the names of its class and the module the class lives in; the class's code itself is not written to the shelve.

Fetch: When the instance is later fetched, Python imports the class's module again, looks up the class by name, and builds a new instance that carries the saved attributes and gains the (possibly updated) class's methods.
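Because shelve defers to pickle for all of this, the same mechanism can be watched with the pickle module by itself. The short sketch below is illustrative only; it assumes the person.py module from Example 16-2 is importable, and it shows that the pickled string carries the instance's attribute dictionary plus the class and module names, not the class's code.

    # a minimal sketch of the store/fetch mechanism using pickle directly
    # (assumes person.py from Example 16-2 is on the module search path)
    import pickle
    from person import Person

    bob  = Person('bob', 'psychologist', 70000)
    data = pickle.dumps(bob)       # records module name, class name, bob.__dict__
    new  = pickle.loads(data)      # imports person if needed, rebuilds an instance
    print new.name, new.tax()      # methods come from the (re)imported class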
The key point in this is that the class itself is not stored with its instances, but is instead reimported later when instances are fetched. The upshot is that by modifying external classes in module files, we can change the way stored objects' data is interpreted and used without actually having to change those stored objects. It's as if the class is a program that processes stored records.

To illustrate, suppose the Person class from the previous section was changed to the source code in Example 16-3.

Example 16-3. PP2E\Dbase\person.py (version 2)

    # a person object: fields + behavior
    # change: the tax method is now a computed attribute

    class Person:
        def __init__(self, name, job, pay=0):
            self.name = name
            self.job  = job
            self.pay  = pay                    # real instance data
        def __getattr__(self, attr):           # on person.attr
            if attr == 'tax':
                return self.pay * 0.30         # computed on access
            else:
                raise AttributeError           # other unknown names
        def info(self):
            return self.name, self.job, self.pay, self.tax

This revision has a new tax rate (30%), introduces a __getattr__ qualification overload method, and deletes the original tax method. Tax attribute references are intercepted and computed when accessed:

    C:\...\PP2E\Dbase>python
    >>> import shelve
    >>> dbase = shelve.open('cast')            # reopen shelve
    >>>
    >>> print dbase.keys()                     # both objects are here
    ['emily', 'bob']
    >>> print dbase['emily']
    <person.Person instance at 79aea0>
    >>>
    >>> print dbase['bob'].tax                 # no need to call tax()
    21000.0

Because the class has changed, tax is now simply qualified, not called. In addition, because the tax rate was changed in the class, Bob pays more this time around. Of course, this example is artificial, but when used well, this separation of classes and persistent instances can eliminate many traditional database update programs -- in most cases, you can simply change the class, not each stored instance, to get new behavior.

16.5.5 Shelve Constraints

Although shelves are generally straightforward to use, there are a few rough edges worth knowing about.

16.5.5.1 Keys must be strings

First of all, although they can store arbitrary objects, keys must still be strings. The following fails, unless you convert the integer 42 to the string "42" manually first:

    dbase[42] = value        # fails, but str(42) will work

This is different from in-memory dictionaries, which allow any immutable object to be used as a key, and derives from the shelve's use of DBM files internally.

16.5.5.2 Objects are only unique within a key

Although the shelve module is smart enough to detect multiple occurrences of a nested object and recreate only one copy when fetched, this holds true only within a given slot:

    dbase[key] = [object, object]    # okay: only one copy stored and fetched

    dbase[key1] = object
    dbase[key2] = object             # bad?: two copies of object in the shelve

When key1 and key2 are fetched, they reference independent copies of the original shared object; if that object is mutable, changes made through one copy won't be reflected in the other. This really stems from the fact that each key assignment runs an independent pickle operation -- the pickler detects repeated objects, but only within each pickle call. This may or may not be a concern in practice, and it can be avoided with extra support logic, but be aware that an object can be duplicated if it spans keys.
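A short interactive sketch makes the duplication visible; the shelve name dupes and the stored list below are hypothetical, but the effect follows directly from each key assignment running its own pickle operation.

    >>> import shelve
    >>> db = shelve.open('dupes')      # hypothetical scratch shelve
    >>> shared = ['tofu', 'spam']
    >>> db['a'] = shared               # each assignment pickles independently,
    >>> db['b'] = shared               # so two copies of the list are stored
    >>> rec = db['a']                  # fetch one copy into memory
    >>> rec.append('eggs')             # changing it does not affect the other
    >>> db['a'] = rec                  # store the change back under 'a' only
    >>> db['b']                        # 'b' still holds the original list
    ['tofu', 'spam']
    >>> db.close()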
16.5.5.3 Updates must treat shelves as fetch-modify-store mappings

Because objects fetched from a shelve don't know that they came from a shelve, operations that change components of a fetched object change only the in-memory copy, not the data on the shelve:

    dbase[key].attr = value   # shelve unchanged

To really change an object stored on a shelve, fetch it into memory, change its parts, and then write it back to the shelve as a whole by key assignment (a sketch combining this pattern with stored class instances appears after the pickler constraints below):

    object = dbase[key]       # fetch it
    object.attr = value       # modify it
    dbase[key] = object       # store back -- shelve changed

16.5.5.4 Concurrent updates not allowed

As we learned near the end of Chapter 14, the shelve module does not currently support simultaneous updates. Simultaneous readers are okay, but writers must be given exclusive access to the shelve. You can trash a shelve if multiple processes write to it at the same time, and this is a common possibility in contexts such as server-side CGI scripts. If your shelves may be hit by multiple processes, be sure to wrap updates in calls to the fcntl.flock built-in we explored in Chapter 14.

16.5.5.5 Pickler class constraints

In addition to these shelve constraints, storing class instances in a shelve adds a set of additional rules you need to be aware of. Really, these are imposed by the pickle module, not shelve, so be sure to follow them if you store class objects with pickle directly, too. Most importantly, the class of a stored instance must be importable when the instance is later fetched: because only the instance's attributes plus the class and module names are saved, the class itself must live at the top level of a module that can be found on the module search path at unpickling time, under the same name it had when the instance was stored.
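Putting the fetch-modify-store rule and the importability rule together, a small update script might look like the following sketch. The 10 percent raise is made up for illustration; the shelve name cast matches the earlier sessions, and person.py must be importable when the script runs so the fetches can rebuild the Person instances.

    # give every stored Person a raise: fetch, modify, store back by key
    # (assumes the 'cast' shelve built earlier, and that person.py is on
    #  the module search path so unpickling can reimport the Person class)
    import shelve

    dbase = shelve.open('cast')
    for key in dbase.keys():
        obj = dbase[key]              # fetch: unpickles and reimports Person
        obj.pay = obj.pay * 1.10      # changes only the in-memory copy
        dbase[key] = obj              # store back, or the shelve is unchanged
    dbase.close()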
In a prior Python release, persistent object classes also had to either use constructors with no arguments, or they had to provide defaults for all constructor arguments (much like the notion of a C++ copy constructor). This constraint was dropped as of Python 1.5.2 -- classes with non-defaulted constructor arguments now work fine in the pickling system.[2]
16.5.5.6 Other persistence limitations

In addition to the above constraints, keep in mind that files created by an underlying DBM system are not necessarily compatible with all possible DBM implementations. For instance, a file generated by gdbm may not be readable by a Python with another DBM module installed, unless you explicitly import gdbm instead of anydbm (assuming it is installed at all). If DBM file portability is a concern, make sure that all the Pythons that will read your data use compatible DBM modules (one way to pin down the format explicitly is sketched at the end of this section).

Finally, although shelves store objects persistently, they are not really object-oriented database systems (OODBs). Such systems also implement features like object decomposition and delayed ("lazy") component fetches, based on generated object IDs: parts of larger objects are loaded into memory only as they are accessed. It's possible to extend shelves to support such features, but you don't need to -- the Zope system described in Chapter 15 includes an implementation of a more complete OODB system. It is constructed on top of Python's built-in persistence support, but offers additional features for advanced data stores. See the previous chapter for information and links.
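As a closing aside on DBM portability, the sketch below shows one way to pin the underlying format explicitly instead of letting anydbm choose it. It is a minimal sketch under the assumption that the gdbm module is installed and that the file name mydbase-gdbm is ours to invent; shelve.Shelf wraps any DBM-style mapping you hand it, so every Python that reads the file must then also use gdbm.

    # pin the DBM format by wrapping a gdbm file in a Shelf explicitly
    # (assumes the gdbm module is installed; 'mydbase-gdbm' is a made-up name)
    import gdbm, shelve

    dbase = shelve.Shelf(gdbm.open('mydbase-gdbm', 'c'))   # not anydbm's choice
    dbase['key'] = ['any', 'pickleable', 'object']         # values still pickled
    print dbase['key']
    dbase.close()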