15.7 XML Processing Tools
Python ships with XML parsing support in
its standard library and plays host to a vigorous XML
special-interest group. XML (eXtensible Markup
Language) is a tag-based markup language for describing many kinds of
structured data. Among other things, many companies have adopted it
as a standard representation for database and Internet content.
As an object-oriented scripting language, Python
mixes remarkably well with XML's core notion of structured
document interchange, and promises to be a major player in the XML
arena.
XML is based
upon a tag syntax familiar to web page writers. Python's
xmllib library module includes tools for parsing
XML. In short, this XML parser is used by defining a subclass of an
XMLParser Python class, with methods that serve as
callbacks to be invoked as various XML structures are detected. Text
analysis is largely automated by the library module. This
module's source code, file xmllib.py in
the Python library, includes self-test code near the bottom that
gives additional usage details. Python also ships with a standard
HTML parser, htmllib, that works on similar
principles and is based upon the sgmllib SGML
parser module.
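The callback style just described can be sketched as follows. Because xmllib is not available in current Python releases, this minimal sketch substitutes the xml.sax package, which follows the same pattern: subclass a handler class and define methods that the parser invokes as it detects tags and text. The TitleCollector class, the parse_books function, and the sample document are hypothetical names made up for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records tag names and the text of <book> tags."""

    def __init__(self):
        super().__init__()
        self.tags = []      # every start tag seen, in document order
        self.titles = []    # text found inside each <book> element
        self._buffer = []   # text chunks for the element being read

    def startElement(self, name, attrs):
        # Callback: invoked by the parser at each opening tag.
        self.tags.append(name)
        self._buffer = []

    def characters(self, content):
        # Callback: text may arrive in several chunks, so accumulate it.
        self._buffer.append(content)

    def endElement(self, name):
        # Callback: invoked at each closing tag.
        if name == "book":
            self.titles.append("".join(self._buffer).strip())
        self._buffer = []


def parse_books(document):
    """Run the parser over an XML string and return (tags, titles)."""
    handler = TitleCollector()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.tags, handler.titles


# Sample document invented for this example.
sample = (
    "<books>"
    "<book>Programming Python</book>"
    "<book>Learning Python</book>"
    "</books>"
)

tags, titles = parse_books(sample)
print(tags)
print(titles)
```

As with xmllib, the parser drives the program: your code never scans the text itself, but simply reacts to the structures the parser reports.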
Unfortunately, Python's XML support is still evolving, and
describing it is well beyond the scope of this book. Rather than
going into further details here, I will instead point you to sources
for more information:
- Standard library
First off, be sure to consult the Python library manual for more on
the standard library's XML support tools. At the moment, this
includes only the xmllib parser module, but may
expand over time.
- PyXML SIG package
At this writing, the best place to find Python XML tools and
documentation is at the XML SIG (Special Interest Group) web page at
http://www.python.org (click on
the "SIGs" link near the top). This SIG is dedicated to
wedding XML technologies with Python, and publishes a free XML tools
package distribution called PyXML. That package
contains tools not yet part of the standard Python distribution,
including XML parsers implemented in both C and Python, Python
implementations of SAX (the Simple API for XML) and DOM (the XML
Document Object Model), a Python interface to the Expat parser,
sample code, documentation, and a test suite.
- Third-party tools
You can also find free, third-party Python support tools for XML on
the Web by following links at the XML SIG's web page. These include
4DOM, a DOM implementation for CORBA environments that currently
supports two ORBs (ILU and Fnorb), and much more.
- Documentation
As I wrote these words, a book dedicated to XML processing with
Python was on the eve of its publication; check the books list at
http://www.python.org or your
favorite book outlet for details.
Given the rapid evolution of XML technology, I wouldn't wager
on any of these resources being up to date a few years after this
edition's release, so be sure to check Python's web site
for more recent developments on this front.
In fact, the XML story changed substantially between the time I wrote
this section and when I finally submitted it to O'Reilly. In
Python 2.0, some of the tools described here as the PyXML SIG package
have made their way into a standard xml module
package in the Python library. In other words, they ship and install
with Python itself; see the Python 2.0 library manual for more
details. O'Reilly has a book in the works on this topic called
Python and XML.
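As a rough illustration of the standard xml package mentioned above, the following sketch uses its xml.dom.minidom module to build a DOM tree and walk it. It assumes a Python release that includes the xml package; the sample document and the book_entries function are invented for this example and come from no real data source.

```python
from xml.dom import minidom

# Sample document invented for this example.
sample = (
    '<catalog>'
    '<book id="1">Programming Python</book>'
    '<book id="2">Learning Python</book>'
    '</catalog>'
)


def book_entries(document):
    """Return a list of (id, title) pairs for each <book> element."""
    dom = minidom.parseString(document)
    entries = []
    for node in dom.getElementsByTagName("book"):
        # Join the element's direct text children to recover its title.
        title = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        entries.append((node.getAttribute("id"), title.strip()))
    return entries


books = book_entries(sample)
for book_id, title in books:
    print(book_id, title)
```

Unlike the callback approach, a DOM parser reads the whole document into a tree of objects first, which you then navigate at your leisure.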