18.5 Parser Generators
If you have any background in
parsing theory, you may know that neither regular expressions nor
string splitting is powerful enough to handle more complex language
grammars (roughly, they don't have the "memory" needed to track
arbitrarily nested constructs, which true grammars require). For more sophisticated language analysis
tasks, we sometimes need a full-blown parser. Since Python is built
for integrating C tools, we can write integrations to traditional
parser generator systems such as yacc and
bison. Better yet, we could use an integration
that already exists.
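To make the "memory" point concrete, here is a minimal sketch, not a definitive treatment. It assumes a toy language of nested parentheses; the pattern name TWO_LEVELS and the function balanced are hypothetical names introduced only for this illustration. A pattern written for Python's re module must spell out a fixed nesting depth, while a few lines of code that keep a running depth count handle any depth.

```python
import re

# Hypothetical pattern for illustration: it accepts parentheses nested
# at most two levels deep. Deeper nesting needs a longer pattern.
TWO_LEVELS = re.compile(r'^(?:\((?:\(\))*\))*$')

def balanced(text):
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0                      # the "memory" a fixed pattern lacks
    for char in text:
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            if depth < 0:          # a ')' appeared before its '('
                return False
    return depth == 0

if __name__ == '__main__':
    for sample in ['()', '(())', '((()))', '(()']:
        print(sample, bool(TWO_LEVELS.match(sample)), balanced(sample))
```

The pattern rejects the three-level string even though it is well formed, while the counter accepts it. A real grammar needs the same kind of unbounded bookkeeping, which is what a parser provides.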
There are
also Python-specific parsing systems accessible from Python's
web site. Among them, the kwParsing system,
developed by Aaron Watters, is a parser generator written in Python,
and the SPARK toolkit, developed by John Aycock,
is a lightweight system that employs the Earley algorithm to work
around technical problems with LALR parser generation (if you
don't know what that means, you probably don't need to
care). Since these are all complex tools, though, we'll skip
their details in this text. Consult http://www.python.org for information on
parser generator tools available for use in Python programs.
Speaking of parser generators: to use some of these tools in Python
programs, you'll need an extension module that integrates them.
The first step in such scenarios should always be to see if the
extension already exists in the public domain. Especially for common
tools like these, chances are that someone else has already written
an integration that you can use off-the-shelf instead of writing one
from scratch.
Of course, not everyone can donate all their extension modules to the
public domain, but there's a growing library of available
components that you can pick up for free and a community of experts
to query. Visit http://www.python.org for links to Python
software resources. With some half a million Python users out there
as I write this book, there is much that can be found in the
prior-art department.
Of special interest to this chapter is
YAPPS (Yet Another Python Parser System). YAPPS is a
parser generator written in Python. It uses supplied rules to
generate human-readable Python code that implements a recursive
descent parser. The parsers generated by YAPPS look much like (and
are inspired by) the hand-coded expression parsers shown in the next
section. YAPPS creates LL(1) parsers, which are not as powerful as
LALR parsers but are sufficient for many language tasks. For more on
YAPPS, see http://theory.stanford.edu/~amitp/Yapps.
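To give a rough feel for the recursive descent, LL(1) style of parsing, here is a small hand-written sketch. It is not YAPPS output and does not use the YAPPS API. The grammar, the Parser class, and the tokenize and parse functions are all assumptions made up for this illustration. Each grammar rule becomes one method, and each method picks its next step by looking at a single token of input.

```python
import re

# Assumed toy grammar for this sketch:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'
TOKEN = re.compile(r'\s*(\d+|[-+*()])')

def tokenize(text):
    """Split text into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError('bad character at position %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # One token of lookahead: the "1" in LL(1).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ('+', '-'):
            if self.advance() == '+':
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == '*':
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        token = self.advance()
        if token == '(':
            value = self.expr()        # recursion handles nesting
            if self.advance() != ')':
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError('unexpected token %r' % token)

def parse(text):
    """Parse and evaluate text, rejecting any leftover tokens."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError('unexpected token %r' % parser.peek())
    return value
```

For example, parse('2 + 3 * 4') evaluates the multiplication first because term sits below expr in the grammar, and parse('(2 + 3) * 4') recurses through factor to handle the parentheses.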