Amund Tveit's Blog: Test-Driven Parsing with Python, Dparser and Doctest

DParser for Python is the easiest-to-use parsing framework I have ever used. It requires very little boilerplate code and lets you write grammars in docstrings. Under the hood, DParser is a scannerless GLR parser (Generalized LR, where LR stands for left-to-right scanning with rightmost derivation) based on the Tomita algorithm, no less :-) (See also A look at DParser for Python; note that the article is a bit old.)

DParser and Doctest
Since DParser uses docstrings for grammars, it did not work together with Python's doctest out of the box: the doctest lines would otherwise be read as part of the grammar. A small change in dparser.py, replacing occurrences of f.__doc__ with f.__doc__.split(""">>>""")[0], seemed to fix that. This means you can relatively easily do test-driven development of parsers, i.e. write the tests first and then the least amount of grammar and code needed to make them pass. (Note: I used DParser 1.18 with (Stackless) Python 2.5.1 on Linux.) Check out the small example below to see how it works; the first part of each docstring holds the grammar, and the rest of the docstring holds doctests that exercise that grammar:

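from dparser import Parser
import doctest

# Each d_* function is a grammar action: the text before the first ">>>"
# in its docstring is the grammar rule, the rest are doctests for that rule.

def d_start(t):
    """start : noun verb
    >>> Parser().parse('cat flies')
    >>> Parser().parse('dog flies')
    """
    # No return value, so the doctests above expect no output.

def d_noun(t):
    """noun : 'cat' | 'dog'
    >>> Parser().parse('cat', start_symbol='noun')
    'cat'
    >>> Parser().parse('dog', start_symbol='noun')
    'dog'
    """
    # t[0] is the matched token.
    return t[0]

def d_verb(t):
    """verb : 'flies'
    >>> Parser().parse('flies', start_symbol='verb')
    'flies'
    """
    return t[0]

def _test():
    # Run every doctest found in this module's docstrings.
    doctest.testmod()

if __name__ == '__main__':
    _test()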


# note: output from running: time python example.py -v
Trying:
    Parser().parse('cat', start_symbol='noun')
Expecting:
    'cat'
ok
Trying:
    Parser().parse('dog', start_symbol='noun')
Expecting:
    'dog'
ok
Trying:
    Parser().parse('cat flies')
Expecting nothing
ok
Trying:
    Parser().parse('dog flies')
Expecting nothing
ok
Trying:
    Parser().parse('flies', start_symbol='verb')
Expecting:
    'flies'
ok
2 items had no tests:
    __main__
    __main__._test
3 items passed all tests:
   2 tests in __main__.d_noun
   2 tests in __main__.d_start
   1 tests in __main__.d_verb
5 tests in 5 items.
5 passed and 0 failed.
Test passed.

real    0m0.080s
user    0m0.068s
sys     0m0.008s
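
To make the docstring trick a bit more concrete, here is a minimal sketch of how a single docstring can serve both tools. It is written for Python 3 rather than the Python 2.5.1 setup used above, and it does not import DParser at all, so nothing is actually parsed. The helper names grammar_part and doctest_part are hypothetical, and the .strip() calls and the None handling are additions for this sketch, not part of the dparser.py change. The split on >>> mirrors that change, while doctest.DocTestParser shows what doctest extracts from the same text.

```python
import doctest


def d_noun(t):
    """noun : 'cat' | 'dog'
    >>> Parser().parse('cat', start_symbol='noun')
    'cat'
    >>> Parser().parse('dog', start_symbol='noun')
    'dog'
    """
    return t[0]


def grammar_part(docstring):
    # Hypothetical helper: keep only the text before the first doctest prompt,
    # the same split as the dparser.py change described in the post.
    # The None check and .strip() are assumptions added for this sketch.
    return (docstring or "").split(">>>")[0].strip()


def doctest_part(docstring):
    # Hypothetical helper: list (source, expected output) pairs the way
    # doctest sees them. The examples are only parsed here, never executed,
    # so DParser does not need to be installed.
    examples = doctest.DocTestParser().get_examples(docstring or "")
    return [(ex.source.strip(), ex.want.strip()) for ex in examples]


grammar = grammar_part(d_noun.__doc__)
tests = doctest_part(d_noun.__doc__)

print(grammar)
for source, want in tests:
    print(source, "->", want)
```

Running the sketch prints the grammar rule noun : 'cat' | 'dog' followed by the two doctest examples with their expected results, which matches the "2 tests in __main__.d_noun" line in the output above. doctest treats the grammar line as ordinary prose and skips it, so only the grammar side needs the split.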

Thursday, March 20, 2008
