Thursday, March 20, 2008

Test-Driven Parsing with Python, DParser and Doctest

DParser for Python is the easiest-to-use parsing framework I have come across. It requires very little boilerplate code and lets you put grammars in docstrings. Under the hood, DParser is a scannerless GLR (Generalized Left-to-right, Rightmost derivation) parser based on the Tomita algorithm (no less :-). (See also "A look at DParser for Python"; note that the article is a bit old.)

DParser and Doctest
Since DParser uses docstrings for grammars, it didn't work together with Python's doctest out of the box. A small change in dparser.py, replacing occurrences of f.__doc__ with f.__doc__.split(""">>>""")[0], seemed to fix that: only the text before the first doctest prompt is then treated as grammar. This means that you can relatively easily do test-driven development of parsers, i.e. write the tests first and then the least amount of grammar and code needed to make them pass. (Note: I used DParser 1.18 with (Stackless) Python 2.5.1 on Linux.)

Check out the small example below to see how it works. The first part of each docstring holds the grammar, and the rest of the docstring holds the doctests that test that grammar:

from dparser import Parser
import doctest

# Top-level rule. There is no return statement, so parse() yields None
# and the doctests below expect no output.
def d_start(t):
    """start : noun verb
    >>> Parser().parse('cat flies')
    >>> Parser().parse('dog flies')
    """

# Each action returns the matched token so the doctests can check it.
def d_noun(t):
    """noun : 'cat' | 'dog'
    >>> Parser().parse('cat', start_symbol='noun')
    'cat'
    >>> Parser().parse('dog', start_symbol='noun')
    'dog'
    """
    return t[0]

def d_verb(t):
    """verb : 'flies'
    >>> Parser().parse('flies', start_symbol='verb')
    'flies'
    """
    return t[0]

def _test():
    doctest.testmod()

if __name__ == '__main__':
    _test()


# note: output when running: time python example.py -v
Trying:
    Parser().parse('cat', start_symbol='noun')
Expecting:
    'cat'
ok
Trying:
    Parser().parse('dog', start_symbol='noun')
Expecting:
    'dog'
ok
Trying:
    Parser().parse('cat flies')
Expecting nothing
ok
Trying:
    Parser().parse('dog flies')
Expecting nothing
ok
Trying:
    Parser().parse('flies', start_symbol='verb')
Expecting:
    'flies'
ok
2 items had no tests:
    __main__
    __main__._test
3 items passed all tests:
   2 tests in __main__.d_noun
   2 tests in __main__.d_start
   1 tests in __main__.d_verb
5 tests in 5 items.
5 passed and 0 failed.
Test passed.

real 0m0.080s
user 0m0.068s
sys 0m0.008s
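
To see why the docstring split makes the two tools coexist, here is a minimal sketch that runs without DParser installed. The helper name grammar_part is hypothetical and not part of dparser.py; it only mimics the f.__doc__.split(""">>>""")[0] expression from the patch (the stripping and the None check are my additions). doctest.DocTestParser is the standard-library class that extracts the examples from the same docstring. The sketch targets a current Python rather than the 2.5.1 used above.

```python
import doctest


def grammar_part(docstring):
    """Return the text before the first doctest prompt (hypothetical helper)."""
    if docstring is None:
        return ""
    # Same expression as in the patch; strip() is an addition for tidy output.
    return docstring.split(">>>")[0].strip()


def d_noun(t):
    """noun : 'cat' | 'dog'
    >>> Parser().parse('cat', start_symbol='noun')
    'cat'
    """
    return t[0]


# What the patched dparser.py would treat as grammar.
print(grammar_part(d_noun.__doc__))

# What doctest sees in the very same docstring. The example is only
# extracted here, not executed, so Parser does not need to exist.
examples = doctest.DocTestParser().get_examples(d_noun.__doc__)
for ex in examples:
    print(ex.source.strip(), "->", ex.want.strip())
```

The grammar and the tests never overlap: everything before the first >>> goes to the parser generator, and doctest ignores that leading text because it only looks for prompt lines.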
