Thursday, March 20, 2008

Test-Driven Parsing with Python, DParser and Doctest

DParser for Python is the easiest-to-use parsing framework I have ever seen or used. It requires very little boilerplate code and supports putting grammars in docstrings. Under the hood, DParser is a scannerless GLR (Generalized Left-to-right Rightmost derivation) parser based on the Tomita algorithm (no less :-). (See also A look at DParser for Python; note: a somewhat old article.)

DParser and Doctest
Because DParser reads its grammars from docstrings, it did not work together with Python's doctest out of the box. A small change in dparser.py, replacing occurrences of f.__doc__ with f.__doc__.split(""">>>""")[0], seemed to fix that. This means that you can relatively easily do test-driven development of parsers, i.e. write the tests first and then the least amount of grammar and code needed to make them pass. (Note: I used DParser 1.18 combined with (Stackless) Python 2.5.1 on Linux.) Check out the small example below to see how it works; the first part of each docstring holds the grammar, and the rest of the docstring holds doctests that exercise that grammar:

from dparser import Parser
import doctest

def d_start(t):
    """start : noun verb
    >>> Parser().parse('cat flies')
    >>> Parser().parse('dog flies')
    """

def d_noun(t):
    """noun : 'cat' | 'dog'
    >>> Parser().parse('cat', start_symbol='noun')
    'cat'
    >>> Parser().parse('dog', start_symbol='noun')
    'dog'
    """
    return t[0]

def d_verb(t):
    """verb : 'flies'
    >>> Parser().parse('flies', start_symbol='verb')
    'flies'
    """
    return t[0]

def _test():
    doctest.testmod()

if __name__ == '__main__':
    _test()


# note: output when running time python example.py -v
Trying:
Parser().parse('cat', start_symbol='noun')
Expecting:
'cat'
ok
Trying:
Parser().parse('dog', start_symbol='noun')
Expecting:
'dog'
ok
Trying:
Parser().parse('cat flies')
Expecting nothing
ok
Trying:
Parser().parse('dog flies')
Expecting nothing
ok
Trying:
Parser().parse('flies', start_symbol='verb')
Expecting:
'flies'
ok
2 items had no tests:
__main__
__main__._test
3 items passed all tests:
2 tests in __main__.d_noun
2 tests in __main__.d_start
1 tests in __main__.d_verb
5 tests in 5 items.
5 passed and 0 failed.
Test passed.

real 0m0.080s
user 0m0.068s
sys 0m0.008s
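
The docstring split mentioned above can be tried in isolation, without DParser installed. The following is only a minimal sketch of the idea, not DParser's actual code: the helper names grammar_part and doctest_examples are made up for illustration, and the toy docstring uses a plain arithmetic doctest instead of a Parser call.

```python
import doctest

def grammar_part(func):
    # Hypothetical helper: everything before the first doctest prompt
    # is treated as the grammar text.
    doc = func.__doc__ or ""
    return doc.split(">>>")[0]

def doctest_examples(func):
    # Hypothetical helper: the same docstring is still visible to doctest,
    # which extracts the examples that follow the grammar.
    doc = func.__doc__ or ""
    return doctest.DocTestParser().get_examples(doc)

def d_noun(t):
    """noun : 'cat' | 'dog'
    >>> 1 + 1
    2
    """
    return t[0]

if __name__ == "__main__":
    print(repr(grammar_part(d_noun).strip()))
    for example in doctest_examples(d_noun):
        print(repr(example.source), repr(example.want))
```

The point of the sketch is that the grammar and the tests can share one docstring as long as the parser generator only looks at the text before the first >>> prompt.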

Tuesday, March 18, 2008

Fridge Malware

Tuesday, March 11, 2008

Software Engineer tv-series for kids?

Bob the Builder, Fetch the Vet, Fireman Sam and Postman Pat are all great. But where is the software engineer TV series for kids? Not to be overly politically correct, but there also seems to be a slight imbalance in the gender of the main characters in the existing shows. Ada the Software Engineer, anyone?

It would be great to hear kids get inspired and ask for a new IDE or compiler instead of a new shovel.

Thursday, March 6, 2008

My first years on the Net

Figured out that I have been on the Internet for about 19 years (a prime time I must say). Didn't experience the stone age of the Internet, but I probably started somewhere around the bronze age or at least the iron age of the Internet. Here follows a summary of my first years on the net.

1989-90
I was fortunate and got (dial-up) Internet access at high school through a project called SIRNett (~Schools in regional network; its purpose was distance-education collaboration between high schools). Cool things at the time included Turbo Pascal, Amiga and e-mail. I was a guinea pig for the Winix email software (developed in Trondheim); the software also had terminal emulation (we had a local Unix box of some kind, I don't remember which). Winix later became infamous (for a while) as the largest IT scandal in Norway, but I guess investing hundreds of millions of kroner (divide by 5 to get rough $ amounts) in creating commercial software relying on Internet services around 1990 was somewhat optimistic.
(Note: the first Internet ISP appeared in 1990.)
(According to this source there were about 560k worldwide Internet users in 1989.)

1991
1st (playful) year at University, lots of computers and all with Internet connections. MUD (Multi-User Dungeon), archie and ftp were the cool things (nic.funet.fi). Computers were either IBM PS/2s or DEC VAX/VMS terminals (even one terminal with a Motif/X-ish UI). Met some smart people in NVG who were extremely early adopters of Linux (Linux was born in 1991). The local college had a 64 kbit/s line (which was easily saturated by ftp traffic).
File format of the year: DMS (for Amiga).

1992
Became involved in PVV. Learned to use LaTeX to write documents. Started using Perl. Tried a bunch of flavours of Unix and met some incredibly smart people (quite a few of them ended up in the search industry, e.g. Yahoo, Fast, Google, Overture and Microsoft :). The monster box at the time was flipper.pvv.unit.no, a Dolphin OS based RISC box (Dolphin was a spin-off company of the then still-alive Norsk Data). Tried Gopher (which I was quite impressed by; little did I know what was yet to come).

1993-1994
Mosaic, NCSA Web server, Netscape. Oh joy!

1995
Semester project about "Marketing on the Internet". Field trip to Silicon Valley with visits to Berkeley University, Apple, Oracle, IBM Almaden Research Center (Ted Selker showed us a prototype keyboard with 2 trackpoints), Sun (demo of Java; there was only about one applet at the time, one at the San Francisco Chronicle if I remember correctly), and Silicon Graphics, where I got my head scanned on a nice Onyx IRIX box. Little did I know that 10 years later I would visit the same campus and seriously use my head again in a job interview with multiple rock stars :)

1996-1997
Internship with IBM Canada. Cool software at the time was PointCast and SMIT (for AIX). Working for IBM while Deep Blue beat Kasparov at chess was quite fun :)

For my years on the net from 1998 to 2000 click here