Sunday, May 25, 2008

Pragmatic Classification with Python

In my previous posting I wrote about classification basics, this posting will follow up and talk about Python tools for classification and give an example with one of the tools.

Open Source Python Tools for Classification
  • Monte - less comprehensive than Orange, written purely in Python (i.e. no SWIGed C++). Looks interesting (has several classifiers algorithms), but the APIs seems to be in an early phase (relatively new tool in version 0.1.0)
  • libsvm - Python API for most popular open source implementation of SVM. Note: libsvm is also included with Orange and PyML. (I used this tools during my PhD a few years ago)
  • RPy - not exactly a classification tool, but it is quite useful with a statistics tool when you are doing classification (it has a nice plotting capability, not unlike matlabs), check out the demo.
  • PyML - also less comprehensive than Orange (specialized towards classification and regression, it supports SVM/SMO, ANN and Ridge Regression), but it has a nice API. Example of use:

    from PyML import multi, svm, datafunc
    # read training data, last column has the class
    mydataset = datafunc.SparseDataSet('iris.data', labelsColumn = -1)
    myclassifier = multi.OneAgainstRest(svm.SVM())
    print "cross-validation results", myclassifier.cv(mydataset)
My recommendation is to either go with Orange or with PyML.