Tuesday, May 27, 2008
Monday, May 26, 2008
Sunday, May 25, 2008
In my previous posting I wrote about classification basics. This posting follows up with an overview of Python tools for classification and gives an example using one of them.
Open Source Python Tools for Classification
- Orange - machine learning tool which supports classification (including combining classifiers in ensembles), feature extraction, basic statistical analysis, regression and association rules. It also has an extension module which supports clustering and additional classifier algorithms. Note: Orange is probably the Python-based machine learning tool that is most similar to the more famous tool Weka (which is for Java, or Jython for that matter).
- Monte - less comprehensive than Orange, written purely in Python (i.e. no SWIGed C++). Looks interesting (it has several classifier algorithms), but the APIs seem to be in an early phase (it is a relatively new tool, at version 0.1.0).
- libsvm - Python API for the most popular open source implementation of SVM. Note: libsvm is also included with Orange and PyML. (I used this tool during my PhD a few years ago.)
- RPy - not exactly a classification tool, but a Python interface to the R statistics package, and it is quite useful to have a statistics tool at hand when you are doing classification (it has nice plotting capabilities, not unlike MATLAB's). Check out the demo.
- PyML - also less comprehensive than Orange (specialized towards classification and regression, it supports SVM/SMO, ANN and Ridge Regression), but it has a nice API. Example of use:
from PyML import multi, svm, datafunc
# read training data, last column has the class
mydataset = datafunc.SparseDataSet('iris.data', labelsColumn = -1)
myclassifier = multi.OneAgainstRest(svm.SVM())
print "cross-validation results", myclassifier.cv(mydataset)
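To give a feel for what the PyML example above does conceptually, here is a minimal sketch in plain Python of one-against-rest classification evaluated with k-fold cross-validation. It is not PyML code: the names CentroidBinary, OneVsRest and cross_validate are hypothetical, the binary learner is a simple nearest-class-mean scorer that I have swapped in for the SVM, and the nine data points are a made-up toy set rather than the iris data. It is written so that it should also run on Python 3.

```python
# Sketch only: a plain-Python illustration of one-against-rest plus
# cross-validation. All names below are hypothetical, not PyML's API.

def mean_vector(rows):
    """Component-wise mean of a list of feature vectors."""
    n = float(len(rows))
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class CentroidBinary(object):
    """Stand-in for an SVM: scores a point by how much closer it is
    to the positive class mean than to the negative class mean."""
    def fit(self, rows, is_positive):
        self.pos = mean_vector([r for r, p in zip(rows, is_positive) if p])
        self.neg = mean_vector([r for r, p in zip(rows, is_positive) if not p])
        return self

    def score(self, row):
        return sq_dist(row, self.neg) - sq_dist(row, self.pos)

class OneVsRest(object):
    """Trains one binary model per class (that class against all the
    others) and predicts the class whose model gives the highest score."""
    def __init__(self, make_binary):
        self.make_binary = make_binary

    def fit(self, rows, labels):
        self.models = {}
        for c in sorted(set(labels)):
            targets = [label == c for label in labels]
            self.models[c] = self.make_binary().fit(rows, targets)
        return self

    def predict(self, row):
        return max(sorted(self.models), key=lambda c: self.models[c].score(row))

def cross_validate(make_model, rows, labels, folds=3):
    """k-fold cross-validation: item i is held out in fold i % folds.
    Returns the fraction of held-out items classified correctly."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        test = [i for i in range(len(rows)) if i % folds == f]
        model = make_model().fit([rows[i] for i in train],
                                 [labels[i] for i in train])
        correct += sum(1 for i in test if model.predict(rows[i]) == labels[i])
    return correct / float(len(rows))

# Made-up toy data: three well-separated classes in two dimensions.
rows = [[0, 0], [0, 1], [1, 0],
        [10, 0], [10, 1], [11, 0],
        [0, 10], [1, 10], [0, 11]]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

def make_model():
    return OneVsRest(CentroidBinary)

accuracy = cross_validate(make_model, rows, labels, folds=3)
print("cross-validation accuracy: %s" % accuracy)
```

The structure mirrors the PyML snippet: the multi-class wrapper takes a binary learner, and cross-validation repeatedly trains on part of the data and tests on the rest. A real SVM would replace CentroidBinary; the wrapper and the evaluation loop would stay the same.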