Tuesday, April 22, 2008

Pragmatic Classification: The very basics

Classification is an everyday task: it is about selecting one out of several outcomes based on features. An example is recycling garbage, where you select the bin based on the characteristics of the item, e.g. paper, metal, plastic or organic.

Classification with computers
For classification with computers the focus is frequently on the classifier - the function/algorithm that selects the class based on features (note: classifiers usually have to be trained before they are fit for the job). Classifiers come in many flavors, and quite a few of them have impressive names (words like rough, kernel, vector, machine and reasoning aren't uncommon when naming them).

note: as (almost) always, garbage in leads to garbage out - the same goes for classification.

The numerical baseline
Let us assume you have a data set with 1000 documents that turns out to have 4 equally large categories (e.g. math, physics, chemistry and medicine). A simple classifier for a believed-to-be-similar data set could be the rule "the class is math", which is likely to give a classification accuracy of about 25%. (Another simple classifier could be to pick a random category for every document.) This can be used as a numerical baseline for comparison when bringing in heavier classification machinery: e.g. if you get 19% accuracy with the heavier machinery, it (or your feature representation) probably isn't very good for that particular problem. (Note: heavy classification machinery frequently has plenty of degrees of freedom, so fine-tuning it can be a challenge; the same goes for feature extraction and representation.)
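To make the baseline idea concrete, here is a minimal Python sketch; the documents, class names and counts are made up for illustration:

import random

# Hypothetical toy setup: 1000 documents, 4 equally large categories.
classes = ["math", "physics", "chemistry", "medicine"]
labels = classes * 250          # 250 documents per category
random.shuffle(labels)

# Baseline 1: always answer "math".
always_math = sum(1 for y in labels if y == "math") / float(len(labels))

# Baseline 2: pick a random category for every document.
random_pick = sum(1 for y in labels if random.choice(classes) == y) / float(len(labels))

print(always_math)   # exactly 0.25 here
print(random_pick)   # about 0.25 on average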

Combining classifiers
On the other hand, if the heavy machinery classifier gave 0% accuracy, you could combine it with a random classifier that randomly selects from the 3 classes the heavy machinery classifier didn't suggest.

Question 1: What is the accuracy of this combined classifier?
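If you want to check your answer empirically, here is a small simulation sketch; it assumes a hypothetical pathological classifier that is always wrong, matching the 0% accuracy scenario above:

import random

classes = ["math", "physics", "chemistry", "medicine"]

def heavy_machinery(true_class):
    # Pathological classifier with 0% accuracy: it always suggests
    # some class other than the true one.
    return random.choice([c for c in classes if c != true_class])

def combined(true_class):
    # Randomly select from the 3 classes the 0%-accuracy classifier
    # did not suggest (the true class is always among them).
    suggested = heavy_machinery(true_class)
    return random.choice([c for c in classes if c != suggested])

trials = 100000
hits = 0
for _ in range(trials):
    true_class = random.choice(classes)
    if combined(true_class) == true_class:
        hits += 1
print(hits / float(trials))  # empirical accuracy of the combined classifier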

Baseline for unbalanced data sets
Quite frequently classification problems have to deal with unbalanced data sets. Let us say you were to classify documents about soccer and casting (fishing), and your training data set contained about 99.99% soccer documents and 0.01% casting documents; a baseline classifier for a similar data set could be to always say "the article is about soccer". This would most likely be a very strong baseline, and probably hard to beat for most heavy machinery classifiers.
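The catch is that plain accuracy hides what happens to the minority class; a small sketch with made-up proportions matching the example above:

# 10000 hypothetical documents: 9999 about soccer, 1 about casting.
labels = ["soccer"] * 9999 + ["casting"]
predictions = ["soccer"] * len(labels)   # the "always soccer" baseline

accuracy = sum(1 for y, p in zip(labels, predictions) if y == p) / float(len(labels))
casting_found = sum(1 for y, p in zip(labels, predictions) if y == "casting" and p == "casting")

print(accuracy)       # 0.9999 - a very strong-looking number
print(casting_found)  # 0 - the baseline never finds the casting article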

Silver bullet classifier and feature extraction method?

Q: My friend says that classifier algorithm X and feature extraction method Y are the best for all problems, is that the case?
A: No, tell him/her to read about the ugly duckling theorem and the no free lunch theorems, which say that there is no universally best classifier or feature extraction approach.

note: Just some of the basics this time, something more concrete next time (I think).

Sunday, April 20, 2008

My years on the net from 1998-2000

1998 - Insurance and Finance, and my first domain name
Once upon a time (September 1998) I bought my first domain name - agentus.com. The domain name was inspired by a course in "distributed artificial intelligence and intelligent agents"[1] that I (partially) followed while being an IT trainee in an insurance and finance company. I learned quite a bit about insurance (and enjoyed working with actuaries, i.e. basic risk analysis for insurance pricing), but felt I had some unfinished business in academia (and entrepreneur-wise), so in the summer of 1999 I left the finance company for university to start on a PhD scholarship in the fall.

1999 - Back to University
I spent most of the summer of 1999 doing what could be called entrepreneurship-in-a-sandbox, i.e. not actually doing much concrete (coding-wise), but reading up on e-commerce literature[2] to figure out something entrepreneurial to do. I ended up with a few vague ideas (one was creating a service to automatically create brand and domain names), but nothing that I really believed could fly (maybe the insight into risk analysis wasn't such a great thing?). The PhD scholarship was part of a project called "Electronic Commercial Agents" - ElComAg. One of the first things I did when creating web pages for the project was an updated list of academic events and calls for papers for e-commerce and CS related conferences, workshops and journals; this was gradually improved over the years and grew into eventseer.net.

2000 - Entrepreneurial year
At this time CoShopper.com and LetsBuyIt.com provided co-shopping services. The idea behind these services was roughly to be a "crowd-shopping middleman", i.e. if a lot of consumers got together and purchased things (e.g. hundreds of DVD players) they should get them cheaper than on their own. Inspired by this and my recent insurance experience I thought something like: "insurance is by nature a crowd-risk-sharing product, so what is more natural than co-shopping of insurance?". Another nice property of insurance (at least selling it..) is that it is highly virtual, so there is close to no distribution cost[3]. (Co-shopping of insurance actually happened already then, but more implicitly, e.g. if you're a member of an organization you might get cheaper insurance.)
I strongly believed this would change the insurance industry, in particular for insurance brokers (and perhaps even for the reinsurance industry, but I don't remember why :), so together with 2 CS project students I tried to make this idea fly, but the highlight became a full-day workshop with executives from a very large insurance broker. (From February 2000 the investment mood started changing around the world, and when the summer of 2000 arrived it all died out.)

As mentioned, the summer before I had thought of a service for automatic brand and domain name creation, so I got a CS project student working on it (in parallel with the insurance project), and we were approved to participate in a summit for entrepreneurs sponsored by the university[4]. The first part of the summit revealed significant business-level holes in the idea (revenue potential etc.), so we had to be agile and think of something new: we changed the idea to an ontology-based browser plugin to support search. At the summit's presentation real investors were on the panel, and when we spoke with one of them he said informally that he might be willing to invest a very large amount; not being used to being close to large numbers, it somewhat freaked me out[5]. Coincidentally(1) enough, during the first part of the summit (which was in February) I saw on the front page of a financial newspaper (in the reception area) that the stock market was seriously down, i.e. the dot-com bubble had burst.

Coincidentally(2), during the second part of the entrepreneurial summit I got a phone call on behalf of a large investor to whom I had recently sent an email with misc. ideas. I stayed in touch, and later in the year (fall) I got funding from that investor to found a company together with my brother and the student who worked on the automatic brand name project. I worked on this company part-time in addition to my PhD studies for about 2-3 years before going back to the PhD full-time.

I was a co-organizer of something called "technology café", mainly together with other PhD students from many parts of the university (e.g. industrial ecology, political science, linguistics, social anthropology and computer science). Together with a few of them I tried to develop a company (during the fall) doing consulting and services related to indicators for industrial eco-efficiency. We had meetings with potential local investors and customers, but our business model was quite vague and therefore a hard sell, so the initiative unfortunately fizzled out.

One of the guys from the industrial eco-efficiency initiative had another project, offering services related to the carbon quota market (this was 3 years after the Kyoto Protocol). They were lacking an IT guy, so for a brief period I was one of the 5 people working on it, but due to my newly funded company (2 paragraphs up) I had to pass the project on. In retrospect I honestly wish I hadn't; this company - PointCarbon - became very successful, now with customers in more than 200 countries :-)

Hm, 2000 was truly a busy year; other than that I remember getting my first submitted paper rejected ;-) But things started getting somewhat better on the academic front in 2001.

[1] - "intellige.." twice in a course name had to be good :)
[2] - online/PDF e-commerce literature such as Make Your Site Sell and Wilsonweb, and magazines: Fast Company, Business 2.0, Red Herring and Wired (some of them approaching phone book size per monthly issue; not the same as being in Silicon Valley, but an OK substitute when located 9 time zones east of it, i.e. in Scandinavia).
[3] - you pay someone to take on your risk, and the combined payments of the entire crowd are going to at least cover the costs of reimbursements to those who have incidents the insurance covers.
[4] - it was mainly traditional engineering companies participating at the innovation summit, e.g. with more tangible projects like a mobile asphalt-producing vehicle. Our idea was accepted because it was exotic, being a dot-com'ish company (at least I believe so).
[5] - I had seen bigger numbers as an IT trainee in insurance and finance, but they felt more distant.

Saturday, April 12, 2008

Biting the Hand that Feeds the Double Standard?


Maybe it is just me, but I am personally somewhat puzzled by two recent world events:
  1. protests against a big event in a country that is at the same time the major supplier of goods to those who are protesting against the event (double standards?)
  2. suggested actions that severely irritate another country that is the major supplier of energy to several of the countries irritating it (biting the hand that feeds you?)

Friday, April 4, 2008

A Machine Learning Theory Dream Team



Russia (counting Königsberg, then Prussian, now the Russian city of Kaliningrad) is the birthplace of many great mathematicians, and the work of quite a few of them has had a significant impact on state-of-the-art computer science and machine learning theory, e.g.:
  • Ludwig O. Hesse (1811-1874) - Hessian Matrix
    • Used when training Logistic Regression (which can be used for binary classification) and feed-forward Neural Networks, e.g. in Newton's method style optimization
  • David Hilbert (1862-1943) - Hilbert Space
    • E.g. a feature space for Radial Basis Function kernels in Support Vector Machine classifiers can be described with a Hilbert Space
  • Andrey N. Kolmogorov (1903-1987) - Kolmogorov Complexity
    • Used in algorithmic information theory, and also in theory behind evolutionary and genetic programming
  • Andrei A. Markov (1856-1922) - Markov Models and Markov Chains
    • Can be used e.g. for simulation (in games).
    • Noteworthy later "spin-offs": Hidden Markov Models (HMM) and Markov Chain Monte Carlo (MCMC).
  • Andrei N. Tikhonov (1906-1993) - Tikhonov Regularization
    • Tikhonov Regularization is roughly a template for classification and regression methods, where the template variable is a loss function: a square loss gives Ridge Regression (also known as Regularized Least Squares Regression or Shrinkage Regression), an epsilon-insensitive loss gives Support Vector Machine Regression, and a hinge loss gives Support Vector Machine Classification. A minimal sketch of the square-loss case follows right after this list.
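As promised above, here is a minimal Python/numpy sketch of the square-loss instance of the Tikhonov template, i.e. Ridge Regression with its closed-form solution; the data and the lambda value are made up for illustration:

import numpy as np

# A made-up regression problem to illustrate Ridge Regression.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)                     # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X.dot(true_w) + 0.1 * rng.randn(100)  # noisy linear target

lam = 1.0  # regularization strength (the Tikhonov/ridge parameter)

# Closed-form Ridge solution: w = (X'X + lam*I)^-1 X'y
w = np.linalg.solve(X.T.dot(X) + lam * np.eye(X.shape[1]), X.T.dot(y))
print(w)  # close to true_w for this well-behaved toy data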
Kudos.