Sunday, May 30, 2010

Evaluation of Search Predictions made in May 2000

In May 2000 I wrote "A few thoughts about the future of Internet Information Retrieval" (i.e. search), but how did the predictions actually hold up? I've tried to evaluate them in this posting, with each original prediction in italic font followed by my evaluation.

1a) Prediction - Specialized Services within Search

It seems likely that the specialization in the Internet Information Retrieval (IIR) business will continue. Internet information crawling, pre-processing, indexing, searching and presentation requires different types of technologies and know-how, this might create opportunities for new companies specializing in only one step of the IIR "food chain". One possibility could be that companies doing crawling will do offer extracts of relevant data on request, e.g. a search engine specializing in winter sports could get only relevant data extracted from several regional crawler companies. In other words, the IIR "food chain" might increase in length.

1b) Evaluation
Specialization of search services happened to some degree, but had relatively small impact. Examples of such services include fetching/crawl-related services (e.g. 80legs). But the services with the biggest impact are the free search APIs (e.g. Google Ajax Search API and Bing APIs) and the commercial ones (e.g. Yahoo Boss and Wolfram Alpha API). What they all have in common is that they offer the last step, i.e. search, and so implicitly cover all the earlier steps. Noteworthy developments in a related direction are cloud computing and the increasing number of large data sets (e.g. the infochimps collection, DBPedia and the Public Terabyte (crawl) dataset).

2a) Prediction about Potential New Search Players

As the importance of Internet Information Retrieval grows, players that have been concentrating on the lower end of the Internet "food chain", i.e. major bandwidth providers (e.g. MCI or British Telecom) and network software/hardware vendors (e.g. 3COM or Cisco) might want to enter the market as providers of partially indexed data to search engines and topic hierarchies.

2b) Evaluation
This didn't happen at all to my knowledge.

3a) Prediction about Potential New Search Technologies

With the increased growth of the amount of data on the Internet, new technologies for doing distributed indexing/search of data will probably occur. This is particularly interesting if processing and indexing of multimedia data (e.g. sound, pictures and video) becomes popular. Processing of multimedia data is considerably more CPU intensive than processing of textual data. Example of such processing could be automatic detection of objects (e.g. a car) in video frames.

3b) Evaluation
(Massively) distributed indexing in the "SETI@home style" didn't happen at large scale, though there are a few examples pursuing distributed indexing/search, e.g. the Majestic project. Processing of multimedia data, which seems obvious in retrospect, is happening, though the problems involved are not trivial to solve.

If I am kind: 0.5 on prediction 1, 0 on prediction 2 and 0.5 on prediction 3, i.e. about 33.33% correct?
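
As a quick check of the arithmetic, the overall score is simply the mean of the three per-prediction scores. A minimal sketch (the dictionary keys and the function name are my own labels for illustration, not anything from the original 2000 posting):

```python
# Per-prediction scores taken from the evaluations above.
# The labels and the function name are illustrative only.
scores = {
    "1: specialized services": 0.5,
    "2: new search players": 0.0,
    "3: new search technologies": 0.5,
}


def fraction_correct(scores):
    """Return the mean score across all predictions."""
    return sum(scores.values()) / len(scores)


print(f"{fraction_correct(scores):.2%} correct")  # prints "33.33% correct"
```

Weighting every prediction equally is itself a judgment call; a different weighting would of course give a different number.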

Thursday, May 13, 2010

Overview of my postings on the Atbrox blog (Nov 2009-May 2010)

As mentioned in a previous posting, I mainly write on Atbrox's blog (and not here). In case you haven't seen them, here is an overview of my postings from November 2009 through May 2010 so far:


Hadoop and Mapreduce

(for even earlier postings on Atbrox check out this overview)