Thursday, March 29, 2012

Syllable-based forecast of best performing yc-startups from March 2012 batch

As previously written in predicting startup performance with syllables, startups with few syllables in their name (typically 1 or 2) are likely to perform the best. There are of course notable exceptions, relatively recent ones being pinterest.com and instagram.com. I also wrote a prediction for the fall 2011 ycombinator batch (including a rough validation of the hypothesis from the first posting).

The lists below include the current alexa rank as a rough estimate of traffic and traction. Consider that a starting point; growth from here can be validated later. The hypothesis is that the average growth for the top list will be significantly higher than for the bottom list.

note to self: 
 it would be interesting to later test hypotheses around well-known English words (e.g. pair and ark) vs "functional names" (e.g. 99dresses) vs "synthetic/wordsmith names" (e.g. dealupa). Other tests would be the ratio of vowels to non-vowels and its impact on traffic/traction.
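
A rough sketch of how the vowel ratio mentioned in the note could be computed. The function name vowel_ratio and the choice to treat "y" as a consonant are my own assumptions for illustration, and digits (as in 99dresses) are simply ignored.

```python
VOWELS = set("aeiou")  # assumption: "y" is counted as a consonant


def vowel_ratio(name):
    """Ratio of vowels to non-vowel letters in a startup name.

    Digits, spaces and punctuation are skipped. Returns None when the
    name has no consonants, since the ratio is undefined in that case.
    """
    letters = [c for c in name.lower() if c.isalpha()]
    vowels = sum(1 for c in letters if c in VOWELS)
    consonants = len(letters) - vowels
    if consonants == 0:
        return None
    return vowels / consonants


# The example names from the note above.
for name in ["pair", "ark", "99dresses", "dealupa"]:
    print(name, vowel_ratio(name))
```

The ratios could later be compared against rank growth in the same way as the syllable counts.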

A-list - Predicted best performers from the ycombinator 2012 batch

Ark  - 1 syllable - 255,810 (alexa traffic rank)
Chute - 1 syllable - 3,799,189
Crowdtilt - 2 syllables - 161,411
Exec - 2 syllables - 3,623,916
Flutter - 2 syllables - 611,535 (domain: flutter.io)
Flypad - 2 syllables - 23,111,805
Kyte  - 2 syllables - 669,753
Givespark - 2 syllables - 4,900,872
Hackpad - 2 syllables - 201,595
Minefold - 2 syllables - 783,034
Midnox - 2 syllables - 3,152,778
Pair - 1 syllable - 10,045
PlanGrid - 2 syllables - 887,681
Popset - 2 syllables - 429,324 (big in japan)
Screenleap - 2 syllables - 185,485
SendHub - 2 syllables - 190,276
TiKL  - 2 syllables - 23,917,339

B-list - The rest of the startups from the batch (> 2 syllables)
42Floors  - 197,479 (alexa rank)
99dresses - 606,730
AnyVivo - no data
Carsabi - 212,484
Coderwall - 90,456
Daily Muse - 4,320,906
Dealupa - 411,214
EveryArt - 289,105
FamilyLeaf - 606,730
HireArt - 217,183
Lvl6 - 2,612,633
Matterport - 1,749,590
Medigram - 2,427,117
Per Vices - 3,274,070
Priceonomics - 100,322
Shoptiques - 256,195
Socialcam - 46,235
Sonalight - 1,832,186
Your Mechanic - 3,650,942
Zillabyte - 1,089,060
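
A minimal sketch of how the two lists could be compared, first at this baseline and later for growth. The ranks are copied from the lists above. The helper names baseline_median and mean_log_improvement are hypothetical, and measuring growth as log10(old rank / new rank) is just one possible choice, not a validated metric.

```python
import math
import statistics

# Alexa ranks copied from the two lists above (lower rank = more traffic).
A_LIST = {
    "Ark": 255810, "Chute": 3799189, "Crowdtilt": 161411, "Exec": 3623916,
    "Flutter": 611535, "Flypad": 23111805, "Kyte": 669753,
    "Givespark": 4900872, "Hackpad": 201595, "Minefold": 783034,
    "Midnox": 3152778, "Pair": 10045, "PlanGrid": 887681, "Popset": 429324,
    "Screenleap": 185485, "SendHub": 190276, "TiKL": 23917339,
}
B_LIST = {
    "42Floors": 197479, "99dresses": 606730, "AnyVivo": None,  # no data
    "Carsabi": 212484, "Coderwall": 90456, "Daily Muse": 4320906,
    "Dealupa": 411214, "EveryArt": 289105, "FamilyLeaf": 606730,
    "HireArt": 217183, "Lvl6": 2612633, "Matterport": 1749590,
    "Medigram": 2427117, "Per Vices": 3274070, "Priceonomics": 100322,
    "Shoptiques": 256195, "Socialcam": 46235, "Sonalight": 1832186,
    "Your Mechanic": 3650942, "Zillabyte": 1089060,
}


def baseline_median(ranks):
    """Median rank of a list, skipping startups with no data."""
    return statistics.median(r for r in ranks.values() if r is not None)


def mean_log_improvement(baseline, later):
    """Average log10(baseline rank / later rank) over names with data in both.

    Positive values mean the ranks improved (moved toward 1) on average.
    """
    gains = [
        math.log10(baseline[name] / later[name])
        for name in baseline
        if baseline[name] is not None and later.get(name) is not None
    ]
    return statistics.mean(gains)


print("A-list median rank:", baseline_median(A_LIST))
print("B-list median rank:", baseline_median(B_LIST))
```

Once later ranks are collected, calling mean_log_improvement on each list with a dict of the new ranks gives one number per list to compare. The median is used for the baseline because a few very large ranks (e.g. Flypad and TiKL) would dominate a plain average.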


disclaimer:
This method only counts syllables in the name and is not scientifically validated, so there is no reason to be offended if your startup doesn't make the A-list. There might also be erroneous syllable counts; please let me know if you find one.

Wednesday, August 24, 2011

Syllable-based forecast of best performing yc-startups from latest batch

As previously written in predicting startup performance with syllables, startups with few syllables in their name (typically 1 or 2) are likely to perform the best. Here is a quick prediction based on the yc startups from the latest batch just published.

1 and 2-syllable names (most likely to be high performers according to the few-syllable prediction):
  • MixRank (2), alexa-rank: 43,724
  • Picplum (2), alexa-rank: 700,278
  • Depteye (2), alexa-rank: no data
  • Envolve (2), alexa-rank: 52,418
  • Quartzy (2), alexa-rank: 785,102
  • Snapjoy (2), alexa-rank: 450,142
  • Opez (2), alexa-rank: 719,612
  • Stypi (2), alexa-rank: 531,965
  • ZigFu (2), alexa-rank: 21,914,936
  • Parse (1), alexa-rank: 564,829
  • Verbling (2), alexa-rank: 237,636
  • Vidyard (2), alexa-rank: 508,951
  • Tagstand (2), alexa-rank: 812,669
  • Kicksend (2), alexa-rank: 326,583
  • Can'tWait (2), alexa-rank: 786,213
I am sure the rest of the yc batch startups are just fine, but not according to syllable-based prediction. The ones I have in mind are:
  • Aisle50 (3), alexa-rank: 1,106,951
  • Launchpad Toys (3), alexa-rank: 1,425,628
  • Interviewstreet (3), alexa-rank: 135,296
  • DoubleRecall (4), alexa-rank: 799,538
  • Munch on Me (3), alexa-rank: 130,234
  • PageLever (3), alexa-rank: 55,462
  • MarketBrief (3), alexa-rank: 467,943
  • MobileWorks (3), alexa-rank: 315,150
  • Vimessa (3), alexa-rank: 314,150
  • Codeacademy (3), alexa-rank: 61,582
How did it go with the last prediction round?
Unfortunately, for the yc summer 2010 batch I only added the ones I thought would perform best according to syllable count (and not the rest for comparison), but here is how the low-syllable-count startups did:
  • AdGrok (acquired by Twitter)
  • Brushes (winner Apple design award 2010)
  • FanVibe (acquired by beRecruited)
  • Gantto (customers: Fujitsu, Lucasfilm++; investor: 500startups)
  • GazeHawk (investor: 500startups)
  • HipMunk (most successful startup from that yc batch?)
  • OhLife (not sure how they have done)
  • TeeVox (not sure how they have done)
Whether this beats throwing darts (random selection) is yet to be tested.
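
One way the dart-throwing comparison could be made, as a sketch only: if the whole batch had been labelled as success or not, the chance that a random pick does at least as well as the syllable pick follows the hypergeometric distribution. The function name prob_at_least and all numbers in the example call are made up for illustration; the real batch counts are not in this post.

```python
from math import comb


def prob_at_least(total, successes, picked, hits):
    """Chance that a random pick of `picked` startups out of `total`
    contains at least `hits` of the `successes` successful ones
    (upper tail of the hypergeometric distribution)."""
    upper = min(picked, successes)
    favourable = sum(
        comb(successes, k) * comb(total - successes, picked - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, picked)


# Hypothetical numbers only: a batch of 36 with 10 successes, where the
# syllable pick of 8 startups contains 4 of them.
print(prob_at_least(total=36, successes=10, picked=8, hits=4))
```

A small probability would suggest the syllable pick did better than darts; a large one would not. What counts as a "success" still has to be defined separately.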
Disclaimer: If I've counted the number of syllables wrong for some of the startups (I have never heard the pronunciation of the startup names), please ping me.

Friday, August 12, 2011

slowly back in the academic publishing game

A very long time ago (as a fresh PhD student) I created a list of conference Calls for Papers (CFPs) I wanted to follow. This grew into a service of its own (and development shifted from me to another developer), and it has been used by researchers both directly as a service and later as part of linked data (semantic web) input. I just became a sidekick on a poster about linked data and calls for papers from that service. I hope to get the time to publish more in academic forums (in addition to blog posts) later this year, perhaps about Atbrox search technology and (forthcoming) services.

Wednesday, June 15, 2011

Mapreduce Algorithms and Search

My latest postings on the Atbrox blog:


Sunday, May 30, 2010

Evaluation of Search Predictions made in May 2000

In May 2000 I wrote A few thoughts about the future of Internet Information Retrieval (i.e. search), but how did it actually go? I've tried to evaluate them in this posting, with the original prediction in italic font followed by the evaluation.

1a) Prediction - Specialized Services within Search

It seems likely that the specialization in the Internet Information Retrieval (IIR) business will continue. Internet information crawling, pre-processing, indexing, searching and presentation require different types of technologies and know-how; this might create opportunities for new companies specializing in only one step of the IIR "food chain". One possibility could be that companies doing crawling will offer extracts of relevant data on request, e.g. a search engine specializing in winter sports could get only relevant data extracted from several regional crawler companies. In other words, the IIR "food chain" might increase in length.


1b) Evaluation
Specialization of search services happened to some degree, but had relatively small impact. Examples of such services include fetching/crawl-related services (e.g. 80legs). But the services with the biggest impact are the free (e.g. Google Ajax Search API and Bing APIs) and commercial search APIs (e.g. Yahoo Boss and Wolfram Alpha API), which all have in common that they offer the last step, i.e. search, and so implicitly cover all steps. Noteworthy developments in a related direction are cloud computing and the increasing number of large data sets (e.g. infochimps collection, DBPedia and the Public Terabyte (crawl) dataset).

2a) Prediction about Potential New Search Players

As the importance of Internet Information Retrieval grows, players that have been concentrating on the lower end of the Internet "food chain", i.e. major bandwidth providers (e.g. MCI or British Telecom) and network software/hardware vendors (e.g. 3COM or Cisco) might want to enter the market as providers of partially indexed data to search engines and topic hierarchies.


2b) Evaluation
This didn't happen at all to my knowledge.

3a) Prediction about Potential New Search Technologies

With the increased growth of the amount of data on the Internet, new technologies for doing distributed indexing/search of data will probably occur. This is particularly interesting if processing and indexing of multimedia data (e.g. sound, pictures and video) becomes popular. Processing of multimedia data is considerably more CPU intensive than processing of textual data. Example of such processing could be automatic detection of objects (e.g. a car) in video frames.


3b) Evaluation
(Massively) distributed indexing in the "SETI@home-style" didn't happen at large scale, though there are a few examples pursuing distributed indexing/search, e.g. the Majestic project. Processing of multimedia data, which is obvious in retrospect, is happening (but these are not trivial problems to solve).

Conclusion
If I am kind - 0.5 on prediction 1, 0 on prediction 2 and 0.5 on prediction 3 ~ 33.33% correct?