Comment in response to: http://mattturck.com/2012/07/20/data-driven-venture-capital
Some thoughts on the possibility of using quantitative techniques to invest in startups.
Quantitative investing is really quite hard, even when analysing equities with decades of financial reports available. The world is complex and always changing, with no guarantee that historical conditions will prevail. Furthermore, the data that you have is generally extremely sparse relative to the dimensionality of the problem, and is full of wrinkles that must be ironed out before it can be normalized for comparison (retrospective corrections, stock splits and other corporate actions, etc.).
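To make one of those wrinkles concrete, here is a minimal sketch of back-adjusting a price series for stock splits. The function name, the input layout and the sample figures are all hypothetical, chosen only for illustration; real corporate-action handling (dividends, spin-offs, restatements) is considerably messier.

```python
from datetime import date

def adjust_for_splits(prices, splits):
    """Back-adjust closing prices so that pre-split and post-split values are comparable.

    prices: list of (date, close) tuples (hypothetical layout).
    splits: list of (effective_date, ratio) tuples, e.g. ratio 2.0 for a 2-for-1 split.
    """
    adjusted = []
    for day, close in prices:
        factor = 1.0
        for effective, ratio in splits:
            # Every price quoted before a split is divided by that split's ratio.
            if day < effective:
                factor *= ratio
        adjusted.append((day, close / factor))
    return adjusted

# Illustrative data only: a 2-for-1 split taking effect on 2 January.
raw = [(date(2020, 1, 1), 100.0), (date(2020, 1, 2), 50.0)]
clean = adjust_for_splits(raw, [(date(2020, 1, 2), 2.0)])
```

After adjustment both days show the same price, so the apparent 50% overnight drop disappears from the series.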
The natural (and only) solution is to approach the problem with really strong priors, in the form of one or more principled, theory-driven models of the fundamentals, backed by good engineering and thorough data management.
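As a toy illustration of what a strong prior buys you when data is sparse, consider a Beta-Binomial estimate of a success rate. This is only a sketch of the general idea, not a model of startup outcomes: the prior parameters and the observation below are made-up numbers.

```python
def posterior_mean(prior_successes, prior_failures, successes, trials):
    """Posterior mean of a success rate under a Beta(prior_successes, prior_failures) prior."""
    return (prior_successes + successes) / (prior_successes + prior_failures + trials)

# Hypothetical prior worth 20 pseudo-observations with a 10% success rate.
prior_only = posterior_mean(2, 18, successes=0, trials=0)
# A single observed success barely moves the estimate.
after_one_hit = posterior_mean(2, 18, successes=1, trials=1)
# Without the prior, the same observation would suggest a 100% success rate.
naive = 1 / 1
```

With one data point the naive estimate jumps to 1.0, while the prior-backed estimate moves only from 0.10 to roughly 0.14.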
Because the problem domain itself is so complex, and the data so sparse, the models themselves have to be simple, which means that most of the opportunities for innovation are to be found in the search for new and previously underexploited data sources. This is particularly true when looking at early-stage startups, as financial history is either absent or not particularly predictive.
Perhaps fortunately, the current state of the art in data exploitation is really quite poor, which leaves plenty of room for improvement.
So, what opportunities can we identify?
Well, organizations are composed of people. Different organizations have different personalities, and different cultures; sometimes the people in those organizations gel together and turn into a great and highly productive team, and sometimes they do not.
If you were able to develop a really good understanding of how people work together in teams, and of how different personality types, personal circumstances, technical skills and work environments come together, you could build a potentially strong factor based on staff surveys, psychometric profiles and whatever other behavioral data you can lay your hands on.
(You could also use the same models to build a secondary business offering personal and organizational coaching.)
The cost of obtaining this data would be quite high, unfortunately, but there are other factors that could be attractive based simply on the ease with which large quantities of data may be collected.
For example, a statistical analysis of source code repositories and check-in histories might well yield insights into the ability of the organization to respond to changing conditions, and to innovate rapidly.
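A first pass at this kind of analysis could be as simple as the sketch below, which summarises commit cadence from the output of `git log --pretty=format:%at` (one Unix author timestamp per line). The function names and the choice of statistics are assumptions made for illustration; whether any of these numbers actually predicts the ability to innovate is exactly what would need testing.

```python
from collections import Counter
from datetime import datetime, timezone
from statistics import median

def parse_timestamps(log_text):
    """Parse one Unix timestamp per line and return them in ascending order."""
    return sorted(int(line) for line in log_text.splitlines() if line.strip())

def commit_cadence(timestamps):
    """Summarise check-in activity from a sorted list of Unix timestamps."""
    weeks = Counter()
    for ts in timestamps:
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        weeks[(iso[0], iso[1])] += 1  # key on (ISO year, ISO week)
    # Hours elapsed between each pair of consecutive commits.
    gaps = [(later - earlier) / 3600 for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "commits": len(timestamps),
        "active_weeks": len(weeks),
        "median_gap_hours": median(gaps) if gaps else None,
    }

# Made-up sample: three commits, one and then two hours apart.
sample_log = "0\n3600\n10800\n"
summary = commit_cadence(parse_timestamps(sample_log))
```

The same summary computed per repository, and tracked over time, would give one cheap candidate factor to compare across companies.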
http://alistair.cockburn.us/Characterizing+people+as+non-linear%2c+first-order+components+in+software+development