Idle Conjectures in Search of Refutation: Productivity Models.

The economic realities of software development are rather different from other forms of labor.

Firstly, productivity rates vary enormously, both for teams and for individuals. For example, one individual can be many orders of magnitude more productive than another. Equally, that same individual may experience variability in productivity of many orders of magnitude from one project to the next. People who talk about the 10Xer effect seem to (incorrectly) assume that people are constant - once a 10Xer, always a 10Xer. That is patently not the case. Personal productivity has a lot to do with emotional state, motivation, interpersonal relationships, office politics, the built environment, diet, length of commute and serendipitous fit-to-role, not to mention the ready availability of appropriate software for re-purposing and re-use. Great developers can do a lot to maximise the probability of exceptional performance, but they cannot guarantee it.

These apparent nonlinearities; these step-changes in productivity, make software development extraordinarily difficult to manage. The many factors involved and their extreme effects challenge our conventional approaches to modeling, prediction and control.

So, we have a challenge. I do love challenges.

Let us take our initial cue from the Machine Learning literature. When faced with a high-dimensional feature (or parameter) space; sparsely populated with data, such as we have here, we cannot fall back on pure data mining, nor can we deploy fancy or highly parameterized models. We must approach the problem with a strong, simple and well-motivated model with only a few free parameters to tune to the data. To make a Bayesian analogy: We require very strong priors.

So, we need to go back to first principles and use our knowledge of what the role entails to reason about the factors that affect productivity. We can then use that to guide our reasoning, and ultimately our model selection & control system (productivity optimization) architecture.

So, let us sketch out what we know about software engineering, as well as what we think will make it better, in the hopes that a model, or set of models will crystallize out of our thinking.

For a start, software development is a knowledge based role. If you already know how to solve a problem, then you will be infinitely faster than somebody who has to figure it out. The time taken to actually type out the logic in a source document is, by comparison with the time it typically takes to determine what to type and where to type it, fairly small. In some extreme cases, it can take millions of dollars worth of studies and simulations to implement a two or three line bug-fix to an algorithm. If the engineer implementing the system in the first place had already possessed the requisite knowledge, or even if he had instinctively architected around the potential problem area, the expenditure would not have been required.

20-20 hindsight is golden.

Similarly, knowledge of a problem doman and software reuse often go hand-in-hand. If you know that an existing library, API or piece of software solves your problem, then it is invariably cheaper to reuse the existing software than to develop your own software afresh. It is obvious that this reuse not only reduces the cost incurred by the new use-case, it also spreads/amortizes the original development cost. What is perhaps less obvious, is the degree to which an in-depth knowledge of the capabilities of a library or piece of software is required for the effective (re)use of that software. The capability that is represented by a piece of software is not only embedded in the source documents and the user manual, it is also embedded in the knowledge, skills and expertise of the practitioners who are familiar with that software. This is an uncomfortable and under-appreciated truth for those managers who would treat software developers as replaceable "jelly-bean" components in an organizational machine.

Both of these factors seem to indicate that good old specialization and division-of-labor are potentially highly significant factors in our model. Indeed, it appears on the face of it that these factors have the potential to be far more significant, in terms of effect on (software engineering) productivity, than Adam Smith could ever have imagined.

I do have to admit that the potential efficiency gain from specialization is likely to be confounded to a greater or lesser degree by the severe communication challenges that need to be overcome, but the potential is real, and, I believe, can be achieved with the appropriate technological assistance.

So, how do you make sure that the developer with the right knowledge is in the right place at the right time? How do you know which piece of reusable software is appropriate? Given that it is in the nature of software development to be dealing with unknown unknowns on a regular basis, I am pretty confident that there are no definitive and accurate answers to this question.

However, perhaps if we loosen our criteria we can still provide value. Maybe we only need to increase the probability that a developer with approximately the right knowledge is in approximately the right place at approximately the right time? (Taking a cue from Probably Approximately Correct Learning). On the other hand, perhaps we need the ability to quickly find the right developer with the approximately right knowledge at approximately the right time? Or maybe we need to give developers the ability to find the right problem (or opportunity) at the right time?

What does this mean in the real world?

Well, there are several approaches that we can take in different circumstances. If the pool of developers that you can call upon is constrained, then all we can do is ensure that the developers within that pool are as knowledgeable, flexible and multi-skilled as possible, are as aware of the business environment as possible, and have the freedom and flexibility that they need to innovate and provide value to the business as opportunities arise. Here particularly, psychological factors, such as motivation, leadership, vision, and apparent freedom-of-action are important. (Which is not to say that that they are unimportant elsewhere)

If, on the other hand, you have a large pool of developers available, (or are willing to outsource) then leveraging specialization and division-of-labour has the potential to bring disproportionate rewards. The problem then becomes an organizational/technical one:

How to find the right developer or library when the project needs them?

Or, flipping the situation on it's head, we can treat each developer to be an entrepreneur:

Given the current state of the business or organization, how can developers to use their skills, expertise and knowledge of existing libraries to exploit opportunities and create value for the business?

Looking initially at the first approach, it is a real pity that all of the software & skills inventory and reporting mechanisms that I have ever experienced have fallen pitifully short of the ease-of-use and power that is required. These systems need to interrogate the version control system to build the software components inventory, and parse it's log-file to build the skills inventory. These actions need to take place automatically and without human intervention. The reports that they produce need to be elegant and visually compelling. Innovation is desperately required in this area. I imagine that there are some startups somewhere that may be doing some work along these lines with GitHub:- if anybody knows of any specific examples, please mention them in the comments.

Another area where innovation is desperately needed is in the way that common accounting practices treat software inventory. Large businesses spend a lot of time and effort keeping accurate inventory checks on two hundred dollar Dell desktop machines, and twenty dollar keyboards and mice, but virtually no effort in keeping an accounting record on the software that they may have spent many tens of millions of dollars developing. For as long as the value of software gets accounted for under "intangibles" we will have this problem. We need to itemize every function and class in the organization's accounts, associate a cost with each, as well as the benefits that are realized by the organization. Again, this needs to happen automatically, and without any manual intervention from the individual developers. As before, if anybody knows of any organization who is doing this, please please let me know.

Moving our focus to the second approach, how do we match developers with the opportunities available at a particular organization; how do we let them know where they can provide value? How can we open organizations up so that outsiders can contribute to their success? This is a much harder problem which may invite more radical solutions. Certainly nothing reasonable springs to my mind at the moment.

Anyway, I have spent too long ruminating, and I need to get back to work. If anybody has any ideas on how this line of enquiry might be extended, or even if you think that I am wasting my (and everybody else's) time:- please, please say so in the comments.

This line of thought builds upon and extends some of my previous thinking on the subject.

Idle Conjectures in Search of Refutation

Thursday, 22 November 2012

Productivity Models.

1 comment: