Monday, 30 July 2012

Cost-control and Amortization in Software Engineering: The importance of the source repository.

This discussion is not finished, but it is getting late, so I am putting it out regardless. Please excuse the jumps, the argument is put together like a tree, working down from the leaves, through the branches to the trunk. I will try to re-work it later to make it more linear and easier to read.


This is a discussion about controlling costs in Software Engineering. It is also a discussion about communication, bureaucracy, filing systems and human nature, but it starts with the debate about how best to organize ones source code repository; a debate dominated by the ascendance of the Distributed Version Control System (DVCS):

In the Distributed-vs-Centralized VCS debate I am generally agnostic, perhaps inclined to view DVCS systems slightly more favorably than their centralized counterparts.

I have used Svn and Hg and am happy with both. For a distributed team, working on a large code-base over low-bandwidth connections, Hg or Git have obvious advantages.

Many DVCS activists strongly advocate a fine-grained division of the code-base into multiple per-project repositories. In the context of a global, distributed development team, this approach is highly efficient as it allows developers to work independently with minimal need for synchronization or communication.

This is particularly striking when we remember just how difficult and time-consuming (i.e. hugely expensive) synchronization and communication can be if developers are working in different time-zones or only working part-time on a project.

However, this property is less relevant to a traditional, co-located and tightly integrated development team. In fact, I intend to  demonstrate that splitting the repository in this situation has serious implications that need to be considered.

Before I do that, however, I need to describe a number of considerations that motivate and support my argument.

Amortization as the only effective mechanism for controlling development costs.

Firstly, developing software and maintaining software is very expensive. The more complexity, the more cost. For a given problem, there is a limit to the amount of complexity that we can eliminate. There is also a limit to how much we can reduce developer salaries or outsource work before the complexity and cost imposed by the consequent deterioration in understanding and communication outweigh the savings.

The only other meaningful mechanism that we have to control costs is careful, meticulous amortization, through reuse across product lines, products & bespoke projects. I believe that reuse is fiendishly difficult to achieve, critically important, and requires a fundamental shift in thinking to tackle effectively. The difficulty in achieving reuse is supported by historical evidence: re-use is rare as hen's teeth. It's importance, I hope, is self-evident, and the fundamental shift in thinking is required because we are working against human nature, against historical precedent, and against some unfortunate physics.

Much of this argument is a discussion about how to overcome these problems and facilitate the amortization of costs at a variety of different levels, and how a single, monolithic repository (or filing system) can support our efforts.

Reuse is difficult partly because of what software development is.

More than anything else, software development is the process of learning about a problem, exploring different solutions, and applying that learning to the development of a machine that embodies your understanding of the problem and it's solution; software both defines a machine and describes a model of our understanding of the problem domain. The development of a piece of software is primarily a personal intellectual exercise undertaken by the individual developer, and only incidentally a group exercise undertaken by the team.

Reuse by one person, of a software component written by another, must at some level involve some transfer of some degree of understanding.

Communications bandwidth is insufficient.

The bandwidth offered by our limited senses (sight, hearing, smell etc..) is insignificant when held up against the tremendous expressive power of our imagination; the ability of our brain to bring together distant memories with present facts to build a sophisticated and detailed mental model. The bandwidth offered by our channels of communication is more paltry still; even the most powerful means of communication at our disposal, carried out face-to-face, is pathetic in comparison; writing, documentation, even more contemptible. (Although still necessary).

Communicating the understanding built up in the process of doing development work is very difficult because the means that we have at our disposal are totally inadequate.

Reuse by one person, of a software component written previously by the same person is orders of magnitude easier to achieve than reuse of a software component written by another. Facilitating reuse between individuals will require us to bolster both the available bandwidth and the available transfer time by all means and mechanisms possible. We will need to consider the developer's every waking moment as a potential communications opportunity, and every possible thing that is seen or touched as a potential communications channel.

Re-Think the approach.

One solution to the problem is to side-step communication. A developer is already an expert in, and thus tightly bound to the software components that he has written; A simple mechanism for reuse is simply to redeploy the original developer, along with his library of reusable components, on different projects.

As with the individual, so with the team. A team of individuals that has spent a long time together has a body of shared organizational knowledge; they have had opportunities to share experiences, have evolved a common vocabulary, culture and approach. Collectively, they are also tightly bound to the software components that they have written. A team, expert in a particular area, is a strategic asset to an organization, and the deployment of that asset also needs to be considered strategically, in some detail, by the executive.

(Make people's speciality part of their identity.)

Strategic planning needs to think in terms of capabilities and reuse of assets. The communication of the nature of those capabilities and assets needs to be a deliberate, planned activity.

Exploit pervasive communication mechanisms.

We need to think beyond the morning stand-up meeting; our current "managed" channels of communication are transient; but our eyes and ears are less fleetingly open. We need to take advantage of pervasive, persistent out-of-band communication mechanisms. Not intranets and wiki pages that are seldom read, nor emails and meetings that arrive when the recipient is not maximally receptive. We need communications channels that are in the background, not intrusive, yet always on. Do not place the message into the environment, make the environment the message.

Make the office layout communicate reuse. Make the way the company is structured communicate reuse, make where people sit communicate reuse, and above all else, make the way that your files are organized communicate reuse. 

The repository is a great pervasive communication mechanism.

As developers, we need to use the source repository frequently. The structure of the repository dictates the structure of our work, and defines how components are to be reused.

The structure of the repository and the structure of the organization should be, by design, tightly bound together, as they are so often by accident. If the two are in harmony, the repository then becomes much more useful. It provides a map that we can use to navigate by. It will help us to find other software components, and, if components and people are tightly bound together, other people also. The repository is more than a place to store code; it is a place to store documentation and organizational knowledge, and to synchronize it amongst the members of the organization. Indeed, it is the place where we define the structure and the nature of the organization. It is the source of more than just binaries, but of all organizational behavior.


Financial reporting still drives everything - it needs to be in harmony also.

Trust is required. (maybe not)

Not the whole story.

Another problem: reuse is difficult because it puts the cart before the horse. How can we decide what component to reuse before we have understood the problem? How can we truly understand the problem before we have done the development work? If we are solving externally defined and uncorrelated problems that we have no influence over, then we could never reuse anything.

Sounds difficult? Lessons from the Spartans on the nature of city walls.

Back to repositories.

Limiting code reuse. Difficult to do "hot" library development.

One of the frequent arguments that gets trotted out in the Git/Hg-vs-SVN debate is that of merging.
The merge conflict argument is bogus. Proper organization and DRY prevents merge issues.

No comments:

Post a Comment