Idle Conjectures in Search of Refutation: November 2012

Friday 23 November 2012

Collaboration & Reuse

A spectrum of collaborative techniques exists between the two extremes presented below.

Loose Collaboration leads to Coarse Grained Reuse

* A collection of independent professionals rather than a tightly integrated team.

* Each developer brings different skills to the table.

* Each developer has a high level of expertise in his or her own domain.

* Each developer works independently and largely alone.

* Tasks are defined in broad terms: Developers have high levels of freedom and autonomy.

* Development Environments are varied, idiosyncratic and highly customized.

* Git / Hg - separate repositories for separate components.

* Few inter-dependencies.

* Quantum of re-use is large: A critical mass has to be achieved before something is placed into it's own repository and shared.

Tight Collaboration leads to Fine Grained Reuse

* A tightly integrated team rather than a collection of independent professionals.

* Work carried out by both junior and senior developers.

* Developers work in close collaboration with one another.

* Tasks are small and fine-grained.

* Working Practices & Development environments harmonised & centrally controlled.

* Subversion / CVS - single monolithic repository

* Many inter-dependencies.

* Techniques like continuous integration and test-driven development keep everything on track.

* Quantum of re-use is small: individual scripts and configuration settings are easily shared.

Which approach creates a better work environment? Well, it totally depends upon your personality. Personally, I enjoy working with other people, so I tend to find the tightly integrated team approach more fun, but I am happy to acknowledge that this is not everybody's cup of tea. I must also admit that my preferences continue to change as I grow and develop.

Which approach creates better software? I have no idea. System Architecture will definitely be impacted by the approach to collaboration. In some manner. So, I imagine that the development of some particular classes of software are better served by the first approach, others by the second approach. I cannot, for the moment, imagine what the mapping between class of software and most-appropriate approach would look like.

Guesses in comments, please.

PS:

There exist other spectra on which Software Development activities may be placed. One notable one is the Predictive/Reactive spectrum, which distinguishes between predictive, variance-minimizing techniques as espoused by SEI/CMMI (Chicago School), and reactive, "Agile" techniques. In contrast with the loose/tight collaboration spectrum, it is easier to judge the appropriateness of predictive vs reactive, as level of anticipated technical risk is pretty a good indicator of the degree to which a reactive approach is appropriate.

Thursday 22 November 2012

The Utility of Version Control Systems

In response to a post about the utility of version control systems.

The "backups" argument for version control is easily understood, but it is ultimately not that compelling. The real benefit that you get from version control is a dramatic cultural shift and a new way of working with collaborators.

First of all, a version control system provides a structured framework and vocabulary for collaboration and feedback. This means that it is possible to talk about "changes" and "branches'. This is not dissimilar from "track changes" functionality in MS Word. Secondly, it facilitates the implementation of automated services such as "continuous integration" and "continuous testing".

These two pieces of functionality enable a very significant cultural shift. In most domains, professionals are used to working independently for an extended period of time, only presenting results when they are finished, polished and (supposedly) error-free. This way of working leads to an intolerant attitude towards (rare) errors: The ensuing lack of humility creates and enforces a risk-averse culture. It quickly and inevitably becomes more important to be seen to be perfect than it is to innovate and solve problems.

The working culture and practices that version control systems and continuous integration support and encourage could not be more different. By publishing incomplete drafts and unfinished work, your mistakes and typos become part of the "public" record, along with a record of the way that you approach problems and do your work. This requires either a thick skin, or (more realistically) the realization that nobody really cares about your mistakes.

The benefit of this is strongest where the work being carried out by multiple professionals is highly interdependent. By sharing work earlier in the cycle, as well as more frequently, conflicts and misunderstandings are resolved up much sooner. In fact, it is highly likely that professionals that need to collaborate closely will already be using a form of version control (such as track changes) in their work already. Using tools like Subversion or Mercurial simply provides additional tools with finer grained control and a different approach to sharing and communication (changes pulled on demand from the repository rather than pushed out via email).

So, the prime benefit of version control is in the fact that it provides a structured framework for sharing work, as well as discussing and coordinating the collaborative development of documents that refer to, or otherwise depend upon, one another. Another, equally important benefit is that it encourages transparency and humility through the practice of regularly sharing unfinished work. Finally, as stated in the article, it acts as a record for understanding how work was carried out, and for recovering from past mistakes.

Bridging the Gap

In response to an article talking about using narrative in software specifications:

There certainly exists a reality gap between "the business" and the people who end up implementing the system, but I suspect that it is as much a matter of practicalities as it is a matter of cultural differences. The developer, by necessity, thinks in terms of the details, because sooner or later, he must confront them and overcome them. The customer typically thinks at a higher level of abstraction. He never needs to go into the details, so why would he waste the time? The act of bridging the gap is akin to weaving. You must go from a high level of abstraction, diving down to the details and then back out again, several times in succession before you can build the shared understanding, terminology & conceptual framework that is required for the business side and the technology side to fuse into a single, effective whole. This process is generally characterized by feelings of confusion and disorientation (on both sides) and sometimes accompanied by arguments, generally stemming from each side misunderstanding or failing to grasp the other's terminology and conceptual framework. All of this is natural, and part of the learning process. It is also exceedingly expensive and time consuming; a fact often under-appreciated at the start of the process.

You are probably familiar with the famous aphorism: "Computer science is no more about computers than astronomy is about telescopes". Well, if I may beg leave to corrupt and abuse the saying: "Software development is no more about computers than accountancy is about spreadsheets". Software developers may spend a lot of time using computers, to be sure, but then again, so do accountants. Software development is about understanding a problem, communicating it, and specifying it with such formality and precision that even a creature as simple and literal-minded as the typical computer can understand it. There may be a lot of technical jargon, to be sure, but ever since people ditched assembly language for C, Java, Python and the like, the jargon has had more to do with the precise communication and specification of philosophical concepts than anything to do with shuffling electrons across the surface of semiconductors. Software development is a misnomer. It is the search for the devil that lies in the details, and the communication of the elevated understanding that comes from that search. The software itself, the implementation, it is both the microscope that is constructed to aid in the search; a vehicle for sharing that understanding, and a tool that has utility in the problem domain.

It is worth reading the full article, which more fully lays out a very interesting vision of collaborative development not totally unlike that supported and encouraged by Agile tools like Scrum, albeit with the addition of some novel, interesting and potentially useful narrative concepts to structure the interaction.

Productivity Models.

The economic realities of software development are rather different from other forms of labor.

Firstly, productivity rates vary enormously, both for teams and for individuals. For example, one individual can be many orders of magnitude more productive than another. Equally, that same individual may experience variability in productivity of many orders of magnitude from one project to the next. People who talk about the 10Xer effect seem to (incorrectly) assume that people are constant - once a 10Xer, always a 10Xer. That is patently not the case. Personal productivity has a lot to do with emotional state, motivation, interpersonal relationships, office politics, the built environment, diet, length of commute and serendipitous fit-to-role, not to mention the ready availability of appropriate software for re-purposing and re-use. Great developers can do a lot to maximise the probability of exceptional performance, but they cannot guarantee it.

These apparent nonlinearities; these step-changes in productivity, make software development extraordinarily difficult to manage. The many factors involved and their extreme effects challenge our conventional approaches to modeling, prediction and control.

So, we have a challenge. I do love challenges.

Let us take our initial cue from the Machine Learning literature. When faced with a high-dimensional feature (or parameter) space; sparsely populated with data, such as we have here, we cannot fall back on pure data mining, nor can we deploy fancy or highly parameterized models. We must approach the problem with a strong, simple and well-motivated model with only a few free parameters to tune to the data. To make a Bayesian analogy: We require very strong priors.

So, we need to go back to first principles and use our knowledge of what the role entails to reason about the factors that affect productivity. We can then use that to guide our reasoning, and ultimately our model selection & control system (productivity optimization) architecture.

So, let us sketch out what we know about software engineering, as well as what we think will make it better, in the hopes that a model, or set of models will crystallize out of our thinking.

For a start, software development is a knowledge based role. If you already know how to solve a problem, then you will be infinitely faster than somebody who has to figure it out. The time taken to actually type out the logic in a source document is, by comparison with the time it typically takes to determine what to type and where to type it, fairly small. In some extreme cases, it can take millions of dollars worth of studies and simulations to implement a two or three line bug-fix to an algorithm. If the engineer implementing the system in the first place had already possessed the requisite knowledge, or even if he had instinctively architected around the potential problem area, the expenditure would not have been required.

20-20 hindsight is golden.

Similarly, knowledge of a problem doman and software reuse often go hand-in-hand. If you know that an existing library, API or piece of software solves your problem, then it is invariably cheaper to reuse the existing software than to develop your own software afresh. It is obvious that this reuse not only reduces the cost incurred by the new use-case, it also spreads/amortizes the original development cost. What is perhaps less obvious, is the degree to which an in-depth knowledge of the capabilities of a library or piece of software is required for the effective (re)use of that software. The capability that is represented by a piece of software is not only embedded in the source documents and the user manual, it is also embedded in the knowledge, skills and expertise of the practitioners who are familiar with that software. This is an uncomfortable and under-appreciated truth for those managers who would treat software developers as replaceable "jelly-bean" components in an organizational machine.

Both of these factors seem to indicate that good old specialization and division-of-labor are potentially highly significant factors in our model. Indeed, it appears on the face of it that these factors have the potential to be far more significant, in terms of effect on (software engineering) productivity, than Adam Smith could ever have imagined.

I do have to admit that the potential efficiency gain from specialization is likely to be confounded to a greater or lesser degree by the severe communication challenges that need to be overcome, but the potential is real, and, I believe, can be achieved with the appropriate technological assistance.

So, how do you make sure that the developer with the right knowledge is in the right place at the right time? How do you know which piece of reusable software is appropriate? Given that it is in the nature of software development to be dealing with unknown unknowns on a regular basis, I am pretty confident that there are no definitive and accurate answers to this question.

However, perhaps if we loosen our criteria we can still provide value. Maybe we only need to increase the probability that a developer with approximately the right knowledge is in approximately the right place at approximately the right time? (Taking a cue from Probably Approximately Correct Learning). On the other hand, perhaps we need the ability to quickly find the right developer with the approximately right knowledge at approximately the right time? Or maybe we need to give developers the ability to find the right problem (or opportunity) at the right time?

What does this mean in the real world?

Well, there are several approaches that we can take in different circumstances. If the pool of developers that you can call upon is constrained, then all we can do is ensure that the developers within that pool are as knowledgeable, flexible and multi-skilled as possible, are as aware of the business environment as possible, and have the freedom and flexibility that they need to innovate and provide value to the business as opportunities arise. Here particularly, psychological factors, such as motivation, leadership, vision, and apparent freedom-of-action are important. (Which is not to say that that they are unimportant elsewhere)

If, on the other hand, you have a large pool of developers available, (or are willing to outsource) then leveraging specialization and division-of-labour has the potential to bring disproportionate rewards. The problem then becomes an organizational/technical one:

How to find the right developer or library when the project needs them?

Or, flipping the situation on it's head, we can treat each developer to be an entrepreneur:

Given the current state of the business or organization, how can developers to use their skills, expertise and knowledge of existing libraries to exploit opportunities and create value for the business?

Looking initially at the first approach, it is a real pity that all of the software & skills inventory and reporting mechanisms that I have ever experienced have fallen pitifully short of the ease-of-use and power that is required. These systems need to interrogate the version control system to build the software components inventory, and parse it's log-file to build the skills inventory. These actions need to take place automatically and without human intervention. The reports that they produce need to be elegant and visually compelling. Innovation is desperately required in this area. I imagine that there are some startups somewhere that may be doing some work along these lines with GitHub:- if anybody knows of any specific examples, please mention them in the comments.

Another area where innovation is desperately needed is in the way that common accounting practices treat software inventory. Large businesses spend a lot of time and effort keeping accurate inventory checks on two hundred dollar Dell desktop machines, and twenty dollar keyboards and mice, but virtually no effort in keeping an accounting record on the software that they may have spent many tens of millions of dollars developing. For as long as the value of software gets accounted for under "intangibles" we will have this problem. We need to itemize every function and class in the organization's accounts, associate a cost with each, as well as the benefits that are realized by the organization. Again, this needs to happen automatically, and without any manual intervention from the individual developers. As before, if anybody knows of any organization who is doing this, please please let me know.

Moving our focus to the second approach, how do we match developers with the opportunities available at a particular organization; how do we let them know where they can provide value? How can we open organizations up so that outsiders can contribute to their success? This is a much harder problem which may invite more radical solutions. Certainly nothing reasonable springs to my mind at the moment.

Anyway, I have spent too long ruminating, and I need to get back to work. If anybody has any ideas on how this line of enquiry might be extended, or even if you think that I am wasting my (and everybody else's) time:- please, please say so in the comments.

This line of thought builds upon and extends some of my previous thinking on the subject.

Monday 19 November 2012

Do Not Fear Conflict

The greatest creative partnerships that I have experienced have always involved an element of friction.

Not only are heated technical debates not to be feared, they are to be welcomed as they are one of the best ways (if not THE best way) of improving the design of a product.

...

(although there are limits, of course)

:-)

Wednesday 7 November 2012

Why I like "working in trunk"

It forces you to:
* Commit Frequently.
* Communicate with your colleagues.
- Both of which are virtues in their own right.