Friday 28 December 2012

The Political Economics of the Singularity


"The singularity", in some sense at least, is already happening, and has been for the past couple of years.

Look, if you will, at the disconnect between the technology sector (booming; "talent" (labor) in exceedingly short supply, salaries rocketing) and the rest of the economy (tanking, many people out of work, surplus of labor, salaries plummeting).

Technology brings many new and unfamiliar nonlinearities into the economy. Access to the mass market does not (in all circumstances) require mass employment. For example, I am one individual, working from my home office, and I can easily make improvements to and deploy a product used (indirectly) by millions of people with a few keystrokes -- no expensive bureaucracy, no factory, no paperwork, and no infrastructure beyond a handful of laptops, an internet connection, and a few dozen rented Amazon EC2 machines.

The funny thing is this: We all expected the singularity to swing the balance of power firmly towards the side of capital, away from labor. After all, surely capital would simply buy robots instead of employing labor? However, it is not quite panning out as we expected. Developing a new technology product is now ridiculously cheap -- capital costs have all but disappeared. Technology startups now look to investors not so much for capital, but for advice, access to customers, and reputation. Just as the need for labor has (unevenly) diminished, so the need for capital has also (unevenly) diminished.

It is becoming clear that (in some circumstances at least) the old balance of power between labor and capital has been swept aside, with both, in some sense, having been made irrelevant.

What replaces it? The answer to that question is not easy to discern. One thing is clear though: this new world is far more complex, richly textured, baroque and interesting, and the old political battle-lines will need to be redrawn with greater subtlety and nuance than they ever were before.

Wednesday 26 December 2012

Stupid is better than Smart (A call for humility)

Software development is a funny thing. It is full of nonlinearities and counterintuitive results.

Here is one of them: It is better to think of oneself as stupid (and be right) than it is to think of oneself as smart (and be wrong).

This sounds nonsensical, doesn't it? Surely it is better to be smart than it is to be stupid. Particularly since we spend so much of our time trying to demonstrate to other people just how smart we really are?

Well, once we start thinking of ourselves as smart relative to the rest of the population, it is all too easy to start thinking of ourselves as smart relative to the problems that we are trying to solve.

This is a problem, because *everybody* is pretty stupid in the grand scheme of things, and hubris is dangerous, particularly in the presence of complexity.

Complexity makes systems difficult to understand and manage, and difficult to fix when they go wrong. It is also difficult to gauge complexity, and humans have a consistent tendency to underestimate the complexity of unexplored functionality.

Let us consider a (mis)quote that illustrates the point:

"Debugging a system is harder than designing it in the first place, so if you are as clever as you possibly can be when you are designing the system, you are (by definition) too stupid to debug it."

The well known Dunning-Kruger effect applies here too: Because we tend to think that we are smarter than we really are, we tend to design systems that are too complex for us to debug and maintain.

In cases such as this, it is helpful to take a broader view. We may call somebody "Smart", but this term is defined relative to other humans, not relative to the problems that we need to solve.

We are frequently faced with problems that would be considered difficult by the very best of us. It is not a sign of weakness to acknowledge that; to treat the problems that we are trying to solve with the deference and respect that they deserve.

BAM! The Combinatorial Explosion and Planning

Solving problems requires understanding.

Understanding is built, in part, by measuring features.

One or two features might be sufficient to describe something simple, but describing something complex often takes many more.

With one or two features, a handful of measurements may suffice to capture the behavior of whatever-it-is that we are trying to understand. As the number of features that we need to measure grows, the number of measurements that we need to properly and comprehensively capture that behavior grows faster than fast: It grows stupendously, ridiculously, unreasonably, explosively fast. It grows so fast that it is akin to ... BAM! ... hitting a brick wall.

This is the Combinatorial Explosion:- Know it, and Fear it, because it is Important.

In other words, it stops being possible, in any reasonable, practicable sense of the word, to understand any system or problem whose description involves more than a small handful of features.
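A toy Python sketch (with illustrative numbers of my own choosing) makes the growth rate concrete: if each feature is sampled at just ten levels, an exhaustive set of measurements grows as ten to the power of the number of features.

```python
# Toy illustration of the combinatorial explosion: if each feature is
# sampled at just 10 levels, exhaustively measuring every combination
# of n features takes 10**n measurements.
def grid_measurements(n_features, levels_per_feature=10):
    """Measurements needed for an exhaustive grid over all features."""
    return levels_per_feature ** n_features

for n in (1, 2, 5, 10, 20):
    print(n, grid_measurements(n))
# 1 feature needs 10 measurements; 20 features need 10**20 ... BAM.
```

Ten features already demand ten billion measurements; twenty features demand more measurements than there are grains of sand on Earth.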

This has some terribly important consequences. Consequences that, as our world comes to contain more and more complex, interconnected systems, we particularly need to understand and internalize, for we fail to do so at our peril.

The past becomes less useful as a guide for predicting the future; our intuition becomes less effective; unintended consequences and unusual, exceptional events become more prevalent; our ability to predict what will happen next is weakened to the point where it disappears, and the utility of planning (in the way that we commonly understand it) diminishes dramatically.

At no point in history did plans ever survive first contact with the enemy, but as (consciously or unconsciously) we become more liberal with complexity, these traits and characteristics will become more and more prevalent.

We need to adapt, and learn to deal with them.

PS:

This line of thinking is particularly interesting when applied to organizations:
http://daltoncaldwell.com/thoughts-on-organizational-complexity

Which, of course, then go on to produce more complex products:
http://en.wikipedia.org/wiki/Conway's_law

Monday 17 December 2012

Development Automation - The Revolution

In response to:

http://www.businessweek.com/articles/2012-12-17/google-s-gmail-outage-is-a-sign-of-things-to-come

It is really great to see an article on continuous deployment in the mainstream media, as it is an issue about which I am extremely enthusiastic.

When we delay the release of a piece of our work, the psychological importance that we place on the quality of that work increases, so we spend more time manually finessing and polishing the work (often resulting in more delay, and possibly also raising the psychological barriers to release still higher).

This is all well and good when all testing and quality control must, by necessity, be manual. However, this is less and less common today, as automated testing and deployment practices become more common (In the form of Test Driven Development & Continuous Integration).

This (extremely) high level of automation and the work practices that go with it, together offer a revolutionary step-change in the way that we engineer complex systems:- a revolution that companies like Google and Netflix have embraced; a revolution that the rest of us ignore at our peril.

Instead of simply engineering products, we must engineer organizations and systems that produce world-beating products time and time again.

That is the revolution of DevOps, of Test Engineering, and of Development Automation.

And that is the only sane way to go about making complex systems today.

Friday 23 November 2012

Collaboration & Reuse

A spectrum of collaborative techniques exists between the two extremes presented below.

Loose Collaboration leads to Coarse Grained Reuse

* A collection of independent professionals rather than a tightly integrated team.
* Each developer brings different skills to the table.
* Each developer has a high level of expertise in his or her own domain.
* Each developer works independently and largely alone.
* Tasks are defined in broad terms: Developers have high levels of freedom and autonomy.
* Development Environments are varied, idiosyncratic and highly customized.
* Git / Hg - separate repositories for separate components.
* Few inter-dependencies.
* Quantum of re-use is large: A critical mass has to be achieved before something is placed into its own repository and shared.

Tight Collaboration leads to Fine Grained Reuse

* A tightly integrated team rather than a collection of independent professionals.
* Work carried out by both junior and senior developers.
* Developers work in close collaboration with one another.
* Tasks are small and fine-grained.
* Working Practices & Development environments harmonised & centrally controlled.
* Subversion / CVS - single monolithic repository
* Many inter-dependencies.
* Techniques like continuous integration and test-driven development keep everything on track.
* Quantum of re-use is small: individual scripts and configuration settings are easily shared.

Which approach creates a better work environment? Well, it totally depends upon your personality. Personally, I enjoy working with other people, so I tend to find the tightly integrated team approach more fun, but I am happy to acknowledge that this is not everybody's cup of tea. I must also admit that my preferences continue to change as I grow and develop.

Which approach creates better software? I have no idea. System architecture will definitely be impacted, in some manner, by the approach to collaboration. So, I imagine that the development of some particular classes of software is better served by the first approach, and others by the second. I cannot, for the moment, imagine what the mapping between class of software and most-appropriate approach would look like.

Guesses in comments, please.


PS:

There exist other spectra on which software development activities may be placed. One notable example is the predictive/reactive spectrum, which distinguishes between predictive, variance-minimizing techniques as espoused by SEI/CMMI (Chicago School), and reactive, "Agile" techniques. In contrast with the loose/tight collaboration spectrum, it is easier to judge the appropriateness of predictive vs reactive approaches, as the level of anticipated technical risk is a pretty good indicator of the degree to which a reactive approach is appropriate.

Thursday 22 November 2012

The Utility of Version Control Systems


In response to a post about the utility of version control systems.

The "backups" argument for version control is easily understood, but it is ultimately not that compelling. The real benefit that you get from version control is a dramatic cultural shift and a new way of working with collaborators.

First of all, a version control system provides a structured framework and vocabulary for collaboration and feedback. This makes it possible to talk about "changes" and "branches". This is not dissimilar from the "track changes" functionality in MS Word. Secondly, it facilitates the implementation of automated services such as continuous integration and continuous testing.
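As a concrete sketch of that shared vocabulary (assuming git 2.23+ is installed, and run in an empty scratch directory; the file names are invented for illustration):

```shell
# The vocabulary version control provides: commits ("changes"),
# branches, and diffs.
git init -q demo && cd demo
git config user.name  "demo"
git config user.email "demo@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -qm "initial draft"      # a "change": named, recorded, shareable
git switch -qc experiment           # a "branch": a parallel line of work
echo "second draft" > notes.txt
git commit -qam "revise draft"
git diff HEAD~1 -- notes.txt        # discuss the exact change, line by line
```

Every one of these nouns and verbs becomes part of the team's working vocabulary, in much the same way that "tracked change" and "comment" are part of the vocabulary of collaborative word processing.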

These two pieces of functionality enable a very significant cultural shift. In most domains, professionals are used to working independently for an extended period of time, only presenting results when they are finished, polished and (supposedly) error-free. This way of working leads to an intolerant attitude towards (rare) errors: The ensuing lack of humility creates and enforces a risk-averse culture. It quickly and inevitably becomes more important to be seen to be perfect than it is to innovate and solve problems.

The working culture and practices that version control systems and continuous integration support and encourage could not be more different. By publishing incomplete drafts and unfinished work, your mistakes and typos become part of the "public" record, along with a record of the way that you approach problems and do your work. This requires either a thick skin, or (more realistically) the realization that nobody really cares about your mistakes.

The benefit of this is strongest where the work being carried out by multiple professionals is highly interdependent. By sharing work earlier in the cycle, as well as more frequently, conflicts and misunderstandings are resolved much sooner. In fact, it is highly likely that professionals who need to collaborate closely will already be using some form of version control (such as track changes) in their work. Using tools like Subversion or Mercurial simply provides finer grained control and a different approach to sharing and communication (changes pulled on demand from the repository rather than pushed out via email).

So, the prime benefit of version control is in the fact that it provides a structured framework for sharing work, as well as discussing and coordinating the collaborative development of documents that refer to, or otherwise depend upon, one another. Another, equally important benefit is that it encourages transparency and humility through the practice of regularly sharing unfinished work. Finally, as stated in the article, it acts as a record for understanding how work was carried out, and for recovering from past mistakes.

Bridging the Gap

In response to an article talking about using narrative in software specifications:

There certainly exists a reality gap between "the business" and the people who end up implementing the system, but I suspect that it is as much a matter of practicalities as it is a matter of cultural differences. The developer, by necessity, thinks in terms of the details, because sooner or later, he must confront them and overcome them. The customer typically thinks at a higher level of abstraction. He never needs to go into the details, so why would he waste the time?

The act of bridging the gap is akin to weaving. You must go from a high level of abstraction, diving down to the details and then back out again, several times in succession, before you can build the shared understanding, terminology & conceptual framework that is required for the business side and the technology side to fuse into a single, effective whole.

This process is generally characterized by feelings of confusion and disorientation (on both sides) and sometimes accompanied by arguments, generally stemming from each side misunderstanding or failing to grasp the other's terminology and conceptual framework. All of this is natural, and part of the learning process. It is also exceedingly expensive and time consuming; a fact often under-appreciated at the start of the process.

You are probably familiar with the famous aphorism: "Computer science is no more about computers than astronomy is about telescopes". Well, if I may beg leave to corrupt and abuse the saying: "Software development is no more about computers than accountancy is about spreadsheets".

Software developers may spend a lot of time using computers, to be sure, but then again, so do accountants. Software development is about understanding a problem, communicating it, and specifying it with such formality and precision that even a creature as simple and literal-minded as the typical computer can understand it. There may be a lot of technical jargon, to be sure, but ever since people ditched assembly language for C, Java, Python and the like, the jargon has had more to do with the precise communication and specification of philosophical concepts than anything to do with shuffling electrons across the surface of semiconductors.

Software development is a misnomer. It is the search for the devil that lies in the details, and the communication of the elevated understanding that comes from that search. The software itself, the implementation, is at once the microscope constructed to aid in the search, a vehicle for sharing that understanding, and a tool that has utility in the problem domain.

It is worth reading the full article, which more fully lays out a very interesting vision of collaborative development not totally unlike that supported and encouraged by Agile tools like Scrum, albeit with the addition of some novel, interesting and potentially useful narrative concepts to structure the interaction.

Productivity Models.


The economic realities of software development are rather different from other forms of labor.

Firstly, productivity rates vary enormously, both for teams and for individuals. For example, one individual can be many orders of magnitude more productive than another. Equally, that same individual may experience variability in productivity of many orders of magnitude from one project to the next. People who talk about the 10Xer effect seem to (incorrectly) assume that people are constant - once a 10Xer, always a 10Xer. That is patently not the case. Personal productivity has a lot to do with emotional state, motivation, interpersonal relationships, office politics, the built environment, diet, length of commute and serendipitous fit-to-role, not to mention the ready availability of appropriate software for re-purposing and re-use. Great developers can do a lot to maximise the probability of exceptional performance, but they cannot guarantee it.

These apparent nonlinearities; these step-changes in productivity, make software development extraordinarily difficult to manage. The many factors involved and their extreme effects challenge our conventional approaches to modeling, prediction and control.

So, we have a challenge. I do love challenges.

Let us take our initial cue from the Machine Learning literature. When faced with a high-dimensional feature (or parameter) space; sparsely populated with data, such as we have here, we cannot fall back on pure data mining, nor can we deploy fancy or highly parameterized models. We must approach the problem with a strong, simple and well-motivated model with only a few free parameters to tune to the data. To make a Bayesian analogy: We require very strong priors.
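To make the analogy concrete, here is a toy Python sketch (the data points are invented for illustration): with only three noisy measurements, we commit to a strong, simple model with a single free parameter, rather than anything more flexible.

```python
# Sparse data: just three noisy measurements (invented for illustration).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# A strong, simple, well-motivated model with one free parameter:
# y = a * x, fitted via its least-squares closed form
# a = sum(x*y) / sum(x*x). With this little data, a flexible,
# highly parameterized model would simply memorize the noise.
a = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
print(round(a, 2))  # 2.04
```

The point is not the arithmetic, but the discipline: the fewer data points we have, the more of the model's structure must come from prior knowledge rather than from the data.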

So, we need to go back to first principles and use our knowledge of what the role entails to reason about the factors that affect productivity. We can then use that to guide our reasoning, and ultimately our model selection & control system (productivity optimization) architecture.

So, let us sketch out what we know about software engineering, as well as what we think will make it better, in the hopes that a model, or set of models will crystallize out of our thinking.

For a start, software development is a knowledge based role. If you already know how to solve a problem, then you will be infinitely faster than somebody who has to figure it out. The time taken to actually type out the logic in a source document is, by comparison with the time it typically takes to determine what to type and where to type it, fairly small. In some extreme cases, it can take millions of dollars worth of studies and simulations to implement a two or three line bug-fix to an algorithm. If the engineer implementing the system in the first place had already possessed the requisite knowledge, or even if he had instinctively architected around the potential problem area, the expenditure would not have been required.

20-20 hindsight is golden.

Similarly, knowledge of a problem domain and software reuse often go hand-in-hand. If you know that an existing library, API or piece of software solves your problem, then it is invariably cheaper to reuse the existing software than to develop your own software afresh. It is obvious that this reuse not only reduces the cost incurred by the new use-case, it also spreads/amortizes the original development cost. What is perhaps less obvious, is the degree to which an in-depth knowledge of the capabilities of a library or piece of software is required for the effective (re)use of that software. The capability that is represented by a piece of software is not only embedded in the source documents and the user manual, it is also embedded in the knowledge, skills and expertise of the practitioners who are familiar with that software. This is an uncomfortable and under-appreciated truth for those managers who would treat software developers as replaceable "jelly-bean" components in an organizational machine.

Both of these factors seem to indicate that good old specialization and division-of-labor are potentially highly significant factors in our model. Indeed, it appears on the face of it that these factors have the potential to be far more significant, in terms of effect on (software engineering) productivity, than Adam Smith could ever have imagined.

I do have to admit that the potential efficiency gain from specialization is likely to be confounded to a greater or lesser degree by the severe communication challenges that need to be overcome, but the potential is real, and, I believe, can be achieved with the appropriate technological assistance.



So, how do you make sure that the developer with the right knowledge is in the right place at the right time? How do you know which piece of reusable software is appropriate? Given that it is in the nature of software development to be dealing with unknown unknowns on a regular basis, I am pretty confident that there are no definitive and accurate answers to these questions.

However, perhaps if we loosen our criteria we can still provide value. Maybe we only need to increase the probability that a developer with approximately the right knowledge is in approximately the right place at approximately the right time? (Taking a cue from Probably Approximately Correct learning). On the other hand, perhaps we need the ability to quickly find a developer with approximately the right knowledge at approximately the right time? Or maybe we need to give developers the ability to find the right problem (or opportunity) at the right time?

What does this mean in the real world?

Well, there are several approaches that we can take in different circumstances. If the pool of developers that we can call upon is constrained, then all we can do is ensure that the developers within that pool are as knowledgeable, flexible and multi-skilled as possible, are as aware of the business environment as possible, and have the freedom and flexibility that they need to innovate and provide value to the business as opportunities arise. Here particularly, psychological factors, such as motivation, leadership, vision, and apparent freedom-of-action are important. (Which is not to say that they are unimportant elsewhere.)

If, on the other hand, you have a large pool of developers available, (or are willing to outsource) then leveraging specialization and division-of-labour has the potential to bring disproportionate rewards. The problem then becomes an organizational/technical one:

How to find the right developer or library when the project needs them?

Or, flipping the situation on its head, we can treat each developer as an entrepreneur:

Given the current state of the business or organization, how can developers use their skills, expertise and knowledge of existing libraries to exploit opportunities and create value for the business?

Looking initially at the first approach, it is a real pity that all of the software & skills inventory and reporting mechanisms that I have ever experienced have fallen pitifully short of the ease-of-use and power that is required. These systems need to interrogate the version control system to build the software components inventory, and parse its log to build the skills inventory. These actions need to take place automatically and without human intervention. The reports that they produce need to be elegant and visually compelling. Innovation is desperately required in this area. I imagine that there are some startups somewhere that may be doing some work along these lines with GitHub:- if anybody knows of any specific examples, please mention them in the comments.
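A hypothetical sketch of the sort of automation I mean: count which authors touch which files, from version control log output. (The sample text, author names and file paths below are all stand-ins; real input would come from something like `git log --format='author: %an' --name-only`.)

```python
from collections import defaultdict

# Stand-in for real version control log output.
sample_log = """\
author: alice
src/parser.py
src/lexer.py
author: bob
src/parser.py
docs/readme.md
"""

def skills_inventory(log_text):
    """Map each file to the set of authors who have touched it."""
    inventory = defaultdict(set)
    author = None
    for line in log_text.splitlines():
        if line.startswith("author: "):
            author = line[len("author: "):]
        elif line.strip():
            inventory[line.strip()].add(author)
    return inventory

inv = skills_inventory(sample_log)
print(sorted(inv["src/parser.py"]))  # ['alice', 'bob']
```

A real system would, of course, need to weight by recency and volume of changes, and present the results in the elegant, visually compelling form described above; but even this crude mapping answers "who knows this code?" automatically.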

Another area where innovation is desperately needed is in the way that common accounting practices treat software inventory. Large businesses spend a lot of time and effort keeping accurate inventory checks on two hundred dollar Dell desktop machines, and twenty dollar keyboards and mice, but virtually no effort in keeping an accounting record of the software that they may have spent many tens of millions of dollars developing. For as long as the value of software gets accounted for under "intangibles" we will have this problem. We need to itemize every function and class in the organization's accounts, and associate a cost with each, as well as the benefits that are realized by the organization. Again, this needs to happen automatically, and without any manual intervention from the individual developers. As before, if anybody knows of any organization that is doing this, please, please let me know.

Moving our focus to the second approach, how do we match developers with the opportunities available at a particular organization; how do we let them know where they can provide value? How can we open organizations up so that outsiders can contribute to their success? This is a much harder problem which may invite more radical solutions. Certainly nothing reasonable springs to my mind at the moment.

Anyway, I have spent too long ruminating, and I need to get back to work. If anybody has any ideas on how this line of enquiry might be extended, or even if you think that I am wasting my (and everybody else's) time:- please, please say so in the comments.

This line of thought builds upon and extends some of my previous thinking on the subject.

Monday 19 November 2012

Do Not Fear Conflict

The greatest creative partnerships that I have experienced have always involved an element of friction.

Not only are heated technical debates not to be feared, they are to be welcomed as they are one of the best ways (if not THE best way) of improving the design of a product.

...

(although there are limits, of course)

:-)

Wednesday 7 November 2012

Why I like "working in trunk"

It forces you to:
* Commit Frequently.
* Communicate with your colleagues.
- Both of which are virtues in their own right.

Wednesday 24 October 2012

Natural Semantics

Whilst doing some background reading on AMQP, I came across the following little tidbit. (from http://www.imatix.com/articles:necessary-changes-in-amqp)


"There is a concept I call "natural semantics". These are simple patterns that Just Work. Often they take a little while to appreciate. Natural semantics are like mathematical truths, they exist objectively, and independently of particular technologies or viewpoints. They are precious things. The AMQ exchange-binding-queue model is a natural semantic. A good designer should always search for these natural semantics, and then enshrine them in such ways as to make them inviolable and inevitable and trivial to use."

The notion of natural semantics resonates with my experiences as a developer:- which makes me happy, because something that I for so long felt to be true but could not articulate, now has a name.

Thursday 11 October 2012

Character Flaws.

My Modus Operandi is to identify my own serious character flaws, then exploit them for fun and profit!

I have mild obsessive-compulsive tendencies. I exploit this to help organize and categorize source documents & data, and to improve the discipline and quality of my work.

I also have mild bipolar tendencies. I use my mania to get lots of work done, then my depression to fix all of the bugs that I introduced during my manic coding sessions: My own personal (psychological) Carnot cycle.


We are all flawed in some way. But sometimes we can find real strength in apparent weakness.

Friday 28 September 2012

Array languages rock


As a long-time MATLAB developer, I may be biased, but: Array languages rock!

Why?

They allow the developer to express simple numerical relationships in an incredibly declarative manner. In this, they go further than typical functional languages, by allowing you to discard the use of "zip", "map", list comprehension constructs etc. for a variety of simple cases.

procedural < functional < array

Indeed, when one gets to the point where much of the standard library of an array language is able to accept and return collections, the developer becomes able (in many cases) to purge flow control from great swathes of the program.

As a result, one ends up with a program where control flow proceeds linearly, from top to bottom, greatly reducing the mental gymnastics that are needed to understand what the algorithm is doing, greatly improving readability, and dramatically improving productivity.
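A small Python sketch (using NumPy as a stand-in for an array language; the kinetic-energy example is my own invention) shows the progression from procedural, through functional, to array style:

```python
import numpy as np

# Total kinetic energy 0.5 * m * v**2 over many particles.
m = [1.0, 2.0, 3.0]
v = [2.0, 2.0, 2.0]

# Procedural: explicit flow control and index bookkeeping.
total = 0.0
for i in range(len(m)):
    total += 0.5 * m[i] * v[i] ** 2

# Functional: zip/map hide the index, but the plumbing remains.
total_fn = sum(map(lambda mv: 0.5 * mv[0] * mv[1] ** 2, zip(m, v)))

# Array style: the relationship is stated once, directly,
# as it would be written on paper.
total_arr = float(np.sum(0.5 * np.array(m) * np.array(v) ** 2))

print(total, total_fn, total_arr)  # all three are 12.0
```

The array version is the one that reads top-to-bottom as a plain statement of the mathematics, with no loop, no index, and no lambda in sight.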

Sure, the run-time performance of MATLAB often sucks, but there is no real reason why array languages cannot be highly performant. This is why I am feeling excited about the emergence of languages like Julia, that attempt to be the best of both worlds.

These new, emerging languages excite me:

  • Go: For its fast compiles, opinionated toolset, close-to-the-metal feel, approach to object-orientation and its support for concurrency.
  • Julia: For attempting to make array programming performant.
  • Cobra: For bringing design-by-contract back into vogue.

My ideal software tool:

Designed primarily to allow academics, data scientists, mathematicians, financial engineers and other domain experts to prototype ideas quickly, it will feel a lot like MATLAB with a highly interactive development environment, excellent plotting and visualisation tools, and a large standard library filled with functions that accept and return arrays.

It will allow these prototypes to be quickly and easily brought up to production quality standards by supporting (out of the box) quality control mechanisms like strong typing, design by contract, assertions, built in static analysis, test coverage tools & continuous testing tools.
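A minimal flavour of what that might look like, sketched in plain Python with assertions standing in for proper design-by-contract support (the `mean` function is just an illustrative example):

```python
def mean(xs):
    """Arithmetic mean, with pre- and post-conditions as assertions."""
    assert len(xs) > 0, "precondition: input must be non-empty"
    result = sum(xs) / len(xs)
    assert min(xs) <= result <= max(xs), "postcondition: mean within range"
    return result

print(mean([1.0, 2.0, 6.0]))  # 3.0
```

In the ideal tool, contracts like these would be first-class language constructs, checked during prototyping and verified (or compiled away) when the code is promoted to production quality.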

Like Go, it will compile very quickly to highly performant native binaries, support concurrency and multiprocessing out-of-the box, and be able to deploy to future production environments, which are likely to include highly heterogenous "cloud" environments as well as multicore CPU + GPU embedded systems.

Why declarative?


One of the challenges of software development is to try to identify those aspects of the system that are likely to remain constant, and those areas that are likely to change in future. These aspects are rarely specified, rarely understood, and rarely do we (as developers) get the architecture of the software right (in this respect).

Having done this, we try to make the transient, changeable aspects of the system as declarative as possible; perhaps going as far as to express them as statements declared in a configuration file.
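A hedged Python sketch of that split (the validation rules and field names are invented for illustration): the changeable part is pure data that could live in a configuration file, while the stable part is a small engine that interprets it.

```python
# Declarative part: what to check, expressed as data.
# This is the transient, changeable aspect of the system.
RULES = [
    {"field": "age", "op": "min", "value": 18},
    {"field": "name", "op": "required"},
]

def validate(record, rules):
    """Procedural part: the stable engine that interprets the rules."""
    errors = []
    for rule in rules:
        val = record.get(rule["field"])
        if rule["op"] == "required" and not val:
            errors.append(rule["field"] + " is required")
        elif rule["op"] == "min" and (val is None or val < rule["value"]):
            errors.append(rule["field"] + " below minimum")
    return errors

print(validate({"age": 16, "name": "Ada"}, RULES))  # ['age below minimum']
```

Changing the business rules now means editing data, not logic; the engine itself is the part we expect to remain constant.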

Having done a bit of Prolog programming in the dim and distant past, my intuition is that trying to make everything declarative is a mistake; one ends up tying oneself into knots. The mental gymnastics simply are not worth it. However, splitting the program into declarative-and-non-declarative parts seems reasonable.

In fact, the idea that a multi-paradigm approach can be used, with some parts of the system expressed in OO terms, some in functional terms, and some in declarative terms, seems to be gaining in acceptance and popularity.


Note: (See also Monads and some examples as another way of thinking about this)


Tuesday 25 September 2012

On the importance of over-communication.

In a crisis, do not seek to control information flow. Instead, over-communicate. Have confidence in people's intelligence and tolerance. Confusion is a natural part of learning, and disagreement and debate are a natural part of team decision making.

You will either get through the crisis as a team, or you will end up postponing problems for later, and sowing seeds of distrust to trip you up in future.

I would rather work in an argumentative and critical but open and honest organization than in an organization that papers over disagreement and misunderstanding with dishonesty and spin.

Friday 21 September 2012

Motivation trumps Method

We can either tackle the problem at hand by the most direct route possible, or we can try to build tools & abstractions to help future development efforts become faster and more efficient.

The first end of the spectrum corresponds largely to the notion of a "product-oriented" development attitude, and the second end to the notion of a "capability-oriented" development attitude.


The position on the spectrum that we select is driven both by immediate constraints and long term goals, and how we balance one against the other. Living exclusively at either extreme is generally harmful and counterproductive.

This dichotomy is very loosely related to that which contrasts the predictive/CMMI/waterfall development approach with the reactive/agile approach to development in that the reactive/agile approach is very well served by a toolset that has come into existence through the efforts of a capability-oriented team.

However, the predictive-vs-reactive dichotomy is really about managing risk, whereas the product-vs-capability dichotomy is really about amortizing costs.

There is no “right” answer; it all depends upon the situation that we are in and the culture of the organization(s) with which we are operating. One thing that is really important, however, is motivation. Software is all about detail, and detail is all about attention, and attention is all about motivation, and motivation is all about the team. Maintaining motivation and enthusiasm in the team just about trumps everything else.

We should do what we enjoy, because we will be so much more productive that any other consideration will be secondary.

Development is Exploration.


People talk about software being grown, not built. Here is another perspective:

Software development is an act of exploration, uncovering a set of truths or relationships, whether perpetual or contingent, (and therefore transient), together with an abstract machine to turn that set of truths into a system that can interact with the problem and thereby do useful work.

Thursday 20 September 2012

Multi-project integration.


This is a response to the question: "How to maintain same code fragments on multiple projects":

The DRY principle states that there should be a single authoritative source of truth. This dictates the use of a single *source* repository for all program logic, with no duplication, and files organized to promote sharing and reuse of documents.

The pragmatic requirements of communication in a distributed, multi-team environment suggest that there should be multiple independent repositories, one for each project or collaboration.

This presents a problem, as the requirements (at first glance) would seem to be contradictory. However, it is possible to use scripting & automation to smooth over this awkward fact, and to create two separate but consistent interfaces to the body of work.

The one unified repository acts as the single authoritative source of truth regarding all of the projects. The build process for each project copies all the files used by that project (and only those files) into an intermediate location, then builds from that intermediate location. (Unison or some similar tool can be used to move deltas instead of whole files.)

These intermediate locations can be used as local working copies for the set of secondary, derived, or downstream repositories. Post-commit hook scripts on the authoritative repository will update all of the intermediate locations, and, for each in turn, check if it has changed, and make the same commit to the corresponding secondary repository if a change is detected.

Likewise, post-commit hook scripts on the intermediate repositories can be implemented to update files in the central repository, enabling inbound changes from collaborators to propagate to the authoritative repository.

This way, the multiple secondary repositories are kept in sync with the single authoritative source repository, and the build process ensures that the secondary repositories contain all of the (possibly shared) documents and other files that are needed for the build to succeed. Finally, and most importantly, the development and build process ensures that files may be edited in one place and one place only, and that all copies are kept consistent and up to date.

This solution is tool agnostic: it should be possible to use Svn, Hg, or Git as repositories, and bash or batch to implement the synchronizing logic. Something like Unison may be useful to sync the downstream repositories with the authoritative repository.
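A minimal sketch of the propagation step might look like the following Python (the manifest, paths, and project names are all hypothetical, and a production version would also need loop prevention between the upstream and downstream hooks):

```python
import subprocess
from pathlib import Path

# Hypothetical manifest: which files (relative to the authoritative
# repository root) each derived project repository needs.
PROJECT_MANIFESTS = {
    "project_a": ["shared/utils.py", "project_a/main.py"],
    "project_b": ["shared/utils.py", "project_b/main.py"],
}

def copy_project_files(authoritative_root, staging, files):
    """Mirror the listed files into the intermediate staging area;
    return True if anything actually changed (so the caller knows
    whether a downstream commit is needed)."""
    changed = False
    for rel in files:
        src = Path(authoritative_root) / rel
        dst = Path(staging) / rel
        if not dst.exists() or dst.read_bytes() != src.read_bytes():
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_bytes(src.read_bytes())
            changed = True
    return changed

def post_commit_hook(authoritative_root, staging_root):
    """Run after each commit to the authoritative repository: refresh
    each intermediate location and commit to its secondary repository."""
    for project, files in PROJECT_MANIFESTS.items():
        staging = Path(staging_root) / project
        if copy_project_files(authoritative_root, staging, files):
            subprocess.run(["git", "add", "-A"], cwd=staging, check=True)
            subprocess.run(["git", "commit", "-m",
                            "Sync from authoritative repository"],
                           cwd=staging, check=True)
```

The same shape works with Svn or Hg by swapping the two `subprocess.run` calls for the equivalent commands.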

Monday 17 September 2012

Capability vs Product Orientation


In response to: http://iamwilchung.wordpress.com/2012/09/17/make-tools-for-yourself-and-others-around-you/

I have always thought of the creation of internal tools and libraries in terms of the following dichotomy:

"capability-oriented" vs "product-oriented"

The "capability-oriented" approach seeks to maximise the capabilities of the organization over the long term, opening up opportunities for cost amortization over disparate products. By contrast, the "product-oriented" approach is "lean" in the sense that it seeks to minimize work not absolutely necessary for the current product.

The "product-oriented" approach seeks a local optimum, and is prevalent in industries with heavyweight cost controls and minimal opportunities for investment (Defense, for example), whereas the "capability-oriented" approach seeks a global optimum, and is only possible in situations where organizational leadership is capable and willing to get involved with detailed technical planning.

As with all dichotomies, organizations must seek a compromise position that varies from product to product and project to project.

A sweet-spot that I have frequently found: Third-party libraries are necessarily written in a very general manner, capable of being used in a wide variety of ways. A particular organization generally only uses a small subset of the features in a particular third-party library. As a result, an internal library that acts as a thin wrapper around an external/third-party library can help encourage and enforce a consistent approach within an organization, helping to create an institutional culture and improving interoperability between teams whilst reducing learning barriers for new team members.
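As a toy illustration of the pattern (using the standard library's json module as a stand-in for the external dependency), the wrapper simply pins down the organization's preferred conventions and hides the rest of the third-party API:

```python
import json

# A thin in-house wrapper: the (hypothetical) organization only ever
# needs UTF-8 files, sorted keys, and two-space indentation, so the
# wrapper fixes those choices once, and new team members never need
# to learn the wider underlying API.
def save_record(record, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, sort_keys=True, indent=2)

def load_record(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The wrapper also gives the organization a single point at which to swap out or upgrade the underlying library later.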

Thursday 13 September 2012

Automated micro-outsourcing to reduce design costs by supporting hyper-specialization.

I have been re-reading Fred Brooks's "No Silver Bullet" paper. It is fantastic stuff, and I am glad that I came back to it (it has been several years since I last looked at it).

The paper is showing its age just a teensy tiny little bit (his enthusiasm for Ada and OOP seems slightly too strong in retrospect). With the benefit of experience, we now have a somewhat better understanding of how people react to complex tools, and as a result now tend to favor tools that encourage (or enforce) simplicity and consistency.

Anyway, that is not what I want to write about. The bit that particularly popped out at me this morning is the section on "Buy vs build". This chimes well with my belief that the only effective way to control software costs is through amortization. However, I had always limited my thinking to amortization within an organization, and so focussed my efforts entirely on how one could organize the source code repository to encourage reuse of components & libraries from within the organization, or engineer tools to help encourage reuse. (For example, an always-on background structural-similarity search to identify and bubble-up structurally similar source documents).

However, this does not address the concerns that one of my other closely-held beliefs implies. I believe that much of the value that is derived from development activities comes from the learning and resulting improvements in knowledge and understanding of the individual developers. This knowledge and expertise remains with the individual, and is a good part of the 10x performance differences that many people observe in software developers.

So, how about using the same sort of background structural-similarity search to identify potential collaborators / consultants whose expertise fits in well with the system that you are trying to create. In the same way that an advanced IDE would make the suggestion: "perhaps you want to buy this library to help you solve this problem here...", you could imagine the same IDE making the suggestion "perhaps you want to hire this developer to help you here..."

Or maybe it is just my recent involvement in the advertising industry that makes my mind turn so quickly to commercial transactions. :-)

Wednesday 12 September 2012

Obsessive Perfectionism: Success and Failure.

Perfectionism is both a blessing and a curse: the obsessive pursuit of perfection is necessary for the creation of truly great works, but it is also responsible for the most abject and catastrophic failures.

Attention and Agility

As individuals and as organizations, we are only capable of paying attention to a teensy tiny little bit of the world at any one time.

The bit of the world that receives our attention is improved, and our performance with respect to it is (generally) solid, but we always forget that it is only a tiny little piece of the puzzle. Errors and omissions tend to happen where our attention is not directed, at the boundaries between individual and team responsibilities, for example, or where the play is fast moving and priorities quickly shifting.

The primary purpose of automation is to help us correct and compensate for that deficiency; to allow us to be more organized and systematic, without compromising our ability to be mentally agile and flexible.

--

It is important to note that there are other aspects of automation and tooling that exacerbate, rather than compensate for, this deficiency. In particular, as Fred Brooks notes in his spookily prescient "No Silver Bullet" essay:


... the screens of today are too small, in pixels, to show both the scope and the resolution of any seriously detailed software diagram. The so-called "desktop metaphor" of today's workstation is instead an "airplane-seat" metaphor. Anyone who has shuffled a lap full of papers while seated between two portly passengers will recognize the difference--one can see only a very few things at once. The true desktop provides overview of, and random access to, a score of pages. Moreover, when fits of creativity run strong, more than one programmer or writer has been known to abandon the desktop for the more spacious floor. The hardware technology will have to advance quite substantially before the scope of our scopes is sufficient for the software-design task...

(This is why a couple of big 30" monitors is still not enough...)

Socio-Technical Architecture

Caught in an iconoclastic mood, I would like to challenge some conventional thinking: that distributed system architecture should favor lots of little independent services, each doing one thing, and doing one thing well.

The conservation of complexity principle (Brooks, No Silver Bullet) suggests that whenever we think that we are simplifying something, we are really just pushing the complexity around to somewhere where it is less visible, and less easily managed.

I think that this is sometimes the case here - If you have really good operations & sysadmin resources, and limited developer resources, then the lots-of-little processes architecture is probably appropriate.

If, on the other hand, you have good developer resources, and limited operations & sysadmin resources, then all that you are doing is shifting the complexity to someplace where you lack the tools to manage and control it.

In summary, the quality of an architecture depends more on how it complements and supports the capabilities of the team and toolset than it does on any fundamental natural law.

Software engineering is a social science, not a hard engineering discipline, nor a mathematical or physical science.

Tuesday 11 September 2012

On the Primacy of Empirical Truth

Truth comes only (*) from testing oneself against Nature. Too often, our lives are spent, utterly wasted, in a social discourse that has no grounding, no anchoring in any form of truth, but rather in meaningless social judgementalism: "x is good", "y is bad".

http://www.aaronsw.com/weblog/anders

(*) Covers both empiricism and rationalism - IMHO empirical truths are less "truthy" than rational truths, but infinitely more useful.

"In God we trust; all others must bring data" - W. Edwards Deming.

Thursday 6 September 2012

Quadrature ICS

I have been following the (heated) debate around Git's usability that has been going on recently.

So I woke up this morning and thought, y'know, what the world really needs is another Version Control System.

:-)

Still just an idea, I present to you: Quadrature Integration Control System (QICS) - the anti-Git.

The basic ideas:


(1) Ruthless minimization of required user interaction.

This is both to reduce workload, and to allow non-technical users to work alongside developers. (This is a big deal for me:- I need to get work from my collaborators across the business into version control, so my automated scripts can pick up their work & data for integration into production systems)

You should only need to interact with the system to resolve a conflict; otherwise the system can operate quite happily in the background, listening for changes to a directory tree and making commits to the ICS as and when files change. (Unison, Wave)

The minimization of interaction should extend to installation and configuration. A cross platform stack with no runtime is desirable (Golang seems ideal for this), as well as a peer-to-peer network architecture, with optional central server.

Sophisticated users should have as much control as they want, up to and including provision of APIs for all aspects of behavior. This implies a post-hoc approach to logging, where the user can explain what the changes were after they happened, and decorate the log with those explanations.


(2) The target users are a very large, globally distributed, multi-disciplinary development team in a commercial organization. Think Google-scale development, but with a mixture of programmers and (intelligent non-programmer) domain experts. We will assume intermittent or poor network access.


(3) A convention-over-configuration, "opinionated software" approach, with a set of prescriptive and detailed conventions for repository organization, workflow, and branch management. By creating conventions, we can aid and support development and workflow automation efforts to further reduce burden and increase efficiency.


The focus will be on supporting continuous integration and resolving conflicts, rather than supporting branches and the management of concurrent, independent versions. The assumed model is one in which, by default, all work is integrated together in trunk as quickly as network connectivity permits.

The QICS way of thinking about the problem:


A stream of lots of little deltas flowing around a distributed multicast/pub-sub network - similar problem to distributed databases (CAP theorem applies) - take similar approach to some NoSQL DBs to solve the problem - offer only eventual (or asymptotic) consistency. In fancy language: "Present the user a locally coherent view of an asymptotically coherent whole."


Concepts:

Upstream = deltas coming from other collaborators to the local/client working copy.
Downstream = deltas going from the local/client working copy to other collaborators.
Local = the client machine.
Remote = the rest of the network of collaborators.
Local Working Copy = The set of files and directories that the user interacts with.
Delta buffer = storage area containing a set of deltas (or differences), equivalent to the notion of a repository.

To support operation in situations when the network connection fails, we will need local delta-buffers, perhaps one local-downstream store and one local-upstream store, as well as a mechanism to shuttle information from the local downstream store out to the remote network, and in from the remote network to the local upstream store.

The local client/user probably does not want the whole repository on his local machine, so he will probably need some mechanism to specify a region-of-interest in the repository (a set of directories) so he can pull in only pertinent changes. (Hence a Pub/Sub-like communications mechanism)

Locally, we will need a mechanism to monitor file changes, calculate the deltas and store them in the downstream store, as well as a mechanism to take deltas from the upstream store, resolve conflicts and apply them to the local working copy.

It would be good if we could detect moved/renamed files without explicit user interaction, so per-file checksums are probably a good idea.

Heuristic mechanisms are good enough.
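The rename-detection heuristic sketched above can be illustrated in a few lines of Python (stdlib only; the snapshot/diff scheme is invented for the example):

```python
import hashlib
from pathlib import Path

def snapshot(root):
    """Map relative path -> content checksum for every file under root."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*") if p.is_file()
    }

def diff_snapshots(old, new):
    """Classify changes between two snapshots. A file that vanished at
    one path and reappeared with the same checksum at another path is
    heuristically treated as a rename, not a delete plus a create."""
    added = {p for p in new if p not in old}
    removed = {p for p in old if p not in new}
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    renames = {}
    for gone in set(removed):
        match = next((p for p in added if new[p] == old[gone]), None)
        if match is not None:
            renames[gone] = match
            removed.discard(gone)
            added.discard(match)
    return {"added": added, "removed": removed,
            "modified": modified, "renamed": renames}
```

As the post says, heuristics are good enough here: two distinct files with identical content would confuse this scheme, but either resolution is acceptable.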

Wednesday 5 September 2012

Test vs Design

I am not totally sold on the argument that tests improve design.

Since you need to mock out dependencies when under test, a function that is going to be placed under test cannot be allowed to instantiate its own dependencies.

They must be externally instantiated and passed in so that, in the tests, they can be replaced with mocks.

This means that the function-under-test is now required to expose some of its implementation detail in the interface, making the interface more complex than it really needs to be.

(IMHO the interface is the thing that, above anything else, we really need to keep simple).

Sometimes, when using Python or another language that supports default parameter values, the dependencies can be passed in as optional input parameters that are used only by the tests, but this really degrades the value of the interface declaration as a piece of useful documentation.


One approach that I have found useful upon occasion is to split the interface into a set of internal functions specifically intended for testing purposes, which are wrapped (thinly) by a set of external functions for the client application to use. That way, the amount of logic not being tested is minimized, but the externally-facing interfaces are kept minimal and clean.


This is still more complex than I would like, but hey, the technology is still in its infancy.

In time, I would like to be able to give tests some access privileges above and beyond those which ordinary client logic enjoys; for example, the ability to reach in and pull a labelled block out of a function, and manipulate it using some form of reflection. I freely admit that such dynamism is not normally a good idea, but it is perhaps excusable if its use is restricted to a strict, single-purpose mechanism specifically engineered to support testing.

Tuesday 4 September 2012

Functional makes for an easier Agile


An "agile" approach is concerned with how we approach the management of resources in environments that contain risk factors that cannot be a-priori controlled. Specifically, it is about increasing the tolerance of systems and processes to unanticipated change.

"Functional" development is concerned with how we reason about problems, suggesting a perspective that values a decomposition of the problem into stateless declarative truths over a decomposition of the problem into stateful (albeit small and easy to understand) components (each a small simulacrum of some corner of the problem domain).

Here I am assuming that there is a close (identity?) relationship between a complete analysis of the problem and the solution to that problem - although this may simply be some little remnant of Prolog haunting what is left of my faculties.


It should be possible to behave in an agile manner with both functional and (well designed) OO models, but in my mind, the statelessness of pure functions makes it very much easier to reason about them, and the implications of using them, in unanticipated and rapidly changing situations.

In other words, functional programming is a human --> human communications mechanism, where the extra rigor that the statelessness requirement imposes reduces the number of possibilities that the reader must consider when attempting to modify a model to meet a changed (or novel) requirement.
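A toy contrast (invented for illustration): with the stateful version, predicting a result requires knowing the history of every prior call, whereas the pure version can be read, tested, and changed entirely in isolation:

```python
# Stateful version: the result of add() depends on hidden history,
# so the reader must trace every earlier call to reason about it.
class RunningTotal:
    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total

# Pure version: the result depends only on the arguments, so any
# call site can be understood (and safely modified) on its own.
def running_totals(xs):
    totals, acc = [], 0
    for x in xs:
        acc += x
        totals.append(acc)
    return totals
```
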

In summary, "Agile" and "functional" programming are different beasts, but the "functional" approach is very well suited to agile development by virtue of it's enforced "discipline" of statelessness and corresponding reduction in degrees-of-freedom, resulting in reduced mental burden on the maintainer.

We can deduce something from this.

It appears that the more "disciplined" the modeling and programming paradigm, the easier it is to modify and manipulate those models, and to operate in an agile manner, so whilst disciplined programming techniques are not necessarily required for an agile approach to development, the ease with which the agile approach may be taken up benefits greatly from the adoption of highly disciplined programming techniques such as TDD and Functional programming.

If we want to push forwards the state-of-the-art, we should be looking for more ways that we can automatically enforce restrictions upon software models, as well as improve our ability to reason about them.

Hence my interest in TDD, formal methods, assertions and all forms of development automation as a mechanism to make agility easier to achieve.

Optimism and Economic Expansion

There is no upper bound on human ingenuity. There is no fixed limit on talent. Not everybody can rise to the occasion, but only a tiny fraction of those who can are ever put to the test.

Wednesday 29 August 2012

Demographics and the Devil

I wonder how much specialization is driven by demographic shifts.

The number of educated, intellectually involved people is growing at a rapid rate. At the same time, the ability for people to communicate and share their knowledge is also increasing dramatically. All of this leads to a large quantitative shift in the amount of information and knowledge being generated and consumed.

Quantity being a quality all of its own, such large quantitative changes are bound to be accompanied by a qualitative shift in the way that we use that information, the way that we contribute to the debate, and the way that we do our jobs. Increased specialization is an obvious consequence of this (ongoing) demographic change, but not the only one. What it means to be a specialist (or even a generalist) has changed also.

I am convinced that there is an increasing need for generalists and management professionals to be able to reason about issues at a level of fine detail that was not previously required or possible. Taking a broad-brush, 10,000 ft high approach, whilst still necessary, is no longer sufficient; we need to collectively sweat the details, because that, they say, is where the devil lies.

Pride and Culture

Like the architects of the Toyota Production System, I have been influenced a lot by what W. E. Deming wrote, although he was concerned with the issues surrounding the management of industrial production rather than the management of design shops, so there is perhaps some scope for mismatch between our requirements and his philosophy.

The thing that really resonates with me is the idea that pride in your work, and pride in your organization are both critically important motivational factors, and the idea that the pursuit of quality can be used to both improve productivity and the pleasure that work provides. 

This also chimes with this (world class) investment management advice: Invest in companies and organizations with a "soul" - with a strong, motivating culture that really makes people believe in, and belong to, an organization, over those that are hollow and impersonal. 

On a personal level, I want to take pride both in the organization for which I work, and the designs that I produce; I want to buy in to the culture and to believe in the mission. Moreover, I want to work for a company that has the homeostatic mechanisms in place to support that culture over the long term, and the profitability to sustain it. 

If this seems like a lot to ask, it is only because of the sad state of the world as it is right now. With a bit of organization, effort, and leadership, this sort of result is perfectly achievable, and with a bit of automation (yaaay technology!) it is perfectly reproducible.

Tuesday 28 August 2012

Breaking the liberal/conservative axis.

All models are wrong, but some of them are useful. 

The liberal/conservative dichotomy is a useful model, but the one thing that is more true of software than anything else is its nonlinearity (and malleability).

If constructing an exception to a rule is possible, then we can automate it. If we can automate it, we can make it easy, and if we can make it easy, we can make it mainstream. In addition to describing the way things are, we also need to describe the way things should be.


Then make it happen.

Friday 24 August 2012

Is YAGNI always a good thing?



I am not sure that YAGNI really is good practice in all situations.

A minimalistic "only do what we need to right now" approach is appropriate under a wide range of situations, but really dramatic agility and performance comes about when you have developed a set of capabilities that are well-matched to emerging requirements.

I admit that this partly comes down to luck, but it is a powerful and uplifting experience when it happens.

So, typically, the business hands us a requirements spec, and we do some analysis to plan out how to tackle the project; a work breakdown is done, tasks are created, and we work away at those tasks to make a product that meets the spec. So far so conventional. All stakeholders are focussed on the product, and really do not think beyond the immediate project, because to do so would be wasteful, right?

As software engineers, we have been told that modularity and reuse are good things, so we over-engineer components according to imagined modes of variation, but because everything is being done in the context of this one project, nothing ever stands any real chance of being reused anyway, because the organizational communication channels that are required to support that reuse simply do not exist. As a result, the software ends up being overcomplicated and expensive.

There are two (not necessarily exclusive) ways of tackling this problem. The first is pure YAGNI: "do not be precious". Your work is probably going to get thrown away anyway, so do the simplest thing that works for the immediate problem. This is, I think, a healthy way of thinking about problems, but it only gets you so far. The second is capability-building, which says: OK, while I am working away on this problem, let me make a bunch of libraries so that the next time I come to the same problem, I can move faster. Each of the libraries can start out simple, but as I move from project to project, product to product, my capabilities will improve and increase over time.

This is, in a sense, the opposite of YAGNI, but has helped me out on more than one occasion, when requirements come in that fit in very well with the capabilities at my disposal. On one particularly memorable occasion, I was able to turn around a small piece of functionality in about 45 minutes that one of our partner organizations had been working on for over a month. This simply because I had tools at my disposal that were a very good fit to the problem at hand.

My point is this; unless we build real capabilities (software libraries) that provide strategic differentiation and competitive advantage to the organizations for which we work, we will be forever stuck in a cottage industry, reinventing the wheel over and over and over and over again.

So:- whilst for many bike-shedding things it might be true that You Ain't Gonna Need It, if you choose carefully, maybe for some things You Are Gonna Need It, and when you do, you will be grateful that you have a capability to hand that enables you to spin on a sixpence and get product out the door FAST.

Agile: The 45 minute sprint

How long should a sprint be? 2 months? 2 weeks? Let us go right to one extreme. How about 45 minutes? Here is some homework for you: What tools would you need, what work would you have to do to make it possible to do an entire sprint in only 45 minutes?

Foundations of Agility


Being lightweight and low structure does not necessarily have anything to do with being Agile, although instances of the two characteristics do tend to be correlated in practice.

It is my strong belief that a humble approach to risk management is the distinguishing characteristic of agile processes.

Think about it like this: The term "Agile" deliberately evokes responsive, direction-changing imagery. Indeed, all four values in the Agile Manifesto either talk about combating impediments to change, or supporting mechanisms for change. The only thing that is not explicitly stated is the reason why responsiveness and changeability are important, although it is (I think) pretty obvious.

We do not know the future, so we have to be prepared for the unexpected. We cannot plan for everything, but that should not stop us from doing the best that we can. It is not "be prepared" in the boy-scout sense of being prepared for the worst, but rather a more sophisticated combination of anticipating change and removing current impediments to possible future maneuvers.

Agile is the technological equivalent of maneuver warfare, except that the objective is merely to keep up with our interlocutors' and partners' changing needs, rather than to overtake them and seize the initiative.

Processes and tools that support and facilitate change can (and sometimes do) have tremendous utility. It is just that (in the past) processes have tended to create harmful institutional inertia. Likewise, documentation traditionally has proven difficult to keep up-to-date in the face of changing systems and requirements, but this is not necessarily the case with documentation that is generated automatically from source comments. Similarly, contracts can be handled in an agile manner, providing that there is a well-established, fast, and easy way to handle change notices. Finally, as Ike Eisenhower said, "Plans are worthless, but planning is everything". If the presence of a plan imparts inertia to a project, or if the process of planning takes anything other than an insignificant amount of time and resource away from execution, then it is clear that the activity is harmful; but to do away with planning altogether is folly.

Wednesday 22 August 2012

What is Agile?

Agile is humility. You do not have a crystal ball:- the unexpected can, (and frequently does) happen in this most uncertain of worlds.

Agile is old-fashioned, conservative decision making. Do not take risks that you can avoid.

Agile is the art of not committing to a course of action when you do not know what the future will bring. Postpone risky decisions, wait until you have more information before committing time and resources.

Agile is risk-aware prioritization. Schedule simple, low-risk activities that give you feedback and information before complex, high-risk activities that could go wrong.

Agile acknowledges the limits of prediction, and seeks to improve responsiveness and the capability to react quickly to external changes.

Agile is neither lightweight nor easy. You can be bad at being Agile just as you can be bad at any other activity.

Technology can help. The ability to respond quickly to external changes is an organizational capacity that must be built and nourished, just like any other capability that you want your organization to have.

You cannot plan change, but you can plan for change.

Monday 20 August 2012

Auto-disintermediation

We can increase transparency and prosperity through the disintermediation of informational and economic transactions: I would rather trust a dumb algorithm than a conniving human.

Friday 17 August 2012

Sensor system (machine vision) development team organization and management.


A note on the organization of teams and processes for the development of automotive industry machine vision systems.

Sensor systems (particularly machine vision systems) operating in unconstrained environments pose significant design and engineering challenges. It is, however, possible to design such systems, and not necessarily at exorbitant cost either, but doing so means adopting some slightly unusual and highly disciplined engineering practices.

First of all, the fact that the sensor system is to operate in an unconstrained environment with a very large number of free variables necessarily means that there will be lots of "black swan" / "long-tail" type problems that are not known in advance: the bird that flies in formation with the car for a short while; the van carrying a mirror on its side; the torrential downpour limiting visibility; the train moving alongside the road; that sort of thing.

As a consequence, you need moderate-to-large volumes of pathological test data, together with the associated machinery and automation that will allow candidate system configurations to pit their wits against the worst that the test system can throw at them. 
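A harness along these lines can be sketched very simply. Everything here is invented for illustration — the test-case format, the `detect()` stand-in and the case names are assumptions, not part of any real system; a real harness would drive the actual vision pipeline over recorded sensor data.

```python
# Minimal sketch of a pathological-case regression harness.
# detect() is a trivial stand-in for the candidate system under test.

def detect(frame):
    """Stand-in for the candidate sensor system under test."""
    # Placeholder logic: report a vehicle when the frame says one is present.
    return "vehicle" if frame.get("contains_vehicle") else "no_vehicle"

def run_regression(test_cases):
    """Pit the candidate against every pathological case; collect failures."""
    return [case["name"] for case in test_cases
            if detect(case["frame"]) != case["expected"]]

# A couple of hypothetical "black swan" cases, named after the examples above.
pathological_cases = [
    {"name": "bird_in_formation",
     "frame": {"contains_vehicle": False}, "expected": "no_vehicle"},
    {"name": "van_carrying_mirror",
     "frame": {"contains_vehicle": True}, "expected": "vehicle"},
]
```

The value is not in any individual case but in the accumulated library of them, run automatically against every candidate configuration.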

The problem also requires that we get the basics right early on, so that the team can focus on the real problem without tripping themselves up with regressions and other problems (all too easy to do). Whilst solutions can be relatively simple (indeed, they should be as simple as possible), the amount of work required to cover the basics is large, and no one person can make sufficient progress on their own. To succeed requires both teamwork and a level of focus, organization and automation that is not commonly found in an academic environment. It also requires that we master the tricky balance between discipline at the team level and freedom to innovate at the individual level.

It is important that we pay attention to the organization of the system. We need to think about it for a little bit, and pick a good one, but move quickly and avoid bike-shedding. A wide range of organizations can be made to work. (Here is one I like). The organization of the system must be reflected in the organization of the team, and the organization of the filing system. The filing system would (ideally) be put in the version control system / repository, and the log used as a management tool to monitor progress. The important thing is that the organization is agreed-upon, and reflects the organization of the capabilities or systems being developed.

Having picked a good organizational structure, the next thing to focus on is total quality through test automation. 

Weekly meetings, daily standups, documentation & wikis etc... are all good ideas, but they are much less important than AUTOMATION, which IS ABSOLUTELY CRITICAL to success. This is why: humans are liars. We cannot help it. When we talk to other people, especially our superiors, we try to present ourselves in the best light. We spin the situation and use weasel words to make ourselves look better than we are. Honesty is crucially important for the team to make decisions. The only way to achieve honesty is to take reporting out of the hands of human beings.

You will need management buy-in for this. Use financial audit as an analogy. We audit the company books so investors can have some degree of faith in them, so why not also audit design progress?

First of all, you will need some hardware. Start with one server, running the repository and a simple post-commit hook script (ideally also stored in the repository). :-) Continuous Integration software like Hudson is not necessary (and perhaps not desirable); better results can often be achieved with a Python, MATLAB or Go script (or some such) to run tests and measure performance.
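The body of such a hook script might look something like the following sketch. The results-file format, the default test command and the function name are all illustrative assumptions, not a prescription.

```python
# Hypothetical post-commit hook body: run the test suite and append a
# timestamped pass/fail line to a shared results file for later reporting.
import datetime
import subprocess

def run_tests_and_record(results_file,
                         command=("python", "-m", "pytest", "--quiet")):
    """Run the test command and append one summary line per commit."""
    proc = subprocess.run(command, capture_output=True, text=True)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    status = "PASS" if proc.returncode == 0 else "FAIL"
    with open(results_file, "a") as f:
        f.write(f"{stamp} {status}\n")
    return status
```

In an SVN setup this would simply be invoked from the `post-commit` hook; the important point is that the results file accumulates a machine-written record that no human gets to edit.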

The post-commit hook script is the main way that we are going to feed performance information back to managers, so get it to save results to a file, and have a daily process summarize the contents and email the summary to the management team. Do this even (especially) if the results file says "No progress measured today".
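The daily summary step is equally small. This sketch assumes the results file holds one "timestamp STATUS" line per commit; that format, like the function name, is an invented example.

```python
# Hypothetical summary step for the daily status email: turn raw result
# lines from the post-commit hook into a one-line report body.

def summarize(result_lines):
    """Compose the body of the daily status email from raw result lines."""
    lines = [line.strip() for line in result_lines if line.strip()]
    if not lines:
        return "No progress measured today."
    passed = sum(1 for line in lines if line.endswith("PASS"))
    return f"{len(lines)} commits: {passed} passed, {len(lines) - passed} failed."
```

A cron job would read the file, call something like `summarize`, send the result via `smtplib`, and truncate the file ready for the next day.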

Initially, the management team might not want to read it, but get them to at least pretend that they do, so that there is at least the semblance of oversight. If they want more information, get them to ask for it to be included in the daily development status email. Try to encourage the view that "if it is not in the automated report, it is not done".

:-)

Bringing progress reporting and test automation together I think could be a powerful tool to help support and encourage transparent and professional development.

Manual reporting and communication attempts, such as documentation, manual progress reports, wikis and meetings have a smaller impact on success than tooling and automation.

It is about building a design production line for the organization.

Thursday 2 August 2012

Beware of false comforts

Ancient Sparta had no city walls. I do not know why, but I like to think that it was a deliberate policy: a rejection of false comforts.

Static defensive positions can be useful, but only when coupled with a mobile army to disrupt the enemy's plans. On their own, all they can possibly do is postpone the inevitable. Of course, human nature being what it is, the appearance of safety that the city walls create soon leads to the withering of the real guarantor of safety: the army.

In software development too, we are surrounded by false comforts. We expend tremendous effort delaying integration and deployment, not touching legacy code "because it works", and so on. We should think hard about the procedures and tools that we use. Which ones give the appearance of safety, but in actuality provide only a false sense of comfort, and lead to the withering and neglect of the skills and tools that we really need to rely upon in a crisis?

Software development best practice for MATLAB users

Here is a response that I posted back in 2011 to the Stack Overflow question "Who organizes your MATLAB code?". It was nice to go back to this, and to realize that some of my recent rabid rants were, at one point in history, actually grounded in reality.

:-)

I have found myself responsible for software development best practice amongst groups of MATLAB users on more than one occasion.

MATLAB users are not normally software engineers, but rather technical specialists from some other discipline, be it finance, mathematics, science or engineering. These technical specialists are often extremely valuable to the organisation, and bring significant skill and experience within their own domain of expertise.

Since their focus is on solving problems in their own particular domain, they quite rightly neither have the time nor the natural inclination to concern themselves with software development best practices. Many may well consider "software engineer" to be a derogatory term. :-)

(In fact, even thinking of MATLAB as a programming language can be somewhat unhelpful; I consider it to be primarily a data analysis & prototyping environment, competing more against Excel+VBA rather than C and C++).

I believe that tact, diplomacy and stamina are required when introducing software engineering best practices to MATLAB users; I feel that you have to entice people into a more organised way of working rather than forcing them into it. Deploying plenty of enthusiasm and evangelism also helps, but I do not think that one can expect the level of buy-in that you would get from a professional programming team. Conflict within the team is definitely counterproductive, and can lead to people digging their heels in. I do not believe it advisable to create a "code quality police" enforcer unless the vast majority of the team buys in to the idea.

In a team of typical MATLAB users, this is unlikely.

Perhaps the most important factor in promoting cultural change is to keep the level of engagement high over an extended time period: If you give up, people will quickly revert to follow the path of least resistance.

Here are some practical ideas:

Repository:
If it does not already exist, set up the source file repository and organise it so that the intent to re-use software is manifest in its structure. Try to keep folders for cross-cutting concerns at a shallower level in the source tree than folders for specific "products". Have a top-level libraries folder, and try to discourage per-user folders. The structure of the repository needs to have a rationale, and to be documented.
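A sketch of one such layout, with cross-cutting libraries kept shallow and product-specific code pushed deeper (all folder names here are purely illustrative):

```
repository/
    libraries/            <-- cross-cutting, reusable code; kept shallow
        file_io/          <-- e.g. data-file readers
        visualisation/
    products/
        product_a/        <-- product-specific code; kept deeper
        product_b/
    doc/
        repository_layout.txt   <-- the documented rationale
```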

I have also found it helpful to keep the use of the repository as simple as possible and to discourage the use of branching and merging. I have generally used SVN+TortoiseSVN in the past, which most people get used to fairly quickly after a little bit of hand-holding.

I have found that sufficiently useful & easy-to-understand libraries can be very effective at enticing your colleagues into using the repository on a regular basis. In particular, data-file-reading libraries can be particularly effective at this, especially if there is no other easy way to import a dataset of interest into MATLAB. Visualisation libraries can also be effective, as the presence of pretty graphics can add a "buzz" that most APIs lack.

Coding Standards:
On more than one occasion I have worked with (otherwise highly intelligent and capable) engineers and mathematicians who appear to have inherited their programming style from studying "Numerical Recipes in C", and therefore believe that single-letter variables are de rigueur, and that comments and vertical whitespace are strictly optional. It can be hard to change old habits, but it can be done.

If people are modifying existing functions or classes, they will tend to copy the style that they find there. It is therefore important to make sure that source files that you commit to the repository are shining examples of neatness, full of helpful documentation, comments and meaningful variable names. This is particularly important if your colleagues will be extending or modifying your source files. Your colleagues will have a higher chance of picking up good habits from your source files if you make demo applications to illustrate how to use your libraries.
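To make the point concrete, here is the same calculation written twice: first in the terse single-letter style, then in a style worth copying. (Python rather than MATLAB for brevity; the principle is language-independent, and both functions are invented examples.)

```python
# The "Numerical Recipes" style: correct, but opaque to the next reader.
def f(a, b, n):
    h = (b - a) / n
    return h * sum(a + i * h for i in range(n))

# The same calculation, written to be read.
def integrate_left_riemann(lower, upper, num_intervals):
    """Approximate the integral of y = x over [lower, upper].

    Uses a left Riemann sum with num_intervals equal-width steps.
    """
    step = (upper - lower) / num_intervals
    sample_points = (lower + i * step for i in range(num_intervals))
    return step * sum(sample_points)
```

Both behave identically; only the second communicates its intent to whoever modifies it next.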

Development Methodologies:
It is harder to encourage people to follow a particular development methodology than it is to get them to use a repository and to improve their coding style; Methodologies like Scrum presuppose a highly social, highly interactive way of working. Teams of MATLAB users are often teams of experts, who are used to (and expect to continue) working alone for extended periods of time on difficult problems.

Apart from daily stand-up meetings, I have had little success in encouraging the use of "Agile" methodologies in teams of MATLAB users; most people just do not "get" the ideas behind test-driven development, development automation & continuous integration. In particular, the highly structured interaction with the "business" that Scrum espouses is a difficult concept to generate interest in, even though some of the more serious problems that I have experienced in various organisations could have been mitigated with a little bit of organisation in the lines of communication.

Administration:
Most of what constitutes "good programming practice" is simply a matter of good administration & organisation. It might be helpful to consider framing solutions as "administrative" and "managerial" in nature, rather than as "software engineering best practice".

Monday 30 July 2012

Cost-control and Amortization in Software Engineering: The importance of the source repository.

This discussion is not finished, but it is getting late, so I am putting it out regardless. Please excuse the jumps, the argument is put together like a tree, working down from the leaves, through the branches to the trunk. I will try to re-work it later to make it more linear and easier to read.


--


This is a discussion about controlling costs in Software Engineering. It is also a discussion about communication, bureaucracy, filing systems and human nature, but it starts with the debate about how best to organize ones source code repository; a debate dominated by the ascendance of the Distributed Version Control System (DVCS):

In the Distributed-vs-Centralized VCS debate I am generally agnostic, perhaps inclined to view DVCS systems slightly more favorably than their centralized counterparts.

I have used Svn and Hg and am happy with both. For a distributed team, working on a large code-base over low-bandwidth connections, Hg or Git have obvious advantages.

Many DVCS activists strongly advocate a fine-grained division of the code-base into multiple per-project repositories. In the context of a global, distributed development team, this approach is highly efficient as it allows developers to work independently with minimal need for synchronization or communication.

This is particularly striking when we remember just how difficult and time-consuming (i.e. hugely expensive) synchronization and communication can be if developers are working in different time-zones or only working part-time on a project.

However, this property is less relevant to a traditional, co-located and tightly integrated development team. In fact, I intend to demonstrate that splitting the repository in this situation has serious implications that need to be considered.

Before I do that, however, I need to describe a number of considerations that motivate and support my argument.

Amortization as the only effective mechanism for controlling development costs.

Firstly, developing software and maintaining software is very expensive. The more complexity, the more cost. For a given problem, there is a limit to the amount of complexity that we can eliminate. There is also a limit to how much we can reduce developer salaries or outsource work before the complexity and cost imposed by the consequent deterioration in understanding and communication outweigh the savings.

The only other meaningful mechanism that we have to control costs is careful, meticulous amortization, through reuse across product lines, products & bespoke projects. I believe that reuse is fiendishly difficult to achieve, critically important, and requires a fundamental shift in thinking to tackle effectively. The difficulty in achieving reuse is supported by historical evidence: reuse is as rare as hen's teeth. Its importance, I hope, is self-evident, and the fundamental shift in thinking is required because we are working against human nature, against historical precedent, and against some unfortunate physics.

Much of this argument is a discussion about how to overcome these problems and facilitate the amortization of costs at a variety of different levels, and how a single, monolithic repository (or filing system) can support our efforts.


Reuse is difficult partly because of what software development is.

More than anything else, software development is the process of learning about a problem, exploring different solutions, and applying that learning to the development of a machine that embodies your understanding of the problem and its solution; software both defines a machine and describes a model of our understanding of the problem domain. The development of a piece of software is primarily a personal intellectual exercise undertaken by the individual developer, and only incidentally a group exercise undertaken by the team.

Reuse by one person, of a software component written by another, must at some level involve some transfer of some degree of understanding.


Communications bandwidth is insufficient.

The bandwidth offered by our limited senses (sight, hearing, smell etc..) is insignificant when held up against the tremendous expressive power of our imagination; the ability of our brain to bring together distant memories with present facts to build a sophisticated and detailed mental model. The bandwidth offered by our channels of communication is more paltry still; even the most powerful means of communication at our disposal, carried out face-to-face, is pathetic in comparison; writing, documentation, even more contemptible. (Although still necessary).

Communicating the understanding built up in the process of doing development work is very difficult because the means that we have at our disposal are totally inadequate.

Reuse by one person, of a software component written previously by the same person is orders of magnitude easier to achieve than reuse of a software component written by another. Facilitating reuse between individuals will require us to bolster both the available bandwidth and the available transfer time by all means and mechanisms possible. We will need to consider the developer's every waking moment as a potential communications opportunity, and every possible thing that is seen or touched as a potential communications channel.



Re-Think the approach.

One solution to the problem is to side-step communication. A developer is already an expert in, and thus tightly bound to the software components that he has written; A simple mechanism for reuse is simply to redeploy the original developer, along with his library of reusable components, on different projects.

As with the individual, so with the team. A team of individuals that has spent a long time together has a body of shared organizational knowledge; they have had opportunities to share experiences, have evolved a common vocabulary, culture and approach. Collectively, they are also tightly bound to the software components that they have written. A team, expert in a particular area, is a strategic asset to an organization, and the deployment of that asset also needs to be considered strategically, in some detail, by the executive.

(Make people's speciality part of their identity.)

Strategic planning needs to think in terms of capabilities and reuse of assets. The communication of the nature of those capabilities and assets needs to be a deliberate, planned activity.




Exploit pervasive communication mechanisms.

We need to think beyond the morning stand-up meeting; our current "managed" channels of communication are transient; but our eyes and ears are less fleetingly open. We need to take advantage of pervasive, persistent out-of-band communication mechanisms. Not intranets and wiki pages that are seldom read, nor emails and meetings that arrive when the recipient is not maximally receptive. We need communications channels that are in the background, not intrusive, yet always on. Do not place the message into the environment, make the environment the message.

Make the office layout communicate reuse. Make the way the company is structured communicate reuse, make where people sit communicate reuse, and above all else, make the way that your files are organized communicate reuse. 




The repository is a great pervasive communication mechanism.

As developers, we need to use the source repository frequently. The structure of the repository dictates the structure of our work, and defines how components are to be reused.

The structure of the repository and the structure of the organization should be, by design, tightly bound together, as they are so often by accident. If the two are in harmony, the repository then becomes much more useful. It provides a map that we can use to navigate by. It will help us to find other software components, and, if components and people are tightly bound together, other people also. The repository is more than a place to store code; it is a place to store documentation and organizational knowledge, and to synchronize it amongst the members of the organization. Indeed, it is the place where we define the structure and the nature of the organization. It is the source not merely of binaries, but of all organizational behavior.





TODO:

Financial reporting still drives everything - it needs to be in harmony also.

Trust is required. (maybe not)

Not the whole story.

Another problem: reuse is difficult because it puts the cart before the horse. How can we decide what component to reuse before we have understood the problem? How can we truly understand the problem before we have done the development work? If we are solving externally defined and uncorrelated problems that we have no influence over, then we could never reuse anything.

Sounds difficult? Lessons from the Spartans on the nature of city walls.

Back to repositories.

Limiting code reuse. Difficult to do "hot" library development.

One of the frequent arguments that gets trotted out in the Git/Hg-vs-SVN debate is that of merging.
The merge-conflict argument is bogus: proper organization and DRY prevent merge issues.