Tuesday 29 May 2012

Artificial Intelligence: Prerequisites.

How do we go about developing "Artificial Intelligence"?

If I recall correctly, Von Neumann once defined Artificial Intelligence very simply as "Advanced Computer Science".

In a different context, Arthur C Clarke said that "Any sufficiently advanced technology is indistinguishable from magic."

Inverting the intended sense of the latter phrase, it seems that, when envisioning what Artificial Intelligence might look like, we naturally seek conceptually advanced techniques, as though a brilliant and unified theory of intelligence were required to produce an intelligent machine.

In other words, we look for the silver bullet, which, as Fred Brooks (again, from a different context) reminds us, does not exist.

Let us be a little more humble then. The development of intelligent machines might well not be so special as to require the development of brilliant theoretical underpinnings.

The little I know of biological brains leads me to suspect that their function is more readily understood as the aggregate of a few hundred relatively simple processes rather than a small number of stupendously sophisticated ones.

Indeed, to make Artificial Intelligence a reality, we should focus on the prosaic and mundane rather than the exotic.

In my mind, the basic thing that stands in our way is simple software engineering, and the generally rather crude way that we currently go about it.

We need better tools and processes for developing complex software. Not just better languages, but better IDEs, testing frameworks, build servers, static analysis tools, refactoring tools, management tools, etc.

The seemingly exotic, once we are familiar with it, collapses to the banal and the mundane. What is left is just a lot of hard work.

Let us get to it, then.

Friday 25 May 2012

Open Community development vs Closed Commercial Development

Many popular contemporary tools (DVCS etc...) and workflows have emerged from the open development community.

The open development community has a very different set of requirements from closed, commercial development shops.

Sometimes they just need different tools.

Wednesday 23 May 2012

Great advice for developers, great advice for life.

What is past is prologue - William Shakespeare.
Learn from the mistakes of others. You can't live long enough to make them all yourself - Eleanor Roosevelt.
Let us be a little humble; let us think that the truth may not perhaps be entirely with us - Jawaharlal Nehru.

(courtesy of the good folk of Sophos)

Thursday 10 May 2012

What is the role of documentation in an agile team?

The distinguishing feature of "agile" techniques is their approach to risk. We have the humility to admit that development is primarily a learning process, and that risk originates from factors that are initially unknown.

Agile techniques seek to reduce development risk by producing a minimum viable product as early as possible, by postponing critical decisions until sufficient information is available, and by learning-by-doing. Storage and dissemination of lessons-learned is a critical part of any non-trivial agile effort.

Documentation is therefore critically important, but because documentation may change significantly and rapidly (as the team's understanding of the problem evolves) the form that the documentation takes must be very different in an agile development organization from a classical development organization.

I cannot stress this enough: Development Automation is THE key set of technologies that enables us to take an agile approach. Documentation must be written in a formal manner, accessible to automation, and able to be changed rapidly with minimal effort.

It also must be parsimonious and accessible enough to be disseminated rapidly. If automation is limited, then documentation cannot be lengthy. If automation is sophisticated, then more documentation can exist. Ultimately, the quantity of documentation is limited by the ability of team members to absorb the information rather than the speed with which it can be re-written.

Well written, readable source documents meet these criteria.

Monday 7 May 2012

The OOP/FP argument, yet again.

I like OOP when I am the one writing the software, because I can think very naturally in terms of modelling the problem domain, but my opinion rapidly turns on its head when I need to read/debug some OOP software that somebody else has written (when I need to think about control flow and behavior).

Writing good (readable) OOP programs is hard. Debugging them is even harder. Navigating them when armed with just the source code is harder still. (Diagrams please!)

Whilst most individual objects and methods in OOP systems are small and easy to understand in isolation (this is a good thing), I find that the flow of execution and the behavior of the system in the large becomes very difficult and time-consuming to understand. It is as if the fundamental complexity of the problem has been shoveled around from the small scale to the large scale.

To be fair, the same complaint probably applies to FP as much as to OOP; the tension driving this dialectic exists between functional decomposition and the DRY principle on the one hand, and readability and narrative flow on the other. (Or the battle between reader and writer, to put it more colorfully.)

Saturday 5 May 2012

Tools for individual traits

One of my former employers (FIL) was noteworthy for its culture of introspection. It encouraged staff to discover their own strengths & weaknesses, biases & predilections, and to use that knowledge to work better and to make better decisions. (Other financial institutions encourage similar cultures, to a greater or lesser degree). This was a fantastic and valuable lesson to learn.

So, swallowing a dose of humility, here goes:

I make mistakes all the time. Embarrassingly often.

Most of these mistakes are errors of omission: oversights. The spotlight of consciousness operating in my brain is unusually narrow. This means that I am reasonably good at something if I am focussing on it, but if I am not paying attention (which is most of the time for most things), I have a tendency to miss things in a way that is, well, rather absent minded.

This is not an uncommon tendency. Most people get over it by deploying organizational systems and practices to help them concentrate. Lists, and notes and obsessive habits and the like. I am a software developer. I use automated tests & other forms of development automation.

Without these, I tend to make a lot of mistakes and move very slowly. With them, I can move quickly, be creative & productive, and focus on making new things without worrying (too much) about what I have missed.

Standing back for a moment, we can observe that the usefulness of the tools that we use is really driven by the capabilities and characteristics of the people who use them. I am a bit obsessed by development automation because I rely on it to such a great extent. Other people will find other tools useful to a greater or lesser extent because of their own unique capabilities and weaknesses.

Exponential Growth + Network Effects.

Dr Albert A Bartlett's lecture on Arithmetic, Population and Energy is a really great introduction to the exponential function, and how we generally fail to understand its implications. In the unlikely event that you have not already seen it, go and watch it now; it is worth your time.

So, what things are growing? (Perhaps not exponentially due to limiting factors, but at a dramatic and most likely super-linear pace even so):
  • The total population of the world.
  • The percentage of the population who are educated to a certain level.
  • The percentage of the population connected to the internet.
Another good read is The Mythical Man Month, by Fred Brooks. In this book, he observes that the number of channels of communication in a group of n individuals is given by n(n-1)/2, i.e. O(n²) in big-O notation.
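Brooks' n(n-1)/2 figure is easy to check for yourself; here is a minimal sketch (the function name `channels` is my own, not from the book):

```python
def channels(n):
    """Potential pairwise communication channels in a team of n people."""
    return n * (n - 1) // 2

# The count grows quadratically with team size:
for n in (2, 5, 10, 50):
    print(n, channels(n))  # 2 -> 1, 5 -> 10, 10 -> 45, 50 -> 1225
```

Note how a tenfold increase in team size (5 to 50) yields a more than hundredfold increase in channels.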

Now, taking these things together, we have a growth in the number of potential channels of communication that increases at a rate somewhere between quadratic (O(n²)) and exponential.

Assuming (naively) that any given (non-geographically limited) interest group will scale linearly with the total number of potential channels of communication, all forums should experience this rate of increase.

I have always believed that quantitative change inevitably drives qualitative change.

So what impact will this have on the quality of communication (in terms of properties and characteristics, not value)?

Since there are physical (bandwidth, mental capacity) limits on our individual capacities to communicate, a drive towards increasing specialization must be a consequence (this is generally acknowledged, although the rate at which our specialization must increase is probably under-appreciated).

What other effects will we see? Any comments?

Will this affect things other than just communication? What about the economy? How we divide labour? The network effects are in evidence there, also.

Thursday 3 May 2012

Complexity: It had better be worth it!

A response to the StackExchange question: What is the Optimal Organizational Structure for IT?

Conway's Law states:

"..organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations"

The corollary of this is that the best organizational structure is the same as the best software architecture.

Not knowing the specifics of your business, I am basing the following on some sweeping generalizations:

Experience indicates that software development is both expensive and risky; and that this risk and expense can only (even with strenuous effort) be reduced to a very limited extent.

Since cost & the level of risk are (approximately) fixed, you need to increase returns to achieve an acceptable reward:risk ratio.

The first obvious consequence of this is that organisations should focus on projects with a big pay-off. This is not always achievable, as the opportunities might not exist in the marketplace. The second obvious consequence of this is that development costs should be amortized as much as possible. For example, by spreading development costs over multiple product lines & projects. (I.e. code reuse).

So, back to Conway's law: What organizational structure maximises code reuse? The obvious answer would be to align organizational units around libraries & APIs, with each developer responsible for one or more libraries AND one or more products. How libraries get re-used is then no longer a purely technical decision, but an important business decision also. It should be a management function to ensure that development costs are amortized effectively to maximise return per unit development effort.

Each developer then has responsibility for the development, testing & in-service performance of the features supplied/supported by his library.

Development Concerns with MATLAB

My response to a StackExchange question from a while ago: Who organizes your MATLAB code?

I have found myself responsible for software development best practice amongst groups of MATLAB users on more than one occasion.

MATLAB users are not normally software engineers, but rather technical specialists from some other discipline, be it finance, mathematics, science or engineering. These technical specialists are often extremely valuable to the organisation, and bring significant skill and experience within their own domain of expertise.

Since their focus is on solving problems in their own particular domain, they quite rightly neither have the time nor the natural inclination to concern themselves with software development best practices. Many may well consider "software engineer" to be a derogatory term. :-)

(In fact, even thinking of MATLAB as a programming language can be somewhat unhelpful; taking a cue from one of my former colleagues, I consider it to be primarily a data analysis & prototyping environment, competing against Excel+VBA rather than C and C++.)

I believe that tact, diplomacy and persistence are required when introducing software engineering best practices to MATLAB users; I feel that you have to entice people into a more organised way of working rather than forcing them into it. Deploying plenty of enthusiasm and evangelism also helps, but I do not think that one can expect the level of buy-in that you would get from a professional programming team. Conflict within the team is definitely counterproductive, and can lead to people digging their heels in. I do not believe it advisable to create a "code quality police" enforcer unless the vast majority of the team buys in to the idea. In a team of typical MATLAB users, this is unlikely.

Perhaps the most important factor in promoting cultural change is to keep the level of engagement high over an extended time period: If you give up, people will quickly revert to follow the path of least resistance.

Here are some practical ideas:

Repository: If it does not already exist, set up the source file repository and organise it so that the intent to re-use software is manifest in its structure. Try to keep folders for cross-cutting concerns at a shallower level in the source tree than folders for specific "products". Have a top-level libraries folder, and try to discourage per-user folders. The structure of the repository needs to have a rationale, and to be documented.
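As an illustration only (the folder names here are hypothetical, not a prescription), a layout along these lines keeps re-usable code shallow and visible:

```
trunk/
    libraries/            <- cross-cutting, re-usable code, shallow in the tree
        file_io/
        visualisation/
    products/
        product_a/
        product_b/
    doc/
        repository_layout.txt   <- the rationale, written down
```

The point is that a newcomer browsing the tree should be able to see, without asking anyone, where shared code lives and where product-specific code lives.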

I have also found it helpful to keep the use of the repository as simple as possible and to discourage the use of branching and merging. I have generally used SVN+TortoiseSVN in the past, which most people get used to fairly quickly after a little bit of hand-holding.

I have found that sufficiently useful & easy-to-understand libraries can be very effective at enticing your colleagues into using the repository on a regular basis. In particular, data-file-reading libraries can be particularly effective at this, especially if there is no other easy way to import a dataset of interest into MATLAB. Visualisation libraries can also be effective, as the presence of pretty graphics can add a "buzz" that most APIs lack.

Coding Standards: On more than one occasion I have worked with (otherwise highly intelligent and capable) engineers and mathematicians who appear to have inherited their programming style from studying "Numerical Recipes in C", and therefore believe that single-letter variables are de rigueur, and that comments and vertical whitespace are strictly optional. It can be hard to change old habits, but it can be done.

If people are modifying existing functions or classes, they will tend to copy the style that they find there. It is therefore important to make sure that source files that you commit to the repository are shining examples of neatness, full of helpful documentation, comments and meaningful variable names. This is particularly important if your colleagues will be extending or modifying your source files. Your colleagues will have a higher chance of picking up good habits from your source files if your make demo applications to illustrate how to use your libraries.

Development Methodologies: It is harder to encourage people to follow a particular development methodology than it is to get them to use a repository and to improve their coding style; Methodologies like Scrum presuppose a highly social, highly interactive way of working. Teams of MATLAB users are often teams of experts, who are used to (and expect to continue) working alone for extended periods of time on difficult problems.

Apart from daily stand-up meetings, I have had little success in encouraging the use of "Agile" methodologies in teams of MATLAB users; most people just do not "get" the ideas behind test-driven development, development automation & continuous integration. In particular, the highly structured interaction with the "business" that Scrum espouses is a difficult concept to generate interest in, even though some of the more serious problems that I have experienced in various organisations could have been mitigated with a little bit of organisation in the lines of communication.

Administration: Most of what constitutes "good programming practice" is simply a matter of good administration & organisation. It might be helpful to consider framing solutions as "administrative" and "managerial" in nature, rather than as "software engineering best practice".

Wednesday 2 May 2012

Lowering the barriers to Entry

or ... From svn to hg and back again (and more importantly ... where next?)

I have been using Mercurial for the past 6 months or so, and I am still only partially sold on the whole DVCS movement.

I used Subversion exclusively for around 4 years, between 2007 and 2011, and a mixture of Perforce, StarTeam & SourceSafe (shudder) in the years prior to that. (I even did it manually for a while, before I knew better). These formative experiences occurred (mostly) in corporate environments, where I was frequently faced with the task of evangelizing software development best-practices in teams dominated by non-programmers (academics or other domain specialists).

Here the challenge is in working effectively with colleagues who are accustomed to working alone for long periods, and for whom sharing of work is done through network folders and email (or peer-reviewed, published articles!).

It is easy to forget that most professionals will require a significant amount of convincing before they will tolerate even minor inconveniences. Subversion, one of the easiest version control systems to use, still presents major barriers to adoption. Mercurial, for all of its DVCS goodness, requires yet more knowledge & presents yet more friction to the non-developer user than Subversion. I am not even going to think about discussing Git.

So, how can we lower the barriers to entry and reduce everyday friction for modern development automation systems? Can we make using distributed version control easier than using Subversion? Easier than using email to share work? Easier than using network folders?

Can we find a much simpler way to solve the same essential problem that version control systems, configuration management systems & source document repositories solve?

Well, what is the essential problem that they are trying to solve, anyway? It has something to do with collaboration, something to do with man-management, and something to do with asset management, organization and control.

Like most issues that appear, on the surface, to be purely technological, when you peel back the layers it becomes possible to discern psychological, sociological and political factors at play; but by the same token, once analyzed, these (potentially confounding) influences simply become additional technical problems that can be managed by technical means.

So, we use version control & configuration management systems to help organize our source documents, organize our development processes, and organize how work is divided up and merged back together again. They give us visibility, they keep a record of what happened in the past, and enable us to predict and plan what is going to happen in the future. They are the ally of the obsessive-compulsive side of our personality, and they give us the comforting feeling that everything is under control. As much as they are anything else, they are also an emotional crutch, and in that, they are a political ally against the local-optimum seeking risk minimizers in life.

I have a lot of hope that real-time collaborative editing (Etherpad - Realie) and online development environments (Koding - CodeNow) will find success and provide us with a rich set of options for our future development environments; they certainly offer an aggressive simplification and improvement over the current state of affairs! (Although I believe that they will need to address the above political concerns to gain widespread traction in a conservative (small c) world.)

I also hope that these environments pick up the user-interface ideas that are being promoted by LightTable et al, and provide support for the broader engineering community (Embedded and safety-critical systems in particular) as well as web & enterprise development.