Thursday 2 August 2012

Software development best practice for MATLAB users

Here is a response to a question on Stack Overflow: "Who organizes your MATLAB code?", that I posted back in 2011. It was nice to go back to this, and to realize that some of my recent rabid rants were, at one point in history, actually grounded in reality.

:-)

I have found myself responsible for software development best practice amongst groups of MATLAB users on more than one occasion.

MATLAB users are not normally software engineers, but rather technical specialists from some other discipline, be it finance, mathematics, science or engineering. These technical specialists are often extremely valuable to the organisation, and bring significant skill and experience within their own domain of expertise.

Since their focus is on solving problems in their own particular domain, they quite rightly neither have the time nor the natural inclination to concern themselves with software development best practices. Many may well consider "software engineer" to be a derogatory term. :-)

(In fact, even thinking of MATLAB as a programming language can be somewhat unhelpful; I consider it to be primarily a data analysis & prototyping environment, competing more against Excel+VBA rather than C and C++).

I believe that tact, diplomacy and stamina are required when introducing software engineering best practices to MATLAB users; I feel that you have to entice people into a more organised way of working rather than forcing them into it. Deploying plenty of enthusiasm and evangelism also helps, but I do not think that one can expect the level of buy-in that you would get from a professional programming team. Conflict within the team is definitely counterproductive, and can lead to people digging their heels in. I do not believe it advisable to create a "code quality police" enforcer unless the vast majority of the team buys-in to the idea.

In a team of typical MATLAB users, this is unlikely.

Perhaps the most important factor in promoting cultural change is to keep the level of engagement high over an extended time period: If you give up, people will quickly revert to follow the path of least resistance.

Here are some practical ideas:

Repository:
If it does not already exist, set up the source file repository and organise it so that the intent to re-use software is manifest in it's structure. Try to keep folders for cross-cutting concerns at a shallower level in the source tree than folders for specific "products". Have a top-level libraries folder, and try to discourage per-user folders. The structure of the repository needs to have a rationale, and to be documented.

I have also found it helpful to keep the use of the repository as simple as possible and to discourage the use of branching and merging. I have generally used SVN+TortoiseSVN in the past, which most people get used to fairly quickly after a little bit of hand-holding.

I have found that sufficiently useful & easy-to-understand libraries can be very effective at enticing your colleagues into using the repository on a regular basis. In particular, data-file-reading libraries can be particularly effective at this, especially if there is no other easy way to import a dataset of interest into MATLAB. Visualisation libraries can also be effective, as the presence of pretty graphics can add a "buzz" that most APIs lack.

Coding Standards:
On more than one occasion I have worked with (otherwise highly intelligent and capable) engineers and mathematicians who appear to have inherited their programming style from studying "Numerical Recipes in C", and therefore believe that single-letter variables are de rigueur, and that comments and vertical whitespace are strictly optional. It can be hard to change old habits, but it can be done.

If people are modifying existing functions or classes, they will tend to copy the style that they find there. It is therefore important to make sure that source files that you commit to the repository are shining examples of neatness, full of helpful documentation, comments and meaningful variable names. This is particularly important if your colleagues will be extending or modifying your source files. Your colleagues will have a higher chance of picking up good habits from your source files if your make demo applications to illustrate how to use your libraries.

Development Methodologies:
It is harder to encourage people to follow a particular development methodology than it is to get them to use a repository and to improve their coding style; Methodologies like Scrum presuppose a highly social, highly interactive way of working. Teams of MATLAB users are often teams of experts, who are used to (and expect to continue) working alone for extended periods of time on difficult problems.

Apart from daily stand-up meetings, I have had little success in encouraging the use of "Agile" methodologies in teams of MATLAB users; most people just do not "get" the ideas behind test-driven development, development automation & continuous integration. In particular, the highly structured interaction with the "business" that Scrum espouses is a difficult concept to generate interest in, even though some of the more serious problems that I have experienced in various organisations could have been mitigated with a little bit of organisation in the lines of communcation.

Administration:
Most of what constitutes "good programming practice" is simply a matter of good administration & organisation. It might be helpful to consider framing solutions as "administrative" and "managerial" in nature, rather than as "software engineering best practice".

No comments:

Post a Comment