Friday 28 September 2012

Array languages rock


As a long-time MATLAB developer, I may be biased, but: Array languages rock!

Why?

They allow the developer to express simple numerical relationships in an incredibly declarative manner. In this, they go further than typical functional languages, allowing you to discard "zip", "map", list comprehension constructs and the like for a variety of simple cases.

procedural < functional < array

Indeed, when one gets to the point where much of the standard library of an array language is able to accept and return collections, the developer becomes able (in many cases) to purge flow control from great swathes of the program.

As a result, one ends up with a program where control flow proceeds linearly, from top to bottom, greatly reducing the mental gymnastics that are needed to understand what the algorithm is doing, greatly improving readability, and dramatically improving productivity.
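A minimal sketch of the difference, using Python with NumPy as a stand-in for MATLAB-style array semantics (the price series and variable names are purely illustrative):

    # Compute per-sample returns from a price series, first with explicit
    # flow control, then in array style. Illustrative data only.
    import numpy as np

    prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1])

    # Procedural style: explicit loop and index bookkeeping.
    returns_loop = []
    for i in range(1, len(prices)):
        returns_loop.append((prices[i] - prices[i - 1]) / prices[i - 1])

    # Array style: the relationship is stated once, over whole collections.
    returns_array = (prices[1:] - prices[:-1]) / prices[:-1]

    assert np.allclose(returns_loop, returns_array)

The array form states the relationship once, over whole collections, and reads in the order that the data flows.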

Sure, the run-time performance of MATLAB often sucks, but there is no real reason why array languages cannot be highly performant. This is why I am feeling excited about the emergence of languages like Julia, that attempt to be the best of both worlds.

These new, emerging languages excite me:

  • Go: For its fast compilation, opinionated toolset, close-to-the-metal feel, approach to object-orientation, and support for concurrency.
  • Julia: For attempting to make array programming performant.
  • Cobra: For bringing design-by-contract back into vogue.

My ideal software tool:

Designed primarily to allow academics, data scientists, mathematicians, financial engineers and other domain experts to prototype ideas quickly, it will feel a lot like MATLAB with a highly interactive development environment, excellent plotting and visualisation tools, and a large standard library filled with functions that accept and return arrays.

It will allow these prototypes to be quickly and easily brought up to production quality standards by supporting (out of the box) quality control mechanisms like strong typing, design by contract, assertions, built-in static analysis, test coverage tools & continuous testing tools.

Like Go, it will compile very quickly to highly performant native binaries, support concurrency and multiprocessing out of the box, and be able to deploy to future production environments, which are likely to include highly heterogeneous "cloud" environments as well as multicore CPU + GPU embedded systems.

Why declarative?


One of the challenges of software development is to try to identify those aspects of the system that are likely to remain constant, and those areas that are likely to change in future. These aspects are rarely specified, rarely understood, and rarely do we (as developers) get the architecture of the software right (in this respect).

Having done this, we try to make the transient, changeable aspects of the system as declarative as possible; perhaps going as far as to express them as statements declared in a configuration file.

Having done a bit of Prolog programming in the dim and distant past, my intuition is that trying to make everything declarative is a mistake; one ends up tying oneself into knots. The mental gymnastics simply are not worth it. However, splitting the program into declarative-and-non-declarative parts seems reasonable.

In fact, the idea that a multi-paradigm approach can be used, with some parts of the system expressed in OO terms, some in functional terms, and some in declarative terms, seems to be gaining in acceptance and popularity.


Note: See also Monads and some examples as another way of thinking about this.


Tuesday 25 September 2012

On the importance of over-communication.

In a crisis, do not seek to control information flow. Instead, over-communicate. Have confidence in people's intelligence and tolerance. Confusion is a natural part of learning, and disagreement and debate are a natural part of team decision making.

You will either get through the crisis as a team, or you will end up postponing problems for later, and sowing seeds of distrust to trip you up in future.

I would rather work in an argumentative and critical but open and honest organization than in an organization that papers over disagreement and misunderstanding with dishonesty and spin.

Friday 21 September 2012

Motivation trumps Method

We can either tackle the problem at hand by the most direct route possible, or we can try to build tools & abstractions to help future development efforts become faster and more efficient.

The first end of the spectrum corresponds largely to the notion of a "product-oriented" development attitude, and the second end to the notion of a "capability-oriented" development attitude.


The position on the spectrum that we select is driven both by immediate constraints and long term goals, and how we balance one against the other. Living exclusively at either extreme is generally harmful and counterproductive.

This dichotomy is very loosely related to the one that contrasts the predictive/CMMI/waterfall approach with the reactive/agile approach to development, in that the reactive/agile approach is very well served by a toolset that has come into existence through the efforts of a capability-oriented team.

However, the predictive-vs-reactive dichotomy is really about managing risk, whereas the product-vs-capability dichotomy is really about amortizing costs.

There is no “right” answer; it all depends upon the situation that we are in and the culture of the organization(s) with which we are operating. One thing that is really important, however, is motivation. Software is all about detail, and detail is all about attention, and attention is all about motivation, and motivation is all about the team. Maintaining motivation and enthusiasm in the team just about trumps everything else.

We should do what we enjoy, because we will be so much more productive that any other consideration will be secondary.

Development is Exploration.


People talk about software being grown, not built. Here is another perspective:

Software development is an act of exploration, uncovering a set of truths or relationships, whether perpetual or contingent (and therefore transient), together with an abstract machine to turn that set of truths into a system that can interact with the problem and thereby do useful work.

Thursday 20 September 2012

Multi-project integration.


This is a response to the question: "How to maintain same code fragments on multiple projects":

The DRY principle states that there should be a single authoritative source of truth. This dictates the use of a single *source* repository for all program logic, with no duplication, and with files organized to promote sharing and reuse of documents.

The pragmatic requirements of communication in a distributed, multi-team environment suggest that there should be multiple independent repositories, one for each project or collaboration.

This presents a problem, as the requirements (at first glance) would seem to be contradictory. However, it is possible to use scripting & automation to smooth over this awkward fact, and to create two separate but consistent interfaces to the body of work.

The one unified repository acts as the single authoritative source of truth regarding all of the projects. The build process for each project copies all the files used by that project (and only those files) into an intermediate location, then builds from that intermediate location. (Unison or some similar tool can be used to move deltas instead of whole files).

These intermediate locations can be used as local working copies for the set of secondary, derived, or downstream repositories. Post-commit hook scripts on the authoritative repository will update all of the intermediate locations, and, for each in turn, check if it has changed, and make the same commit to the corresponding secondary repository if a change is detected.

Likewise, post-commit hook scripts on the intermediate repositories can be implemented to update files in the central repository, enabling inbound changes from collaborators to propagate to the authoritative repository.

This way, the multiple secondary repositories are kept in sync with the single authoritative source repository, and the build process ensures that the secondary repositories contain all of the (possibly shared) documents and other files that are needed for the build to succeed. Finally, and most importantly, the development and build process ensures that files may be edited in one place and one place only, and that all copies are kept consistent and up to date.
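As a sketch of the hook logic only, here is roughly how the authoritative-to-secondary direction might look. It is written in Python for readability, and uses Subversion and rsync purely for concreteness; every path and project name below is hypothetical, and a real hook would also need locking, error handling and "svn add" for newly created files.

    import os
    import subprocess

    AUTHORITATIVE_WC = "/srv/sync/authoritative_wc"   # hypothetical checkout

    # Hypothetical map: project -> (directories it uses, downstream working copy).
    PROJECTS = {
        "project_a": (["lib/shared/", "project_a/"], "/srv/sync/project_a_wc"),
        "project_b": (["lib/shared/", "project_b/"], "/srv/sync/project_b_wc"),
    }

    def sync_project(name, directories, downstream_wc):
        # Copy only the directories used by this project into the intermediate
        # location; rsync moves deltas rather than whole files.
        for directory in directories:
            target = os.path.join(downstream_wc, directory)
            os.makedirs(target, exist_ok=True)
            subprocess.run(
                ["rsync", "-a", "--delete", "--exclude=.svn",
                 os.path.join(AUTHORITATIVE_WC, directory), target],
                check=True)
        # If the downstream working copy actually changed, replay the commit.
        status = subprocess.run(["svn", "status", downstream_wc],
                                capture_output=True, text=True, check=True)
        if status.stdout.strip():
            subprocess.run(
                ["svn", "commit", "-m",
                 "Sync from authoritative repository (" + name + ")",
                 downstream_wc],
                check=True)

    for name, (directories, downstream_wc) in PROJECTS.items():
        sync_project(name, directories, downstream_wc)
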

This solution is tool agnostic: it should be possible to use Svn, Hg or Git as repositories, and bash or batch to implement the synchronizing logic. Something like Unison may be useful to sync the downstream repositories with the authoritative repository.

Monday 17 September 2012

Capability vs Product Orientation


In response to: http://iamwilchung.wordpress.com/2012/09/17/make-tools-for-yourself-and-others-around-you/

I have always thought of the creation of internal tools and libraries in terms of the following dichotomy:

"capability-oriented" vs "product-oriented"

The "capability-oriented" approach seeks to maximise the capabilities of the organization over the long term, opening up opportunities for cost amortization over disparate products. By contrast, the "product-oriented" approach is "lean" in the sense that it seeks to minimize work not absolutely necessary for the current product.

The "product-oriented" approach seeks a local optimum, and is prevalent in industries with heavyweight cost controls and minimal opportunities for investment (Defense, for example), whereas the "capability-oriented" approach seeks a global optimum, and is only possible in situations where organizational leadership is capable and willing to get involved with detailed technical planning.

As with all dichotomies, organizations must seek a compromise position that varies from product to product and project to project.

A sweet-spot that I have frequently found: Third-party libraries are necessarily written in a very general manner, capable of being used in a wide variety of ways. A particular organization generally only uses a small subset of the features in a particular third-party library. As a result, an internal library that acts as a thin wrapper around an external/third-party library can help encourage and enforce a consistent approach within an organization, helping to create an institutional culture and improving interoperability between teams whilst reducing learning barriers for new team members.
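As a minimal sketch of the idea, with matplotlib standing in as the third-party library and the house conventions invented purely for illustration, the wrapper might look like this:

    # A thin internal wrapper exposing only the subset of plotting features
    # the organization actually uses, with house conventions baked in.
    # Figure size, grid and output format are assumed conventions.
    import matplotlib.pyplot as plt

    HOUSE_FIGSIZE = (8, 4.5)   # illustrative in-house convention

    def line_plot(x, y, title, path):
        """Plot y against x with house styling and save to 'path' as PNG."""
        fig, ax = plt.subplots(figsize=HOUSE_FIGSIZE)
        ax.plot(x, y)
        ax.set_title(title)
        ax.grid(True)
        fig.savefig(path, format="png", dpi=150)
        plt.close(fig)

Client code imports the wrapper rather than the library, so the organization's conventions are applied consistently, and a future change of plotting library touches a single module rather than every team's code.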

Thursday 13 September 2012

Automated micro-outsourcing to reduce design costs by supporting hyper-specialization.

I have been re-reading Fred Brooks' "No Silver Bullet" paper. It is fantastic stuff, and I am glad that I came back to it (it has been several years since I last looked at it).

The paper is showing its age just a teensy tiny little bit (his enthusiasm for Ada and OOP seems slightly too strong in retrospect). With the benefit of experience, we now have a bit better understanding of how people react to complex tools, and as a result now tend to favor tools that encourage (or enforce) simplicity and consistency.

Anyway, that is not what I want to write about. The bit that particularly popped out at me this morning is the section on "Buy vs build". This chimes well with my belief that the only effective way to control software costs is through amortization. However, I had always limited my thinking to amortization within an organization, and so focussed my efforts entirely on how one could organize the source code repository to encourage reuse of components & libraries from within the organization, or engineer tools to help encourage reuse. (For example, an always-on background structural-similarity search to identify and bubble-up structurally similar source documents).

However, this does not address the concerns that one of my other closely-held beliefs implies. I believe that much of the value that is derived from development activities comes from the learning and resulting improvements in knowledge and understanding of the individual developers. This knowledge and expertise remains with the individual, and is a good part of the 10x performance differences that many people observe in software developers.

So, how about using the same sort of background structural-similarity search to identify potential collaborators / consultants whose expertise fits in well with the system that you are trying to create? In the same way that an advanced IDE would make the suggestion: "perhaps you want to buy this library to help you solve this problem here...", you could imagine the same IDE making the suggestion "perhaps you want to hire this developer to help you here..."

Or maybe it is just my recent involvement in the advertising industry that makes my mind turn so quickly to commercial transactions. :-)

Wednesday 12 September 2012

Obsessive Perfectionism: Success and Failure.

Perfectionism is both a blessing and a curse - the obsessive pursuit of perfection is necessary for the creation of truly great works, but it is also responsible for the most abject and catastrophic failures.

Attention and Agility

As individuals and as organizations, we are only capable of paying attention to a teensy tiny little bit of the world at any one time.

The bit of the world that receives our attention is improved, and our performance with respect to it is (generally) solid, but we always forget that it is only a tiny little piece of the puzzle. Errors and omissions tend to happen where our attention is not directed, at the boundaries between individual and team responsibilities, for example, or where the play is fast moving and priorities quickly shifting.

The primary purpose of automation is to help us correct and compensate for that deficiency; to allow us to be more organized and systematic, without compromising our ability to be mentally agile and flexible.

--

It is important to note that there are other aspects of automation and tooling that exacerbate, rather than compensate for, this deficiency. In particular, as Fred Brooks notes in his spookily prescient "No Silver Bullet" essay:


... the screens of today are too small, in pixels, to show both the scope and the resolution of any seriously detailed software diagram. The so-called "desktop metaphor" of today's workstation is instead an "airplane-seat" metaphor. Anyone who has shuffled a lap full of papers while seated between two portly passengers will recognize the difference--one can see only a very few things at once. The true desktop provides overview of, and random access to, a score of pages. Moreover, when fits of creativity run strong, more than one programmer or writer has been known to abandon the desktop for the more spacious floor. The hardware technology will have to advance quite substantially before the scope of our scopes is sufficient for the software-design task...

(This is why a couple of big 30" monitors is still not enough...)

Socio-Technical Architecture

Caught in an iconoclastic mood, I would like to challenge some conventional thinking: that distributed system architecture should favor lots of little independent services, each doing one thing, and doing one thing well.

The conservation of complexity principle (Brooks, No Silver Bullet) suggests that whenever we think that we are simplifying something, we are really just pushing the complexity around to somewhere where it is less visible, and less easily managed.

I think that this is sometimes the case here - if you have really good operations & sysadmin resources, and limited developer resources, then the lots-of-little-processes architecture is probably appropriate.

If, on the other hand, you have good developer resources, and limited operations & sysadmin resources, then all that you are doing is shifting the complexity to someplace where you lack the tools to manage and control it.

In summary, the quality of good architecture depends more on how it complements and supports the capabilities of the team and toolset than it does on any fundamental natural law.

Software engineering is a social science, not a hard engineering discipline, nor a mathematical or physical science.

Tuesday 11 September 2012

On the Primacy of Empirical Truth

Truth comes only (*) from testing oneself against Nature. Too often, our lives are spent, utterly wasted, in a social discourse that has no grounding, no anchoring in any form of truth, but rather in meaningless social judgementalism: "x is good", "y is bad".

http://www.aaronsw.com/weblog/anders

(*) Covers both empiricism and rationalism - IMHO empirical truths are less "truthy" than rational truths, but infinitely more useful.

"In God we trust; all others must bring data" - W. Edwards Deming.

Thursday 6 September 2012

Quadrature ICS

I have been following the (heated) debate around Git's usability that has been going on recently.

So I woke up this morning and thought, y'know, what the world really needs is another Version Control System.

:-)

Still just an idea, I present to you: Quadrature Integration Control System (QICS) - the anti-Git.

The basic ideas:


(1) Ruthless minimization of required user interaction.

This is both to reduce workload, and to allow non-technical users to work alongside developers. (This is a big deal for me: I need to get work from my collaborators across the business into version control, so my automated scripts can pick up their work & data for integration into production systems).

You should only need to interact with the system to resolve a conflict; otherwise the system can operate quite happily in the background, listening for changes to a directory tree, making commits to the ICS as and when files change. (Unison, Wave)

The minimization of interaction should extend to installation and configuration. A cross platform stack with no runtime is desirable (Golang seems ideal for this), as well as a peer-to-peer network architecture, with optional central server.

Sophisticated users should have as much control as they want, up to and including provision of APIs for all aspects of behavior. This implies a post-hoc approach to logging, where the user can explain what the changes were after they happened, and decorate the log with those explanations.


(2) The target users are a very large, globally distributed, multi-disciplinary development team in a commercial organization. Think Google-scale development, but with a mixture of programmers and (intelligent non-programmer) domain experts. We will assume intermittent or poor network access.


(3) A convention-over-configuration, "opinionated software" approach, with a set of prescriptive and detailed conventions for repository organization, workflow, and branch management. By creating conventions, we can aid and support development and workflow automation efforts to further reduce burden and increase efficiency.


The focus will be on supporting continuous integration and resolving conflicts, rather than supporting branches and the management of concurrent, independent versions. The assumed model is one in which, by default, all work is integrated together in trunk as quickly as network connectivity permits.

The QICS way of thinking about the problem:


A stream of lots of little deltas flowing around a distributed multicast/pub-sub network - a similar problem to distributed databases (the CAP theorem applies) - so take a similar approach to some NoSQL DBs and offer only eventual (or asymptotic) consistency. In fancy language: "Present the user a locally coherent view of an asymptotically coherent whole."


Concepts:

Upstream = deltas coming from other collaborators to the local/client working copy.
Downstream = deltas going from the local/client working copy to other collaborators.
Local = the client machine.
Remote = the rest of the network of collaborators.
Local Working Copy = The set of files and directories that the user interacts with.
Delta buffer = storage area containing a set of deltas (or differences), equivalent to the notion of a repository.

To support operation when the network connection fails, we will need local delta-buffers, perhaps one local-downstream store and one local-upstream store, as well as mechanisms to shuttle information from the local downstream store out to the remote network, and in from the remote network to the local upstream store.

The local client/user probably does not want the whole repository on his local machine, so he will probably need some mechanism to specify a region-of-interest in the repository (a set of directories) so he can pull in only pertinent changes. (Hence a Pub/Sub-like communications mechanism)

Locally, we will need a mechanism to monitor file changes, calculate the deltas and store them in the downstream store, as well as a mechanism to take deltas from the upstream store, resolve conflicts and apply them to the local working copy.

It would be good if we could detect moved/renamed files without explicit user interaction, so per-file checksums are probably a good idea.
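A minimal sketch of that monitoring step, in Python, with all names illustrative: take a per-file checksum snapshot of the working copy and compare it against the previous snapshot; a rename then shows up as the same checksum under a new path. A real implementation would also compute and store content deltas in the downstream buffer.

    import hashlib
    import os

    def snapshot(root):
        """Map relative file path -> SHA-1 of file contents."""
        digests = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                with open(full, "rb") as handle:
                    digests[rel] = hashlib.sha1(handle.read()).hexdigest()
        return digests

    def classify_changes(old, new):
        """Return (added, removed, renamed) between two snapshots.

        Renamed pairs also appear in 'added'/'removed'; callers can subtract
        them. Duplicate contents make this a heuristic, which is acceptable.
        """
        added = {path for path in new if path not in old}
        removed = {path for path in old if path not in new}
        old_by_digest = {digest: path for path, digest in old.items()
                         if path in removed}
        renamed = []
        for path in sorted(added):
            match = old_by_digest.get(new[path])
            if match is not None:
                renamed.append((match, path))
        return added, removed, renamed
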

Heuristic mechanisms are good enough.

Wednesday 5 September 2012

Test vs Design

I am not totally sold on the argument that tests improve design.

Since you need to mock out dependencies under test, a function that is going to be placed under test cannot be allowed to instantiate its own dependencies.

They must be externally instantiated and passed in so that, in the tests, they can be replaced with mocks.

This means that the function-under-test is now required to expose some of its implementation detail in the interface, making the interface more complex than it really needs to be.

(IMHO the interface is the thing that, above anything else, we really need to keep simple).

Sometimes, when using Python or another language that supports default parameter values, the dependencies can be passed in as optional input parameters that are used only by the tests, but this really degrades the value of the interface declaration as a piece of useful documentation.


One approach that I have found useful upon occasion is to split the interface into a set of internal functions specifically intended for testing purposes, which are wrapped (thinly) by a set of external functions for the client application to use. That way, the amount of logic not being tested is minimized, but the externally-facing interfaces are kept minimal and clean.
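A minimal sketch of that split, in Python, with all names hypothetical: the internal function takes its dependency explicitly so the tests can substitute a stub, while the thin external wrapper keeps the published interface clean.

    import time

    def _compute_report(records, clock):
        """Internal, testable: 'clock' is any callable returning a timestamp."""
        return {"generated_at": clock(), "count": len(records)}

    def compute_report(records):
        """External: the interface clients see; no dependency leaks through."""
        return _compute_report(records, clock=time.time)

    # In a test, the dependency is replaced with a deterministic stub:
    def test_compute_report():
        report = _compute_report(["a", "b"], clock=lambda: 0)
        assert report == {"generated_at": 0, "count": 2}
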


This is still more complex than I would like, but hey, the technology is still in its infancy.

In time, I would like to be able to give tests some access privileges above and beyond those which ordinary client logic enjoys; for example, the ability to reach in and pull a labelled block out of a function, and manipulate it using some form of reflection. I freely admit that such dynamism is not normally a good idea, but it is perhaps excusable if its use is restricted to a strict, single-purpose mechanism specifically engineered to support testing.

Tuesday 4 September 2012

Functional makes for an easier Agile


An "agile" approach is concerned with how we approach the management of resources in environments that contain risk factors that cannot be controlled a priori. Specifically, it is about increasing the tolerance of systems and processes to unanticipated change.

"Functional" development is concerned with how we reason about problems, suggesting a perspective that values a decomposition of the problem into stateless declarative truths over a decomposition of the problem into stateful (albeit small and easy to understand) components (each a small simulacrum of some corner of the problem domain).

Here I am assuming that there is a close (identity?) relationship between a complete analysis of the problem and the solution to that problem - although this may simply be some little remnant of Prolog haunting what is left of my faculties.


It should be possible to behave in an agile manner with both functional and (well designed) OO models, but in my mind, the statelessness of pure functions makes it very much easier to reason about them, and the implications of using them, in unanticipated and rapidly changing situations.

In other words, functional programming is a human --> human communications mechanism, where the extra rigor that the statelessness requirement imposes reduces the number of possibilities that the reader must consider when attempting to modify a model to meet a changed (or novel) requirement.
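A tiny illustration of that reduction in degrees of freedom, with hypothetical names: the stateful version obliges the reader to consider call order and hidden history, whereas the pure version can be understood, tested and reused from its inputs alone.

    class RunningTotal:
        """Stateful: behaviour depends on everything added so far."""
        def __init__(self):
            self.total = 0
        def add(self, value):
            self.total += value
            return self.total

    def total_of(values):
        """Pure: the output depends only on the input."""
        return sum(values)
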

In summary, "Agile" and "functional" programming are different beasts, but the "functional" approach is very well suited to agile development by virtue of its enforced "discipline" of statelessness and the corresponding reduction in degrees of freedom, resulting in reduced mental burden on the maintainer.

We can deduce something from this.

It appears that the more "disciplined" the modeling and programming paradigm, the easier it is to modify and manipulate those models, and to operate in an agile manner. So whilst disciplined programming techniques are not necessarily required for an agile approach to development, the ease with which the agile approach may be taken up benefits greatly from the adoption of highly disciplined programming techniques such as TDD and functional programming.

If we want to push forwards the state-of-the-art, we should be looking for more ways that we can automatically enforce restrictions upon software models, as well as improve our ability to reason about them.

Hence my interest in TDD, formal methods, assertions and all forms of development automation as a mechanism to make agility easier to achieve.

Optimism and Economic Expansion

There is no upper bound on human ingenuity. There is no fixed limit on talent. Not everybody can rise to the occasion, but only a tiny fraction of those who can are ever put to the test.