Monday, 21 July 2014

Brooks & Conway in a discussion around CI & development automation.

Discussion arising from this HBR blog post on CI:
William Payne:

"One of the reasons that the "Agile" movement has lost credibility in recent years is that many of the consultants selling "Scrum" and similar processes failed to emphasize that a significant investment in automated "testing" and continuous integration is a prerequisite for the success of these approaches.

A big barrier to adoption seems to be related to the use of the word "testing". In a modern and effective development process such as those mentioned in the article, the focus of the "test" function isn't really (only) about quality any more ... it is about maximizing the pace of development within an environment that does not tolerate preventable faults.

As a result of this, within my sphere of influence, I have tried to promote the notion of "Development Automation" as an umbrella term that captures the automation of the software build process, module & integration testing, deployment and configuration control, documentation generation and management information reporting ... a term that may help to speed adoption of the techniques mentioned in the article above.

In many ways, "Development Automation" is to product development what "DevOps" is to SaaS systems development: promoting the use of integrated, cross-functional systems of automation for the testing and deployment of software and other complex systems.

Indeed, as products become more complex, the importance of automation becomes greater and more critical, and the requirement for a carefully considered, well planned, and aggressively automated integration, test and configuration management strategy becomes a prerequisite for success.

Nowhere is this more apparent than in my field of expertise: the production of machine vision and other sensor systems for deployment into uncontrolled and outdoor environments, systems where specification and test pose a set of unique challenges with considerable knock-on impacts on the system design and choice of integration strategy."

Bradford Power:

"Do you break down the product into small modules and have small teams that are responsible for design, deployment, AND testing? Do you use simulations to shrink the cycles on new product tests?"

William Payne:

"It depends.

Taken together, Fred Brooks & Melvin Conway have the answer.

Firstly, Brooks' "No Silver Bullet" tells us that we cannot drive down development costs forever. Complexity costs money.

Since we can't meaningfully reduce the cost of complexity, we either have to maximize the top line, or amortize that cost across multiple products. This is product-line engineering taken to the extreme.

Secondly, Conway's law tells us that our team structure will become our de facto system architecture. Complex systems development is primarily a learning activity and the fundamental unit of reuse is the individual engineer.

Team structure therefore has to be organized around the notion that team expertise will be reused across products within one or more product lines, and the more reuse we have, the more we amortize the cost of development and the more profitable we become.

Whether this means small teams or large teams really depends on the industry and the nature of the product. Similarly, the notion of what constitutes a "module" varies widely, sometimes even within the same organization.

However, in order to facilitate this, you need a reasonably disciplined approach, together with a shared commitment to stick to the discipline.

Finally, and most importantly, none of this works unless you can 100% rely upon your automated tests to tell you if you have broken your product or not. This is absolutely critical and is the keystone without which the whole edifice crumbles.

You can't modify a single component that goes into a dozen different products unless you are totally confident in your testing infrastructure, and in the ability of your tests to catch failures.

I have spoken to Google test engineers, and they have that confidence. I have got close in the past, and it is a transformative experience, giving you (as an individual developer) the confidence to proceed with a velocity and a pace that is otherwise impossible to achieve.

Separate test teams have a role to play, particularly when safety standards such as ASIL and/or SIL mandate their use. Equally, simulations have a role to play, although this depends a lot on the nature of the product and the time and engineering cost required to implement the simulation.

The key point is that there is no silver bullet that will make product development cheaper on a per-unit-of-complexity basis ... only a pragmatic, rigorous, courageous and detail-oriented approach to business organization that acknowledges that cost and is willing to pay for it."
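The point that you cannot change a component shared by a dozen products without trusting your tests implies that CI must know which products a change can reach. The sketch below assumes a simple, hand-written dependency map (all component and product names are hypothetical) and walks the reverse dependencies breadth-first to find every product whose test suite should run after a change.

```python
from collections import deque

# Hypothetical dependency graph: each node maps to the components it directly uses.
DEPENDS_ON = {
    "product_a": ["vision_core", "logging"],
    "product_b": ["vision_core", "calibration"],
    "product_c": ["calibration"],
    "calibration": ["vision_core"],
    "vision_core": [],
    "logging": [],
}

# Hypothetical set of shippable products (the leaves whose tests CI must run).
PRODUCTS = {"product_a", "product_b", "product_c"}


def affected_products(changed, depends_on, products):
    """Return every product that depends, directly or transitively, on `changed`."""
    # Invert the graph: component -> set of nodes that use it.
    used_by = {}
    for node, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(node)

    # Breadth-first walk over reverse dependencies, starting at the changed node.
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for user in used_by.get(node, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return sorted(seen & products)


if __name__ == "__main__":
    print(affected_products("vision_core", DEPENDS_ON, PRODUCTS))
```

With this map, a change to `vision_core` reaches all three products (directly, and via `calibration`), while a change to `logging` only reaches `product_a`. A real build system would derive the graph from its own build files rather than from a literal dictionary.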

Andy Singleton:

"Yes, I think that Conway's law is very relevant here. We are trying to build a system as multiple independent services, and we use separate service teams to build, release and maintain them.

Yes, complexity will always cost money and time. However, I think that Brooks' "Mythical Man-Month" observations are obsolete. He had a 40-year run of amazing insights about managing big projects. During this time, it was generally true that large projects were inefficient or even prone to "failure", and no silver bullet was found. Things have changed in the last few years. Companies like Amazon and Google have blasted through the size barrier.

They did it with a couple of tactics:
1) Using independent service teams. These teams communicate peer-to-peer to get what they need and resolve dependencies.
2) Using a continuous integration machine that finds problems in the dependencies of one team on another through automated testing, and notifies both teams. This is BRILLIANT, because it replaces the most difficult part of human project management with a machine.

The underlying theory behind this goes directly against Brooks' theory. He theorized that the problem is communication: with N participants, the number of communication channels grows as N^2, which causes work and confusion. If you believe this, you organize hierarchically to contain the communications. ACTUALLY, the most scalable projects (such as Linux) have the most open communications.

I think that the real problem with big projects is dependencies. If you have one person, he is never waiting for himself. If you have 100 people, it's pretty common for 50 people to be waiting for something. The solution to this is actually more open communication that allows those 50 people to fix problems themselves.

I have written several blog articles challenging the analysis in the Mythical Man Month, if you are interested."
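Brooks' own formulation in The Mythical Man-Month is that if every part of a task must be separately coordinated with every other part, the intercommunication effort grows as n(n-1)/2 for n workers, which is the "N^2" in the comment above. A strict hierarchy, by contrast, needs only n-1 links, because each person (except the one at the top) talks only to one manager. The minimal sketch below just tabulates both counts; it says nothing about which structure works better in practice, which is exactly what this thread is debating.

```python
def pairwise_channels(n: int) -> int:
    """Channels when each of n people must coordinate with every other: n(n-1)/2."""
    return n * (n - 1) // 2


def tree_channels(n: int) -> int:
    """Channels in a strict hierarchy (a tree): each person talks only to their manager."""
    return max(n - 1, 0)


if __name__ == "__main__":
    for n in (5, 10, 50, 100):
        print(f"{n:>3} people: {pairwise_channels(n):>5} pairwise, {tree_channels(n):>3} hierarchical")
```

For 100 people this gives 4950 pairwise channels against 99 in a tree, which is why the choice of communication structure matters so much at scale.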

William Payne:

"What you say is very very interesting indeed.

I agree particularly strongly with what you say about using your CI & build tools to police dependencies. This is key. However, I am a little less convinced that "peer-to-peer" communication quite represents the breakthrough that you suggest. Peer-to-peer communication is unquestionably more efficient than hierarchical communication, with its inbuilt game of Chinese whispers and proliferation of choke-points. However, simply communicating in a peer-to-peer manner does not by itself sidestep the fundamental (physical) problem. You still need to communicate, and that still costs time and attention.

IMHO organizing and automating away the need for communication is absolutely the best (only?) way to improve productivity when working on complex systems. This is achieved either by shifting communication from in-band to out-of-band through appropriate organizational structures, or by setting (automatically enforced) policy that removes the need for communication (aka standardisation).

These are things that I have tried very hard to build into the automated build/test systems that I am responsible for, but it is still a very difficult "sell" to make to professionals without the requisite software engineering background."
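One way to read "automatically enforced policy" is as a build step that rejects dependencies the architecture does not allow, so that teams do not have to negotiate them case by case. The sketch below is a hypothetical illustration of that idea, not the author's system: the module names, the `ALLOWED` table and the hard-coded `EXAMPLE` dependency map are all assumptions, and a real pipeline would extract the actual dependencies from build files or import statements.

```python
# Hypothetical policy: each module may depend only on the modules listed here.
ALLOWED = {
    "app": {"vision", "logging"},
    "vision": {"calibration", "logging"},
    "calibration": {"logging"},
    "logging": set(),
}

# Hypothetical actual dependencies; a real pipeline would extract these automatically.
EXAMPLE = {
    "app": {"vision"},
    "vision": {"calibration", "logging"},
    "calibration": {"vision"},  # forbidden back-edge
    "logging": set(),
}


def policy_violations(actual, allowed):
    """Return sorted (module, dependency) pairs that the policy does not permit."""
    violations = []
    for module, deps in sorted(actual.items()):
        permitted = allowed.get(module, set())  # modules absent from the policy may depend on nothing
        for dep in sorted(deps):
            if dep not in permitted:
                violations.append((module, dep))
    return violations


def ci_exit_code(actual, allowed):
    """Print each violation and return a process exit code (0 = pass, 1 = fail)."""
    problems = policy_violations(actual, allowed)
    for module, dep in problems:
        print(f"policy violation: {module} -> {dep}")
    return 1 if problems else 0


if __name__ == "__main__":
    print("exit code:", ci_exit_code(EXAMPLE, ALLOWED))
```

A CI script would pass the return value of `ci_exit_code` to `sys.exit`, so that any forbidden edge (here `calibration -> vision`) fails the build without anyone having to spot it in review.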