Idle Conjectures in Search of Refutation: January 2013

Tuesday 15 January 2013

Personal Statement

My passion for machine vision, machine learning and statistical pattern recognition is longstanding, having started over 10 years ago and continuing today.

Most of my undergraduate AI degree was oriented towards logic, theorem proving, and computational linguistics, which was fascinating in its own right, but did not strike me as a particularly realistic or pragmatic way of dealing with the messiness and complexity of the real world. As a result, I latched on to the (at the time) less mainstream "soft" computing approaches with enthusiasm, devouring the content of the machine learning, machine vision and neural networks modules avidly. I saw these approaches as a pragmatic alternative to the hard and inflexible grammar-based approaches to Natural Language Processing espoused by the main body of the department.

This view of machine learning as a pragmatic tool, at odds with ivory-tower academicism, has stuck with me ever since, even as the subject has become more mainstream (and more academic and mathematically sophisticated). As a result, I tend to focus on simple techniques that work, rather than techniques which demonstrate mathematical chops and academic sophistication. I am fortunate in this regard, because, paradoxically, the solutions to difficult problems are often conceptually simpler and mathematically less sophisticated than the "optimum" solutions to simple problems. Perhaps a little bit of the Yorkshire/Lancashire culture of engineering pragmatism rubbed off on me during my time in Manchester.

Another thing that was dawning on me as I finished my undergraduate degree was the importance of scale. As I attempted to find datasets for my hobby projects (far harder back then than today), I began to develop suspicions that scale, rather than any qualitative leap in understanding, was going to be a key factor in the development of genuinely interesting artificial intelligence techniques. From this came my interest in machine vision, which I saw as a key "gateway" technique for the collection of data -- to help the machine build an understanding of the world around it and to "bootstrap" itself to a more interesting level.

I was lucky with my first employer, Cambridge Research Systems, where I had the opportunity to work with some very talented people, both within the company and across our customer community. From that experience, and the abortive neuroscience PhD that I started, I learned a lot about the neuroscience of biological visual systems, particularly the older, lower-level pathways that go, not to the primary visual cortex, but to the evolutionarily older "reptilian" parts of the brainstem. In contrast with the "general purpose" and "reconfigurable" nature of the cortex, these older pathways consist of a large number of (less flexible) special-purpose circuits handling things like eye movements and attention-directing mechanisms. Crucially, these lower-level circuits enable our visual system to stabilise, normalise and "clean" the data that we present to our higher-level cortical mechanisms. This insight carries across well to more commercial work, where solid groundwork (data quality, normalization and sampling) can make or break a machine learning implementation. I was also fortunate enough to pick up some signal processing and FIR filter design fundamentals, as I was writing software to process biological time-series signals (EOG) to identify and isolate events like saccades and blinks.

At around about this time, I was starting to become aware of the second important thread in my intellectual development: The incredible slowness of software development, and the difficulty and cost that we would incur trying to implement the large number of these lower level stabilization mechanisms that would be required.

I left Cambridge Research Systems specifically to broaden my real-world, commercial software development experience, working at a larger scale than was possible at CRS. Again, I was lucky to find a role with Sophos, where I learned a great deal from a large group of very talented C++ developers doing Test Driven Development in the highest-functioning Agile team I have yet encountered. Here, I started to think seriously about the role of communication and human factors in software development, as well as the role that tools play in guiding development culture, always with an eye to how we might go about developing those special purpose data processing functions.

Following a relocation closer to London (for family reasons), I left Sophos and started working for Thales Optronics. Again fortunate, I found myself working on (very) large scale machine vision applications. Here, during a joyous three year period, I was able to put much of my previous intellectual development and thinking into practice, developing not only the in-flight signal processing, tracking and classification algorithms, but more significantly, the petabyte-scale data handling systems needed to train, test and gain confidence in them. In addition to the technical work, I worked to encourage a development culture conducive to the development of complex systems. This was the most significant, successful and rewarding role I have had to date.

Unfortunately, budgetary constraints led Thales to close their office in Staines, and rather than transferring to the new office, I chose to "jump ship" and join Fidelity Asset Managers in the City of London, partly in an attempt to defeat some budgetary constraints of my own, and partly out of an awareness of the potential non-transferability of defense industry expertise, made more pressing by an impending overseas relocation.

At Fidelity, I used my knowledge of the MATLAB distributed computing toolbox to act as the High-Performance Computing expert in the "quant" team. I gained exposure to a very different development culture, and learned a lot about asset management and quantitative investing, gaining some insight into the accounting and management factors that drive development culture in other organizations. I particularly valued my exposure to the insights that Fidelity had, as an institutional investor, into what makes a successful organization, as well as its attempts to apply those insights to itself.

Finally, in 2011, my family's long expected overseas posting came. Yet again we were incredibly lucky, and got to spend a wonderful year-and-a-half living in the middle of New York City. I was fortunate, and managed to get a job at an incredible Silicon Alley startup, EveryScreen Media, which was riding the wave of interest in mobile advertising that was just beginning to ramp up in 2011 and 2012. Again, finding myself working with incredibly talented and passionate colleagues, I was given the opportunity to broaden my skills once again, picking up Python and Unix development skills, becoming immersed in (back-end) web development, and building out the data science infrastructure in an early-stage startup. From this year and a half or so, I particularly value what I learned about how to develop large scale (scalable) distributed real-time data processing systems and the effective use of modern internet "web" technology.

Now, back in the UK, I am in search of the next step on my journey of learning and discovery. My focus is, and remains, on the pragmatics of developing complex statistical data processing systems, on how to create and curate large data-sets, how to integrate them into the development process, so that continuous integration and continuous testing, visualisation and monitoring help the development team to understand and communicate the system that they are building, as well as the data that feeds it; to respond to unexpected behaviors, and to steer the product and the project to success and, moreover, to help ensure that the organization remains rightly confident in the team and in the system.

Thursday 10 January 2013

A Network Model for Interpersonal Communication

Modeling interpersonal communication within an organization as a network of reconfigurable topology composed of high capacity data stores connected by limited bandwidth communication channels.

The Model:

The amount that we know on any given topic of interest vastly outweighs our practical ability to communicate that information in a reasonable time-frame. We simply do not have the time or the available bandwidth to communicate everything that we need to in the detail that the subject deserves. Our model reflects this - The data storage capacity at each node is immense, and contrasts sharply with the exceedingly limited bandwidth available for communication between nodes. The difference between the two is many orders of magnitude in size. For a visual analogy, we should not look to buckets connected by hosepipes, but rather half-million ton supertankers connected by thin cocktail straws. Transmitting even a gallon of knowledge is a challenge.

Chinese Whispers:

When communicating with distant nodes with messages routed through intermediary nodes, the information being transmitted is compressed to an incredibly high degree with a very lossy and low-quality compression algorithm. The poor quality of the communications channel is particularly evident when the network encompasses a diverse range of backgrounds, cultures and terminological-linguistic subtypes. In many such cases the intent of the message can easily be inverted as relevant details are either dropped or misinterpreted in transmission.
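As a rough illustration of the multi-hop problem, here is a minimal Python sketch. It assumes, purely for the sake of argument, that each intermediary independently preserves a fixed fraction of whatever reaches it; the function name and the 0.8 retention figure are hypothetical, not measurements.

```python
def surviving_fraction(retention_per_hop: float, hops: int) -> float:
    """Fraction of the original message left after `hops` lossy relays.

    Assumes every hop keeps the same fraction of what it receives,
    which is a deliberate simplification of real conversations.
    """
    if not 0.0 <= retention_per_hop <= 1.0:
        raise ValueError("retention_per_hop must lie in [0, 1]")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return retention_per_hop ** hops


if __name__ == "__main__":
    # Illustrative only: 80% retention per hop is an assumed figure.
    for hops in (1, 2, 3, 5):
        print(hops, round(surviving_fraction(0.8, hops), 3))
```

Under these assumptions a message relayed over three hops arrives with roughly half of its content intact, and the loss compounds with every additional hop.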

Systematic Factors impacting efficacy of communication:

The options available for compression are greater when two neighboring nodes already have a great deal in common, where shared datasets, terminology, mental models, and approaches to communication can be used to elide parts of the message, reducing bandwidth requirements, and allowing for communication that is both more reliable and more rapid. As a result of this, communication within an organization that has a strong, unified "culture" (common knowledge, terminology and practices) will be far more effective than communication within an organization that has a less cohesive "culture", purely because the options for message compression are greater, irrespective of any other measures that the organization might put in place to improve the available bandwidth. It is worth noting that, whilst this does improve the situation considerably, the problem itself is fundamental and always presents a significant challenge.
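The effect of shared context on compression can be demonstrated quite literally. The sketch below uses the preset-dictionary feature of Python's standard zlib module as a stand-in for "common culture": both sender and receiver hold the same dictionary before the message is sent. The example strings and helper names are invented for illustration.

```python
import zlib


def send(message: bytes, shared_context: bytes = b"") -> bytes:
    """Compress a message, optionally priming the compressor with shared context."""
    if shared_context:
        compressor = zlib.compressobj(zdict=shared_context)
    else:
        compressor = zlib.compressobj()
    return compressor.compress(message) + compressor.flush()


def receive(payload: bytes, shared_context: bytes = b"") -> bytes:
    """Decompress a payload; the receiver must hold the same shared context."""
    if shared_context:
        decompressor = zlib.decompressobj(zdict=shared_context)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(payload) + decompressor.flush()


# Hypothetical "common culture": phrases both parties already know.
SHARED = (b"nightly regression run failed; tracker classifier false alarm "
          b"rate exceeded threshold on the sea-clutter dataset")
MESSAGE = (b"tracker classifier false alarm rate exceeded threshold on the "
           b"sea-clutter dataset in the nightly regression run")

if __name__ == "__main__":
    print("without shared context:", len(send(MESSAGE)), "bytes")
    print("with shared context:   ", len(send(MESSAGE, SHARED)), "bytes")
```

The payload shrinks only because the dictionary was exchanged in advance, and a receiver without the same dictionary cannot decode the message at all.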

Organizational Optimization for Effective Communication:

Given that there is nothing inherently fixed about the topology of the communications network within which we are embedded, one simple response to this problem is to remove intermediary steps from source to destination nodes, and to allow the source node to connect directly with the destination node, permitting a direct and relatively high bandwidth exchange. This is even more effective if the exchange is bidirectional, permitting in-situ error correction. This argues for a collaborative approach to organization - with technical experts communicating with one another directly, with no intermediary communications or management specialists.

Another approach is to optimize for effective compression of the messages being transmitted through the network. As noted above, this relies on a common terminology, a common knowledge base, and a common set of practices and approaches to problems. In other words, most of the communication is moved out-of-band -- communicated through various channels prior to the point where it is actually needed. Again, this is well aligned with the collaborative approach to organization - where the technical experts within an organization continually educate one another on their own area of expertise so that when they need to communicate quickly, they can do so both effectively and reliably.

A Common Antipattern: Spin Doctors and Office Politicians:

Of course, the approaches outlined above are not the only solution to this fundamental problem. However, I argue that at least some of these approaches are anti-patterns, detrimental to the long term health and development of the organization.

Many individuals craft the messages they send extremely carefully to minimize the probability of corruption or misinterpretation en route, partly by reducing the information content of the message, and partly by crafting the emotional tone to remove ambiguity. This process is colloquially known as "spin", or "crafting the message".

In some situations this approach may even be appropriate, for example: where the network is large and cannot be reconfigured so that messages must be transmitted over more than one "hop"; where the environment for communication is particularly disadvantageous, with little common culture, terminology, or background technical knowledge; and finally, where long-term systematic improvements must be subordinated to short-term goals.

However, there are a couple of significant drawbacks to this approach. Firstly, by restricting knowledge transfer, opportunities to grow a base of common knowledge and understanding are squandered. Secondly, this approach is in direct conflict with cultural norms that emphasize honesty and transparency in interpersonal communication, the adherence to which builds a basis of trust and mutual understanding that further enhances communication.

Friday 4 January 2013

Dialectic vs. Tribalism .... Fight!

One of the problems that we have is that there is a very human instinct to stereotype and denigrate any (real or imagined) opposition community. It is part of our tribalistic nature, and it infects every sphere of human thought, even when we are not aware of it. (Especially when we are not aware of it.)

The truth of the matter is that other people are not simple and they are not stupid. They may be preoccupied with other battles, but they are generally smarter and more complex than we are willing to give credit for.

Fortunately, we have centuries of philosophical learning to help us tackle this behavioral bias. Unfortunately we do not have the time to absorb all of this scholarship so we will just jump on the whole "Hegelian" [sic] thesis-antithesis-synthesis meme as a quick-and-dirty fix for our ignorance.

We have to discard our tribalistic myopia and become aware of the other battles that people are fighting - the preoccupations that they are focused on, and within which they frame their arguments and their notions of right and wrong. Any synthesis solution will incorporate elements of these other concerns:- "In such-and-such a situation, you need to worry about risk A, and take action B, in this other situation, you need to worry about risk C and take action D."

In other words, a sign of maturity in the debate over a discipline is the presence of increasingly fine-grained recipe books, within which increasingly tightly defined specialist sub-disciplines emerge with their own concerns and heuristics. The division of labour becomes ever more fine-grained as the economy around a discipline matures and grows.

A (few) words to the wise

Here are the conclusions (categorized, mildly edited & slightly extended) from Boehm's retrospective of the past 50 years of software development.

Some of these are contradictory. Such is the way of wisdom.

Skepticism and Critical Thinking:

Be self-aware. Know your own strengths and weaknesses, and how to manage them.
Don’t believe everything you read. Be wary of the downslope of the Gartner "hype-cycle" roller-coaster.
Be skeptical about silver bullets, and one-size-fits-all solutions.
Avoid falling in love with your slogans. YAGNI (you aren’t going to need it) is not always true.

Total Commitment to Quality:

Avoid cowboy programming. The last-minute all-nighter frequently doesn’t work, and the patches get ugly fast.
Eliminate errors early. Even better, prevent them in the future via root cause analysis.
Be quick, but don’t hurry. Overambitious early milestones usually result in incomplete and incompatible specifications and lots of rework.

Flexibility and Craftsmanship:

Avoid using a rigorous sequential process. The world is getting too changeable and unpredictable for this, and it’s usually slower.
Avoid Top-down development and reductionism. COTS, reuse, IKIWISI, rapid changes and emergent requirements make this increasingly unrealistic for most applications.
Look before you leap. Premature commitments can be disastrous (Marry in haste; repent at leisure – when any leisure is available).
Keep your reach within your grasp. Some systems of systems may just be too big and complex.

Clarity and Communication:

Determine the system’s purpose. Without a clear shared vision, you’re likely to get chaos and disappointment. Goal-question-metric is another version of this.
Make software useful to people. This is the other part of the definition of “engineering.”
Consider and satisfice all of the stakeholders’ value propositions. If success-critical stakeholders are neglected or exploited, they will generally counterattack or refuse to participate, making everyone a loser.
Have an exit strategy. Manage expectations, so that if things go wrong, there’s an acceptable fallback.

The Skill and The Discipline:

Respect software’s differences. You can’t speed up its development indefinitely. Since it’s invisible, you need to find good ways to make it visible and meaningful to different stakeholders.
What’s good for products is good for process, including architecture, reusability, composability, and adaptability.
If change is rapid, adaptability trumps repeatability.
Don’t neglect the sciences. This is the first part of the definition of “engineering”. It should not include just mathematics and computer science, but also behavioral sciences, economics, and management science. It should also include using the scientific method to learn through experience.

Performance:

Time is money. People generally invest in software to get a positive return. The sooner the software is fielded, the sooner the returns come – if it has satisfactory quality.
There are many roads to increased productivity, including staffing, training, tools, reuse, process improvement, prototyping, and others.
Think outside the box. Repetitive engineering would never have created the Arpanet or Engelbart’s mouse-and-windows GUI. Have some fun prototyping; it’s generally low-risk and frequently high reward.

Wednesday 2 January 2013

Maslow's Hierarchy of Software Requirements

Agile development practices mandate an ongoing dialog with the customer, in which requirements are raised and met in a sequential manner.

This naturally causes a prioritization, as some requirements only become important once others have been fulfilled. This is closely analogous to Maslow's hierarchy of needs (following the point raised by Boehm in his paper: A View of 20th and 21st Century Software Engineering).

For example, once basic functionality has been implemented, its basic reliability must be assured, and trust must be built in its operation. After that requirement has been met, the security of the software must be ensured. Only after these tasks have been done does one (generally) prioritize the offering of metrics to "add value" with insight into the business.

Insight

Security

Trust & Reliability

Essential (Base) Functionality

Tuesday 1 January 2013

Personal Development Process - End of Day

For the past couple of months I have been working from my home office. It has been a great experience - particularly in comparison with a workday that involves 2-3 hours of commuting, and is something that I would strongly recommend to anybody who is lucky enough to get the opportunity.

On the other hand, working from home definitely presents its own challenges. Communication with coworkers (now clients) is much, much harder, and requires deliberate effort to get right. Discipline and record-keeping are also much more important. They are something that I was never particularly good at before (I am a get-lost-in-the-work kind of guy), but I have been making a conscious effort to improve since the transition.

One thing that I have been trying to establish (without too much success so far, admittedly), is to incorporate a review & planning session at the end of each evening. To encourage me to stick at it, I thought that I would try to increase the value of the session by incorporating some post-mortem techniques along the lines of those suggested a couple of months ago by a (very astute and experienced) colleague:

End of day review activities:

Identify one thing that went wrong, and one thing that went right that day.
For both, do a "Five Whys" root-cause analysis.
This should give 10 "causes" at various levels, from proximal to distal.
Pick two "causes" to address with tasks to be performed on the next day, one task to improve and consolidate something that went right, and one task to address something that went wrong.
Over the week, try to spread the tasks over the entire proximal-to-distal spectrum, so that work is balanced between immediate (proximal) concerns and long-term (distal) improvements.
In this way, a better balance is maintained between the urgent and the important.
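For what it is worth, the bookkeeping for this review can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: the WhyChain structure, the pick_causes function, and the rule of using the weekday to choose how deep into each chain to look are just one possible way of spreading tasks across the proximal-to-distal spectrum.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WhyChain:
    """One event plus its five causes, ordered proximal (0) to distal (4)."""
    event: str
    causes: List[str]

    def __post_init__(self) -> None:
        if len(self.causes) != 5:
            raise ValueError("a Five Whys chain needs exactly five causes")


def pick_causes(went_right: WhyChain, went_wrong: WhyChain,
                weekday: int) -> Tuple[str, str]:
    """Pick one cause from each chain to turn into tomorrow's two tasks.

    Assumed rule: Monday (0) looks at the most proximal cause and Friday (4)
    at the most distal, so the working week covers the whole spectrum.
    """
    depth = weekday % 5
    return went_right.causes[depth], went_wrong.causes[depth]


if __name__ == "__main__":
    right = WhyChain("demo went well", [f"right-why-{i}" for i in range(1, 6)])
    wrong = WhyChain("build broke", [f"wrong-why-{i}" for i in range(1, 6)])
    for day in range(5):
        print(day, pick_causes(right, wrong, day))
```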

Antifragility

I have recently started reading the book: "Antifragile", by Nassim Taleb, author of "The Black Swan", and although I am only 70 or so pages into the book, I do not hesitate to thoroughly recommend it, although you may find (as I have) the personal, brash and argumentative style somewhat jarring.

As somebody who was weaned on Gleick, and whose bookshelf is packed with pop-sci books with "Complexity" or "Chaos" somewhere in the title, I find that Taleb's central thesis is falling on fertile and very well prepared ground. It is nice to have a book that brings some new ammunition, and new nuance, to old arguments.

For example, in the predictive vs reactive control spectrum, it is clear that Taleb would argue vociferously for the merits of reactive control. His position seems to be based on two observations: The first is a behavioral bias: our propensity to smooth over and absorb past surprises, to rationalise and to confabulate, to fool ourselves into believing that past events were predictable when they were anything but, and to continue reinforcing the (erroneous) belief that we can predict and control the future. The second is the notion that unexpected events are more common than we expect, and that using past behavior to model future events will tend to underestimate the frequency and impact of those events. (A topic I have covered before).

"It is far easier to figure out if something is fragile than to predict the occurrence of an event that may harm it" ... "Fragility is quite measurable, risk, not so at all, particularly risk associated with rare events".

Predictive vs Reactive management

Nothing in this article is really new, but I needed a document to which I could point people whenever I make use of Predictive-vs-Reactive terminology.

The Agile-vs-Waterfall debate is old, and, arguably, has been won (depending on who you ask).
However, I like to frame this dichotomy in other terms, which, I believe, offer both a superior perspective and (Hegel-like) the opportunity for synthesis.

Predictive (Waterfall) vs Reactive (Agile)

Traditional management techniques put the emphasis on predictive management, so that power may be consolidated in the hands of the decision maker. Planning and specification activities are important, leading naturally to a waterfall-style, gated development process. This is not so much a development methodology as a manifestation of the exercise of dominant political power within an organization.

The variance-minimizing aspects of predictive control trace their roots back to Deming's teachings in factory management, and are expressed through the TPS, six-sigma, lean sigma and so on -- Although in some situations, it can be argued that this philosophy is abused (More on this later).

Agile management techniques on the other hand put the emphasis on reactive management and feedback loops; the devolvement of power and responsibility to the individual developer, and the consequent restructuring of information flows to enable the organization to react to new information as it is discovered, and new events as they happen.

With predictive management everything is focussed around a small number of decision points and authority figures. A manager with authority for the project will either stop the project, or give the "green light" for work to continue at one of a handful of project gates. Predictive management requires extensive plans, forecasts and specifications to inform high-impact, long-lasting decisions. Predictive control is sensitive to changing environments, unexpected events, poor planning capacity, behavioral biases, and incorrect assumptions. Although appropriate in some situations, it is fragile and error-prone, and where it does occur, the guiding rationale is normally the consolidation of political or financial power.

With reactive management, the number and frequency of decision points is increased, so that work is planned out over short time scales only (anywhere from a few hours to a few weeks). Each decision is low-impact and carries only for a short time. Sensitivity to changing environments and unexpected events is reduced, as is sensitivity to poor planning, behavioral biases and incorrect assumptions (thanks to empirical feedback). Additionally, since decisions are greater in number and lower in impact, it becomes advantageous to devolve decision making authority to avoid the creation of decision-making information-flow bottlenecks.
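A toy model makes the difference in sensitivity concrete. The Python sketch below is only an illustration under stated assumptions: "reality" is a list of numbers that drifts over time, a "plan" is simply the last observed value, and cumulative_error is a hypothetical name for the total gap between plan and reality.

```python
from typing import Sequence


def cumulative_error(true_path: Sequence[float], replan_every: int) -> float:
    """Total |plan - reality| when the plan is refreshed every `replan_every` steps.

    A large `replan_every` models predictive management (few decision points);
    a small one models reactive management (many low-impact decisions).
    """
    if replan_every < 1:
        raise ValueError("replan_every must be at least 1")
    plan = true_path[0]
    total = 0.0
    for step, actual in enumerate(true_path):
        if step % replan_every == 0:
            plan = actual  # decision point: re-anchor the plan on what is observed
        total += abs(plan - actual)
    return total


if __name__ == "__main__":
    drifting_reality = [float(step) for step in range(12)]  # assumed steady drift
    print("one up-front plan:", cumulative_error(drifting_reality, 12))
    print("replan every 3:   ", cumulative_error(drifting_reality, 3))
```

With this assumed drift, the single up-front plan accumulates an error of 66.0 while replanning every three steps accumulates 12.0. The model ignores the cost of each decision point, which is exactly the overhead that devolving authority is meant to keep small.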

One of the primary advantages of the reactive control approach is the opportunity that it offers for the incorporation of timely and relevant empirical information in the decision making process; the ability to seek feedback, to make mistakes and to recover (and learn) from them. Indeed, a properly functioning reactive control system does not seek to avoid mistakes, but rather to make them quickly, and learn from them ("Move quickly and break things"), although we often call the mistake-making process "experimentation" to disguise its nature from those who, for political reasons, demand preternatural levels of perfection and clairvoyance from those around them.

The key property to look for, of course, is the flow of lesson-bearing information through the decision-making cycle. Error-feedback requires errors. An interesting comparison is to be had between this and the CFAR (Constant False Alarm Rate) approach to adaptive signal processing - if you do not get any errors or make any mistakes, you are not trying hard enough!
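For readers who have not met CFAR before, here is a minimal sketch of the common textbook "cell-averaging" variant in plain Python. It is not taken from any particular system. The parameter names and defaults are assumptions, and the threshold multiplier uses the standard cell-averaging formula, which assumes exponentially distributed noise power (a square-law detector).

```python
from typing import List, Sequence


def ca_cfar(power: Sequence[float], num_train: int = 8, num_guard: int = 2,
            pfa: float = 1e-3) -> List[int]:
    """Return indices whose power exceeds an adaptive, noise-relative threshold.

    num_train and num_guard are totals split evenly either side of the cell
    under test, so both are assumed to be even numbers.
    """
    half_train = num_train // 2
    half_guard = num_guard // 2
    margin = half_train + half_guard
    # Multiplier chosen to hold the false alarm rate at `pfa` (not at zero).
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    detections = []
    for i in range(margin, len(power) - margin):
        leading = power[i - margin:i - half_guard]
        trailing = power[i + half_guard + 1:i + margin + 1]
        noise_estimate = (sum(leading) + sum(trailing)) / num_train
        if power[i] > alpha * noise_estimate:
            detections.append(i)
    return detections


if __name__ == "__main__":
    signal = [1.0] * 30
    signal[15] = 50.0  # one strong return in flat, assumed noise
    print(ca_cfar(signal))
```

The point of the comparison is in the `pfa` parameter: the detector is tuned to tolerate a small, constant rate of mistakes, because a threshold that never produces a false alarm is also one that misses real events.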

Returning, for a moment, to Deming, variance-minimization, lean and six-sigma. Deming's argument is essentially the same as mine: To manage effectively, you need to incorporate empirical feedback, and empower individuals to act together in common cause. However, manufacturing is a highly controlled environment, where variance can be modelled as Gaussian (six sigma), and unexpected "Black Swan" "Unknown Unknowns" can be omitted from the process control model. Software development (and other business activities), on the other hand, operates in a very different environment, where the Gaussian is a misleading and dangerous noise model. We do still need empirical and quantitative feedback, but we are no longer measuring variance in a simple, low-dimensional space, so what we can do with it is quite different. However, the concept of the feedback loop remains valid, and the organizational psychology is the same: Empirical feedback frees people from organizational politics, "gamifies" the work experience, and empowers individuals to work together for a common cause.
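To put a rough number on why the Gaussian is a dangerous noise model outside the factory, the sketch below compares the probability of a six-sigma surprise under a Gaussian with the same probability under a fat-tailed alternative. The choice of a Student-t distribution with 3 degrees of freedom is my own assumption, picked only because its tail has a simple closed form; it is not a claim about how software projects are actually distributed.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(|X| > k standard deviations) for a Gaussian."""
    return math.erfc(k / math.sqrt(2.0))


def student_t3_tail(k: float) -> float:
    """P(|X| > k standard deviations) for a Student-t with 3 degrees of freedom.

    The t distribution with 3 degrees of freedom has variance 3, so k standard
    deviations corresponds to t = k * sqrt(3). The CDF used here is the
    closed form for the 3-degrees-of-freedom case.
    """
    t = k * math.sqrt(3.0)
    x = t / math.sqrt(3.0)
    cdf = 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi
    return 2.0 * (1.0 - cdf)


if __name__ == "__main__":
    k = 6.0
    print("Gaussian tail:  ", gaussian_tail(k))
    print("Fat-tailed tail:", student_t3_tail(k))
    print("Ratio:          ", student_t3_tail(k) / gaussian_tail(k))
```

Under these assumptions the fat-tailed model makes a six-sigma event several hundred thousand times more likely than the Gaussian does, which is the gap that a variance-only process control model silently ignores.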