Data Science & Machine Learning

From Fighter Planes to Omics: Why Iterate?

Raghav Sehgal
November 28, 2017

You have probably read about Boyd’s law of iteration[1]. If you haven’t already, have a look at it.

In Roger’s words[2]:

“In analyzing complexity, fast iteration almost always produces better results than in-depth analysis.”

This is especially true of exploratory work, where the goals are not well defined and may change over time.

Boyd’s law was born out of a fascinating story of two fighter planes. Aircraft designers would all agree that the MiG-15 was a better plane than the F-86, whereas pilots preferred the latter.

“In one-on-one dogfights with MiG-15s, the F-86 won nine times out of ten.”

The question of why has been analysed by many people. I’d like to present a view that I haven’t come across before, at least not explicitly. The reason for this peculiar anomaly turns out to be a mismatch between two definitions of “better”. For a pilot, the only criterion that matters is the likelihood of winning a one-on-one dogfight. It’s difficult to argue for a better criterion. The designers, on the other hand, assumed that a more maneuverable aircraft would have the advantage. The theory is sound. It failed the reality check.

Faster iterations are good because they provide a shorter feedback loop.

I particularly like this part:

“Without hydraulics, it took slightly more physical energy to move the MiG-15 flight stick than it did the F-86 flight stick. Even though the MiG-15 would turn faster (or climb higher) once the stick was moved, the amount of energy it took to move the stick was slightly greater for the MiG-15 pilot. With each iteration, the MiG-15 pilot grew a little more fatigued than the F-86 pilot. And as he got more fatigued, it took just a little bit longer to complete his OOPA (Observe, Orient, Plan and Act) loop. The MiG-15 pilot didn’t lose because he got outfought. He lost because he got out-OOPAed.”

When people think of data science, they think of complex algorithms and fancy words. (cough. machine learning. cough.) Most data science is rather boring. The fancy words are not silver bullets. Not many think of taking things that already exist and combining them in new ways. That is unfortunate.

Let us consider the work of a typical researcher or analyst working on metabolomic and genomic data. Most of their time is spent juggling various tools to generate visualisations from the collected data and to find patterns that might hold clues. Surprisingly, the time spent processing the data is negligible compared to the time it takes to analyse it and find something interesting enough to pursue.

What is interesting is often hard to define concretely enough for machines to do the searching for us. We do not have enough data tagged in sufficient detail for machines to learn and extract patterns from it. The value proposition of the analyst or researcher is the intuition they have developed over the years. Even that intuition fails them when they are looking at a vast amount of raw data. Our brains are not good at making sense of numbers. It’s a different story with visuals, though. Looking at certain visual representations of data makes identifying patterns easier.

A personal anecdote: I have seen someone point out a single missing point in a plot of hundreds after looking at it for just a few seconds. They had been working on the dataset for a while. I’m willing to bet that would have been impossible if we were looking at a bunch of numbers, the raw data. I’m also willing to bet that anyone who isn’t in the business of looking at such data cannot pull that off. Researchers and analysts are valuable precisely for this reason. They develop a habit of identifying patterns, understand the potential reasons for anomalies, and can quickly pick out the ones worth investigating from the deluge of information.

We analyse data, mostly metabolomic. We often look at different subsets of data to visually identify anomalies and to confirm or refute hypotheses. Every context switch between looking at the data and generating new visuals to look at is an unwanted distraction. Context switches carry a penalty far greater than the time required to do the actual switching. Just like the fighter pilots getting a little fatigued each time they moved the stick, an analyst gets a little more exhausted with each iteration when they have to switch between various tools. The cost adds up, especially when doing hundreds of iterations rapidly.

Much of this can be eliminated. Let the minds that can make sense of data not spend their precious time and effort fighting with the tools. We envision tailor-made applications that take the pain out of a typical analysis and focus on enabling faster iterations. Because speed of iteration matters a lot.

Faster iterations do not simply reduce the time required to do something; they can open up possibilities that were previously impossible or inconceivable.

Imagine being able to take a new dataset and, in a matter of minutes, run the sanity checks and preliminary analysis that usually take a few weeks. Imagine being able to visualise various metabolic pathways alongside raw and processed data without having to switch tools, because everything you need is at your fingertips. Imagine being able to present a story with data, rather than a mere collection of visualisations.

We are not there yet, but we are making steady progress. Throughout this process, I remind myself of the story above. Because often, it turns out that what I believe is better, isn’t. Seemingly trivial enhancements often add more value than carefully designed features. We iterate. Not because we need to, but to avoid being out-OOPAed.

[1]: https://blog.codinghorror.com/boyds-law-of-iteration/

[2]: https://msdn.microsoft.com/en-us/library/aa479371.aspx
