And Make Black Belts Obsolete?

I read an article in Wired magazine the other day that got me to thinking about the relationship between statistics, engineering knowledge, and theory. The article claimed that, in the era of massive data storage and analytical capability, the scientific concept of the “theory” was becoming obsolete. What implications does this have for people working with data to solve problems? Can we find solutions without a theory? Will Google obsolete the Black Belt?

As I have discussed before, define, measure, analyze, improve, and control (DMAIC), like other problem-solving methods, is a modification of the scientific method. Table 1 compares several problem-solving strategies with a simplified version of the scientific method.

Table 1 - A comparison of the scientific method and various problem-solving strategies in industry


| Scientific Method (from Wikipedia) | Ford 8D (from Wikipedia) | Shewhart Cycle (Deming Wheel) | Problem-Solving Strategy | Six Sigma |
| --- | --- | --- | --- | --- |
| Define the question | Assemble a cross-functional team of experts; define the problem fully | Plan | Reason for improvement | Define |
| Gather information and resources (observe) | | | Current situation | Measure |
| Form hypothesis | Implement and verify interim containment actions (ICAs), also known as temporary fixes, as needed | | Analysis | Analyze |
| Perform experiment and collect data | Identify and verify root cause | | | |
| Analyze data | | | | |
| Interpret data and draw conclusions that serve as a starting point for a new hypothesis | Choose and verify permanent corrective actions (PCAs) and preventive actions; implement and validate the PCAs; prevent recurrence of the problem/root cause | Do | Countermeasures | Improve |
| Publish results | Recognize the efforts of the team | Check; Act | Results; standardization; future plans | Control |
| Retest (frequently done by other scientists) | | | | [Ongoing monitoring] |

The key aspects of all these methods are generating an idea about what is going on (based on data, experience, or knowledge), using data to test your idea, and then refining your idea (your hypothesis) through multiple iterations until it accords with your data. After a lot of validating data, your hypothesis might even reach the level of theory, which is a hypothesis supported by multiple streams of evidence.

The strength of the scientific method and its derivatives is that they are data-based, iterative (and therefore self-correcting), and provisionally explanatory. Once an explanation has been incorporated into our knowledge, it might lead to other breakthroughs. For example, while nothing physically prevented constructing a laser in Edwardian England, it couldn’t be built until people had explained and understood atomic structure.

I like to tell people that there are many ways to try to convince someone that they are wrong, but the only one that has a chance of working is the scientific method (and even that doesn’t work all the time). But we are all scientists, every day of our lives, as we observe the world around us and hypothesize about what is going on. Either we are right or we get smacked in the face by reality.

So the scientific method has worked really well for humanity over the past few centuries. We understand a lot more about how things work because of the system. But is it to be superseded by gross computational and analytical power? The author of the Wired article uses the example of Google, which applies statistical tools to give us those remarkably useful listings when we type in our keywords, and contrasts George Box’s “All models are wrong, but some are useful” with Google’s Peter Norvig’s “All models are wrong, and increasingly you can succeed without them.”

How would this model-less approach look in a business, and how might it be relevant to Six Sigma?

In the modern era, most companies have a pretty good ability to generate a lot of data. The problem, in my experience, is that data isn’t knowledge, and while companies spend a lot of money warehousing their data, they don’t spend nearly as much extracting useful knowledge from it. I can’t count the number of times that a company has had a big problem with its product, maybe even one with liability exposure, that could have been prevented by looking at the data the company already had in its database. Sure, you get to look like a hero for finding and fixing the problem, but wouldn’t it have been much better (and cheaper) to have prevented the problem in the first place by using the data that was already there?

So the potential is there, perhaps. We can imagine a Google-plus-type system that might comb through the data warehouse, find correlations, and build mathematical (not explanatory) models that could be helpful. Because this system is nonexperimental, the models it builds would be empirical and “dumb” in the sense that the system wouldn’t understand what is going on; it would only seek to explain output numbers with input numbers.
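For concreteness, here’s a minimal sketch (in Python; the file name and the “yield” output column are my own placeholders, not anything from a real system) of the kind of blind correlation screening such a system might run against a warehouse extract:

```python
# A sketch of "dumb" correlation screening over warehoused process data.
# The file name and the "yield" output column are hypothetical stand-ins.
import pandas as pd

df = pd.read_csv("process_history.csv")                     # warehouse extract
corr = df.corr(numeric_only=True)["yield"].drop("yield")    # correlate every numeric input with the output
print(corr.abs().sort_values(ascending=False).head(10))     # top candidate "drivers," by correlation alone
```

Nothing in that ranking tells us why any of those inputs moves the output, only that they happened to move together in the data we kept.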

Would such a system be a virtual Black Belt? Such a thing might be able to detect a number of true issues, but I don’t think it will replace Black Belts or the scientific method. For a variety of reasons, I think that the difference between empirical model predictions and the actual output (also known as a residual) will frequently be quite large. Mostly it boils down to this: to solve problems that have been around for a while, it’s frequently necessary to do something different from what we have been doing, and all that Google-plus can do is look at what we have done.

First, there’s the issue of the incoming data. Remember the principle of GIGO (garbage in, garbage out). If a data stream comes from a gauge that isn’t in control, or isn’t related to the output you are interested in, then no numerical model is going to work. This is why we perform measurement system analyses early in the measure phase. Also, the data that you currently collect could be missing one or more measurements that are necessary to understanding the process. These are Lloyd Nelson’s “unknown or unknowable” measures. Having at least a rough hypothesis frequently guides us to measure more than we did previously. In the absence of a guiding hypothesis that the problem is related to, say, something happening in the summer, we would have no reason to try measuring humidity and thus discover that humidity in fact explains our problem. Finally, the solid process-improvement work of achieving control and reducing variation solves problems you don’t even know you have, and no empirical model is going to do that.
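To make the “unknown or unknowable” point concrete, here’s a sketch with made-up numbers: a model built only from the variable already in the database (temperature) explains essentially nothing, while adding the unmeasured factor (humidity) explains nearly everything.

```python
# Made-up data: the output is driven mostly by humidity, which was never recorded.
import numpy as np

rng = np.random.default_rng(1)
n = 200
temperature = rng.normal(450.0, 10.0, n)      # the input we already measure
humidity = rng.uniform(20.0, 90.0, n)         # the input we never thought to measure
output = 0.01 * temperature + 0.2 * humidity + rng.normal(0.0, 1.0, n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    return 1.0 - residuals.var() / y.var()

print("temperature only:       R^2 =", round(r_squared(temperature, output), 2))
print("temperature + humidity: R^2 =", round(r_squared(np.column_stack([temperature, humidity]), output), 2))
```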

When performing experiments in the analyze phase, there are times when we choose settings at the edge of, or beyond, the current operating envelope. Sometimes this is to maximize the size of the effect so that it is easier to detect over the random noise of the process; other times we have a hypothesis that drives us to consider that the limitations of the envelope itself may be causing the problem. Again, looking at historical or current production data would not find this knowledge. (This is also one of the known limitations of evolutionary operation, or EVOP.) However, a system like Google-plus might help us prioritize factors for inclusion in such experiments. The caution with this approach is that you may have a crossover interaction, as below.

While this is detectable with a numerical technique (ANOVA), without being told to look at the data this way (which requires a hypothesis about the interaction), the system will probably do something like an additive linear regression on these data and find that temperature and pressure have no effect, on average, and thus can be discounted. What is actually going on is that temperature and pressure have a huge, but interactive, effect. The graph above is actually a two-dimensional projection of the edges of the three-dimensional response surface, shown below (assuming a linear response between points).

At any setting of temperature, the average response is 5.5, so Google-plus might be fooled into thinking that these factors have no effect. (If you don’t believe me, run a linear regression on the data I used to make up the 3-D graph above.) It doesn’t know any better, whereas the Six Sigma team might have some engineering knowledge that these two factors may interact, and would easily be able to find this out by looking at the data.
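Here’s a sketch of that trap, using hypothetical response values chosen so that the average at either temperature setting is 5.5: an additive regression reports that neither factor matters, while adding the temperature-pressure interaction term fits the data exactly.

```python
# A 2x2 crossover interaction with coded factor levels (-1 = low, +1 = high).
import numpy as np

temperature = np.array([-1, -1, 1, 1])
pressure    = np.array([-1, 1, -1, 1])
response    = np.array([ 1, 10, 10, 1])   # the effect of pressure reverses sign with temperature

# Additive model: intercept + temperature + pressure
X_add = np.column_stack([np.ones(4), temperature, pressure])
b_add, *_ = np.linalg.lstsq(X_add, response, rcond=None)
print("additive fit:         ", np.round(b_add, 2))   # both factor coefficients come out ~0

# Same model plus the temperature*pressure interaction term
X_int = np.column_stack([X_add, temperature * pressure])
b_int, *_ = np.linalg.lstsq(X_int, response, rcond=None)
print("with interaction term:", np.round(b_int, 2))   # interaction coefficient = -4.5, exact fit
```

Without the interaction term, both factors look like noise; with it, the fit is exact, which is exactly what an engineer who suspects an interaction would go looking for.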

So in the event of interactions our Google-plus will probably see little or no effect with a lot of residual error around its predictions. It might even falsely correlate the residual with some other random factor just due to chance for a while. And in fact, I have seen exactly this happen with a computer model for a furnace control system. The prediction was correct on average, but it had large variations for any particular furnace load.

There’s another factor making it harder to see what is happening: reality is dynamic, whereas the empirical model that Google-plus is creating uses only past and current data. Getting a new supplier is easy to account for if that’s part of your hypothesis, but an analytical program has to be told that such things are happening and then figure out some way to either discount the change or decrease the weight of the previous data, since they no longer represent the new process. It probably could be done, but now we’re talking about a custom solution for each process, with a lot of manual intervention, and we have lost the putative advantage of having a computer do our process analysis for us.
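As a rough sketch of what that manual intervention might look like (this is my illustration only; the file name, column names, date, and discount factor are all assumptions), someone has to tell the program when the process changed and how much to trust the older data:

```python
# Hypothetical: down-weight data collected before a known supplier change.
import pandas as pd

df = pd.read_csv("process_history.csv", parse_dates=["date"])   # hypothetical extract
supplier_change = pd.Timestamp("2008-06-01")                    # assumed date of the change

df["weight"] = 1.0
df.loc[df["date"] < supplier_change, "weight"] = 0.2    # discount chosen by a human, not by the data

weighted_mean = (df["output"] * df["weight"]).sum() / df["weight"].sum()
print("weighted estimate of current process average:", weighted_mean)
```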

Such models would be very useful in the improve phase, where we’re trying out countermeasures for our problem. It would be nice to have such a thing watching all possible outputs for changes associated with our pilot trials.

In the control phase, we try to maintain the new process, and Google-plus could be useful in that area. However, in the absence of a causal model, we may or may not be able to apply what we just learned to other, similar problems. More importantly, without an explanatory framework we may not be able to move to the next iteration in improving the process. By generating and validating a hypothesis, we can eventually come to understand the process better. For example, Newton’s laws are a good approximation of what we experience, but because they are an incomplete explanation that breaks down in certain situations, we 1) know that there’s more to learn and 2) have clues as to where to look. Thus we got Einstein’s Special and then General Relativity, even better approximations of reality. In the absence of explanatory models, would Google-stein have found those elegant equations? Or would it have number-crunched its way along, adding further and further corrections like Apollonius’ epicycles to Ptolemy’s geocentric model of the universe? If that had happened, would we ever have invented the GPS system that keeps me from getting lost when I visit a strange city? Imagine the implications!

Conclusions

These are just some of the problems I see with the position that the scientific method is doomed to be superseded by correlational computer programs. (Heck, I didn’t even talk about false positives and false negatives.) Along the way, I found some ways that Black Belts, those stealth practitioners of the scientific method in business, will be superior to blind analysis programs.

I’m a little dismayed at the intellectual incuriosity that such a system would engender, though my dismay isn’t by itself sufficient to argue against it. Would the process expert of the future say, “I dunno why it does that—Google-plus says so, I guess?” On the other hand, I can see (and have used) many opportunities to use such data mining to track results or find interesting correlations to test experimentally. Google, as it now functions, is very good at finding what I’m looking for, but Google takes data generated by human brains (people link to websites and Google analyzes these links) and distills it to a ranking that a human can use. It isn’t generating new knowledge from the relationships it finds, and this task, while computationally intensive, isn’t nearly as complex as describing a simple grinding process in manufacturing.

So the Black Belt, by examining existing data and then hypothesizing, experimenting, and constantly refining, can be the source of true process understanding in a way that a correlative statistical analysis can never approach.

It’s interesting to consider what a smart Google-plus-plus might be: actively forming hypotheses and performing experiments to test them. (“I’m running that experiment now, Dave.”) But even such an artificial intelligence would still have to use the scientific method. I just can’t see a way around testing numerical models against reality.

The Wired article states that in the Petabyte Age “correlation supersedes causation,” in violation of the well-worn (but too often forgotten) “correlation is not causation.” But even an infinite amount of processing power can still produce a totally useless correlative model that has no validity in predicting outcomes. A model with a multiplicative (interaction) term that perfectly describes the above crossover interaction is easy to generate:

Output = A(Temperature) + B(Pressure) + C(Temperature × Pressure)

But these models grow geometrically with more factors, because each factor you add has to be multiplied by each other factor as well as by each combination of the other factors. With enough factors and only happenstance data, you run out of data before you can find a model.
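A quick count shows how fast this blows up: with k factors there are 2^k - 1 possible main effects and interactions (every nonempty combination of factors), each one needing data to estimate.

```python
# Counting candidate model terms (main effects plus all interactions) for k factors.
from math import comb

for k in (3, 5, 10, 20):
    terms = sum(comb(k, r) for r in range(1, k + 1))   # equals 2**k - 1
    print(f"{k:2d} factors -> {terms:,} candidate terms (plus an intercept)")
```

Twenty factors already means more than a million candidate terms, which is why happenstance data runs out long before the model settles down.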

It seems to me that explanatory theory and Black Belts still have a use in making sense of processes.

On the other hand, the future is a long time, so who knows? I could be wrong. I’m sure you will let me know if I am.