Measurement System Analysis

I seem to be thinking about the measurement phase a lot recently. I suppose it’s because I’ve seen some teams working hard on problems that turned out to be nothing more than problems with measurement. Let me give you an example.

Once upon a time, I was responsible for a product that was frequently machined by the end-user. We had a measurement system that purported to predict one aspect of its machinability, which involved the use of a band saw cutting right down the middle of the product. We had complaints from customers who were machining to tight tolerances, so we were looking to design an experiment to make this characteristic better.

Then we noticed something odd. The readings on this device didn’t seem to relate to the complaints that we’d gotten from customers. To complicate matters, not every customer machined and not every lot was tested this way.

So we did a measurement-system analysis on the device, and found out it was good only for generating random numbers. Measuring the same product time and time again resulted in so much variability that there was no way to predict which lots would work and which ones would not.

I’ve seen the same story elsewhere, and it turns from comedy to tragedy when the team never realizes that their measurement system is unrelated to what they’re interested in measuring. Imagine going through an entire project, working hard brainstorming potential sources, setting up an experiment, running the experiment, and coming to conclusions without realizing that the whole time, the “problem” was solely one of measurement, and not a real problem in the product at all.

Measurement system analysis for continuous data

Like any other process, your measurement system should be tested for stability through time as well as its ability to meet your requirements. If a measurement system is highly variable, it still might be usable if you’re willing to take multiple measures and average them. If a measurement system is out of control, well, it’s pretty much unusable, because measuring the same thing today and tomorrow gives you two results that are unexpectedly different from each other.

The measurement-system analysis will result in an estimate of the variability within and between systems, and in the case of the short-term and long-term studies, some sort of indication of control.

Potential study

The first type of measurement system analysis is known as a potential study. This is the one people usually mean when they say they are doing a “gauge R&R,” because all it estimates is the gauge variability within and between systems, or “repeatability” and “reproducibility.” This is the one where you take ten production parts and measure them two to three times each for each appraiser. I’ve run into people who try to get ten parts that are as close to identical as possible, and I have no idea where this came from. You want to “exercise” the gauge by selecting parts to measure that span the entire width you’re expecting to use in production.

This study will give you a quick-and-dirty estimate of the within-system variation, and if you test multiple appraisers, the between-system variation.
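For readers who want to see the arithmetic, here is a minimal sketch of how the two estimates could be computed from a balanced potential study. It is not the only method (range-based and full ANOVA approaches are also common), and it assumes there is no appraiser-by-part interaction. The function name gauge_rr and the nested-dictionary layout are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def gauge_rr(data):
    """Estimate repeatability and reproducibility from a balanced study.

    data[appraiser][part] is a list of repeat readings. Every appraiser
    must measure the same parts the same number of times (assumed layout).
    """
    appraisers = list(data)
    parts = list(data[appraisers[0]])
    n_parts = len(parts)
    n_reps = len(data[appraisers[0]][parts[0]])

    # Repeatability (within-system): pooled variance of the repeat
    # readings inside each appraiser/part cell.
    cell_vars = [variance(data[a][p]) for a in appraisers for p in parts]
    var_repeat = mean(cell_vars)

    # Reproducibility (between-system): variance of the appraiser averages,
    # less the part of it explained by repeatability alone. Interaction
    # between appraiser and part is ignored in this sketch.
    appraiser_means = [
        mean(x for p in parts for x in data[a][p]) for a in appraisers
    ]
    var_repro = max(
        0.0, variance(appraiser_means) - var_repeat / (n_parts * n_reps)
    )

    return {
        "repeatability_var": var_repeat,
        "reproducibility_var": var_repro,
        "sigma_e": sqrt(var_repeat + var_repro),
    }


# Tiny made-up data set: two appraisers, two parts, two readings each.
study = {
    "A": {"part1": [10.0, 10.2], "part2": [12.0, 12.2]},
    "B": {"part1": [10.4, 10.6], "part2": [12.4, 12.6]},
}
result = gauge_rr(study)
```

Note that the parts in the made-up data differ from each other on purpose. The part-to-part spread never enters either estimate, which is why picking near-identical parts buys you nothing.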

I tell people that this is the test the gauge vendor has to pass before we even call him back for a second appointment.

However, you will notice that there’s absolutely no useful information on stability through time. Sadly, most Black Belts I have talked with use only this study.

What are the implications of using a measurement system whose stability through time is unknown? Well, imagine you’re part way through a large fractional factorial experiment and the measurement system shifts average or dispersion, since it is not in control. This could, technically speaking, seriously mess up your experimental results and conclusions. When you run your confirmation (You always do a confirmation after a fractional factorial, don’t you? Of course you do.), your results don’t confirm back to the experiment. You would have no way of knowing that it was the measurement system and not, say, an aliased effect or some other effect external to the experiment. Thus begins the “DOE doesn’t work here” mantra.

Short-term study

For the short-term study, we gather 25 parts and measure them five to eight times each for multiple appraisers. This not only gives more information to refine our estimates of the measurement error between and within systems, it also gives us the first hint of stability through time. By plotting the range for each part on a standard range chart, we can determine if the variability of remeasuring a part is stable over this short span of time.
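The range-chart check described above can be sketched in a few lines. This is an illustration under stated assumptions: each subgroup is one appraiser's repeated readings on one part, all subgroups are the same size, and the limits use the standard Shewhart control chart constants D3, D4, and d2 for subgroup sizes two through eight. The function name range_chart is hypothetical.

```python
# Standard control chart constants, indexed by subgroup size.
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.076, 8: 0.136}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114, 6: 2.004, 7: 1.924, 8: 1.864}
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534, 7: 2.704, 8: 2.847}


def range_chart(readings_by_part):
    """Range chart on repeat readings.

    readings_by_part is a list of equal-length lists, one list of
    repeat readings per part (assumed layout).
    """
    n = len(readings_by_part[0])
    ranges = [max(r) - min(r) for r in readings_by_part]
    r_bar = sum(ranges) / len(ranges)
    lcl, ucl = D3[n] * r_bar, D4[n] * r_bar
    # Parts whose remeasurement range falls outside the limits.
    flagged = [i for i, r in enumerate(ranges) if r < lcl or r > ucl]
    return {
        "r_bar": r_bar,
        "lcl": lcl,
        "ucl": ucl,
        "flagged": flagged,
        # Only meaningful as a repeatability estimate if nothing is flagged.
        "sigma_repeat": r_bar / D2[n],
    }


# Made-up example: five parts, five readings each; the last part is erratic.
readings = [
    [0.0, 0.5, 1.0, 0.2, 0.3],
    [2.0, 2.5, 3.0, 2.2, 2.3],
    [4.0, 4.5, 5.0, 4.2, 4.3],
    [6.0, 6.5, 7.0, 6.2, 6.3],
    [0.0, 6.0, 1.0, 2.0, 3.0],
]
chart = range_chart(readings)
```

A real study would use the 25 parts mentioned above; five are shown only to keep the example short.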

This is the trial that your gauge vendor should pass before you even consider buying that flashy new measuring system. If they can’t pass a short-term study, the probability of passing a long-term study approaches zero.

However, while we do somewhat assess stability, we are really only looking at a short time span where each part is measured five to eight times. What we would really like, just as on a typical control chart, is an assessment of the stability on 25 points through time, not 25 items at five points in time. Which leads us to the…

Long-term study

In this study, the same eight standard production parts, spanning the typical use of the gauge, are measured through time. (Note that you cannot use a typical X-bar and R chart on these since the between-part variation is large by design. Instead, track each part on an individuals chart, the overall average on an averages-as-individuals chart, and the standard deviation on a standard deviation chart with limits from the moving range.)
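As a rough sketch of the charting involved, the snippet below sets individuals-chart limits from a baseline period using the average moving range (the usual 2.66 multiplier) and then checks later readings against them. The function names and the layout of the history dictionary are hypothetical. Only the per-part chart and the averages-as-individuals chart are shown; applying the same limits function to the per-session standard deviations is one possible reading of the standard deviation chart described above, not a confirmed recipe.

```python
from statistics import mean


def individuals_limits(baseline):
    """Individuals-chart limits from a baseline series of readings."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2 for moving ranges of size two.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def signals(values, limits):
    """Indexes of readings that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


def session_averages(history):
    """Average across parts for each session.

    history[part] is that part's readings in time order; all parts are
    assumed to be measured in every session.
    """
    return [mean(session) for session in zip(*history.values())]


# Made-up example for a single part: baseline first, then new sessions.
limits = individuals_limits([10, 11, 10, 11, 10])
new_signals = signals([10.5, 14.0, 9.0], limits)

# The overall average goes on its own individuals chart the same way.
averages = session_averages({"p1": [1.0, 2.0], "p2": [3.0, 4.0]})
```

Setting limits from a baseline and judging new readings against them is the piece that compares today's gauge to its own past performance.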

This study only needs to be done as long as you plan on using the measurement system to measure anything. What’s that? You plan on using it for a while yet? Well, if it is measuring a critical characteristic or process variable, you had better have an ongoing long-term study on it. If you want to make money and stay out of court, that is.

Wait a minute—you say you have a calibration sticker on that device and want to know why I haven’t talked about that? Because calibration stickers tell you nothing about gauge stability or acceptability. At best, calibration stickers tell you something about accuracy, which is the distance that the average reading is from a standard. They don’t address through-time stability (control) or variability of repeated measures (acceptability). At worst, they give you a false sense of security in the measurements. We need a process that tracks the gauge performance through time and compares it to past performance.

Because with the long-term study we’re measuring the same thing through time, any variation we see is supposed to be due only to measurement error. The exception is if the part changes through time, perhaps due to the measurements themselves. If the variability within or between samples increases, we catch it quickly. If the overall gauge results shift because I dropped my micrometer, I detect it quickly.

The really sad thing is that I don’t run into many people who do long-term studies on their critical measurement devices. This means that they have no way of knowing what that device does through time, and are in no position to guarantee anything that’s measured with it, regardless of an up-to-date calibration sticker. This answers the question you probably had at one point—“How do we know if the product we made between calibrations is any good?”


If a business doesn’t use measurement system analysis to select new gauges and monitor the ongoing performance of critical measurement systems, it really has no idea how those measurements relate to reality. In my experience, the long-term measurement system analysis is probably the most important and most neglected tool in business, let alone in Six Sigma. I mean, if you can’t measure it, what can you do?

Once a gauge is in control and you estimate the measurement error σe, you would like to see 6σe take up less than 10 percent of the specification (some practitioners use 5.15σe, but the difference is negligible and 6σe is analogous to a natural tolerance). If so, great news! If not, join the rest of us and determine if the gauge is acceptable, given the application and the increased risk of concluding in-spec product is out of spec, or out-of-spec product is in spec. Remember, if it’s in control but the variation is too high, you can take multiple readings and average them to reduce your apparent error, since the standard deviation of an average of n readings is σe/√n.
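The arithmetic behind that comparison is simple enough to sketch. The snippet below computes the share of the specification width taken up by 6σe, and how many readings you would have to average to get under a target percentage. It relies on the standard result that the average of n independent readings has standard deviation σe/√n, which assumes the gauge is in control. The function names and the example numbers are made up for illustration.

```python
import math


def percent_of_tolerance(sigma_e, lower_spec, upper_spec, n_readings=1):
    """Percent of the specification width consumed by 6 sigma_e.

    Averaging n_readings independent readings shrinks the measurement
    standard deviation to sigma_e / sqrt(n_readings).
    """
    sigma_avg = sigma_e / math.sqrt(n_readings)
    return 100.0 * 6.0 * sigma_avg / (upper_spec - lower_spec)


def readings_needed(sigma_e, lower_spec, upper_spec, target_percent=10.0):
    """Smallest number of averaged readings that reaches the target."""
    ratio = percent_of_tolerance(sigma_e, lower_spec, upper_spec) / target_percent
    return max(1, math.ceil(ratio ** 2))


# Made-up example: sigma_e of 0.05 against a spec of 9.0 to 11.0.
single = percent_of_tolerance(0.05, 9.0, 11.0)                 # about 15 percent
n = readings_needed(0.05, 9.0, 11.0)                           # 3 readings
averaged = percent_of_tolerance(0.05, 9.0, 11.0, n_readings=n)  # about 8.7 percent
```

The number of readings grows with the square of the shortfall, so a gauge that eats 150 percent of the specification would need 225 averaged readings to reach 10 percent. At that point, a new gauge is the cheaper option.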

Of course, if it’s a gauge you’ve been using for years and you find out that it is out of control, or that 6σe takes up 150 percent of your specification (Don’t laugh. I’ve seen that happen.), then you have my sympathy. You are now the Bearer of Bad News and that isn’t a happy place to be.

But I could be wrong.