Can they co-exist?

We have recently covered a lot of ground on the topic of measurement system analysis (MSA). We talked about the basics of MSA, the potential study, the short-term study, and the long-term study. At this point you should have a pretty firm foundation in the importance and methods of good MSA studies for your research and production, as well as a practical tool to help you in doing measurement system analysis: the file “MSA Forms 3.22.xls” (gauge repeatability and reproducibility worksheets), which is a free download from Six Sigma Online. In this article, I am going to tie up some loose ends and then talk about a frequent question, “Is MSA even possible with a destructive gauge?”

All right, so you went out and did a potential study on the gauge you have been using in production for 35 years and found that there was no way it could reliably determine if a product was truly conforming or not conforming to current specs. You call in your gauge salesperson and ask her to show you some possibilities for replacement. They all look cool, especially the ones with all the pretty lights, but you need a gauge to make decisions with, not (just) to look like a Christmas tree. So you challenge the gauge vendor to a short-term study and find a gauge that potentially has the capabilities that would enable you to determine conformance to current specifications.

Still, having been burned by your not-so-trusty gauge, you tell the vendor that you will buy the new gauge if it passes a long-term study that is administered by you and performed by the people who will actually use the gauge. Good news: the new gauge passes the test. It is stable over time and the measurement variability is small compared to the width of the spec, so you can be confident in the conformance decisions you make with the gauge. Even better, with a percentage of repeatability and reproducibility (%R&R) of 10 percent, you can do a much better job detecting effects when you run your experiments to improve quality, so you get to use a much smaller sample size to boot.

Then Sparky comes in (everyone works with a Sparky, though sometimes his name is Bob) and says, “Well, all that work you have done so far doesn’t guarantee your fancy new gauge is going to give the correct answer tomorrow, now does it?” He says it with a little sneer, Sparky does, as he thinks the best way to a promotion is to push everyone else down.

“No problem,” you say. “Now that we have the limits from the long-term study, we select a test frequency based on the risk of missing a change in the gauge and the cost of the monitoring, and we just keep measuring our eight samples over time. If there is a shift in any of the diagnostic charts, we will investigate to see what changed and fix it before it affects a significant amount of our production. Oh, and by the way, those charts will also tell us when it is time to recalibrate the gauge, so we either save money and reduce variation by recalibrating less frequently, or we save money and reduce variation by not missing a gauge drift and risking bad conformance decisions getting out to our customers.”

Sparky, now cowed by your rhetorical, metrological, and statistical brilliance, turns and mutters something about how in the old days customers were the enemy, and how you can’t use your new-fangled MSA on the testosterone-soaked destructive gauge used in his process.

Thankfully, having read this article, you can continue to put Sparky in his place. A nasty dark place with lots of clown dolls, hopefully. Brrr….

In the book A Practical Approach to Gauge Capability Analysis, by Jeffrey Luftig and Michael V. Petrovich (Luftig & Warren International, 1992), Luftig codified the three types of destructive tests and how to handle each.

First, let’s define a destructive test, because a lot of measurement systems that seem to be nondestructive are actually destructive, and vice versa. A destructive test just means that you can’t remeasure the exact same thing and expect to keep getting the true value. What you are destroying is not necessarily the sample, but access to the true value itself. This could be because the sample is in fact destroyed (as in atomic absorption spectroscopy), or because the test itself changes the true value in the area measured (as in hardness testing in certain metals), or because the true value changes with time or environmental conditions (as in, the thing that will really tick off your spouse... today).

The first destructive gauge I encountered was in the first gauge study I ever did: using a 1- to 2-inch micrometer to measure the thickness of an aluminum plate. You might think that a micrometer is nondestructive. After all, the plate is a good 40 inches wide and a couple hundred inches long, so how is using a tiny little hand mic going to be destructive? We circled 25 locations on the plate that we were going to measure, did our five measurements of each location across three operators (well, two operators and the goofy young metallurgical engineer), and put the data into the same format as I have shown you.

We ended up having too much variability as compared to the spec. How could this be? Hand micrometers like this are marked with a vernier so that you can read down to the thousandth, so why couldn’t we read consistently beyond around 20 thousandths? Well, after a lot of thinking and looking at the data, we finally noticed that the variation increased during the test. So we went back out and took a look at our test plate. Inside each circled area where we measured, we could see a shiny circle where the carbide tip of the mic had contacted the plate. I pulled out my handy loupe and saw little bits of aluminum that had been squished down onto the surface.

What we had finally figured out was that if the right-angle edge of the carbide face contacted the plate at a slight angle, it would tend to scrape a curl of aluminum up, which then might get smashed down on that reading or the next one, thus increasing the apparent thickness of the plate by a few thousandths. The true thickness was being obscured by the little bits of metal sitting on the surface of the plate.

Our solution? We went to a round tip on the mic—which introduced another source of variability since aluminum is so springy—and then settled on a round tip coupled with the vigilant use of the friction sleeve.

Once our new procedure was in place, we were well within the variability needed to make conformance decisions. One guy could consistently read to ±1/1,000 each repetition. That guy was not me.

So what are the destructive test types and how do we handle them? I thought you would never ask!

Class I destructive test

In this test, the specimen’s true value is destroyed through testing, so we can’t remeasure exactly the same specimen we did the first time, but we can sample from a homogeneous batch. Imagine a big, well-mixed vat of some chemical: it is all the same thing, right? When we sample from it multiple times, we can presume that any differences we see in the readings are measurement error. If we have five big vats, we can use them as five samples in our long-term MSA and come back every day and take another five samples to monitor changes in the measurement system. No modification to the MSA procedures I’ve shown you is needed for a Class I destructive test, though we do have a risk here that is not present in a nondestructive test. If our assumption about homogeneity is not correct, we will be trying to track down a problem in the measurement system that is really variation in the samples.
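To make the Class I logic concrete, here is a minimal sketch of how repeated draws from each vat could be turned into a measurement error estimate. The readings are made up for illustration, and pooling the within-vat standard deviations is my assumption about a reasonable way to combine them, not a procedure taken from the MSA forms. It only works if each vat really is homogeneous.

```python
import math
import statistics


def pooled_within_sd(groups):
    """Pool the within-group variances, weighting each by its degrees of freedom."""
    weighted = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    dof = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted / dof)


# Hypothetical readings: three draws from each of five well-mixed vats.
# The vats differ from each other, but draws within a vat share one true value.
vats = [
    [10.1, 10.3, 10.2],
    [12.0, 11.8, 11.9],
    [9.5, 9.7, 9.6],
    [11.1, 11.0, 11.2],
    [10.6, 10.8, 10.7],
]

# Under the homogeneity assumption, within-vat spread is measurement error.
sigma_measurement = pooled_within_sd(vats)
print(f"Estimated measurement standard deviation: {sigma_measurement:.3f}")
```

Note that vat-to-vat differences never enter the estimate. If a vat is not actually well mixed, its sample variation lands in this number and gets blamed on the gauge.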

Class II destructive test

In a Class II destructive test the specimen’s true value is destroyed and we don’t have a homogeneous subgroup from which to draw specimens. The problem here is that if you see a change in the measurement, you don’t know if it is due to the measurement system, the piece you just measured, or some combination of the two. With this one we usually have to be content with bounding the total variability, that is, measurement error and sample variation combined.

When doing an MSA on a Class II-type device, we try to minimize sample variation while still maintaining external validity by measuring something that we actually test. A sample might be a short production run that we randomize and save to use over time. Another run to a different production target would serve as a second sample. We know that there may be within-sample (aka within-run) product variation present, but the overall variability of the product and the measurement device together should be stable and predictable in the absence of special causes. Looking at the overall control charts that we generate as part of the long-term study (the means and standard deviations charted as individuals), we should be able to detect unexpected variation, and our reaction would first be to check the measurement device. (We do have to get rid of those charts on the long-term MSA that track “Sample 1,” “Sample 2,” etc. since we never test the same sample.)

Of course, the risk here is that the unexpected variation we saw was actually product variation, and our overall estimate of the gauge variability will include some amount of variability from the sample product we are measuring, but that is the nature of Class II destructive tests.
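As a rough sketch of charting the session means as individuals, the snippet below computes limits from a baseline and checks new values against them. The baseline numbers are invented, and I am assuming the common moving-range approach for individuals charts (center line plus or minus 2.66 times the average moving range), which may differ in detail from the limits your long-term study worksheet produces.

```python
import statistics


def individuals_limits(values):
    """Return (lower limit, center line, upper limit) for an individuals chart."""
    center = statistics.fmean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.fmean(moving_ranges)
    # 2.66 is 3 / 1.128, where 1.128 is the d2 constant for ranges of two points.
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


# Hypothetical baseline: the mean reading from each long-term study session,
# each session measuring fresh pieces from the same saved production run.
baseline_means = [50.2, 49.8, 50.1, 50.0, 49.9, 50.3, 50.0, 49.7]
lcl, center, ucl = individuals_limits(baseline_means)


def out_of_control(value):
    """Flag a new session mean that falls outside the baseline limits."""
    return value < lcl or value > ucl


print(f"Limits: {lcl:.3f} to {ucl:.3f}")
print("New session mean 51.0 flagged:", out_of_control(51.0))
```

The same function could be applied to the session standard deviations. A flagged point says only that product plus gauge changed, so the gauge is the first thing to check, not the only one.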

Class III destructive test

A Class III destructive test is one in which we can remeasure the sample, but the true value of the sample is itself changing with time. I have seen this when hardness testing metals: a ball is pushed into the surface of the metal with a standard weight, and the size of the indentation is used to determine how hard the metal is. But when you go to remeasure that sample, a sometimes surprisingly large area around the original dimple has become hardened from the stress of making the first indentation, and that can affect your next hardness reading. Another example is that some chemical mixtures change with time as solids dissolve or precipitate out, or as chemicals from the environment are absorbed. A third example: testing new motors for the number of cycles until run-in.

You can handle this type of destructive test by modeling the change through time, either from first principles or by measuring the sample repeatedly through time and generating a best-fit model. You can then subtract the modeled change at time t from the reading at time t to get an estimate of the true value. If you want to get really fancy and you have a homogeneous population to draw on, you could measure samples on a number of testers at multiple ages, though of course some amount of tester-to-tester variability will still be confounded with measurement error.
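Here is one way the best-fit route might look, as a sketch only. I am assuming the drift is linear in time, which is the simplest possible model and may well be wrong for your process, and the readings are invented. The fitted drift is removed from each reading, and the scatter that is left serves as the measurement error estimate.

```python
import statistics


def fit_line(times, readings):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    t_bar = statistics.fmean(times)
    y_bar = statistics.fmean(readings)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, readings))
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar


# Hypothetical repeated readings of one sample whose true value drifts upward.
times = [0, 1, 2, 3, 4, 5]
readings = [20.1, 20.4, 21.1, 21.4, 22.1, 22.4]

slope, intercept = fit_line(times, readings)

# Remove the modeled change at each time to put every reading on a time-zero basis.
adjusted = [y - slope * t for t, y in zip(times, readings)]

# What remains around the fitted time-zero value is treated as measurement error.
residuals = [a - intercept for a in adjusted]
# Divide by n - 2 because two parameters (slope and intercept) were fitted.
residual_sd = (sum(r * r for r in residuals) / (len(residuals) - 2)) ** 0.5

print(f"Drift per time unit: {slope:.3f}")
print(f"Residual standard deviation: {residual_sd:.3f}")
```

If the residuals fan out or curve over time, the linear assumption is not holding, and the residual standard deviation will overstate the gauge error.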

In any case, we still have an estimate of the measurement error that is probably somewhat different from the “real” error, but we can still detect when the gauge goes out of control, and we can still use the %R&R to determine if a measurement system can be trusted to make a conformance decision.

One note, though—because the true value could be changing pretty significantly through time, this type of test is particularly prone to that correlation of error with magnitude we tested for in the short- and long-term studies. If that happens, no single value can describe the measurement error through time.

If there's a combination of Class II and Class III going on, you probably see by now that you just use a combination of the two approaches to handle that.

Destructive tests—another alternative

There is another way to handle destructive tests when the above guidelines are impractical. Almost all destructive test systems involve elements of nondestructive testing that can themselves be tested for control through time and acceptability. For example, I might be doing a Class II destructive test on the expansion of a heat-activated foam polymer: I put a sample on a coupon, measure its water displacement, put the coupon into an oven for a set amount of time, remove the coupon, measure its displacement again, and use the difference in displacements as the expansion of the polymer. Each step of the way I am using devices that can be tested for stability and acceptability: coupon size and weight variation, oven temperature control, timer variation, and mass balance or volumetric gauge capability. If we think of all the things that go into making the correct measurement, and all those things are stable and have acceptable variation, then we can estimate a %R&R of the whole system. Obviously there are a number of caveats going this route, but if it is your only alternative, it is better than just hoping such a complicated measurement is generating stable and relatively small variation as compared to the spec.
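One hedged way to roll the pieces up into a single number is sketched below. It rests on several assumptions of mine that the foam example does not state: the component errors are independent, each has already been converted into units of the final expansion measurement, and %R&R is taken as six measurement standard deviations over the spec width (some shops use 5.15 instead of 6). The component values and spec limits are invented.

```python
import math


def combined_sd(component_sds):
    """Root-sum-of-squares of independent error components in common units."""
    return math.sqrt(sum(s ** 2 for s in component_sds))


def percent_rr(sigma, lower_spec, upper_spec, spread=6.0):
    """%R&R as measurement spread over spec width; spread=6.0 is an assumption."""
    return 100.0 * spread * sigma / (upper_spec - lower_spec)


# Hypothetical standard deviations, each already expressed as its effect on the
# measured expansion (which needs a separate sensitivity estimate per component).
components = {
    "coupon variation": 0.3,
    "oven temperature and time": 0.4,
    "displacement gauge": 1.2,
}

sigma_system = combined_sd(components.values())
rr = percent_rr(sigma_system, lower_spec=100.0, upper_spec=178.0)

print(f"Combined measurement standard deviation: {sigma_system:.2f}")
print(f"Estimated %R&R: {rr:.1f}")
```

Squaring before summing means the largest component dominates, so in this made-up case tightening the displacement gauge would do far more good than fussing over the coupons.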

Conclusions

So that’s it for MSA for now. Hopefully, you have a renewed passion for making sure that the measurement systems you use to determine whether processes are running correctly are stable and acceptable, and that the outputs of those processes are also stable. I have given you (absolutely free) a neat little spreadsheet to help you along the way, and I hope you find it useful.

In the absence of a good measurement system analysis, you can waste time, money, and sanity trying to track down product problems that are measurement problems. Even worse, with a bad %R&R you can end up calling good product bad, or bad product good, and both can cost you more than just lots of money.

So if you measure anything important in your processes, make sure you have a solid understanding of how the gauge you use actually performs.