## Statistical Answers - Technoids Only!

Here is where you can find the answers to the statistical scenarios posed in the Six Sigma Heretic article called Stupid Six Sigma Tricks #5.

### Question 1

The correct answer is d) Do something else.

Sometimes in real life you run up against a lower limit of some sort. In this case, it is a response time of zero minutes. The resulting distribution could be a folded normal or some other skewed distribution that is very non-normal. Obviously one clue was that the software was giving you a lower limit of -20 minutes, which would mean that you were answering some of the questions 20 minutes before you received them - a great result if only it were possible! You don't even need to test for normality here: since the projection for the normal curve gives you an impossible lower limit, you know you have to model a different distribution.

The individuals chart is highly sensitive to deviations from normality. Using the typical limit calculations at three standard deviations, the limits will be close to the natural tolerance, though not exact, since the point-to-point variation of a skewed distribution probably will result in a larger estimate of the variation, pushing the limits out a little further. Chebyshev's Theorem says that at least 75% of any distribution will be within ±2 sigmas and that at least 89% will be within ±3 sigmas. Some practitioners say they rely on this Theorem to protect an individuals chart from non-normality, but this scenario is one of many reasons why that is just not practical for control chart purposes. Even if you put the lower limit at zero and used the calculated upper limit, you would still have overly frequent runs below the mean and points outside the upper limit. This would drive you to fruitlessly investigate these occurrences as special events in order to improve your response time, when in fact they are expected from this distribution. If you want to make this process better, you are going to need to change the distribution, which means you need to change the process.
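To put rough numbers on the Chebyshev argument, here is a minimal sketch. It compares the Chebyshev guarantee with the normal curve, then uses an exponential distribution with mean 1 purely as an assumed stand-in for a skewed response-time distribution (the article does not say which distribution the data follow). The function names are illustrative only.

```python
import math
from statistics import NormalDist


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of ANY distribution within +/- k standard deviations."""
    return 1.0 - 1.0 / k**2


def normal_fraction_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


def exponential_fraction_above(k: float) -> float:
    """P(X > mean + k*sd) for an exponential with mean 1 (its sd is also 1).

    Assumption: the exponential is only a convenient example of a skewed,
    zero-bounded distribution, not the article's actual data.
    """
    return math.exp(-(1.0 + k))


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_fraction_within(k), 4))

# Chance of a point above the upper 3-sigma line: normal vs. skewed example.
normal_upper_tail = 1.0 - NormalDist().cdf(3)
print(normal_upper_tail, exponential_fraction_above(3))
```

Under this assumed exponential model, a point lands above the three-sigma line more than ten times as often as the normal curve predicts, even though the Chebyshev bound is comfortably satisfied. That is the sense in which the Theorem does not protect the chart from false alarms.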

So, you would need to fit some sort of distribution to your data, then calculate the appropriate control limits for that distribution. As always with fitting distributions, you have the additional error of fitting to your sample data, so you should keep a close eye on the chart as you continue to gather data to make sure that the model you chose is still giving you helpful information.
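As one hedged illustration of fitting a distribution and deriving limits from it, the sketch below assumes a lognormal model, which is only one candidate for zero-bounded response times and may not suit your data. It fits the model by taking logs, then places the limits at the 0.135th and 99.865th percentiles, the same tail areas that three-sigma limits cut off on a normal curve. The data and the function name `lognormal_limits` are made up for the example.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_limits(times, tail=0.00135):
    """Return (lower, center, upper) limits from a fitted lognormal.

    Assumes every time is strictly positive; a recorded time of exactly
    zero cannot be log-transformed and would need a different model.
    """
    logs = [math.log(t) for t in times]
    fitted = NormalDist(mean(logs), stdev(logs))  # normal fit on the log scale
    lower = math.exp(fitted.inv_cdf(tail))
    center = math.exp(fitted.mean)  # median of the fitted lognormal
    upper = math.exp(fitted.inv_cdf(1.0 - tail))
    return lower, center, upper


# Hypothetical response times in minutes.
response_minutes = [2.1, 3.4, 1.2, 8.9, 4.4, 0.7, 15.3, 2.8, 5.1, 3.3, 1.9, 6.7]
lower, center, upper = lognormal_limits(response_minutes)
print(f"LCL={lower:.2f}  center={center:.2f}  UCL={upper:.2f}")
```

The lower limit can never go below zero under this model, so the "answered 20 minutes early" problem disappears. With only a dozen points the fitted parameters are themselves uncertain, which is the extra fitting error mentioned above.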

### Question 2

The correct answer is c) Do something else.

Statistical procedures are limited by the scale of data with which you are working. The data scale tells you about how your measurement relates to the property in which you are interested. In this case, the data are ordinal, which means that while choosing Disagree Strongly means you disagree more than just choosing Disagree, you don't know how much more, nor is it necessarily the same amount as the difference across the next interval, say between Disagree and Neutral. Compare that with measuring temperatures at an interval level, where 20° is exactly 10° cooler than 30°, and 30° is exactly 10° cooler than 40°.

It is wholly inappropriate to use the t-test on anything but interval and ratio level data, and you may draw the wrong conclusion if you do. And you can't test for differences in variance either; since it is ordinal data, the variance has no meaning. For ordinal data, you would need to use a non-parametric test, and I would choose the Wilcoxon-Mann-Whitney test to determine if there has been any change. This test detects differences in distribution, especially differences in medians, which sounds like it will answer the research question.
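For readers who want to see the mechanics, here is a minimal sketch of the Wilcoxon-Mann-Whitney test using only the standard library. It ranks the pooled responses with midranks for ties, computes the U statistic for the first sample, and uses the normal approximation with a tie correction and no continuity correction. The survey responses are invented, and a statistics package will usually offer exact p-values for small samples.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, z, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every response identical: no evidence of any shift
        return u1, 0.0, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p


# Hypothetical 5-point responses (1 = Disagree Strongly ... 5 = Agree Strongly).
before = [2, 3, 1, 2, 3, 2, 4, 1, 2, 3]
after = [3, 4, 4, 2, 5, 3, 4, 3, 5, 4]
u, z, p = mann_whitney_u(before, after)
print(f"U={u}, z={z:.2f}, p={p:.4f}")
```

Only the ordering of the responses enters the calculation, which is why the test is legitimate for ordinal data where means and variances are not.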

### Question 3

The correct answer is c) Do something else (noticing a pattern here?)

This is almost a trick question, but it serves to illustrate an important point. When you are looking for the relationship between truly dichotomous data (yes/no), which are of course nominal, and ratio or interval data, the appropriate statistic is the Point Biserial Correlation Coefficient $r_{pbi}$. The $r_{pbi}$ ranges from 0 to 1 - there is no negative to indicate direction since the categories of "yes" and "no" are purely arbitrary. It just so happens that $r_{pbi}$ is calculated exactly the same way as the Pearson Product Moment Correlation Coefficient $r$, except that the absolute value is used, and it is tested for significance in the same way. This is a case where someone with just a surface knowledge would unknowingly get the right answer for the wrong reason. However, it is important to realize the following:

• The best importance measure for $r_{pbi}$ is not $r_{pbi}^2$ like it is for Pearson's $r$, but omega-squared ($\omega^2$). So if the Black Belt concluded that, say, $r^2$ of the variability of one variable was explained by the other, it is probably off a bit.
• Had the survey response been an artificially dichotomous variable (e.g. "How much will you spend with Acme next year? 1 = <\$5,000, 2 = >\$5,000") you might have been able to use the Biserial Correlation Coefficient, and if so get a better understanding of what the correlation would have been had you been able to collect ratio data.
• Had the question been looking for the correlation of ratio, interval, or ordinal variables with another ordinal variable with a monotonic relationship, you should use Spearman's $r_s$, which is the Pearson $r$ on the rank orders.

So the point that this illustrates is that in all ignorance a Black Belt might get the right answer one time, but a very wrong answer the next time.
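The claim that the point biserial coefficient is computed the same way as Pearson's r can be checked with a small sketch. It codes yes/no as 1/0, computes Pearson's r directly, and compares it with the textbook point biserial formula (difference in group means over the overall standard deviation, times the square root of the group proportions). The data are hypothetical, and swapping the 0/1 coding flips only the sign, which is why the direction carries no meaning.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(groups, scores):
    """Signed point biserial: (M1 - M0) / s_n * sqrt(p * q), groups coded 0/1."""
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    n = len(scores)
    m = sum(scores) / n
    s_n = math.sqrt(sum((s - m) ** 2 for s in scores) / n)  # population SD
    p, q = len(ones) / n, len(zeros) / n
    return (sum(ones) / len(ones) - sum(zeros) / len(zeros)) / s_n * math.sqrt(p * q)


# Hypothetical data: 1 = "yes", 0 = "no", paired with a ratio-level measure.
answered_yes = [1, 0, 1, 1, 0, 0, 1, 0]
spend = [12.0, 7.5, 15.2, 9.8, 6.1, 8.4, 14.0, 5.9]

r_signed = pearson_r(answered_yes, spend)
r_flipped = pearson_r([1 - g for g in answered_yes], spend)
print(abs(r_signed), point_biserial(answered_yes, spend), r_flipped)
```

The two routes agree to floating-point precision, so the Black Belt's number is right; what differs is how it should be interpreted and which importance measure goes with it.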

### Question 4

The correct answer is c) Something else (bet you guessed that!)

For this one, you have to know the difference between random and fixed effects. In a fixed effect ANOVA, you are interested in knowing if there is a difference between the different levels of the treatments, and the levels that you are testing are all the levels in which you are interested. For example, I have four vendors and I want to know if there is a difference between those four in some critical characteristic of what they supply. A random effect is when you are interested in knowing if a treatment has an effect where the levels you have chosen are randomly selected from a larger group. For example, I have 20 vendors, I select four at random to see if vendor has an effect on that critical characteristic. In this case, I am not interested in knowing whether Vendor 1's average is different from Vendor 2's, I am interested in knowing whether the differences in the means are just random differences from the same average, or if there is an additional component to that variation which is that at least some of the vendors have a different average. It is a different hypothesis.

Now for a one-way ANOVA, it turns out that random and fixed effects are analyzed for significance in exactly the same way. But if you have more than one factor, things change. Remember that the fixed effect ANOVA calculates an F-statistic by taking the mean square between levels of the factor or effect and dividing it by the mean square within the cells: the mean squared error, a.k.a. within-cell error. With random effects, what you divide by changes depending on whether that factor and the other factors in the model are random and/or fixed. So you get very different F-statistics for the same data depending on how the factors are classified as fixed or random.

What is interesting (and a trap lying in wait for innocent Black Belts) is that the exact same data in a two-way ANOVA analyzed with both factors as fixed effects gives you different answers than if analyzed with both as random effects, which in turn is different if one is fixed and the other is random. And here is the freaky part: only in the experimenter's mind is this known. The statistical software can't look at the data and say that a factor is random or fixed; the practitioner needs to tell it that.

So in this case, you have one random factor - Vendor (ten randomly selected from many others) and a fixed factor - Location (the three locations).

And for the truly beyond-all-hope technoids (like myself) here are the appropriate expected mean squares:

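**Expected Mean Squares Output**

Factors

| Source | Levels | F/R | C/N |
|---|---|---|---|
| A | 10 | Random | Crossed |
| B | 3 | Fixed | Crossed |

Expected Mean Squares

| Source | df | E(MS) |
|---|---|---|
| *Main Effects* | | |
| A | 9 | 3V[A] + V[e] |
| B | 2 | 10P[B] + 1V[AB] + V[e] |
| *2-Way* | | |
| AB | 18 | 1V[AB] + V[e] |
| Error | (n-1) × 30 | V[e] |
| Total | 29 + df(Error) | |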

This output is from MVPstats, a great stats program! Thanks to the guys over at MVPprograms!

The output is telling you that while the random factor A (Vendor) and the random effect interaction AB (Vendor and Location) use the within-error for their F-statistics, the fixed effect B (Location) uses the mean square for the interaction, not the within-error! You will get a totally different answer if you just put them both in as fixed!
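To see how much the choice of denominator can matter, here is a minimal sketch that computes the mean squares for a balanced two-way layout and then forms the F-statistic for Location both ways. The data are a small invented example (3 vendors, 2 locations, 2 replicates), not the scenario's data, and the function name `two_way_mean_squares` is illustrative. It is not the MVPstats calculation, just the standard balanced sums of squares.

```python
def two_way_mean_squares(data):
    """Mean squares for a balanced two-way layout.

    data[i][j] is the list of replicate values for level i of factor A
    and level j of factor B; every cell must hold the same n >= 2 values.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = sum(v for row in data for cell in row for v in cell) / (a * b * n)
    a_means = [sum(v for cell in row for v in cell) / (b * n) for row in data]
    b_means = [sum(v for row in data for v in row[j]) / (a * n) for j in range(b)]
    cell_means = [[sum(cell) / n for cell in row] for row in data]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_e = sum(
        (v - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for v in data[i][j]
    )
    return {
        "A": ss_a / (a - 1),
        "B": ss_b / (b - 1),
        "AB": ss_ab / ((a - 1) * (b - 1)),
        "E": ss_e / (a * b * (n - 1)),
    }


# Hypothetical data: rows are vendors (A), columns are locations (B).
data = [
    [[10.0, 11.0], [14.0, 15.0]],
    [[12.0, 12.5], [13.0, 13.5]],
    [[9.0, 10.0], [16.0, 17.0]],
]
ms = two_way_mean_squares(data)

f_b_all_fixed = ms["B"] / ms["E"]   # both factors treated as fixed
f_b_mixed = ms["B"] / ms["AB"]      # A random, B fixed: test B against AB
print(f_b_all_fixed, f_b_mixed)
```

In this made-up example the F-statistic for Location is about 128 when everything is entered as fixed and about 5.3 when Vendor is declared random, with different denominator degrees of freedom as well. Same data, very different conclusion.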