### Chapter 10: Analysis of Variance: One-way Analysis of Variance

### Introduction to Analysis of Variance

Earlier we covered the *independent-samples #t#-test* as a method for testing hypotheses about the mean difference between *two* independent populations. With this procedure, we could, for example, compare the mean number of traffic violations between the younger and older halves of some population.

The downside of this type of one-to-one (pairwise) test, however, is that it is not particularly well suited for testing the mean differences between *more than two* populations. For instance, what if we would like to compare the number of traffic violations between three age groups (young, middle-aged, elderly) rather than two?

We could, in theory, run a series of consecutive #t#-tests to compare all the different pairs of age groups one at a time, but this runs into the *multiple testing problem*.

Multiple Testing Problem

Every time we conduct a statistical test, there exists the possibility of making a *Type I error*, i.e., getting a false-positive result. The probability of making a *Type I error* is by definition equal to the significance level #\alpha# we set as a researcher:

\[\mathbb{P}(\text{Type I error}) = \mathbb{P}(H_0\text{ is rejected}\,|\,H_0\text{ is true}) = \alpha\]

Thus, if we set the significance level at #\alpha=0.05# and conduct a single test, the probability of a false positive is #5\%#. However, if we were to conduct multiple tests in a row, the probability that *at least one* of these tests results in a false positive becomes larger than #\alpha#. This is called the **multiple testing problem**.
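A small simulation can make this inflation concrete. The sketch below is only an illustration under stated assumptions: the tests are treated as independent, every null hypothesis is true, and each #p#-value is drawn uniformly from #[0, 1]# (which is how #p#-values of an exact test on continuous data behave under a true null). The function name `simulate_any_false_positive`, the seed, and the number of trials are hypothetical choices, not part of the course material.

```python
import random

def simulate_any_false_positive(k: int, alpha: float = 0.05,
                                trials: int = 100_000, seed: int = 1) -> float:
    """Estimate the chance that at least one of k tests rejects a true H0."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        # Draw k independent p-values under H0; a test "rejects" when p < alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials

# Compare a single test with small families of tests.
for k in (1, 3, 10):
    print(f"k = {k:2d}: estimated P(at least one false positive) = "
          f"{simulate_any_false_positive(k):.3f}")
```

For a single test the estimate should sit near #\alpha#, and it should grow as more tests are added.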

To determine the probability of at least one false positive across multiple tests, the *family-wise error rate* can be calculated.

Family-Wise Error Rate

When conducting multiple hypothesis tests, the **family-wise error rate** (#\alpha_{fw}#) is the overall probability that one or more of the tests results in a *Type I error*.

If we conduct #k# independent hypothesis tests, each at significance level #\alpha#, then the family-wise error rate is calculated as:

\[\alpha_{fw} = 1 - (1 - \alpha)^k\]
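The formula follows from a short complement argument, sketched here under the assumption that all #k# null hypotheses are true and that the tests are independent of one another. Pairwise tests that share samples are not strictly independent, so in that setting the result is best read as an approximation.

\[\begin{aligned}
\mathbb{P}(\text{no Type I error in one test}) &= 1 - \alpha \\
\mathbb{P}(\text{no Type I error in any of the } k \text{ tests}) &= (1 - \alpha)^k \\
\alpha_{fw} = \mathbb{P}(\text{at least one Type I error}) &= 1 - (1 - \alpha)^k
\end{aligned}\]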

Consider a research design in which a continuous variable is measured once in each of #3# simple random samples. If we used a series of *independent-samples #t#-tests* to make pairwise comparisons between the means of these samples, we would need a total of #3# tests:

- A test between samples #1# and #2#
- A test between samples #1# and #3#
- A test between samples #2# and #3#

If we set a significance level of #\alpha = 0.05# for each of the #k=3# pairwise comparisons, then the *family-wise error rate* is computed as:

\[\alpha_{fw} = 1 - (1 - 0.05)^3=0.142625\]

This means that there is a #14.26\%# chance of making at least one *Type I error* across the three tests.
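The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: the helper names `pairwise_tests` and `family_wise_error_rate` are illustrative, and the formula is applied as if the #k# tests were independent.

```python
from math import comb

def pairwise_tests(samples: int) -> int:
    """Number of distinct pairs that can be formed from `samples` samples."""
    return comb(samples, 2)

def family_wise_error_rate(alpha: float, k: int) -> float:
    """1 - (1 - alpha)^k, treating the k tests as independent."""
    return 1 - (1 - alpha) ** k

k = pairwise_tests(3)                       # 3 samples -> 3 pairwise tests
alpha_fw = family_wise_error_rate(0.05, k)
print(k, round(alpha_fw, 6))                # prints: 3 0.142625
```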

As the number of samples being compared increases, the *family-wise error rate* quickly gets out of control. For example, comparing #6# samples requires #15# pairwise tests, so at a significance level of #\alpha=0.05# the probability of obtaining one or more false-positive results is about #54\%#, or roughly 50:50.
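To see how quickly the rate grows, the sketch below tabulates the number of pairwise tests and the resulting family-wise error rate for #2# to #8# samples. The function name `fwer_for_samples` is hypothetical, and the tests are again treated as independent, so the values are approximations for real pairwise comparisons.

```python
from math import comb

ALPHA = 0.05  # per-test significance level used in the text

def fwer_for_samples(samples: int, alpha: float = ALPHA) -> float:
    """Family-wise error rate when every pair of samples is tested once."""
    k = comb(samples, 2)            # number of pairwise comparisons
    return 1 - (1 - alpha) ** k     # assumes the k tests are independent

for samples in range(2, 9):
    print(f"{samples} samples: {comb(samples, 2):2d} tests, "
          f"family-wise error rate = {fwer_for_samples(samples):.3f}")
```

With two samples there is a single test and the rate equals #\alpha#; by six samples it has passed one half.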

To avoid inflating the *family-wise error rate* when comparing more than two samples, a different type of test is needed.

**Analysis of Variance** (ANOVA) is a collection of statistical procedures and their associated models used to test hypotheses about the mean differences between three or more populations.

Using ANOVA, we are able to simultaneously compare all population means using a single hypothesis test. This ensures that the probability of making a *Type I error* remains equal to the significance level #\alpha#.

ANOVA Terminology

In the context of ANOVA, the continuous dependent variable is referred to as the **outcome variable**.

The categorical independent variable that determines which populations are being compared is called a **factor**.

The individual categories or populations that make up the factor are referred to as the **levels** of the factor.

Suppose a researcher wants to investigate whether people's height differs among three European nations: France, Germany, and the Netherlands.

In this example:

- Height is the *outcome variable*.
- Nation is the *factor*.
- France, Germany, and the Netherlands are the three *levels* of the factor.
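One way to picture this terminology is as a data layout. In the sketch below, the factor is the dictionary as a whole, its keys are the levels, and the numbers stored under each key are observations of the outcome variable. The height values are made-up placeholders for illustration only, not real measurements, and the variable names are hypothetical.

```python
from statistics import mean

# Made-up placeholder heights in cm, for illustration only (not real data).
heights_by_nation = {
    "France": [172.0, 168.5, 180.0],
    "Germany": [175.5, 181.0, 169.0],
    "Netherlands": [183.0, 178.5, 176.0],
}

factor = "nation"                  # categorical independent variable
levels = list(heights_by_nation)   # the populations being compared
outcome = "height_cm"              # continuous dependent variable

# Sample mean of the outcome variable within each level of the factor
level_means = {level: mean(values) for level, values in heights_by_nation.items()}
for level, level_mean in level_means.items():
    print(f"{factor} = {level}: mean {outcome} = {level_mean:.1f}")
```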

The remainder of this chapter will introduce the most basic type of ANOVA, the *one-way Analysis of Variance*, which can be considered a generalization of the *independent-samples #t#-test*.
