### Chapter 9: Categorical Association: Chi-Square Goodness of Fit Test

### Chi-Square Goodness of Fit Test: Test Statistic and p-value

The test statistic for the *Chi-Square Goodness of Fit Test* is calculated using two types of frequency counts: the *observed frequency* and the *expected frequency*.

Data for the Goodness of Fit Test

The **observed frequency** is the number of individuals in the sample that are classified in a particular category and is denoted by #f_o#.

The **expected frequency** is the number of individuals that one would *expect* to be classified in a particular category if the null hypothesis were true, and is denoted by #f_e#.

To calculate the expected frequency of category #i#, multiply the proportion #\pi_{0,\,i}# that the null hypothesis specifies for that category by the total sample size #n#:

\[f_{e,\,i} = \pi_{0,\,i} \cdot n\]
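The formula translates directly into code. Below is a minimal Python sketch (Python is not otherwise used in this chapter, and the helper name `expected_frequencies` is hypothetical). It also checks that the null-hypothesis proportions sum to 1, since they must cover every category. The usage lines assume a *no preference* hypothesis over three categories with observed counts 23, 17 and 20, the example that follows.

```python
import math


def expected_frequencies(proportions: list[float], n: int) -> list[float]:
    """Return f_e = pi_0 * n for every category (hypothetical helper)."""
    # H0 must assign a proportion to every category, so the proportions sum to 1
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-9):
        raise ValueError("null-hypothesis proportions must sum to 1")
    return [p * n for p in proportions]


# "No preference" over 3 categories, with observed counts 23, 17 and 20
observed = [23, 17, 20]
n = sum(observed)  # 60
f_e = expected_frequencies([1 / 3, 1 / 3, 1 / 3], n)
print([round(v, 2) for v in f_e])  # [20.0, 20.0, 20.0]
```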

Calculating Expected Frequencies

Suppose a researcher wants to find out how well the following data fit a null hypothesis of *no preference*.

| Category | I | II | III |
|---|---|---|---|
| Observed frequency (#f_o#) | 23 | 17 | 20 |

Since there are a total of #3# measurement categories, a null hypothesis of *no preference* makes the following prediction about the proportions #\pi# in the population:

| | I | II | III |
|---|---|---|---|
| #H_0:# | #1/3# | #1/3# | #1/3# |

To calculate the *expected frequencies*, multiply each of the proportions specified by the null hypothesis by the total sample size, which in this case is #n = 23 + 17 + 20 = 60#.

#\begin{array}{lllclcl}
\,\,\,\,\scriptsize{\bullet}&\normalsize{\text{I}}:&\pi_{0,\,I}\cdot n &=& 1/3\cdot 60 &=& 20\\
\,\,\,\,\scriptsize{\bullet}&\normalsize{\text{II}}:&\pi_{0,\,II}\cdot n &=& 1/3\cdot 60 &=& 20\\
\,\,\,\,\scriptsize{\bullet}&\normalsize{\text{III}}:&\pi_{0,\,III}\cdot n &=& 1/3\cdot 60 &=& 20
\end{array}#

| Category | I | II | III |
|---|---|---|---|
| Observed frequency (#f_o#) | 23 | 17 | 20 |
| Expected frequency (#f_e#) | 20 | 20 | 20 |

After the expected frequencies have been calculated, the next step is to calculate the *Goodness of Fit* test statistic, which measures how much the observed frequencies differ from the frequencies expected under the null hypothesis.

Chi-Square Test Statistic and Distribution

The test statistic of the *Chi-Square Goodness of Fit Test* is denoted by #\chi^2# and is calculated with the following formula:

\[\chi^2=\sum_{\text{all categories}}{\dfrac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}}=\sum_{\text{all categories}}{\dfrac{(f_o-f_e)^2}{f_e}}\]

Since each term of the sum is a squared difference divided by a positive expected frequency, the #\chi^2#-statistic is always zero or larger; it equals zero only when every observed frequency equals its expected frequency.

Assuming the null hypothesis of the *Goodness of Fit Test* is true, the #\chi^2#-statistic (approximately) follows a #\chi^2#-distribution with #df = k - 1# *degrees of freedom*, where #k# is the number of categories.
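A minimal Python sketch of the test statistic and its degrees of freedom (the helper name `chi_square_statistic` is hypothetical). It rejects non-positive expected frequencies, because the formula divides by #f_e#. The usage lines reuse the *no preference* example above, where every #f_e = 20#.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Sum of (f_o - f_e)^2 / f_e over all categories (hypothetical helper)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    if any(fe <= 0 for fe in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# "No preference" example: f_o = 23, 17, 20 and f_e = 20 in every category
f_o = [23, 17, 20]
f_e = [20, 20, 20]
chi2 = chi_square_statistic(f_o, f_e)  # (9 + 9 + 0) / 20 = 0.9
df = len(f_o) - 1                      # k - 1 = 2
print(chi2, df)
```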

Chi-square distributions are positively skewed. Because large values of #\chi^2# indicate a poor fit between the observed and expected frequencies, the critical region is always located in the right tail of the distribution.

Calculating the p-value of a Chi-Square Goodness of Fit Test

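A *Chi-Square* test is by definition a *right-tailed* test: the #p#-value is the probability, assuming #H_0# is true, of obtaining a #\chi^2#-statistic at least as large as the one observed.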

To calculate the #p#-value of a *Chi-Square Goodness of Fit Test* in **Excel**, use the following command:

\[=1\text{ - }\text{CHISQ.DIST}(\chi^2, df, 1)\]

To calculate the #p#-value of a *Chi-Square Goodness of Fit Test* in **R**, use the following command:

\[\text{pchisq}(\chi^2, df, \text{lower.tail}=\text{FALSE})\]

In both commands, #df = k - 1#.

If #p \leq \alpha#, reject #H_0# and conclude #H_a#. Otherwise, do not reject #H_0#.
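If neither Excel nor R is at hand, the same right-tail probability can be computed from scratch. The Python sketch below is one possible approach, not the method Excel or R use internally: it relies on the identity #\mathbb{P}(X \gt x) = Q(df/2,\, x/2)#, where #Q# is the regularized upper incomplete gamma function, evaluated with a power series for small arguments and a continued fraction otherwise (a standard numerical recipe). All function names are hypothetical, and the usage lines borrow the eye-color example that follows (#\chi^2 = 1.837#, #df = 2#, #\alpha = 0.03#).

```python
import math


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (x < a + 1)."""
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= x / (a + n)
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction (x >= a + 1)."""
    tiny = 1e-300  # guards against division by zero in the modified Lentz method
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi_square_p_value(chi2: float, df: int) -> float:
    """Right-tail probability P(X > chi2) for X ~ chi-square(df), i.e. Q(df/2, chi2/2)."""
    if chi2 <= 0:
        return 1.0  # the whole distribution lies at or above zero
    a, x = df / 2.0, chi2 / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


# Eye-color example: chi^2 = 1.837, df = 2, alpha = 0.03
alpha = 0.03
p = chi_square_p_value(1.837, 2)
decision = "reject H0" if p <= alpha else "do not reject H0"
print(f"p = {p:.3f}: {decision}")  # p = 0.399: do not reject H0
```

For #df = 2# the function agrees with the closed form #e^{-x/2}#, which makes a convenient sanity check.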

A researcher wants to use a *Chi-Square Goodness of Fit Test* to determine whether the distribution of eye color in Sweden is the same as in Norway.

Suppose that from previous research it is known that the distribution of eye color in Norway is the following:

| Eye color | Blue | Brown | Other |
|---|---|---|---|
| Proportion in Norway | #0.67# | #0.22# | #0.11# |

The researcher randomly selects #958# Swedes and records their eye color. The results are shown in the table below:

| Eye color | Blue | Brown | Other |
|---|---|---|---|
| Observed frequency | 623 | 220 | 115 |

Calculate the #p#-value of the test and make a decision regarding #H_0#. Round your answer to #3# decimal places. Use the #\alpha = 0.03# significance level.

#p=0.399#

On the basis of this #p#-value, #H_0# should not be rejected, because #p = 0.399 \gt \alpha = 0.03#.

The #p#-value can be calculated in Excel or in R; both solutions are given below.

Calculate the expected frequency of each category with the following formula:

\[f_{e,\,i} = \pi_{0,\,i} \cdot n\]

#\begin{array}{lllclcl}
\,\,\scriptsize{\bullet}&\normalsize{\text{Blue}}:&\pi_{0,\,blue}\cdot n &=& 0.67 \cdot 958 &=& 641.86\\
\,\,\scriptsize{\bullet}&\normalsize{\text{Brown}}:&\pi_{0,\,brown}\cdot n &=& 0.22 \cdot 958 &=& 210.76\\
\,\,\scriptsize{\bullet}&\normalsize{\text{Other}}:&\pi_{0,\,other}\cdot n &=& 0.11 \cdot 958 &=& 105.38
\end{array}#

| Eye color | Blue | Brown | Other |
|---|---|---|---|
| Observed frequency (#f_o#) | 623 | 220 | 115 |
| Expected frequency (#f_e#) | 641.86 | 210.76 | 105.38 |
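The same step as a short Python sketch, assuming the Norwegian proportions and the Swedish counts given above (the variable names are illustrative):

```python
# H0 proportions (eye color in Norway) and observed counts for the Swedish sample
pi_0 = {"Blue": 0.67, "Brown": 0.22, "Other": 0.11}
observed = {"Blue": 623, "Brown": 220, "Other": 115}

n = sum(observed.values())  # 623 + 220 + 115 = 958
expected = {color: p * n for color, p in pi_0.items()}  # f_e = pi_0 * n

for color in pi_0:
    print(f"{color}: f_o = {observed[color]}, f_e = {expected[color]:.2f}")
```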

Calculate the #\chi^2#-statistic:

\[\begin{array}{rcl}
\chi^2&=&\sum\limits_{\text{all categories}}{\dfrac{(f_o-f_e)^2}{f_e}}\\\\
&=& \cfrac{(623-641.86)^2}{641.86} + \cfrac{(220-210.76)^2}{210.76} + \cfrac{(115-105.38)^2}{105.38}\\\\
&\approx& 0.554 + 0.405 + 0.878\\\\
&\approx& 1.837
\end{array}\]
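A short Python sketch that reproduces the three contributions and their sum from the frequencies in the table above, rounded to three decimals as in the text:

```python
# Observed and expected frequencies from the table (Blue, Brown, Other)
f_o = [623, 220, 115]
f_e = [641.86, 210.76, 105.38]

# One (f_o - f_e)^2 / f_e contribution per category, then their sum
contributions = [(o - e) ** 2 / e for o, e in zip(f_o, f_e)]
chi2 = sum(contributions)
df = len(f_o) - 1  # k - 1 = 2

print([round(c, 3) for c in contributions], round(chi2, 3), df)  # [0.554, 0.405, 0.878] 1.837 2
```

The *Other* category contributes the most to #\chi^2#, even though its counts are the smallest, because its deviation of #9.62# is large relative to its expected frequency of #105.38#.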

Determine the degrees of freedom:

\[df = k - 1 = 3 - 1 = 2 \]

To calculate the #p#-value of a #\chi^2#-test, make use of the following Excel function:

CHISQ.DIST(x, deg_freedom, cumulative)

- x: The value at which you wish to evaluate the distribution function.
- deg_freedom: An integer indicating the number of degrees of freedom.
- cumulative: A logical value that determines the form of the function:
    - TRUE: uses the cumulative distribution function, #\mathbb{P}(X \leq x)#
    - FALSE: uses the probability density function

A *Chi-Square* test is by definition a *right-tailed* test. Thus, to calculate the #p#-value of the test, run the following command:

\[\begin{array}{c}
=1\text{ - }\text{CHISQ.DIST}(\chi^2, k\text{ - }1, 1)\\
\downarrow\\
=1\text{ - }\text{CHISQ.DIST}(1.837, 2, 1)
\end{array}\]

This gives:

\[p = 0.399\]
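As an optional hand check, using only the definition of the chi-square distribution: with #df = 2# the density is #f(x) = \tfrac{1}{2}e^{-x/2}# for #x \geq 0#, so the right-tail probability that this Excel command computes has a closed form:

\[\begin{array}{rcl}
p = \mathbb{P}(X \gt 1.837) &=& \displaystyle\int_{1.837}^{\infty} \tfrac{1}{2}e^{-t/2}\,dt\\
&=& e^{-1.837/2}\\
&\approx& 0.399
\end{array}\]

This shortcut applies only to #df = 2#; for other degrees of freedom, use CHISQ.DIST or pchisq.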

Since #p = 0.399 \gt \alpha = 0.03#, the null hypothesis that eye color in Sweden follows the same distribution as in Norway should not be rejected.

The same #p#-value can also be calculated in R, starting from the #\chi^2#-statistic of #1.837# and the #df = 2# degrees of freedom obtained above.

To calculate the #p#-value of a #\chi^2#-test, make use of the following R function:

pchisq(q, df, lower.tail)

- q: The value at which you wish to evaluate the distribution function.
- df: The number of degrees of freedom.
- lower.tail: If TRUE (the default), probabilities are #\mathbb{P}(X \leq q)#; otherwise, #\mathbb{P}(X \gt q)#.

A *Chi-Square* test is by definition a *right-tailed* test. Thus, to calculate the #p#-value of the test, run the following command:

\[\begin{array}{c}
\text{pchisq}(q = \chi^2,\, df = k\text{ - }1,\, \text{lower.tail}=\text{FALSE})\\
\downarrow\\
\text{pchisq}(q = 1.837,\, df = 2,\, \text{lower.tail}=\text{FALSE})
\end{array}\]

This gives:

\[p = 0.399\]

Since #p = 0.399 \gt \alpha = 0.03#, the null hypothesis that eye color in Sweden follows the same distribution as in Norway should not be rejected.
