### Chapter 9: Categorical Association: Chi-Square Test for Independence

### Chi-Square Test for Independence: Test Statistic and p-value

Data for the Chi-Square Test for Independence

The **observed frequency** is the number of individuals in the sample that are classified as a particular category and is denoted by #f_o#.

The **expected frequency** is the number of individuals that one would *expect* to be classified as a particular category based on the predictions made by the null hypothesis and is denoted by #f_e#.

The expected frequency of a category is calculated with the following formula:

\[f_e = \cfrac{f_r \cdot f_c}{n}\]

where #f_r# is the frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size.

Calculating Expected Frequencies

Consider the following frequency distribution table:

Observed Frequencies

| | Apple | Banana | #\blue{\text{Total}}# |
|---|---|---|---|
| Extrovert | #\purple{\text{13}}# | #\purple{\text{37}}# | #\blue{\text{50}}# |
| Introvert | #\purple{\text{81}}# | #\purple{\text{97}}# | #\blue{\text{178}}# |
| #\orange{\text{Total}}# | #\orange{\text{94}}# | #\orange{\text{134}}# | 228 |

To calculate the expected frequencies, apply the following formula to each #\purple{\text{cell}}# in the table:

\[f_e = \cfrac{\blue{f_r} \cdot \orange{f_c}}{n}\]

where #\blue{f_r}# is the frequency total for the row, #\orange{f_c}# is the frequency total for the column, and #n# is the total sample size.

#\begin{array}{llcl}

\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Extrovert - Apple}}&:&\cfrac{\blue{50}\cdot \orange{94}}{228}=20.61\\

\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Extrovert - Banana}}&:&\cfrac{\blue{50}\cdot \orange{134}}{228}=29.39\\

\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Introvert - Apple}}&:&\cfrac{\blue{178}\cdot \orange{94}}{228}=73.39\\

\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Introvert - Banana}}&:&\cfrac{\blue{178}\cdot \orange{134}}{228}=104.61\\

\end{array}#

Expected Frequencies

| | Apple | Banana | Total |
|---|---|---|---|
| Extrovert | 20.61 | 29.39 | 50 |
| Introvert | 73.39 | 104.61 | 178 |
| Total | 94 | 134 | 228 |
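The same calculation can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original lesson; the function name `expected_frequencies` and the list-of-rows layout of the table are assumptions made for the example.

```python
def expected_frequencies(observed):
    """Return the expected frequencies f_e = f_r * f_c / n for every cell."""
    row_totals = [sum(row) for row in observed]        # f_r for each row
    col_totals = [sum(col) for col in zip(*observed)]  # f_c for each column
    n = sum(row_totals)                                # total sample size
    return [[f_r * f_c / n for f_c in col_totals] for f_r in row_totals]


# Observed frequencies (rows: Extrovert, Introvert; columns: Apple, Banana)
observed = [[13, 37], [81, 97]]
expected = expected_frequencies(observed)

for row in expected:
    print([round(f_e, 2) for f_e in row])
# prints [20.61, 29.39] and then [73.39, 104.61]
```

Note that each row and each column of expected frequencies adds up to the same total as the corresponding row or column of observed frequencies.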


After the expected frequencies have been calculated, the next step is to calculate the *Chi-Square Test for Independence* test statistic in order to determine how much the observed frequencies differ from the frequencies expected under the null hypothesis.


Chi-Square Test Statistic and Distribution

The test statistic for the *Chi-Square Test for Independence* is denoted by #\chi^2# and is calculated with the following formula:

\[\chi^2=\sum_{\text{all cells}}{\dfrac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}}=\sum_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\]

Since the calculation of the test statistic involves adding squared values, a #\chi^2#-statistic will always have a value of zero or larger.

Assuming the null hypothesis of the *Chi-Square Test for Independence* is true, the #\chi^2#-statistic will (approximately) follow a #\chi^2#-distribution with #df = (r -1)(c-1)# *degrees of freedom*, where #r# is the number of rows and #c# the number of columns.
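As a rough sketch of how the statistic and its degrees of freedom can be computed together, the Python snippet below applies the formula to the Apple/Banana table from earlier. The function name `chi_square_statistic` is a hypothetical helper introduced only for this illustration.

```python
def chi_square_statistic(observed):
    """Return (chi2, df) for a two-way table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (f_o - f_e) ** 2 / f_e           # this cell's contribution

    # df = (r - 1)(c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Observed frequencies (rows: Extrovert, Introvert; columns: Apple, Banana)
observed = [[13, 37], [81, 97]]
chi2, df = chi_square_statistic(observed)
print(round(chi2, 3), df)
```

Every term in the sum is a squared difference divided by a positive expected frequency, which is why the result can never be negative.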

Chi-square distributions are positively skewed, and the critical region is always located entirely in the right tail of the distribution.

Calculating the p-value of a *Chi-Square Test for Independence*

A *Chi-Square* test is by definition a *right-tailed* test.

To calculate the #p#-value of a *Chi-Square Test for Independence* in **Excel**, use the following command:

\[=1\text{ - }\text{CHISQ.DIST}(\chi^2, df, 1)\]

To calculate the #p#-value of a *Chi-Square Test for Independence* in **R**, use the following command:

\[\text{pchisq}(\chi^2, df, lower.tail=\text{FALSE})\]

In both commands, #df = (r \text{ - }1)(c\text{ - }1)#.

If #p \leq \alpha#, reject #H_0# in favor of #H_a#. Otherwise, do not reject #H_0#.
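For readers without Excel or R at hand, the right-tail probability can also be sketched in plain Python. The helper names `chi_square_p_value` and `decide` are assumptions made for this illustration, and the two test statistics in the loop are hypothetical values, not results from this chapter. The function relies on the standard recurrence for the chi-square tail probability and only supports whole-number degrees of freedom.

```python
import math


def chi_square_p_value(chi2, df):
    """Right-tail probability P(X > chi2) for a chi-square distribution
    with a positive integer number of degrees of freedom."""
    if df < 1 or chi2 < 0:
        raise ValueError("df must be >= 1 and chi2 must be >= 0")
    half = chi2 / 2.0
    if df % 2 == 1:
        p, k = math.erfc(math.sqrt(half)), 1  # exact tail for df = 1
    else:
        p, k = math.exp(-half), 2             # exact tail for df = 2
    # Step up two degrees of freedom at a time:
    # Q(k + 2) = Q(k) + half^(k/2) * e^(-half) / Gamma(k/2 + 1)
    while k < df:
        p += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return p


def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p <= alpha else "do not reject H0"


# Hypothetical test statistics, both with df = 2
for chi2 in (6.0, 2.0):
    p = chi_square_p_value(chi2, df=2)
    print(chi2, round(p, 4), decide(p))
```

The function plays the same role as `1 - CHISQ.DIST(x, df, 1)` in Excel and `pchisq(x, df, lower.tail=FALSE)` in R.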

In an effort to assess the impact of funding cuts on pre-school programs, school administrators in a US school district selected a simple random sample of #270# students in the seventh grade and determined whether or not each student had attended pre-school and whether each student was performing below, at, or above grade level in mathematics.

The distribution was organized in the following two-way frequency table:

Observed frequencies

| | Below grade level | At grade level | Above grade level | Total |
|---|---|---|---|---|
| Pre-school | #37# | #123# | #30# | #190# |
| No pre-school | #17# | #58# | #5# | #80# |
| Total | #54# | #181# | #35# | #270# |

The administrators plan on using a *Chi-Square Test for Independence* to determine whether pre-school attendance and mathematical ability are related to one another.

Calculate the #p#-value of the test and make a decision regarding #H_0#. Round your answer to #3# decimal places. Use the #\alpha = 0.05# significance level.

#p=0.103#

On the basis of this #p#-value, #H_0# should not be rejected, because #p \gt \alpha#.

There are a number of different ways we can calculate the #p#-value of the test. The solutions below use Excel and R.

Calculate the expected frequency of all cells in the table with the following formula:

\[f_e = \cfrac{f_r \cdot f_c}{n}\]

where #f_r# is the frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size.

Expected frequencies

| | Below grade level | At grade level | Above grade level | Total |
|---|---|---|---|---|
| Pre-school | #38.00# | #127.37# | #24.63# | #190# |
| No pre-school | #16.00# | #53.63# | #10.37# | #80# |
| Total | #54# | #181# | #35# | #270# |

Calculate the #\chi^2#-statistic:

\[\begin{array}{rcl}

\chi^2&=&\sum\limits_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\\

&=& \cfrac{(37-38.00)^2}{38.00}+\cfrac{(123-127.37)^2}{127.37}+\cfrac{(30-24.63)^2}{24.63}+\cfrac{(17-16.00)^2}{16.00}\\ && \phantom{}+\cfrac{(58-53.63)^2}{53.63}+\cfrac{(5-10.37)^2}{10.37}\\

&=& 4.546

\end{array}\]

Determine the degrees of freedom:

\[df = (r -1)(c-1) = (2 -1 )(3 - 1)=2\]

To calculate the #p#-value of a #\chi^2#-test in Excel, make use of the following function:

CHISQ.DIST(x, deg_freedom, cumulative)

- x: The value at which you wish to evaluate the distribution function.
- deg_freedom: An integer indicating the number of degrees of freedom.
- cumulative: A logical value that determines the form of the function.
  - TRUE - uses the cumulative distribution function, #\mathbb{P}(X \leq x)#
  - FALSE - uses the probability density function

A *Chi-Square* test is by definition a *right-tailed* test. Thus, to calculate the #p#-value of the test, run the following command:

\[=1\text{ - }\text{CHISQ.DIST}(\chi^2,(r \text{ - }1)(c\text{ - }1), 1)\\

\downarrow\\

=1\text{ - }\text{CHISQ.DIST}(4.546, 2, 1)\]

This gives:

\[p = 0.103\]

Since #p \gt \alpha#, the null hypothesis of *independence* should not be rejected.
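As a check that does not depend on software: when #df = 2#, the #\chi^2#-distribution coincides with an exponential distribution with mean #2#, so the right-tail probability has a closed form. This shortcut holds only for #df = 2# and is added here as a verification, not as part of the standard procedure.

\[p = \mathbb{P}(X \gt \chi^2) = e^{-\chi^2/2} = e^{-4.546/2} = e^{-2.273} \approx 0.103\]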


To calculate the #p#-value of a #\chi^2#-test in R, make use of the following function:

pchisq(q, df, lower.tail)

- q: The value at which you wish to evaluate the distribution function.
- df: An integer indicating the number of degrees of freedom.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.

A *Chi-Square* test is by definition a *right-tailed* test. Thus, to calculate the #p#-value of the test, run the following command:

\[\text{pchisq}(q = \chi^2, df = (r \text{ - }1)(c\text{ - }1), lower.tail=\text{FALSE})\\

\downarrow\\

\text{pchisq}(q = 4.546, df = 2, lower.tail=\text{FALSE})\]

This gives:

\[p = 0.103\]

Since #p \gt \alpha#, the null hypothesis of *independence* should not be rejected.
