### Chapter 8. Testing for Differences in Mean and Proportion: Independent Proportions Z-test

### Independent Proportions Z-test: Test Statistic and p-value

Independent proportions Z-test: Test Statistic

Let #X_1# denote the number of successes in the first sample of size #n_1# and #X_2# the number of successes in the second sample of size #n_2#. Then #\hat{p}_1# and #\hat{p}_2# are the *sample proportions*:

\[\hat{p}_1 = \cfrac{X_1}{n_1} \phantom{000000} \hat{p}_2 = \cfrac{X_2}{n_2}\]

Besides the individual sample proportions, we also need the *pooled sample proportion* #\hat{p}# to calculate the test statistic:

\[\hat{p} = \cfrac{X_1+X_2}{n_1+n_2}\]

The test statistic of an *independent proportions #Z#-test* is denoted #Z# and is computed with the following formula:

\[Z=\cfrac{(\hat{p}_1-\hat{p}_2) - (\pi_1 - \pi_2)}{s_{(\hat{p}_1 - \hat{p}_2)}} = \cfrac{\hat{p}_1-\hat{p}_2 }{\sqrt{\hat{p}\cdot(1-\hat{p})\cdot\bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2}\bigg)}}\]

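where #s_{(\hat{p}_1 - \hat{p}_2)}# is the **standard error of the proportion difference**. The second expression follows because the hypothesized difference #\pi_1 - \pi_2# equals #0# under the null hypothesis, and the standard error is estimated with the pooled sample proportion #\hat{p}#.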
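The formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `two_proportion_z` and the counts in the usage line are made up for this example and do not come from the course material.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Return (p1_hat, p2_hat, pooled, z) for a test of pi_1 - pi_2 = 0.

    x1, x2: number of successes in each sample
    n1, n2: sample sizes
    """
    p1_hat = x1 / n1                     # sample proportion, first sample
    p2_hat = x2 / n2                     # sample proportion, second sample
    pooled = (x1 + x2) / (n1 + n2)       # pooled sample proportion
    # Standard error of the proportion difference, using the pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se           # hypothesized difference is 0
    return p1_hat, p2_hat, pooled, z


# Hypothetical counts, chosen only to show the call
p1_hat, p2_hat, pooled, z = two_proportion_z(x1=45, n1=100, x2=30, n2=100)
print(p1_hat, p2_hat, pooled, round(z, 4))
```

Swapping the two samples only flips the sign of #z#, which matters for one-tailed tests but not for the two-tailed #p#-value.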

When both samples are large #(n_1 \geq 30 \text{ and } n_2 \geq 30)#, the #Z#-statistic follows the *Standard Normal Distribution* under the null hypothesis of the test:

\[Z \sim N(0,1)\]

Calculating the p-value of an independent proportions Z-test with Statistical Software

The #p#-value of an *independent proportions #Z#-test* depends on the *direction* of the test and can be calculated in either Excel or R.

To calculate the #p#-value of an *independent proportions #Z#-test* for #\pi_1 - \pi_2# in **Excel**, make use of one of the following commands:

\[\begin{array}{llll}

\phantom{0}\text{Direction}&\phantom{000000}H_0&\phantom{000000}H_a&\phantom{0000000000}\text{Excel Command}\\

\hline

\text{Two-tailed}&H_0:\pi_1 - \pi_2 = 0&H_a:\pi_1 - \pi_2 \neq 0&=2 \text{ * }(1 \text{ - }\text{NORM.DIST}(\text{ABS}(z),0,1,1))\\

\text{Left-tailed}&H_0:\pi_1 - \pi_2 \geq 0&H_a:\pi_1 - \pi_2 \lt 0&=\text{NORM.DIST}(z,0,1,1)\\

\text{Right-tailed}&H_0:\pi_1 - \pi_2 \leq 0&H_a:\pi_1 - \pi_2 \gt 0&=1 \text{ - }\text{NORM.DIST}(z,0,1,1)\\

\end{array}\]

To calculate the #p#-value of an *independent proportions #Z#-test* for #\pi_1 - \pi_2# in **R**, make use of one of the following commands:

\[\begin{array}{llll}

\phantom{0}\text{Direction}&\phantom{000000}H_0&\phantom{000000}H_a&\phantom{0000000000}\text{R Command}\\

\hline

\text{Two-tailed}&H_0:\pi_1 - \pi_2 = 0&H_a:\pi_1 - \pi_2 \neq 0&2 \text{ * }\text{pnorm}(\text{abs}(z),0,1, \text{FALSE})\\

\text{Left-tailed}&H_0:\pi_1 - \pi_2 \geq 0&H_a:\pi_1 - \pi_2 \lt 0&\text{pnorm}(z,0,1, \text{TRUE})\\

\text{Right-tailed}&H_0:\pi_1 - \pi_2 \leq 0&H_a:\pi_1 - \pi_2 \gt 0&\text{pnorm}(z,0,1, \text{FALSE})\\

\end{array}\]

If #p \leq \alpha#, reject #H_0# and conclude #H_a#. Otherwise, do not reject #H_0#.
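As a cross-check on the Excel and R commands in the tables above, the same three calculations can be sketched in Python with the standard library's `statistics.NormalDist`. The function names `z_test_p_value` and `reject_h0` and the direction labels are assumptions made for this sketch, not part of Excel or R.

```python
from statistics import NormalDist


def z_test_p_value(z, direction="two-tailed"):
    """Return the p-value of a Z-test for the given direction."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    if direction == "two-tailed":
        # 2 * P(Z >= |z|), like =2*(1-NORM.DIST(ABS(z),0,1,1))
        return 2 * (1 - std_normal.cdf(abs(z)))
    if direction == "left-tailed":
        # P(Z <= z), like pnorm(z, 0, 1, TRUE)
        return std_normal.cdf(z)
    if direction == "right-tailed":
        # P(Z >= z), like pnorm(z, 0, 1, FALSE)
        return 1 - std_normal.cdf(z)
    raise ValueError(f"unknown direction: {direction!r}")


def reject_h0(p, alpha):
    """Decision rule: reject H0 when p <= alpha."""
    return p <= alpha


# Illustrative value of z, not taken from any data set
p = z_test_p_value(1.96, "two-tailed")
print(round(p, 3), reject_h0(p, alpha=0.05))
```

For any #z#, the left-tailed and right-tailed #p#-values add up to #1#, which is a quick sanity check on the direction chosen.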

The researcher plans on using an *independent proportions #Z#-test* to determine whether or not there is a significant *difference* between the morning and evening on-time arrival rates, at the #\alpha = 0.02# level of significance.

Out of the #116# morning trains, #X_1=97# arrived on time. Out of the #120# evening trains, #X_2=112# arrived on time.

Calculate the #p#-value of the test and make a decision regarding #H_0: \pi_1 - \pi_2 = 0#. Round your answer to #3# decimal places.

#p=0.019#

On the basis of this #p#-value, #H_0# should be rejected, because #p \lt \alpha#.

The #p#-value of the test can be calculated in either Excel or R. The first solution below uses Excel; the second uses R.

Compute the *sample proportions* #\hat{p}_1# and #\hat{p}_2#:

\[\hat{p}_1=\cfrac{X_1}{n_1}=\cfrac{97}{116}=0.83621\\

\hat{p}_2=\cfrac{X_2}{n_2}=\cfrac{112}{120}=0.93333\]

Compute the *pooled sample proportion* #\hat{p}#:

\[\hat{p}=\cfrac{X_1 + X_2 }{n_1 + n_2}=\cfrac{97 + 112}{116 + 120}=0.88559\]

Compute the #Z#-statistic:

\[z=\cfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} \cdot (1-\hat{p}) \cdot \bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2} \bigg)}}

=\cfrac{0.83621 - 0.93333}{\sqrt{0.88559 \cdot (1-0.88559) \cdot \bigg(\cfrac{1}{116}+\cfrac{1}{120} \bigg)}}=-2.3435\]

Since both #n_1# and #n_2# are considered *large* (#\geq 30#), the *Central Limit Theorem* applies and the test statistic

\[Z=\cfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} \cdot (1-\hat{p}) \cdot \bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2} \bigg)}}\]

approximately has the *Standard Normal Distribution*, under the assumption that #H_0# is true.

For a *two-tailed* #Z#-test, the #p#-value is defined as #2\cdot \mathbb{P}(Z \geq |z|)#. To calculate this value in Excel, make use of the following function:

NORM.DIST(x, mean, standard_dev, cumulative)

- x: The value at which you wish to evaluate the distribution function.
- mean: The mean of the distribution.
- standard_dev: The standard deviation of the distribution.
- cumulative: A logical value that determines the form of the function.
  - TRUE - uses the cumulative distribution function, #\mathbb{P}(X \leq x)#
  - FALSE - uses the probability density function
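The difference between the two settings of the cumulative argument can be illustrated with a short Python sketch. It uses `statistics.NormalDist` as a stand-in for NORM.DIST, which is an assumption of this example; only the cumulative form yields a probability that can be used for a #p#-value.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

x = -2.34346  # z-statistic from the worked example

# Counterpart of NORM.DIST(x, 0, 1, TRUE): the probability P(X <= x)
cumulative = std_normal.cdf(x)

# Counterpart of NORM.DIST(x, 0, 1, FALSE): the height of the density curve at x
density = std_normal.pdf(x)

print(cumulative, density)
```

The density value is not a probability, so it should not be compared with #\alpha#.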

Thus, to calculate #p = 2\cdot \mathbb{P}(Z \geq |z|)#, run the following command:

\[

=2 \text{ * }(1 \text{ - } \text{NORM.DIST}(\text{ABS}(z),0,1,1))\\

\downarrow\\

=2 \text{ * }(1 \text{ - } \text{NORM.DIST}(\text{ABS}(\text{-}2.34346),0,1,1))

\]

This gives:

\[p = 0.019\]

Since #p \lt \alpha#, #H_0: \pi_1 - \pi_2 = 0# should be rejected.

For a *two-tailed* #Z#-test, the #p#-value is defined as #2\cdot \mathbb{P}(Z \geq |z|)#. To calculate this value in R, make use of the following function:

pnorm(q, mean, sd, lower.tail)

- q: The value at which you wish to evaluate the distribution function.
- mean: The mean of the distribution.
- sd: The standard deviation of the distribution.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#; otherwise, #\mathbb{P}(X \gt x)#.

Thus, to calculate #p = 2\cdot \mathbb{P}(Z \geq |z|)#, run the following command:

\[

2 \text{ * } \text{pnorm}(\text{q = abs}(z)\text{, mean = 0, sd = 1, lower.tail = FALSE})\\

\downarrow\\

2 \text{ * } \text{pnorm}(\text{q = abs}(\text{-}2.34346)\text{, mean = 0, sd = 1, lower.tail = FALSE})

\]

This gives:

\[p = 0.019\]

Since #p \lt \alpha#, #H_0: \pi_1 - \pi_2 = 0# should be rejected.
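The whole calculation for the train example can also be reproduced end to end. The following Python sketch is an independent check rather than part of the Excel or R solutions; the variable names are chosen for this example, and the numbers are the ones given in the exercise.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 97, 116    # morning trains: on time, total
x2, n2 = 112, 120   # evening trains: on time, total
alpha = 0.02        # level of significance

p1_hat = x1 / n1
p2_hat = x2 / n2
pooled = (x1 + x2) / (n1 + n2)

# Z-statistic with the pooled standard error
z = (p1_hat - p2_hat) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Two-tailed p-value: 2 * P(Z >= |z|)
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.4f}, p = {p:.3f}, reject H0: {p <= alpha}")
# prints: z = -2.3435, p = 0.019, reject H0: True
```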
