Hypothesis Test for a Difference Between Two Population Proportions
In this section, we will look at how to test whether the proportion of one population having some characteristic is different from the proportion of another population having the same characteristic.
Suppose #X# is a binary variable, measured on two distinct, independent populations in which unknown proportions #p_1# and #p_2# of the respective populations meet some condition of interest, with #X = 1# if a selected member of the population meets that condition, and #X = 0# otherwise.
Let #S_1# denote the number of subjects for which #X = 1# in a random sample of size #n_1# from population 1, and #S_2# denote the number of subjects for which #X=1# in a random sample of size #n_2# from population 2.
Let #\hat{p}_1 = \cfrac{S_1}{n_1}# and #\hat{p}_2 = \cfrac{S_2}{n_2}#.
Research Question and Hypotheses
The research question of a hypothesis test for a difference between two population proportions is whether or not the difference between #p_1# and #p_2# differs from some value #\Delta_0#. Usually #\Delta_0 = 0#, so we will use #0# in this course.
Depending on the expected direction of the difference, a hypothesis test for a difference between two population proportions has one of the following pairs of hypotheses:
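\[\begin{array}{lll}
\text{Two-tailed}&\text{Left-tailed}&\text{Right-tailed}\\
\hline
H_0:p_1 - p_2 = 0&H_0:p_1 - p_2 \geq 0&H_0:p_1 - p_2 \leq 0\\
H_1:p_1 - p_2 \neq 0&H_1:p_1 - p_2 \lt 0&H_1:p_1 - p_2 \gt 0\\
\end{array}\]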
Test Statistic and Null Distribution
Under the initial assumption that #H_0# is true, the proportions #p_1# and #p_2# equal some common proportion #p#, which can be estimated by the pooled sample proportion: \[\hat{p} = \cfrac{S_1 + S_2}{n_1 + n_2}\]
Then, when #n_1# and #n_2# are both large, we have, approximately, \[\hat{p}_1 - \hat{p}_2 \sim N\bigg(0,p(1-p)\Big(\cfrac{1}{n_1} + \cfrac{1}{n_2}\Big)\bigg)\]
based on the Central Limit Theorem.
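The mean and variance of this distribution can be sketched as follows, assuming (as in the setup above) that the two samples are independent random samples, so that #S_1# and #S_2# are independent binomial counts, and that #p_1 = p_2 = p# under #H_0#:
\[\begin{array}{rcl}
\mathrm{E}(\hat{p}_1 - \hat{p}_2)&=&p_1 - p_2 = 0\\
\mathrm{Var}(\hat{p}_1 - \hat{p}_2)&=&\mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) = \cfrac{p_1(1-p_1)}{n_1} + \cfrac{p_2(1-p_2)}{n_2} = p(1-p)\Big(\cfrac{1}{n_1} + \cfrac{1}{n_2}\Big)\\
\end{array}\]
The variances add because the samples are independent, even though the estimators are subtracted.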
Since #p# is unknown, we estimate #p# with #\hat{p}#. Then the test statistic is \[Z = \cfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\Big(\cfrac{1}{n_1} + \cfrac{1}{n_2}\Big)}}\]
which has an approximate #N(0,1)# distribution when #H_0# is assumed true and the sample sizes are both large.
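As a minimal sketch, the test statistic can be computed in #\mathrm{R}# from the two success counts and sample sizes. The function name `two_prop_z`, its argument names, and the counts in the usage lines are illustrative assumptions, not part of the course material.

```r
# Pooled two-proportion z statistic (illustrative helper, not from the course).
# s1, s2: numbers of subjects with X = 1; n1, n2: sample sizes.
two_prop_z <- function(s1, n1, s2, n2) {
  p1_hat <- s1 / n1                 # sample proportion, population 1
  p2_hat <- s2 / n2                 # sample proportion, population 2
  p_pool <- (s1 + s2) / (n1 + n2)   # pooled estimate of the common p under H0
  # estimated standard error of p1_hat - p2_hat under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  (p1_hat - p2_hat) / se
}

# Hypothetical counts, chosen only to illustrate the call
z <- two_prop_z(s1 = 40, n1 = 100, s2 = 30, n2 = 100)
print(z)
```

The sign of the result follows the order of the arguments: a positive value means the first sample proportion is the larger one.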
Values of #Z# are extreme if they are far from #0#, in either tail of the #N(0,1)# density curve.
Calculating the P-value
Depending on the form of #H_1#, we compute the P-value based on the observed value #z# of the random variable #Z#, just as we have shown for previous settings:
\[\begin{array}{llll}
\phantom{0}\text{Direction}&\phantom{000000}H_0&\phantom{000000}H_1&\phantom{0000000000}\text{R Command}\\
\hline
\text{Two-tailed}&H_0:p_1 - p_2 = 0&H_1:p_1 - p_2 \neq 0&2 \text{ * }\text{pnorm}(\text{abs}(z),0,1, \text{lower.tail=FALSE})\\
\text{Left-tailed}&H_0:p_1 - p_2 \geq 0&H_1:p_1 - p_2 \lt 0&\text{pnorm}(z,0,1, \text{lower.tail=TRUE})\\
\text{Right-tailed}&H_0:p_1 - p_2 \leq 0&H_1:p_1 - p_2 \gt 0&\text{pnorm}(z,0,1, \text{lower.tail=FALSE})\\
\end{array}\]
If a significance threshold #\alpha# has been chosen, then #H_0# is rejected in favor of #H_1# if the computed P-value is #\leq \alpha#, and #H_0# is otherwise not rejected.
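The three cases and the decision rule can be wrapped in one small #\mathrm{R}# helper. This is only a sketch: the function name `z_p_value`, the `direction` labels, the observed value #z = -1.96#, and the threshold #\alpha = 0.05# are all assumptions made for illustration.

```r
# P-value of an observed z under the N(0,1) null distribution.
# Function name and direction labels are illustrative, not from the course.
z_p_value <- function(z, direction = c("two", "left", "right")) {
  direction <- match.arg(direction)
  switch(direction,
    two   = 2 * pnorm(abs(z), lower.tail = FALSE),  # both tails
    left  = pnorm(z, lower.tail = TRUE),            # area to the left of z
    right = pnorm(z, lower.tail = FALSE)            # area to the right of z
  )
}

alpha <- 0.05                      # assumed significance threshold
p_val <- z_p_value(-1.96, "two")   # hypothetical observed z
reject_h0 <- p_val <= alpha        # TRUE means H0 is rejected
print(c(p_value = p_val, reject = reject_h0))
```

Writing `lower.tail` in full avoids relying on partial argument matching.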
Example:
A researcher wants to investigate whether the proportion of adult males in some country who agree with a specified political opinion about refugees is different from the proportion of adult females in that country who agree with that opinion.
In a random sample from each population, #75# of #120# adult males and #85# of #132# adult females agree with the opinion.
Can she conclude that men and women differ in their opinion about refugees?
Solution:
The researcher wants to test #H_0: p_m - p_f = 0# against #H_1 : p_m - p_f \neq 0#.
The proportion of males who agree with the opinion is: \[\hat{p}_m = \cfrac{75}{120} = 0.625\] And the proportion of females who agree with the opinion is: \[\hat{p}_f = \cfrac{85}{132} \approx 0.644\] The pooled sample proportion is: \[\hat{p} = \cfrac{75 + 85}{120 + 132} \approx 0.635\] And the test statistic is: \[z = \cfrac{0.625 - 0.644}{\sqrt{0.635(1-0.635) \Big(\cfrac{1}{120} + \cfrac{1}{132}\Big)}} \approx -0.313\]
Both sample sizes are large. The P-value is computed in #\mathrm{R}# from the absolute value of the test statistic using
> 2*pnorm(0.313, lower.tail=FALSE)
to be #0.754#.
A P-value of this size would lead her not to reject #H_0#: the data do not provide sufficient evidence that the proportions of adult men and women in that country who agree with the political opinion about refugees differ. Although we did not specify a significance level, this P-value is much larger than any reasonable significance level that might be selected.
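As a cross-check, the same test can be run with the built-in #\mathrm{R}# function `prop.test`. This is a sketch rather than part of the original solution: with `correct = FALSE` the continuity correction is switched off, and the reported chi-squared statistic then equals the square of the pooled #z# statistic. The variable names are illustrative.

```r
s <- c(75, 85)     # males and females who agree with the opinion
n <- c(120, 132)   # sample sizes

# Two-sided test of equal proportions without continuity correction
res <- prop.test(x = s, n = n, alternative = "two.sided", correct = FALSE)

z_abs <- sqrt(unname(res$statistic))  # |z|, since X-squared = z^2
p_val <- res$p.value
print(c(abs_z = z_abs, p_value = p_val))
```

Because `prop.test` works with unrounded proportions, its results can differ slightly in the third decimal from the hand calculation, which used proportions rounded to three decimals.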