Confidence Interval for the Difference Between Two Population Proportions
In this lesson, you will learn how to calculate the confidence interval for the difference between two population proportions.
Confidence Interval for the Difference in Proportions
Suppose #X# is a binary variable, but measured on two distinct, independent populations in which unknown proportions #p_1# and #p_2# of the respective populations meet some condition of interest, with #X = 1# if the subject meets that condition and #X = 0# otherwise.
Then #X \sim B(1,p_1)# on population #1#, and #X \sim B(1,p_2)# on population #2#. We are not interested in the individual values of #p_1# and #p_2#, but in their difference #p_1 - p_2#.
Let #S_1# denote the number of subjects for which #X = 1# in a random sample of size #n_1# from population #1#, and #S_2# denote the number of subjects for which #X = 1# in a random sample of size #n_2# from population #2#.
Furthermore, let #\tilde{p}_1 = \cfrac{S_1 + 1}{n_1 + 2}\,\,\,\,\,# and #\,\,\,\,\,\tilde{p}_2 = \cfrac{S_2 + 1}{n_2 + 2}#.
When #n_1# and #n_2# are both large, an approximate #(1 - \alpha)100\%# confidence interval for the difference in population proportions #p_1 - p_2# is:
\[\Bigg(\tilde{p}_1 - \tilde{p}_2 - z_{\alpha /2}\sqrt{\cfrac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \cfrac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}},\,\,\,\,\,\tilde{p}_1 - \tilde{p}_2 + z_{\alpha /2}\sqrt{\cfrac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \cfrac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}}\Bigg)\]
If the lower limit is less than #-1#, replace it with #-1#, and if the upper limit is larger than #1#, replace it with #1#.
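The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function name `diff_proportion_ci`, its parameter names, and the sample counts in the call are assumptions made for this example. The critical value #z_{\alpha/2}# is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(s1, n1, s2, n2, confidence=0.95):
    """Approximate CI for p1 - p2 using the adjusted estimates (S + 1) / (n + 2)."""
    # Adjusted sample proportions, as defined in the text.
    p1 = (s1 + 1) / (n1 + 2)
    p2 = (s2 + 1) / (n2 + 2)

    # Critical value z_{alpha/2} of the standard normal distribution.
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)

    # Margin of error from the interval formula.
    margin = z * sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))

    # Clip the limits to [-1, 1], the possible range of p1 - p2.
    lower = max(-1.0, p1 - p2 - margin)
    upper = min(1.0, p1 - p2 + margin)
    return lower, upper


# Hypothetical counts, chosen only to illustrate the call.
lower, upper = diff_proportion_ci(s1=40, n1=100, s2=30, n2=100)
print(f"({lower:.3f}, {upper:.3f})")
```

Swapping the two samples in the call gives the interval for #p_2 - p_1#, which has the same width with both limits negated and interchanged.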
A study in 2002 describes measurements of ammonium concentration (in #\mathrm{mg/L}#) at a large number of wells in the state of Iowa in the USA. These included #349# alluvial wells and #143# quaternary wells. Of the alluvial wells, #182# had concentrations above #0.1\,\mathrm{mg/L}#, while #112# of the quaternary wells had concentrations above #0.1\,\mathrm{mg/L}#.
Assume the wells represent random samples from the populations of all alluvial and quaternary wells in the region. Find a #95\%# confidence interval for the difference between the population proportions of the two types of wells with concentrations above #0.1\,\mathrm{mg/L}#. Let population #1# be the alluvial wells and population #2# the quaternary wells.
\[\tilde{p}_1 = \cfrac{182 + 1}{349 + 2} = \cfrac{183}{351} \approx 0.521\]
\[\tilde{p}_2 = \cfrac{112 + 1}{143 + 2} = \cfrac{113}{145} \approx 0.779\]
The margin of error is:
\[z_{\alpha /2}\sqrt{\cfrac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \cfrac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} = 1.96\sqrt{\cfrac{0.521(1 - 0.521)}{351} + \cfrac{0.779(1 - 0.779)}{145}} \approx 0.085\]
Since #\tilde{p}_2 > \tilde{p}_1#, we report the interval for #p_2 - p_1# rather than #p_1 - p_2#, so that both limits are positive. The margin of error is the same in either direction. The #95\%# CI for #p_2 - p_1# is then:
\[(l,u) = (0.779 - 0.521 - 0.085,\,\,\,\,\,0.779 - 0.521 + 0.085) = (0.173,\,\,\,\,\,0.343)\]
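As a quick numerical check, the arithmetic of this example can be reproduced step by step in Python. The variable names below are chosen for this sketch only, and #z_{0.025}# is hard-coded as #1.96# to match the rounded value used in the calculation.

```python
from math import sqrt

# Counts from the wells example: alluvial (1) and quaternary (2).
s1, n1 = 182, 349
s2, n2 = 112, 143
z = 1.96  # z_{0.025}, rounded as in the text

# Adjusted sample proportions (S + 1) / (n + 2).
p1 = (s1 + 1) / (n1 + 2)
p2 = (s2 + 1) / (n2 + 2)

# Margin of error; it does not depend on the order of the two samples.
margin = z * sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))

# Interval for p2 - p1, and the mirrored interval for p1 - p2.
ci_21 = (p2 - p1 - margin, p2 - p1 + margin)
ci_12 = (p1 - p2 - margin, p1 - p2 + margin)

print(f"margin:  {margin:.3f}")                        # margin:  0.085
print(f"p2 - p1: ({ci_21[0]:.3f}, {ci_21[1]:.3f})")   # p2 - p1: (0.173, 0.343)
print(f"p1 - p2: ({ci_12[0]:.3f}, {ci_12[1]:.3f})")   # p1 - p2: (-0.343, -0.173)
```

Both intervals carry the same information: neither contains #0#, and each is the other with its limits negated and interchanged.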