Hypothesis Test for a Population Proportion

In this section, we will look at how to test whether the proportion of the population having some characteristic is different from some benchmark value.

Suppose #X# is a binary variable measured on a population in which an unknown proportion #p# of the population meets some condition of interest, with #X = 1# if the subject meets that condition and #X = 0# otherwise. We can also think of #p# as the probability that #X = 1# for a randomly selected element of the population.

Let #X_1,\ldots,X_n# denote measurements of #X# on a random sample from the population. We noted previously that \[S = X_1 + \cdots + X_n \sim B(n,p).\]

Let #\hat{p} = \cfrac{S}{n}#.

We saw previously that \[E(\hat{p}) = p \\\phantom{0}\\ \text{and}\\\phantom{0}\\ V(\hat{p}) = \cfrac{p(1 - p)}{n}.\]

Moreover, we learned that when #n# is large, the Central Limit Theorem implies that #\hat{p}# has an approximate \[N\bigg(p,\cfrac{p(1 - p)}{n}\bigg)\] distribution.
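
As an informal check of these facts, the following #\mathrm{R}# sketch simulates many samples and compares the mean and variance of the simulated values of #\hat{p}# with #p# and #p(1 - p)/n#. The sample size, the value of #p#, the number of replications and the seed are arbitrary choices made for illustration only; they do not come from the course material.

```r
# Simulation sketch: sampling distribution of p-hat (illustrative values only)
set.seed(1)                      # arbitrary seed, for reproducibility
n    <- 200                      # assumed sample size
p    <- 0.3                      # assumed true population proportion
reps <- 10000                    # number of simulated samples

s     <- rbinom(reps, n, p)      # S ~ B(n, p) for each simulated sample
p_hat <- s / n                   # sample proportion for each sample

sim_mean   <- mean(p_hat)        # should be close to p
sim_var    <- var(p_hat)         # should be close to p(1 - p)/n
theory_var <- p * (1 - p) / n

print(c(sim_mean = sim_mean, sim_var = sim_var, theory_var = theory_var))
```

A histogram of the simulated values, for example with hist(p_hat), should look roughly bell-shaped when #n# is large.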

We might hypothesize that #p# is different from some specified benchmark value #p_0# (or, similarly, that the probability #p# that #X=1# is different from #p_0#).

Research Question and Hypotheses

The research question of a hypothesis test for a population proportion is whether or not #p# differs from some benchmark value #p_0#.

Depending on the direction of the test, a hypothesis test for a population proportion has one of the following pairs of hypotheses:

Two-tailed	Left-tailed	Right-tailed
#H_0: p = p_0#	#H_0: p \geq p_0#	#H_0: p \leq p_0#
#H_1: p \neq p_0#	#H_1: p \lt p_0#	#H_1: p \gt p_0#

Test Statistic and Null Distribution

If the sample size #n# is large, then the test statistic \[Z = \cfrac{\hat{p} - p_0}{\sqrt{\cfrac{p_0(1 - p_0)}{n}}}\] has an approximate #N(0,1)# distribution when #H_0# is true.

Thus values of #Z# far from #0# are extreme: in either tail of the #N(0,1)# density curve for a two-tailed test, and only in the tail indicated by #H_1# for a one-tailed test.
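
The test statistic can be computed directly from the number of successes. The sketch below wraps the formula in a small helper; the function name prop_z_stat and its argument names are hypothetical, chosen only for this illustration. The numbers used are those of the first example at the end of this section.

```r
# Hypothetical helper: z statistic for a test about one population proportion
prop_z_stat <- function(s, n, p0) {
  p_hat <- s / n                       # sample proportion
  se0   <- sqrt(p0 * (1 - p0) / n)     # standard error of p-hat when H_0 is true
  (p_hat - p0) / se0                   # standardized distance from p0
}

# 39 successes in a sample of 75, benchmark value 0.5
z <- prop_z_stat(s = 39, n = 75, p0 = 0.5)
print(z)
```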

Calculating and Evaluating the P-value

Depending on the form of #H_1# we compute the P-value based on the observed value #z# of the test statistic #Z#, just as we did previously in the tests for a population mean.

As a reminder, if we are testing for #H_1 : p \neq p_0# then the P-value is computed in #\mathrm{R}# using the absolute value of the test statistic, with the command

> 2*pnorm(abs(z), lower.tail=FALSE)

If we are testing for #H_1: p < p_0# then the P-value is computed in #\mathrm{R}# with the command

> pnorm(z)

If we are testing for #H_1: p > p_0# then the P-value is computed in #\mathrm{R}# with the command

> pnorm(z, lower.tail=FALSE)

Given a specific significance level #\alpha# we would reject #H_0# and conclude #H_1# if the P-value is #\leq \alpha#. Otherwise, we would not reject #H_0#, meaning that the evidence in the data is not inconsistent with the null hypothesis.
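
The three cases and the decision rule can be collected in one place. The following is a minimal sketch; the function name z_p_value, the labels for the alternatives, and the value #z = 1.8# are assumptions made for illustration and are not part of the course material.

```r
# Hypothetical helper: P-value of the z-test for each form of H_1
z_p_value <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),  # H_1: p != p_0
    less      = pnorm(z),                               # H_1: p <  p_0
    greater   = pnorm(z, lower.tail = FALSE)            # H_1: p >  p_0
  )
}

alpha <- 0.05                         # significance level
z     <- 1.8                          # illustrative observed test statistic

p_two <- z_p_value(z, "two.sided")
p_gt  <- z_p_value(z, "greater")

reject_two <- p_two <= alpha          # decision for the two-tailed test
reject_gt  <- p_gt  <= alpha          # decision for the right-tailed test
print(c(p_two = p_two, p_gt = p_gt))
```

With this illustrative value of #z#, the right-tailed test rejects #H_0# at #\alpha = 0.05# while the two-tailed test does not, because the two-tailed P-value is twice as large.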

We mentioned earlier that there is a correspondence between a hypothesis test about a population parameter #\theta# at significance level #\alpha# and a #(1 - \alpha)100\%# confidence interval for #\theta#.

This is not quite true for this setting, because when forming the CI we use #\tilde{p}# to estimate the value of #p# and we use \[\sqrt{\cfrac{\tilde{p}(1 - \tilde{p})}{n + 4}}\] to estimate the standard error of #\tilde{p}#.

But in the hypothesis test we use #\hat{p}# to estimate the value of #p# and we use \[\sqrt{\cfrac{p_0(1 - p_0)}{n}}\] to estimate the standard error of #\hat{p}#.

So this inconsistency could mean that the conclusion of the hypothesis test might not align with the corresponding CI. To prevent this inconsistency from occurring, calculate the confidence interval using #p_0# instead.

Re-establishing the Connection Between Hypothesis Testing and Confidence Intervals

If the #(1 - \alpha)100\%# confidence interval for #p# is formed by using \[\bigg(\hat{p} - z_{\alpha /2}\sqrt{\cfrac{p_0(1 - p_0)}{n}},\,\,\,\,\,\hat{p} + z_{\alpha /2}\sqrt{\cfrac{p_0(1 - p_0)}{n}}\bigg)\] then the correspondence between hypothesis testing and confidence intervals holds.

That is, if #p_0# falls inside this CI then we would not reject #H_0: p = p_0# at significance level #\alpha#. If #p_0# falls outside this CI then we would reject #H_0: p = p_0# at significance level #\alpha#.

Similar adjustments would be required for correspondence between one-sided tests and one-sided CIs.
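
To see the correspondence at work, the sketch below computes the interval with the standard error based on #p_0# and compares it with the outcome of the two-tailed test. It uses the counts from the first example at the end of this section, here with a two-sided alternative for illustration; the variable names are arbitrary.

```r
# Sketch: CI built with the null standard error, compared with the two-tailed test
s <- 39; n <- 75; p0 <- 0.5; alpha <- 0.05

p_hat  <- s / n
se0    <- sqrt(p0 * (1 - p0) / n)       # standard error based on p_0
z_crit <- qnorm(1 - alpha / 2)          # critical value z_{alpha/2}
ci     <- c(lower = p_hat - z_crit * se0,
            upper = p_hat + z_crit * se0)

z       <- (p_hat - p0) / se0
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# p0 strictly inside the interval is equivalent to P-value > alpha
inside_ci  <- ci[["lower"]] < p0 && p0 < ci[["upper"]]
not_reject <- p_value > alpha
print(ci)
print(c(inside_ci = inside_ci, not_reject = not_reject))
```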

Even when #n# is not large, we can still test the same hypotheses. Instead of the normal approximation, we then use the exact binomial distribution of the number of successes.

Test Statistic and Null Distribution

If the sample size #n# is small, then the total number of successes #S# has a #B(n,p_0)# distribution when #H_0# is true.

Calculating and Evaluating the P-value

Depending on the form of #H_1#, we can compute the P-value based on the observed value #s# of the test statistic #S# in #\mathrm{R}# as follows:

Given #H_1: p < p_0#, use the command:

> pbinom(s,n,p_0)

Given #H_1: p > p_0#, use the command:

> pbinom(s-1, n, p_0, lower.tail=FALSE)
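
The two commands can be combined into a single helper. This is only a sketch: the function name binom_p_value and the numbers (#3# successes out of #12#) are hypothetical and chosen for illustration.

```r
# Hypothetical helper: exact one-sided P-value based on S ~ B(n, p0) under H_0
binom_p_value <- function(s, n, p0, alternative = c("less", "greater")) {
  alternative <- match.arg(alternative)
  if (alternative == "less") {
    pbinom(s, n, p0)                           # P(S <= s) when p = p0
  } else {
    pbinom(s - 1, n, p0, lower.tail = FALSE)   # P(S >= s) when p = p0
  }
}

# Illustrative data: 3 successes in 12 trials, benchmark value 0.5
p_less    <- binom_p_value(3, 12, 0.5, "less")
p_greater <- binom_p_value(3, 12, 0.5, "greater")
print(c(p_less = p_less, p_greater = p_greater))
```

Note the use of s - 1 in the right-tailed case: because #S# is discrete, #P(S \geq s) = 1 - P(S \leq s - 1)#, so the observed value itself is included in the P-value.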

Example:

Is the proportion #p# of voters in Amsterdam who support the complete legalization of marijuana larger than #0.5#?

A researcher selects a random sample of #82# Amsterdam voters and asks them for their opinion about this issue. Of these, #39# said they were in favor of the complete legalization of marijuana, #36# said they were against it, and #7# did not have an opinion.

Conduct a hypothesis test at significance level #\alpha = 0.05# to investigate this research question.

Solution:

We test #H_0 : p \leq 0.5# against #H_1: p > 0.5# at significance level #\alpha = 0.05#. We will omit the participants who had no opinion.

Hence, the sample size is #n = 75#, of which #S = 39# are in favor of the complete legalization of marijuana, so that \[\hat{p} = \cfrac{39}{75} = 0.52.\]

The test statistic is \[z = \cfrac{0.52 - 0.5}{\sqrt{\cfrac{0.5(1 - 0.5)}{75}}} \approx 0.3464\] and the P-value is computed in #\mathrm{R}# using

> pnorm(0.3464, lower.tail=FALSE)

to be #0.365#. This is larger than #0.05#, so we do not reject #H_0#.

The evidence does not support the hypothesis that the proportion of voters in Amsterdam who support the complete legalization of marijuana is larger than #0.5#.
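
As a cross-check, the same P-value can be obtained with the built-in #\mathrm{R}# function prop.test. This is a sketch rather than the method used in this course: the continuity correction is switched off with correct = FALSE so that the result agrees with the #Z# statistic above.

```r
# Cross-check of the Amsterdam example with the built-in prop.test
s <- 39; n <- 75; p0 <- 0.5

# Manual computation, as in the solution above
z        <- (s / n - p0) / sqrt(p0 * (1 - p0) / n)
p_manual <- pnorm(z, lower.tail = FALSE)

# Built-in test; correct = FALSE disables the continuity correction
res       <- prop.test(s, n, p = p0, alternative = "greater", correct = FALSE)
p_builtin <- res$p.value

print(c(p_manual = p_manual, p_builtin = p_builtin))
```

Note that the confidence interval reported by prop.test is computed by a different method than the intervals discussed in this section.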

Example:

It is alleged that a certain coin has a greater chance of coming up Heads than Tails, i.e., #H_1: p > 0.5#, where #p# denotes the probability of Heads. The coin is flipped #n = 20# times and #S = 14# Heads are observed.

Conduct a hypothesis test at significance level #\alpha = 0.05# to determine whether the coin is biased towards Heads.

Solution:

The null hypothesis is #H_0:p \leq 0.5#.

Since #n# is small, the P-value must be computed using the #B(20,0.5)# distribution:\[P(S \geq 14 \ | \ p = 0.5) = 1 - P(S \leq 13 \ | \ p = 0.5)\]

> 1-pbinom(13,20,0.5)

which equals #0.0577#.

Since the P-value is larger than #0.05# we would not conclude in favor of #H_1#, i.e., we do not conclude that the coin is biased towards Heads. Fourteen or more Heads out of #20# is not too unusual for a fair coin.
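
The same exact P-value can also be obtained with the built-in #\mathrm{R}# function binom.test, shown here as a cross-check of the manual computation; the variable names are arbitrary.

```r
# Cross-check of the coin example with the built-in exact binomial test
p_manual <- 1 - pbinom(13, 20, 0.5)          # P(S >= 14) when p = 0.5

res       <- binom.test(14, 20, p = 0.5, alternative = "greater")
p_builtin <- res$p.value

print(c(p_manual = p_manual, p_builtin = p_builtin))
```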

Note: We did not discuss the procedure to compute a CI for this situation. Because the distribution is discrete, the procedure is somewhat complex, so we will avoid it in this course.