Hypothesis Test for a Population Mean

[A, SfS] Chapter 6: Hypothesis Testing: 6.2: Test for Population Mean

Hypothesis Test for a Population Mean

In this section, we will look at how to test whether the mean of a quantitative variable measured on a population differs from some benchmark value.

Suppose we have a continuous variable #X# whose mean value when measured on a specified population is denoted #\mu#.

Research Question and Hypotheses

The research question of a hypothesis test for a population mean is whether or not #\mu# differs from some benchmark value #\mu_0#.

Depending on the direction of the test, a hypothesis test for a population mean has one of the following pairs of hypotheses:

Two-tailed	Left-tailed	Right-tailed
#H_0: \mu = \mu_0# #H_1: \mu \neq \mu_0#	#H_0: \mu \geq \mu_0# #H_1: \mu \lt\mu_0#	#H_0: \mu \leq \mu_0# #H_1: \mu \gt\mu_0#

Test Statistic and Null Distribution

Suppose a random sample of size #n# is selected from the population and #X# is measured on the sample, from which we obtain the sample mean #\bar{x}# and the sample standard deviation #s#.

As with confidence intervals, there are several conditions that guide the procedure we follow:

1) The distribution of #X# on the population is a normal distribution;

2) We know the value of the variance #\sigma^2# of #X# on the population;

3) The sample size #n# is large.

Test statistic	Null distribution	Use when
\[Z=\cfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\]	\[N(0,1)\]	Conditions 1 and 2 are #\green{\text{true}}#; or Conditions 2 and 3 are #\green{\text{true}}#; or All three conditions are #\green{\text{true}}#.
\[Z = \cfrac{\bar{X} - \mu_0}{s/\sqrt{n}}\]	\[N(0,1)\]	Conditions 1 and 3 are #\green{\text{true}}#; or Only condition 3 is #\green{\text{true}}#
\[T = \cfrac{\bar{X} - \mu_0}{s/\sqrt{n}}\]	\[t_{n-1}\]	Only condition 1 is #\green{\text{true}}#

Calculating the P-value

The calculation of the P-value of a hypothesis test for a population mean #\mu# depends on which form of #H_1# is being considered, and which test statistic is being used. We present in the below tables the P-value calculation, including the #\mathrm{R}# commands.

Test statistic: #Z \sim N(0,1)#; Computed value based on sample data: #z#.

#\begin{array}{lllll}
\phantom{0}\text{Direction}&\phantom{0000}H_0&\phantom{0000}H_1&\phantom{000}p\text{-value}&\phantom{0000000}\text{R Command}\\
\hline
\text{Two-tailed}&H_0:\mu = \mu_0&H_1:\mu \neq \mu_0&2\cdot \mathbb{P}(|Z|\geq |z|)&2 \text{ * }\text{pnorm}(\text{abs}(z),0,1, \text{low=FALSE})\\
\text{Left-tailed}&H_0:\mu \geq \mu_0&H_1:\mu \lt \mu_0&\mathbb{P}(Z\leq z)&\text{pnorm}(z,0,1, \text{low=TRUE})\\
\text{Right-tailed}&H_0:\mu \leq \mu_0&H_1:\mu \gt \mu_0&\mathbb{P}(Z\geq z)&\text{pnorm}(z,0,1, \text{low=FALSE})\\
\end{array}#

#H_1: \mu \neq \mu_0#

#H_1: \mu \lt \mu_0#

#H_1: \mu \gt \mu_0#

Test statistic: #T \sim t_{n-1}#; Computed value based on sample data: #t# with sample size #n#.

#\begin{array}{lllll}
\phantom{0}\text{Direction}&\phantom{0000}H_0&\phantom{0000}H_1&\phantom{000}p\text{-value}&\phantom{0000000}\text{R Command}\\
\hline
\text{Two-tailed}&H_0:\mu = \mu_0&H_1:\mu \neq \mu_0&2\cdot \mathbb{P}(|T|\geq |T|)&2 \text{ * }\text{pt}(\text{abs}(t),n\text{ - }1,\text{low=FALSE})\\
\text{Left-tailed}&H_0:\mu \geq \mu_0&H_1:\mu \lt \mu_0&\mathbb{P}(T\leq T)&\text{pt}(t,n\text{ - }1, \text{low=TRUE})\\
\text{Right-tailed}&H_0:\mu \leq \mu_0&H_1:\mu \gt \mu_0&\mathbb{P}(T\geq t)&\text{pt}(t,n\text{ - }1, \text{low=FALSE})\\
\end{array}#

#H_1: \mu \neq \mu_0#

#H_1: \mu \lt \mu_0#

#H_1: \mu \gt \mu_0#

If the P-value is larger than the significance level #\alpha#, then the evidence against #H_0# is not convincing. Otherwise, we would reject #H_0# and conclude that there is sufficient evidence to support #H_1#.

Note that in #\mathrm{R}#, #\mathtt{abs(z)}# or #\mathtt{abs(t)}# compute the absolute value of the test statistic. You don’t need to use the #\mathtt{abs()}# command in practice, as long as you enter the absolute value of #z# or #t# yourself.

Suppose it is hypothesized that the mean pregnancy length for expectant mothers in a certain region is less than #270# days. Thus, we have the following hypotheses:
\[H_0: \mu \geq 270 \\
H_1: \mu < 270\]

Assume pregnancy lengths for expectant mothers in the region are normally distributed, and set #\alpha = 0.05#.

From a random sample of #n = 25# expectant mothers from the region, the sample mean pregnancy length is #\bar{x} = 264# and the sample variance is #s^2 = 150.80#.

The test statistic is: \[t = \cfrac{264 - 270}{\sqrt{150.80/25}} \approx -2.443\]

The sample size is small, so the corresponding P-value is found using the #t_{24}# distribution: \[P(T \leq -2.443) \approx 0.011\]

using the #\mathrm{R}# command:

pt(-2.443, 24, low = TRUE)

Since this P-value is less than #\alpha = 0.05#, we conclude in favor of #H_1#, i.e., that the mean pregnancy length for expectant mothers in that region is less than #270# days.

If instead, we had #n = 50# expectant mothers in the sample, then whether or not pregnancy lengths are normally-distributed in the region, we can use the normal distribution to compute the P-value, on the basis of the Central Limit Theorem.

Suppose the sample mean and sample variance are the same as above.

Then in this case: \[z = \cfrac{264 - 270}{\sqrt{150.80/50}} \approx -3.455\]

and the approximate P-value is: \[P(Z \leq -3.455) \approx 0.000275\]

using the #\mathrm{R}# command:

pnorm(-3.455, 0, 1, low = TRUE)

Since the P-value is less than #\alpha = 0.05#, we conclude in favor of #H_1#, i.e., that the mean pregnancy length for expectant mothers in that region is less than #270# days.