Confidence Interval for a Mean Difference

Chapter 8. Testing for Differences in Mean and Proportion: Paired Samples t-test

Confidence Interval for a Population Mean Difference

Assuming the sampling distribution of the sample mean difference is (approximately) normal, the general formula for computing a #C\%\,CI# for a population mean difference #\mu_D#, based on a random sample of #n# difference scores, is:
\[CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)\]

Here #t^*# is the critical value of the #t_{n-1}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#, and #s_D# is the sample standard deviation of the difference scores.

Calculating t* with Statistical Software

Let #C# be the confidence level in #\%#.

To calculate the critical value #t^*# in Excel, make use of the function T.INV():
\[=\text{T.INV}((100+C)/200, n \text{ - } 1)\]

To calculate the critical value #t^*# in R, make use of the function qt():
\[\text{qt}(p=(100+C)/200, df=n \text{ - } 1,lower.tail = \text{TRUE})\]
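
As a minimal sketch, the R call above can be wrapped in a small helper. The name `t_crit` is hypothetical and not part of R; it only assumes that #C# is given in percent and that there are at least two difference scores.

```r
# Hypothetical helper (not part of base R): critical value t* for a
# C% confidence interval based on n difference scores.
t_crit <- function(C, n) {
  stopifnot(C > 0, C < 100, n >= 2)
  # A central area of C/100 leaves (100 - C)/200 in each tail,
  # so the cumulative probability up to t* is (100 + C)/200.
  qt(p = (100 + C) / 200, df = n - 1, lower.tail = TRUE)
}

t_star <- t_crit(C = 97, n = 8)
print(round(t_star, 5))
```

For #C=97# and #n=8# the result should agree with the value #t^*=2.71457# used in the worked example that follows.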

A researcher conducts an experiment in which #8# randomly selected students are invited to eat dinner at a restaurant on two different evenings. On one evening each student receives a regular-size plate and on the other, they receive a large-size plate.

On each occasion, the students are allowed to choose as much food as they want from a buffet. Once the students have made their selection, their plates are weighed.

The table below shows how much food (in grams) each student chose when they were given a regular-size plate #(X)# and when they were given a large-size plate #(Y)#:
\[\begin{array}{|l|c|c|c|c|c|c|c|c|}\hline \text{Student} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline X \text{: Regular} &334&357&344&395&318&359&422&392\\ \hline Y \text{: Large} &371&455&361&377&340&399&513&374\\ \hline \end{array}\]
You may assume that the amount of food eaten for either plate size is normally distributed.

Define #D=Y-X# and construct a #97\%# confidence interval for the population mean difference #\mu_D#. Round your answers to #3# decimal places.

#CI_{\mu_D,\,97\%}=(-8.099,\,\,\, 75.349)#

There are a number of different ways we can compute the confidence interval. Two solutions follow, one using Excel and one using R.

Excel Calculation

Compute the difference scores:
\[\begin{array}{|l|c|c|c|c|c|c|c|c|}\hline \text{Student} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline X \text{: Regular} &334&357&344&395&318&359&422&392\\ \hline Y \text{: Large} &371&455&361&377&340&399&513&374\\ \hline D \text{: Difference} &37&98&17&-18&22&40&91&-18\\ \hline \end{array}\]
Assuming the amount of food eaten for either plate size is normally distributed, we know that the sampling distribution of the sample mean difference is normally distributed as well.

If the sampling distribution of the sample mean difference is (approximately) normal, the general formula for computing a #C\%\,CI# for a population mean difference #\mu_D#, based on a random sample of size #n#, is:
\[CI_{\mu_D}=\bigg(\bar{D} - t^*\cdot \cfrac{s_D}{\sqrt{n}},\,\,\,\, \bar{D} + t^*\cdot \cfrac{s_D}{\sqrt{n}} \bigg)\]
Compute the mean of the difference scores #\bar{D}#:
\[\bar{D}=\cfrac{\sum{D}}{n} = \cfrac{37+98+17-18+22+40+91-18}{8}=33.625\]
Compute the standard deviation of the difference scores #s_{D}#:
\[\sum{D}=37+98+17-18+22+40+91-18=269\]
\[\begin{array}{rcl}\sum{D^2}&=&37^2+98^2+17^2+(-18)^2+22^2+40^2+91^2+(-18)^2\\&=&22275\end{array}\]
\[s_{D}=\sqrt{\cfrac{\sum{D^2} - \cfrac{(\sum{D})^2}{n} }{n-1}}=\sqrt{\cfrac{22275 - \cfrac{269^2}{8} }{8-1}}=43.4739\]

For a given confidence level #C# (in #\%#), the critical value #t^*# of the #t_{n-1}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.

To calculate this critical value #t^*# in Excel, make use of the following function:

T.INV(probability, deg_freedom)

probability: The cumulative (left-tail) probability associated with the #t#-distribution.

deg_freedom: The number of degrees of freedom of the distribution.

Here, we have #C=97#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.97#, run the following command:
\[\begin{array}{c}
=\text{T.INV}((100+C)/200, n - 1)\\
\downarrow\\
=\text{T.INV}(197/200, 8 \text{ - } 1)
\end{array}\]
This gives:
\[t^* = 2.71457\]
Calculate the lower bound #L# of the confidence interval:
\[L = \bar{D} - t^* \cdot \cfrac{s_D}{\sqrt{n}} = 33.6250 - 2.71457 \cdot \cfrac{43.4739}{\sqrt{8}}=-8.099\]
Calculate the upper bound #U# of the confidence interval:
\[U = \bar{D} + t^* \cdot \cfrac{s_D}{\sqrt{n}} = 33.6250 + 2.71457 \cdot \cfrac{43.4739}{\sqrt{8}}=75.349\]
Thus, the #97\%# confidence interval for the population mean difference #\mu_D# is:
\[CI_{\mu_D,\,97\%}=(-8.099,\,\,\, 75.349)\]

R Calculation

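Compute the difference scores, their mean #\bar{D}=33.625#, and their standard deviation #s_D=43.4739# in the same way as in the Excel calculation.

To calculate the critical value #t^*# in R, make use of the following function:

qt(p, df, lower.tail)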

p: A probability corresponding to the #t#-distribution.

df: An integer indicating the number of degrees of freedom.

lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.

Here, we have #C=97#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.97#, run the following command:

\[\begin{array}{c}
\text{qt}(p = (100+C)/200, df = n \text{ - } 1, lower.tail = \text{TRUE})\\
\downarrow\\
\text{qt}(p =197/200, df = 8 \text { - } 1, lower.tail = \text{TRUE})
\end{array}\]
This gives:
\[t^* = 2.71457\]
Calculate the lower bound #L# of the confidence interval:
\[L = \bar{D} - t^* \cdot \cfrac{s_D}{\sqrt{n}} = 33.6250 - 2.71457 \cdot \cfrac{43.4739}{\sqrt{8}}=-8.099\]
Calculate the upper bound #U# of the confidence interval:
\[U = \bar{D} + t^* \cdot \cfrac{s_D}{\sqrt{n}} = 33.6250 + 2.71457 \cdot \cfrac{43.4739}{\sqrt{8}}=75.349\]
Thus, the #97\%# confidence interval for the population mean difference #\mu_D# is:
\[CI_{\mu_D,\,97\%}=(-8.099,\,\,\, 75.349)\]
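
The whole calculation can also be carried out in R from the raw data. The following is a sketch rather than the only way to do it; the variable names (`D_bar`, `s_D`, `margin`, `ci`, `check`) are chosen here for illustration. The built-in `t.test()` function with `paired = TRUE` serves as a cross-check, since its confidence interval is built from the same formula.

```r
# Data from the table: amount of food chosen (grams) per student.
X <- c(334, 357, 344, 395, 318, 359, 422, 392)  # regular-size plate
Y <- c(371, 455, 361, 377, 340, 399, 513, 374)  # large-size plate

D <- Y - X            # difference scores
n <- length(D)        # number of difference scores
D_bar <- mean(D)      # sample mean difference
s_D <- sd(D)          # sample standard deviation (divides by n - 1)

C <- 97               # confidence level in %
t_star <- qt(p = (100 + C) / 200, df = n - 1, lower.tail = TRUE)
margin <- t_star * s_D / sqrt(n)
ci <- c(lower = D_bar - margin, upper = D_bar + margin)
print(round(ci, 3))

# Cross-check: t.test(Y, X, paired = TRUE) works with the differences Y - X.
check <- t.test(Y, X, paired = TRUE, conf.level = C / 100)
print(round(check$conf.int, 3))
```

Both printed intervals should match the bounds #(-8.099,\,\,\, 75.349)# obtained by hand.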

Connection to Hypothesis Testing

There exists a direct connection between a two-sided paired samples #t#-test for #\mu_D# and a #(1-\alpha)\cdot 100\%# confidence interval for #\mu_D#:

If #0# falls inside the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_D=0# should not be rejected at the #\alpha# level of significance.
If #0# falls outside of the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_D=0# should be rejected at the #\alpha# level of significance.
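
This correspondence can be illustrated in R with the difference scores from the plate-size example. The sketch below assumes #\alpha = 0.03# so that it matches the #97\%# interval computed earlier; the names `zero_inside` and `reject_h0` are illustrative only.

```r
# Difference scores from the plate-size example (D = Y - X).
D <- c(37, 98, 17, -18, 22, 40, 91, -18)
alpha <- 0.03  # corresponds to a 97% confidence level

# One-sample t-test on the differences is the paired samples t-test.
res <- t.test(D, mu = 0, alternative = "two.sided", conf.level = 1 - alpha)

zero_inside <- res$conf.int[1] <= 0 && 0 <= res$conf.int[2]
reject_h0 <- res$p.value < alpha

# Exactly one of the two should be TRUE: 0 inside the CI means H0 is not rejected.
print(c(zero_inside = zero_inside, reject_h0 = reject_h0))
```

Because the #97\%# interval #(-8.099,\,\,\, 75.349)# contains #0#, the test does not reject #H_0: \mu_D=0# at the #\alpha = 0.03# level.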

A #96\%# confidence interval for a population mean difference #\mu_D# is #(-0.615,\,\, 1.467)#.

Suppose you use the same sample of difference scores to test #H_0: \mu_D = 0# against #H_a: \mu_D \neq 0# at the #\alpha = 0.04# level of significance.

What would be the conclusion?

Do not reject #H_0#.

Since the #96\%# confidence interval #(-0.615,\,\,1.467)# contains the value #0#, we would not reject #H_0: \mu_D = 0# at the #\alpha = 0.04# level of significance.
