### Chapter 8. Testing for Differences in Mean and Proportion: Independent Samples t-test

### Confidence Interval for the Difference Between Two Independent Means

Confidence Interval for the Difference Between Two Population Means

Assuming the *sampling distribution of the difference between two sample means* is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population means #\mu_1 - \mu_2# is:

\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X_1} - \bar{X_2}) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X_1} - \bar{X_2}) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]

where #t^*# is the *critical value* of the #t_{df}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
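The cumulative probability needed to look up #t^*# follows from the symmetry of the #t#-distribution about #0#. The two tails outside #(-t^*,\, t^*)# each carry half of the remaining probability #1 - \frac{C}{100}#, so:

\[\mathbb{P}(t \leq t^*) = \cfrac{C}{100} + \cfrac{1}{2}\bigg(1 - \cfrac{C}{100}\bigg) = \cfrac{100 + C}{200}\]

For example, #C = 90# gives #\mathbb{P}(t \leq t^*) = \frac{190}{200} = 0.95#.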

Calculating t* with Statistical Software

Let #C# be the *confidence level* in #\%#.

To calculate the *critical value* #t^*# in Excel, make use of the function **T.INV()**:

\[=\text{T.INV}((100+C)/200, \text{MIN}(n_1 \text{ - } 1, n_2 \text{ - } 1))\]

To calculate the *critical value* #t^*# in R, make use of the function **qt()**:

\[\text{qt}(p=(100+C)/200, df=\text{min}(n_1 \text{ - } 1, n_2 \text{ - } 1),lower.tail = \text{TRUE})\]
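For readers without Excel or R, the same critical value can be approximated with the Python standard library alone. The sketch below is an illustration, not part of the original material: the function names `t_pdf`, `t_cdf` and `t_critical` are hypothetical, the cumulative probability is obtained by Simpson's rule, and #t^*# is located by bisection. It assumes moderate degrees of freedom (`math.gamma` overflows for very large arguments).

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(conf_level, df, tol=1e-10):
    """t* such that P(-t* <= T <= t*) = conf_level / 100 (conf_level in %)."""
    target = (100 + conf_level) / 200  # cumulative probability below t*
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    while hi - lo > tol:  # bisection on the increasing function t_cdf
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Conservative degrees of freedom: min(n1 - 1, n2 - 1)
n1, n2 = 11, 9
t_star = t_critical(90, min(n1 - 1, n2 - 1))
print(round(t_star, 5))
```

The call mirrors the Excel and R commands: the first argument plays the role of #C#, and the second is the conservative #df#.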

Do boys and girls perform differently on driving tests? To investigate this matter, a researcher selects a simple random sample of #20# students, #11# boys #(X_1)# and #9# girls #(X_2)#, and gives each of them a driving test.

Each student gets a score from #0# to #100#. These are their test results:

| Boys #(X_1)# | Girls #(X_2)# |
|---|---|
| #n_1 = 11# | #n_2 = 9# |
| #\bar{X_1} = 74.5# | #\bar{X_2} = 70.9# |
| #s_1 = 2.0# | #s_2 = 3.4# |

You may assume that the test scores are approximately normally distributed.

Construct a #90\%# confidence interval for the difference between the two population means #\mu_1 - \mu_2#. Round your answers to #3# decimal places.

#CI_{(\mu_1 - \mu_2),\,90\%}=(1.213,\,\,\, 5.987)#

There are a number of different ways to compute the *confidence interval*. Two solutions follow, one using Excel and one using R.

Solution using Excel

Assuming the test scores are approximately normally distributed, we know that the *sampling distribution of the difference between two sample means* is (approximately) normal as well.

If the *sampling distribution of the difference between two sample means* is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population means #\mu_1 - \mu_2# is:

\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X_1} - \bar{X_2}) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X_1} - \bar{X_2}) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]

Determine the degrees of freedom:

\[df = \min(n_1-1, n_2-1) = \min(10, 8)=8\]

For a given *confidence level* #C# (in #\%#), the *critical value* #t^*# of the #t_{df}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.

To calculate this critical value #t^*# in Excel, make use of the following function:

T.INV(probability, deg_freedom)

- probability: The cumulative (left-tail) probability associated with the #t#-distribution.
- deg_freedom: The number of degrees of freedom of the distribution.

Here, we have #C=90#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.90#, run the following command:

\[\begin{array}{c}
=\text{T.INV}((100+C)/200, df)\\
\downarrow\\
=\text{T.INV}(190/200, 8)
\end{array}\]

This gives:

\[t^* = 1.85955\]

Calculate the lower bound #L# of the confidence interval:

\[L = (\bar{X_1} - \bar{X_2}) - t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (74.5 -70.9) - 1.85955 \cdot \sqrt{\cfrac{2.0^2}{11}+\cfrac{3.4^2}{9}}=1.213\]

Calculate the upper bound #U# of the confidence interval:

\[U = (\bar{X_1} - \bar{X_2}) + t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (74.5 -70.9) + 1.85955 \cdot \sqrt{\cfrac{2.0^2}{11}+\cfrac{3.4^2}{9}}=5.987\]

Thus, the #90\%# confidence interval for the difference between the two population means #\mu_1 - \mu_2# is:

\[CI_{(\mu_1 - \mu_2),\,90\%}=(1.213,\,\,\, 5.987)\]

Solution using R

Assuming the test scores are approximately normally distributed, we know that the *sampling distribution of the difference between two sample means* is (approximately) normal as well.

If the *sampling distribution of the difference between two sample means* is (approximately) normal, the general formula for computing a #C\%\,CI# for the difference between the two population means #\mu_1 - \mu_2# is:

\[CI_{(\mu_1 - \mu_2)}=\bigg((\bar{X_1} - \bar{X_2}) - t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}},\,\,\,\, (\bar{X_1} - \bar{X_2}) + t^*\cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} \bigg)\]

Determine the degrees of freedom:

\[df = \min(n_1-1, n_2-1) = \min(10, 8)=8\]

For a given *confidence level* #C# (in #\%#), the *critical value* #t^*# of the #t_{df}# distribution is the value such that #\mathbb{P}(-t^* \leq t \leq t^*)=\cfrac{C}{100}#.

To calculate this critical value #t^*# in R, make use of the following function:

qt(p, df, lower.tail)

- p: The cumulative probability associated with the #t#-distribution.
- df: The number of degrees of freedom (non-integer values are allowed).
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.

Here, we have #C=90#. Thus, to calculate #t^*# such that #\mathbb{P}(-t^* \leq t \leq t^*)=0.90#, run the following command:

\[\begin{array}{c}
\text{qt}(p=(100+C)/200, df=\text{min}(n_1 \text{ - } 1, n_2 \text{ - } 1),lower.tail = \text{TRUE})\\
\downarrow\\
\text{qt}(p =190/200, df = 8, lower.tail = \text{TRUE})
\end{array}\]

This gives:

\[t^* = 1.85955\]

Calculate the lower bound #L# of the confidence interval:

\[L = (\bar{X_1} - \bar{X_2}) - t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (74.5 -70.9) - 1.85955 \cdot \sqrt{\cfrac{2.0^2}{11}+\cfrac{3.4^2}{9}}=1.213\]

Calculate the upper bound #U# of the confidence interval:

\[U = (\bar{X_1} - \bar{X_2}) + t^* \cdot \sqrt{\cfrac{s^2_1}{n_1}+\cfrac{s^2_2}{n_2}} = (74.5 -70.9) + 1.85955 \cdot \sqrt{\cfrac{2.0^2}{11}+\cfrac{3.4^2}{9}}=5.987\]

Thus, the #90\%# confidence interval for the difference between the two population means #\mu_1 - \mu_2# is:

\[CI_{(\mu_1 - \mu_2),\,90\%}=(1.213,\,\,\, 5.987)\]
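The arithmetic of the worked example can be cross-checked with a few lines of Python. This is a minimal sketch rather than part of the original solution: the function name `two_mean_ci` is illustrative, and the critical value #t^* = 1.85955# is taken from the text above instead of being recomputed.

```python
import math

def two_mean_ci(mean1, mean2, s1, s2, n1, n2, t_star):
    """Return (lower, upper) for mu_1 - mu_2 using the unpooled standard error."""
    diff = mean1 - mean2
    std_err = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_star * std_err  # half-width of the interval
    return diff - margin, diff + margin

# Summary statistics from the driving-test example
lower, upper = two_mean_ci(74.5, 70.9, 2.0, 3.4, 11, 9, t_star=1.85955)
print(f"({lower:.3f}, {upper:.3f})")  # (1.213, 5.987)
```

Passing #t^*# as an argument keeps the function independent of how the critical value was obtained, whether from Excel, R, or a table.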

Connection to Hypothesis Testing

There exists a direct connection between a *two-sided independent samples* #t#*-test* for #\mu_1 - \mu_2# and a #(1-\alpha)\cdot 100\%# confidence interval for #\mu_1 - \mu_2#:

- If #0# falls *inside* the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_1 - \mu_2=0# should not be rejected at the #\alpha# level of significance.
- If #0# falls *outside* of the #(1 - \alpha)\cdot 100\%\,CI#, then #H_0: \mu_1 - \mu_2=0# should be rejected at the #\alpha# level of significance.

A #97\%# confidence interval for the difference between two population means #\mu_1 - \mu_2# is #(-2.010,\,\, 0.331)#.

Suppose you use the same samples to test #H_0: \mu_1 - \mu_2 = 0# against #H_a: \mu_1 - \mu_2 \neq 0# at the #\alpha = 0.03# level of significance.

What would be the conclusion?

Since the #97\%# confidence interval #(-2.010,\,\,0.331)# contains the value #0#, we would not reject #H_0: \mu_1 - \mu_2 = 0# at the #\alpha = 0.03# level of significance.
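The decision rule can be written as a one-line check. The Python sketch below is illustrative only; the function name `rejects_null` is hypothetical, and it assumes the interval was built at the confidence level #(1-\alpha)\cdot 100\%# that matches the significance level of the two-sided test.

```python
def rejects_null(ci, null_value=0.0):
    """True if null_value lies outside the interval (lower, upper)."""
    lower, upper = ci
    return not (lower <= null_value <= upper)

# 97% interval from the example above (alpha = 0.03): 0 is inside
print(rejects_null((-2.010, 0.331)))  # False

# 90% interval from the driving-test example (alpha = 0.10): 0 is outside
print(rejects_null((1.213, 5.987)))   # True
```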
