### Chapter 11: Regression Analysis: Simple Linear Regression

### Finding the Regression Equation

A regression line is the *best-fitting* straight line through a set of data points. A line is best-fitting when it minimizes the sum of the squared differences between the observed values of #Y# and the values predicted by the line.

This section introduces a method for calculating the slope and the intercept of the regression line directly from the data.


Calculating the Regression Coefficient and Intercept

Performing a simple linear regression analysis results in a *regression equation* of the form:

\[\hat{Y}=b_0 + b_1 \cdot X\]

To calculate the *slope* #b_1# of the regression line, use the following formula:

\[b_1 =\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}\]

Once the slope is known, the *intercept* #b_0# of the regression line can be calculated with the following formula:

\[b_0 = \bar{Y} - b_1 \cdot \bar{X}\]
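The two formulas translate almost line by line into code. The following is a minimal Python sketch rather than a reference implementation; the function name `fit_line` and the three sample points (chosen to lie exactly on the line #Y = 1 + 2X#) are illustrative assumptions and not part of the course material.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator of b1: sum of products of the deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator of b1: sum of squared deviations of X.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    if sum_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = sum_xy / sum_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative points that lie exactly on Y = 1 + 2X.
b0, b1 = fit_line([0, 1, 2], [1, 3, 5])
print(b0, b1)  # prints: 1.0 2.0
```

The slope is computed first because the intercept formula depends on it, mirroring the order of the two formulas.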

Consider the following #5# pairs of data points:

| #X_i# | #Y_i# |
|---|---|
| #1# | #2# |
| #2# | #7# |
| #3# | #7# |
| #4# | #9# |
| #5# | #3# |

Find the regression line corresponding to these points.

The first step in determining the regression line is to calculate the mean values of #X# and #Y#.

\[\begin{array}{rcl}

\bar{X}&=&\displaystyle\cfrac{\sum\limits_{i=1}^n{X_i}}{n} = \dfrac{1+2+3+4+5}{5}=\dfrac{15}{5}=3\\\\

\bar{Y}&=&\displaystyle\cfrac{\sum\limits_{i=1}^n{Y_i}}{n} = \dfrac{2+7+7+9+3}{5}=\dfrac{28}{5}=5.6

\end{array}\]

Next, find the values of #(X_i-\bar{X}), (Y_i-\bar{Y}), (X_i-\bar{X})^2# and #(X_i-\bar{X})(Y_i-\bar{Y})# for each pair of data points:

| #X_i# | #Y_i# | #X_i - \bar{X}# | #Y_i - \bar{Y}# | #(X_i - \bar{X})^2# | #(X_i - \bar{X})(Y_i - \bar{Y})# |
|---|---|---|---|---|---|
| #1# | #2# | #-2# | #-3.6# | #4# | #7.2# |
| #2# | #7# | #-1# | #1.4# | #1# | #-1.4# |
| #3# | #7# | #0# | #1.4# | #0# | #0# |
| #4# | #9# | #1# | #3.4# | #1# | #3.4# |
| #5# | #3# | #2# | #-2.6# | #4# | #-5.2# |
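The rows of this table can be reproduced with a short Python sketch. The variable names are illustrative choices, and values are rounded to one decimal only when printed, since floating-point subtraction can otherwise show results such as `-3.5999999999999996` instead of `-3.6`.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 7, 7, 9, 3]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

sum_dx2 = 0.0   # running total of (X_i - mean_x)^2
sum_dxdy = 0.0  # running total of (X_i - mean_x) * (Y_i - mean_y)

for x, y in zip(xs, ys):
    dx = x - mean_x
    dy = y - mean_y
    sum_dx2 += dx * dx
    sum_dxdy += dx * dy
    # Round on output only, to hide floating-point noise.
    print(f"{x} | {y} | {dx:.1f} | {dy:.1f} | {dx * dx:.1f} | {dx * dy:.1f}")

print(f"column sums: {sum_dx2:.1f} and {sum_dxdy:.1f}")
```

The two column sums are the denominator and the numerator of the slope formula, respectively.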

With this information the slope #b_1# and intercept #b_0# can be calculated:

\[\begin{array}{rcl}

b_1 &=& \displaystyle\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}\\\\

&=& \cfrac{7.2-1.4+0+3.4-5.2}{4+1+0+1+4}\\\\

&=& 0.4\\\\

b_0 &=& \bar{Y} - b_1 \cdot \bar{X}\\\\

&=& 5.6 - (0.4)\cdot3\\\\

&=& 4.4

\end{array}\]

So the regression equation is:

\[\begin{array}{rcl}

\hat{Y} &=& b_0 + b_1X\\\\

&=& 4.4+ 0.4X

\end{array}\]
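As a sanity check on the "best-fitting" claim, the sketch below compares the sum of squared differences between observed and predicted #Y# for the fitted line with that of a few slightly different lines. The alternative coefficients are hypothetical values picked for illustration; trying a handful of alternatives is a plausibility check, not a proof of minimality.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 7, 7, 9, 3]


def sse(b0, b1):
    """Sum of squared differences between observed Y and b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


fitted = sse(4.4, 0.4)
# Hypothetical alternative lines: shift the intercept or the slope slightly.
alternatives = [(4.5, 0.4), (4.3, 0.4), (4.4, 0.5), (4.4, 0.3)]

print(f"fitted line: SSE = {fitted:.2f}")
for b0, b1 in alternatives:
    print(f"b0 = {b0}, b1 = {b1}: SSE = {sse(b0, b1):.2f}")
```

Each alternative line should report a larger sum of squared differences than the fitted line.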

Note that the regression line always passes through the mean point #(\bar{X},\bar{Y})#.

In this case, #(\bar{X},\bar{Y})= (3,5.6)# and entering #X=3# into the equation gives:

\[\begin{array}{rcl}

\hat{Y} &=& 4.4 + 0.4\cdot 3\\\\

&=& 5.6

\end{array}\]
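This property is not specific to the example. Substituting #X=\bar{X}# into the regression equation and using the intercept formula #b_0 = \bar{Y} - b_1 \cdot \bar{X}# gives, for any data set:

\[\hat{Y} = b_0 + b_1 \cdot \bar{X} = (\bar{Y} - b_1 \cdot \bar{X}) + b_1 \cdot \bar{X} = \bar{Y}\]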
