Chapter 11: Regression Analysis: Simple Linear Regression
Finding the Regression Equation
A regression line is the best-fitting straight line through a set of data points. What makes a line best-fitting is that it minimizes the sum of the squared differences between the observed and predicted values of #Y#.
This section introduces a method by which the slope and the intercept of the regression line can be calculated directly.
Calculating the Regression Coefficient and Intercept
Performing a simple linear regression analysis results in a regression equation of the form:
\[\hat{Y}=b_0 + b_1 \cdot X\]
To calculate the slope #b_1# of the regression line, use the following formula:
\[b_1 =\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}\]
Once the slope is known, it is possible to calculate the intercept #b_0# of the regression line with the following formula:
\[b_0 = \bar{Y} - b_1 \cdot \bar{X}\]
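As a concrete illustration of these two formulas, here is a minimal Python sketch that computes #b_0# and #b_1# from two equally long lists of observations. The function name simple_linear_regression is an illustrative choice, not something defined in this chapter.
```python
def simple_linear_regression(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by the sum of squared X-deviations.
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    b1 = numerator / denominator
    # Intercept: the mean of Y minus the slope times the mean of X.
    b0 = y_bar - b1 * x_bar
    return b0, b1
```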
Consider the following #5# pairs of data points:
#X_i# | #Y_i# |
#1# | #4# |
#2# | #7# |
#3# | #3# |
#4# | #1# |
#5# | #8# |
Find the regression line corresponding to these points.
The first step in determining the regression line is to calculate the mean values of #X# and #Y#.
\[\begin{array}{rcl}
\bar{X}&=&\displaystyle\cfrac{\sum\limits_{i=1}^n{X_i}}{n} = \dfrac{1+2+3+4+5}{5}=\dfrac{15}{5}=3\\\\
\bar{Y}&=&\displaystyle\cfrac{\sum\limits_{i=1}^n{Y_i}}{n} = \dfrac{4+7+3+1+8}{5}=\dfrac{23}{5}=4.6
\end{array}\]
Next, find the values of #(X_i-\bar{X}), (Y_i-\bar{Y}), (X_i-\bar{X})^2# and #(X_i-\bar{X})(Y_i-\bar{Y})# for each pair of data points:
#X_i# | #Y_i# | #X_i - \bar{X}# | #Y_i - \bar{Y}# | #(X_i - \bar{X})^2# | #(X_i - \bar{X})(Y_i - \bar{Y})# |
#1# | #4# | #-2# | #-0.6# | #4# | #1.2# |
#2# | #7# | #-1# | #2.4# | #1# | #-2.4# |
#3# | #3# | #0# | #-1.6# | #0# | #0# |
#4# | #1# | #1# | #-3.6# | #1# | #-3.6# |
#5# | #8# | #2# | #3.4# | #4# | #6.8# |
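For readers who want to verify the arithmetic, the columns of this table can be reproduced with a short Python loop over the five data pairs given above; the rounding is only there to keep the floating-point output readable.
```python
# Reproduce the deviation columns of the table for the five data pairs.
X = [1, 2, 3, 4, 5]
Y = [4, 7, 3, 1, 8]

x_bar = sum(X) / len(X)   # 3.0
y_bar = sum(Y) / len(Y)   # 4.6

for xi, yi in zip(X, Y):
    dx = xi - x_bar   # X_i - X-bar
    dy = yi - y_bar   # Y_i - Y-bar
    # Round for display only, to hide floating-point noise such as -0.5999999...
    print(xi, yi, dx, round(dy, 1), dx ** 2, round(dx * dy, 1))
```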
With this information the slope #b_1# and intercept #b_0# can be calculated:
\[\begin{array}{rcl}
b_1 &=& \displaystyle\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}\\\\
&=& \cfrac{1.2-2.4+0-3.6+6.8}{4+1+0+1+4}\\\\
&=& \cfrac{2.0}{10}\\\\
&=& 0.2\\\\
b_0 &=& \bar{Y} - b_1 \cdot \bar{X}\\\\
&=& 4.6 - (0.2)\cdot3\\\\
&=& 4.6 - 0.6\\\\
&=& 4.0
\end{array}\]
So the regression equation is:
\[\begin{array}{rcl}
\hat{Y} &=& b_0 + b_1X\\\\
&=& 4.0+ 0.2X
\end{array}\]
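As a check on the worked example, the snippet below (a sketch, not part of the original text) recomputes the slope and intercept from the five data pairs; it should print values of approximately #0.2# and #4.0#.
```python
X = [1, 2, 3, 4, 5]
Y = [4, 7, 3, 1, 8]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

# Slope: sum of the cross-deviation column over the sum of squared X-deviations.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sum(
    (xi - x_bar) ** 2 for xi in X
)
# Intercept from the slope and the two means.
b0 = y_bar - b1 * x_bar

print(round(b1, 1), round(b0, 1))  # 0.2 4.0
```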
Note that the regression line always passes through the mean point #(\bar{X},\bar{Y})#.
In this case, #(\bar{X},\bar{Y})= (3,4.6)#, and substituting #X=3# into the equation gives:
\[\begin{array}{rcl}
\hat{Y} &=& 4.0 + 0.2\cdot 3\\\\
&=& 4.6
\end{array}\]
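This property can also be verified numerically; the short check below uses the coefficients and means found above and compares the prediction at #\bar{X}# with #\bar{Y}#.
```python
import math

b0, b1 = 4.0, 0.2          # coefficients found above
x_bar, y_bar = 3.0, 4.6    # means of X and Y

y_hat_at_mean = b0 + b1 * x_bar            # prediction at X = X-bar
print(math.isclose(y_hat_at_mean, y_bar))  # True, up to floating-point rounding
```
Because #b_1 \cdot \bar{X}# is computed in floating point, math.isclose is used instead of an exact equality check.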