### Chapter 11: Regression Analysis: Simple Linear Regression

### Introduction to Regression Analysis

Regression Analysis

**Regression analysis** is a statistical procedure for estimating the relationship between variables. Specifically, regression is used to predict the value of a continuous *outcome (dependent) variable* on the basis of one or more *predictor (independent) variables*.

Regression analysis evaluates the relationship between variables by finding the *best-fitting straight line* through a set of data points. The resulting line is called the **regression line**.

#\phantom{0}#

The most straightforward type of regression is *Simple Linear Regression*.

#\phantom{0}#

Simple Linear Regression

In **Simple Linear Regression**, the value of the outcome variable is predicted using a *single* predictor variable.

The regression line of a *Simple Linear Regression* is mathematically described by the following **regression equation**:

\[\hat{Y}=aX+b\]

Where:

- #\hat{Y}# is the *predicted* value of the outcome variable #Y#.
- #X# is the predictor variable.
- #a# is the slope of the regression line and is called the **regression coefficient**.
- #b# is the value of #\hat{Y}# when #X=0# and is called the **intercept**.
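
The text does not spell out how #a# and #b# are found. A common choice, assumed here, is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances between the data points and the line. The sketch below is one possible way to compute it in plain Python. The function name `fit_line` and the four data points are made up for illustration and do not come from this chapter.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y_hat = a*X + b.

    Assumption: "best-fitting" is taken to mean ordinary least squares.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sum of cross-deviations and sum of squared deviations of X
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    a = s_xy / s_xx            # regression coefficient (slope)
    b = y_mean - a * x_mean    # intercept
    return a, b


# Hypothetical points that lie exactly on Y = 2X + 1
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```

Because the illustrative points lie exactly on a straight line, the fitted line reproduces that line. With real data the points scatter around the line and the fit is a compromise between them.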

#\phantom{0}#

Describing the relationship between two variables as a straight line provides an easy way to predict values of the outcome variable #Y# for certain values of the predictor variable #X#. Simply enter a value for #X# into the regression equation to get the predicted value of #Y#.

#\phantom{0}#

Example: Regression Analysis

For #10# days, the owner of an ice cream truck has kept track of how much ice cream he sold and what the maximum temperature in #^\circ{}C# was that day. He has calculated the regression line to find the relationship between the maximum temperature and the amount of ice cream sold.

Take a look at the scatterplot below. The blue dots represent the #10# #\blue{\textbf{data points}}# that serve as the basis for the regression analysis. The #\orange{\textbf{regression line}}# #\hat{Y} = 2.93X -20.45# is drawn in orange.

#\phantom{0}#

#\phantom{0}#

Here, #a=2.93# is the regression coefficient. It is the predicted increase in the amount of ice cream sold #Y# when the maximum temperature #X# increases by #1^\circ{}C#. For example, if the maximum temperature increases by #2^\circ{}C#, the amount of ice cream sold is *predicted* to increase by #2\cdot 2.93=5.86#.

The intercept #b# is #-20.45#. In this case, the negative value of the intercept holds no particular meaning, since it's not possible to sell a negative amount of ice cream.

To calculate the predicted amount of ice cream sold at a particular maximum temperature, simply enter a value for #X# into the equation. For example, at a maximum temperature of #X=25#, the predicted amount of ice cream sold is:

\[\hat{Y}=2.93X-20.45=2.93\cdot25-20.45=52.8\]
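
The same calculation can be sketched in a few lines of Python. The slope and intercept are the values from the example. The function name `predict_sales` is made up for illustration.

```python
A = 2.93     # regression coefficient (slope) from the example
B = -20.45   # intercept from the example


def predict_sales(max_temp_c):
    """Predicted amount of ice cream sold at a given maximum temperature."""
    return A * max_temp_c + B


# Prediction at a maximum temperature of 25 degrees
print(round(predict_sales(25), 2))                      # 52.8

# A rise of 2 degrees changes the prediction by 2 * 2.93
print(round(predict_sales(27) - predict_sales(25), 2))  # 5.86
```

The results are rounded because floating-point arithmetic can leave tiny errors in the last digits.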

#\phantom{0}#

An important thing to consider when performing a regression analysis is that even a single *outlier* can have a large impact on the results of the analysis, especially when working with relatively small datasets.

#\phantom{0}#

Example: Effect of Outlier

Let's revisit the ice cream truck example, but this time suppose that at a maximum temperature of #22^\circ{}C#, the owner of the truck sells #500# ice creams.

#\phantom{0}#

#\phantom{0}#

This value is notably larger than the other data points. Such a data point is called an *influential outlier*, and it causes the entire regression line to shift upwards. When you find such an extreme value, you can consider omitting it from the analysis.
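
To get a feel for how strongly one value can pull the line, here is a small Python sketch. The owner's actual measurements are not listed in the text, so the ten data points below are invented and placed exactly on a line. Only the outlier (#500# ice creams at #22# degrees) comes from the example. The fit assumes ordinary least squares, and `fit_line` is a made-up helper name.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for Y_hat = a*X + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a = s_xy / s_xx
    return a, y_mean - a * x_mean


# Hypothetical data: 10 days with maximum temperatures of 18..27 degrees,
# and sales that lie exactly on the made-up line Y = 3X - 20
temps = list(range(18, 28))
sales = [3 * t - 20 for t in temps]

a_clean, b_clean = fit_line(temps, sales)

# Replace the value at 22 degrees with the outlier from the example
sales_outlier = list(sales)
sales_outlier[temps.index(22)] = 500

a_out, b_out = fit_line(temps, sales_outlier)

print("without outlier:", round(a_clean, 2), round(b_clean, 2))
print("with outlier:   ", round(a_out, 2), round(b_out, 2))

# Compare the two lines across the observed temperature range
for t in (18, 22, 27):
    print(t, round(a_clean * t + b_clean, 1), round(a_out * t + b_out, 1))
```

In this invented dataset, the line fitted with the outlier lies above the original line across the whole observed temperature range, and its slope becomes much flatter. That is the kind of distortion to watch for before deciding whether to omit a value.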
