### Chapter 11: Regression Analysis: Multiple Linear Regression

### Overfitting and Multicollinearity

Including more than one predictor variable in the regression model introduces some additional considerations. One of these is the risk of *overfitting* your regression model.


**Overfitting**

Adding more variables to a regression model does not necessarily mean that the model becomes better. In fact, it can make the model worse. This is called **overfitting**.

The danger of overfitting is that the regression model becomes tailored to fit the specific sample used to construct the model. While adding more variables could increase the predictive power of the model in regard to the sample, this may very well come at the cost of reduced predictive power with respect to the general population.

Consequently, an overfit model may lead to misleading regression coefficients, #p#-values, and #R^2#-values.
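The effect described above can be seen in a small simulation. The sketch below (using NumPy, with made-up simulated data) fits an ordinary least squares model to a sample of #20# observations whose outcome depends on only one real predictor, then adds pure-noise predictors one at a time. The in-sample #R^2# never decreases as junk variables are added, even though they carry no information about #Y# in the population. The helper `r_squared` is an illustrative function, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# One true predictor; y depends only on x1 plus random noise.
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)

def r_squared(X, y):
    """In-sample R^2 of an OLS fit with an intercept column."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Add pure-noise predictors one at a time: in-sample R^2 only goes up.
noise = rng.normal(size=(n, 10))
r2 = [r_squared(np.column_stack([x1[:, None], noise[:, :k]]), y)
      for k in range(11)]
print([round(v, 3) for v in r2])
```

Because least squares can always set a useless coefficient to (nearly) zero, adding a column never increases the residual sum of squares, so #R^2# is non-decreasing in the number of predictors; on fresh data from the population, however, the extra noise terms would hurt predictions.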


Another thing to watch out for when performing a *Multiple Regression Analysis* is *multicollinearity*.


**Multicollinearity**

**Multicollinearity** occurs when two or more of the predictor variables in the regression model are (substantially) correlated with each other.

Although multicollinearity does not reduce the predictive power of a regression model as a whole, it does reduce the accuracy of the individual partial regression coefficients (#b_1 \ldots b_n#).

If two predictor variables (e.g. #X_1# and #X_2#) are highly correlated, then the partial regression coefficients associated with them (#b_1# and #b_2#) may not accurately reflect the relationship between #Y# and #X_1# or the relationship between #Y# and #X_2# that exists in the population.
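This instability of the partial coefficients can also be shown by simulation. The sketch below (using NumPy, with made-up simulated data and an illustrative helper `slope_spread`) repeatedly draws samples from a model in which #Y# depends equally on #X_1# and #X_2#, fits OLS each time, and records the estimated #b_1#. When #X_1# and #X_2# are nearly uncorrelated, the estimates of #b_1# cluster tightly; when they are highly correlated, the same estimates spread out far more, even though the population model is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

def slope_spread(corr, trials=500):
    """Std. deviation of the fitted b1 across repeated samples,
    where x2 has the given correlation with x1."""
    b1 = []
    for _ in range(trials):
        x1 = rng.normal(size=n)
        # x2 is a noisy copy of x1 when corr is close to 1.
        x2 = corr * x1 + np.sqrt(1 - corr**2) * rng.normal(size=n)
        y = 1.0 + x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        b1.append(beta[1])
    return float(np.std(b1))

low = slope_spread(0.0)    # predictors essentially uncorrelated
high = slope_spread(0.99)  # predictors almost collinear
print(round(low, 3), round(high, 3))
```

The ratio of the two spreads reflects the variance inflation caused by the correlation between the predictors: the model as a whole still fits, but the individual values of #b_1# and #b_2# become unreliable.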
