### Chapter 9. Regression Analysis: Multiple Linear Regression

### Dummy Variables

Besides *quantitative* predictor variables, it is also possible to include *categorical* predictor variables in a regression model. This is done by creating one or more *dummy variables*.

Dummy Variable

A **dummy variable** is a *binary* variable used in regression analysis to represent a particular subgroup of the sample.

A dummy variable takes on a value of #1# if an individual belongs to a particular subgroup and a value of #0# if the individual does not belong to that subgroup.

If you want to add a categorical predictor variable with *two* levels to the regression model, a single dummy variable is sufficient.

If you want to add a categorical predictor variable with *more than two* levels to the regression model, multiple dummy variables need to be created. For a categorical variable with #k# levels, you will need #k-1# dummy variables.
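
The #k-1# rule can be illustrated with a minimal Python sketch. The function name `dummy_code` and the level names are hypothetical, and the choice of the last level as the reference group (all dummies equal to #0#) is an assumption made for this illustration.

```python
def dummy_code(value, levels):
    """Return k-1 dummy values for `value`, given a list of k levels.

    The last level in `levels` is treated as the reference group and is
    represented by all zeros (an assumption made for this sketch).
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # One dummy per level except the last: 1 if the value matches, else 0.
    return [1 if value == level else 0 for level in levels[:-1]]


# Hypothetical categorical variable with k = 3 levels -> 2 dummy variables.
levels = ["A", "B", "C"]
for level in levels:
    print(level, dummy_code(level, levels))
# A [1, 0]
# B [0, 1]
# C [0, 0]
```

With two levels, the same function returns a single dummy variable, which matches the two-level case described above.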

Example: Adding a Binary Variable to the Model

Consider the following regression equation:

\[\hat{Y}=-12+9X_1\]

where #X_1# is a person's age and #\hat{Y}# is their predicted income in thousands of euros.

Now suppose that, besides a person's age, you would also like to incorporate into the model whether or not the person has Dutch nationality. This variable can take on two values: a person is either Dutch or not.

To incorporate this variable into the model, a dummy variable #X_2# can be introduced, which takes on a value of #1# if the person in question is Dutch and a value of #0# if the person has another nationality.

Suppose the new model is described by the following regression equation:

\[\hat{Y}=9X_1-12 + 5X_2\]

Here #b_2=5#, so the model predicts that a person with Dutch nationality earns #5000# euros more than a person of the same age with a different nationality.

On the basis of this dummy variable, it is possible to construct two models: one for people with a Dutch nationality and one for people with another nationality.

- The predicted income of someone with Dutch nationality is:
- #\hat{Y}_1=9X_1-12+5\cdot 1=9X_1-7#

- The predicted income of someone with a different nationality is:
- #\hat{Y}_2=9X_1-12+5\cdot 0=9X_1-12#

Notice that both equations have the same slope (the regression coefficient of age, #b_1=9#) but different intercepts. The two regression lines are therefore parallel, and the vertical distance between them is equal to the coefficient of the dummy variable, #b_2=5#.
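
As a quick check of this example, the following Python sketch evaluates the regression equation #\hat{Y}=9X_1-12+5X_2# for both groups. The function name `predicted_income` and the age of 30 are arbitrary choices for illustration, not part of the original example.

```python
def predicted_income(age, is_dutch):
    """Predicted income in thousands of euros: Y-hat = 9*X1 - 12 + 5*X2."""
    x2 = 1 if is_dutch else 0  # dummy variable: 1 = Dutch, 0 = other nationality
    return 9 * age - 12 + 5 * x2


age = 30  # arbitrary example age
dutch = predicted_income(age, is_dutch=True)    # 9*30 - 7  = 263
other = predicted_income(age, is_dutch=False)   # 9*30 - 12 = 258

# The difference equals the dummy coefficient b_2 = 5, whatever the age.
print(dutch - other)  # 5
```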

Example: Multiple Dummy Variables

Consider the following regression equation:

\[\hat{Y}=-12+9X\]

where #X# is a person's age and #\hat{Y}# is their predicted income in thousands of euros.

Now suppose that, instead of treating age like a *quantitative* variable, you want to treat age like a *categorical* variable by grouping people into #4# age groups: Child, Teen, Adult, and Elder.

Since there are #4# age groups (levels), you need #k-1=4-1=3# dummy variables:

- The variable #X_1# is one if the person is a child and zero otherwise.
- The variable #X_2# is one if the person is a teen and zero otherwise.
- The variable #X_3# is one if the person is an adult and zero otherwise.
- For an elderly person, #X_1, X_2#, and #X_3# are all zero.

On the basis of these three dummy variables, the regression equation #\hat{Y}=b_0+b_1X_1+b_2X_2+b_3X_3# reduces to four different regression models, one for each age group.

| Age group | #X_1# | #X_2# | #X_3# | Regression Model |
| --- | --- | --- | --- | --- |
| Child | 1 | 0 | 0 | #\hat{Y}=b_0+b_1X_1# |
| Teen | 0 | 1 | 0 | #\hat{Y}=b_0+b_2X_2# |
| Adult | 0 | 0 | 1 | #\hat{Y}=b_0+b_3X_3# |
| Elder | 0 | 0 | 0 | #\hat{Y}=b_0# |
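
The four models in the table can be reproduced with a short Python sketch. The coefficient values below are hypothetical and chosen only for illustration, since the example gives no estimates for #b_0#, #b_1#, #b_2#, and #b_3#. The function name `predict` is likewise an assumption.

```python
AGE_GROUPS = ["Child", "Teen", "Adult", "Elder"]  # Elder is the reference group


def predict(group, b0, b1, b2, b3):
    """Evaluate Y-hat = b0 + b1*X1 + b2*X2 + b3*X3 for one age group."""
    if group not in AGE_GROUPS:
        raise ValueError(f"unknown age group: {group!r}")
    # Dummy variables for Child, Teen, Adult; all three are zero for Elder.
    x1, x2, x3 = (1 if group == g else 0 for g in AGE_GROUPS[:3])
    return b0 + b1 * x1 + b2 * x2 + b3 * x3


# Hypothetical coefficients (not taken from the example above).
coefs = {"b0": 40, "b1": -38, "b2": -30, "b3": 5}

for group in AGE_GROUPS:
    print(group, predict(group, **coefs))
# Child 2
# Teen 10
# Adult 45
# Elder 40
```

In this sketch the intercept #b_0# is the prediction for the reference group (Elder), and each dummy coefficient is the difference between the prediction for its group and the prediction for the reference group.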
