So far, we have discussed linear regression with a single predictive variable. Such regression models are also called simple regression models. As we showed before, a regression model with one predictive variable can be visualized as a line and is relatively easy to understand. However, simple regression models are usually not practical, as we often expect the target variable to be related to more than one predictive variable. A regression model with more than one predictive variable is called a multiple regression model. Visualizing a multiple regression model is much harder, as we have to deal with a rather abstract, high-dimensional space. In a high-dimensional space, a linear function of the predictive variables defines a plane, or a flat surface. Fortunately, most of the intuition we gain from simple regression can be extended to multiple regression.

To give a concrete example, we consider one more predictive variable. The side-by-side box plot shows the list prices for homes with and without a garage. Notice that homes with a parking garage are typically more expensive, and their list prices exhibit much more variability. This is a strong indication that the PARKING.TYPE variable affects our target variable of interest, the list price. In the scatter plot, I plotted homes with and without a garage in different colors: homes with garages are blue, and homes without garages are red. There are many more homes with a parking garage. Given the heavy snowfall from time to time in Boulder, this is not too surprising. The graph also confirms that homes with parking garages tend to be more expensive, as the red circles are concentrated in the lower left corner of the graph.

To add PARKING.TYPE to the regression model, the regression equation gets one more term on the right-hand side, which measures the impact of PARKING.TYPE: the predicted list price y-hat equals b0 + b1 times square footage + b2 times PARKING.TYPE. As before, y-hat represents the predicted value of the list price, b0 is the intercept, and b1 is the slope for square footage. We also have b2, the slope for PARKING.TYPE. For the data set we have, we obtain that b0 is about -43, b1 is about 0.43, and b2 is about -99 (a code sketch of fitting this two-predictor model follows below). Note that b2 essentially changes the value of the intercept. When PARKING.TYPE equals 1, or in other words when there is a garage, we deduct 99 from the intercept. When PARKING.TYPE is 0, the last term on the right-hand side vanishes, and the intercept stays at about -43. Another way to interpret this result is that garage parking is worth about negative $99,000. This is quite surprising: intuitively, we expect houses with garage parking to be worth more, which we demonstrated visually before.

This cautions us in interpreting regression results. The counterintuitive result here can be attributed to what we call multicollinearity, which is a very important concept in multiple regression. Multicollinearity, also called collinearity, is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this particular example, PARKING.TYPE and square footage are correlated in the sense that larger houses are more likely to have a garage. When you add PARKING.TYPE to the model, you can see the estimates for the other predictive variables change considerably. This creates two issues. In the extreme case, when two variables are highly correlated, the numerical procedure that we use to estimate the regression model becomes unstable, in the sense that a small change in the data can cause a huge swing in the coefficient estimates.
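As a minimal sketch of the fitting step described above, the Python snippet below runs both the simple and the two-predictor regression with statsmodels. The file name `boulder_homes.csv` and the column names `list_price`, `sqft`, and `parking_type` are hypothetical stand-ins for the course's actual data set; the coefficients you would get depend on that data.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names; the actual course data set may differ.
# list_price is in thousands of dollars, sqft is square footage,
# parking_type is 1 if the home has a garage and 0 otherwise.
homes = pd.read_csv("boulder_homes.csv")

# Simple regression: list price on square footage only.
simple = sm.OLS(homes["list_price"], sm.add_constant(homes[["sqft"]])).fit()

# Multiple regression: add the garage indicator as a second predictor.
X = sm.add_constant(homes[["sqft", "parking_type"]])
multiple = sm.OLS(homes["list_price"], X).fit()

print(simple.params)    # b0 and b1 from the single-predictor model
print(multiple.params)  # b0, b1, b2 -- roughly -43, 0.43, -99 in the example above
```

Comparing the two printed parameter sets makes the lecture's point concrete: the slope for square footage shifts once the correlated garage indicator enters the model.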
That is certainly an undesirable feature of the linear regression model. The second issue is that multicollinearity causes difficulty in interpreting modeling results. We would like to attach an intuitive interpretation to the coefficient estimates; however, the coefficient estimates depend on which predictors are included in the model. Therefore, certain interpretations may not be as reliable as we would like to believe. This is not as serious an issue if we are only interested in the predictive accuracy of the model.
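One way to probe both issues, going a bit beyond what the lecture shows, is to compute variance inflation factors for the predictors and to refit the model on resampled data to see how much the estimates swing. The sketch below assumes the same hypothetical `boulder_homes.csv` columns used earlier.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Same hypothetical data set as in the earlier sketch.
homes = pd.read_csv("boulder_homes.csv")
X = sm.add_constant(homes[["sqft", "parking_type"]])

# Variance inflation factors: values well above roughly 5-10 are a common
# warning sign that a predictor is highly correlated with the others.
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))

# Instability check: refit on bootstrap resamples of the data and watch how
# much the coefficient estimates change from sample to sample.
rng = np.random.default_rng(1)
for _ in range(3):
    idx = rng.choice(len(homes), size=len(homes), replace=True)
    fit = sm.OLS(homes["list_price"].iloc[idx], X.iloc[idx]).fit()
    print(fit.params.values)
```

If the predictors were nearly uncorrelated, the bootstrap coefficients would vary only slightly; with strong collinearity, the swings can be large even though the model's overall predictions stay about the same.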