You have seen that with simple linear regression, we can explain or predict weight in terms of height. But do you think that weight is solely explained by height, or do you think that other characteristics, such as age, gender, calorie intake, or physical activity, may play a role as well? I think so. To assess it, we can use multiple linear regression. In this video, we discuss multiple linear regression, an extension of simple linear regression. It allows us to study a continuous outcome, but now in terms of multiple explanatory variables. After watching this video, you will understand why to use multiple linear regression and how to interpret the results obtained.
There are different situations in which multiple linear regression can be used. The first one is to control for confounders. This is one of the easiest ways to get unbiased estimates of causal effects. Second, multiple regression is often used to build a prediction model. The third reason is that you can increase the power or precision of your trial.
Controlling for confounders is one of the main reasons to use multiple regression. Take, for example, the association between height and weight. If you use simple linear regression, you can observe an effect of height on weight, just like we did in the last lesson. However, this may just be the consequence of the fact that age causally influences both height and weight. Since the population is composed of children of different ages, we say that age confounds the relation between height and weight. In this situation, it is important to account for age differences before quantifying the effect of height on weight. Multiple regression allows us to solve this problem by adjusting the effect of height for age effects.
Another use of multiple linear regression is to build a prediction model. Using height, we can create a simple model to predict weight. However, we can probably improve the prediction if we add extra information to the model. So we add age and calorie intake per day.
Then, if a new child appears, and we know his height, age, and daily calorie intake, we can use our new model to get a better guess of his weight than by using height alone.
A third reason to use multiple linear regression is to increase the precision of a study. This is often used in clinical trials. For example, say you want to test whether drug A is better than drug B. You would randomly divide your sample into patients receiving drug A and patients receiving drug B. This regression formula can be used to estimate the treatment effect. But if we add extra variables that are strongly associated with the outcome, for example, age and disease severity, we will need fewer patients to obtain the same precision for the treatment effect. We will explain later how this is possible.
So how does multiple regression work? To begin, let's look at the formula of the multiple linear regression model. It is similar to the formula of simple regression, but now with more explanatory variables, each of which has its own regression coefficient Beta. As in the simple linear regression setting, Beta 0 is the intercept, which expresses the mean value of the outcome when all explanatory variables are zero. Each other Beta denotes the adjusted regression coefficient for the corresponding variable. It quantifies the mean change in the outcome when increasing the corresponding variable by one unit while keeping all other variables constant. The residuals are again captured in the error term, just as in simple linear regression.
Look now at the example with two explanatory variables for predicting weight. Beta 1 represents the regression coefficient for height, and Beta 2 is the regression coefficient for age. In fact, we are extending the simple linear regression model with a new dimension. See, for example, this plot, which shows age, weight, and height. Multiple regression in fact means regression in multiple dimensions.
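Written out, the model described above might look as follows. The slide itself is not part of this transcript, so the notation (Y for the outcome, X_1 to X_p for the p explanatory variables, epsilon for the error term) is an assumption rather than the video's own; the second line is the two-variable example for weight.

```latex
\[
  Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
\]
\[
  \text{weight} = \beta_0 + \beta_1 \cdot \text{height} + \beta_2 \cdot \text{age} + \varepsilon
\]
```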
This plot shows that, for any value of age, weight increases with height at the same slope. As a result, we get the following regression equation. Weight, the outcome, equals the intercept, minus 11, plus 0.26 times height, measured in centimeters, plus 1.4 times age, measured in years. This equation can now be used for prediction. So what is the expected weight of a two-year-old child who is one meter tall? Beta 0 plus Beta 1 times 100 centimeters plus Beta 2 times two years, that is, minus 11 plus 26 plus 2.8, equals 17.8 kilograms. As in simple linear regression, the regression coefficients are estimated using the least squares method.
Recall now the simple linear regression fit for weight based on height only from our previous lesson. It provided an estimated effect of 0.39 kilograms per centimeter. However, after controlling for age, the regression coefficient for height is reduced from 0.39 to 0.26 kilograms per centimeter. This difference suggests that the initial estimate from simple linear regression was confounded by age.
We have mentioned that one of the reasons to use multiple linear regression is to increase the precision of the estimated regression coefficients. But how does that work? If we add extra explanatory variables that are strongly associated with the outcome, the variation of the residuals will be reduced. Hence, the standard errors of the regression coefficients will generally be smaller. This is beneficial because it can reduce the cost of the study: with this approach, a smaller sample size is enough to reach the same precision in our estimates.
In this video, we have shown linear regression models based on more than one explanatory variable. You can use them for correcting for confounders, building a prediction model, and increasing the precision or power of your study. Until now, we have focused on linear regression, that is, regression for continuous outcomes. But regression can be used for other types of outcomes, such as binary or time-to-event outcomes.
The next lessons will be about regression models for other types of outcomes.
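As a rough illustration of the worked prediction and of the confounding argument from this lesson, here is a minimal Python sketch. The coefficients (minus 11, 0.26, and 1.4) are the ones quoted in the video; everything else is an assumption made for illustration. The function names `predict_weight`, `solve`, and `least_squares` are hypothetical, and the ten-child data set is made up and noise-free, not the data used in the lesson.

```python
# Coefficients quoted in the lesson: intercept -11 kg, 0.26 kg per cm, 1.4 kg per year.
INTERCEPT, B_HEIGHT, B_AGE = -11.0, 0.26, 1.4


def predict_weight(height_cm, age_years):
    """Expected weight (kg) under the two-variable model from the lesson."""
    return INTERCEPT + B_HEIGHT * height_cm + B_AGE * age_years


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(columns, y):
    """Least-squares coefficients (intercept first) from the normal equations X'X b = X'y."""
    x = [[1.0] + list(row) for row in zip(*columns)]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up, noise-free toy data (NOT the lesson's data set): older children are
# taller, and weight follows the lesson's equation exactly.
ages = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
heights = [74, 80, 84, 90, 92, 100, 100, 108, 106, 116]
weights = [predict_weight(h, a) for h, a in zip(heights, ages)]

simple = least_squares([heights], weights)          # weight ~ height
adjusted = least_squares([heights, ages], weights)  # weight ~ height + age

print(round(predict_weight(100, 2), 1))  # 17.8, the two-year-old who is one meter tall
print("height slope, unadjusted:", round(simple[1], 2))
print("height slope, adjusted for age:", round(adjusted[1], 2))
```

On this toy data, the age-adjusted coefficient for height recovers 0.26, while the height-only fit gives a larger slope because it absorbs part of the age effect. This mirrors the direction of the 0.39 versus 0.26 comparison in the lesson, though the toy numbers themselves are not the lesson's.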