So let's talk about the linearity assumption in simple linear regression first, and then segue into how it works in multiple linear regression. We'll use an example we're already familiar with: predictors of arm circumference. Let's start by refreshing our memory about how we assessed this assumption in simple linear regression.

We were looking at the relationship between arm circumference and height using anthropometric data from a random sample of 150 Nepalese children aged 0 to 12 months.

When we first introduced linear regression with a continuous predictor, we said we could treat height as continuous if it makes sense to do so, that is, if there's evidence that the relationship between arm circumference and height is approximately linear.

And we said that a useful visual display for assessing the nature of the association between two continuous variables is a scatterplot.

So here is a simple scatterplot of the unadjusted association between arm circumference and height; we looked at this in the first lecture of this term. And this is the same scatterplot with the resulting regression line superimposed on it.

We can see that, of course, no fit is going to be perfect: at the lower end of height, the estimated mean tends to overestimate the individual arm circumference values. But generally speaking, the line splits the points down the middle and appears to track the center of the arm circumference measurements as a function of height. So linearity seems to be a reasonable assumption to exploit here to estimate this association.
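
To make this visual check a little more concrete, here is a minimal Python sketch (standard library only) that fits the least-squares line and then averages the residuals within thirds of the height range, roughly what the eye does when it asks whether the line "splits the points down the middle." The data are simulated from a straight line purely for illustration; they are not the Nepal sample, and the helper names `simple_fit` and `mean_residual_by_bin` are hypothetical.

```python
import random

def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def mean_residual_by_bin(x, y, intercept, slope, n_bins=3):
    """Sort points by x, split them into n_bins roughly equal groups, and return
    the mean residual (observed minus fitted) per group, lowest x first.
    A negative mean means the line overestimates the points in that range."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    means = []
    for k in range(n_bins):
        chunk = pairs[k * n // n_bins:(k + 1) * n // n_bins]
        means.append(sum(b - (intercept + slope * a) for a, b in chunk) / len(chunk))
    return means

# Simulated data only (NOT the Nepal data): 150 heights in cm and arm
# circumferences generated from a straight line plus random noise.
random.seed(0)
height = [random.uniform(50, 75) for _ in range(150)]
arm = [2.0 + 0.16 * h + random.gauss(0, 0.8) for h in height]

b0, b1 = simple_fit(height, arm)
print(f"intercept={b0:.2f}, slope={b1:.3f}")
print("mean residual by height tercile:",
      [round(m, 2) for m in mean_residual_by_bin(height, arm, b0, b1)])
```

Because these data were generated from a truly linear relationship, the three tercile means should all sit near zero. A systematic pattern, such as a clearly negative mean in the lowest tercile (the line overestimating the shortest children) offset by positive means elsewhere, would hint at curvature.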

Now suppose we fit the multiple linear regression model we've looked at several times, which includes not just height but also weight as a continuous predictor and age as a categorical predictor. We looked at some other models as well, and we could do the same thing for those, but we'll just use this one as an example.

In the resulting regression model, with height and weight both fit as continuous predictors, we got a slope of -0.09 for height and a slope of 1.32 for weight.
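
Written out, the fitted model looks roughly like the following. The intercept and the age coefficients are not given here, so they are left as symbols; $\text{Age}_k$ is an indicator for age category $k$ relative to a reference category, and the number of age categories is not specified.

$$
\widehat{\text{AC}} = \hat{\beta}_0 - 0.09\,\text{Height} + 1.32\,\text{Weight} + \sum_{k} \hat{\gamma}_k\,\text{Age}_k
$$

The linearity assumption for height concerns the $-0.09$ term: holding weight and age fixed, mean arm circumference is assumed to change by the same amount for each one-unit difference in height, wherever along the height range that difference occurs.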

But the linearity assumption for the relationship between arm circumference and height is now a little more complicated than when height was the only predictor in a simple model.

Now, the assumption regarding height in this model is that the relationship between arm circumference and height is linear after adjusting for weight and age. So how are we going to get a sense of that? How are we going to assess whether the relationship between arm circumference and height is linear after adjustment?

Well, we could look at the simple scatterplot of arm circumference and height that we started with. But that doesn't take weight or age into account, so it won't give us a picture of what we're looking for.

Another option, and this is actually something unique to linear regression that we won't have the luxury of when we get to other types of regression, is to create something called an adjusted scatterplot.

In this case, if we wanted to assess whether the relationship between arm circumference and height was linear after adjusting for weight and age, we could create this graphic, which shows the relationship between arm circumference and height where both have been adjusted for weight and age.

In other words, it plots the variability in arm circumference not explained by weight and age against the variability in height not already explained by weight and age. So it shows us the nature of the relationship between arm circumference and height, and even whether there is any relationship left over at all, after we've explained all we can about both of them with the other variables.
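
The adjusted scatterplot described here is what is often called an added-variable (or partial regression) plot. As a rough sketch, assuming simulated data with made-up coefficients (not the Nepal data) and a hypothetical four-group age coding, the Python below computes the plot's two axes as residuals and checks a useful property: the least-squares slope through the adjusted scatterplot equals the height slope from the multiple regression (the Frisch-Waugh-Lovell result). It uses a small hand-rolled least-squares solver so it runs with the standard library alone; `ols`, `residuals`, and `age_dummies` are illustrative helpers, not functions from any statistics package.

```python
import random

def ols(X, y):
    """Least-squares coefficients b solving the normal equations (X'X) b = X'y.
    X is a list of rows; include a column of 1.0s for the intercept."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def residuals(X, y):
    """Observed minus fitted values from regressing y on the columns of X."""
    b = ols(X, y)
    return [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]

def age_dummies(g):
    """Indicators for hypothetical age groups 1-3; group 0 is the reference."""
    return [1.0 if g == k else 0.0 for k in (1, 2, 3)]

# Simulated data only (NOT the Nepal data); all coefficients are made up.
random.seed(1)
n = 150
age_grp = [random.randrange(4) for _ in range(n)]
weight = [4 + 1.5 * g + random.gauss(0, 0.8) for g in age_grp]
height = [50 + 3 * w + random.gauss(0, 2) for w in weight]
arm = [12 + 1.2 * w - 0.1 * h + 0.2 * g + random.gauss(0, 0.5)
       for w, h, g in zip(weight, height, age_grp)]

Z = [[1.0, w] + age_dummies(g) for w, g in zip(weight, age_grp)]  # intercept, weight, age
X_full = [z + [h] for z, h in zip(Z, height)]                      # ... plus height

height_coef = ols(X_full, arm)[-1]  # height slope in the multiple regression

# Adjusted scatterplot coordinates: y-axis = arm circumference not explained by
# weight and age; x-axis = height not explained by weight and age.
e_arm = residuals(Z, arm)
e_height = residuals(Z, height)

# Both residual sets have mean zero (Z has an intercept column), so this
# through-the-origin formula is the least-squares slope of the adjusted plot.
adj_slope = sum(a * b for a, b in zip(e_height, e_arm)) / sum(b * b for b in e_height)
print(f"height slope, full model:      {height_coef:.4f}")
print(f"slope of adjusted scatterplot: {adj_slope:.4f}")
```

To judge linearity after adjustment, you would plot `e_arm` against `e_height` and look for curvature around that line, just as with the unadjusted scatterplot. Because the slope matches the multiple regression coefficient, any bend you see in this plot speaks directly to whether the straight-line term for height is adequate.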