In this video, I am going to show you how to do residual analysis so we can check the underlying assumptions in a simple linear regression.

This is an example that I have already shared with you in my lectures.

This is the impact of GPA on students' starting salary. We have plotted the relationship between the two, and it pretty much looks like a straight line. It's maybe not as sharp an increase as one would like to see, but it is still an increasing relationship between GPA and starting salary.

We can also get that from the Data Analysis tool, so let me just do that. Let me run the regression by going to Data Analysis and picking Regression.

When the Regression dialog comes up, you first have to give it the Input Y Range. Y is what you're trying to predict, which here is the starting salary. I'm going to put my cursor on the first cell and press Ctrl+Shift+Down Arrow to select the whole column.

Now remember, if you mix up X and Y, Excel will do nothing to alert you. It will still give you an answer; it will just be the wrong answer. So be very careful about how you enter your X's and Y's, because the output is not going to look odd to you at all.
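
To see why a mix-up won't look odd, here is a small Python sketch with made-up numbers (not the lecture's data). It fits the regression both ways with the standard least-squares formulas. The R Square comes out identical either way, but the swapped fit answers a different question, and its slope is not simply the reciprocal of the correct one.

```python
# Hypothetical illustration with made-up numbers (not the lecture's data):
# fitting salary on GPA versus accidentally fitting GPA on salary.

def ols(x, y):
    """Return (intercept, slope, r_squared) for a least-squares fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


gpa = [2.1, 2.4, 2.8, 3.0, 3.3, 3.7]                  # hypothetical X values
salary = [40000, 43000, 41000, 45000, 44000, 47000]   # hypothetical Y values

right = ols(gpa, salary)   # Y = salary, X = GPA (what we want)
wrong = ols(salary, gpa)   # X and Y mixed up

print("R Square, correct vs swapped:", round(right[2], 4), round(wrong[2], 4))
print("salary per GPA point, correct:", round(right[1], 2))
print("salary per GPA point, implied by swapped fit:", round(1 / wrong[1], 2))
```

The two slopes multiply to R Square, so the swapped fit only agrees with the correct one when every point lies exactly on the line.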

The next thing I want to do is enter my GPA as the Input X Range, and that's in column A. I have labels, so I'm going to check the Labels box.

I'm going to put the output in a New Worksheet and ask it for the plots: Residual Plots, Line Fit Plots, and Normal Probability Plots. Then click OK.

And here's what we get.

First of all, you will see that the R Square is only 0.3365, meaning that 33.65% of the variation in salary is explained by GPA.
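
As a sketch of what that percentage means, the Python below computes R Square as one minus the unexplained variation (squared errors around the fitted line) divided by the total variation in salary. The GPA and salary lists are hypothetical placeholders, not the data from this example.

```python
# Sketch: R Square = 1 - (unexplained variation / total variation).
# The GPA and salary lists are hypothetical placeholders, not the lecture's data.

def fit_line(x, y):
    """Least-squares intercept and slope for regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    sst = sum((b - mean_y) ** 2 for b in y)  # total variation in y
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained
    return 1 - sse / sst


gpa = [2.1, 2.4, 2.8, 3.0, 3.3, 3.7]                  # hypothetical X values
salary = [40000, 43000, 41000, 45000, 44000, 47000]   # hypothetical Y values
print(f"{r_squared(gpa, salary):.2%} of the variation in salary is explained by GPA")
```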

You can also see that for every one-point increase in GPA, the predicted salary goes up by $2,904. Remember, that means if your GPA goes up from 2.0 to 3.0, you can expect about a $2,904 boost in your starting salary.

So that's why the rise wasn't very steep: a lot of the GPAs are between 2 and 3, and across that whole range the predicted salary rises by only about $2,904.
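
The arithmetic behind this interpretation can be written out. Write the fitted line as $\hat{y} = b_0 + b_1 x$, with $x$ the GPA and $b_1 \approx 2{,}904$ the GPA coefficient from the output (the symbols are notation chosen here, not labels from Excel's output). Comparing two GPAs one point apart, the intercept cancels:

$$
\hat{y}(3.0) - \hat{y}(2.0) = (b_0 + 3.0\,b_1) - (b_0 + 2.0\,b_1) = b_1 \approx 2{,}904
$$

The same difference holds for any one-point step, for example from 2.5 to 3.5, because a straight line has the same slope everywhere.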

So these are the plots that will show us whether or not our linear regression meets the assumptions.

In the normal probability plot, the points fall pretty much along a straight line, so the normality assumption does not appear to be violated. So this is good.
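
The normality assumption is about the errors, so one way to run the same straight-line check yourself is to pair the sorted residuals with standard normal quantiles, which is the idea behind a normal probability plot. The Python sketch below does that and summarizes straightness with a correlation. The plotting position (i - 0.5)/n is one common convention (an assumption here; others exist), the residual values are hypothetical, and the correlation is an informal summary rather than a formal test.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pair each sorted residual with a standard normal quantile (its plotting position)."""
    n = len(residuals)
    ordered = sorted(residuals)
    # (i - 0.5) / n is one common plotting-position convention; others exist.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plotted points: close to 1 means close to a straight line."""
    xs = [q for q, _ in points]
    ys = [r for _, r in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((q - mean_x) * (r - mean_y) for q, r in points)
    sxx = sum((q - mean_x) ** 2 for q in xs)
    syy = sum((r - mean_y) ** 2 for r in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals (actual minus predicted salary); replace with your own.
residuals = [-2100.0, 850.0, -400.0, 1600.0, -1250.0, 300.0, 1900.0, -900.0]
points = normal_probability_points(residuals)
print(f"probability-plot correlation: {straightness(points):.3f}")
```

A strongly skewed set of residuals, such as one large error among many near zero, bends the plot and pulls this correlation well below 1.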

The next one is the Residual Plot, which shows us the variability of the errors.

Remember, sometimes you are going to over-predict and sometimes you are going to under-predict. The points below the zero line are the ones that have been over-predicted, and the points above it are the ones that have been under-predicted.

If you just look at this visually, the values above the line and the values below it seem to be about the same, which means that if I sum them up, I should get zero.

So one of the things we need is for the errors as a whole to cancel each other out to a mean of zero, and inspecting the plot visually, you get a sense that that is the case.

It wouldn't have been the case if a lot of the points were on the top or, vice versa, a lot of them were at the bottom, but here they seem to be evenly split.
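
A small Python sketch of this check, using hypothetical data and defining a residual as actual minus predicted. One caveat worth knowing: when the line has an intercept, least squares forces the residuals to sum to exactly zero (apart from rounding), so the sum by itself always passes. The informative part is how the positive and negative residuals are spread out, which the counts below, and better still the plot, let you judge.

```python
# Sketch: least-squares residuals for a line with an intercept.
# The data are hypothetical placeholders, not the lecture's data.

def fit_residuals(x, y):
    """Return residuals (actual minus predicted) from a least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


gpa = [2.1, 2.4, 2.8, 3.0, 3.3, 3.7]
salary = [40000, 43000, 41000, 45000, 44000, 47000]
res = fit_residuals(gpa, salary)

print("sum of residuals:", round(sum(res), 6))  # zero up to floating-point rounding
print("over-predicted (residual < 0):", sum(r < 0 for r in res))
print("under-predicted (residual > 0):", sum(r > 0 for r in res))
```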

The other thing is that the residuals themselves seem to be bounded by straight lines. If I picture one horizontal line above the points and one below them, those are the boundaries within which they're bouncing around.

So this tells me that the errors have constant variance.

If this plot had been fanning out, or had some other shape where the spread of the points changes as GPA changes, then we would have said that the constant variance assumption is violated.
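
A rough numeric companion to the fanning-out check, as a Python sketch: sort the residuals by GPA and compare their variance in the upper half with the lower half. A ratio far from 1 suggests the spread changes with GPA. This is only an informal heuristic with no cutoff, not a formal test, and the GPA values and residuals are hypothetical, chosen to fan out.

```python
from statistics import pvariance


def spread_ratio(x, residuals):
    """Variance of residuals for the upper half of x divided by that for the lower half.

    A rough fanning-out check: a ratio far from 1 suggests the spread changes with x.
    """
    pairs = sorted(zip(x, residuals))   # order the residuals by x
    half = len(pairs) // 2               # with an odd count, the middle point is dropped
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)


# Hypothetical GPA values and residuals whose spread grows with GPA (a fanning-out pattern).
gpa = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4]
fanning = [100.0, -100.0, 200.0, -200.0, 2000.0, -2000.0, 4000.0, -4000.0]
print(f"spread ratio: {spread_ratio(gpa, fanning):.1f}")
```

Residuals with an even band, such as alternating values of the same size, give a ratio of about 1.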

And this is the Line Fit Plot.

Here, the orange line represents the predicted salaries, and the actual starting salaries are in blue.

If you look at this, you can see that the two seem to be about the same. We have lots of points underneath the line, but we also have some points on top, and they seem to be evenly divided.

So this is what you want.

If the model fits your data, your errors should be random. The model shouldn't always, or most of the time, be over-predicting or under-predicting, and you shouldn't see any kind of pattern in the errors, such as a seasonal pattern.
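
When the rows have a meaningful order, such as time, one standard statistic for this kind of pattern is the Durbin-Watson statistic, sketched below in Python. It is not part of the steps in this video and only makes sense for ordered data. Values near 2 suggest little first-order autocorrelation, values toward 0 suggest runs of same-sign errors, and values toward 4 suggest errors that keep flipping sign. The residuals are hypothetical, shaped like a seasonal swing.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation (e.g. time) order.

    Near 2: little first-order autocorrelation. Toward 0: positive autocorrelation
    (runs of same-sign errors). Toward 4: negative autocorrelation (sign keeps flipping).
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals in time order that drift up and down like a seasonal cycle.
seasonal = [300.0, 800.0, 1100.0, 700.0, -200.0, -900.0, -1200.0, -600.0]
print(f"Durbin-Watson: {durbin_watson(seasonal):.2f}")
```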

So these are things that we look at.

We look at the residual plots to see whether any of the underlying assumptions necessary for simple linear regression are being violated. If one is being violated, then you really should not be using the model.