0:00

Hi. In this lecture we're going to talk about fitting lines. Remember, in

the last lecture we talked about simple linear models. Well, when I was drawing

those lines through the data the question is how do you do it. How do you draw the

best possible line through the data? That's the focus of this lecture so let's

step back for a second. Remember the categorical models: we had that notion of R squared, which was the percentage of variation that you explained. So there's a lot of variation in your data, you start to model it, and you ask: what percent did you explain? So, for example, if I have a bunch of data like I have here,

these are all the dots, right, these data. If I just took the mean of this right

here, and then asked, how much variation? I'd have to take the distance from all

these points, [inaudible], there'd be a lot of variation. When I draw the line

through here, I explain a lot of it. And in fact, here, it says I explain 87.2%. So

what you wanna do is you wanna talk about, how do you draw a line through this data

to explain as much variation as possible? So remember, in our [inaudible], just to give us a reminder of how this worked: the total variation was 53,000, and you only had 5,200 left. So 5,200 over 53,000 is about 9.8 percent left unexplained, which means we explained 90.2 percent of the variation.
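As a quick sketch of that bookkeeping: R squared is just one minus the leftover variation over the total variation. In Python, with the numbers from this recap:

```python
def r_squared(total_variation, leftover_variation):
    """Fraction of the total variation that the model explains."""
    return 1 - leftover_variation / total_variation

# Recap numbers: 53,000 total variation, 5,200 left unexplained
print(r_squared(53_000, 5_200))  # about 0.902, i.e. 90.2% explained
```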

We wanna show how you can do that same sort of calculation with lines, and then

show how you draw the best possible line. So let's suppose I've got a bunch of data

here. To figure out how much variation there is, I draw the mean. And then I can

figure out the distance between the mean, let's say it's six, and each point. If this point has a value of four, I take 4 - 6 and square that, which is four. And if this point has a value of eight, I take 8 - 6 and square that, and that's also four. I add up all those squared distances, and that gives me the total variation. Now what I'm going to do is fit a line to the data, figure out the distance from the line, and ask how much of the variation I explained by drawing the line through. So, here's a very simple example. There are three kids, and they're

in different grades of school and they wear different sizes of shoes. All I'm gonna do is predict shoe size as a function of the grade they're in. So I've got a first grader who has a size one shoe, a second grader who has a size five shoe, and a fourth grader who has a size nine shoe, and I'm gonna fit a linear model to this. So the first thing to ask is, what's the variation? So this is

the grade, and this is the shoe size. Now, I don't care about the variation in the grades; I care about the variation in the thing I'm trying to explain, which is

shoe size. So it's just this: one, five, and nine. If I add those up, I get fifteen; divide by three, I get five. So the mean is five. To get the variation, I take one minus five and square it, which is sixteen; five minus five squared, which is zero; and nine minus five squared, which is also sixteen. So the total variation is 32. So what I want to

do is I want to write down a linear model that can explain as much of that variation

as possible. Let's start off with just a really simple linear model, where we just

assume y equals 2x. So if I take the line Y=2x, what I'm saying is all three of

these points should lie on the line. Right? And so, the variation is just, sort

of, how far off the line they lie. Well, so, how do we do it? Well, I've got x and y. When x = 1, 2x would be two. When x = 2, 2x would be four. And when x = 4, 2x would be eight. So these are my predictions, in a way: two, four, and eight. And what I can ask is, how far does the data lie from those predictions?
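That question can be sketched directly in Python; the three (grade, shoe size) points are the ones from the example, and the line is the guessed y = 2x:

```python
# The lecture's data: (grade, shoe size) for the three kids
data = [(1, 1), (2, 5), (4, 9)]

# Predictions from the guessed line y = 2x
predictions = [2 * x for x, _ in data]
print(predictions)  # [2, 4, 8]

# Squared distance of each actual value from its prediction
squared_errors = [(pred - y) ** 2 for pred, (_, y) in zip(predictions, data)]
print(sum(squared_errors))  # 1 + 1 + 1 = 3, versus a total variation of 32
```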

So here, I predicted two and the actual value is one, so I get two minus one squared, which is one. Here I predicted four and the actual value is five, so I get four minus five squared, which is one. And here I predicted eight and the actual value is nine, so eight minus nine squared is also one. So the total left over is three. And I think, wow, that's great: I started out with a total variation of 32, and now I've only got three. So if I want to figure out my R squared, I'll just say that's one minus three over thirty-two, and once again I'm over 90 percent. So that's really good; I've explained a lot of the variation. But the thing is, I just made this up, this y = 2x; I just drew this line. So

how would I draw the best line? Well, what I can do is say, let's suppose I drew the line y = mx + b, just an arbitrary line. And then I want to ask how

far off would that line be from the data? Well, when x = 1, my prediction would be m plus b, and the actual value is one. So my error is going to be m plus b minus one, squared. When x = 2, my model says the value is 2m plus b, and the actual value is five. So that's going to be my error. And when x = 4, 4m plus b is my prediction and nine is the actual value, so that gives my squared error. So if I wanna know what the total error is, I just have to expand all these things out. So (m + b - 1) squared is gonna be m² + 2mb + b² - 2m - 2b + 1, right? So that's a really complicated thing. And I can do that for

each of the other two as well, right? So I get these long equations. Now if I do

that, I'm gonna get my total error. Right? So if I choose

the line y = mx + b, this is my error. What you can do, and this is what's great about calculus, is do the math and just solve for this: find the b and the m that make this the smallest possible number. And if you do that, you choose b equals minus one and you choose m equals eighteen sevenths. So this is how you draw those lines: you basically go back and say, well, let's take any line y = mx + b, figure out its distance to the data, add up your total squared distance right here, and then you wanna choose an m and a b that make that as small as possible. And it turns out the way to do that is to choose b = -1 and m = 18/7. Now when we do that, our model says y = (18/7)x - 1. So when x = 1, we're gonna get 18/7 times one, minus one, and that's gonna equal 11/7. And the actual value's one, which is 7/7. So if you look at the difference between our prediction and the actual value, it's 4/7, and its contribution to the leftover variation is 4/7 squared. When you take x = 2, the real value's five, which is 35/7. Our model, if you plug it in here, is going to give us 29/7. That's off by 6/7. And if you look at x = 4, our model gives us 65/7. The actual value's nine, which is 63/7. That's off by 2/7. So what we're going to get is, our total variation left over is 4/7 squared plus 6/7 squared plus 2/7 squared. That's sixteen over 49 plus 36 over 49 plus four over 49, which is 56/49, or 8/7. So now if I want to know what's my R squared, well, if I erase all this stuff for a second, what I get is, how much of the data did I explain? I have one minus 8/7 over 32, which is 27/28. And so now I've explained over 96 percent of the variation. So by using, by sort of

figuring out what the optimal coefficients are, I can do even better than I could, like I said, by guessing the line y = 2x. And if I draw that best-fit line, it goes like this, and you see it comes incredibly close to the data.
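The fitting step itself has a closed form. Here is a minimal sketch in plain Python, using the standard least-squares formulas on the lecture's three points (the helper name is mine, not from the lecture):

```python
def least_squares(xs, ys):
    """Closed-form simple regression: the (m, b) minimizing sum((m*x + b - y)^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

xs, ys = [1, 2, 4], [1, 5, 9]          # grades and shoe sizes
m, b = least_squares(xs, ys)
print(m, b)                             # m = 18/7 ~ 2.571, b = -1.0

leftover = sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))
print(1 - leftover / 32)                # R squared; 32 is the total variation
```

The same numbers come out of any least-squares routine; the point is just that calculus turns "draw the best line" into solving two linear equations.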

let's move on and think about, how do we do this with multiple variables. Supposing

instead of having one variable, I've got a bunch of variables. So now I can write y = a·x1 + b·x2 + c. So now instead of just one independent variable, I've got two. When you look at these things, the sign tells you whether y increases or decreases in x. The other thing that regressions will tell you is the magnitude: how much does y change as a function of x? So let me talk about why this is so

important. Again, we often just reason by the seat of our pants. And so, let's

suppose you care about, again, I'm gonna talk about this a lot, cuz it's just an easy and important thing to talk about: school quality. So I've got a bunch of test scores from kids; this is, like, an achievement test score. I've also got IQ test scores, which basically tell their innate ability on some level. And I've also got teacher quality and class size. Well, what I can do is run a regression that says the performance on this test is going to be some intercept a, plus a coefficient on IQ, plus a coefficient on teacher quality, plus a coefficient on class size. And

what you would expect is the coefficient on class size to be negative. You'd expect the coefficient on teacher quality to be positive, and the coefficient on IQ to be positive. Now, without running a model, we don't know which of these things are big. We don't even know if our intuition is right. Well, let's look at

class size. So recently there have been something like 78 studies of class size: four of these show a positive coefficient, thirteen show a negative coefficient, and 61 show no effect. So this is the result of somebody doing a summary investigation of 78 regression studies, data studies, on whether class size matters, and what you find is that only thirteen times does it have that expected negative effect, 61 times it has no effect, and four times it actually goes in the wrong direction. So even though we think class size matters, that smaller classes should lead to better performance, it doesn't always work

out that way. What about teacher quality? Well there's a recent study by a bunch of

economists, right? And they basically show that a good kindergarten teacher is worth

$320,000. So if you have twenty students, it turns out that each of those students can expect to make $16,000 more in lifetime earnings by having a good kindergarten teacher rather than a bad kindergarten teacher. So again, this comes from plugging in all the data. Now, we all expect that class size should matter, that lower class sizes should be

good, and teacher quality should matter, that better teachers should be good. But what you find when you run the data is that class size doesn't seem to matter that much, at least in the ranges in which we're playing, while teacher quality matters a lot. So, what do we learn from all this? What we learn is, there's a lot of data out

there. One thing you can do is you can fit that data to linear models. What linear

models will do is they'll explain some percentage of the variation. Maybe a lot,

maybe a little. These linear models will also tell us the sign and magnitude of

coefficients. So it'll tell us whether a variable has a positive effect or a negative effect, and also tell us sort of how big that effect is, and that

allows us to make policy choices: you know, investing in things like teacher quality as opposed to class size, because teacher quality has the larger effect. This is what I

call big coefficient thinking. Thank you.