Hi. In this lecture, we're going to talk about fitting lines. Remember in the last lecture we talked about simple linear models. Well, when I was drawing those lines through the data, the question is: how do you do it? How do you draw the best possible line through the data? That's the focus of this lecture. So let's step back for a second. Remember the categorical model. We had that notion of R squared, which was the percentage of variation that you explained. So there's a lot of variation in your data, you build a model, and you ask: what percent did you explain? So, for example, if I have a bunch of data like I have here, these are all the dots, right, these data. If I just took the mean of this right here, and then asked how much variation there is, I'd have to take the distance from all these points, [inaudible], there'd be a lot of variation. When I draw the line through here, I explain a lot of it. And in fact, here, it says I explain 87.2%. So what you want to do is talk about how you draw a line through this data to explain as much variation as possible. So remember in our [inaudible], just to give us a reminder of how this worked: the total variation was 53,000, and you only had 5,200 left. So 5,200 over 53,000 is about 9.8 percent, which is how much we had left over. That means we explained 90.2 percent of the variation. We want to show how you can do that same sort of calculation with lines, and then show how you draw the best possible line. So let's suppose I've got a bunch of data here. To figure out how much variation there is, I draw the mean. Let's say the mean is six and this point has a value of four. I would take 4 minus 6 and square that, which is four. And if this point has a value of eight, I would take 8 minus 6 and square that, and that would also be four. And I would add up all those squared differences. That gives me the total variation.
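The two calculations just described, R squared from the leftover variation and total variation as a sum of squared distances from the mean, can be sketched in a few lines of Python. The function names are illustrative, and the second call treats the two points mentioned (four and eight) as if they were the whole data set, which is an assumption made only to keep the arithmetic visible.

```python
def r_squared(total_variation, leftover_variation):
    """Share of the total variation that the model explains."""
    return 1 - leftover_variation / total_variation


def total_variation(values):
    """Sum of squared distances of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Categorical-model reminder: 53,000 total, 5,200 left over.
print(round(r_squared(53_000, 5_200), 3))  # 0.902

# Assumption: only the two points mentioned, so the mean is six.
print(total_variation([4, 8]))  # 8.0
```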
Now what I'm going to do is draw a line through the data, figure out the distance of each point from the line, and ask how much of the variation I explained by drawing the line. So, it's a very simple example. There are three kids, they're in different grades of school, and they wear different sizes of shoes. All I'm going to do is predict shoe size as a function of the grade they are in. So I've got a first grader who has a size one shoe, a second grader who has a size five shoe, and a fourth grader who has a size nine shoe. I'm going to fit a linear model to this. The first thing to ask is: what's the variation? So this is the grade, and this is the shoe size. Now, I don't care about the variation in the grades. I care about the variation in the thing I'm trying to explain, which is shoe size. So it's just this: one, five, and nine. If I add those up, I get fifteen. Divide by three, I get five. So the mean is five. To get the variation, I take one minus five and square it, which is sixteen; five minus five squared, that's zero; and nine minus five squared, that's also sixteen. So the total variation is 32. What I want to do is write down a linear model that can explain as much of that variation as possible. Let's start off with a really simple linear model, where we just assume y equals 2x. If I take the line y = 2x, what I'm saying is that all three of these points should lie on the line, right? And so the variation that's left is just how far off the line they lie. Well, so, how do we do it? I've got x and y. When x = 1, 2x would be two. When x = 2, 2x would be four. And when x = 4, 2x would be eight. So these are my predictions, in a way, right? Two, four, and eight. And what I can ask is: how far does the data lie from those predictions?
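As a quick check of the numbers so far, here is a minimal Python sketch of the shoe-size example: the mean, the total variation in shoe size, and the predictions from the guessed line y = 2x. The variable names are illustrative.

```python
grades = [1, 2, 4]      # x: grade in school
shoe_sizes = [1, 5, 9]  # y: shoe size, the thing being explained

# Only the variation in shoe size matters, not the variation in grade.
mean_size = sum(shoe_sizes) / len(shoe_sizes)
total_variation = sum((y - mean_size) ** 2 for y in shoe_sizes)

# Predictions from the guessed line y = 2x.
predictions = [2 * x for x in grades]

print(mean_size)        # 5.0
print(total_variation)  # 32.0
print(predictions)      # [2, 4, 8]
```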
So here I predicted two and the actual value is one, so I get two minus one, squared, which is one. Here I predicted four and the actual value is five, so I get four minus five, squared, which is one. And here I predicted eight, right? And the actual value is nine, so eight minus nine, squared, is also one. So the total amount is three. So I think, wow, that's great. I started out with a total variation of 32, right? And now I've only got three left. So if I want to figure out my R squared, I'll just say that's one minus three over thirty-two, which is a little over 90%. So that's really good. I've explained a lot of the variation. But the thing is, I just made this up, this y = 2x. I just drew this line. So how would I draw the best line? Well, what I can do is say, let's suppose I drew the line y = mx + b, so just an arbitrary line. And then I want to ask how far off that line would be from the data. Well, when x = 1, my prediction would be m + b, and the actual value is one. So my error is going to be (m + b - 1) squared. When x equals two, my model says the value is 2m + b, and the actual value is five, so my error is going to be (2m + b - 5) squared. And when x equals four, my prediction is 4m + b and the actual value is nine, so my squared error is (4m + b - 9) squared. So if I want to know what the total error is, I just have to multiply all these things out. (m + b - 1) squared is m squared + 2mb + b squared - 2m - 2b + 1, right? So that's a really complicated thing. And I can do that for each of the other two as well, so I get these long equations. If I do that and add them up, here's my total error. So if I choose the line y = mx + b, this is my error.
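The total error for an arbitrary line can be written as a small function of m and b. The sketch below is illustrative rather than a copy of the slide: the multiplied-out polynomial in `expanded_error` was worked out here from the three squared terms, since the transcript does not record the expression shown in the lecture, and the function names are hypothetical.

```python
from fractions import Fraction

DATA = [(1, 1), (2, 5), (4, 9)]  # (grade, shoe size)


def squared_error(m, b, data=DATA):
    """Sum of squared distances between the line y = m*x + b and the data."""
    return sum((m * x + b - y) ** 2 for x, y in data)


def expanded_error(m, b):
    """The same total error for DATA, multiplied out and collected by term."""
    return 21 * m**2 + 14 * m * b + 3 * b**2 - 94 * m - 30 * b + 107


def r_squared(m, b, data=DATA):
    """One minus leftover variation over total variation in y."""
    ys = [y for _, y in data]
    mean_y = Fraction(sum(ys), len(ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - Fraction(squared_error(m, b, data)) / total


print(squared_error(2, 0))   # 3
print(expanded_error(2, 0))  # 3
print(r_squared(2, 0))       # 29/32
```

Writing the error as a function makes it easy to try any candidate line before looking for the best one.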
What you can do, and this is what's great about calculus, is do the math and just solve for this: find the b and the m that make this the smallest possible number, by setting the derivative of the total error with respect to m and with respect to b to zero. And if you do that, you choose b equals minus one and m equals 18/7. So this is how you draw those lines. You basically go back and say, well, let's take any line, y = mx + b, figure out its distance to the data, and add up your total squared distance right here, so this is just the variation that's left over. And then you want to choose an m and a b that make that as small as possible. And it turns out the way to do that is to choose b = -1 and m = 18/7. Now when we do that, our model says y = (18/7)x - 1, right? So when x = 1, we're going to get 18/7 times one, minus one, so that's 11/7. And the actual value's one, which is 7/7. So if you look at the difference between our prediction and the actual value, it's just 4/7. So its contribution to the leftover variation is 4/7 squared. When you take x = 2, the real value's five, which is 35/7. Our model, if you plug it in here, gives us 29/7. That's off by 6/7. And when x = 4, our model gives us 65/7. The actual value's nine, which is 63/7. That's off by 2/7. So our total variation left over is 4/7 squared plus 6/7 squared plus 2/7 squared. That's 16/49 plus 36/49 plus 4/49, which is 56/49, which is 8/7. So now if I want to know my R squared, I have one minus 8/7 over 32. And so now I've explained over 96 percent of the variation.
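Solving the two zero-derivative conditions by hand is easy to get wrong, so it can help to check the result with a few lines of code. The sketch below is an assumption about how one might do that, not something shown in the lecture: it applies the standard closed-form least-squares formulas for a single predictor to the three grade and shoe-size points and prints the slope, intercept, leftover variation, and R squared as exact fractions, so they can be compared against the values worked out by hand. The function names are illustrative.

```python
from fractions import Fraction

DATA = [(1, 1), (2, 5), (4, 9)]  # (grade, shoe size)


def fit_line(data):
    """Slope and intercept minimizing the sum of squared errors."""
    n = len(data)
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    sum_xx = sum(x * x for x, _ in data)
    sum_xy = sum(x * y for x, y in data)
    # Closed-form solution of dE/dm = 0 and dE/db = 0.
    m = Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x**2)
    b = (sum_y - m * sum_x) / n
    return m, b


def leftover_variation(m, b, data):
    """Sum of squared distances between the line and the data."""
    return sum((m * x + b - y) ** 2 for x, y in data)


def r_squared(m, b, data):
    """One minus leftover variation over total variation in y."""
    ys = [y for _, y in data]
    mean_y = Fraction(sum(ys), len(ys))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - leftover_variation(m, b, data) / total


m, b = fit_line(DATA)
print("slope:", m, "intercept:", b)
print("leftover variation:", leftover_variation(m, b, DATA))
print("R squared:", float(r_squared(m, b, DATA)))
```

Calling `leftover_variation(2, 0, DATA)` scores the guessed line y = 2x with the same function, which makes the improvement from the fitted line directly comparable.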
So by figuring out what the optimal [inaudible] are, I can do even better than I could by just trying to draw that line y = 2x. And if I draw that actual line, it goes like this, and you see it comes incredibly close to the data. So, let's move on and think about how we do this with multiple variables. Suppose instead of having one variable, I've got a bunch of variables. So now I can write y = ax1 + bx2 + c. So now instead of just one independent variable, I've got two. When you look at these things, the sign of a coefficient tells you whether y increases or decreases in x. The other thing that regressions will tell you is the magnitude: how much does y change as a function of x? So let me talk about why this is so important. Again, we often just reason by the seat of our pants. And so let's suppose you care about, and I'm going to talk about this a lot, because it's an easy and important thing to talk about, school quality. So I've got a bunch of test scores from kids; this is, like, an achievement test score. I've also got IQ test scores, which basically tell you their innate ability on some level. And I've also got teacher quality and class size. Well, what I can do is run a regression that says the performance on this test is going to be some intercept, plus a coefficient on IQ, plus a coefficient on teacher quality, plus a coefficient on class size. And what you would expect is the coefficient on class size to be negative, right? You'd expect the coefficient on teacher quality to be positive and the coefficient on IQ to be positive. Now, without running a model, we don't know which of these things are big. We don't even know if our intuition is right. Well, let's look at class size. So recently there have been something like 78 studies of class size: four of these show a positive coefficient, thirteen show a negative coefficient, and 61 show no effect.
Right, so this is the result of a summary investigation somebody did of 78 regression studies, data studies, on whether class size matters. And what you find is that only thirteen times does it have that expected negative effect, 61 times it has no effect, and four times it actually goes in the wrong direction. So even though we think class size should matter, that smaller classes should lead to better performance, it doesn't always work out that way. What about teacher quality? Well, there's a recent study by a bunch of economists, right? And they basically show that a good kindergarten teacher is worth $320,000. So if you have twenty students, it turns out that each of those students can expect to make $16,000 more in lifetime earnings by having a good kindergarten teacher rather than a bad kindergarten teacher. So again, this comes from plugging in all this data. Now, we all expect that class size should matter, that lower class sizes should be good, and that teacher quality should matter, that better teachers should be good. But what you find when you run the data is that class size doesn't seem to matter that much, at least in the ranges in which we're playing, while teacher quality matters a lot. So, what do we learn from all this? What we learn is that there's a lot of data out there. One thing you can do is fit that data to linear models. What linear models will do is explain some percentage of the variation, maybe a lot, maybe a little. These linear models will also tell us the sign and magnitude of coefficients. So they'll tell us whether a variable has a positive effect or a negative effect, and also how big that effect is, and that allows us to make policy choices: you know, investing in things like teacher quality as opposed to class size because they have a larger effect. This is what I call big coefficient thinking. Thank you.
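To make the idea of reading the sign and magnitude of coefficients concrete, here is a minimal sketch of a regression with several variables, written in plain Python with exact fractions. Everything in it is an assumption for illustration: the six students, their IQ, teacher-quality, and class-size values, and the coefficients used to generate the scores are all made up and do not come from any study mentioned in the lecture. The scores are generated without noise, so the fit simply recovers the made-up coefficients; the point is only the mechanics of fitting and then reading off signs and sizes.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve matrix * coef = rhs by Gauss-Jordan elimination, exactly."""
    n = len(rhs)
    rows = [[Fraction(v) for v in matrix[i]] + [Fraction(rhs[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def fit_linear_model(features, targets):
    """Least squares via the normal equations; the intercept comes first."""
    design = [[1] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical students: (IQ, teacher quality rating, class size).
students = [(95, 2, 30), (100, 3, 25), (105, 1, 20),
            (110, 4, 28), (90, 5, 22), (120, 2, 18)]

# Made-up coefficients, used only to generate noiseless example scores.
TRUE = {"intercept": Fraction(10), "iq": Fraction(1, 2),
        "teacher_quality": Fraction(3), "class_size": Fraction(-1, 5)}
scores = [TRUE["intercept"] + TRUE["iq"] * iq
          + TRUE["teacher_quality"] * tq + TRUE["class_size"] * cs
          for iq, tq, cs in students]

coefs = fit_linear_model(students, scores)
for name, value in zip(TRUE, coefs):
    direction = "positive" if value > 0 else "negative" if value < 0 else "zero"
    print(f"{name}: {value} ({direction})")
```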