Alright. In this lecture we're going to talk about linear models. In a linear model, you assume there's some independent variable, X. This variable could be anything. It could be hours spent exercising. It could be money spent on advertising. It could be anything you want. And then you assume there's some other variable, Y, that is a function of X. So if X is hours spent exercising, Y could be life expectancy. If X is the amount of money you spend on advertising, then Y could be sales. Now what we do is assume there's a particular relationship between Y and X; it's not any old function. In fact, we assume it's a simple line. So here's X, here's Y. We assume that we can write Y = MX + B, that Y is just a linear function of X.

So formally, right, you see the [inaudible] typical graph, with Y on this axis and X here. This is representing Y = MX + B. B is the intercept: if X equals zero, B is the value that Y will take. And then M gives the slope. It tells you how fast this thing is going up: if you increase X by one, how much does Y increase?

Now remember I talked about the difference between a linear model and a line. In a line, we just have that equation, Y = MX + B. In a linear model, what we're assuming is that there's some independent variable X, there's some dependent variable Y, and Y depends on X; we're assuming that X somehow causes Y to occur.

All right, let's do a simple example. Suppose you're thinking about buying a TV, and you want to know how much that TV is gonna cost. So I wanna construct a linear model of the cost of the TV. Let's let x be the length of the diagonal, right? So you get your square screen, right? And TVs are measured by the diagonal. That way they can pretend the TV's bigger than it really is. Okay, so x is the length of the diagonal.
Y is the cost of the TV, and suppose you say, here is my linear model: I think the cost is fifteen times the length in inches, plus $100. You put a dollar sign here to show this is a hundred dollars. That could be the model of what a TV costs.

Now there are two things you wanna think about when you have these linear models. The first is the sign of the coefficient. So if I have y equals 5x, that five is positive, and that says y is increasing in x. In the case of the TV, we expect the price to go up as the TV gets bigger, so we expect the sign to be positive. The second thing we care about is the magnitude. How big is that coefficient? How much does the price go up every time you make x bigger by one? Think about using these in policy settings, right? Suppose you want to know whether school lunches improve performance. Well, the sign on that would be: is performance higher if people get school lunches? Is there a positive coefficient?

So why do we construct models? A bunch of reasons. One is to predict. Another is to understand data. Let's talk about just using this simple model to try and predict. Suppose you're wondering what it costs to buy a 30-inch TV. What we can do is plug it in. The cost would be fifteen times the diagonal, plus 100. So that would be 450 plus 100, which is $550. That says if I want to go buy a 30-inch TV, it's gonna cost me $550. If we had a good model, this would be what a TV costs; if we had a bad model, this would be way off base. But let's suppose this is your model. You can use it to predict things that maybe don't even exist. So suppose you think, what if they made a 100-inch TV, how much would that cost? Well, if you assume this same relationship holds, you could just say the cost should be fifteen times 100, plus 100, which is 1,600.
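To check the arithmetic for the TV example, here is a minimal Python sketch of the model, cost = 15 × diagonal + 100. The function name `tv_cost` and its parameter names are made up for illustration; only the two coefficients come from the lecture, and nothing here says the model matches real TV prices.

```python
def tv_cost(diagonal_inches, slope=15.0, intercept=100.0):
    """Predicted TV cost in dollars under the lecture's model y = 15x + 100.

    slope: dollars added per extra inch of diagonal (the M in Y = MX + B).
    intercept: cost when the diagonal is zero (the B in Y = MX + B).
    """
    return slope * diagonal_inches + intercept


# The two predictions worked through in the lecture.
print(tv_cost(30))   # 30-inch TV
print(tv_cost(100))  # hypothetical 100-inch TV
```

The slope is positive, so the predicted cost rises with the diagonal, and its magnitude says each extra inch adds $15 under this model.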
So a 100-incher will be $1,600 [laugh]. Okay, so maybe you wanna sit around and wait for that 100-incher to come around. Or maybe $1,600 is too rich for your blood, and you're gonna say, well, you know what, I'm happy with the 30-incher. Again, what you can use this model for is to predict. Now the model might not be accurate, right, because the world may not be linear. But as a benchmark, this isn't really a bad thing to do.

Now, also, understanding data, right. There's all this data here; these are all these dots. One thing you can do with your model is think, okay, how well can I fit a line through there? Let's go back to our last lecture. Remember, in our last lecture we talked about R squared: how much of the variation can you explain? Well, there's a lot of variation in this data, and you can do the same thing with a line. You can ask, how far is this data from my line? And so how much of that total variation is explained by just drawing that line through there? You can do the exact same thing for lines that you do for categories, and that's what we're actually gonna do in the next lecture. But for now, I just wanna get across this idea that you can use a linear model to try and make sense of data like this, and also to make predictions.

Here's the thing. Remember I said, in the first lecture, that models are better than we are. Well, let me support that even with these simple models. Robyn Dawes, who's at Carnegie Mellon, wrote a paper in 1979 comparing very, very primitive linear models to people's judgments. All he worried about was sort of getting the coefficients close to right. So, for example, here's one case: he had 43 bank loan officers, and they were trying to predict whether these firms were gonna go belly up or not, whether they'd repay their loans. They were given 60 loans, 30 of which were already known to have failed, and 30 of which were already known to have succeeded.
And they asked these people to predict what was gonna happen. Now, these bankers, they're pretty smart. They were 75 percent accurate, so that's actually pretty good. But a simple linear model, based just on the ratio of assets to liabilities of the firms taking out the loans, was right 80 percent of the time. So the model beat the people.

You might say, fine, that's one example. But this has been studied in detail, right. In 1954, [inaudible] did a study of twenty studies of clinicians, so these are doctors, making predictions versus just doing a simple linear model. And Sawyer in 1966 looked at 45 studies of predictions out there in the social world. Across all 65 of these studies, it is never, never the case that the experts did significantly better than the linear models. There are cases where they were close, where the experts were maybe a little better, but it wasn't significantly better. And there were a ton of cases where the linear models did better. So if you do a horse race, linear models tend to be better than experts. Remember we saw this; this is the graph from Tetlock again. Here are formal models, way up here, right? And again, this is measuring how [inaudible], how good those models are at explaining variation. What you get is that formal models are better than people are. Now again, you don't wanna rely only on the formal model. What you'd like to do is build the linear model and compare that linear model to your own judgment.

Okay, so what have we done? What we've done is show that you can draw a line through data and use that line to explain some of the variation in the data. Now typically the world isn't gonna be perfectly linear. There's going to be lots of extra variation left over, but there's a question of how much of that variation the line explained.
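As a rough illustration of "how much of the variation did the line explain", here is a small Python sketch that fits a line to a handful of points and computes R squared. The lecture does not say how the line is chosen, so ordinary least squares is an assumption here, and the data points are invented purely for the example. The function names `fit_line` and `r_squared` are also made up.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """Share of the total variation in ys that the line m*x + b explains."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)                     # variation around the mean
    leftover = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # variation around the line
    return 1 - leftover / total


# Invented data: TV diagonals in inches and prices in dollars.
diagonals = [20, 30, 40, 50, 60]
prices = [420, 530, 710, 840, 1010]

m, b = fit_line(diagonals, prices)
print("slope:", m, "intercept:", b)
print("R squared:", r_squared(diagonals, prices, m, b))
```

An R squared near 1 means the points sit close to the line; a value near 0 means the line explains little more than the mean does.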
In addition to explaining the variation, the line tells us something about the relationship between our independent variable, x, and our dependent variable, y. In particular, we learn the sign on x, that is, does y increase in x or decrease in x, and we also learn something about the magnitude: how much does each one-unit increase in x increase the value of y? So what this linear model can do is help us understand something about the data we see in the real world.

Now, what we've done so far is just consider a single-variable linear model, right? So y was just a function of x. Where we're gonna go next is to think of y depending on a whole bunch of different x's. You can think of your outcome as having a whole bunch of different variables that contribute to it, and we'll start out by seeing how each of those variables contributes in a linear way. Okay, thank you.