Alright. In this lecture we're going to talk about linear models. Now in linear

models, what you do is you see that there's some independent variable, X. And this is

a variable that could be anything. It could be hours spent exercising. It

could be money spent on advertising. It could be anything that you want. And then

you assume that there's some other variable, Y, that is a function of X. So

if X is hours spent exercising Y could be life expectancy. If X is amount of money

you spend on advertising, then Y can be sales. Now what we do is we assume there

is a particular relationship between Y and X, that it's not any old function. In fact,

what we do is we assume it's a simple line. So here's X, here's Y. We assume

that we can write Y = MX + B, that Y is just a linear function of X. So formally,

right, you see the [inaudible] typical graph, with Y on this axis, and here is X. B

is the intercept. So this is representing Y = MX + B. If X equals zero, B

is the value that Y will take. And then M, right, gives the slope. So it tells you sort

of how fast it's going up, how fast this thing is going up. If you move, or

if you increase X by one, how much does Y increase? Now remember I talked about

the difference between a linear model and a line. So, in a line, we just have that

equation, Y = MX + B. In a linear model, what we're assuming is that there's some

independent variable X, there's some dependent variable Y. And Y depends on X,

or we're assuming that X somehow causes Y to occur. All right. Let's do an example,

simple example. Suppose you think about buying a TV, and you want to know, how

much is that TV gonna cost? So I wanna construct a linear model of the cost of

the TV. So let's let x be the length of the diagonal, right? So you get your

square screen, right? And then TVs are measured by the diagonal. That way they

can pretend the TV's bigger than it really is. 'Kay, so x is the length of the

diagonal, Y is the cost of the TV, and suppose you say, here is my linear model: I

think the cost is fifteen times the length in inches, plus $100. So you put a

dollar sign here, to show this is a hundred dollars. That could be the model

of what it costs for a TV. Now there's two things you wanna think about when you have

these linear models. First is the sign of that coefficient. So, if I have y

equals 5x, right, that five is positive, and that says y is increasing in x. So in

the case of the TV, we expect the price of the TV to go up as it gets bigger. So

we expect the sign to be positive. Second thing we care about is the magnitude. How

big is that coefficient? So how much does the price go up every time you make X

bigger by one? So when you think about using these in policy settings, right? So

suppose you know that school lunches [inaudible] school lunches improve

performance. Well, the sign on that would be, well, is performance higher if people

get school lunches, right? So is there a positive coefficient? So why do we

construct models, right? Bunch of reasons, one though is to predict, right? Another

is to understand data. So let's talk about just using this simple model to try and

predict. So suppose you're wondering what it costs to buy a 30-inch

TV. What we can do is plug it in. And so the cost would be fifteen

times the diagonal, plus 100. So that would be 450 plus 100, which is

$550. So that would say if I want to go buy a 30-inch TV, it's gonna cost me $550.
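
As a minimal sketch of that calculation, here is the TV model written as a small Python function. The name `tv_cost` and its parameter names are made up for illustration; only the slope of 15 dollars per inch and the intercept of 100 dollars come from the example above.

```python
def tv_cost(diagonal_inches, slope=15.0, intercept=100.0):
    """Linear model from the example: cost = 15 * diagonal + 100 (in dollars)."""
    return slope * diagonal_inches + intercept


# Plug in a 30-inch diagonal: 15 * 30 + 100.
cost_30 = tv_cost(30)

# Magnitude of the coefficient: how much the price moves for one extra inch.
per_inch = tv_cost(31) - tv_cost(30)

print(cost_30)   # 550.0
print(per_inch)  # 15.0
```

The positive slope is the "sign" part of the story (bigger TVs cost more), and the 15 is the "magnitude" part (each extra inch adds fifteen dollars under this model).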

And so if we had a good model, this would be what a TV costs; if we had a

bad model, this would be way off base. But let's suppose this is your model, and I

think you can use it, you can use it to predict things that maybe don't even

exist. So suppose you think, what if they made a 100-inch TV, how much would that

cost? Well, if you assume this same relationship holds, you could just say,

well, I'm gonna say, the cost should be fifteen times 100, plus 100, which is

gonna be 1,600. So a 100 [inaudible] will be $1,600. [laugh] Okay,

so maybe you wanna sit around and wait for that 100 [inaudible] to come around. Or

maybe $1,600 is too rich for your blood. You're gonna say, well, you know what, I'm

happy with the 30-incher. Okay, and again what you can use this model for is to

predict. Now the model might not be accurate, right, because the world may not

be linear. But as a benchmark, this isn't really a bad thing to do. Now also,

understanding data, right? There's all this data here, right? These are all these

dots. One thing you can do with your model, right, is you can think, okay, how well can

I fit a line through there? Now let's go back to our last lecture. Remember, in

our last lecture we talked about R squared, how much of the variation can you

explain. Well there's a lot of variation in this data. Well you can do the same

thing with a line. You can ask, how far is this data from my line? Right? And so

how much of that total variation is explained by just drawing that line

through there? So you can do the same exact thing that you do for the

categories, for the lines, and that's what we're actually gonna do in the next

lecture. But for now, I just wanna get across this idea that you can use a linear

model to try and make sense of data like this and also to make predictions. Here's

the thing. Remember I said, first lecture, that models are better than we are. Well

let me support that even with these simple models. So Robyn Dawes, who's at Carnegie

Mellon, in 1979, wrote a paper comparing very, very primitive linear models. So,

all he worried about is sorta getting the coefficient close to right. So, for

example, here's one case, he had 43 bank loan officers and they're trying to

predict whether these firms were gonna go belly up or not, whether they'd repay

their loans. And so they were given 60 loans, 30 of which it was already

known had failed, and 30 of which it was already known had succeeded. And they

asked these people to predict what was gonna happen. Now, these bankers, they're

pretty smart. They were 75 percent accurate, so that's actually pretty good.

But if you took a simple linear model, just based on that ratio of assets to

liabilities of the people taking out the loans. That was right 80 percent of the

time. So the models beat the people. Now, you say, fine, that's one example. This has

been studied in detail, right? So in 1954, [inaudible] did a study of twenty studies of

clinicians, so these are doctors, right, making predictions versus just doing a

simple linear model. And Sawyer in 1966 did 45 studies of predictions out there in

the social world. In all 65 of these studies, it is never, never the case that the

experts did significantly better than the linear models. So there's cases where they

were close, right, where the expert's maybe a little better, but it wasn't

significantly better. And there were a ton of cases where the linear models did

better. So if you do a horse race, right, linear models tend to be better than

experts. Remember we saw this, this is that Tetlock, this is the graph from

Tetlock again. Here's formal models way up here, right? And this is, again, this is

measuring how [inaudible] how good those models are at explaining variation. What

you get is that formal models are better than people are. Now again, you don't

wanna only rely on the formal model. What you'd like to do is, do the linear model.

And compare that linear model to your own judgment. Okay, so what have we done? What

we've done is we've shown that you can draw a line through data and use that line

to explain some of the variation in the data. Now typically the world isn't gonna

be perfectly linear. There's going to be lots of extra variation left over, but

there's a question of how much of that variation did the line explain. In

addition to explaining the variation, the line tells us something about the

relationship between our independent variable, x and our dependent variable, y.

In particular, we learn the sign on x, like does y increase in x or decrease in

x, and we also learn something about the magnitude, so how much does each one-unit

increase of x increase the value of y. So what this linear model can do is help us

understand something about data we see in the real world. Now what we've done so far

though, right, is just consider a single variable linear model, right? So y was

just a function of x. Where we're gonna go next is to think of y depending on a whole

bunch of different x's. So you can think of your outcome having a whole bunch of

different variables that contribute to it, and we'll start out by seeing how each of

those variables contributes in a linear way. Okay, thank you.
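
To make the summary concrete, here is a rough sketch of drawing a line through data and asking how much of the variation it explains. Two things are assumptions and not from the lecture: the line is chosen by ordinary least squares (the lecture has not said how the line is picked), and the data points are invented purely for illustration. The helper names `fit_line` and `r_squared` are hypothetical too.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b (an assumed fitting rule)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def r_squared(xs, ys, m, b):
    """Share of the total variation in ys that the line y = m*x + b explains."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return 1 - unexplained / total


# Hypothetical (diagonal in inches, price in dollars) pairs, made up for this sketch.
diagonals = [20, 30, 40, 50, 60]
prices = [420, 530, 720, 830, 1010]

m, b = fit_line(diagonals, prices)
r2 = r_squared(diagonals, prices, m, b)

print("sign of slope:", "positive" if m > 0 else "non-positive")
print("slope, intercept:", m, b)
print("R squared:", r2)
```

The sign and size of `m` are the two things we said we learn about the relationship, and `r2` is the "how much of the variation did the line explain" number, computed the same way as for categories: one minus the leftover variation divided by the total variation.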