Hi. In this lecture, we're going to talk about bringing models to data. When I say data, what I mean is that you can pull all sorts of numbers off the web, or maybe, for your business or your home, you've got all this data on how much money you've spent each month or something like that.

We're going to see how we can use models to understand that data. We're going to start simple and build up.

We'll begin with what I call categorical models. In a categorical model, you just sort the data into different boxes. For example, suppose you had a bunch of data on how long people live, and you want to understand what allows someone to live a long and healthy life. We might create one box of people who exercise and one box of people who don't. By looking at the means of those two boxes and at how much variation there is within each one, you can figure out whether that categorization, exercising versus not exercising, actually helps explain the variation in the data.

After categorical models, we're going to move on to linear models. Let me first draw a distinction between a linear model and a line. A line is just something we plot: here's the Y variable, here's the X variable, and we draw a line. In a linear model, we assume that the Y variable depends on X.

What this describes is a relationship. Here, X could literally be the amount of exercise you do, and Y could be how long you live.
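Here is a minimal sketch, in Python, of the two ideas introduced so far applied to the same question: a categorical model that sorts people into an "exercises" box and a "doesn't" box, and a linear model in which lifespan depends on hours of exercise. The data are made up purely for illustration, and the function names are hypothetical. Both models are scored the same way, by the fraction of total variation they explain. To fit the line, the sketch assumes the common least-squares criterion (minimize the sum of squared vertical distances from the points to the line); it is one choice of criterion, not necessarily the one the fitting lecture settles on.

```python
from statistics import mean


def total_variation(values):
    """Sum of squared deviations from the overall mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def categorical_explained(values, labels):
    """Fraction of total variation explained by sorting values into boxes.

    Each value goes into the box named by the matching entry in `labels`;
    what is left unexplained is the spread around each box's own mean.
    """
    boxes = {}
    for v, label in zip(values, labels):
        boxes.setdefault(label, []).append(v)
    within = sum(total_variation(box) for box in boxes.values())
    return 1 - within / total_variation(values)


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """Fraction of total variation in ys explained by the line a + b*x."""
    residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - residual / total_variation(ys)


# Made-up illustrative data: weekly hours of exercise and age at death.
hours = [0, 0, 0, 1, 3, 5, 6, 8]
lifespan = [68, 74, 71, 73, 77, 79, 76, 82]

# Categorical model: two boxes, exercisers and non-exercisers.
labels = ["exercises" if h > 0 else "doesn't" for h in hours]
print("categorical, share explained:", round(categorical_explained(lifespan, labels), 2))

# Linear model: lifespan as a function of hours of exercise.
a, b = fit_line(hours, lifespan)
print(f"linear: lifespan = {a:.1f} + {b:.2f} * hours")
print("linear, share explained:", round(r_squared(hours, lifespan, a, b), 2))
```

A categorical score of 1 would mean everyone in the same box has the same lifespan; a score near 0 means the boxes add little beyond the overall average.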

So this line says that how long you live is a function of how much you exercise. You're literally thinking that Y is some function of X.

After we cover linear models and explain what they mean, I have a short lecture describing how we fit linear models to data. If you've got a bunch of data and you want to construct a linear model, how do you do it? Here's a simple example. Suppose I've got a bunch of data points like this, and I want to ask: what's the best line through that data? Clearly this line would be a terrible one, because it isn't near the data. This black line here looks like a pretty good line; it's close to the data. So we're going to see exactly how to draw the line and what criteria to use to make sure you've got the best one.

After linear models, we're going to move on to nonlinear models, and we'll show that the techniques are actually fairly similar. What do I mean by nonlinear? One simple way something can be nonlinear is that it rises in a straight line at first and then flattens out. Or it could start out slow and then get big. Or it could do both: start out slow, get big, and then flatten out. There are all sorts of shapes a function could take. John von Neumann once said that the set of nonlinear functions is like the set of non-elephants. By that he meant that the number of nonlinear functions is enormous. So we'll talk a little about how we can use some of the same techniques we used for linear models to create nonlinear models.

We'll conclude this unit by talking about something I call the big coefficient. What do I mean? When you have a linear model, you have something like Y = A1X1 + A2X2. X1 and X2 are the variables, the things that determine the outcome. Take school quality, for example: X1 might be class size, and X2 might be how good the teacher is. A1 and A2 are the coefficients, and they tell you how important each variable is: the bigger the coefficient, the more important the variable. So by the big coefficient, I mean making policies or decisions based on which of these coefficients is biggest.

Now, that makes a lot of sense. First, I'm going to argue that it's better to use the big coefficient than to rely on seat-of-the-pants thinking, and we'll see why. Linear models are just better than thinking off the cuff; we've seen a little of this before, but we'll go into more detail. Then I'm also going to criticize the approach a little, because one problem with the big coefficient is that it only works in the region where we have data. Often, if we want to make the world a much better place, we have to shift into an entirely new reality.

We have to move to a place where there is no data. So I'm going to draw a distinction between what I call the big coefficient and what I'm going to call the new reality.

By the new reality I mean brighter situations that may be a lot better than what we currently have.
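The big coefficient idea can also be sketched in Python, again on made-up numbers. The sketch fits Y = A0 + A1X1 + A2X2 by least squares (the intercept A0 is an addition for this sketch), with X1 as class size and X2 as a teacher rating on a hypothetical 1 to 5 scale, and then reports which variable has the bigger coefficient. Because the two variables are measured in different units, it compares each coefficient after multiplying it by its variable's standard deviation; that scaling is an assumption of this sketch, not something stated in the lecture. The function names are hypothetical. Note that the fit only describes the class sizes and ratings present in the data, which is exactly the limitation of the big coefficient described above.

```python
from statistics import mean, pstdev


def fit_two_variables(x1, x2, y):
    """Least-squares fit of y = a0 + a1*x1 + a2*x2; returns (a0, a1, a2).

    Works on mean-centered data, so the 2x2 normal equations can be
    solved directly with Cramer's rule.
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    a1 = (s22 * s1y - s12 * s2y) / det
    a2 = (s11 * s2y - s12 * s1y) / det
    a0 = my - a1 * m1 - a2 * m2
    return a0, a1, a2


def biggest_coefficient(names, columns, coefs):
    """Return the name whose coefficient, times its variable's standard
    deviation, is largest in absolute value."""
    scaled = {
        name: abs(coef) * pstdev(column)
        for name, column, coef in zip(names, columns, coefs)
    }
    return max(scaled, key=scaled.get)


# Made-up data for illustration only.
class_size = [20, 25, 30, 22, 28, 35]
teacher_rating = [3, 4, 2, 5, 3, 4]
test_score = [64, 67, 46, 78, 56, 57]

a0, a1, a2 = fit_two_variables(class_size, teacher_rating, test_score)
print(f"score = {a0:.1f} + ({a1:.2f}) * class size + ({a2:.2f}) * teacher rating")
print("big coefficient:", biggest_coefficient(
    ["class size", "teacher rating"], [class_size, teacher_rating], [a1, a2]))
```

Comparing raw coefficients would also work if both variables were on the same scale; the standard-deviation scaling is just one simple way to put them on a common footing.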

Alright, so that's a summary of where we're going. We'll start out with categorical models, then linear models, and then show how to fit lines to data. We'll do some nonlinear models, and then we'll wrap it all up by talking about this idea of the big coefficient. Alright, thank you.