is how I would write a regression model.

And it says that the expected value, that's the average of Y, and then the vertical line there means "given".

We read that bar as "given".

So this is the expected value of Y given X.
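To make "the expected value of Y given X" concrete, here is a minimal Python sketch that averages Y separately for each distinct value of X. The (weight, price) pairs are hypothetical toy numbers, not the diamond data from the slide, and the function name `conditional_means` is made up for illustration.

```python
from collections import defaultdict

def conditional_means(xs, ys):
    """Return {x: average of the y values observed at that x}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        totals[x] += y
        counts[x] += 1
    return {x: totals[x] / counts[x] for x in totals}

# Hypothetical toy data: weights in carats, prices in arbitrary units.
weights = [0.2, 0.2, 0.3, 0.3, 0.3]
prices = [480, 520, 830, 850, 870]
print(conditional_means(weights, prices))  # {0.2: 500.0, 0.3: 850.0}
```

With a continuous X such as weight, few values repeat exactly, which is one reason to describe the expected value of Y given X with a function of X rather than a table of averages.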

The expected price of a diamond given its weight is then equal to some function of X.

The most straightforward function we might choose is a linear function, which we write in this instance as b0 + b1 times X.

Sometimes you will have seen the equation of a straight line written as y = mx + b.

This is still a straight line, but regression models typically use a slightly different notation, and there's a reason for that.

The reason is that there's a form of regression called multiple regression, which has many Xs in it, and then we can use a notation that incorporates b0 (b naught), b1, b2, b3, and so on.

So we subscript the coefficients.

b0 is still the intercept, and b1 is still the slope.
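In symbols, the simple and multiple regression models described above can be written as follows; the k in the second line is just a placeholder for however many Xs the model includes.

```latex
\begin{aligned}
E(Y \mid X) &= b_0 + b_1 X \\
E(Y \mid X_1, \dots, X_k) &= b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k
\end{aligned}
```

Compared with y = mx + b, the slope m corresponds to b_1 and the intercept b corresponds to b_0.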

So a regression model relates the average of Y to a particular value of X, and it's not at all uncommon to assert that the association is at least approximately linear; in that case, we're doing a linear regression.

On this slide I have overlaid the straight-line model calculated from the underlying data.

I haven't told you how this line is calculated yet.

I will in a few minutes.

But there's the regression line.

And the slope and intercept in this particular instance are presented in the formula below.

The expected value of the price of a diamond given its weight is equal to -260, that's the intercept, plus 3721 times the weight, where weight is measured in carats.
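Plugging a weight into the fitted line gives the expected price. A minimal Python sketch using the rounded coefficients quoted on the slide (intercept -260, slope 3721 per carat); the names `INTERCEPT`, `SLOPE`, and `expected_price` are hypothetical.

```python
# Rounded coefficients from the slide: intercept and slope (price per carat).
INTERCEPT = -260.0
SLOPE = 3721.0

def expected_price(weight_carats: float) -> float:
    """Expected diamond price given its weight, read off the fitted line."""
    return INTERCEPT + SLOPE * weight_carats

print(round(expected_price(0.2), 1))  # 484.2
```

Read through the slope, each additional 0.1 carat adds about 372.1 to the expected price.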

So that's what a linear regression is going to do for you.

It's basically going to put a line through the data, and once you've got a line going through the data, there are a number of useful things you're going to be able to do with it.

So there's a quantitative model that has been derived from the underlying data.

So we let the data talk to us, in the sense that the data chose the best-fitting line.
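The lecture hasn't yet said how the best-fitting line is chosen. The usual criterion for linear regression is ordinary least squares, which picks b0 and b1 to minimize the sum of squared vertical distances from the points to the line. The sketch below assumes that criterion; the function name `least_squares_line` and the made-up points, which lie exactly on y = 1 + 2x, are illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(least_squares_line(xs, ys))  # (1.0, 2.0)
```

Because these points were constructed to lie on a line, least squares recovers intercept 1 and slope 2 exactly; with real, scattered data the fitted line will not pass through every point.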

Now, there's a very commonly used number to describe the strength of what we term linear association.

So essentially, how close are the points to a line?
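One widely used measure of linear association is the Pearson correlation coefficient r, which runs from -1 to 1 and reaches 1 or -1 only when every point lies exactly on a straight line. A minimal Python sketch, assuming that is the kind of number meant here; the function name `correlation` and the data are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: points on a rising line, then points on a falling line.
print(correlation([0, 1, 2, 3], [1, 3, 5, 7]))  # 1.0
print(correlation([0, 1, 2, 3], [7, 5, 3, 1]))  # -1.0
```

Points that scatter around a line give values strictly between -1 and 1, closer to 0 the weaker the linear pattern.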