1:13

is equal to x transpose x inverse

x transpose times the expected value of y, since we're conditioning on x.

So we're assuming x is not random, and then the expected value of y is just x beta.

Well, this quantity here, x transpose x inverse, and

this quantity right here, x transpose x, are just inverses of each other,

so this works out to just be beta.

So beta hat is unbiased: its expected value is what we'd like to estimate.
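To see this unbiasedness numerically, here's a minimal sketch (the design matrix, true beta, and sigma below are hypothetical choices of mine, not from the lecture): averaging beta hat over many simulated responses recovers the true beta.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # fixed (non-random) design
beta = np.array([1.0, 2.0])                               # true coefficients (hypothetical)
sigma = 3.0

# Average beta hat = (X'X)^{-1} X' y over many simulated responses
nsim = 20000
est = np.zeros(p)
for _ in range(nsim):
    y = X @ beta + rng.normal(0, sigma, n)  # E[y] = X beta, Var(y) = I sigma^2
    est += np.linalg.solve(X.T @ X, X.T @ y)
est /= nsim

print(est)  # close to the true beta [1, 2]
```

The average of the simulated beta hats settles on the true beta, which is exactly the unbiasedness statement above.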

And then also we can calculate the variance of beta hat under

these assumptions.

So the variance of beta hat is equal to the variance of

x transpose x inverse x transpose y which

is equal to x transpose x inverse x transpose, times the variance of y.

And then times the matrix transposed again:

times x, and then x transpose x inverse transposed. But x transpose x inverse is symmetric, so

that's just x transpose x inverse again.

2:34

I'm not going to write the I, because I'm just going to pull it out,

because the I doesn't change anything and the sigma squared is a scalar.

So then we get x transpose x inverse, x transpose x, x transpose x inverse, and

then we can put our sigma squared out here, okay?

So we have x transpose x inverse, x transpose x and x transpose x inverse.

So this works out to be x transpose x inverse sigma squared.
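As a numeric sanity check of this formula (again with a hypothetical design and noise level of my choosing), the empirical covariance of simulated beta hats matches x transpose x inverse times sigma squared:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])  # fixed design (hypothetical)
beta = np.array([0.5, -1.0])
sigma = 2.0

# Theoretical covariance of beta hat: (X'X)^{-1} sigma^2
theory = np.linalg.inv(X.T @ X) * sigma**2

# Empirical covariance of beta hat over repeated simulations of y
bhats = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, sigma, n)))
    for _ in range(20000)
])
empirical = np.cov(bhats, rowvar=False)

print(np.round(theory, 3))
print(np.round(empirical, 3))  # the two matrices agree closely
```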

So it's a lot like our simple linear regression estimate.

The variability of the regressor winds up being,

in a sense, in a matrix sense, in the denominator.

And this tells us that, just like in linear regression,

in order to make the variance of our coefficient estimate smaller,

we want the variance of our x's to be larger.

And that makes a lot of sense.

If you think back in linear regression, if you want to estimate a line really well,

if you collect x's in a tight little ball,

you're not going to be able to estimate that line very well.

But if you collect x's all along the line, in other words the variance of x is very

large, then you're going to be able to estimate that line with greater precision.

So it's interesting that we don't want variability in our y's,

we want sigma squared to be small.

But we do want variability in our x's; we want the variability in our x's to be large.

And in fact, in linear regression, the most variable you can make things is if

you get half of your x observations at the lowest possible value,

and half of your x observations at the highest possible value.

That'll give you the maximum variance for the denominator.

Of course, you're banking a lot on the relationship not doing

anything funky in between this big gap where you didn't collect any data,

but that is, if you're really quite certain about the linearity,

then that would minimize the variance of the estimated coefficient.
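That design intuition can be sketched in a few lines (the sample size and x ranges are hypothetical choices of mine): for a fixed sigma, the slope's variance, sigma squared over the sum of squared deviations of x, shrinks as the x's spread out, and the two-endpoint design gives the smallest value.

```python
import numpy as np

def slope_variance(x, sigma=1.0):
    # Var(slope hat) = sigma^2 / sum((x - xbar)^2) in simple linear regression
    x = np.asarray(x, dtype=float)
    return sigma**2 / np.sum((x - x.mean()) ** 2)

n = 20
tight = np.linspace(4.9, 5.1, n)       # x's collected in a tight little ball
spread = np.linspace(0.0, 10.0, n)     # x's all along the line
endpoints = np.array([0.0] * (n // 2) + [10.0] * (n // 2))  # half low, half high

print(slope_variance(tight))      # largest: hard to pin down the line
print(slope_variance(spread))     # much smaller
print(slope_variance(endpoints))  # smallest possible on [0, 10]
```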

4:40

Okay, the last thing is we can calculate

the variance of q transpose beta hat,

where q is a linear contrast.

The estimator of q transpose beta is q transpose beta hat.

Its variance works out to be q transpose times the variance of beta hat times q,

which is then just q transpose x transpose x inverse q times sigma squared.

And it's interesting to note that as an estimate,

q transpose beta hat is an estimate of q transpose beta.

And we can show that this estimator, q transpose beta hat, is so-called BLUE, the best linear unbiased estimator.
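Before the proof, here's a small numeric sketch of the contrast-variance formula (the design, contrast q, and sigma are hypothetical choices of mine): q transpose x transpose x inverse q times sigma squared matches the simulated variance of q transpose beta hat.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # hypothetical design
beta = np.array([1.0, 3.0])
sigma = 1.5
q = np.array([0.0, 1.0])  # contrast picking out the slope

# Theoretical variance: q' (X'X)^{-1} q sigma^2
theory = q @ np.linalg.inv(X.T @ X) @ q * sigma**2

# Simulated variance of q' beta hat
qb = np.array([
    q @ np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, sigma, n)))
    for _ in range(20000)
])
print(theory, qb.var())  # the two agree closely
```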

5:38

So, first of all, let's check off these things to make sure:

clearly it's an estimator, and it's unbiased.

You know that q transpose beta hat is unbiased because we know that beta

hat is unbiased.

And q transpose is just another constant multiplier.

So we just pull it out of the expectation; so it's unbiased.

Beta hat is linear in y, so q transpose beta hat is linear in y.

When we say linear, we mean linear in y; so it's linear.

And so best, well, what do we mean by best?

And what we mean is that it has minimum variance and

we can show this very quickly.

And it has quite a clever proof and it involves

all of our techniques that we have used in these last couple of lectures.

6:25

So, suppose we try to find another estimator that is also linear.

So let's say k transpose times y is another linear estimator of

q transpose beta for some value of k.

And then because this estimator is linear and

unbiased, the expected value of k transpose y is equal

to k transpose times the expected value of y, and the expected value of y is x beta.

6:56

And we want that to be unbiased, so we want it to be equal to q transpose beta.

So we know that k transpose x has to equal

q transpose because this statement has to be true for all possible betas, right?

Even though we don't know beta, we need this unbiasedness property to hold

across all possible betas, so that means k transpose x has to equal q transpose.

That's the first fact that you have to keep in your back pocket here.

7:42

Now consider the covariance of q transpose beta hat with k transpose y.

The q transpose is a constant, so we pull that out of the covariance on that side,

giving q transpose times the covariance of beta hat with k transpose y.

And when I pull the k transpose out of the covariance, I get k transpose transpose,

which is k, on that side: q transpose, covariance of beta hat with y, times k.

8:05

Now, beta hat is x transpose x

inverse x transpose y.

So, in the covariance with y there, times k,

we can pull out the x transpose x inverse, x transpose part.

So this is q transpose times x transpose x inverse x transpose,

then the covariance of y with itself, times k.

The covariance of y with itself is just the variance of y.

We're assuming that that's I sigma squared.

So I'm just going to write that as q transpose x transpose x inverse x transpose k, times sigma squared.

But x transpose k is just q by our first fact, since k transpose x equals q transpose. So this covariance is q transpose x transpose x inverse q times sigma squared, which is exactly the variance of q transpose beta hat.
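Written out in symbols, the covariance chain just described is:

```latex
\begin{aligned}
\operatorname{Cov}\!\left(q^\top\hat\beta,\; k^\top y\right)
  &= q^\top (X^\top X)^{-1} X^\top \operatorname{Cov}(y, y)\, k \\
  &= q^\top (X^\top X)^{-1} X^\top k \,\sigma^2
     \quad \text{(since } \operatorname{Var}(y) = I\sigma^2\text{)} \\
  &= q^\top (X^\top X)^{-1} q \,\sigma^2
     \quad \text{(since } k^\top X = q^\top \text{, so } X^\top k = q\text{)} \\
  &= \operatorname{Var}\!\left(q^\top\hat\beta\right).
\end{aligned}
```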

9:35

That's another fact that you need to keep in your back pocket, okay?

So we have two facts to keep in our back pocket.

Well, actually, we're done with the first one, that k transpose x has to equal q transpose.

But the second fact that you have to keep in your back pocket now,

the only one you have to remember now, is that this covariance between the two

estimators works out to be equal to the variance of q transpose beta hat.

9:59

Now, I'm going to take the variance of the difference,

q transpose beta hat minus k transpose y.

By our rules for variances,

that's the variance of q transpose beta

hat + the variance of k transpose y- 2

times the covariance of q transpose beta hat in k transpose y.

Note that the two cross terms combine into twice the covariance

because, in these cases, the quantities are scalars:

q transpose beta hat is a scalar, and k transpose y is a scalar.

So the covariance of a and b is the covariance of b and a in this case,

because that is true when the quantities are scalars, okay?

Now remember, the covariance between the two is equal to the variance of q transpose beta hat.

11:19

So this works out to be the variance

of k transpose y minus the variance of q transpose beta hat.

The last thing I'd like to point out is that this variance, by virtue of being

a variance, has to be greater than or equal to 0.

So if we take this statement and argue that it has to be greater than or

equal to 0, because it's equal to a variance,

then what we get is that the variance of k transpose y has to be greater than or

equal to the variance of q transpose beta hat.
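Here's a quick numeric illustration of this Gauss-Markov comparison (the design, contrast, and perturbation below are hypothetical choices of mine): any weight vector k with k transpose x equal to q transpose can be written as the BLUE weights plus a piece orthogonal to the columns of x, and that extra piece only adds variance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # hypothetical design
q = np.array([1.0, 2.0])                               # hypothetical contrast
sigma2 = 4.0

XtX_inv = np.linalg.inv(X.T @ X)
k0 = X @ XtX_inv @ q        # BLUE weights: k0' y = q' beta hat

# Perturb k0 in a direction orthogonal to the columns of X,
# so that k' X = q' still holds and k' y stays unbiased for q' beta.
d = rng.normal(size=n)
d -= X @ np.linalg.solve(X.T @ X, X.T @ d)  # project out the column space of X
k = k0 + d

var_blue = (q @ XtX_inv @ q) * sigma2  # Var(q' beta hat)
var_alt = (k @ k) * sigma2             # Var(k' y) when Var(y) = I sigma^2

print(var_blue, var_alt)  # var_alt >= var_blue
```

Since k0 lies in the column space of X and d is orthogonal to it, the variances add: Var(k' y) = Var(q' beta hat) + sigma squared times d transpose d, which is the inequality from the proof.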

12:00

So, there you have it: if you take any other

linear combination of the y's that results in an unbiased estimator, its variance

has to be greater than or equal to that of the obvious linear combination, q transpose beta hat.

So beta hat is the best linear, unbiased estimator.

I also want to make one last point:

taking best in terms of minimum variance is only really meaningful

if you restrict yourself to the class of unbiased estimators.

If we didn't have the restriction to the class of unbiased estimators,

then we could always get minimum variance by just estimating things with a constant.

If I just estimate everything with the number 5, the number 5 has 0 variance, but

it's quite biased, unless you happen to be estimating 5.

Okay, so biased estimators, particularly constants, can have 0 variance, but

they are not good estimators.

So you can only do this trick, where you compare variances,

if you restrict yourself to a meaningful class of estimators in terms of bias.

So, in this case we restricted ourselves to unbiased estimators and

then the appropriate linear combination of beta hat

we see is the best among all linear unbiased estimators.

Okay, so that's a nifty little result, and it used all of the tools

we built up for expected values, variances, and covariances.

And then in the next lecture, we're going to start working with

the multivariate normal distribution, so we can talk not just about moments,

but about the complete set of characteristics of the distribution.