1:13

is equal to x transpose x inverse

x transpose expected value of y, since we're assuming we're conditioning on x.

So we're assuming x is not random, and then the expected value of y is just x beta.

Well, this quantity here, x transpose x, and this quantity right here are just inverses: we have x transpose x and x transpose x inverse, so this works out to just be beta.

So beta hat is unbiased: its expected value is what we'd like to estimate.
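If it helps to see this numerically, here is a small simulation sketch in Python. The design matrix, coefficients, and seed are all made up for illustration; the point is just that averaging beta hat over many replications recovers beta.

```python
import numpy as np

# Hypothetical setup: y = X beta + e with E[e] = 0 and X treated as fixed,
# so betahat = (X'X)^{-1} X' y should average out to the true beta.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))          # fixed design, reused across replications
beta = np.array([1.0, -2.0, 0.5])    # made-up "true" coefficients

estimates = np.empty((5000, p))
for i in range(estimates.shape[0]):
    y = X @ beta + rng.normal(size=n)                 # mean-zero errors
    estimates[i] = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X' y

mean_betahat = estimates.mean(axis=0)
print(mean_betahat)  # close to [1.0, -2.0, 0.5]
```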

And then we can also calculate the variance of beta hat under these assumptions.

So the variance of beta hat is equal to the variance of x transpose x inverse x transpose y,

which is equal to x transpose x inverse x transpose, times the variance of y,

and then times that matrix transposed again: that's x times the transpose of x transpose x inverse, and x transpose x inverse is symmetric, so its transpose is just x transpose x inverse again.

Â 2:34

I'm not going to write that, because I'm just going to pull it out:

the I doesn't change anything, and the sigma squared is a scalar.

So then we get x transpose x inverse, x transpose x, x transpose x inverse, and then we can put our sigma squared out here, okay?

So we have x transpose x inverse, x transpose x, and x transpose x inverse, so this works out to be x transpose x inverse sigma squared.
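A quick simulation sketch of this formula, with a made-up design and sigma: the empirical covariance of beta hat across many simulated datasets should match x transpose x inverse times sigma squared.

```python
import numpy as np

# Hypothetical check that the simulated covariance of betahat matches
# (X'X)^{-1} sigma^2 when Var(y) = I sigma^2.
rng = np.random.default_rng(1)
n, p, sigma = 100, 2, 2.0
X = rng.normal(size=(n, p))
beta = np.array([0.5, 1.5])

theoretical = np.linalg.inv(X.T @ X) * sigma**2  # (X'X)^{-1} sigma^2

draws = np.empty((20000, p))
for i in range(draws.shape[0]):
    y = X @ beta + sigma * rng.normal(size=n)
    draws[i] = np.linalg.solve(X.T @ X, X.T @ y)

empirical = np.cov(draws, rowvar=False)  # covariance across simulated betahats
print(theoretical)
print(empirical)  # the two matrices should be close
```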

So it's a lot like our simple linear regression estimate: the variability of the regressors winds up being, in a sense, in a matrix sense, in the denominator.

And this tells us that, just like in linear regression, in order to make the variance of our coefficient estimate smaller, we want the variances of our x's to be larger.

And that makes a lot of sense.

If you think back to linear regression, if you want to estimate a line really well, and you collect x's in a tight little ball, you're not going to be able to estimate that line very well.

But if you collect x's all along the line, in other words if the variance of x is very large, then you're going to be able to estimate that line with greater precision.

So it's interesting that we don't want variability in our y's: we want sigma squared to be small. But we do want variability in our x's: we want the variance of our x's to be large.

And in fact, in linear regression, the most variable you can make things is if you put half of your x observations at the lowest possible value, and half of your x observations at the highest possible value.

That'll give you the maximum variance in the denominator.

Of course, you're banking a lot on the relationship not doing anything funky in between this big gap where you didn't collect any data. But if you're really quite certain about the linearity, then that design would minimize the variance of the estimated coefficient.
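Here is a small sketch comparing three made-up designs on the interval from minus 1 to 1: all x's at the two endpoints, x's spread uniformly, and x's in a tight ball.

```python
import numpy as np

# For simple linear regression, the slope estimate has variance
# sigma^2 / sum((x_i - xbar)^2), so spreading the x's out shrinks the
# variance. All three designs below are made up for illustration.
def slope_variance(x, sigma=1.0):
    return sigma**2 / np.sum((x - x.mean())**2)

n = 20
x_extremes = np.array([-1.0] * (n // 2) + [1.0] * (n // 2))  # half at each end
x_uniform = np.linspace(-1.0, 1.0, n)                        # spread along the line
x_ball = np.linspace(-0.05, 0.05, n)                         # tight little ball

print(slope_variance(x_extremes))  # smallest variance
print(slope_variance(x_uniform))
print(slope_variance(x_ball))      # largest variance
```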

Â 4:40

Okay, the last thing is we can calculate the variance of q transpose beta hat, where q is a linear contrast.

Our estimate of q transpose beta works out to be q transpose beta hat.

So its variance works out to be q transpose times the variance of beta hat times q, which is then just q transpose x transpose x inverse q times sigma squared.

And it's interesting to note that, as an estimator, q transpose beta hat is an estimate of q transpose beta.

And we can show that this estimator, q transpose beta hat, is so-called BLUE, the best linear unbiased estimator.
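Before the proof, a small numerical sketch of the contrast-variance formula, with a made-up design and a made-up contrast picking out the difference between the first two coefficients:

```python
import numpy as np

# Hypothetical contrast example: Var(q' betahat) = q' (X'X)^{-1} q sigma^2.
# Here q picks out beta_1 - beta_2 (a made-up contrast).
rng = np.random.default_rng(2)
n, p, sigma = 80, 3, 1.0
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.0, -1.0])
q = np.array([1.0, -1.0, 0.0])

theoretical = q @ np.linalg.inv(X.T @ X) @ q * sigma**2

qb = np.empty(20000)
for i in range(qb.size):
    y = X @ beta + sigma * rng.normal(size=n)
    qb[i] = q @ np.linalg.solve(X.T @ X, X.T @ y)

print(theoretical, qb.var())  # simulated variance should be close to theoretical
```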

Â 5:38

So, first of all, let's check off these things to make sure: clearly it's an estimator, and it's unbiased.

We know that q transpose beta hat is unbiased because we know that beta hat is unbiased, and q is just another constant multiplier, so we can pull it out of the expectation. So it's unbiased.

Beta hat is linear in y, so q transpose beta hat is linear in y. When we say linear, we mean linear in y. So it's linear in y.

And then best: well, what do we mean by best? What we mean is that it has minimum variance, and we can show this very quickly.

It's quite a clever proof, and it involves all of the techniques that we've used in these last couple of lectures.

Â 6:25

So, suppose we try to find another estimator that is also linear.

Let's say k transpose times y is another linear estimator of q transpose beta, for some vector k.

Then, because this estimator is linear, the expected value of k transpose y is equal to k transpose times the expected value of y, and the expected value of y is x beta.

Â 6:56

And we want it to be unbiased, so we want that to be equal to q transpose beta.

So we know that k transpose x has to equal q transpose, because this statement has to be true for all possible betas, right?

Even though we don't know beta, we need this unbiasedness property to hold across all possible betas, and that means k transpose x has to equal q transpose.

That's the first fact that you have to keep in your back pocket here.
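To make this constraint concrete, here is a sketch of how one might construct such a competing weight vector k: take the weights that produce q transpose beta hat and add any component orthogonal to the columns of x. The numbers are made up.

```python
import numpy as np

# Sketch of the unbiasedness constraint: any weight vector k with k'X = q'
# makes k'y unbiased for q'beta. One way to build such a k (beyond the OLS
# weights) is to add a component orthogonal to the columns of X.
rng = np.random.default_rng(3)
n, p = 30, 2
X = rng.normal(size=(n, p))
q = np.array([1.0, 2.0])

k_ols = X @ np.linalg.inv(X.T @ X) @ q      # weights giving q' betahat
v = rng.normal(size=n)
v -= X @ np.linalg.solve(X.T @ X, X.T @ v)  # remove the part in the column space of X
k = k_ols + v                               # perturbed weights

print(k @ X)  # still equals q, so k'X = q' holds
```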

Â 7:42

Now consider the covariance of q transpose beta hat with k transpose y.

The q transpose is a constant, so we pull that out of the covariance on that side.

And pulling the k transpose out of the other side, I get k transpose transpose, which is k, on that side.

So this is q transpose times the covariance of beta hat with y, times k.

Â 8:05

Now, beta hat is x transpose x inverse x transpose y,

and we can pull the x transpose x inverse x transpose part out of the covariance as well.

So this is q transpose times x transpose x inverse x transpose, then the covariance of y with itself, times k.

The covariance of y with itself is just the variance of y, and we're assuming that that's I sigma squared, so I'm just going to put that as k times sigma squared.

And now, using our first fact, k transpose x equals q transpose, which means x transpose k equals q. So this works out to be q transpose x transpose x inverse q sigma squared, which is exactly the variance of q transpose beta hat.

Â 9:35

That's another fact that you need to keep in your back pocket, okay?

So we have two facts to keep in our back pocket.

Well, we're already done with the first one, that k transpose x has to equal q transpose.

The second fact, the one you have to remember now, is that the covariance between the two estimators works out to be equal to the variance of q transpose beta hat.

Â 9:59

Now, I'm going to take the variance of q transpose beta hat minus k transpose y.

By our rules for variances, that's the variance of q transpose beta hat, plus the variance of k transpose y, minus 2 times the covariance of q transpose beta hat and k transpose y.

The two covariance terms combine into twice the covariance because, in this case, these are scalars: q transpose beta hat is a scalar, and k transpose y is a scalar.

So the covariance of a and b is the covariance of b and a here, because that is true when dealing with scalars, okay?

Now remember, the covariance between the two is equal to the variance of q transpose beta hat,

Â 11:19

so this works out to be the variance of k transpose y minus the variance of q transpose beta hat.

The last thing I'd like to point out is that this quantity, by virtue of being a variance, has to be greater than or equal to 0.

So if we take this statement and argue that it has to be greater than or equal to 0, because it's equal to a variance, then what we get is that the variance of k transpose y has to be greater than or equal to the variance of q transpose beta hat.

Â 12:00

So there you have it: if you take any other linear combination of the y's that results in an unbiased estimator, its variance has to be greater than or equal to that of the obvious linear combination of beta hat.

So q transpose beta hat is the best linear unbiased estimator.
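The whole argument can be sketched numerically: build a competing linear unbiased estimator as the OLS weights plus an orthogonal perturbation, and check that its variance is at least that of q transpose beta hat. The design, contrast, and sigma here are all made up.

```python
import numpy as np

# Numerical sketch of the Gauss-Markov conclusion: any competing linear
# unbiased estimator k'y of q'beta (with k'X = q') has variance sigma^2 k'k
# at least as large as Var(q' betahat) = sigma^2 q'(X'X)^{-1} q.
rng = np.random.default_rng(4)
n, p, sigma = 40, 3, 1.0
X = rng.normal(size=(n, p))
q = np.array([0.0, 1.0, -1.0])

var_ols = q @ np.linalg.inv(X.T @ X) @ q * sigma**2

# Competitor: OLS weights plus a component orthogonal to the columns of X,
# which preserves k'X = q' and hence unbiasedness.
v = rng.normal(size=n)
v -= X @ np.linalg.solve(X.T @ X, X.T @ v)
k = X @ np.linalg.inv(X.T @ X) @ q + v

var_other = sigma**2 * (k @ k)  # Var(k'y) when Var(y) = I sigma^2
print(var_ols, var_other)       # var_other is at least var_ols
```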

I also want to make one last point: taking best in terms of minimum variance is only really meaningful if you restrict yourself to the class of unbiased estimators.

If we didn't have unbiasedness as a restriction, then we could always get minimum variance by just estimating things with a constant.

If I just estimate everything with the number 5, the number 5 has 0 variance, but it's quite biased, unless you happen to be estimating 5.

Okay, so biased estimators, particularly constants, can have 0 variance, but they are not good estimators.

So you can only do this trick, where you compare variances, if you restrict yourself to a meaningful class of estimators in terms of bias.

In this case we restricted ourselves to unbiased estimators, and then the appropriate linear combination of beta hat, we see, is the best among all linear unbiased estimators.

Okay, so that's a nifty little result, and it used all of the tools we built up for expected values, variances, and covariances.

In the next lecture, we're going to start working with the multivariate normal distribution, so we can talk about not only moments, but the full, complete set of characteristics of the distribution.
