Okay, so that's how we wind up with these intervals.

Estimate plus or minus quantile times standard error.

And that's where it comes from.
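Written out for a mean, and assuming the usual notation (sample mean $\bar{X}$, sample standard deviation $S$, and sample size $n$, none of which are named in the passage above), the interval is:

```latex
\bar{X} \pm t_{n-1,\,1-\alpha/2} \, \frac{S}{\sqrt{n}}
```

Here $t_{n-1,\,1-\alpha/2}$ is the $1-\alpha/2$ quantile of the t distribution with $n-1$ degrees of freedom, and $S/\sqrt{n}$ is the standard error of the mean.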

This interval assumes

that the data are iid normal, though it's very robust to this assumption.

You know, whenever the data is kind of roughly symmetric

and mound shaped, the t confidence interval works amazingly well.

And if you have paired observations, people before and after a

treatment, for example, you can subtract them and

then create a t confidence interval on the differences.

So often, paired observations are analyzed using

this exact confidence interval technique by taking differences.

And often, differences tend to be much more Gaussian-looking,

they tend to be nice and symmetric about zero.

And then for large degrees of freedom, the t

quantiles become the same as the standard normal quantiles.

And so this interval just converges to the same interval that you get from the CLT.
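To see that convergence numerically, here is a minimal sketch in Python using only the standard library. The helper names (`t_pdf`, `t_cdf`, `t_quantile`) are my own, and the quantile is found by crude numerical integration plus bisection rather than a library routine, so treat it as an illustration and not a production implementation.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile by bisection on [0, 100]; meant for upper-tail p such as 0.975."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # standard normal 0.975 quantile
for df in (2, 5, 10, 30, 100, 1000):
    print(f"df={df:5d}  t quantile={t_quantile(0.975, df):.4f}  normal={z:.4f}")
```

The t quantile shrinks toward the normal one as the degrees of freedom grow, which is why the two intervals become indistinguishable for large samples.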

Some more notes.

For skewed distributions, the spirit

of the t interval assumptions is violated.

You could probably show that it still works kind of okay.

And the reason is that those quantiles, the t n minus 1,

1 minus alpha over 2 quantiles, are so far out there.

You know, the t distribution is a very heavy-tailed distribution, and that

shoves those quantiles way out there, which makes the interval a lot wider.

And then it tends to work kind of conservatively

in a broad variety of settings.

But for skewed distributions, you're kind of violating

the spirit of the t interval.

And you're often better off, you know, trying

some things like taking a natural log of your

data, if it's positive, to get it to be

more Gaussian-looking before you do a t confidence interval.

And we'll spend an entire lecture on the consequences

of logging data, so you can wait for that.
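As a rough sketch of what that looks like in practice, here is a t interval computed on the log scale in Python. The sample values are hypothetical, made up purely for illustration, and the t quantile is a hard-coded tabulated value for 4 degrees of freedom rather than something computed by a library.

```python
import math
from statistics import mean, stdev

# Hypothetical positive, right-skewed measurements, made up purely for illustration.
values = [1.2, 1.5, 2.1, 3.8, 9.6]

# Tabulated 0.975 quantile of the t distribution with n - 1 = 4 degrees of freedom.
T_QUANTILE_4DF = 2.776445

logs = [math.log(v) for v in values]  # natural log; requires every value > 0
n = len(logs)
center = mean(logs)
half_width = T_QUANTILE_4DF * stdev(logs) / math.sqrt(n)

# 95% t interval for the mean of the logged data.
log_interval = (center - half_width, center + half_width)

# Exponentiating maps the interval back to the original units; it then describes
# the geometric mean rather than the arithmetic mean.
back_transformed = tuple(math.exp(b) for b in log_interval)
print(log_interval, back_transformed)
```

Note that the back-transformed interval is no longer symmetric around its center, which is part of what the later lecture on logging data deals with.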

But, you know, I would just say, for skewed distributions,

it kind of violates the intent of the t interval, so

maybe consider things like taking logs.

And also, I'd say for skewed distributions, maybe

it doesn't make as much sense to center

the interval around the mean, in the way that we're doing with this t interval.

We're centering it right around the mean.

And then, the other thing, for discrete data, like binary data,

you know, again, I bet you could do simulation

studies and show that the t interval

actually probably works okay for discrete data like binary data.

But, you know, we have a lot of techniques for

binary data that make direct use of the binary data.

And you're better off using those, for example, things based

on chi-squares or exact binomial intervals and that sort of thing.

Because, you know, you're so far from the spirit

and intent of the t interval that it's not worth using.

Regardless, the t interval is an incredibly handy tool.

And I'm sure, actually, in some of these cases, it probably works

fine, but you're so far from the assumptions at that point

that you're better off using all these other

techniques that have been developed for these other cases.

And that's enough discussion about the t

confidence interval, let's go through an example.

So maybe take a break, go have a Guinness and

we'll be back in a second. Okay, so welcome back.

So we're going to talk about Gosset's original data, which involve sleep data.

So try not to fall asleep while we're talking about it.

So, Gosset's original data appeared in this journal called Biometrika, with a k.

And Biometrika, interestingly enough, was founded by a person called Francis Galton.

So Galton was an interesting character.

If you really want to read up on another, you know,

absolutely brilliant, interesting character, read up on Galton.

He was Charles Darwin's cousin.

He invented the term and the concept of regression.

He, you know, invented the term and the concept of correlation.

And he invented lots of other things, some good, some bad.

And he was just generally a rather interesting character.

So at any rate, Biometrika was founded

by Francis Galton, and that is where Gosset's original

paper appeared and where the sleep data appeared.

So at any rate, the sleep data shows the increase

in hours slept for ten patients, on two sleeping drugs.

So, R treats the data as two groups rather

than paired, and I have to admit, I haven't

taken the time to go through and figure out

exactly why there's a discrepancy between

Gosset's Biometrika paper, which treats the data as

paired, and R, which treats it as two groups.

Anyway, I haven't gone through the details,

so I'm going to treat it as paired, exactly like Gosset did.

So here is what it looks like as Gosset's data.

So we have patient one, two, up to ten. We have the two drugs and the difference.
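For anyone following along without the slide, here is a minimal sketch of the paired t interval on the differences, written in Python with only the standard library. The values are my transcription of R's `sleep` dataset, so check them against the dataset itself, and the t quantile is a hard-coded tabulated value for 9 degrees of freedom rather than something computed here.

```python
import math
from statistics import mean, stdev

# Extra hours of sleep for the ten patients under each drug, as transcribed
# from R's `sleep` dataset (an assumption; check against the dataset itself).
drug1 = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug2 = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

# Paired analysis: one difference per patient (drug 2 minus drug 1).
diffs = [b - a for a, b in zip(drug1, drug2)]
n = len(diffs)

# Tabulated 0.975 quantile of the t distribution with n - 1 = 9 degrees of freedom.
T_QUANTILE_9DF = 2.262157

mean_diff = mean(diffs)
std_err = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference

# Estimate plus or minus quantile times standard error.
lower = mean_diff - T_QUANTILE_9DF * std_err
upper = mean_diff + T_QUANTILE_9DF * std_err
print(f"mean difference {mean_diff:.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

With these values the mean difference is 1.58 hours and the 95% interval runs from roughly 0.70 to 2.46 hours.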