Now in this next lecture section,

we're going to concern ourselves with summarizing binary data.

Binary data are data that take on only one of two values,

essentially yes or no.

Does a person have a certain disease, yes or no?

Does the subject have a certain characteristic, yes or no?

Do they engage in a certain behavior like smoking, yes or no?

Numerically, we tend to assign a value of one

to observations with a yes outcome and a value of zero to those with a no.

What we're going to see, when summarizing binary outcomes in any single sample,

is that it's going to be a lot easier than when we were summarizing continuous data.

When we summarized continuous data, we had different numbers

representing different characteristics of the sample: numbers for center,

numbers for spread, numbers for other locations.

We're going to see that all that information is contained in

one single summary statistic for any single sample of binary data,

and the summary statistic is the sample proportion, which we'll show is essentially

the mean of values that can only take on values of zero or one.
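
As a minimal sketch of that idea, assuming a small made-up sample coded as ones and zeros (the data and the function name `sample_proportion` are hypothetical, purely for illustration), the sample proportion is just the mean of the 0/1 values. The sketch also checks that the spread of 0/1 data, measured as a variance with divisor n, is determined by that same proportion.

```python
# Hypothetical sample of 10 subjects: 1 = yes (has the characteristic), 0 = no.
outcomes = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

def sample_proportion(values):
    """Sample proportion, computed as the mean of values coded 0 or 1."""
    return sum(values) / len(values)

p_hat = sample_proportion(outcomes)

# Variance of the 0/1 values using divisor n (not n - 1).
variance_n = sum((x - p_hat) ** 2 for x in outcomes) / len(outcomes)

print("sample proportion:", p_hat)
print("variance (divisor n):", variance_n)
print("p_hat * (1 - p_hat):", p_hat * (1 - p_hat))
```

In this made-up sample, four of the ten values are one, so the mean of the values and the proportion of yes outcomes are the same number.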

So, you might be deceived into thinking that binary data

is easier to deal with than continuous data,

but when we start comparing binary outcomes between

two or more samples, we're going to see that the situation gets more difficult.

When we're comparing two samples, we have only two numbers to work with:

the proportion with the outcome in one sample and

the proportion with the outcome in the second sample.

There are several different ways we can compare these numbers,

and while all give the same general result,

they can look very different numerically.

So, one of the things we'll do is compare the proportions on the absolute scale.

We'll take the difference of the two proportions.

Another measure will have us take the two proportions we've computed from the two samples,

and instead of taking the difference,

we'll take the ratio.

We'll see that, while both of these measures will

agree in terms of the direction of association,

they can look very different numerically,

and we'll talk about situations where one is reported and the other

is not, which can be misleading about what the story really is.
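
To make the contrast concrete, here is a rough sketch using two made-up proportions (the values 0.002 and 0.001 and the function names are assumptions for illustration, not numbers from any study). The difference and the ratio point in the same direction, but they look very different numerically.

```python
# Hypothetical proportions with the outcome in two samples.
p1 = 0.002  # sample 1: 2 in 1,000
p2 = 0.001  # sample 2: 1 in 1,000

def difference_in_proportions(a, b):
    """Absolute comparison: the difference of the two proportions."""
    return a - b

def ratio_of_proportions(a, b):
    """Relative comparison: the ratio of the two proportions (b must be nonzero)."""
    return a / b

diff = difference_in_proportions(p1, p2)
ratio = ratio_of_proportions(p1, p2)

print("difference:", diff)  # a small number on the absolute scale
print("ratio:", ratio)      # a doubling on the relative scale
```

Both measures say the outcome is more common in the first sample, but a reader who sees only the ratio might come away with a very different impression than one who sees only the difference.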

Then finally, we'll talk about a third relative comparison measure,

something that's seen in a lot of epidemiological studies, called the odds ratio. While

that really isn't the favored measure of association

except for a very specific type of study called case-control studies,

it will still arise in work we do later in the course,

so I'd like to define it here as it syncs up with

these other measures in terms of summarizing binary data.
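
As a preview, here is a minimal sketch of the odds ratio using the standard definition, where the odds of an outcome with proportion p are p / (1 - p). The two proportions and the function names are hypothetical, chosen only to show how the odds ratio can differ from the ratio of proportions computed on the same two samples.

```python
# Hypothetical proportions with the outcome in two samples.
p1 = 0.5
p2 = 0.25

def odds(p):
    """Odds of the outcome for a proportion p (requires 0 <= p < 1)."""
    return p / (1 - p)

def odds_ratio(a, b):
    """Ratio of the odds in sample 1 to the odds in sample 2."""
    return odds(a) / odds(b)

ratio_of_proportions = p1 / p2
or_value = odds_ratio(p1, p2)

print("ratio of proportions:", ratio_of_proportions)
print("odds ratio:", or_value)
```

With these made-up values, the two relative measures agree on direction but not on magnitude.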

So, I look forward to doing this with you. Onward and upward.