A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics.

From the Johns Hopkins University course

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

From the lesson

Module 2B: Summarization and Measurement

Module 2B includes a single lecture set on summarizing binary outcomes. While at first summarization of binary outcomes may seem simpler than that of continuous outcomes, things get more complicated with group comparisons. Included in the module are examples of and comparisons between risk differences, relative risks, and odds ratios. Please see the posted learning objectives for this module for more details.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Okay.

In this section, we will actually take an opportunity to review some of

the ideas we did in lecture sections 4a through 4d dealing with binary data.

And what I'll ask you to do, as I do with

many of these recap exercises, is listen to me lay out the exercise.

Then pause playback, work it out on your own,

and then resume and I'll go through my solutions.

And there are two exercises in this set.

So to start, let's look at the results of

a journal article from the American Journal of Public Health.

Data from the 1999 National Survey of America's Families were

used to create three subgroups of immigrant children:

U.S.-born children with non-citizen parents, foreign-born children who were

naturalized U.S. citizens, and foreign-born children with non-citizen parents.

And what they wanted to do was compare certain

health care indicators across these different immigration statuses.

And they had a fourth group as well, US born children with citizen parents.

So let's take a look at the results, their findings.

So here's a table that lays out some

of the characteristics by the four immigrant status groupings.

And I'll just circle this across the top so that you can see:

amongst children who are U.S.-born, they

broke them into two categories, those with citizen

parents and those with non-citizen parents, and

amongst the children who were foreign-born, they broke

them into two categories, those

who were citizens and those who were non-citizens.

So we have four different groupings here, and what they do is

compare the percentage who have different health indicators across the four groups.

So we're going to focus in; I'll blow up the table here.

We'll focus in

on health insurance coverage and healthcare access, and

we're going to focus in even more closely than that.

And we're going to look at this indicator of whether or not anyone in the family

lacked medical insurance at any time in the 12 months before the survey date.

Okay, so for example, amongst children who were U.S.-born to citizen parents,

about 15% of them reported having no health insurance

at least at some point in the year prior to the survey.

If we look at the group of foreign-born children who are

not citizens, that proportion was a lot higher, about 52.3%.

So what I'd like you to do is first compute the risk differences,

relative risks, and odds ratios for lacking medical insurance in the past twelve

months, using U.S.-born children with citizen parents

as the reference group.

And I'd like you to compute these measures of association for

the three other groups, each relative to this same reference group.

I'd like you to then interpret in words, you may want to write

it down or you may want to say it out loud or

whatever works for you, the risk difference, relative risk, and odds

ratio estimates comparing the foreign born

non-citizen children to the reference group.

Try and get a sense of how to interpret those in the context of what we're doing.

Then I'd like you to compute the natural log of

the relative risk for each of the three comparisons you're making.

Finally, I'd like you to make

foreign born, non-citizen children the reference group.

And report the risk difference, relative risk, odds ratio, and

log relative risk comparing U.S.-born children with citizen parents to this group.

Okay, let's review the results of these exercises,

and hopefully they were helpful and interesting to you.

So first I'm going to do the first three of the four questions all at once.

So I'm going to write this out in tabular format so we can keep track of it.

So we are dealing with four

different immigration classification groups of children.

The children who are born in the U.S., and there are

two subgroups there, those born to citizen parents and those

born to parents who are not citizens, and then children who are foreign-born.

And some of those children are citizens, and some of them are not.

So we have these four different groups.

And let's just remind ourselves of what

the estimated proportions of children who did not

have health insurance for at least some time in the year prior to the survey are,

so that we can get this going. So it was 15.34% in our reference group

here, the U.S.-born to citizen parents. In the U.S.-born to non-citizen parents,

it was notably higher at 34.37%. And I'll just fill this out here.

This is a recap of the data.

So now, if we were to actually do our measures of association, if we were to

actually do a risk difference here, our relative risk and our odds ratio.

Let's see what we get here.

So, there's no values for

the reference group. That's the group we're comparing to.

You could also put in a risk difference of zero

because the difference between this group's proportion and itself is zero.

You could put in a relative risk of one because the ratio of the

risk of lacking health insurance in the previous year for children in this

group to that for children in this same group is one, but I'm just going to

put dashes in to indicate that this is the reference group.

So, I'm not going to go through all the details mathematically; I'll just fill

out this table.

If you actually look at the risk difference for the

U.S.-born children of non-citizen parents compared to the reference, it is

an absolute percentage difference of, well, since they

used the decimals, I'm using them here; I would generally round myself.

But it's 19.03% higher in absolute terms,

that is, 19.03 percentage points higher on the absolute scale.

If you do the relative risk,

it's 2.24, and the odds ratio is

higher, about 2.9. Let's do it for this next group, children

who are citizens and were born outside of the U.S., okay.

And if we actually look at this risk difference, these groups had a lower risk,

lower estimated risk of having been without

health insurance in the prior year compared

to the reference so the risk difference

is negative, the resulting relative risk is 0.84.

16% lower risk on the relative scale.

That's consistent with a negative risk difference.

Both indicate that this group had a lower risk of the outcome than the other group.

The odds ratio comes in at 0.81.

The odds for this group relative to the reference

group are 0.81 times as large, or 19% lower.

And if we do the same thing for the last group, foreign-born children who

are not citizens, the risk difference for this group compared to the reference

is large: 36.96, almost a 37 percentage point difference.

Okay?

The relative risk is consistent with that, in

that it indicates a higher risk

in this group compared to the reference: it is 3.41.

And then the odds ratio, which is striking, is 6.05.
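
For readers who want to check the table, here is a minimal Python sketch of the arithmetic, using only the proportions quoted in this recap (15.34%, 34.37%, and 52.3%). The names `odds` and `compare_to_reference` are illustrative choices, not part of the course materials, and the foreign-born citizen group is left out because its proportion is not stated in the transcript.

```python
def odds(p):
    """Convert a proportion (risk) into odds."""
    return p / (1 - p)


def compare_to_reference(p_group, p_ref):
    """Return (risk difference, relative risk, odds ratio) for group vs. reference."""
    risk_difference = p_group - p_ref
    relative_risk = p_group / p_ref
    odds_ratio = odds(p_group) / odds(p_ref)
    return risk_difference, relative_risk, odds_ratio


# Reference group: U.S.-born children with citizen parents.
p_ref = 0.1534

# Proportions lacking insurance, as quoted in the lecture.
groups = {
    "US-born, non-citizen parents": 0.3437,
    "Foreign-born, non-citizen": 0.5230,
}

results = {name: compare_to_reference(p, p_ref) for name, p in groups.items()}

for name, (rd, rr, or_) in results.items():
    print(f"{name}: RD = {rd * 100:.2f} points, RR = {rr:.2f}, OR = {or_:.2f}")
```

Small differences in the last decimal place relative to the spoken values can arise from rounding the inputs.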

So, I think these data here remind us, in a

nice way, that the odds ratio and relative risk, while complementing

each other in terms of direction of association, will not always be equal

in magnitude, and here we see somewhat of a stark difference.

Okay? Then

I ask you to actually look at the natural logs of these things.

Just gearing us up for this idea that we'll sometimes be taking ratios to the

log scale and doing some calculations before

bringing them back to the scale of interest.

Or alternatively displaying them in the log scale because of

the equalized range of values for positive and negative association.

So I'm going to insert this column here. This is the log of the risk ratio column.

And using

your calculator, computer, or phone, you can probably do this relatively

easily. But the log of 2.24, the natural log is

0.81, and the log of 0.84 is negative 0.17.

And the log of 3.41 is about 1.23.

So you can see here when the risk is higher

for the group in the numerator than the

denominator, the relative risks are greater than 1,

so are the odds ratios and the log

relative risks are greater than 0, they're positive.

However, when we have a situation in the other direction, where

the risk in the numerator is smaller than the risk

in the denominator, such that the relative risk is less than

1, the log of the relative risk is less than 0.
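
The sign pattern just described can be checked with a short Python sketch. It takes the three rounded relative risks from the lecture as inputs; the dictionary labels are illustrative only.

```python
import math

# Rounded relative risks from the lecture, each versus U.S.-born children
# with citizen parents.
relative_risks = {
    "US-born, non-citizen parents": 2.24,
    "Foreign-born citizens": 0.84,
    "Foreign-born non-citizens": 3.41,
}

# Natural log of each relative risk.
log_rr = {name: math.log(rr) for name, rr in relative_risks.items()}

for name, value in log_rr.items():
    direction = "positive (RR > 1)" if value > 0 else "negative (RR < 1)"
    print(f"{name}: ln(RR) = {value:.2f}, {direction}")
```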

Okay.

So one thing I should've mentioned before, when we were breaking out these numbers,

and that I actually asked you to do,

is: how would you interpret these risk differences?

It's a little bit different here than

the context that I've shown you before, because this

is not evaluating the potential efficacy

of a treatment on a large group of people.

We wouldn't necessarily assign people to these

different statuses and look for different outcomes.

But these do give us some sense of the magnitude of the burden that each

of these groups has in terms of not

having health insurance at least at certain points.

And we can compare, perhaps, different cities in the U.S. with

different immigrant population distributions to

get an estimate for a city

of 500,000 where the majority of immigrants are one type, compared

to a city of 500,000 where the majority are another type.

It could give us some estimate on what the difference in the numbers between the

two cities was, in terms of those who had insurance issues in the previous year.
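
Purely as an illustration of this scaling idea, the sketch below makes a much cruder assumption than the "majority" scenario described here: two hypothetical populations of 500,000 children, each made up entirely of one group. The population size comes from the example; the all-one-group simplification and the variable names are assumptions.

```python
# Hypothetical: 500,000 children in each of two populations, one consisting
# only of U.S.-born children with citizen parents, the other only of
# foreign-born non-citizen children. Real cities would be mixtures.
population = 500_000

p_us_born_citizen_parents = 0.1534
p_foreign_born_non_citizen = 0.5230

# The risk difference scales directly to an expected difference in counts.
risk_difference = p_foreign_born_non_citizen - p_us_born_citizen_parents
expected_extra = risk_difference * population

print(f"Expected additional children lacking insurance: {expected_extra:,.0f}")
```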

But what it does in this type of study, which

is observational and where we're not able to control the exposure

experimentally, is still give us some information about

the order and magnitude of the burden on the different groups.

For the last question, I actually asked

you to do a similar comparison but reverse the direction.

So here, instead of

making the reference group U.S.-born children of citizen

parents, I asked you to

switch it to foreign-born children who are not citizens.

And I wanted you to compare the previous reference

group, U.S.-born children of citizen parents, to this new

reference group, because the direction of comparison is arbitrary and I just wanted

to see how it shakes out numerically. So now, we are comparing

US born children to citizen parents

to foreign-born

children who are not citizens. Okay, and I'll let you

go back and verify the numbers that go into this.

But let's just do this.

If we actually look at the risk difference here, it turns out to be negative 36.96.

It's the opposite of the risk difference we

had before, which was in the opposite direction.

If you actually look at the relative risk, it's the reciprocal of the previous

relative risk of 3.41, it's 0.29. And if you look at the odds ratio,

the reciprocal of the previous odds ratio, 1 over 6.05 is 0.17.
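
A quick way to verify these reversed numbers is sketched below in Python, starting from the two proportions quoted earlier (15.34% and 52.3%). The variable names are illustrative.

```python
def odds(p):
    """Convert a proportion (risk) into odds."""
    return p / (1 - p)


p_us_citizen_parents = 0.1534   # previous reference group
p_foreign_non_citizen = 0.5230  # new reference group

# Original direction: foreign-born non-citizens vs. U.S.-born, citizen parents.
rd_forward = p_foreign_non_citizen - p_us_citizen_parents
rr_forward = p_foreign_non_citizen / p_us_citizen_parents
or_forward = odds(p_foreign_non_citizen) / odds(p_us_citizen_parents)

# Reversed direction: U.S.-born, citizen parents vs. foreign-born non-citizens.
rd_reverse = p_us_citizen_parents - p_foreign_non_citizen
rr_reverse = p_us_citizen_parents / p_foreign_non_citizen
or_reverse = odds(p_us_citizen_parents) / odds(p_foreign_non_citizen)

print(f"RD: {rd_forward * 100:.2f} vs {rd_reverse * 100:.2f} (sign flips)")
print(f"RR: {rr_forward:.2f} vs {rr_reverse:.2f} (reciprocals)")
print(f"OR: {or_forward:.2f} vs {or_reverse:.2f} (reciprocals)")
```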

I want you to think about that: how do

you express these numbers, and what would this mean?

This would mean in this direction of comparison that

those U.S. born, if we're looking at the relative

risk, the U.S. born children to citizen parents have

0.29 times the risk of having been without health

insurance at any point in the previous year, as

those who are foreign born and were not citizens.

In other words, they had 71% less risk.

How does that compare to the increase when we

reverse the direction? I want you to think about that.

If we come in though and look at the log of this relative risk,

you'll notice that it's just the opposite of what it was before.

Before it was about 1.23; when we reversed directions, it's the negative of that.

So the effect size, the absolute value of this effect is the same on the log scale.

But it isn't the same on the ratio scale.
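
To make the asymmetry concrete, the following Python sketch compares the percent change implied by the relative risk in each direction with the log relative risk in each direction. It starts from the rounded relative risk of 3.41; the variable names are illustrative.

```python
import math

rr_forward = 3.41            # foreign-born non-citizens vs. reference
rr_reverse = 1 / rr_forward  # same comparison, direction reversed

# On the ratio scale the two directions look very different in size.
pct_forward = (rr_forward - 1) * 100  # percent higher risk
pct_reverse = (rr_reverse - 1) * 100  # percent lower risk (negative)

# On the log scale they differ only in sign.
log_forward = math.log(rr_forward)
log_reverse = math.log(rr_reverse)

print(f"Ratio scale: {pct_forward:.0f}% higher vs {abs(pct_reverse):.0f}% lower")
print(f"Log scale:   {log_forward:.2f} vs {log_reverse:.2f}")
```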

Okay, so let's look at another example, and

again, I'll encourage you to take this in,

write some things down, and then do it on your own, and come back for the recap.

So this is a classic right?

Many of you have read the saga of this. This is an amazing story actually.

It's an amazing testament to the power of well planned research.

This is one of the biggest clinical trials that had

been performed in the U.S. up to modern times now.

And it was done without the aid of computers

et cetera.

It's really an amazing story for many reasons.

But this is the polio vaccine trials that

were conducted in the U.S. in the early 1950s.

Okay, and at the time, over 400,000 school-age children

were randomized to receive polio vaccine or placebo.

Can you imagine that, in terms of the

organization that required at the national level in the '50s?

So,

of the 200,745 children who were vaccinated,

there were 82 polio cases. Of the 201,229

children who were not vaccinated, there were 162 polio cases.

So, what I'd like you to do is

compute the risk difference, relative risk, log relative risk,

and odds ratio of getting polio for the children

who were vaccinated compared to those in the placebo group.

Now I'd like you to interpret the risk

difference, relative risk, and odds ratio estimates in words.

How, in this situation,

do the estimated relative risk and odds ratios compare in value?

And then I'd like you to repeat question one, but just change

the direction of comparison, taking the placebo group compared to the vaccine group.

Okay, welcome back.

Hopefully you had fun analyzing data from one of the greatest public health

experiments ever conducted, as it was

called by Paul Meier, a famous statistician.

Okay, so let's look at the results here.

So if we were looking at the vaccine group, let's just write this out.

In the vaccine group, the proportion, or risk, of

contracting polio for those who were vaccinated was the 82

cases we saw out of the 200,000-plus

children who were randomized to that group.

And it turns out that's a very

small proportion, as you probably figured out: 0.00041.

Another way to say this is that this is 0.04%.

0.04% of the children in the vaccinated group contracted polio.

If we do the same thing for the placebo group,

just for space reasons, I'll cut to the

chase: this was 0.00081,

or 0.08%.

Okay? So, that's how the results came together.
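
These two proportions can be reproduced directly from the counts given in the exercise. The Python sketch below does only that; the variable names are illustrative.

```python
# Counts from the exercise: polio cases and children per trial arm.
vaccine_cases, vaccine_n = 82, 200_745
placebo_cases, placebo_n = 162, 201_229

# Estimated risk (proportion) of contracting polio in each arm.
risk_vaccine = vaccine_cases / vaccine_n
risk_placebo = placebo_cases / placebo_n

print(f"Vaccine: {risk_vaccine:.5f} ({risk_vaccine:.2%})")
print(f"Placebo: {risk_placebo:.5f} ({risk_placebo:.2%})")
```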

Let's compute our measures of association. So if we were to compute a risk difference

for this, comparing vaccinated to placebo, it's going to look quite small,

right? It's the 0.00041 minus

0.00081, sorry for the handwriting

here, but it's about negative 0.0004,

or negative 0.04%. That looks

really small, right? Why is that so small?

Well, it's because the baseline risk of polio numerically,

the risk among those who are untreated is small, numerically.

That doesn't take away from the fact that it was a devastating

disease and well worth investigating ways to reduce the risk of it.

But numerically speaking, it's small.

So how would this translate into

the effect on a large number of children?

Well, if you do the math you can see that

this basically suggests, supposing the vaccine was the driver here, that

we could prevent about 40 cases of polio per 100,000

children vaccinated, compared to what would happen if they weren't vaccinated.

Not a large difference numerically, but certainly very substantial in terms

of the co-morbidities brought on by the disease.
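
The "about 40 per 100,000" figure follows from scaling the risk difference. A minimal Python sketch, computed from the raw trial counts (variable names are illustrative):

```python
# Counts from the exercise: polio cases and children per trial arm.
risk_vaccine = 82 / 200_745
risk_placebo = 162 / 201_229

# Risk difference, vaccinated minus placebo (negative: lower risk if vaccinated).
risk_difference = risk_vaccine - risk_placebo

# Scale the absolute difference to a population of 100,000 children.
cases_prevented_per_100k = -risk_difference * 100_000

print(f"Risk difference: {risk_difference:.5f}")
print(f"Cases prevented per 100,000 vaccinated: {cases_prevented_per_100k:.1f}")
```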

If we do the relative risk, however, it's going to look quite different.

You know, that's the 0.00041 divided by

0.00081, and that's roughly equal to 0.5,

slightly higher, 0.51.

So now, if we were to report this, we'd say those in the vaccination group had

roughly half the risk of contracting polio compared to those in the placebo group.

They had 50% of the risk, 51% of the risk of those who weren't treated,

or 49% less risk on the relative scale. 49% less sounds a lot more

dramatic than .04% less, but that's because

the comparison we're making here is relative.

If you didn't know the baseline, the risk difference, or the starting

proportions, it would be hard to determine the actual impact on the population.

If we do the odds ratio here, it's also very close to 0.51.

And then, if we

look at the log of this relative risk, the log of 0.51 is equal to negative 0.67.
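
As a check on these ratio measures, the Python sketch below recomputes them from the raw trial counts. Note one assumption about rounding: the lecture takes the log of the rounded value 0.51, while the log of the unrounded relative risk comes out slightly more negative. Variable names are illustrative.

```python
import math

vaccine_cases, vaccine_n = 82, 200_745
placebo_cases, placebo_n = 162, 201_229

risk_vaccine = vaccine_cases / vaccine_n
risk_placebo = placebo_cases / placebo_n

# Relative risk: vaccinated vs. placebo.
relative_risk = risk_vaccine / risk_placebo

# Odds are cases divided by non-cases in each arm.
odds_vaccine = vaccine_cases / (vaccine_n - vaccine_cases)
odds_placebo = placebo_cases / (placebo_n - placebo_cases)
odds_ratio = odds_vaccine / odds_placebo

log_rr_rounded = math.log(0.51)             # log of the rounded value
log_rr_unrounded = math.log(relative_risk)  # log of the unrounded value

print(f"RR = {relative_risk:.4f}, OR = {odds_ratio:.4f}")
print(f"ln(0.51) = {log_rr_rounded:.2f}, ln(RR unrounded) = {log_rr_unrounded:.2f}")
```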

So why do you think the odds ratio

and relative risk are effectively identical in this situation?

Well, we didn't see the same situation before when we

were looking at that immigrant status and healthcare outcomes data.

But here, the underlying risk or proportion of

the outcome across both groups is numerically very low.

And we said that when the event is rare, that is, when its risk is

low, the odds ratio and relative risk will be similar in value.

So that jibes with that.
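
One way to see why this happens: algebraically, the odds ratio equals the relative risk multiplied by (1 - reference risk) / (1 - group risk), and that factor is close to 1 when both risks are small. The Python sketch below evaluates the factor for the two examples in this section; the function name `or_over_rr` is a hypothetical helper, not something from the course.

```python
def or_over_rr(p_group, p_ref):
    """Factor by which the odds ratio differs from the relative risk.

    Follows from OR = [p_group / (1 - p_group)] / [p_ref / (1 - p_ref)]
                    = (p_group / p_ref) * (1 - p_ref) / (1 - p_group).
    """
    return (1 - p_ref) / (1 - p_group)


# Polio trial: vaccinated vs. placebo, both risks well under 0.1%.
polio_factor = or_over_rr(82 / 200_745, 162 / 201_229)

# Insurance example: foreign-born non-citizens (52.3%) vs. reference (15.34%).
insurance_factor = or_over_rr(0.5230, 0.1534)

print(f"Polio: OR / RR = {polio_factor:.4f}")
print(f"Insurance: OR / RR = {insurance_factor:.2f}")
```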

Okay, so let's look at the results if we had compared

the placebo group to the vaccine group, the opposite direction.

So if you did this, you probably found quickly that the risk

difference was the same in absolute value, but it was positive now.

And that makes some sense, right?

The interpretation here would be that, for the risk of

getting polio untreated compared to vaccinated, on the absolute scale,

the percentage point difference is on the order of 0.04%, or 0.0004

as a decimal.

Another way to think about that is, we'd expect 40 more cases of polio

out of 100,000 children, if they were

not treated compared with if they were vaccinated.

If we look at the relative risk, it's just

the reciprocal of the relative risk in the other direction.

Which turns out to be almost 2, 1.96. So we could say that the placebo

group had 1.96 times the risk of contracting polio

compared to the vaccine group, or 96% higher risk.

Now, 96% higher risk for the placebo group compared to

the vaccine group sounds different than the vaccine group having

49% lower risk, but that's because the basis

for comparison is different, and because of that property

of ratios such that the scaling is different

for negative associations than it is for positive.

However, if we look at things on the

log scale, the log of 1.96 is approximately, now if

you round you might get something slightly different, but

the log of this is approximately equal to 0.67,

which is exactly the opposite of the log relative risk in the other direction.

So on the log scale,

these have the same effect size, if you will, same association size,

they're just the opposite sign because the direction of comparison is different.

And since the odds ratio was very close to the relative

risk before, or identical, it will be here in this situation as well.
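
The remark about rounding can be illustrated with a short Python sketch: taking the reciprocal of the rounded 0.51 gives the 1.96 used in the lecture, while working from the raw counts gives a slightly different value. Either way, the logs in the two directions are exact negatives of each other. Variable names are illustrative.

```python
import math

risk_vaccine = 82 / 200_745
risk_placebo = 162 / 201_229

# Placebo vs. vaccine, from the raw counts.
rr_unrounded = risk_placebo / risk_vaccine

# Placebo vs. vaccine, as the reciprocal of the rounded 0.51.
rr_from_rounded = 1 / 0.51

# Logs in the two directions cancel exactly.
log_placebo_vs_vaccine = math.log(rr_unrounded)
log_vaccine_vs_placebo = math.log(risk_vaccine / risk_placebo)

print(f"Reciprocal of rounded 0.51: {rr_from_rounded:.2f}")
print(f"From raw counts:            {rr_unrounded:.2f}")
print(f"Sum of the two logs:        {log_placebo_vs_vaccine + log_vaccine_vs_placebo:.1e}")
```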

So, hopefully this is giving you an opportunity to actually get your hands

around some of the different kinds of comparisons we can make with binary data.

In the next

section, we'll be dealing with ratios once again.

We'll remove the risk difference component,

or the difference component, and work

solely on the ratio scale, when we're dealing with time to event data.

And some of these ideas we've

opened up about ratios will come into play again.
