So, in this section, we're going to look at estimating group odds and proportions using multiple logistic regression, and also at odds ratios for groups who differ in more than one predictor.

After viewing this section, you will be able to estimate the odds of a binary outcome for a single group, based on a specific set of predictor values (x values), using multiple logistic regression.

This odds estimate can then be converted to an estimated proportion, or probability, of the group having the outcome.

You'll also be able to estimate an odds ratio comparing the odds of a binary outcome for two groups who differ in more than one predictor.

So, to remind you about the process we used for simple logistic regression: we're just going to extend it to multiple logistic regression, and it'll be a smooth transition. It's the same idea: we transform the log odds estimates we get from the multiple logistic regression equation into estimated proportions or probabilities.

So, take a multiple logistic regression model of the form we've been looking at, where the log odds that y equals one is equal to a linear combination of the intercept and the slopes times the x's.

If you give me a specific set of x values, I can plug those into this equation and get a single value when I add up the intercept plus each slope multiplied by its respective x. That single number is the estimated log odds of the binary outcome occurring, the log odds that y equals one.

So, I can take that number, that log odds, and exponentiate it to convert it to the odds scale. That gives me the estimated odds of the binary outcome occurring.

To get from the odds to the estimated proportion or probability, the linking formula I use is the one we used before: the estimated probability, p hat, equals the odds over one plus the odds. Again, we define the odds as p hat over one minus p hat; if you solve that backwards for p hat, you get p hat equals the odds over one plus the odds.
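The conversions between the log odds, odds, and probability scales can be sketched in a few lines of Python. This is a minimal illustration; the function names are hypothetical and not taken from any particular statistics package.

```python
import math

def odds_from_log_odds(log_odds: float) -> float:
    """Exponentiate a log odds to move it to the odds scale."""
    return math.exp(log_odds)

def prob_from_odds(odds: float) -> float:
    """p-hat = odds / (1 + odds)."""
    return odds / (1.0 + odds)

def odds_from_prob(p: float) -> float:
    """odds = p-hat / (1 - p-hat), the inverse of prob_from_odds."""
    return p / (1.0 - p)

# Round trip: probability -> odds -> probability recovers the starting value.
p = 0.25
odds = odds_from_prob(p)               # 0.25 / 0.75, i.e. one third
print(round(odds, 4))
print(round(prob_from_odds(odds), 4))
```

Solving odds = p / (1 - p) for p is exactly what `prob_from_odds` encodes, so the two functions undo each other.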

So, let's look at predictors of obesity. Let's look at predicting the proportion of obese individuals as a function of the second multiple logistic regression model, which takes into account sex, HDL level, and age quartile.

So, the resulting regression equation for model two is as follows: the log odds of obesity equals the intercept of 0.87, plus the slope for sex of 0.78 times x_1, plus the slope for HDL, which is negative 0.044, times x_2, plus the slopes for the three non-reference age quartiles. So, beta three, beta four, and beta five are the slopes for x_3, x_4, and x_5, the indicators of age quartile two, age quartile three, and age quartile four.

You could get these numbers, if you wanted, by going back to the previous table and taking the log of the respective values, but here I give them to you.
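Written out as a single equation, the fitted model two looks like the block below. The quartile three slope (0.75) and quartile four slope (0.47) are the values used later in this section; the quartile two slope is not given in this section, so it is left symbolic as beta three hat. Here x_1 = 1 for females and 0 for males, x_2 is HDL in milligrams per deciliter, and x_3, x_4, x_5 indicate age quartiles two, three, and four (quartile one is the reference).

```latex
\ln\!\left(\frac{\hat{p}}{1-\hat{p}}\right)
  = 0.87 + 0.78\,x_1 - 0.044\,x_2 + \hat{\beta}_3\,x_3 + 0.75\,x_4 + 0.47\,x_5
```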

So, I want you to use this result to estimate the proportion of adult females with HDL levels of 75 milligrams per deciliter who are 65 years old, and hence in the fourth age quartile.

So, what we're going to do is use this equation and just plug in. We've got numbers for the intercept and slopes; we just plug in our x values.

The group is female, so their value for the first predictor, x_1, is one. So, we take that slope of 0.78 times one and add it to the intercept of 0.87.

Their HDL level is 75 milligrams per deciliter, so we take the slope for HDL of negative 0.044 times 75. And they're in the fourth age quartile, so the indicators for quartile two and quartile three are both zero, and the indicator for quartile four is one. So, we take the slope for the quartile four indicator, 0.47, multiply it by one, and add all these things up.

If you do the math, 0.87 + 0.78 - 3.30 + 0.47, you get a log odds of obesity for this group of negative 1.18.

So, if we exponentiate that, we get the odds of obesity for this group of females, which is equal to 0.307. In other words, we add these things up to get the log odds of negative 1.18, then exponentiate that to get the odds.

Then, to transform this into the estimated proportion or probability of this group being obese, we take the estimated odds over one plus the odds, 0.307 over 1.307, which turns out to be about 0.235, or about 23.5 percent.
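The whole plug-in calculation for this group can be sketched in Python. The coefficient constants and the `log_odds` helper are hypothetical names for illustration; the quartile two slope is omitted because it is not given in this section and is zero-weighted for this group anyway.

```python
import math

# Model two coefficients as quoted in this section.
INTERCEPT = 0.87
B_FEMALE = 0.78
B_HDL = -0.044
B_Q3 = 0.75
B_Q4 = 0.47

def log_odds(female: int, hdl: float, q3: int = 0, q4: int = 0) -> float:
    """Linear predictor: intercept plus each slope times its x value."""
    return INTERCEPT + B_FEMALE * female + B_HDL * hdl + B_Q3 * q3 + B_Q4 * q4

lo = log_odds(female=1, hdl=75, q4=1)   # females, HDL 75, age quartile four
odds = math.exp(lo)                     # move to the odds scale
p_hat = odds / (1 + odds)               # move to the probability scale
print(f"log odds = {lo:.2f}, odds = {odds:.3f}, p-hat = {p_hat:.3f}")
# prints: log odds = -1.18, odds = 0.307, p-hat = 0.235
```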

So, now we've got an absolute probability value to understand where we stand in terms of the risk of obesity, at least for this single group defined by its sex, HDL, and age values.

I want to show you something, and I don't want you to be intimidated by the math. It's actually not so bad, but it looks like a lot here. I just want to show you this because a lot of times, if you're reading a journal article, the authors won't present things on the log or regression scale; everything will be exponentiated, in odds and odds ratio form.

You could take the log of those values to recreate the equation, and if you're interested in estimating different proportions or probabilities from the published results, do what we just did.

Here's what I want to show you. Suppose that, instead of summing the log odds of obesity to get negative 1.18, I took e to the sum with its components still written out, rather than combining them into e to the negative 1.18. It would be e to the first element of the sum, 0.87, times e to the 0.78, times e to the negative 0.044 times 75, times e to the 0.47. This entire product would give us the odds of obesity for this group.

Another way to write this out is e to the 0.87 times e to the 0.78, and then, to disentangle the piece e to the negative 0.044 times 75, we can rewrite it as e to the negative 0.044, raised to the 75th power, times e to the 0.47.
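The rewriting uses two exponent rules, e^(a+b) = e^a e^b and e^(ca) = (e^a)^c. Applied to this group's log odds, it gives:

```latex
e^{\,0.87 + 0.78 - 0.044(75) + 0.47}
  = e^{0.87}\, e^{0.78}\, \left(e^{-0.044}\right)^{75}\, e^{0.47}
  = e^{-1.18} \approx 0.307
```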

So, why would I do this? Well, I'll show you again explicitly in a minute, but this e to the 0.87 is the exponentiated intercept from the model. The next term is the exponentiated slope for sex, so it's the adjusted odds ratio for sex. The term in parentheses is the exponentiated slope for HDL, so it's the adjusted odds ratio for HDL, which we raise to the 75th power. And the last term is the exponentiated slope for age quartile four, so it's the adjusted odds ratio for age quartile four.

So, we can actually compute these odds directly on the odds and odds ratio scale, without going back to the log scale, when we are pulling the results from a published paper that already presented them in exponentiated form. For example, if we were looking at a table in a published paper that presented things on the odds and odds ratio scale, we could do this right from those exponentiated components if we wanted to.

Now, I'll be honest: if I wanted to make computations, I would take the time to rewrite this on the log scale, do the addition, and then exponentiate the result, because I find that more comforting and intuitive. But I'm just illustrating that if you did have things exponentiated to start with, you wouldn't necessarily have to go back to the log scale.

So, the exponentiated intercept is the baseline odds of 2.38. We multiply it by the adjusted odds ratio for females of 2.18, take the adjusted odds ratio for HDL of 0.957 and raise it to the 75th power, because we're evaluating it for persons with an HDL of 75, and then multiply by the adjusted odds ratio for age quartile four (e to the 0.47, about 1.60). If we do this, we get the same odds, 0.307 (up to rounding), that we got from adding things first and exponentiating, and we'll end up with the same estimated probability or proportion.
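Working only from exponentiated values, rounded the way a published table might report them, the same odds can be sketched as a product. The constant names are illustrative; the 1.60 for quartile four is e to the 0.47 rounded to two decimals. Because every input is rounded, the result matches the log-scale calculation only to about three decimals.

```python
# Odds for females with HDL 75 in age quartile four, computed only from
# exponentiated (odds and odds ratio) values, rounded as in a table.
BASELINE_ODDS = 2.38   # exp(intercept)
OR_FEMALE = 2.18       # exp(0.78), females vs males
OR_HDL = 0.957         # exp(-0.044), per 1 mg/dL of HDL
OR_Q4 = 1.60           # exp(0.47), age quartile four vs quartile one

odds = BASELINE_ODDS * OR_FEMALE * OR_HDL ** 75 * OR_Q4
p_hat = odds / (1 + odds)
print(f"odds = {odds:.3f}, p-hat = {p_hat:.3f}")
```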

I just want to make you aware that you could do this if you wanted to.

In terms of presenting things on the probability or proportion scale, sometimes a publication will include a graphic showing predicted values or probabilities for some or all groups defined by specific predictor values.

So, one way to present these proportions, and give some absolute context to things that we've only measured on a relative scale with the odds ratios presented in the original table, would be to look at the estimated proportion or probability of being obese.

With these three predictors, one way to present it would be to plot the curves separately by sex, females on the left-hand side and males on the right, showing the estimated proportions as a function of HDL separately for each of the four age quartiles.

This is maybe a bit of a cluttered graph, but at least it gives us some context for what these relative ratios mean in terms of absolute changes.
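The values behind curves like these can be sketched by evaluating the model over a grid of HDL values. This sketch assumes an illustrative HDL grid of 20 to 100 milligrams per deciliter, and because the quartile two slope is not given in this section, it only traces quartile one (the reference), quartile three, and quartile four. Each printed row is one curve; a plotting library would draw the same numbers.

```python
import math

# Model two coefficients quoted in this section (quartile two slope omitted).
INTERCEPT, B_FEMALE, B_HDL = 0.87, 0.78, -0.044
QUARTILE_SLOPE = {1: 0.0, 3: 0.75, 4: 0.47}

def predicted_prob(female: int, hdl: float, quartile: int) -> float:
    """Estimated probability of obesity: log odds -> odds -> odds / (1 + odds)."""
    lo = INTERCEPT + B_FEMALE * female + B_HDL * hdl + QUARTILE_SLOPE[quartile]
    odds = math.exp(lo)
    return odds / (1 + odds)

hdl_grid = range(20, 101, 20)  # illustrative HDL values in mg/dL
for female, label in [(1, "female"), (0, "male")]:
    for q in sorted(QUARTILE_SLOPE):
        probs = [predicted_prob(female, h, q) for h in hdl_grid]
        print(label, f"Q{q}", " ".join(f"{p:.2f}" for p in probs))
```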

We can see that at the lower end of HDL, for those with relatively low values, the probabilities or proportions are on the order of 60 to 80 percent, but this drops relatively quickly. That roughly 4 percent decrease in the odds per one milligram per deciliter increase in HDL translates to a pretty rapid decrease in the proportions or probabilities.
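A short sketch of why a constant relative change in the odds produces different absolute changes in the probability: each extra milligram per deciliter multiplies the odds by e to the negative 0.044, about 0.957, but the resulting drop in the probability depends on where a group starts. The starting odds below are illustrative values, not taken from the model.

```python
import math

OR_HDL = math.exp(-0.044)   # adjusted odds ratio per 1 mg/dL of HDL
print(f"odds multiplier per mg/dL: {OR_HDL:.3f} ({(1 - OR_HDL) * 100:.1f}% lower odds)")

def prob(odds: float) -> float:
    """Convert odds to a probability."""
    return odds / (1 + odds)

# The same 10 mg/dL increase multiplies every group's odds by OR_HDL ** 10,
# but the absolute drop in probability differs by starting point.
for start_odds in (3.0, 1.0, 0.3):  # illustrative starting odds
    new_odds = start_odds * OR_HDL ** 10
    print(f"p: {prob(start_odds):.3f} -> {prob(new_odds):.3f}")
```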

You can see the relative ranking of the age groups in terms of who has higher starting odds versus lower; the group aged 47 to 62 years clearly has the highest starting odds of the four groups.

These are put on the same scale, so in the graph for males we can see that everything is shifted down slightly compared to females. That's because we saw that females had higher odds even after accounting for age and HDL, hence females have higher estimated probabilities or proportions.

This is a nice way to give some absolute context to these relative quantities, so that the reader can get a feel for the risk of this outcome (in this case, the proportion of persons who are obese) as a function of the predictors used in the multiple regression model.

So, let's talk about using the results to make comparisons, on the odds ratio scale, between two groups who differ by multiple characteristics used in the model.

So, again, this is our model here: x_1 was one for female and zero for male, x_2 was HDL in milligrams per deciliter, and x_3 to x_5 were the indicators of Q_2 to Q_4 for the age quartiles, where the reference, quartile one, was left out.

Let's estimate the odds ratio of obesity for females with an HDL of 75 milligrams per deciliter who are 65 years old (that's the group we just looked at), compared to males with an HDL of 80 milligrams per deciliter who are 50 years old.

So, I'm going to write these things out on the log odds scale for both groups. The first line here is what we did before when getting that probability: we plugged in our values of female, HDL of 75, and age quartile four to get an estimated log odds for that group of negative 1.18.

We do the same thing for the second group, males with an HDL of 80 milligrams per deciliter who are 50 years old. We start with the intercept; they are male, the reference sex, so their value of x_1 is zero; and we plug in 80 for HDL and multiply it by the HDL slope of negative 0.044. As for being 50 years old, if you look it up, they're not in the fourth but in the third age quartile. So, we add the slope for the third age quartile, 0.75, times one, because their indicator for the third quartile is activated, and if we sum these up, we get negative 1.90.
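Both groups' log odds can be computed with the same plug-in step. The `SLOPES` dictionary and `log_odds` helper are hypothetical names for this sketch; any predictor a group lacks is treated as zero, and the quartile two slope is omitted because it is not given in this section.

```python
# Slopes quoted in this section for model two.
SLOPES = {"intercept": 0.87, "female": 0.78, "hdl": -0.044, "q3": 0.75, "q4": 0.47}

def log_odds(x: dict) -> float:
    """Intercept plus each slope times the group's x value (missing x's are 0)."""
    terms = (SLOPES[k] * x.get(k, 0) for k in ("female", "hdl", "q3", "q4"))
    return SLOPES["intercept"] + sum(terms)

group_1 = {"female": 1, "hdl": 75, "q4": 1}   # females, HDL 75, age 65 (quartile four)
group_2 = {"female": 0, "hdl": 80, "q3": 1}   # males, HDL 80, age 50 (quartile three)
print(f"{log_odds(group_1):.2f} {log_odds(group_2):.2f}")
# prints: -1.18 -1.90
```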

So, if you take the difference in the log odds between these two groups, negative 1.18 minus negative 1.90, it turns out to be a positive difference of 0.72.

I want you to notice, though, what happens if we take the difference piece by piece, before adding things up in both groups. The intercept is the same in both groups, so it cancels. The piece for sex comes down to 0.78 times one for the first group, the females, minus 0.78 times zero for the second group, because they're male. The slope for HDL ultimately gets multiplied by the difference in the two HDL values, 75 minus 80. And then we take the slope for the first group's age quartile four, 0.47, and subtract the slope for the second group's age quartile three, 0.75; this is more clearly written here. If you do it piecewise like that, you get the same result, 0.72, as if you added both up and then took the difference at the end.

So, this 0.72 is the difference in the log odds of obesity for the first group compared to the second. If we wanted the odds ratio, we would exponentiate this: e to the 0.72 is equal to 2.05. So, the first group has slightly over two times the odds of obesity of the second group; that's 105 percent greater estimated odds.
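The piecewise version of the comparison can be sketched directly: the intercept cancels, and each slope multiplies the difference in that predictor between the two groups. The dictionary names are illustrative, and the quartile two slope is left out because neither group is in quartile two.

```python
import math

# Slopes quoted in this section, and the two groups' x values.
slopes = {"female": 0.78, "hdl": -0.044, "q3": 0.75, "q4": 0.47}
group_1 = {"female": 1, "hdl": 75, "q3": 0, "q4": 1}   # females, HDL 75, quartile four
group_2 = {"female": 0, "hdl": 80, "q3": 1, "q4": 0}   # males, HDL 80, quartile three

# Intercept cancels; each slope multiplies the difference in its predictor.
pieces = {k: slopes[k] * (group_1[k] - group_2[k]) for k in slopes}
log_or = sum(pieces.values())
print({k: round(v, 3) for k, v in pieces.items()})
print(f"log odds ratio = {log_or:.2f}, odds ratio = {math.exp(log_or):.2f}")
# second line prints: log odds ratio = 0.72, odds ratio = 2.05
```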

So then, just to belabor this once more, I want to show you that we could do this if the results were presented on the already exponentiated scale. We wouldn't necessarily have to take things back to the log scale, the regression scale, to do this. I personally would, because I find it easier to keep track of things, but I just want to throw this out there for those of you who would like to try something different.

Use whatever approach you're comfortable with when doing this in real life. We said that we could write the log odds ratio, the difference in the log odds, piecewise: the intercept cancels, and we have the slope for sex of 0.78 times the difference in the sex values, with females in the first group coded as one compared to males in the second, then the slope for HDL times the difference in HDL values, 75 minus 80, et cetera.

So, I'll make it a little less complicated over here and simplify it as 0.78, plus negative 0.044 times negative five, plus 0.47, the slope for quartile four for the first group, minus 0.75, the slope for age quartile three for the second group. As we saw before, this sum is 0.72, the same thing we would get if we had added up the complete predicted log odds for both groups and then taken the difference.

We could directly exponentiate that 0.72 to get the odds ratio, but let's look at what happens if we keep it piecewise and exponentiate the sum: e to the (0.78, plus negative 0.044 times negative five, plus 0.47, et cetera).

First, we could rewrite this as e to the 0.78, which is just the adjusted odds ratio for sex, times e to the negative 0.044 times negative five, times e to the 0.47, times e to the negative 0.75. We could rewrite that second term slightly: e to the negative 0.044 times negative five is e to the negative 0.044 raised to the negative fifth power.

So, now this whole thing is expressed in terms of the exponentiated slopes, the adjusted odds ratios. e to the 0.78 is the adjusted odds ratio for sex. e to the negative 0.044 is the adjusted odds ratio for HDL, and we raise that to the difference in HDL between the two groups, negative five. e to the 0.47 is the adjusted odds ratio for age quartile four, and e to the negative 0.75 is one over e to the 0.75, the adjusted odds ratio for age quartile three, so we're dividing by that odds ratio.

If we do this, we get 2.05, which is exactly what we got when we summed it up first and then exponentiated.
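As a worked equation, the piecewise exponentiation for this comparison is shown below. The rounded values are e to the 0.78 (about 2.18), e to the 0.22 (about 1.246), e to the 0.47 (about 1.60), and e to the 0.75 (about 2.117).

```latex
\widehat{OR} = e^{\,0.78 + (-0.044)(-5) + 0.47 - 0.75}
  = e^{0.78}\,\left(e^{-0.044}\right)^{-5}\,\frac{e^{0.47}}{e^{0.75}}
  \approx 2.18 \times 1.246 \times \frac{1.60}{2.117} \approx 2.05
```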

Again, I only point this out because some of you may prefer this approach when you're looking at published papers. If you want to make such a comparison, and the results are presented on the odds and odds ratio scale, you don't have to take things back to the log scale; you could do this directly in terms of those adjusted odds ratios.

You can compare two groups and get the odds ratio for two groups who differ by more than one predictor. It's just going to be a multiplicative function of the adjusted odds ratios, involving multiplication and division (and division is just a form of multiplication).
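The general rule can be sketched as a small function that works only on the odds ratio scale: raise each adjusted odds ratio to the difference in that predictor between the two groups and multiply the results, so a negative exponent acts as a division. `group_odds_ratio` is a hypothetical helper, and the adjusted odds ratios are the exponentiated slopes quoted in this section.

```python
import math

def group_odds_ratio(adjusted_ors: dict, x_a: dict, x_b: dict) -> float:
    """Odds ratio for group a vs group b: product over predictors of
    OR_j ** (x_a[j] - x_b[j]). Predictors missing from a group count as 0."""
    result = 1.0
    for name, or_j in adjusted_ors.items():
        result *= or_j ** (x_a.get(name, 0) - x_b.get(name, 0))
    return result

# Adjusted odds ratios: exponentiated slopes quoted in this section.
ors = {"female": math.exp(0.78), "hdl": math.exp(-0.044),
       "q3": math.exp(0.75), "q4": math.exp(0.47)}
females_75_q4 = {"female": 1, "hdl": 75, "q4": 1}
males_80_q3 = {"female": 0, "hdl": 80, "q3": 1}
print(f"{group_odds_ratio(ors, females_75_q4, males_80_q3):.2f}")
# prints: 2.05
```

Swapping the two groups gives the reciprocal odds ratio, as expected when the comparison is reversed.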