Let's look at some additional examples related to multiple logistic regression.

In this example,

we're looking at data from over 300 thousand records,

which include information on whether a patient showed

up to a previously scheduled medical appointment.

The purpose of these data was to see whether or not a text message reminder program

was effective at getting people to show up

more frequently than if they didn't receive the message.

So, some of the information we have is whether

the patient received a text message reminder,

patient age, other patient characteristics,

including whether they receive federal aid,

monetary aid from the government.

These data are from the capital city of

the Espirito Santo state in Brazil.

So, in these data taken as a whole,

70% of all patients showed up for a previously scheduled appointment.

The remaining 30% did not.

So, simple and multiple logistic regression were used to estimate

the unadjusted and adjusted associations

between appointment attendance and potential predictors.

Of interest, and the reason these data were collected,

was to see whether or not text message reminders are

effective in increasing scheduled appointment attendance.

So, results of these regression models are found on the next slide.

Ostensibly, anybody who provided their cell phone number got a text message reminder,

and those persons who either didn't provide a cell phone number

or did not own a cell phone did not.

So, let's look at these results both in the unadjusted and adjusted situations.

So, the unadjusted associations between showing up for

a scheduled appointment and various predictors include one for whether a text message was sent.

This is the odds ratio,

the relative odds of actually showing up for

the appointment for those who got a text message to those who did not.

Notice in the unadjusted sense,

it's a small increase on the order of 0.002, or a

0.2% increase in the odds of showing up for

those who got the reminder compared to those who did not,

but it's not statistically significant.

Age, and I put this per 10 years so that the odds ratio expresses a 10-year comparison.

Sometimes you'll see this done.

So, I just want to give an example of this.

This is the odds ratio comparing the relative odds of showing up for

two groups of persons who differ by age by 10 years.

So 40-year-olds to 30-year-olds, 70-year-olds to

60-year-olds, etc. That odds ratio was 1.1.

So, being older was associated with a greater odds of

showing up by about 10% per 10-year difference.

This is statistically significant.

Whether or not a person received medical aid, and I

should really say this is federal aid from the government of Brazil,

cash payments, but some of it's dedicated to get them to take care of

certain medical things like getting vaccinated, for example.

So, I'll just call that medical aid in these tables here.

We can see actually that the unadjusted comparisons,

those who actually received medical aid were less likely,

the odds ratio of showing up was 0.75,

compared to those who did not receive medical aid.

They were less likely to show up by 25% lower odds.

This was statistically significant.

So, those who received medical aid were substantially less likely to show up.

Hypertension: those with hypertension were

substantially more likely to show up for a scheduled appointment,

38% greater odds, and it was statistically significant.

Similarly, those with diabetes were substantially more likely to

show up than those without in the unadjusted comparisons.

This second column here presents the results from

a logistic regression model that includes all of these predictors.

So everything's adjusted for everything else in the table.

We can see with regard to a text message being sent,

even though this doesn't look large,

we get a greater increase in the relative odds

of showing up for a scheduled appointment after adjusting for age,

medical aid, hypertension, diabetes.

The difference between those who received it and those who did not

approaches a 2.8% increase in the odds,

and it's now statistically significant.

So, it seems that some of the association here was being masked by other differences

in those who got

text messages versus those who didn't that were also related to the outcome showing up.

The relationship with age doesn't change much.

It attenuates slightly.

The relationship with medical aid gets a little bit closer to the null of one,

but it is still a substantial decrease in odds for those who received medical aid

compared to those who didn't after adjusting for text messaging,

age, hypertension, diabetes, and it is

still a statistically significant predictor, even after adjustment.

Hypertension is still a statistically significant

positive correlate of showing up for an appointment,

but it attenuates from an odds ratio of 1.38 in

the unadjusted association to 1.08, or 8% greater odds, in the adjusted.

It's still statistically significant,

but it appears that a fair amount of

that association was explained by some of these other factors

and their relationship with hypertension and also with showing up.

So, there was some confounding there.

Certainly, with diabetes, we see that

this seemingly strong positive association

between having diabetes and showing up for an appointment,

all but disappears when we adjust for these four other characteristics.

The odds ratio goes to 0.99,

slightly lower odds of showing up among those with diabetes than those without,

essentially no difference though,

almost a null of one,

and this result is not statistically significant after adjustment.

So, the exponentiated intercept for the simple logistic model with age,

let's go back to the simple models with 10-year increments, is 1.62.

As per the table,

the exponentiated slope or odds ratio,

unadjusted odds ratio of showing up for a 10-year difference in age is 1.10.

Can we use these two facts to figure out what is

the estimated proportion of 70-year-olds in

the sample that showed up for a scheduled appointment?

So, remember, age is in 10-year increments.

So, 70-year-olds would be represented by the number 7,

X equals seven, because the units are 10 years.

So, approach number 1 is we could take

the results back to the log odds scale.

Even though these things were given to us as exponentiated,

I like to represent it that way,

and we could figure out what the values of Beta naught and Beta1 are here by taking

the respective natural logs of the exponentiated intercept and the odds ratio.

But let's see how this would play out on the odds scale.

The genesis of these results is a model that looks like this.

These are the unadjusted results.

The log odds of showing up is equal to an intercept plus some slope times age.

Age is measured in 10-year increments.

The predicted value for a group whose X value is seven,

in other words the group that was 70 years old,

will be found on the log odds scale by taking the intercept plus the slope times 7.

So, that's the number we want.

We could figure out the values of Beta naught hat

and Beta1 hat respectively,

the first by taking the log of 1.62

and the second by taking the log of 1.10,

but I'm just going to keep it generic for a moment

before plugging in those numbers.

If we were to exponentiate this sum,

this would give us the odds of showing up for this group of 70-year-olds.

So, in terms of our generic representation,

that would be E to the intercept Beta naught hat plus the slope for age Beta1 hat times 7.

We can re-express that by the rules of exponents,

which is E to the Beta naught hat,

which is the exponentiated intercept times E to the Beta1 hat,

which is the exponentiated slope here,

to the seventh power.
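Written out as a worked equation, the step just described looks roughly like this. The symbol $x_1$ for age in 10-year units is notation assumed here for illustration; the lecture only says "X".

```latex
\begin{aligned}
\log\bigl(\widehat{\text{odds}} \mid x_1 = 7\bigr) &= \hat\beta_0 + 7\,\hat\beta_1 \\
\widehat{\text{odds}} &= e^{\hat\beta_0 + 7\hat\beta_1}
  = e^{\hat\beta_0}\,\bigl(e^{\hat\beta_1}\bigr)^{7}
\end{aligned}
```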

But we know the exponentiated intercept given in the problem was 1.62,

and the odds ratio which is the exponentiated slope was 1.10.

So, plugging those numbers in,

we get the estimated odds of 1.62 times 1.10 to the seventh power,

which is equal to 3.15.

That's the estimated odds.

In order to get the estimated proportion or

probability of showing up among the 70-year-olds,

we would take this estimated odds of 3.15 divided by 1 plus itself.

That gives an estimated proportion of the 70-year-olds

who showed up in the sample of 76%.
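Here is a minimal Python sketch of that arithmetic, using only the two numbers quoted in the lecture (1.62 and 1.10). The function and variable names are made up for illustration. With these rounded inputs the odds come out slightly above the 3.15 quoted, but the proportion still rounds to 76%.

```python
import math

# Values quoted in the lecture (already exponentiated).
exp_intercept = 1.62     # exponentiated intercept of the simple model with age
or_per_10_years = 1.10   # unadjusted odds ratio per 10-year difference in age


def odds_at_age(age_years):
    """Estimated odds of showing up; age enters the model in 10-year units."""
    units = age_years / 10
    return exp_intercept * or_per_10_years ** units


def odds_to_probability(odds):
    """Convert odds to a proportion: p = odds / (1 + odds)."""
    return odds / (1 + odds)


odds_70 = odds_at_age(70)
prob_70 = odds_to_probability(odds_70)

# Same computation done on the log odds scale, then exponentiated.
log_odds_70 = math.log(exp_intercept) + 7 * math.log(or_per_10_years)
odds_70_via_log = math.exp(log_odds_70)

print(f"estimated odds for 70-year-olds: {odds_70:.3f}")
print(f"estimated proportion showing up: {prob_70:.0%}")
```

Both routes give the same odds, which is the point of the rules-of-exponents step above.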

If you're comfortable starting with the exponentiated results, that works too.

I've been doing this for years, and I

still like to write things back, even generically, on the log scale,

just to make sure I'm doing things right when

I plug in the results that have been exponentiated.

But if you started with the exponentiated results and recognize this,

you could take the exponentiated intercept

times that exponentiated slope, the odds ratio

for a one-unit difference in age,

which is a 10-year difference in this coding, raised to the seventh power.

Then carry out the rest of the computation as we did before.

So, using these same results with age,

in the unadjusted comparison we have the intercept exponentiated

and we have the slope exponentiated.

Can we figure out the estimated odds ratio and

95% confidence interval of showing up

for a scheduled appointment for 70-year-olds versus 40-year-olds?

Well, let's do this on both scales.

We could start going back to the log scale.

Even though things were given as exponentiated,

we could either write this out generically with Betas or we could plug in the numbers,

if we took the logs of the respective intercept and slope.

But the log odds for the group that is 70 years old,

as we wrote out before, is equal to the intercept plus the slope times 7.

Again, age is in 10-year increments.

So, 70 years is represented by seven.

The same computation but done for the 40-year-olds would be done by

taking the intercept plus that slope times 4, to represent 40-year-olds.

So, the log odds ratio for the 70-year-olds versus 40-year-olds is

just the difference in these two computations here.

So, if we took this difference,

the intercept cancels and we get 7 Beta1 hat minus

4 Beta1 hat, or 7 minus 4 times Beta1 hat,

which is 3 times Beta1 hat.

That would be the log odds ratio of interest.

So, to get the odds ratio,

we would exponentiate that difference,

E to the 3 times Beta1 hat.

That can be represented as E to the Beta1 hat raised to the third power.

But E to the Beta1 hat,

the exponentiated slope is just the odds ratio for a 10-year difference in age,

that 1.1 we had before, now raised to the third power.

If we did this,

we get an odds ratio of showing up for 70-year-olds to 40-year-olds of 1.33.
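A short Python sketch of this step, assuming only the unadjusted odds ratio of 1.10 per 10 years quoted above (the helper name is hypothetical). It also confirms that multiplying by 3 on the log scale and then exponentiating gives the same answer as cubing the odds ratio.

```python
import math

or_per_10_years = 1.10  # unadjusted odds ratio per 10-year difference in age


def odds_ratio_for_age_gap(older_age, younger_age, or_per_unit=or_per_10_years,
                           unit_years=10):
    """Odds ratio comparing two ages: raise the per-unit odds ratio to the
    number of units separating the groups. The intercept plays no role."""
    units = (older_age - younger_age) / unit_years
    return or_per_unit ** units


or_70_vs_40 = odds_ratio_for_age_gap(70, 40)

# Log-scale route: 3 times the slope, then exponentiate.
or_70_vs_40_via_log = math.exp(3 * math.log(or_per_10_years))

print(f"odds ratio, 70 vs 40 years: {or_70_vs_40:.2f}")
```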

If you remember that property of starting on the odds ratio scale,

that if you're comparing two groups who differ by more than one unit in x,

if you start with the exponentiated slope, the odds ratio,

instead of multiplying it by 3 like you would do on the log odds or slope scale,

you'd take that ratio and raise it to the third power

because of the three-unit difference in age,

and that would just eliminate the log-scale starting point.

In either case, how will we get the confidence interval for this comparison?

Well, we could go back to the log scale,

get the confidence interval for the log odds ratio.

Then multiply the endpoints by three,

and then exponentiate them,

but the end result of this

would be taking the endpoints of the confidence interval, already exponentiated, and

raising each to the third power.

So I won't go back and show,

taking this back to the log scale,

but you could do that if it makes you more comfortable,

you'll end up with the same result.

The resulting confidence interval for the odds ratio for this comparison, a 30-year difference, is 1.32 to 1.35.
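The two routes to the confidence interval can be compared in a few lines of Python. The lecture does not state the interval for the 10-year odds ratio, so the endpoints below (1.097 and 1.105) are hypothetical placeholders, chosen only so that cubing them lands near the 1.32 to 1.35 reported; the real endpoints would come from the table.

```python
import math


def scale_ci_on_or_scale(ci_per_unit, units):
    """Raise each already-exponentiated endpoint to the number of units."""
    lower, upper = ci_per_unit
    return lower ** units, upper ** units


def scale_ci_via_log_scale(ci_per_unit, units):
    """Take logs, multiply the endpoints by the number of units, exponentiate."""
    lower, upper = ci_per_unit
    return (math.exp(units * math.log(lower)),
            math.exp(units * math.log(upper)))


# HYPOTHETICAL endpoints for the per-10-year odds ratio (not from the lecture).
hypothetical_ci_per_10_years = (1.097, 1.105)

ci_direct = scale_ci_on_or_scale(hypothetical_ci_per_10_years, 3)
ci_via_log = scale_ci_via_log_scale(hypothetical_ci_per_10_years, 3)

print(f"direct:    {ci_direct[0]:.2f} to {ci_direct[1]:.2f}")
print(f"log route: {ci_via_log[0]:.2f} to {ci_via_log[1]:.2f}")
```

Whatever endpoints are plugged in, the two functions agree, which is why going back to the log scale is optional here.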

So now, let's look at the adjusted results.

If you go back to the adjusted column there,

let's use the results of that model to compare

two groups who differ by multiple characteristics.

Let's compare 70-year-olds again to 40-year-olds but now,

taking into account other characteristics and using the adjusted model,

the multiple regression model.

So, we want to compare,

70-year-olds with hypertension, who received a text reminder,

but did not receive medical aid.

We want to compare them to 40-year-olds who do not have hypertension,

who did not receive a text reminder,

but did receive medical aid.

This comparison will assume that both groups are the same

in terms of their diabetes status,

so this comparison will be adjusted for diabetes.

Because the two groups are the same on diabetes,

that term will not enter the comparison,

and there'll be no difference because of it.

So, you could go back to the log scale and then exponentiate the findings.

But since things were already presented on the

odds ratio, or exponentiated, scale,

let me just write out the logic for this.

This is ultimately what we would get if we again went back to the log scale,

looked at differences on the log odds scale, and then exponentiated them.

Think about the comparison we're making here.

We're comparing in the numerator 70-year-olds with hypertension,

who received a text,

and did not get medical aid.

In the denominator, we have 40-year-olds

who did not have hypertension,

did not receive a text,

and did get medical aid.

So let's break this down into

the unique comparisons being made on the several variables here.

The age comparison here is 70 years to 40 years.

Hypertension comparison is hypertension to no hypertension.

The receiving text comparison is text to no text.

The medical aid comparison is no medical aid to medical aid.

Then, diabetes is the same,

so we don't include it because either

both groups have diabetes or both don't.

So, let's look at

the part that comes from age. We pull off

the adjusted odds ratio for a 10-year difference in age

from the adjusted column, 1.092.

So, you can go back and look at that.

Since the difference here is three units of age,

remember, age is in 10-year increments,

we raise this to the third power.

So, this is exactly what we did before,

when we were looking only at age as a predictor and using the unadjusted results,

but here it is the adjusted odds ratio for age raised to the third power.

The adjusted odds ratio comparing

the odds of showing up versus not for those with hypertension,

to those without was 1.08 from the table.

The adjusted odds ratio of showing

up for those who got a text compared to those who did not was 1.028.

Now we're comparing no medical aid to medical aid.

What was given in the table was the adjusted odds ratio comparing those

who got medical aid to those who didn't, and that was 0.8.

We're going in the opposite direction here

with this comparison, so we take the reciprocal of that, or 1/0.8.

If we take the product of all these adjusted odds ratios,

we get the comparison of interest here of

these two groups who differ on four characteristics,

and this is approximately an odds ratio of 1.81.
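As a check on that product, here is a small Python sketch using the four adjusted odds ratios quoted in the lecture. The variable names are made up for illustration; diabetes is left out because both groups are assumed to have the same diabetes status.

```python
# Adjusted odds ratios as quoted in the lecture.
or_age_per_10_years = 1.092   # per 10-year difference in age
or_hypertension = 1.08        # hypertension vs no hypertension
or_text = 1.028               # text reminder vs no text reminder
or_medical_aid = 0.80         # medical aid vs no medical aid

# Group 1: age 70, hypertension, text, no medical aid.
# Group 2: age 40, no hypertension, no text, medical aid.
age_units = (70 - 40) / 10    # three 10-year units

combined_or = (
    or_age_per_10_years ** age_units   # 70 vs 40 years
    * or_hypertension                  # hypertension vs none
    * or_text                          # text vs none
    * (1 / or_medical_aid)             # NO aid vs aid, so take the reciprocal
)

print(f"combined adjusted odds ratio: {combined_or:.2f}")
```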

I can't give you a confidence interval for this easily,

but if we had the data in the computer in front of us,

we could get that.

But it's a challenging computation because the uncertainty of this

is based on the uncertainty of those four different log odds ratios on the log scale.

The combination of them

is factored in to get the standard error on the log scale,

then the confidence interval is created and things are exponentiated.

But again, that standard error is not trivial.

But I just want you to know that it could be done.
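For reference, the standard way to write that standard error is the variance of a linear combination of the estimated slopes. The subscripts below are labels assumed here for illustration, and the variances and covariances would have to come from the fitted model's output, which the lecture does not show.

```latex
\begin{aligned}
\widehat{\log \text{OR}} &= 3\hat\beta_{\text{age}} + \hat\beta_{\text{htn}}
  + \hat\beta_{\text{text}} - \hat\beta_{\text{aid}}
  = \sum_j c_j \hat\beta_j, \qquad c = (3,\ 1,\ 1,\ -1) \\
\widehat{\text{SE}}^{\,2} &= \sum_j c_j^2\, \widehat{\text{Var}}(\hat\beta_j)
  + 2 \sum_{j<k} c_j c_k\, \widehat{\text{Cov}}(\hat\beta_j, \hat\beta_k) \\
95\%\ \text{CI} &= \exp\!\bigl(\widehat{\log \text{OR}} \pm 1.96\, \widehat{\text{SE}}\bigr)
\end{aligned}
```

The covariance terms are what make this hard to do by hand from a table of odds ratios alone.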

So, before we finish off the story of this,

let's investigate whether receiving medical aid modifies

the effect or association of receiving

a reminder text and showing up for a scheduled appointment,

adjusting for age, hypertension, and diabetes.

In the previous analysis,

in the adjusted comparison,

we estimated one overall association,

between showing up and receiving a text message,

an adjusted odds ratio of 1.028.

That was assumed to hold for any groups that were the

same on the other adjustment factors including medical aid.

So, let's see whether or not that's the case.

Let's investigate the possibility that the relationship between showing up and receiving

a text message differs between those who get medical aid and those who do not.

So, I had the data and was able to run this.

Here's the model I got from these data.

The log odds of showing

up is equal to some intercept, 0.51, plus a slope of 0.02 times x1,

which is a one if they received a text message,

a zero if not,

plus negative 0.25 times x2,

which is a one if they received medical aid, zero if not.

Then there's the interaction between the two of them,

which had a slope of 0.06, and this x3 is just equal to x1,

the text message versus not,

times x2, receiving medical aid or not.

There were other slopes and x's for the other things including age, diabetes,

and hypertension, but I'm not showing

those because I want to focus the attention on this piece here.
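In equation form, the piece of the model being described reads as follows, with the dots standing in for the age, diabetes, and hypertension terms that the lecture does not report.

```latex
\log(\text{odds of showing up}) = 0.51 + 0.02\,x_1 - 0.25\,x_2 + 0.06\,x_3 + \cdots,
\qquad x_3 = x_1 \times x_2
```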

So, what is the estimated odds ratio of showing up for

those who received a text versus not, among those who do not have medical aid?

What about those who do have medical aid?

Well, because we have an interaction situation here,

perhaps the best way to approach this is to write

out what this model says about the relationship

between showing up and receiving a text separately for those who got medical aid,

and those who did not.

So, let's think about this,

those who didn't have medical aid are the easier group,

their x2 value is zero,

and because x3 is equal to x1, whether or not they got a text, times x2,

this immediately goes to zero as well,

when x2 is zero.

So, the model for this group is equal to 0.51

plus 0.02 x1 plus the adjustment variables and their slopes,

which I'm not showing here,

we're focusing on this piece.

So in this group, the only number that has anything to do with

whether they received a text or not is 0.02.

This is the log of the adjusted odds ratio of

showing up for those who received a text versus

not in the group that did not have medical aid.

The log odds ratio is 0.02;

when we exponentiate, the resulting odds ratio is 1.02.

So, among those who did not receive medical aid,

those who received a text message had 2% greater odds of showing up

compared to those who didn't, when adjusted for age, hypertension, and diabetes.

What about the group that got medical aid?

Well, here we have to deal with the fact that

x2 is equal to one so that's not only going to turn on x2,

but it's going to activate x3,

which will be equal to x1,

which we are keeping as a variable, times now one.

So, we just get another copy of x1,

or whether or not they received a text message.

In this group, we have to bring in the piece that is related to the x2 variable,

the base difference in the log odds for those who had medical aid versus not,

plus this slope for the interaction term.

If we write this out and do the math,

the resulting slope for the indicator of whether they received a text message or not,

x1, is the sum of the slope for

the original x1 variable plus the slope for this interaction term.

If we add these two together it's 0.08.

That's the log odds ratio of showing up for those who received

a text message versus those who did not amongst those with medical aid,

exponentiate, and that gives us an odds ratio of 1.08.
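The stratum-specific odds ratios can be reproduced with a short Python sketch built from the four coefficients quoted in the lecture. The function names are hypothetical, and the age, diabetes, and hypertension terms are omitted because they are identical in the two groups being compared and cancel in the difference.

```python
import math

# Coefficients quoted in the lecture (log odds scale).
intercept = 0.51
b_text = 0.02          # x1: received a text message (1) or not (0)
b_aid = -0.25          # x2: received medical aid (1) or not (0)
b_interaction = 0.06   # x3 = x1 * x2


def partial_log_odds(text, aid):
    """Log odds of showing up, leaving out the adjustment terms for age,
    diabetes, and hypertension (they cancel in the comparison below)."""
    return intercept + b_text * text + b_aid * aid + b_interaction * text * aid


def text_odds_ratio(aid):
    """Adjusted odds ratio for text vs no text within one medical aid stratum."""
    log_or = partial_log_odds(1, aid) - partial_log_odds(0, aid)
    return math.exp(log_or)


or_text_no_aid = text_odds_ratio(aid=0)   # exp(0.02)
or_text_aid = text_odds_ratio(aid=1)      # exp(0.02 + 0.06)

print(f"text vs no text, no medical aid: {or_text_no_aid:.2f}")
print(f"text vs no text, medical aid:    {or_text_aid:.2f}")
```

Taking the difference of the two log odds shows directly that the intercept and the medical aid slope drop out, leaving only the text slope plus, when x2 is one, the interaction slope.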

I should note that this interaction term is statistically significant.

So, this is a real difference here.

It turns out that, just FYI,

the odds ratio in the group that did not get medical aid,

the odds ratio of showing up for those who got a text versus not,

is not statistically significant,

but for the group that did get medical aid,

it is, and it's larger.

So it appears, that even though medical aid itself was

associated with a lower odds of showing up for appointments,

the text messaging approach was more effective in

those who received medical aid than those who did not.