In this section we will talk about the basics of model estimation for multiple logistic regression, and about handling the uncertainty in the resulting estimates:

both by creating confidence intervals for individual odds ratios and adjusted odds ratios,

and by looking at the idea behind what is being tested when we test a categorical predictor that requires multiple x's to model it.

So after viewing this section, you will be able to conceptually extend the concept of maximum likelihood estimation to multiple logistic regression models.

You'll be able to compute 95 percent confidence intervals for the intercept and individual slopes, and then exponentiate these results to put them on the odds and odds ratio scales.

You'll understand how to perform a hypothesis test for individual slopes, and understand the concept of the likelihood ratio test, which allows for testing multiple slopes at once.

That's useful for testing multicategorical predictors.

So the general approach to estimating the intercept and slopes for logistic regression models, both simple and now multiple, is called maximum likelihood.

Just as we saw with simple logistic regression, the same idea applies here, and it's a complicated idea mathematically.

But the estimates for the intercept and slopes for the x's in a given model are the values that make the observed data, the data used to fit the model and the outcomes modeled by the model, most likely among all possible choices for beta naught hat, beta one hat, up through beta hat p for the multiple slopes we have.

So this is a complicated idea and it's computationally intensive, so it must be done with a computer.
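To make the idea concrete, here is a minimal sketch, in Python, of what the computer is doing under the hood: it searches for the intercept and slopes that maximize the log-likelihood of 0/1 outcomes. The simulated data, the "true" coefficients, and the crude gradient-ascent search are all illustrative assumptions, not the actual algorithm (real software uses faster Newton-type iterations), but the quantity being maximized is the same.

```python
import math
import random

random.seed(1)

# Simulate a small dataset: binary y generated from two x's with
# assumed "true" betas (intercept, slope 1, slope 2).
true_b = [-0.5, 1.0, 0.8]
data = []
for _ in range(500):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    eta = true_b[0] + true_b[1] * x1 + true_b[2] * x2
    p = 1 / (1 + math.exp(-eta))
    data.append(([1.0, x1, x2], 1 if random.random() < p else 0))

def log_lik(b):
    """Log-likelihood of the observed 0/1 outcomes under coefficients b."""
    ll = 0.0
    for x, y in data:
        eta = sum(bj * xj for bj, xj in zip(b, x))
        ll += y * eta - math.log(1 + math.exp(eta))
    return ll

# Crude gradient ascent: repeatedly step uphill on the log-likelihood.
b = [0.0, 0.0, 0.0]
for _ in range(2000):
    grad = [0.0, 0.0, 0.0]
    for x, y in data:
        eta = sum(bj * xj for bj, xj in zip(b, x))
        resid = y - 1 / (1 + math.exp(-eta))  # observed minus fitted probability
        for j in range(3):
            grad[j] += resid * x[j]
    b = [bj + 0.005 * gj for bj, gj in zip(b, grad)]
```

In practice you would never code this yourself; the point is only that the reported estimates are the values of beta naught hat through beta hat p that make the observed outcomes most likely.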

But again, there is a deterministic algorithm at play here, such that if you were to use the same data to estimate the same multiple logistic regression model on different software platforms, you would get the same results across all of them.

The maximum likelihood algorithm also gives standard error estimates for the intercept and slopes.

The standard errors allow for the computation of 95 percent confidence intervals and p-values for the slopes and intercept.

The random sampling behavior of these estimates is normal in large samples.

In other words if we were to repeat a study over and over again

and take representative or random samples of the same size from

the same population and estimate

the same multiple logistic regression model on all these random samples,

of course there will be variation in the estimated intercept and slopes across these models, because they'd be based on different subsets of the population of interest.

But if we were to look at the individual behavior of any one of these quantities and plot a histogram of it across the samples, it would be relatively normal in larger samples.

Think about this.

Our slopes are log odds ratios, which we saw back in term one have an approximately normal sampling distribution, so this is in sync with what we'd expect.

This is in larger samples; we won't worry about the cutoff for larger versus smaller samples, but there is an exact algorithm that can be used in smaller samples, and the computer will use that on a case-by-case basis if needed.

It will be a smooth operation for us to use a computer to get any results from multiple logistic regression, and regardless of how those confidence intervals are computed, the interpretation is the same.

So again, it's business as usual for getting 95 percent confidence intervals: take our estimate plus or minus two standard errors.

The only caveat is that because these are on the log scale, as we saw with simple logistic regression, we will exponentiate the results to present the confidence interval on the odds or odds ratio scale.

So this means we can get 95 percent confidence intervals for both the intercept and slopes simply by taking our estimate plus or minus two estimated standard errors.

Again, in many cases the intercept on its own isn't necessarily a scientifically interesting quantity, unless there is a group in the data where all x's are equal to zero.

So if all of our predictors are binary or categorical, this may be a useful quantity; but regardless, it's easy to compute a 95 percent confidence interval for a slope by taking our estimated slope and adding and subtracting two estimated standard errors, which will be given by the maximum likelihood estimation algorithm.

If we want to get the confidence interval for the slope beta i, where i runs from one to p (generically, where we have p x's in the model), we just take our estimated slope, beta hat i, and add and subtract two estimated standard errors.

What we would do in either case, before fully presenting the results to any reader or in any journal article, is exponentiate the endpoints to get the confidence interval either for the odds for the reference group, if we're exponentiating the results for the intercept, or for the odds ratios, the adjusted odds ratios in our model, if we're exponentiating the confidence intervals for our slopes.

We would also want a p-value for testing whether any of the comparisons made by the slopes are statistically significant.

The generic approach to getting a p-value for the slope beta i applies to all of our slopes in the model; this tests whether the particular x associated with beta i is a statistically significant predictor of our binary outcome y after accounting for the other x's in the model.

Generically, the null hypothesis is that the true population-level slope is zero versus that it's not zero.

We could also express this in terms of the exponentiated slopes: the null is that the odds ratio, the exponentiated log odds ratio, is equal to one versus that it's not equal to one.

In order to do this, we assume the null is true and calculate the distance of our slope from what it's expected to be under the null hypothesis, zero, but we do this in units of standard error.

We measure how far it is from zero in standard errors and declare it to be far or not so far by translating that into a p-value, which estimates the proportion of results we could have gotten that are as far or farther from zero than what we got just by chance, if the null at the population level is the truth.

So let's look at predictors of obesity like we did before.

When I first showed these, I presented the odds ratios and confidence intervals for them, both unadjusted and adjusted.

We know how to get the unadjusted odds ratio confidence intervals; well guess what, it's exactly the same approach for the adjusted.

So let's focus here on the results from model two for a moment, and focus on the confidence intervals for the odds ratios associated with sex and HDL in that model.

So here's the model written out on the log scale, the regression scale.

Here's the intercept, the slope for sex, the slope for HDL, and the slopes for age quartiles two through four, respectively.

The standard error of the slope estimate for sex is 0.06, and the standard error of the slope estimate for HDL is 0.002; this all came from the computer again.

So now, if I wanted a 95 percent confidence interval for the adjusted association between obesity and sex, I could look at that slope for sex: to get the confidence interval on the slope scale, I'd take my estimated slope, 0.78, plus or minus two standard errors.

That gives me a confidence interval for the log odds ratio of obesity for females compared to males, adjusted for HDL and age, of 0.66 to 0.90.

So this does not include the null value of zero for slopes, but of course we'd prefer to present this on the actual adjusted odds ratio scale.

The estimated adjusted odds ratio of obesity for females to males, adjusted for HDL and age, is e to the 0.78, e raised to that estimated slope, which is 2.18; and the 95 percent confidence interval on the odds ratio scale, which we get by exponentiating the endpoints on the slope scale, is 1.93 to 2.46.

So this adjusted odds ratio is statistically significant again: the confidence interval for the slope, the log odds ratio, did not include zero, and the confidence interval for the ratio itself, the exponentiated slope, did not include one.

The same process applies if we want the slope for HDL: we take the estimated slope and add and subtract two standard errors.

We get a confidence interval on the log odds ratio scale that goes from negative 0.048 to negative 0.040, which does not include the null value for log odds ratios of zero.

When we exponentiate the estimated log odds ratio of negative 0.044, we get an adjusted odds ratio of obesity of 0.957 for two groups who differ by one milligram per deciliter in HDL but are of the same sex and age group.

To get the confidence interval on the odds ratio scale, we just exponentiate our endpoints from the confidence interval for the slope: 0.953 to 0.961.

So, the confidence interval for the slope, the log odds ratio, did not include the null value on that scale of zero, and hence the confidence interval for the ratio itself did not include the null value on that scale of one.
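The two calculations above can be sketched in a few lines of Python; the slope estimates (0.78 and negative 0.044) and standard errors (0.06 and 0.002) are the values reported above, and the plus-or-minus-two-standard-errors rule is the same approximation used throughout:

```python
import math

def or_with_ci(beta_hat, se):
    """Exponentiate a slope and its (estimate +/- 2 SE) interval to get
    the odds ratio and its approximate 95% confidence interval."""
    lo, hi = beta_hat - 2 * se, beta_hat + 2 * se
    return math.exp(beta_hat), (math.exp(lo), math.exp(hi))

# Sex (females vs. males), adjusted for HDL and age.
or_sex, ci_sex = or_with_ci(0.78, 0.06)     # about 2.18, (1.93, 2.46)

# HDL (per 1 mg/dL difference), adjusted for sex and age.
or_hdl, ci_hdl = or_with_ci(-0.044, 0.002)  # about 0.957, (0.953, 0.961)
```

Note that the interval is built on the slope (log) scale first, and only the endpoints are exponentiated at the end.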

How would we get a p-value for any one of these slopes?

Well, we'll just use sex in this example; we could replicate this for any of the slopes in this model or other models as well, as the process is always the same.

We're testing the null hypothesis that

our slope is zero versus the alternative that it's not.

Again, as I said before, this is equivalent to the null hypothesis that the odds ratio at the population level, the adjusted odds ratio, is one versus that it's not one.

We can do this for any of our slopes, and hence adjusted odds ratios: we take the estimated slope divided by its estimated standard error to figure out how far it is from the null value of zero in terms of standard errors.

We have a result for this example that is 12.4 standard errors above what we'd expect, and we know the sampling behavior of these slopes from study to study is roughly normal.

We're assuming that the truth is zero for our hypothesis test, and we have something that's way out here, 12.4 standard errors above zero.

So, the p-value is the proportion of results we could get that are more than 12.4 standard errors away from zero; that's a very small percentage of observations under that sampling curve, so the resulting p-value is very small, well less than 0.01.
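As a quick sketch of that distance calculation: with the rounded values shown here (0.78 and 0.06), the ratio comes out near 13 rather than the 12.4 computed from the unrounded estimates, but the logic is identical, and the two-sided p-value is vanishingly small either way.

```python
import math

beta_hat, se = 0.78, 0.06  # slope and SE for sex, as rounded above
z = beta_hat / se          # ~13 with these rounded inputs; ~12.4 unrounded

# Two-sided p-value under the standard normal approximation:
# proportion of the sampling distribution at least |z| SEs from zero.
p = math.erfc(abs(z) / math.sqrt(2))
```

The `erfc` call is just a stdlib way of getting the two-sided normal tail probability without extra packages.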

Again, we also reported p-values, both for the unadjusted and now for the adjusted comparison, testing whether the overall construct of age, the overall predictor of age, which is modeled by three x's because there are four categories when we put it into quartiles, is a statistically significant predictor; and when we're testing that in this multiple regression model, we're testing whether it is a statistically significant predictor of obesity above and beyond, or after accounting for, sex and HDL.
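For concreteness, here is one minimal way to build the three indicator x's for age quartiles; the quartile assignments below are hypothetical:

```python
# Hypothetical age-quartile assignments (1-4) for eight subjects;
# quartile 1 is the reference group, so it gets no indicator.
quartiles = [2, 4, 1, 3, 2, 1, 4, 3]

# One indicator x per non-reference category: x1 = Q2, x2 = Q3, x3 = Q4.
X = [[int(q == 2), int(q == 3), int(q == 4)] for q in quartiles]
```

A subject in quartile 1 gets all three indicators equal to zero, which is what makes quartile 1 the reference.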

The process to do this is conceptually exactly the same as with multiple linear regression; just the name of the test is different.

So, as with multiple linear regression, in multiple logistic regression, when our predictor is multicategorical and, hence, is modeled with multiple x's, in order to test whether the predictor is statistically significantly associated with the outcome, it is not necessarily enough to test each slope individually or look at the p-values for each slope on its own.

So again, in this regression model we had sex and HDL, each of which requires only one x, so we could just look at the confidence interval and the p-value for that one slope.

But age was categorical: it had four quartiles, and there were three x's.

So, in order to formally test whether age is a statistically significant predictor of obesity above and beyond sex and HDL, we need to test the three slopes for the three indicators, for the non-reference categories, at once.

So, not just any one of them individually.

So, why do we need to do this?

Well let's again think about this.

If we test them on their own: beta three is simply the difference in the log odds between age quartile two and age quartile one, beta four is the difference between age quartile three and age quartile one, and beta five is the difference between age quartile four and age quartile one.

So, these are three specific single differences; if we test any of them on their own, we're only testing that specific comparison.

Even if none of these were statistically significant on their own, that wouldn't settle the question, and this is why we need the overall test.

Certainly, in this example some of these are statistically significant, so we know the answer already, but we can have situations where none of the differences in the coding scheme that we've set up for the x's are significant, and yet we'd still be missing some of the comparisons.

We don't get an explicit comparison of Q3 to Q2,

or Q4 to Q2,

or Q4 to Q3 based on the way we've coded the reference group and these categories.

These could be estimated by taking differences in the slopes, and so this overall test asks whether, taken together, all three of these slopes are zero.

And if that's the case, that also implies that all combinations, all differences of these slopes, are zero.

So, it ultimately covers a test that there are no differences in the log odds between any two categories here.

So, it covers those that are not explicitly modeled by our xs as well.
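For example, under hypothetical slope values, the comparisons that aren't explicitly coded by the x's fall out as differences of slopes:

```python
import math

# Hypothetical estimated slopes: log odds ratios vs. quartile 1.
b_q2, b_q3, b_q4 = 0.40, 0.90, 1.30

# Comparisons not explicitly coded, recovered as slope differences.
log_or_q3_vs_q2 = b_q3 - b_q2
log_or_q4_vs_q2 = b_q4 - b_q2
log_or_q4_vs_q3 = b_q4 - b_q3

# Exponentiate to put any of these on the odds ratio scale.
or_q3_vs_q2 = math.exp(log_or_q3_vs_q2)
```

If all three slopes were truly zero, every one of these differences would be zero too, which is exactly what the overall test covers.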

The test here is not called an F test; it's called the likelihood ratio test, but it's exactly the same thing conceptually as the partial F test for linear regression.

It compares the amount of information in y explained by sex,

HDL, and age to the amount of information in y explained by sex and HDL.

In logistic regression, we can't quantify the percentage of variability explained in the outcome, because there's not a consistent, easily interpretable measure of variability for binary outcomes, especially when transformed to the log odds scale.

So, I'm using the word information here because it's not technically variability.

But nevertheless, this test compares the following two models in our example: the model that has all three predictors, sex, HDL, and age, to a model that only includes sex and HDL, and it's testing whether the extended model is statistically significantly different from the null model.

If not, that means there's no improvement in our understanding of obesity when we add in these extra predictors to model age.

So, if the extended model adds enough additional information about our outcome, above and beyond that explained by the null model, to justify estimating three extra slopes with the same total amount of data, then this null is rejected.

Otherwise, we fail to reject the null, and the null model without these extra predictors is preferred.

This of course like everything else needs to be done with a computer.

I just want to give you a heads up on the idea behind

it and just like the partial F test for multiple linear regression,

this approach is generalizable to any two null and extended model setups.
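As a sketch of the mechanics, suppose the computer reported the maximized log-likelihoods of the two fitted models (the values below are hypothetical). The likelihood ratio statistic is twice the difference in log-likelihoods, compared against a chi-square distribution with degrees of freedom equal to the number of extra slopes, three here:

```python
import math

def chi2_sf_3df(x):
    """Survival function (upper tail) of a chi-square with 3 degrees of
    freedom; a closed form exists for odd degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical maximized log-likelihoods from the two nested fits.
ll_null = -1020.6   # null model: sex + HDL
ll_ext = -1001.3    # extended model: sex + HDL + age quartiles

lr_stat = 2 * (ll_ext - ll_null)  # likelihood ratio statistic
p = chi2_sf_3df(lr_stat)          # df = 3 extra slopes in the extended model
```

A large statistic, and hence a small p-value, says the age terms add enough information about the outcome to justify the three extra slopes.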

This is a generic setup where the extended model includes everything that is in the null model plus additional predictors.

So, these models are considered to be nested: the null is nested within the extended, because the extended includes everything in the null.