0:23
So the first article we're going to look at came from the annals or
Archives of Internal Medicine.
And it's entitled Discrepancy Between Consensus Recommendations and
Actual Community Use of Adjuvant Chemotherapy in Women with Breast Cancers.
And so, the abstract goes on to state their purpose.
They say, although the efficacy of chemotherapy in prolonging survival for
women with breast cancer has been well documented,
limited population-based information is
available on the use of chemotherapy.
So what they wanted to do is actually examine the relationship between age and
chemotherapy use.
1:31
And they note that across all tumor stages the use of
chemotherapy decreases substantially with increasing age.
And they said overall 66% of the women younger than 45 years of
age received chemotherapy,
compared with 44% of women between 50 and 54 years of age,
31% of women between 55 and 59,
and 18% of women between 60 and 64 years of age.
So, there they just presented the unadjusted proportions.
Then they go on to say the decreasing pattern of chemotherapy use
with age continued after adjustment for prognostic factors.
And that follows from the results of their multiple logistic regression, which
we'll examine in detail now.
2:14
So to describe how they fit these logistic regression models,
they say we used multivariable logistic regression analysis
(so in this sense they're using multivariable to mean potentially multiple
predictors in the model)
to generate the odds ratios of receiving chemotherapy
in women with breast cancer and determine the effect of age on chemotherapy use.
In this model, we adjusted for
race, categorized into three groups, white, black or other;
tumor stage in three categories; node status and hormone receptor status;
whether the patient had received surgery and radiation therapy,
categorized as breast conserving surgery without radiation,
breast conserving surgery with radiation, or mastectomy;
and adjuvant hormone therapy use, yes or no.
And they go on to say that, in addition to the odds ratios,
they used the results of the logistic regression to generate the probabilities
of receiving chemotherapy
from the parameters of the logistic regression model for
women of different ages, holding other factors constant.
In other words, they tried to predict adjusted probability estimates
of receiving chemotherapy by age,
standardizing the age groups by these other factors.
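Here's a minimal sketch in Python of how adjusted probabilities can come out of a logistic regression. The intercept, the age-group coefficients, and the fixed contribution of the other factors are made-up values for illustration, not estimates from the article, and the function names are hypothetical.

```python
import math

def inv_logit(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical values for illustration only (not from the article).
intercept = 0.5       # log odds for the reference age group at reference covariate levels
age_coef = {"<45": 0.0, "50-54": -0.9, "60-64": -2.1}  # log odds ratios vs. the <45 group
other_factors = 0.3   # contribution of the other covariates, held at the same value for everyone

def adjusted_probability(age_group):
    """Predicted probability for an age group with the other covariates held fixed."""
    return inv_logit(intercept + age_coef[age_group] + other_factors)

adjusted = {group: adjusted_probability(group) for group in age_coef}
for group, p in adjusted.items():
    print(f"{group}: {p:.2f}")
```

Because the other factors are set to the same value in every age group, any difference between the printed probabilities comes from the age coefficients alone.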
3:27
So here's one of the results tables they present, Table 4,
where they show the results of this multivariable logistic regression for
the odds or probability of receiving chemotherapy
in Stage I, Stage II, or Stage IIIA breast cancer
from 1991 through 1997;
the sample was women in New Mexico in that time period.
And they first show, for each of these age groups,
split out into five-year intervals for the most part except on the ends,
the unadjusted proportions here:
the p hats, if you will, the proportions of women who received chemotherapy in
each of those age groups.
And then they converted these into odds ratios.
4:14
So each of these odds ratios compares the odds of
receiving chemotherapy for a given age group
to the odds for the less than 45 year old group, the reference group.
And I like this because they show that the probabilities are decreasing,
and this gives us a sense of the magnitude of the decrease on
the risk difference scale.
And then we can see what this means on the odds ratio scale,
and that's shifting downward as well.
But they also give confidence intervals for each of these odds ratios.
And we can see that, on the whole, the results are statistically significantly different from
the reference group across all these age groups.
And a lot of the confidence intervals for
the age groups older than 45 do not overlap one another either.
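To see how proportions map onto the odds ratio scale, here's a small Python sketch using the four unadjusted proportions quoted from the abstract. These are crude odds ratios computed only from those proportions; the odds ratios in the article's table are adjusted for other factors and will differ.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

# Unadjusted proportions receiving chemotherapy, as quoted from the abstract.
proportions = {"<45": 0.66, "50-54": 0.44, "55-59": 0.31, "60-64": 0.18}
reference = "<45"

# Crude odds ratio of each age group relative to the <45 reference group.
unadjusted_or = {
    group: odds(p) / odds(proportions[reference])
    for group, p in proportions.items()
}

for group, ratio in unadjusted_or.items():
    print(f"{group}: p = {proportions[group]:.2f}, crude OR vs {reference} = {ratio:.2f}")
```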
5:01
As they go on to say, these are adjusted for race, tumor stage, lymph node and
hormone receptor status, and other treatments received.
And that's what we just talked about from the methods section.
Then they went on to actually produce estimated adjusted probabilities.
Whereas these proportions over here are the raw, crude
proportions of women receiving chemotherapy in each of the age groups,
these are adjusted, these are standardized.
These are the estimated proportions of women in the given age group
receiving chemotherapy
were the women comparable on all the factors they adjusted for
in the logistic regression model.
And you can see that some of the adjusted estimates
attenuate after adjusting for the differing characteristic levels.
But they still show this ordering: the youngest women were most likely to
receive chemotherapy,
and it decreases as a function of age.
And using the computer they were able to get confidence intervals for
these predicted probabilities as well.
That requires a little more computation than we can do by hand.
We can estimate these predicted probabilities given a regression model.
But to get the confidence interval, to get the standard error,
that requires the computer, it can't be easily done by hand.
So I like this presentation because it shows the raw proportions,
because this is a binary outcome, and
proportions help us understand the magnitude of the
utilization here by age.
Then it shows the relative comparison on the odds ratio scale, adjusted for
other characteristics that may differ between the age groups.
And then it re-expresses those in terms of adjusted probabilities,
so that, again, reminds us of the order of magnitude that these probabilities of
utilizing chemotherapy are on.
If we just had the odds ratios we could see the relative comparison of the odds,
not the direct comparison of the proportions,
and we wouldn't necessarily know what the starting measure was.
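A quick Python sketch shows why the starting measure matters. The odds ratio of 0.4 and the 10% baseline are hypothetical; the 66% baseline is the unadjusted proportion quoted for the youngest group. The same odds ratio implies very different changes in probability depending on where you start.

```python
def apply_odds_ratio(baseline_p, odds_ratio):
    """Probability implied by applying an odds ratio to a baseline probability."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

hypothetical_or = 0.4  # illustrative odds ratio, not taken from the article

for baseline in (0.66, 0.10):
    p = apply_odds_ratio(baseline, hypothetical_or)
    print(f"baseline {baseline:.2f} -> {p:.3f} (difference {baseline - p:.3f})")
```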
6:59
All right. So let's look at
another example here from Health Affairs.
Neighborhood socioeconomic conditions, built environments, and childhood obesity.
So here's the abstract.
They say we examine the impact of neighborhood socioeconomic conditions and
built environments,
which is a measure of the neighborhood stability characteristics,
on obesity and overweight prevalence among U.S. children and
adolescents using the 2007 National Survey of Children's Health.
The odds of a child's being obese or
overweight were 20-60% higher among children in neighborhoods
with the most unfavorable conditions, such as unsafe surroundings,
poor housing, and no access to sidewalks, parks, and recreation centers,
than among children not facing such conditions.
The effects were much greater for females and younger children.
So they actually talk about effect modification here.
For example, girls ages 10 to 11 were two to
four times more likely than their counterparts
from more favorable neighborhoods to be overweight or obese.
They say our findings can contribute to policy decisions aimed at
reducing health inequalities and promoting obesity prevention efforts,
such as community-based physical activity and healthy diet initiatives.
8:24
So, here's how they present the results.
You know,
this survey was what's called a probability-based survey,
where certain subgroups were oversampled relative to their actual proportion in
the population.
So the survey respondent pool is
not representative at face value of the entire population.
But it was designed so that the researchers know how it differs from
the overall population of interest in terms of overrepresented subgroups,
and then the results can be weighted back to that original population distribution.
So when they talk about weighted results here,
we haven't shown how to do this,
but it's an extension of the methods we've learned in the class.
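The basic idea of weighting back can be sketched in Python. The respondents and weights below are invented for illustration; each weight stands for the number of people in the population that one respondent represents, so an oversampled subgroup gets small weights. This only shows the weighted point estimate; design-based standard errors need dedicated survey software and are not attempted here.

```python
# Each tuple is (obese indicator, sampling weight). Hypothetical data:
# the first four respondents come from an oversampled subgroup (small weights).
respondents = [
    (1, 50.0), (0, 50.0), (1, 50.0), (0, 50.0),
    (0, 400.0), (0, 400.0), (1, 400.0), (0, 400.0),
]

def unweighted_proportion(data):
    """Share of respondents with the outcome, ignoring the sampling design."""
    return sum(y for y, _ in data) / len(data)

def weighted_proportion(data):
    """Share of the represented population with the outcome."""
    return sum(y * w for y, w in data) / sum(w for _, w in data)

print(f"unweighted: {unweighted_proportion(respondents):.3f}")
print(f"weighted:   {weighted_proportion(respondents):.3f}")
```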
I'm just going to show you a snippet of the table here,
and I'm going to focus on neighborhood safety,
because that's one of the characteristics they talked about in the abstract.
And so what they give here to start is an odds ratio,
in this case of a child being obese, for
children from unsafe neighborhoods compared to safe neighborhoods,
by some index they used in the article, adjusted for age and sex.
And you can see that the estimated adjusted odds
ratio is 1.61: among children of the same age and sex,
children from unsafe neighborhoods have 61% higher odds of
being obese than children from safe neighborhoods.
They don't actually put confidence limits on this and I am curious as to why.
So we don't know whether this is statistically significant or
not, but it is certainly an estimated increase here in this study of 61%.
And that's the kind of number they refer to in the abstract.
What's interesting though is in the next layer, when they adjust for
other things above and beyond age and sex,
this overall elevated estimate reduces to 1.05.
So it appears that the neighborhood safety
association seen after adjusting for age and sex is being explained by
other characteristics above and beyond age and sex,
which they adjusted for in this second regression model.
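Here's a small Python sketch of the arithmetic behind reading these two odds ratios, 1.61 and 1.05. The percent change in odds is just the odds ratio minus one. The attenuation figure on the log odds scale is one descriptive summary I'm adding for illustration; it is not something the article reports.

```python
import math

def percent_change_in_odds(odds_ratio):
    """Percent difference in odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0

or_age_sex_adjusted = 1.61   # adjusted for age and sex only
or_covariate_adjusted = 1.05  # adjusted for the additional covariates

# Share of the log odds ratio that disappears after the extra adjustment
# (a descriptive summary, assumed here for illustration).
attenuation = 1.0 - math.log(or_covariate_adjusted) / math.log(or_age_sex_adjusted)

print(f"age/sex adjusted: {percent_change_in_odds(or_age_sex_adjusted):.0f}% higher odds")
print(f"covariate adjusted: {percent_change_in_odds(or_covariate_adjusted):.0f}% higher odds")
print(f"attenuation on the log odds scale: {attenuation:.0%}")
```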
So let's see what they did adjust for to make sense of this.
So, I am going to zoom in; there were extensive footnotes here.
And so I'll let you read through these; they go on
to define obesity and overweight, which we didn't show.
And then they go on to say, for the column we first looked at,
the age and sex adjusted one,
adjusted by logistic regression for age and sex only.
And for that second column that said covariate, implying covariate adjusted,
it was adjusted for age and sex,
but additionally race, ethnicity, household composition, metropolitan or
non-metropolitan residence, household poverty or education levels,
TV viewing time, recreational computer use, and physical activity.
Now, I'm only showing you a portion of these results because there
was too much to fit on one slide.
But the gist was similar for the other indices:
where the associations looked large between less
desirable neighborhoods versus more desirable neighborhoods,
a lot of that was attenuated after adjusting for
some of these other characteristics.
11:42
So let's just go back and remind you what was in that table. That
first column we were looking at was the odds ratio of,
for example, being obese for children in unsafe neighborhoods compared to
safe neighborhoods, adjusted only for age and sex.
And this is the one adjusted for all those aforementioned other predictors above and
beyond age and sex.
So for the next article, in order to be brief and
fit it into the heading, I used a lot of acronyms here.
It's HIV, HBV, HCV in IVDUs.
So this is HIV.
This is hepatitis B virus.
This is hepatitis C virus.
And IVDU stands for intravenous drug users.
So if you look at the objectives from the abstract here,
it says we examined HIV, Hepatitis B, and HCV seroprevalence in an interim analysis,
and the potential risk factors associated with these infections, among
injection drug users
residing in non-urban communities of Southwestern Connecticut.
13:27
And 16.3% were coinfected, that is, had two or more.
And they go on to say infection risk was associated with longer duration of
injection drug use, overdose, substance abuse treatment, depression,
and involvement with the criminal justice system.
And coinfection was associated with longer injection drug use,
lower education, overdose, and criminal justice involvement.
Multivariate models identified drug use duration, substance abuse treatment,
and criminal justice involvement as the most significant predictors of infection.
14:15
Let's look at their methods section. It's a little blurry,
I apologize, but I'll read this to you.
I'll just talk about the statistics references;
they also talk about how they conducted the face-to-face interviews to get these data.
And they used two different software packages, SPSS and SAS.
And they say, descriptive statistics were
generated to characterize the study sample.
And three sets of analyses were conducted corresponding to the study questions.
First we determined the individual prevalence for each of the three viruses.
So in other words they computed p hats, if you will, for
the sample, for all participants for whom serological data were available.
Second, in this group we determined the prevalence and risk factors for
being infected, i.e., seropositive for one or more of the three viruses.
Third, among those who tested positive for one or
more viruses, we determined the prevalence and risk factors for being coinfected,
that is, positive for at least one additional virus.
They go on to say for each outcome we initially conducted bivariate analyses to
determine significant associations.
That is to say, they looked at unadjusted associations
between the infection or coinfection and each potential predictor on its own.
Where proportions of continuous or categorical variables were compared,
t-tests and chi-squared statistics were given.
And odds ratios were computed using,
and I'm not sure, I think this may be a mistake, analysis of variance,
because that's not used to compute odds ratios, and
Mantel-Haenszel methods, which I'll explain in a minute, respectively
unadjusted and adjusted for age.
So Mantel-Haenszel is a method for
adjusting that gives very similar results to logistic regression.
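To make the Mantel-Haenszel idea concrete, here's a minimal Python sketch of the pooled odds ratio across age strata. The two 2x2 tables are invented for illustration and are not from the article; the function names are hypothetical.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata.

    Each stratum is (a, b, c, d):
    a = exposed and infected, b = exposed and not infected,
    c = unexposed and infected, d = unexposed and not infected.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def crude_or(strata):
    """Odds ratio after collapsing the strata into a single 2x2 table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return (a * d) / (b * c)

# Hypothetical age strata: exposure and infection are both more common in the older group.
younger = (4, 16, 10, 70)
older = (40, 40, 7, 13)
strata = [younger, older]

print(f"crude OR: {crude_or(strata):.2f}")
print(f"age-adjusted (Mantel-Haenszel) OR: {mantel_haenszel_or(strata):.2f}")
```

In this made-up example the crude odds ratio is much larger than the age-adjusted one, because age is related to both the exposure and the outcome.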
16:03
They could have done the same sort of adjustment with logistic regression, but
that's what they actually present in their tables.
And we'll see that they first present the unadjusted association between
infection and each of the predictors.
Then they present it again adjusted only for age,
each association adjusted only for age, and then they give the multivariate results, as we'll see here.
They go on to say factors
associated with positive serologies in the bivariate analyses at an alpha level
of less than 0.1, before adjustment for age, were then included for
consideration in a multivariate logistic regression model
that was constructed using a backwards elimination method with
a significance level of alpha less than 0.05.
So what they did then was take all candidate predictors
that had a p value of less than 0.1 in the unadjusted analyses and
put them in a large model.
And then they started removing the ones that were not statistically significant at
the 0.05 level.
And their final model included all predictors that
stayed significant in the multivariate model.
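The screening and backward elimination steps can be sketched in Python. This is only the control flow: `toy_fit_pvalues` is a hypothetical stand-in for refitting a logistic regression and returning p-values, and every p-value below is invented. The sketch drops one predictor at a time, the least significant one, which is a common variant; the article does not say exactly how the authors' software did it.

```python
def screen(unadjusted_pvalues, threshold=0.1):
    """Keep predictors whose unadjusted p-value is below the screening threshold."""
    return [name for name, p in unadjusted_pvalues.items() if p < threshold]

def backward_eliminate(candidates, fit_pvalues, alpha=0.05):
    """Refit repeatedly, dropping the least significant predictor until all have p < alpha."""
    kept = list(candidates)
    while kept:
        pvalues = fit_pvalues(kept)
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break
        kept.remove(worst)
    return kept

def toy_fit_pvalues(predictors):
    """Hypothetical stand-in for fitting a multivariate model (invented p-values)."""
    base = {"drug_use_years": 0.001, "treatment": 0.01, "unemployed": 0.20, "overdose": 0.06}
    pvalues = {name: base[name] for name in predictors}
    # Pretend overdose becomes significant once unemployment is dropped.
    if "unemployed" not in predictors and "overdose" in pvalues:
        pvalues["overdose"] = 0.04
    return pvalues

# Invented unadjusted p-values for the screening step.
unadjusted = {"drug_use_years": 0.001, "treatment": 0.005, "unemployed": 0.03,
              "overdose": 0.02, "income": 0.45}

candidates = screen(unadjusted)
final_model = backward_eliminate(candidates, toy_fit_pvalues)
print(final_model)
```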
17:13
So the table they present is very large.
I'm going to show some highlights, and
there's also one sort of interesting thing about it.
So here are the unadjusted associations, the crude ones;
these are adjusted only for age,
each of the associations is adjusted for age.
And then these are the results from the multivariate model,
where everything is adjusted for everything else that stayed in the model.
One thing you notice, though, is that they're talking about age, but
they actually don't report any odds ratios associated with age.
17:46
Interestingly enough.
And I want to come back to that.
It threw me, actually, when I first looked at this table, until I went back and
read the methods section more carefully.
But let's just look at some examples of the comparisons they make.
So here the outcome
is the risk of being infected with at least one virus.
So these are the results of the logistic regression.
So, employment.
Unemployment, unadjusted, is associated with 60%
higher odds of being infected with at least one virus.
And this was statistically significant;
the confidence interval does not include one.
18:28
There's still this positive association, but it's no longer statistically significant.
So after adjusting for age, those who were unemployed had an estimated 36%
greater odds of being infected than those who weren't,
but that was no longer statistically significant.
They look at the results, for example, for monthly income,
and they categorized this into four different groups based on U.S. dollars:
those who make less than $500, $500 to $999, $1,000 to $1,999, and
greater than $2,000, and they used the greater than $2,000 group as the reference.
And we can see that while the three lower income groups had
estimated odds higher than the reference to varying degrees,
it wasn't a dose-response pattern.
For example, there was only a 9% higher estimated odds for
the lowest income group compared to the highest,
versus a 107% increase for the next group, $500 to $999, relative to the reference.
But the confidence intervals all included the null value of one.
And in fact, the overall test for whether there were any differences in the odds or
risk of being infected across any of the income groups
was not statistically significant.
21:03
Whether they got substance use treatment or not.
And so let's hone in on substance use treatment.
Unadjusted, interestingly enough,
having had any substance use treatment
was statistically significantly associated with
higher odds of having an infection compared to those who hadn't.
And that could be connected to things like the duration of drug use and
the intensity,
and that may be what explains that increase.
It stays statistically significant after adjusting for age.
And interestingly enough, it stays statistically significantly higher,
though attenuated a little bit, in the multivariate estimates,
indicating that some of that increase may have been explained by the other factors.
Well, let's look at it.
It was 2.76 unadjusted.
After adjusting for age alone,
it was still sizably larger than the odds for the reference group,
but by a slightly smaller amount, 2.33.
And then it went down to 2.24,
not much more of a shift, after adjusting for
the other things in this multivariate model.
And to get the whole scope of what's in this multivariate model,
you'd want to look at the article and
see this table, which crosses two pages; I can't put it all on here.
But this is just an attempt to show you some of the highlights.
22:23
So for some of the continuous measures, like duration of
drug use, they ultimately put an odds ratio in the multivariate model.
But for age, they never put any odds ratio.
And I couldn't figure out why, and I was a little confused by how they presented it.
And then I went back and read the fine print.
You usually have to read an article two or three times and
look at the tables to really get what's going on.
So, they go on to repeat what we had just said about substance abuse treatment.
In this multivariate model, participants who had a history of
being enrolled in substance abuse treatment were more than twice as likely to be infected.
Now, that's a little bit of a stretch because this is not a relative risk,
it's an odds ratio:
they had more than twice the odds of being infected.
And then they go on to say, each additional year of injection drug
use conferred a significant positive risk of infection, b equals 1.098.
So when I first read it,
I actually read past it as if I understood it.
And I want to give you a heads up about that per-year-of-injection number.
23:33
So if we wanted to get the odds ratio, we'd need to exponentiate that.
That's a little confusing and a little bit of a non sequitur
as to why they presented this as a slope and
not as a ratio like the other things.
And they also said additional arrests and
time spent incarcerated were similarly associated with additional risk,
and they give a slope and a slope confidence interval.
So it seems that they didn't want to put in odds ratios for
continuous predictors and they left that out of parts of the table,
or what they put in was in fact the slope from logistic regression.
So they're mixing metrics in that table, if I understand this correctly.
That's a little confusing.
And I don't
know what their apprehension was regarding putting the actual odds ratio in.
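Here's a short Python sketch of the conversion between a logistic regression slope and an odds ratio, using the reported value 1.098. It computes both readings, without claiming which one the authors intended: the odds ratio per year if 1.098 is a slope, and the slope if 1.098 were already an odds ratio. The multi-year helper is an illustrative addition.

```python
import math

reported_value = 1.098  # the "b" quoted for each additional year of injection drug use

# Reading 1: the value is a slope on the log odds scale, so exponentiate it.
odds_ratio_if_slope = math.exp(reported_value)

# Reading 2: the value is already an odds ratio, so the slope is its natural log.
slope_if_odds_ratio = math.log(reported_value)

def odds_ratio_for_years(slope, years):
    """Odds ratio comparing two people who differ by `years` on the predictor."""
    return math.exp(slope * years)

print(f"if 1.098 is a slope: OR per year = {odds_ratio_if_slope:.2f}")
print(f"if 1.098 is an odds ratio: slope per year = {slope_if_odds_ratio:.4f}")
print(f"under reading 2, OR for 5 more years = {odds_ratio_for_years(slope_if_odds_ratio, 5):.2f}")
```

The two readings imply very different per-year effects, which is why mixing slopes and ratios in one table makes the results hard to interpret.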
24:33
Let's look at a case control example.
We haven't done much with case control studies in this class,
other than to indicate that we could use odds ratios
to summarize outcome-exposure relationships even when direct estimates of
risk could not be legitimately computed
because of the way the study was sampled.
So this is a case control study that will show that logistic regression can be
used to estimate odds ratios.
And this came from the Lancet, and it was called Hazardous Alcohol Drinking and
Premature Mortality in Russia:
A Population Based Case-Control Study.
The summary says, the reasons for low life expectancy in Russian men and
large fluctuations in mortality are unknown.
We investigated the contribution of alcohol, and
hazardous drinking in particular, to male mortality in a typical Russian city.
25:32
Occurring between October 20th, 2003 to October 3rd 2005.
Controls were selected at random from the city population and
were frequency matched to deaths by age.
So they did a little preemptive strike to minimize the confounding by age:
they matched a fixed number of controls,
by age range, to each age range of the cases.
And they did interviews with proxy informants living in
the same household as the cases,
because the cases were deceased,
to ascertain the alcohol usage history for the cases.
And they also surveyed the controls.
And they say we ascertained frequency and
usual amount of beer, wine and spirits consumed,
and frequency of consumption of manufactured ethanol-based
liquids not intended to be drunk, non-beverage alcohol,
things like mouthwash or cleaning fluids, for example,
and other markers of problem drinking.
26:38
Complete information on the markers of problem drinking,
frequency of alcohol consumption, education, and
smoking was available for 1,468 cases and 1,496 controls.
And they go on to say in their findings that over 51% of the cases were
classified as problem drinkers or drank non-beverage alcohol,
compared with 13% of the controls.
The mortality odds ratio for
these men, compared with those who either abstained or were non-problematic
beverage drinkers, was 6.0 with a 95% confidence interval from 5 to 7.3,
after adjustment for smoking and education.
And they go on to report some of the other mortality odds ratios, adjusted, and
they can do this and report odds ratios
because it's a case-control study.
They couldn't use the results of the analysis to estimate the proportion, or
risk of death, for each of these groups,
but they could estimate the odds ratios, unadjusted and adjusted.
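A small Python sketch with an invented population shows why the odds ratio survives case-control sampling while the risk does not. All counts are hypothetical; the sampling keeps every death and one in ten survivors.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product odds ratio from a 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Hypothetical full population (counts invented for illustration).
population = dict(exposed_cases=200, exposed_noncases=800,
                  unexposed_cases=100, unexposed_noncases=900)

# Case-control sample: keep every case, but only 1 in 10 non-cases as controls.
sample = dict(exposed_cases=200, exposed_noncases=800 // 10,
              unexposed_cases=100, unexposed_noncases=900 // 10)

true_risk_exposed = population["exposed_cases"] / (
    population["exposed_cases"] + population["exposed_noncases"])
naive_risk_exposed = sample["exposed_cases"] / (
    sample["exposed_cases"] + sample["exposed_noncases"])

print(f"population OR: {odds_ratio(**population):.2f}")
print(f"case-control sample OR: {odds_ratio(**sample):.2f}")
print(f"true risk among exposed: {true_risk_exposed:.2f}")
print(f"naive 'risk' from the sample: {naive_risk_exposed:.2f}")
```

The odds ratio is the same in both tables, but the proportion of deaths among the exposed in the sample reflects the sampling fractions, not the actual risk.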
So what they go on to say in their article about this was logistic regression was
used to estimate the strength of association of factors
with mortality, with all analyses done with Stata.
In all models age was included in six five-year categories.
Education, smoking, and marital status were treated as potential confounders
and, where appropriate, were introduced into models as categorical variables.
So that's something we're familiar with now, putting categorical variables into
a regression model.
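As a reminder of what putting a categorical variable into a regression model means, here's a minimal Python sketch of indicator (dummy) coding with a reference level. The smoking levels and the function name are hypothetical; the article's actual category definitions are not reproduced here.

```python
def indicator_columns(values, levels, reference):
    """Code a categorical variable as 0/1 indicators, omitting the reference level."""
    kept_levels = [level for level in levels if level != reference]
    return [[1 if value == level else 0 for level in kept_levels] for value in values]

# Hypothetical smoking categories, with "never" as the reference group.
levels = ["never", "former", "current"]
observed = ["never", "current", "former"]

coded = indicator_columns(observed, levels, reference="never")
for value, row in zip(observed, coded):
    print(f"{value}: {row}")
```

Each indicator's slope is then a log odds ratio comparing that level to the reference level, which gets all zeros.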
28:27
And they do it by bottle:
greater than or equal to four bottles, two to four bottles, one to two bottles,
half to one bottle, or less than 0.5 bottle.
And the reference group for these comparisons was the smallest category, the
less than 0.5 of a bottle group.
Then they also include non-drinkers of beverage alcohol, people who
didn't drink alcoholic beverages but maybe drank non-beverage alcohol.
So you can see, if we look at this first model, those who
drank greater than or equal to four bottles per week
had 6.8 times the odds of mortality compared to the reference group,
and it was statistically significant.
This model only adjusted for age.
29:14
And they went on and they talked about these additional models.
So, Model 2 adjusted for age and the other variable in the table,
which is the frequency of non-beverage alcohol drinking.
Model 3 adjusted for all variables in Model 2 plus smoking and education.
And then Model 4 adjusted for
all variables in Model 3 plus marital status.
So what we see here is that this association,
and this dose response, essentially,
because higher consumption is associated with higher mortality,
though not always statistically significantly so
for each of these categories compared to this reference group,
stays after adjustment, but the estimates attenuate a fair amount.
30:20
The odds ratios are much larger for those who did it daily versus those who never,
or almost never, did, which was the reference group.
The odds ratio, adjusting only for age, was 30.5,
a 2,950% increase over the reference group, and it was statistically significant.
And that remained similarly high after further adjustment.
It attenuated a bit after the different layers of adjustment, but
still remained above 20.
So that was a pretty notable risk factor, as
measured through the odds ratio for mortality.
And like there was with regular alcohol consumption, there was a dose response:
decreasing consumption of non-beverage alcohol was associated with
decreasing mortality.
But the ratios by which the comparisons are made to the reference
group are sizably larger than they were for the alcohol consumption groups.
31:27
So hopefully this is giving you some insight. I just wanted to
share with you one more example.
I am not going to name this article, but some authors and some journals
still report slopes, despite the fact that it's a public service for the authors to
actually put the results in terms of odds ratios,
and really to include information about the intercept as well,
so that readers could compute predicted probabilities if they were interested.
Or the authors could put in some information like we saw in
that first article regarding predicted probabilities.
But this is a logistic regression table here from an unnamed article.
So what they report here are the slopes.
So look at sex.
We don't even know what the reference is,
we don't know if this compares males to females, females to males.
They give the slope out to eight decimal places and it's statistically significant.
They give a standard error so we could get a confidence interval for the slope.
And we could convert these things to odds ratios and confidence intervals.
We know how to do it.
But really we shouldn't have to do that to understand the results of an article.
Similarly, for all these other things,
they don't necessarily define what they mean.
Age, we don't know what the unit is.
Maybe it's years.
Stage,
well, it looks like they might be treating it as continuous, but
they don't explain it in the table, etc.
So anyway, this is just an example of how not to present results.
There's not enough information here to tell the full story;
like, we don't even know how sex is coded. But we are capable:
we could convert these results to adjusted odds ratios and
confidence intervals by doing a little math.
But it shouldn't be incumbent on us to take out our calculators in order to
get the results in a form
that has a reasonably consistent meaning for most researchers.
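That little bit of math can be sketched in Python. The slope and standard error below are hypothetical, since the unnamed article's numbers aren't reproduced here; the 1.96 multiplier assumes the usual large-sample normal approximation for a 95% interval.

```python
import math

def odds_ratio_with_ci(slope, standard_error, z=1.96):
    """Convert a logistic regression slope and its standard error to an odds ratio and 95% CI."""
    lower = math.exp(slope - z * standard_error)
    upper = math.exp(slope + z * standard_error)
    return math.exp(slope), lower, upper

# Hypothetical slope and standard error for illustration.
estimate, lower, upper = odds_ratio_with_ci(slope=0.40, standard_error=0.15)
print(f"OR = {estimate:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log odds scale and then exponentiated, so it is not symmetric around the odds ratio; if it excludes 1, the slope's interval excludes 0.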