A practical, example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

44 ratings

From the lesson

Module 2B: Effect Modification (Interaction)

Effect modification (Interaction), unlike confounding, is a phenomenon of "nature" and cannot be controlled by study design choice. However, it can be investigated in a manner similar to that of confounding. This set of lectures will define and give examples of effect modification, and compare and contrast it with confounding.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Hello, everyone. In this section, we're just going to do a short overview of what we've been talking about in Lecture Sets 4 and 5. Even at the beginning of Lecture Set 5, I wanted to talk about distinguishing confounding from effect modification. The reason I'm so keen on doing this is that these two concepts are easily confused, because the language around them sounds similar, and a lot of times people have difficulty keeping the differences straight. So I am going to take one more step in comparing and contrasting these two things, now that we have seen examples of both and the way to investigate each.

So this short lecture section will reinforce the difference between confounding and effect modification in terms of each phenomenon and how to investigate each, and I'll talk very briefly about the implications with regard to designing studies.

So confounding is a possibility in non-randomized studies, as we discussed. Quite truthfully, it could also occur, with small probability, in randomized studies, but generally randomization minimizes the threat of confounding.

And the nice thing about randomization is that it minimizes the threat of confounding both by things that we could conceptualize as potential confounders and by things we would never think to measure if we were not doing a randomized study.

So confounding of a two-variable relationship, let's say between two variables, Y and X, can occur if a third variable, Z, or a set of other variables, Z1, Z2, etc., is related to both Y and X. And as we've seen via example, confounding results in a distorted estimate of the Y/X relationship via the crude association, either higher or lower than it should be. The crude association is impacted by imbalances in the distribution of the confounding factor or factors between the exposure groups.
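To make the imbalance idea concrete, here is a minimal sketch using made-up counts (they are assumptions for illustration, not data from the lecture). Within each level of Z the exposed and unexposed groups have identical risk, but the exposed subjects are concentrated in the higher-risk level of Z, so the crude comparison is distorted.

```python
# Hypothetical counts (not from the lecture):
# (events, total) by exposure group within each level of Z.
strata = {
    "Z=0": {"exposed": (10, 100), "unexposed": (40, 400)},
    "Z=1": {"exposed": (120, 400), "unexposed": (30, 100)},
}

def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    (a, n1), (c, n0) = exposed, unexposed
    return (a / n1) / (c / n0)

def pooled(group):
    """Collapse over Z: add events and totals across strata."""
    events = sum(s[group][0] for s in strata.values())
    total = sum(s[group][1] for s in strata.values())
    return events, total

# Crude association ignores Z entirely.
crude_rr = risk_ratio(pooled("exposed"), pooled("unexposed"))
# Stratum-specific associations hold Z fixed.
stratum_rr = {z: risk_ratio(s["exposed"], s["unexposed"]) for z, s in strata.items()}

print(f"crude RR: {crude_rr:.2f}")
for z, rr in stratum_rr.items():
    print(f"RR within {z}: {rr:.2f}")
```

With these assumed counts the relative risk is 1 within each level of Z, while the crude relative risk is about 1.86, purely because of how Z is distributed between the exposure groups.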

Effect modification, on the other hand, is a possibility regardless of study design. The potential for effect modification is not minimized by randomization.

Effect modification for a two-variable relationship, say Y and X again, can occur because of another factor, Z. We could extend this to multiple factors, but it gets very complicated to think about multifactor effect modification. For purposes of this course, and for most courses, even in advanced statistics, the thought will be modification by a single other factor. So effect modification for a two-variable relationship can occur if a third variable is related to the association between Y and X. This third variable is not necessarily related to Y, to X, or to both.

So ignoring, or failing to investigate, effect modification may result in estimating one overall Y/X association when separate, group-specific estimates may be more appropriate. For example, we might estimate one overall effect of a drug on a certain outcome for men and women combined, when in fact the drug may be more beneficial for one of those groups and not so effective for the other. So if we did not investigate the effect modification potential of sex in that example, we might miss a key piece of the story about the relationship between the drug and the condition.

So how do we assess confounding? Well, confounding can be controlled for in the estimation process. An adjusted Y/X association can be estimated, adjusted for a measured potential confounder or a set of confounders. We will soon learn that multiple regression allows for relatively easy adjustment for one or more potential confounders, and we'll show how to interpret the results from multiple regressions that do this. Once we have adjusted estimates, we can assess whether there is confounding, as well as the degree of confounding, by comparing the crude, overall, unadjusted Y/X association and its uncertainty (the confidence interval) to the adjusted Y/X association and its confidence interval. Ultimately, the decision about whether the adjusted and unadjusted estimates are qualitatively different has to come from an expert in the subject matter. But we have the tools to assess confounding if we have estimates of both the unadjusted association and the association adjusted for the potential confounders of interest.
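As a rough illustration of comparing a crude and an adjusted estimate, the sketch below uses a tiny invented data set (an assumption, not lecture data) with a continuous exposure x, an outcome y, and a binary potential confounder z. The adjustment is done by centering x and y within each level of z, which for a categorical z gives the same x coefficient as a multiple regression of y on x and an indicator for z. It compares point estimates only; the confidence intervals discussed above are left out to keep the sketch short.

```python
from statistics import mean

# Hypothetical data (not from the lecture): (x, y, z) with z a binary confounder.
data = [
    (1, 1.5, 0), (2, 2.0, 0), (3, 2.5, 0),
    (4, 0.0, 1), (5, 0.5, 1), (6, 1.0, 1),
]

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Crude (unadjusted) slope: ignore z.
xs = [x for x, _, _ in data]
ys = [y for _, y, _ in data]
crude_slope = slope(xs, ys)

# Adjusted slope: center x and y within each level of z, then regress.
cx, cy = [], []
for level in sorted({z for _, _, z in data}):
    gx = [x for x, _, z in data if z == level]
    gy = [y for _, y, z in data if z == level]
    cx += [x - mean(gx) for x in gx]
    cy += [y - mean(gy) for y in gy]
adjusted_slope = slope(cx, cy)

print(f"crude slope:    {crude_slope:.3f}")
print(f"adjusted slope: {adjusted_slope:.3f}")
```

In this invented example the crude slope is negative and the adjusted slope is positive, so the two estimates tell qualitatively different stories, which is the kind of comparison a subject-matter expert would then have to judge.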

Effect modification can also be investigated in the estimation process. But this requires us to do more than just adjust for the potential third factor of interest. We actually have to estimate a separate Y/X association for each value of a potential effect modifier. For example, we might look at the relationship between disease and treatment separately for males and females, or separately across three different age groups. We will also soon learn that multiple regression analysis allows this to be done relatively efficiently and easily as well.

Once we've done this, however, we can assess whether there is effect modification, as well as its degree, by comparing the separate Y/X associations, both the estimates and the 95% confidence intervals, across the values of a potential effect modifier. So, for example, take the relative risk of relapse for patients with a given disease on a drug versus placebo. If we wish to see whether sex modifies the relationship between relapse and the drug, we might want to look at that estimate, and the resulting confidence interval, separately for males and females.
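The sex-specific comparison described above can be sketched as follows. The relapse counts are invented for illustration, and the confidence interval uses the usual large-sample approximation on the log relative risk scale, which is an assumption of this sketch rather than something stated in the lecture.

```python
from math import exp, log, sqrt

def rr_with_ci(a, n1, c, n0, z=1.96):
    """Relative risk and approximate 95% CI computed on the log scale.

    a / n1: events / total on drug; c / n0: events / total on placebo.
    """
    rr = (a / n1) / (c / n0)
    se_log = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)

# Hypothetical relapse counts (not from the lecture):
# (drug events, drug total, placebo events, placebo total).
counts = {
    "males": (20, 200, 40, 200),
    "females": (38, 200, 40, 200),
}

results = {sex: rr_with_ci(*c) for sex, c in counts.items()}
for sex, (rr, lo, hi) in results.items():
    print(f"{sex}: RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Reading the two estimates and their intervals side by side is the informal check; whether the difference between them is more than chance is a separate question.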

Multiple regression will actually allow us to do a formal hypothesis test of interaction as well, to test whether the difference in the association between subgroups is statistically significant or not.
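The regression-based test comes later in the course. As a simple stand-in, and not the method the lecture will present, the sketch below compares two subgroup-specific log relative risks with a large-sample z test. The counts are hypothetical, and the function names are made up for this example.

```python
from math import log, sqrt
from statistics import NormalDist

def log_rr_and_var(a, n1, c, n0):
    """Log relative risk and its approximate large-sample variance."""
    return log((a / n1) / (c / n0)), 1 / a - 1 / n1 + 1 / c - 1 / n0

def interaction_test(group1, group2):
    """Two-sided z test that the log relative risks of two groups are equal."""
    l1, v1 = log_rr_and_var(*group1)
    l2, v2 = log_rr_and_var(*group2)
    z = (l1 - l2) / sqrt(v1 + v2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical counts: (drug events, drug total, placebo events, placebo total).
males = (20, 200, 40, 200)
females = (38, 200, 40, 200)

z_stat, p_value = interaction_test(males, females)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
```

The null hypothesis here is that the drug-relapse association is the same in both groups, so a small p-value points toward effect modification by the grouping variable.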

Well, if we are designing a study, the need to deal with confounding or effect modification may have implications for how it is designed. So I'm just going to talk about this briefly, as something to think about. When it comes to confounding, the potential for confounding can be minimized by designing a randomized study to investigate a relationship between two things, an outcome and an exposure.

However, as we talked about at the beginning of Statistical Reasoning 1, much research can only be done observationally. In many cases it is not ethical or otherwise possible to randomize people to different exposure groups: smoking and non-smoking, socioeconomic status, etc. But these are exposures whose effects on health outcomes are of interest. And in observational studies, confounding is always a possibility.

So it's not possible to minimize the potential for confounding in an observational study. But if the researchers can conceptualize potential confounders of the key relationship or relationships under study before the start of the study, then these can be measured as part of the study. This is ideal, because if we've measured these potential confounders, then the key associations of interest can be adjusted for them and confounding can be assessed.

Another approach is sometimes used, though we won't talk about it much in this course other than here. Instead of adjusting for potential confounders after the study is completed, an observational study can use matching if it is prospective. Exposed and unexposed subjects, for the exposure of primary interest, can be matched on potential confounders, so long as these confounders are measured at the start of the study. These would be things that are static and do not change as the study evolves over time, like age at the start of the study, the sex of the person, etc. By matching, the researchers can reduce the systematic differences between those who are exposed and those who are unexposed.

But whether the researchers adjust once all the data is in, for the potential confounders that were measured as part of the study, or prefer to match at the start of the study on some of the potential confounders measured at baseline, the difficulty with observational studies is that these things, whether they are used for adjustment or for matching, have to be measured.

They have to be conceptualized and measured. And so the nagging difficulty with interpreting the results from observational studies is that there may be confounding by factors that were not measured and therefore could not be adjusted for.
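The matching idea mentioned earlier can be sketched in a few lines. This is only one simple way to do it, assuming exact one-to-one matching on sex and a 10-year age band; the subject records and the choice of matching key are hypothetical.

```python
from collections import defaultdict

# Hypothetical baseline records (not from the lecture).
subjects = [
    {"id": 1, "exposed": True,  "sex": "F", "age": 34},
    {"id": 2, "exposed": True,  "sex": "M", "age": 52},
    {"id": 3, "exposed": True,  "sex": "F", "age": 67},
    {"id": 4, "exposed": False, "sex": "F", "age": 31},
    {"id": 5, "exposed": False, "sex": "M", "age": 58},
    {"id": 6, "exposed": False, "sex": "M", "age": 45},
    {"id": 7, "exposed": False, "sex": "F", "age": 39},
]

def matching_key(subject):
    """Match on sex and 10-year age band, both fixed at baseline."""
    return subject["sex"], subject["age"] // 10

# Pool the unexposed subjects by matching key.
pool = defaultdict(list)
for s in subjects:
    if not s["exposed"]:
        pool[matching_key(s)].append(s)

# Pair each exposed subject with one unused unexposed subject sharing its key.
pairs, unmatched = [], []
for s in subjects:
    if s["exposed"]:
        candidates = pool[matching_key(s)]
        if candidates:
            pairs.append((s["id"], candidates.pop(0)["id"]))
        else:
            unmatched.append(s["id"])

print("matched pairs (exposed, unexposed):", pairs)
print("exposed subjects without a match:", unmatched)
```

Note that matching can only use variables that were recorded at baseline, which is the same limitation described above: an unmeasured confounder cannot be matched on any more than it can be adjusted for.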

That's one of the things that makes randomization ideal, because in theory randomization balances the exposed and unexposed groups on confounders that a researcher could think of in advance and on ones that would never enter their thought process.

What are the study design considerations, potentially, for effect modification? Well, at face value it sounds like there wouldn't be any, because the potential for effect modification is not affected by study design. However, it is best as a researcher to have a sense of any potential effect modifiers of interest prior to designing a study. This will, first and foremost, limit the number of investigations done once the data is collected. In other words, researchers will not, to be colloquial here, drive themselves crazy looking at all possibilities for effect modification, every possible interaction of interest.

But the other reason that this is potentially important, and this does have implications for the design of the study, is that it allows the study to be designed with a certain level of power to detect an effect modification of interest, if there are one or two effect modifiers that the researchers are really interested in investigating as part of the study.

Just a note on this: when the FDA in the United States started doing clinical trials, they mainly used men in the research studies. And obviously that was faulty logic, because men and women are very different biologically and the results for males may not be generalizable to females. So at some point somebody raised this issue, and it became the norm to include both men and women in clinical drug trials.

But at some point it also became important to include enough men and women in the trials. The goal was not only that the overall association between the outcome of interest and the drug could be estimated with a certain level of precision, with a certain amount of power to detect an association between the drug and the outcome. This also extended to looking at the drug-outcome relationship separately for males and females.

There needed to be enough of each sex in the study to estimate sex-specific associations with a reasonable level of precision, and to detect a difference of some degree in these associations, were one to exist in the population at large. So it wasn't enough to just have men and women. These trials have to be designed to have enough men and women that sex-specific estimates can be made with a certain level of precision.
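One way to get a feel for what "enough of each sex" means is a small simulation. The sketch below is a rough illustration under stated assumptions: the true risks are invented, every sex-by-arm cell has the same size, and the interaction is tested with a simple large-sample z test on the difference in log relative risks rather than with a regression model.

```python
import random
from math import log, sqrt

def simulate_events(n, risk, rng):
    """Number of events among n subjects who each have the given risk."""
    return sum(rng.random() < risk for _ in range(n))

def detects_interaction(n, risks, rng, z_crit=1.96):
    """Simulate one trial with n subjects per sex-by-arm cell and test
    whether the male and female log relative risks differ."""
    log_rr, var = {}, {}
    for sex in ("male", "female"):
        a = simulate_events(n, risks[sex]["drug"], rng)
        c = simulate_events(n, risks[sex]["placebo"], rng)
        if a == 0 or c == 0:
            return False  # log RR undefined; count as no detection
        log_rr[sex] = log(a / c)  # equal cell sizes, so RR = a / c
        var[sex] = 1 / a - 1 / n + 1 / c - 1 / n
    z = (log_rr["male"] - log_rr["female"]) / sqrt(var["male"] + var["female"])
    return abs(z) > z_crit

def estimated_power(n, risks, n_sims=300, seed=1):
    """Share of simulated trials in which the interaction is detected."""
    rng = random.Random(seed)
    return sum(detects_interaction(n, risks, rng) for _ in range(n_sims)) / n_sims

# Assumed true risks: the drug halves relapse risk in males, no effect in females.
true_risks = {
    "male": {"drug": 0.10, "placebo": 0.20},
    "female": {"drug": 0.20, "placebo": 0.20},
}

power_by_n = {n: estimated_power(n, true_risks) for n in (100, 800)}
for n, power in power_by_n.items():
    print(f"n per cell = {n}: estimated power = {power:.2f}")
```

The point of the comparison is that a trial large enough to estimate the overall drug effect can still be far too small to detect a real difference between the sexes, so the subgroup question has to be part of the sample size planning.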

So just to give you an example: recall that in the first section, Section A, we looked at an example of an observational study done from an environmental health perspective. In it, 64 sites in the eastern US were assessed for tree damage, and the elevation of each site was measured. We saw that in the overall, unadjusted relationship between the percent of damaged trees on a site and its elevation, there was no association. The slope was very small and not statistically significant.

When this association was adjusted for regional differences between the sites, we saw a positive, statistically significant association between degree of damage and increased elevation. But upon further investigation, I showed you that the results looked different between the Northern and Southern sites. That is, the relationship between damage and elevation did not look to be the same. And in fact we saw a statistical difference: if you look at the confidence intervals for these two estimated slopes, they do not overlap. But this study was certainly not designed to look at this type of interaction or effect modification. There were only eight sites chosen in the South. So I don't feel particularly comfortable about the precision, that is, our ability to quantify the association between damage and elevation in the South, even if in this small-sample study we saw a difference between the North and the South.

So as a researcher, I might be interested in taking the study to the next level and designing it not only to estimate the relationship between damage and elevation more precisely, but to do so within these two regional subgroups, with a certain power to detect a difference of a certain magnitude. So using this preliminary data, were I able to get funding and go forward, I might design a study where I sampled more sites in both the North and the South.

The aim would be to achieve a certain power to detect a difference in the association between damage and elevation between these regions, were one to exist at the population level. So again, this study was not designed to precisely estimate the damage-elevation association separately by region. But, as we discussed, were I or another researcher interested in designing a follow-up study to quantify regional differences more precisely, we could design a study that had enough observations, enough sites, in both the North and the South to detect a difference in the relationship between damage and elevation between the North and the South with a certain level of precision or power.

So anyway, I hope this brief summary pulled together some ideas that we were working on in Lecture Sets 4 and 5. We will continue to talk about adjusting for confounding, and about assessing confounding by comparing unadjusted and adjusted estimates. We will also show how to test for effect modification while considering the other factors of interest on the outcome in a multiple regression framework.
